{"id":76,"date":"2025-10-01T19:10:00","date_gmt":"2025-10-01T17:10:00","guid":{"rendered":"https:\/\/cornerstone.web.roma2.infn.it\/?p=76"},"modified":"2025-11-01T19:16:06","modified_gmt":"2025-11-01T18:16:06","slug":"ensuring-quality-data-refinement-work","status":"publish","type":"post","link":"https:\/\/cornerstone.web.roma2.infn.it\/?p=76","title":{"rendered":"Ensuring Quality: Data Refinement Work"},"content":{"rendered":"\n<p>An important update from the CORNERSTONE team: <strong>Dr. Stefano Scardigli<\/strong> has completed a critical analysis to ensure the reliability of our solar flare prediction dataset.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Challenge of &#8220;Clean&#8221; Data<\/h2>\n\n\n\n<p>Machine learning algorithms are only as good as the data they learn from. While we&#8217;ve assembled years of solar observations, not all measurements are created equal. Dr. Scardigli&#8217;s recent work focused on identifying and addressing quality issues that could compromise prediction accuracy.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What Was Discovered?<\/h3>\n\n\n\n<p>Through meticulous analysis of the magnetic field features (SHARP parameters) extracted from NASA&#8217;s Solar Dynamics Observatory data, Dr. Scardigli identified several critical issues:<\/p>\n\n\n\n<p><strong>Redundant Information<\/strong>: Some of the 17 magnetic field parameters were highly correlated with one another, so they carried overlapping information. This redundancy can confuse machine learning algorithms and reduce prediction accuracy.<\/p>\n\n\n\n<p><strong>Problematic Parameters<\/strong>: Certain measurements, particularly those related to the number of reliable pixels in an observation (CMASK parameter), varied in ways that could introduce spurious correlations. 
These measurements required dedicated normalization procedures.<\/p>\n\n\n\n<p><strong>Data Quality Variations<\/strong>: The analysis revealed differences between &#8220;near real-time&#8221; (NRT) data, which is available quickly for operational forecasting, and &#8220;definitive&#8221; (DEF) data, the higher-quality version processed weeks later.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Solution: Cleaned Datasets<\/h3>\n\n\n\n<p>Working closely with the CORNERSTONE team and the project coordinators, Dr. Scardigli developed refined, &#8220;clean&#8221; versions of both the near real-time and definitive datasets. This work involved:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Removing redundant features that don&#8217;t add predictive value<\/li>\n\n\n\n<li>Implementing normalization strategies to handle problematic parameters<\/li>\n\n\n\n<li>Establishing quality filters to exclude unreliable observations<\/li>\n\n\n\n<li>Creating separate, optimized datasets for both operational (NRT) and research (DEF) applications<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Why This Matters<\/h3>\n\n\n\n<p>For machine learning applications in space weather forecasting, data quality is paramount. Predictions need to be both accurate and timely: forecasters rely on near real-time data, even though it is of lower quality than definitive data, because operational forecasting requires information immediately.<\/p>\n\n\n\n<p>Dr. 
Scardigli&#8217;s quality control work ensures that:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Machine learning algorithms train on reliable, non-redundant information<\/li>\n\n\n\n<li>Prediction models can work with both real-time and high-quality retrospective data<\/li>\n\n\n\n<li>The dataset provides physically meaningful features without spurious correlations<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">The Path Forward<\/h3>\n\n\n\n<p>These cleaned datasets now serve as the foundation for developing and testing machine learning algorithms within CORNERSTONE. By investing time in rigorous data preparation and quality control, we&#8217;re building prediction systems that forecasters can trust to help protect critical infrastructure from solar storms.<\/p>\n\n\n\n<p>This meticulous attention to data quality exemplifies the importance of database handling expertise in modern scientific research, where the ability to prepare and validate data is just as crucial as the algorithms that analyze it.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><em>CORNERSTONE is funded under MUR &#8211; PRIN 2022 PNRR (P2022RKXH9 &#8211; CUP: E53D23021410001)<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>An important update from the CORNERSTONE team: Dr. Stefano Scardigli has completed a critical analysis to ensure the reliability of our solar flare prediction dataset. The Challenge of &#8220;Clean&#8221; Data Machine learning algorithms are only as good as the data they learn from. 
While we&#8217;ve assembled years of solar observations, not all measurements are created [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":77,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11,4],"tags":[12],"class_list":["post-76","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data","category-project-action","tag-data"],"_links":{"self":[{"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=\/wp\/v2\/posts\/76","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=76"}],"version-history":[{"count":1,"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=\/wp\/v2\/posts\/76\/revisions"}],"predecessor-version":[{"id":78,"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=\/wp\/v2\/posts\/76\/revisions\/78"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=\/wp\/v2\/media\/77"}],"wp:attachment":[{"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=76"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=76"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=76"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}