{"id":79,"date":"2025-10-06T19:23:27","date_gmt":"2025-10-06T17:23:27","guid":{"rendered":"https:\/\/cornerstone.web.roma2.infn.it\/?p=79"},"modified":"2025-11-01T19:29:48","modified_gmt":"2025-11-01T18:29:48","slug":"from-data-to-predictions-the-machine-learning-pipeline","status":"publish","type":"post","link":"https:\/\/cornerstone.web.roma2.infn.it\/?p=79","title":{"rendered":"From Data to Predictions: The Machine Learning Pipeline"},"content":{"rendered":"\n<p><strong>Building on a Strong Foundation<\/strong><\/p>\n\n\n\n<p>Exciting progress from the CORNERSTONE team! <strong>Dr. Simone Chierichini<\/strong> has developed a complete machine learning pipeline that turns the carefully curated dataset prepared by Dr. Stefano Scardigli into a solar flare forecasting system. This work exemplifies the close collaboration at the heart of the CORNERSTONE project, where expert data preparation meets modern artificial intelligence.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Power of Clean Data Meets Advanced AI<\/h2>\n\n\n\n<p>Remember Dr. Scardigli&#8217;s meticulous work cleaning and preparing 15 years of solar observations? That foundation has proven invaluable. Dr. Chierichini built his machine learning system directly on top of these refined SHARP (Space-weather HMI Active Region Patches) datasets, showing how careful data preparation makes advanced AI applications possible.<\/p>\n\n\n\n<p>The synergy is clear: Scardigli provided the &#8220;what to learn from&#8221; (clean, reliable measurements of solar magnetic fields), and Chierichini built the &#8220;how to learn&#8221; (machine learning models that learn to recognize patterns preceding solar flares).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Understanding the Technology: Transformers for Time Series<\/h2>\n\n\n\n<p>Dr. Chierichini employed <strong>Transformer-based architectures<\/strong>, a family of neural networks that has reshaped how machines process sequences and patterns. 
While Transformers became famous for powering language models like ChatGPT, they also work well on time-series data such as solar observations.<\/p>\n\n\n\n<p><strong>What makes Transformers special?<\/strong><\/p>\n\n\n\n<p>Unlike older recurrent approaches that process data points one by one (like reading a book word by word), Transformers use an &#8220;attention mechanism&#8221; that looks at all data points in a sequence at once and learns how much weight each one should carry when making a prediction. Imagine being able to instantly recognize that a specific magnetic field configuration from 12 hours ago is crucial for predicting a flare, while down-weighting less relevant measurements: that&#8217;s the power of attention.<\/p>\n\n\n\n<p>For solar flare prediction, this is especially valuable. The Sun&#8217;s magnetic field evolves over hours and days, and certain patterns that emerge early can signal an impending flare. Transformers are particularly good at capturing these long-range dependencies (connections between observations separated by long time intervals), which recurrent models such as LSTMs often struggle to retain over long sequences.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A Complete, Modular Pipeline<\/h2>\n\n\n\n<p>Dr. 
Chierichini didn&#8217;t just build a model; he created an entire infrastructure for solar flare forecasting research:<\/p>\n\n\n\n<p><strong>Dataset Management Framework<\/strong>: A specialized system that handles the complexities of SHARP time-series data, including:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Efficient loading and preprocessing of years of observations<\/li>\n\n\n\n<li>Strategies to prevent &#8220;data leakage&#8221; (ensuring the model doesn&#8217;t cheat by seeing future information)<\/li>\n\n\n\n<li>Techniques for handling class imbalance (far more non-flaring regions than flaring ones)<\/li>\n\n\n\n<li>Region-aware splitting that cleanly separates training, validation, and test data<\/li>\n<\/ul>\n\n\n\n<p><strong>Training Pipeline<\/strong>: Built in PyTorch (a widely used deep learning framework), the system is modular and flexible, allowing researchers to easily experiment with different:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Model architectures<\/li>\n\n\n\n<li>Training strategies<\/li>\n\n\n\n<li>Evaluation approaches<\/li>\n\n\n\n<li>Loss functions designed specifically for imbalanced datasets (such as a &#8220;curriculum focal loss&#8221;)<\/li>\n<\/ul>\n\n\n\n<p><strong>Rigorous Evaluation<\/strong>: Performance is assessed using metrics that are standard in space weather forecasting, including the True Skill Statistic (TSS) and the Heidke Skill Score (HSS). TSS (the hit rate minus the false-alarm rate) does not depend on how rare flares are in the test set, while HSS measures skill relative to random chance; together they give a meaningful picture of forecasting performance even though major flares are rare.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why This Matters<\/h2>\n\n\n\n<p>This pipeline represents a significant step toward operational solar flare forecasting. 
By combining Scardigli&#8217;s meticulously prepared datasets with Chierichini&#8217;s sophisticated machine learning infrastructure, CORNERSTONE is building the foundation for AI systems that could provide earlier, more accurate warnings of dangerous solar activity.<\/p>\n\n\n\n<p>The modular design means other researchers can build upon this work, testing new architectures and approaches without having to recreate the entire infrastructure. The careful attention to evaluation metrics helps ensure that reported improvements are real and meaningful for space weather applications.<\/p>\n\n\n\n<p>Together, these contributions, from data preparation through model development, demonstrate how interdisciplinary collaboration drives progress in space weather forecasting. The result is a reproducible, extensible system that moves us closer to protecting satellites, power grids, and astronauts from the Sun&#8217;s most powerful eruptions.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><em>CORNERSTONE is funded under MUR &#8211; PRIN 2022 PNRR (P2022RKXH9 &#8211; CUP: E53D23021410001)<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Building on a Strong Foundation Exciting progress from the CORNERSTONE team! Dr. Simone Chierichini has developed a complete machine learning pipeline that turns the carefully curated dataset prepared by Dr. Stefano Scardigli into a solar flare forecasting system. 
This work exemplifies the close collaboration at the heart of the CORNERSTONE project, where expert data preparation [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":80,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11,4],"tags":[12,5],"class_list":["post-79","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data","category-project-action","tag-data","tag-flare"],"_links":{"self":[{"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=\/wp\/v2\/posts\/79","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=79"}],"version-history":[{"count":1,"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=\/wp\/v2\/posts\/79\/revisions"}],"predecessor-version":[{"id":81,"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=\/wp\/v2\/posts\/79\/revisions\/81"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=\/wp\/v2\/media\/80"}],"wp:attachment":[{"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=79"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=79"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cornerstone.web.roma2.infn.it\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=79"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}