Explainable machine learning for Earth observation and modelling.
Ocean Sciences, Machine learning
Research area
Machine learning and its applications have experienced rapid growth in recent years. Achievements in natural language processing and high accuracy in classification tasks have fueled this interest. However, the use of machine learning in the hard sciences still faces significant limitations. One example comes from physics and its related fields: Physics-Informed Neural Networks (PINNs). While these models often yield promising results, they tend to be overly complex and poorly engineered, which limits their applicability to large and intricate systems. Such models also fall short of a fundamental requirement of science, namely that results be explainable. To address this gap, we aim to develop a machine learning model specifically designed for the data-driven discovery and solution of complex systems of equations. This model should be explainable from the ground up and capable of working with intricate systems, with a particular focus on oceanography, especially biogeochemical systems.
Project goals
The objective is to develop a novel machine learning model capable of reconstructing the symbolic representation of systems of differential equations and solving them efficiently, starting from data alone or, when feasible, from data combined with prior knowledge. When prior information is available, the model should use it to impose constraints on the form of the equations or on the interactions between the components involved. When only data are available, it should remain flexible enough to explore the full space of candidate solutions. One significant challenge faced by current models is their lack of explainability, which hinders their deployment in the hard sciences, where results must be treated with the utmost care. Another crucial issue is that these architectures often struggle to handle the large problems typical of complex systems. We believe that addressing these challenges will have a profound impact on the discovery of novel analytical models across various domains. Ultimately, our goal is to handle any type of differential equation and systems of any size. Such a tool could assist in identifying missing elements in ongoing studies, suggest new research paths, or, in the best case, uncover entirely new laws.
Computational approach
The main challenge is to develop a new interpretable and expressive machine learning model for solving complex differential equations. Striking a balance between interpretability and expressiveness is crucial, as existing approaches tend to prioritize one over the other. Addressing this imbalance requires an architecture that encodes prior knowledge, enforces structural constraints, and explores the solution space. Domain-specific inductive biases, such as conservation laws and sparsity constraints, should be integrated to ensure the model adheres to the properties of the underlying system. Novel training strategies are also needed, combining data-driven learning with unsupervised techniques, physics-informed loss functions, and careful optimization of the model parameters. Scaling to large, high-dimensional systems requires advanced optimization techniques, adaptive sparsity constraints, and efficient GPU-accelerated computation. Together, these elements should enable a generalizable machine learning model that remains interpretable and scales effectively across diverse scientific domains, such as physics, biology, engineering, and climate modeling, while facilitating the discovery of novel differential equations and providing deeper insight into the dynamics of complex systems.
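To make the idea of sparsity-constrained equation discovery concrete, the following is a minimal sketch and not the model proposed in this project. It uses sequentially thresholded least squares, a well-known sparse regression technique, to recover a one-dimensional equation from synthetic data. The logistic growth system, the polynomial candidate library, the threshold value, and all function names are assumptions chosen purely for illustration.

```python
import numpy as np


def simulate_logistic(r=1.5, k=2.0, x0=0.1, dt=0.01, steps=1000):
    """Synthetic data from dx/dt = r*x*(1 - x/k), using the closed-form solution."""
    t = np.arange(steps) * dt
    x = k / (1.0 + (k / x0 - 1.0) * np.exp(-r * t))
    return t, x


def build_library(x):
    """Hypothetical candidate terms for the right-hand side: 1, x, x^2, x^3."""
    return np.column_stack([np.ones_like(x), x, x**2, x**3])


def stlsq(theta, dxdt, threshold=0.1, iterations=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    coeffs = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(iterations):
        small = np.abs(coeffs) < threshold
        coeffs[small] = 0.0  # sparsity constraint: drop weak candidate terms
        active = ~small
        if not active.any():
            break
        # Refit using only the terms that survived the threshold
        coeffs[active] = np.linalg.lstsq(theta[:, active], dxdt, rcond=None)[0]
    return coeffs


t, x = simulate_logistic()
dxdt = np.gradient(x, t, edge_order=2)  # finite-difference estimate of dx/dt
coeffs = stlsq(build_library(x), dxdt)

names = ["1", "x", "x^2", "x^3"]
terms = [f"({c:.3f})*{n}" for c, n in zip(coeffs, names) if c != 0.0]
print("dx/dt =", " + ".join(terms))
```

With these assumed parameters, the surviving coefficients should land close to r = 1.5 for the x term and -r/k = -0.75 for the x^2 term, while the constant and cubic terms are removed. The result is a short symbolic expression that can be read and checked directly, which is the kind of explainability discussed above. Prior knowledge could enter such a scheme by restricting which candidate terms are placed in the library.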
This diagram illustrates the process that the entire project would follow. Measurements provide real-world data which, when processed by our machine learning model, yield a set of equations that describe the system in detail.
Matteo Gallo
Istituto Nazionale di Oceanografia e di Geofisica Sperimentale
My name is Matteo Gallo. I hold a Bachelor’s degree in Physics and a Master’s degree in Physics of Complex Systems, awarded cum laude with an honorable mention. During my Master’s studies, I actively participated in the Machine Learning Journal Club and the Quantum Computing Journal Club, engaging in discussions and practical projects. I am also part of the Silicon Valley Fellowship, a program aimed at bridging the gap between startup founders and domain experts in artificial intelligence. Currently, I am a first-year PhD student in Data Science and Applied Artificial Intelligence, collaborating with the National Institute of Oceanography and Applied Geophysics (OGS) and the University of Trieste. My research focuses on building novel explainable machine learning models for applications in science.