Distributed Deep Learning Pipeline for Seismo-volcanic data (Etna)
Solid Earth Sciences, Seismology
Research area
The project builds a distributed workflow that runs on High Performance Computing platforms to process complex seismic waveforms recorded in volcanic areas. The use case is a dataset of waveforms recorded in the Etna area (Sicily); the workflow was tested on the Leonardo platform, with a possible extension to Galileo100. The workflow standardizes the waveforms into a uniform format and then applies a set of pre-trained Deep Learning models provided by the SeisBench toolbox (Woollam et al., 2022) to pick the P and S phases on the waveforms. The predicted picks are then associated into earthquakes by the GaMMA association tool (Zhu et al., 2022).
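The three stages described above (standardization, phase picking, association) can be sketched as a simple chain of functions. This is a minimal illustration of the data flow only, not the project's actual code: the `Pick` and `Event` containers, the `run_workflow` function and its three callable parameters are hypothetical names introduced here, and the sketch deliberately does not call SeisBench or GaMMA.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Pick:
    """One predicted phase arrival (hypothetical container)."""
    station: str        # station code
    phase: str          # "P" or "S"
    time: float         # arrival time in seconds from a common reference
    probability: float  # picker confidence in [0, 1]


@dataclass(frozen=True)
class Event:
    """One associated earthquake (hypothetical container)."""
    origin_time: float
    picks: Tuple[Pick, ...]


def run_workflow(
    raw_waveforms: Iterable[Any],
    standardize: Callable[[Any], Any],
    pick_phases: Callable[[Any], List[Pick]],
    associate: Callable[[List[Pick]], List[Event]],
) -> List[Event]:
    """Chain the stages: standardize -> pick phases -> associate.

    The three callables are placeholders. In a real run they would wrap
    the format conversion, a pre-trained picker, and an associator.
    """
    all_picks: List[Pick] = []
    for raw in raw_waveforms:
        uniform = standardize(raw)               # stage 1: uniform format
        all_picks.extend(pick_phases(uniform))   # stage 2: P and S picks
    return associate(all_picks)                  # stage 3: picks -> events
```

Passing the stages in as callables keeps each one replaceable, which is one way to benchmark several pickers against the same standardized input.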
Project goals
This project enables the analysis of complex seismic waveforms in volcanic areas, where the signals are influenced by multiple seismic sources, mainly related to the presence of fluids and to brittle fracturing under stress within the volcanic structure. Conventional picking methods and phase associators can struggle with such complex waveforms. The ultimate goal is to find an automatic way to distinguish between the two main families of seismic signals, Volcano-Tectonic earthquakes and Long-Period events, using machine learning models and seismological techniques.
Computational approach
Seismological waveforms are recorded by multiple networks and occupy storage space on the order of terabytes annually, so this can be considered a big data problem. Processing them is demanding in both computation and time. Applying Deep Learning models together with the parallelisation strategies enabled by High Performance Computing architectures offers significant acceleration and delivers remarkable results compared with manual analysis. However, a major challenge is still the classification of seismic phases and earthquakes, which currently requires comparison tests against labelled ground-truth seismic datasets.
Key results
During this project, the automated workflow for seismic phase detection and association into earthquakes was developed and optimized for the Leonardo platform. The resulting speedup enabled processing of a large waveform dataset recorded in the Etna area and allowed benchmark tests of several deep learning models throughout the automated workflow. A method for classifying volcanic signals among the deep learning-predicted events was added to the workflow. This automatic classification further characterized the performance of the selected models, providing valuable insight into the influence of model architectures and training datasets on seismic processing and cross-domain inference for complex seismic areas such as Mount Etna. Ultimately, the positive results demonstrated the significant potential of human-supervised automated workflows in volcano monitoring.
Resource usage
(Group 1) The multi-GPU, multi-core architecture of the Leonardo platform was used to parallelize the loading of input waveforms and their inference by the deep learning models for phase picking. This resulted in a speedup of up to approximately 4.5 times in the total runtime of the automated workflow compared to single-core, non-parallelized execution.
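To put the reported speedup in context, the sketch below computes speedup and parallel efficiency from two runtimes, and uses Amdahl's law to estimate which fraction of the workflow would have to be parallel to reach a given speedup. Amdahl's law is a simplifying model chosen here for illustration, not something the project states it used. The worker count and runtimes in the usage lines are hypothetical; the source reports only the speedup of approximately 4.5 times.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Ratio of serial to parallel wall-clock runtime."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("runtimes must be positive")
    return t_serial / t_parallel


def parallel_efficiency(s: float, n_workers: int) -> float:
    """Speedup per worker; 1.0 would be ideal linear scaling."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return s / n_workers


def amdahl_speedup(p: float, n_workers: int) -> float:
    """Amdahl's law: S = 1 / ((1 - p) + p / n) for parallel fraction p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return 1.0 / ((1.0 - p) + p / n_workers)


def implied_parallel_fraction(s: float, n_workers: int) -> float:
    """Invert Amdahl's law for p: p = (1 - 1/S) / (1 - 1/n)."""
    if n_workers < 2 or not 1.0 <= s <= n_workers:
        raise ValueError("need n_workers >= 2 and 1 <= s <= n_workers")
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n_workers)


# Hypothetical numbers: 90 min serial, 20 min parallel, 8 workers.
s = speedup(90.0, 20.0)
eff = parallel_efficiency(s, 8)
p = implied_parallel_fraction(s, 8)
print(f"speedup={s:.2f}, efficiency={eff:.2f}, parallel fraction={p:.2f}")
```

Under this model, the parts of the workflow that remain serial bound the achievable speedup no matter how many workers are added.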
What's next
The next step is to apply the skills acquired with HPC platforms and automated machine-learning tools to other seismic environments in Italy, in line with my current research fellowship. I will again use the Cineca HPC resources for the significant speedup provided by their parallelized, multi-GPU architecture.
Andrea Carducci
Istituto Nazionale di Oceanografia e di Geofisica Sperimentale
I am an applied geologist and seismologist with a PhD in Earthquake and Environmental Hazards, and I work as a postdoctoral fellow at OGS in Trieste. In December 2024, I completed a Master's degree in High Performance Computing at the International Centre for Theoretical Physics in Trieste. My thesis focused on building a workflow for phase picking and association on CINECA's Leonardo platform. My current project deals with the application of deep learning models and unsupervised learning to the processing of seismological data.

