A Global-scale Database of Seismic Phases from Cloud-based Picking at Petabyte Scale

DOI:

https://doi.org/10.26443/seismica.v4i2.1738

Abstract

We present the first global-scale database of 4.3 billion P- and S-wave picks, extracted from 1.3 PB of continuous seismic data via a cloud-native workflow. Using cloud computing services on Amazon Web Services, we launched ~145,000 containerized jobs on continuous records from 47,354 stations spanning 2002-2025; the full run completed in under three days. Phase arrivals were identified with PhaseNet, a deep learning model, accessed through SeisBench, an open-source Python ecosystem for deep learning. To visualize the picks and build a global understanding of them, we present preliminary results: pick time series that reveal Omori-law aftershock decay and seasonal variations linked to noise levels, and dense regional coverage that will enhance earthquake catalogs and machine-learning datasets. All picks are available in a publicly queryable database, a powerful resource for researchers studying seismicity around the world. This report describes the database and the underlying workflow, and demonstrates the feasibility of petabyte-scale seismic data mining on the cloud and of delivering intelligent data products to the community in an automated manner.
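To put the quoted figures in perspective, the following is a minimal back-of-envelope sketch in Python. It uses only the numbers stated in the abstract; the function name `scale_estimates`, the decimal definition of a petabyte (1 PB = 1e15 bytes), and the treatment of the run as lasting exactly three days (the abstract gives this only as an upper bound) are assumptions made for illustration, not details from the report.

```python
# Back-of-envelope scale estimates from the figures quoted in the abstract.
# Assumptions (not from the source): 1 PB = 1e15 bytes, and the run is
# treated as lasting exactly three days, which the abstract states only
# as an upper bound ("under three days").

PICKS = 4.3e9                 # P- and S-wave picks in the database
DATA_BYTES = 1.3e15           # 1.3 PB of continuous data (decimal PB assumed)
JOBS = 145_000                # approximate number of containerized jobs
STATIONS = 47_354             # stations processed
MAX_RUNTIME_S = 3 * 24 * 3600 # three days, in seconds


def scale_estimates():
    """Return average per-job and per-station figures (hypothetical helper)."""
    return {
        "gb_per_job": DATA_BYTES / JOBS / 1e9,
        "picks_per_job": PICKS / JOBS,
        "picks_per_station": PICKS / STATIONS,
        # Dividing by the upper bound on runtime gives a lower bound
        # on the sustained aggregate throughput.
        "min_throughput_gb_per_s": DATA_BYTES / MAX_RUNTIME_S / 1e9,
    }


if __name__ == "__main__":
    for name, value in scale_estimates().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions, each job handled roughly 9 GB of data on average, each station contributed on the order of 90,000 picks, and the workflow sustained at least about 5 GB/s in aggregate. These are averages only; the actual distribution across jobs and stations is not given in the abstract.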

Published

2025-09-06

How to Cite

Ni, Y., Denolle, M., Thomas, A., Hamilton, A., Münchmeyer, J., Wang, Y., Bachelot, L., Trabant, C., & Mencin, D. (2025). A Global-scale Database of Seismic Phases from Cloud-based Picking at Petabyte Scale. Seismica, 4(2). https://doi.org/10.26443/seismica.v4i2.1738

Section

Reports (excl. Fast Reports)