How reproducible and reliable is geophysical research?
A review of the availability and accessibility of data and software for research published in journals
DOI:
https://doi.org/10.26443/seismica.v2i1.278
Keywords:
reproducibility, data availability, FAIR
Abstract
Geophysical research frequently makes use of agreed-upon methodologies, formally published software, and bespoke code to process and analyse data. The reliability and repeatability of these methods are vital in maintaining the integrity of research findings and thereby avoiding the dissemination of unreliable results. In recent years there has been increased attention across scientific disciplines on aspects of reproducibility, including data availability. This review considers aspects of the reproducibility of geophysical studies relating to their publication in peer-reviewed journals. For 100 geophysics journals it considers the extent to which reproducibility in geophysics is the focus of published literature. For 20 geophysical journals it considers a) journal policies on the requirements for providing code, software, and data for submission; and b) the availability of data and software associated with 200 published journal articles. The findings show that: 1) between 1991 and 2021 there were 72 articles with reproducibility in the title and 417 with reliability, with an overall increase in the number of articles with reproducibility or reliability as the subject over the same period; 2) while 60% of journals have a definition of research data, only 20% of journals have a requirement for a data availability statement; and 3) despite ~86% of sampled journal articles including a data availability statement, only 54% of articles have the original data accessible via data repositories or web servers, and only 49% of articles name the software used. It is suggested that, although journals and authors are working towards improving the availability of data and software, these are frequently not identified or easily accessible, which limits the possibility of reproducing studies.
License
Copyright (c) 2023 Mark Ireland, Guillermo Algarabel, Michael Steventon, Marcus Munafò
This work is licensed under a Creative Commons Attribution 4.0 International License.