SeisIO: a fast, efficient geophysical data architecture for the Julia language
Joshua P. Jones¹*, Kurama Okubo², Tim Clements², and Marine A. Denolle²
¹4509 NE Sumner St., Portland, OR, USA
²Department of Earth and Planetary Sciences, Harvard University, MA, USA
*Corresponding author: Joshua P. Jones ([email protected])
Abstract
SeisIO for the Julia language is a new geophysical data framework that combines the intuitive syntax of a high-level language with performance comparable to FORTRAN or C. Benchmark comparisons with recent versions of popular programs for seismic data download and analysis demonstrate significant improvements in file read speed and orders-of-magnitude improvements in memory overhead. Because the Julia language natively supports parallel computing with an intuitive syntax, we also benchmark parallel download and processing of multi-week segments of contiguous data from two sets of 10 broadband seismic stations, and find that SeisIO outperforms two popular Python-based tools for data downloads. The current capabilities of SeisIO include file read support for several geophysical data formats; online data access using FDSN web services, IRIS web services, and SeisComP SeedLink; and optimized versions of several common data processing operations. Tutorial notebooks and extensive documentation are available to improve the user experience (UX). As an accessible example of performant scientific computing for the next generation of researchers, SeisIO offers ease of use and rapid learning without sacrificing computational performance.
1 Introduction
The dramatic growth in the volume of collected geophysical data has the potential to lead to tremendous advances in the science (https://ds.iris.edu/data/distribution/). Leveraging this data revolution to gain knowledge useful for earthquake science, hydrology, industry, and climate science requires new tools that help Earth scientists extract meaningful information from arbitrarily large data sets. High-performance computing is necessary to manage problems at this scale; however, it requires specialized training at the undergraduate and graduate levels, and such training is rarely part of undergraduate science curricula. On the other hand, open-source programming languages (e.g., Python) and codes (e.g., ObsPy; Beyreuther et al., 2010) have standardized seismic data processing and improved access to seismic data analysis for a new generation of seismologists. However, these tools suffer from slow computation and inefficient memory allocation at scale. The geophysics community therefore needs a computational framework that is simultaneously easy to learn and efficient.
The Julia language combines the syntactic ease of high-level languages like MATLAB and Python with the performance of FORTRAN and C. Developed for fast, efficient numerical computing, Julia first appeared as a beta version in February 2012, and version 1.0.0 was released in August 2018 (Bezanson et al., 2017, 2018). The language is known for impressive speed and computational efficiency: while still in beta testing, Julia became the fourth programming language to achieve a petaflop, after FORTRAN, C, and C++ (Regier et al., 2019; Perkel, 2019). Despite its relative youth, Julia supports a growing collection of open-source modules for numerical and scientific computing. Julia wrappers to C, FORTRAN, R, and Python allow seamless execution of external code, and third-party packages (https://github.com/JuliaInterop) extend interoperability to C++, Java, Mathematica, and MATLAB, including the ability to read .mat files (https://github.com/JuliaIO/MAT.jl).
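As a minimal illustration of the C interoperability described above, Julia's built-in ccall can invoke a compiled C function with no wrapper code or separate compilation step. This sketch calls strlen from the C standard library; it assumes a Linux or macOS system, where libc symbols are already visible to the running Julia process.

```julia
# Call the C standard library's strlen directly from Julia.
# ccall arguments: function symbol, return type, tuple of argument types, arguments.
# Assumes Linux or macOS, where libc is already loaded into the Julia process.
nbytes = ccall(:strlen, Csize_t, (Cstring,), "SeisIO")

# Csize_t is an unsigned integer type; convert to Int for ordinary arithmetic.
println("strlen returned ", Int(nbytes))
```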
2 SeisIO
The SeisIO package was created in May 2016 with the goal of rapid, efficient analysis of univariate geophysical data in the Julia language, using comprehensible, uniform syntax and simple but powerful commands. Its design allows users to read univariate data from arbitrary instruments (e.g., seismic, geodetic, gas flux) into a single structure, including gapped and irregularly sampled data. In the subsections below, we describe the capabilities of SeisIO, conduct benchmark tests, and introduce tutorials.
2.1 Capabilities
SeisIO includes well-tested read support for many geophysical time-series formats (Table 1). All readers except ASDF are written purely in Julia; the ASDF reader wraps libhdf5, which is written in C. Current data processing operations include filling time gaps, removing the mean and linear trend, band-pass filtering, instrument response translation and removal (i.e., flattening to DC), resampling, cosine tapering, merging, seismogram differentiation/integration, and time synchronization. Tools for online acquisition support FDSN services (station, event, and dataselect), IRIS time-series requests, SeisComP SeedLink, and the IRIS TauP interface (Crotwell et al., 1999).
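Mean and linear-trend removal, two of the operations listed above, can be sketched in a few lines of base Julia. The functions below are hypothetical stand-ins written for illustration (remove_mean! and remove_trend! are not SeisIO names), and they assume a uniformly sampled, gap-free trace; SeisIO's own routines also handle gaps and multichannel structures, which this sketch does not.

```julia
using Statistics

# Subtract the arithmetic mean from x, in place.
function remove_mean!(x::AbstractVector{<:AbstractFloat})
    x .-= mean(x)
    return x
end

# Subtract the least-squares line a + b*t from x, in place, using the sample
# index t = 1, 2, ..., n as the time axis (uniform, gap-free sampling assumed).
function remove_trend!(x::AbstractVector{<:AbstractFloat})
    t = collect(1.0:length(x))
    b = cov(t, x) / var(t)        # least-squares slope
    a = mean(x) - b * mean(t)     # least-squares intercept
    x .-= a .+ b .* t
    return x
end

# Example: a pure ramp plus offset is reduced to (numerically) zero.
x = collect(3.0 .+ 0.5 .* (1:100))
remove_trend!(x)
println("max residual: ", maximum(abs, x))
```

Using the sample index rather than time in seconds changes only the units of the fitted slope, not the residual that remains after subtraction.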
SeisIO has been officially registered in the Julia package registry since early 2019. Automated testing with Travis-CI (https://travis-ci.org/) and AppVeyor (https://www.appveyor.com/) covers Linux, macOS, and Windows installations. Code coverage estimates of 97-98% on Codecov (https://codecov.io/) and Coveralls (https://coveralls.io/) exceed the 95% coverage threshold typical of enterprise-level commercial software releases, yet both Julia and SeisIO are free.
2.2 Installation
A typical installation of the Julia language, SeisIO, and all dependencies requires three steps:
1. Download and install the Julia language from https://julialang.org/downloads/
• The Julia install directory will be denoted (juliaroot) hereafter.
• (juliaroot) is typically a pattern like /home/username/julia-v.v.v/ in Linux, e.g., /home/josh/julia-1.1.0/.
2. Start the Julia command-line interface (CLI) with (juliaroot)/bin/julia
3. Type or copy: using Pkg; Pkg.add("SeisIO"); using SeisIO
Julia installs package dependencies automatically when Pkg.add is invoked. There is no need for dedicated environments or session-specific user settings; however, FFT performance can sometimes be improved by starting Julia in parallel-ready mode with (juliaroot)/bin/julia --procs auto. The total disk space required is typically under 4 GB: 300-400 MB for Julia; 4.2 MB for SeisIO v0.4.1; 300 MB for optional test and benchmark data; and 1-3 GB for a typical set of scientific computing packages. The last requirement is much lower for non-Windows users who manually link existing libraries and software (e.g., BLAS, Conda, FFTW) to Julia, but this is recommended only for experienced Linux users.
2.3 SeisIO Data Structure
SeisIO is designed around easy, fluid, and fast data access. For example, a complete sequence of commands to download and process channel data can be executed in one function call with keywords:
julia> S = get_data("FDSN", "UW.LON..BH?", src="IRIS", s="2019-01-01", t=3600, detrend=true, rr=true, w=true)

SeisData with 3 channels (2 shown)
    ID: UW.LON..BHE                        UW.LON..BHN                        ...
  NAME: Longmire CREST broad-band          Longmire CREST broad-band          ...
   LOC: 46.7506 N, -121.81 E, 853.0 m      46.7506 N, -121.81 E, 853.0 m      ...
    FS: 40.0                               40.0                               ...
  GAIN: 7.51485e8                          7.51485e8                          ...
  RESP: a0 1.0, f0 1.0, 1z, 1p             a0 1.0, f0 1.0, 1z, 1p             ...
 UNITS: m/s                                m/s                                ...
   SRC: http://service.iris.edu/fdsnws/da  http://service.iris.edu/fdsnws/da  ...
  MISC: 4 entries                          4 entries                          ...
 NOTES: 2 entries                          2 entries                          ...
     T: 2019-01-01T00:00:00.010 (0 gaps)   2019-01-01T00:00:00.010 (0 gaps)   ...
     X: -1.511e+03                         +4.669e+03                         ...
        -1.512e+03                         +4.699e+03                         ...
        ...                                ...                                ...
        +1.540e+03                         +7.483e+02                         ...
        (nx = 144000)                      (nx = 144000)                      ...
     C: 0 open, 0 total
This example downloads 3600 seconds of data beginning 2019-01-01 00:00:00 (UTC) using FDSN dataselect with the IRIS DMC server. The keyword "detrend" removes the linear trend after download; "rr" removes the instrument response (flattening it to DC) and replaces the .resp field of each channel with an all-pass filter. The keyword "w" writes the download directly to disk before processing. Access to data properties is straightforward and intentionally simple: for example, in all time-series data structures, the field .x holds univariate data.
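To make the display above concrete, the sketch below defines a hypothetical, much-simplified container whose fields loosely mirror the ID, FS, UNITS, T, and X rows. It is not SeisIO's SeisData type: it stores a single start time per channel instead of a gap-aware time record, and it omits most fields. It does show the arithmetic behind the display, since 144,000 samples at 40 Hz span 3600 s.

```julia
using Dates

# Hypothetical, simplified container; field names loosely follow the display
# above, but this is NOT SeisIO's SeisData and it ignores time gaps.
struct MiniSeisData
    id::Vector{String}          # NET.STA.LOC.CHA identifiers
    fs::Vector{Float64}         # sampling frequency, Hz
    units::Vector{String}       # physical units of each channel
    t0::Vector{DateTime}        # start time of each channel
    x::Vector{Vector{Float64}}  # univariate samples, one vector per channel
end

# Duration spanned by channel i, in seconds: sample count / sampling rate.
duration(S::MiniSeisData, i::Int) = length(S.x[i]) / S.fs[i]

# Time of the last sample of channel i (no gaps assumed), to the millisecond.
function endtime(S::MiniSeisData, i::Int)
    last_offset_ms = round(Int, 1000 * (length(S.x[i]) - 1) / S.fs[i])
    return S.t0[i] + Millisecond(last_offset_ms)
end

# One channel shaped like UW.LON..BHE above: 1 hour at 40 Hz (zeros as data).
S = MiniSeisData(["UW.LON..BHE"], [40.0], ["m/s"],
                 [DateTime(2019, 1, 1, 0, 0, 0, 10)], [zeros(144_000)])
println(duration(S, 1), " s, last sample at ", endtime(S, 1))
```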
2.4 Tutorials
A SeisIO tutorial is available from the project GitHub site, with three short, interactive Jupyter notebooks designed to take 5-10 minutes each. A few additional commands in the Julia CLI are required to run the interactive notebooks:
using Pkg
Pkg.add(["Dates", "IJulia"])
using IJulia
using SeisIO
cd(joinpath(dirname(pathof(SeisIO)), "..", "tutorial"))
jupyterlab(dir=pwd())
The three tutorials are:
Part_1-Basic.ipynb: introduction to SeisIO
Part_2-Data_Acquisition.ipynb: downloading data & reading files
Part_3-Processing.ipynb: data processing
Researchers familiar with MATLAB/Octave or Python will find Julia syntax intuitive and may need only the language's official documentation to begin coding. Additional Julia-language tutorials are available from https://julialang.org/learning/.
3 Benchmarking
We conduct a series of benchmark tests on a 64-bit personal computer equipped with an Intel DH67CL motherboard, an i7-2600 (3.4 GHz) CPU, and 16 GB of Kingston DDR3 RAM, running Julia v1.1.0 on 64-bit Ubuntu Linux 18.04.3 (kernel 5.0.0-29). File read tests (Table 2) use SeisIO v0.4.1 and BenchmarkTools.jl with 100 samples per benchmark and one evaluation per sample. Because Julia uses a just-in-time (JIT) compiler, an initial compile run precedes each test. The results shown in Fig. 1 suggest that read time and memory use scale quasi-linearly with file size.
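The measurement protocol described above (one untimed compile run, then the median of 100 single-evaluation samples) can be approximated with the standard library alone. The sketch below is illustrative only and the function name median_time_ms is hypothetical; the actual benchmarks used BenchmarkTools.jl, where a comparable call would be along the lines of `@benchmark f() samples=100 evals=1`.

```julia
using Statistics

# Median wall-clock time of f() in milliseconds over n samples.
# The first, untimed call triggers JIT compilation, so compile time is
# excluded from the samples; each sample times exactly one evaluation.
function median_time_ms(f; n::Int = 100)
    f()                                        # warm-up / compile run
    samples = [@elapsed(f()) for _ in 1:n]     # seconds, one evaluation each
    return 1000 * median(samples)
end

# Example: time a small workload.
t = median_time_ms(() -> sum(sqrt, 1:100_000))
println("median: ", round(t; digits = 3), " ms")
```

The median is used rather than the mean because a few samples can be inflated by unrelated background activity or garbage collection.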
3.1 File Reads
We now compare SeisIO read speeds with those of two popular, well-established seismic data packages: ObsPy for Python (Beyreuther et al., 2010; Megies et al., 2011) and SAC (Goldstein et al., 2003; Goldstein and Snoke, 2005). Comparative memory usage is shown in Fig. 2, and median read times for 100-trial test sets are shown in Fig. 3. For these tests, ObsPy v1.1.1 uses a dedicated Python 3.7.3 environment created with Conda 4.7.12; benchmarks use timeit.py and memory-profiler 0.55.0, with child processes included in memory estimates. ASDF files are benchmarked with pyasdf v0.5.1. SAC v106.a is compiled from source on the test machine and benchmarked with perf v5.0.21 and time -v; the median time and memory required to start and exit SAC without executing commands are subtracted from the test values.
We compare programs on every test in Table 2 for which they have a file reader. Comparisons with SAC are limited because SAC reads only two of these formats. ObsPy has no reader for PASSCAL, SUDS, or UW, and the ObsPy ASCII reader is incompatible with the GeoCSV time-sample pair (tspair) variant for ASCII data. The ObsPy WIN reader could not read our test files, even though our data were downloaded directly from Hi-net and integrity-checked by comparison with output from wintosac (http://wwweic.eri.u-tokyo.ac.jp/cgi-bin/show_man_en?wintosac). Thus, all possible comparisons with our benchmarks are shown in Figs. 2 and 3.
SeisIO uses less memory and reads files more quickly than both SAC and ObsPy; outperforming SAC is especially noteworthy because SAC is low-level, compiled C code. With the exception of ASDF read times, which differ by < 4%, performance differences cannot be explained by random variations in system background activity. Fig. 2 suggests that ObsPy has considerable static memory overhead associated with each file read, which may explain some read time differences (e.g., Fig. 3). The closest read times to SeisIO are obtained with ASDF, for which pyasdf also uses wrappers to libhdf5. The larger of the two mini-SEED benchmarks is also roughly comparable. Notably, because the ObsPy mini-SEED reader is a wrapper to libmseed for C (Trabant, 2019), both the ObsPy and SAC comparisons strongly support the claim that well-optimized Julia code can outperform well-optimized C, even with Julia's high-level syntax, approachable UX, and JIT compiler.
3.2 Download Throughput
Given the data requirements of modern analysis techniques, download throughput is an increasingly important consideration when choosing data acquisition software. We benchmark download throughput using SeisIO and two popular Python tools: ObsPy and ROVER v1.0.4 (developed by the IRIS DMC and available at https://iris-edu.github.io/rover/). ROVER has built-in options for multi-worker SQL requests. We use mpi4py with the NoisePy noise-correlation toolbox (https://github.com/mdenolle/NoisePy, Jiang et al., in prep) to parallelize ObsPy downloads. For SeisIO, we use the SeisDownload.jl module (https://github.com/kura-okubo/SeisDownload.jl, version 1.2.0, last accessed 2019/10/02), developed to leverage Julia's built-in parallelization function pmap.
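A minimal sketch of the pmap pattern follows: a long request is split into one-day windows, and pmap distributes the windows over worker processes with dynamic load balancing. The per-window function fetch_window is hypothetical and performs no network access; in a real workflow it would issue a request (for example with SeisIO's get_data) and write the result to disk. The worker count and window length are arbitrary choices here, not the settings used in SeisDownload.jl.

```julia
using Distributed, Dates

# Start two local worker processes if only the master process exists.
# (Two is an arbitrary choice for this sketch.)
nprocs() == 1 && addprocs(2)

@everywhere using Dates

# Hypothetical stand-in for one download request. A real version would issue
# the request for window w and write the data to disk; this one only returns
# the number of samples that a 3-component, 40 Hz station would deliver.
@everywhere function fetch_window(w::Tuple{DateTime,DateTime}; fs = 40.0, nchan = 3)
    secs = Dates.value(w[2] - w[1]) / 1000   # Millisecond period -> seconds
    return round(Int, secs * fs) * nchan
end

# Split a multi-day request into consecutive one-day windows.
day_windows(t0::DateTime, ndays::Int) =
    [(t0 + Day(k), t0 + Day(k + 1)) for k in 0:ndays-1]

windows = day_windows(DateTime(2019, 1, 1), 16)
counts = pmap(fetch_window, windows)   # each window goes to the next free worker
total = sum(counts)
println(length(windows), " windows, ", total, " samples in total")
```

Because pmap hands out windows one at a time, a slow server response for one day does not stall the other workers.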
This benchmark test uses publicly available data from three-component broadband seismograph stations archived at the IRIS DMC and the Northern California Earthquake Data Center (NCEDC). Each test uses 10 stations; download sizes are 7 GB for the TA network and 17 GB for the BP network. For the IRIS-DMC test, we use up to 8 worker CPUs, matching both the server-side connection limit and the maximum number of workers available in NoisePy; the request comprises 16 days of continuous TA data sampled at 40 Hz. For the NCEDC test, we request 3-month segments of seismic data sampled at 20 Hz from stations in the Berkeley Parkfield (BP) High Resolution Seismic Network, using SeisIO and ObsPy. Tests are performed on a 32-core Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90 GHz with 64 GB RAM.
The computation time for the tests includes the data request from the remote server and conversion to mini-SEED format. The download efficiency is defined as the total amount of downloaded data divided by the total computation time, in MB/s. No preprocessing (e.g., detrending, tapering, filtering) is applied. Figure 4 shows the download efficiency. The download efficiency of SeisIO can reach 3.3× that of ObsPy, in agreement with standard microbenchmarks of the Julia language (Bezanson et al., 2017). In the IRIS-DMC benchmark with the TA network, download speed scales with the number of workers as a power law with an exponent of 1.06 for ROVER, 0.97 for ObsPy, and 0.92 for SeisIO (Figure 4a); in the NCEDC benchmark, the scaling exponents are 0.92 for ObsPy and 0.96 for SeisIO (Figure 4b). In larger downloads, where the computational time required to allocate workers is negligible compared to that of the data download itself, we find that the scaling exponent converges to 1.0. Therefore, the Julia language appears well-optimized for parallel computation using only built-in functions (pmap).
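The download efficiency and the scaling exponent can be computed as sketched below, assuming that efficiency follows a power law E = c·N^α in the number of workers N, so that α is the slope of a least-squares line through (log N, log E). The function names and the synthetic numbers in the usage lines are hypothetical; they are not the measured values behind Figure 4.

```julia
using Statistics

# Download efficiency in MB/s: total data downloaded / total computation time.
efficiency(total_mb::Real, total_s::Real) = total_mb / total_s

# Scaling exponent α of E = c * N^α, estimated as the ordinary least-squares
# slope of log(E) against log(N).
function scaling_exponent(nworkers::AbstractVector{<:Real}, eff::AbstractVector{<:Real})
    x = log.(nworkers)
    y = log.(eff)
    return cov(x, y) / var(x)
end

# Synthetic check: efficiencies generated with α = 0.92 recover α = 0.92.
n = [1, 2, 4, 8]
eff = 3.0 .* n .^ 0.92
println("alpha = ", round(scaling_exponent(n, eff); digits = 2))
```

An exponent of 1.0 corresponds to perfectly linear scaling, where doubling the number of workers doubles the throughput.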
3.3 Processing Example: Instrument Response Removal
The removal of an instrument response function is a general processing operation that converts recorded counts or volts to approximate physical units of measure, such as ground velocity (m/s), at frequencies from DC to the Nyquist frequency. This is a common preprocessing step in seismic data analysis, e.g., when comparing and/or cross-correlating waveforms recorded by different instruments (e.g., Bensen et al., 2007). We use the computational efficiency of response removal as an example processing operation and perform comparative benchmark tests using ObsPy and SeisIO.
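In the frequency domain, removing a response amounts to dividing the data spectrum by the instrument's complex response. The sketch below evaluates a poles-and-zeros response H(s) = a0 · Π(s − z) / Π(s − p) at s = 2πif and divides a spectrum by it, raising small |H| to a water level. The water-level approach and all function names here are illustrative assumptions, not a description of how SeisIO or ObsPy implement response removal (the benchmark described below bandpass filters the data first), and a full workflow would also need an FFT, e.g., from the FFTW.jl package, which is omitted to keep the sketch self-contained.

```julia
# Complex response of a poles-and-zeros (Laplace-domain) model at f Hz:
# H(s) = a0 * prod(s - z) / prod(s - p), with s = 2πi f.
function pz_response(f::Real, a0::Real, z::Vector{ComplexF64}, p::Vector{ComplexF64})
    s = 2π * im * f
    return a0 * prod(s .- z) / prod(s .- p)   # empty products evaluate to 1
end

# Divide spectrum X by response H, raising any |H| below water * max|H| to that
# floor (phase preserved) so that weak parts of the response do not amplify noise.
function remove_response(X::AbstractVector{<:Complex}, H::AbstractVector{<:Complex};
                         water::Real = 1e-3)
    floor_amp = water * maximum(abs, H)
    Hc = [abs(h) < floor_amp ? floor_amp * cis(angle(h)) : h for h in H]
    return X ./ Hc
end

# Example: one zero at the origin and one pole at -1 rad/s (a simple high-pass).
z = [0.0 + 0.0im]
p = [-1.0 + 0.0im]
f = [0.1, 1.0, 10.0]
H = pz_response.(f, 1.0, Ref(z), Ref(p))
Y = remove_response(H, H)   # deconvolving the response from itself gives ~1
```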
The test data comprise a one-day digital seismogram from channel TA.121A.HHZ (network TA, station 121A), sampled at 100 Hz. Data are bandpass filtered before removing the instrument response, with a 4-corner cosine taper in ObsPy and a Butterworth filter in SeisIO. To ensure that the test measures a single processing step, the bandpass operation is not timed. We test on a single core of a computer with an Intel(R) Core i5 CPU @ 3.4 GHz and 8 GB RAM.
Figure 5a shows computation times for file read and response removal. We conducted 100 trials of each process; mean values are shown, with standard deviations as error bars. Relative to ObsPy, SeisIO is 1.6× faster at reading data, consistent with the results of test MSEED-1 in Figure 3, and 6.8× faster at instrument response removal. Figure 5b compares the output waveforms, demonstrating the agreement between ObsPy and SeisIO. Although the differences near the edges of each trace are large compared to the middle, these artifacts can be adequately suppressed by cosine tapering before removing the instrument response (Figure 5b, top). In this test, the first and last 0.2% of samples in each window are tapered with both ObsPy and SeisIO. The small misfit in amplitude and/or phase arises from differences in filtering strategies.
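The edge taper described above can be sketched as follows. This is a generic half-cosine (Hann-shaped) taper applied to a fixed fraction of samples at each end, written as an illustration; the function name cosine_taper! is hypothetical, and the exact window shapes used by SeisIO and ObsPy may differ in detail. For the one-day, 100 Hz test trace (8,640,000 samples), a fraction of 0.002 tapers 17,280 samples, or 172.8 s, at each end.

```julia
# Multiply the first and last `frac` of the samples in x by a half-cosine ramp
# that rises from 0 toward 1; the middle of the trace is left unchanged.
function cosine_taper!(x::AbstractVector{<:AbstractFloat}, frac::Real = 0.002)
    n = length(x)
    m = min(floor(Int, frac * n), n ÷ 2)   # samples tapered at each end
    for k in 1:m
        w = 0.5 * (1 - cos(π * (k - 1) / m))
        x[k] *= w                          # leading edge
        x[n - k + 1] *= w                  # trailing edge (mirror image)
    end
    return x
end

# Example: taper 1% of each end of a constant trace.
x = cosine_taper!(ones(1000), 0.01)
```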
4 Conclusions and Future Directions
The SeisIO data framework is the first of its kind: high-level, easy, performant software that introduces the next generation of geophysics researchers to cutting-edge scientific computing in the Julia language. We have shown that SeisIO can outperform specialized, precompiled C-language software in speed and efficiency. The benefits are lower computing requirements and costs.
The intent of SeisIO is to provide an efficient framework for geophysical data while maintaining comprehensible syntax. Core functionality will expand to additional data formats and acquisition methods based on demand; APIs and guides are available on the project homepage for potential contributors. Analysis programs based on SeisIO are in development, particularly for ambient-noise seismology (Bryan et al., 2019; Clements and Denolle, 2019). A SeisIO variant for GPU computing is in development, and support for multiparametric volcano monitoring data is planned. As SeisIO is refined and its scope expands to include GPU, cloud, and heterogeneous computing, we expect support to increase among seismologists and other geophysics researchers, many of whom find themselves spending valuable research time teaching new students to compile arcane (and sometimes antique) programs.
Acknowledgments
The authors thank Andy Nowacki (University of Leeds, UK) for discussions on the Julia language, and Douglas Neuhauser (University of California Seismological Laboratory, Berkeley, CA, USA) and David Shelly (US Geological Survey, Golden, CO, USA) for discussions on SAC and other data formats, which helped motivate the creation of SeisIO. J. Jones is thankful to Chad Trabant and Robert Casey (Incorporated Research Institutions for Seismology, Seattle, WA, USA) for assistance with IRIS web protocols. M. Denolle and J. Jones thank Ellen Yu and Aparna Bhaskaran (California Institute of Technology, Pasadena, California, USA) for assistance with SCSN FDSN and correspondence. J. Jones extends additional thanks to Wendy McCausland (USGS-VDAP, USA) and Ken Creager (University of Washington, USA) for contributing test data, and to R. Carniel (Università di Udine, Italy) for extensive early testing. Mini-SEED handling was originally based on rdmseed.m for MATLAB by François Beauducel (Institut de Physique du Globe de Paris, France); SAC routines were originally based on SacIO for Julia by Ben Postlethwaite (https://github.com/bpostlethwaite/SacIO). This research was supported by a grant from the Packard Foundation.
Author Contributions
J. Jones created SeisIO, is the sole developer of the core package, and happily rules with an iron fist over its development and maintenance. T. Clements created the SeisIO notebook tutorial, developed a number of packages based on SeisIO, and created the prototypes of several data processing routines. K. Okubo wrote and conducted the benchmarks of download efficiency and instrument response removal, and has developed a parallel downloader prototype, SeisDownload.jl, as an example of the many SeisIO applications created by M. Denolle's research group; its functionality is currently being integrated into SeisIO core. M. Denolle contributed to application development, research direction, and manuscript editing, and provides management and financial support for ongoing development.
Data and Resources
Data used in benchmark tests (Table 2) can be found in the SeisIO GitHub repository, with redistribution restrictions as noted below. Benchmarking scripts are available on the SeisIO GitHub page. Data sources in Table 2 use the following key:
1. Contributed by Prof. K. Creager, University of Washington, Seattle, WA, USA ([email protected]).
2. Retrieved with IRIS FDSN dataselect; to duplicate a data request, please contact the corresponding author for exact parameters. Each binary data file has a single data channel; each file name gives the time length and sampling frequency.
3. File is from the IRIS Mt. St. Helens 1980 special data set (IRIS virtual network STHELENS-1980). Original data are available by request from Incorporated Research Institutions for Seismology, Seattle, WA, USA.
4. File data are from the vertical-component channel of station EA3 in Jones et al. (2006). The original recording format was the SLIST variant of Lennartz MarsLite portable stations; the first line of text was manually edited to match SLIST syntax for this test.
5. Redistribution restricted; to request this file, please contact Dr. W. McCausland, USGS-VDAP, Vancouver, WA, USA ([email protected]). The data file comprises five minutes of 100 Hz data on 22 channels beginning 2008-10-08T17:01:06.06 (UTC-6).
6. Available upon request from the corresponding author. Event data extracted from Pacific Northwest Seismic Network archives; data are fully described in Jones and Malone (2005).
7. Data from Hi-net (NIED, 2019); redistribution prohibited. The request comprises one hour of 100 Hz data beginning 2014-09-27T09:00:00 (UTC+9) from 8 total channels (seismometer + infrasound at stations V.ONTA and V.ONTN). The benchmark uses the NIED channel file.
A standalone repository to reproduce the benchmark tests for download efficiency presented in section 3.2 is available on GitHub. The required software, computational environment, data sets, and commands to execute the benchmark tests are documented in the repository.
The NoisePy module for ObsPy is part of a separate manuscript, currently in preparation. The repository is private until publication, but code is available upon request from its creator (Dr. C. Jiang, Harvard University, MA, USA, chengxin [email protected]).
Addendum
The SeisIO package presented in this work is the only official Julia package by this name. We recently learned of another, newer package that borrows the name SeisIO, consisting of reflection seismology software for SEG Y data, whose code has since migrated to another project. This other SeisIO is not part of the Julia registry and is completely unrelated to this work, but it can be found on GitHub and via Google search, and packages that depend on it exist in the Julia registry. To minimize potential confusion, please follow the installation instructions in this manuscript or on our GitHub page.
References
Ahern, T., Casey, R., Barnes, D., Benson, R., & Knight, T. (2007). SEED standard for the exchange of earthquake data: Reference manual, format version 2.4. Incorporated Research Institutions for Seismology (IRIS), Seattle, WA.
Bensen, G. D., Ritzwoller, M. H., Barmin, M. P., Levshin, A. L., Lin, F., Moschetti, M. P., Shapiro, N. M., & Yang, Y. (2007). Processing seismic ambient noise data to obtain reliable broad-band surface wave dispersion measurements. Geophysical Journal International, 169(3), 1239–1260.
Beyreuther, M., Barsch, R., Krischer, L., Megies, T., Behr, Y., & Wassermann, J. (2010). ObsPy: A Python toolbox for seismology. Seismological Research Letters, 81(3), 530–533. doi:10.1785/gssrl.81.3.530
Bezanson, J., Edelman, A., Karpinski, S., & Shah, V. B. (2017). Julia: A fresh approach to numerical computing. SIAM Review, 59(1), 65–98.
Bezanson, J., Chen, J., Chung, B., Karpinski, S., Shah, V. B., Vitek, J., & Zoubritzky, L. (2018). Julia: Dynamism and performance reconciled by design. Proceedings of the ACM on Programming Languages, 2(OOPSLA), 120.
Bryan, J. T., Okubo, K., Yuan, C., & Denolle, M. (2019). Improving the resolution of co-seismic velocity change monitoring at active fault zones using the ambient seismic field. Poster presentation, 2019 SCEC Annual Meeting.
Clements, T., & Denolle, M. (2019, August). Cactus to Clouds: Processing the SCEDC Open Data Set on AWS. Poster presentation, 2019 SCEC Annual Meeting.
Crotwell, H. P., Owens, T. J., & Ritsema, J. (1999). The TauP Toolkit: Flexible seismic travel-time and ray-path utilities. Seismological Research Letters, 70, 154–160.
Goldstein, P., & Snoke, A. (2005). SAC availability for the IRIS community. Incorporated Research Institutions for Seismology Data Management Center Electronic Newsletter.
Goldstein, P., Dodge, D., Firpo, M., & Minner, L. (2003). SAC2000: Signal processing and analysis tools for seismologists and engineers. In W. H. K. Lee, H. Kanamori, P. C. Jennings, & C. Kisslinger (Eds.), The IASPEI International Handbook of Earthquake and Engineering Seismology. Academic Press, London.
Hagelund, R., & Levin, S. A. (Eds.). (2017). SEG-Y r2.0: SEG-Y revision 2.0 data exchange format. Society of Exploration Geophysicists, Tulsa, OK.
Jones, J. P., Carniel, R., Harris, A. J., & Malone, S. D. (2006). Seismic characteristics of variable convection at Erta 'Ale lava lake, Ethiopia. Journal of Volcanology and Geothermal Research, 153(1), 64–79.
Jones, J. P., & Malone, S. D. (2005). Mount Hood earthquake activity: Volcanic or tectonic origins? Bulletin of the Seismological Society of America, 95(3), 818–832.
Krischer, L., Smith, J., Lei, W., Lefebvre, M., Ruan, Y., Sales de Andrade, E., Podhorszki, N., Bozdağ, E., & Tromp, J. (2016). An adaptable seismic data format. Geophysical Journal International, 207(2), 1003–1011. doi:10.1093/gji/ggw319
Megies, T., Beyreuther, M., Barsch, R., Krischer, L., & Wassermann, J. (2011). ObsPy – What can it do for data centers and observatories? Annals of Geophysics, 54(1), 47–58. doi:10.4401/ag-4838
National Research Institute for Earth Science and Disaster Resilience (2019). NIED Hi-net. National Research Institute for Earth Science and Disaster Resilience. doi:10.17598/NIED.0003
Perkel, J. M. (2019). Julia: Come for the syntax, stay for the speed. Nature, 572, 141–142. doi:10.1038/d41586-019-02310-3
Regier, J., Fischer, K., Pamnany, K., Noack, A., Revels, J., Lam, M., Howard, S., Giordano, R., Schlegel, D., McAuliffe, J., & Thomas, R. (2019). Cataloging the visible universe through Bayesian inference in Julia at petascale. Journal of Parallel and Distributed Computing, 127, 89–104.
Schorlemmer, D., Euchner, F., Kästli, P., & Saul, J. (2011). QuakeML: Status of the XML-based seismological data exchange format. Annals of Geophysics, 54(1), 59–65.
Trabant, C. (2019). libmseed: The miniSEED library. https://github.com/iris-edu/libmseed, last accessed 2019-09-24.
Ward, P. L. (1989). SUDS: Seismic Unified Data System. USGS Open-File Report 89-188. doi:10.3133/ofr89188
Table 1: Data format support in SeisIO v0.4.1. Columns: "RW" is read/write support ("r" = read, "w" = write); "Cov" is the lesser of the % code coverage on Codecov.io and Coveralls.io. Notes use the key below.
1. coverage reflects only supported blockette/packet types
2. support for Provenance not yet implemented (NYI)
3. supports IEEE-Float and integer data in SEG Y rev 0 and rev 1 formats

Format Name                      SeisIO Name     RW   Cov  Notes  Reference
SEED                                                              Ahern et al. (2007)
  Dataless SEED                  dataless        r    96   1
  mini-SEED                      mseed           r    96   1
  SEED resp                      resp            r    96
SAC                                                               e.g., Goldstein et al. (2003)
  SAC data file                  sac             rw   97
  SAC pole-zero file             sacpz           rw   97
OTHER
  Ad Hoc (v1, v2)                ah1, ah2        r    96
  Advanced Seismic Data Format   asdf            rw   100  2      Krischer et al. (2016)
  GeoCSV sample list             geocsv.slist    r    98
  GeoCSV time-sample pair        geocsv          r    98
  QuakeML                        qml             r    100         e.g., Schorlemmer et al. (2011)
  SEG Y (rev 0, rev 1)           segy            r    93   3      Hagelund et al. (2017)
  PASSCAL (SEG Y variant)        passcal         r    96
  Sample List ASCII              slist           r    100
  (SeisIO low-level format)      seisio          rw   100         this work
  FDSN Station XML               sxml            rw   100
  Seismic Unified Data System    suds            r    94   1      Ward (1989)
  UNAVCO Bottle                  bottle          r    100
  University of Washington       uw              r    98
  WIN (32-bit, v1)               win32           r    96          NIED (2019)
Table 2: Benchmark tests. Columns: Test Name is how the test is referenced in this manuscript; Filename is the name or search pattern in SeisIO/test/SampleFiles/; Format corresponds to column 2 of Table 1; SzF is file size on disk; SzO is object size in memory; Mem is peak memory usage; %Ovh ≡ 100 × (Mem/SzO − 1.0); T̃ is median read time in milliseconds for 100 trials. All memory and file size values are in MB. In column Notes, numeric values are data sources (see Data and Resources); lowercase letters denote special benchmark parameters:
a. test uses read_asdf
b. test uses read_data with keywords nx_new=36000, nx_add=1400000
c. test uses read_data with keyword full=true

Test Name      Filename          Format   SzF     SzO     Mem     %Ovh    T̃ (ms)   Notes
AH             1day-1hz.ah       ah1      0.33    0.33    0.33    1.11    0.49     1
ASDF           2days-40hz.h5     asdf     21.96   26.37   26.49   0.45    92.74    2,a
GeoCSV-tspair  geo-tspair.csv    geocsv   3.31    0.39    0.44    12.30   204.01   2
MSEED-1        1day-100hz.mseed  mseed    19.09   32.96   32.96   0.01    71.46    2
MSEED-2        SHW.UW.mseed      mseed    1.79    5.35    6.19    15.75   9.33     3,b
PASSCAL        1day-100hz.segy   passcal  32.96   32.96   32.99   0.08    22.30    2,c
SAC            1day-100hz.sac    sac      32.96   32.96   32.97   0.04    13.02    2,c
SLIST          1h-62.5hz.slist   slist    2.44    0.86    0.87    1.85    30.09    4
SUDS           10081701.WVP      suds     1.26    2.53    2.59    2.43    1.36     5
UW             99011116541W      uw       23.15   37.66   40.29   6.98    26.71    6
WIN32          2014092709*.cnt   win32    4.49    10.99   11.25   2.33    22.88    7
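The %Ovh column can be recomputed directly from its definition in the caption. The short sketch below (the function name pct_overhead is hypothetical) reproduces the UW row. For some other rows, recomputing from the two-decimal values shown gives slightly different numbers (e.g., AH lists 1.11% although Mem and SzO both round to 0.33 MB), presumably because the table was computed from unrounded measurements.

```julia
# Percent memory overhead as defined in the Table 2 caption:
# %Ovh = 100 * (Mem / SzO - 1), with Mem = peak memory and SzO = object size.
pct_overhead(mem_mb::Real, szo_mb::Real) = 100 * (mem_mb / szo_mb - 1)

# UW row of Table 2: Mem = 40.29 MB, SzO = 37.66 MB.
ovh_uw = round(pct_overhead(40.29, 37.66); digits = 2)
println("UW overhead: ", ovh_uw, "%")
```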
Figure 1: Benchmark tests (Table 2) in Julia v1.1.0 with SeisIO v0.4.1. Left: file read times. Right: peak memory use in SeisIO and file size on disk.
Figure 2: Memory use and overhead for all benchmarks in Table 2 that were testable in at least two of ObsPy, SAC, and SeisIO. (top) Memory usage and file sizes on disk. (bottom) Memory overhead. The y-axis is logarithmic. A missing bar with text label "NR" indicates no reader.
Figure 3: Read times in milliseconds for all benchmarks in Table 2 that were testable in at least two of ObsPy, SAC, and SeisIO. A missing bar with text label "NR" indicates no reader. Most read times fall in the range 10-100 ms. SeisIO AH and SUDS benchmarks are labeled with their respective values because the bars themselves are difficult to see. The ObsPy SLIST benchmark is labeled with its value because the full bar vastly exceeds the upper bound of the y-axis.
Figure 4: Download efficiency as a function of the number of workers from (a) the IRIS-DMC server and (b) the Northern California Earthquake Data Center (NCEDC). The markers indicate individual speed tests. The dashed lines indicate the best-fit line (with logarithmic y-axis scaling) associated with each tool; the slope of each line is a proxy measure of the scaling performance.
Figure 5: Benchmark tests of instrument response removal. (a) Time benchmarks of data read and instrument response removal. Solid bar heights correspond to the mean times of each benchmark; 1σ error bars are shown as thin black lines. (b) Waveforms with their respective instrument responses removed, shown to demonstrate that the methods produce nearly identical output. For ease of visualization, lines are plotted every 20 points.