Physics-Based Data Mining For Seismic Bulletins
Stephen C. Myers and Gardar Johannesson
Lawrence Livermore National Laboratory, Livermore, CA USA
Correspondence
Abstract
Production of high-quality data sets that are used for seismic calibration
remains a painstaking and costly endeavor. The value of meticulous data
analysis is undeniable, but the number and geographic coverage of
carefully measured arrival times is not sufficient for either development of
high-fidelity (3-dimensional) models or comprehensive empirical calibration.
Although regional and global bulletins of seismic data provide excellent data
coverage, these databases are contaminated by noisy and spurious data. In
this study Bayesloc – a stochastic multiple-event seismic location algorithm
– is used to mine a database of arrival time measurements (picks) that are
predominantly collected from bulletins. A small subset of trusted picks
made at Lawrence Livermore National Laboratory is also included.
Bayesloc applies data-mining methodologies and constraints afforded by
the physics of seismic travel times to model the physical parameters of the
multiple-event location system. The Bayesloc method produces a joint
probability density function across event locations, event-station travel
times, pick precision, and phase labels (including identification of erroneous
data). Bayesloc is applied to a collection of approximately 2500 events
spanning the Middle East. We find that out of ~387,000 P and Pn picks,
~30,000 (~8%) are not members of the P or Pn population, but are
erroneous data. Most importantly for seismic studies, Bayesloc analysis
reduces residual standard deviation from 1.6 seconds to 0.8 seconds for P
arrivals and from 2.3 seconds to 1.8 seconds for Pn arrivals. Errors due to
the 3-dimensional Earth remain in Bayesloc residuals, making the data set
ideal for use in 3-dimensional tomographic studies. Data culling on this
massive scale with the precision accomplished here was previously
impractical if not impossible.
Tomography: bulletin data
The Bayesloc Method
Probabilistic formulation of the multiple-event location problem
$$p(o, x, T, W, \sigma, \tau \mid a, w) = \frac{p(a \mid o, T, W, \sigma)\, p(T \mid F(x), \tau)\, p(W \mid w)\, p(x, o)\, p(\sigma)\, p(\tau)}{p(a)}$$
p(x, o): Probabilistic accuracy of input hypocenter
(provided as prior information)
p(a | o, T, W, σ): Probability of arrival times (a) given
o = all origin times
T = all travel times (with corrections)
W = phase labels
σ = collection of arrival time error parameters
p(T | F(x), τ): Probability of estimated travel times (T) given
F = model-based travel time prediction
x = all event locations
τ = collection of travel time error parameters
p(σ): Probabilistic precision of arrival time measurement
(provided as prior information)
p(τ): Probabilistic precision of travel time prediction
(provided as prior information)
p(W | w): Probability of correct phase labels given
w = reported phase labels
(provided as prior information)
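Because the posterior is a product of factors, it can be convenient to work with its logarithm, which turns the product into a sum. The following restates the factorization above in log form. Here σ and τ are used to denote the arrival-time and travel-time error parameters respectively; that symbol choice is an assumption of this sketch.

```latex
\begin{aligned}
\log p(o, x, T, W, \sigma, \tau \mid a, w)
  ={}& \log p(a \mid o, T, W, \sigma) + \log p(T \mid F(x), \tau) + \log p(W \mid w) \\
     & + \log p(x, o) + \log p(\sigma) + \log p(\tau) - \log p(a)
\end{aligned}
```

The term log p(a) does not depend on any of the unknowns, so it cancels whenever two candidate parameter sets are compared. A sampler therefore needs only the numerator of the posterior.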
Joint probability density is inferred using Markov chain Monte Carlo (MCMC) methods
Flowchart showing MCMC proposal, acceptance/rejection
Joint probability across
1) Event locations
2) Travel time predictions
3) Arrival-time measurement precision
4) Phase label (e.g. P,Pn,…,erroneous)
Tomography: Bayesloc data
Bulletin data issues are apparent when travel times are plotted
Residuals w.r.t distance for bulletin arrivals labeled “P”
Gross mislocations
Large data spread
Improved Accuracy
Locations
Epicenters are consistently shifted to the north-northeast. The magnitude of the shift is ~20 km on average.
The direction and magnitude of the epicenter shift are consistent with comparisons of local aftershock deployments and bulletins by Bergman et al. (2009).
Travel Times
Posteriori travel time P residuals
Posteriori phase label = P at >= 0.9 probability
Posteriori phase label != P
Posteriori travel time Pn residuals
Posteriori phase label = Pn at >= 0.9 probability
Posteriori phase label != Pn
Approximately 8% of measurements labeled P are found to be erroneous. The remaining 92% of the P arrivals are in the thin black distribution to the right.
Pn travel time prediction errors are known to be larger than P errors. Bayesloc models the broader Pn distribution and allows observations with larger residuals into the Pn distribution.
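The ">= 0.9 probability" criterion in the figure labels above can be illustrated with a minimal sketch. It assumes each pick has a list of phase labels drawn across MCMC samples, and that the posterior probability of a label is the fraction of samples carrying it. The function names and the input layout are hypothetical and are not taken from Bayesloc.

```python
from collections import Counter


def label_probabilities(sampled_labels):
    """Fraction of MCMC samples in which a pick was assigned each label."""
    counts = Counter(sampled_labels)
    total = len(sampled_labels)
    return {label: n / total for label, n in counts.items()}


def classify_pick(sampled_labels, threshold=0.9):
    """Return the best-supported label if its posterior probability meets
    the threshold, otherwise None (no label is well enough supported)."""
    probs = label_probabilities(sampled_labels)
    best_label, best_prob = max(probs.items(), key=lambda kv: kv[1])
    return best_label if best_prob >= threshold else None
```

For example, a pick labeled "P" in 95 of 100 samples would be classified as P, while one split 60/40 between "P" and "Pn" would be left unclassified.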
Earth Models
Post-tomography variance
Bayesloc: 1.10 s²
Bulletin: 1.75 s²
Increased data accuracy and consistency allow better data fit with a physical model.
From: Simmons, Myers, and Ramirez (2009)
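As a quick worked comparison of the two post-tomography variances quoted above, the relative reduction obtained with the Bayesloc data set is:

```latex
1 - \frac{1.10\ \mathrm{s}^2}{1.75\ \mathrm{s}^2} \approx 0.37
```

That is, the residual variance after tomography is roughly 37% lower with the Bayesloc data than with the bulletin data.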
Sets of multiple-event parameters are proposed. The proposed parameter set has a high probability of acceptance if the new configuration improves weighted data fit relative to the current configuration. The collection of accepted parameter sets is used to infer the joint probability across the multiple-event system.
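The propose/accept cycle described above can be sketched with a plain random-walk Metropolis sampler on a toy problem. Everything in this example is an assumption made for illustration: a single event on a line of stations, a constant velocity, a fixed pick standard deviation, and flat priors. It is not the Bayesloc sampler, which works on the full multiple-event system.

```python
import math
import random

# Hypothetical toy setup (not from the poster): stations on a line,
# constant velocity, one event with unknown position x and origin time o.
STATIONS_KM = [0.0, 50.0, 120.0, 200.0]
VELOCITY_KM_S = 8.0
SIGMA_S = 0.5  # assumed pick standard deviation, in seconds


def predict_arrivals(x, o):
    """Arrival time at each station: origin time plus distance / velocity."""
    return [o + abs(s - x) / VELOCITY_KM_S for s in STATIONS_KM]


def log_posterior(x, o, picks):
    """Gaussian log-likelihood with flat priors, up to an additive constant."""
    preds = predict_arrivals(x, o)
    return -0.5 * sum(((a - p) / SIGMA_S) ** 2 for a, p in zip(picks, preds))


def metropolis(picks, n_steps=20000, step_x=2.0, step_o=0.25, seed=0):
    """Random-walk Metropolis over (x, o); returns samples after burn-in."""
    rng = random.Random(seed)
    x, o = 100.0, 0.0  # arbitrary starting configuration
    current = log_posterior(x, o, picks)
    samples = []
    for _ in range(n_steps):
        # Propose a new parameter set by perturbing the current one.
        x_new = x + rng.gauss(0.0, step_x)
        o_new = o + rng.gauss(0.0, step_o)
        proposed = log_posterior(x_new, o_new, picks)
        # Always accept when the fit improves; otherwise accept with
        # probability exp(proposed - current).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            x, o, current = x_new, o_new, proposed
        samples.append((x, o))
    return samples[n_steps // 4:]  # discard the first quarter as burn-in
```

Calling `metropolis(predict_arrivals(70.0, 2.0))` returns a list of `(x, o)` samples whose spread approximates the joint uncertainty in location and origin time for this toy case.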
Pick Precision
Bayesloc decomposes measurement precision into station, phase, and event components. Maps shown here are the station component of precision.
Posteriori station precision (Global)
Posteriori station precision (Regional)
Precision is used to weight the importance of data fit. Stations with high quality picks are identified and up-weighted.
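One common way to weight data fit by precision is to scale each residual by its pick standard deviation before squaring, as in a Gaussian likelihood. The sketch below assumes that form. The poster does not state the exact weighting Bayesloc uses, and the function name is hypothetical.

```python
def weighted_misfit(residuals_s, sigmas_s):
    """Sum of squared residuals, each divided by its pick variance.

    residuals_s: observed minus predicted arrival times, in seconds.
    sigmas_s: per-pick standard deviations, in seconds (assumed known).
    """
    if len(residuals_s) != len(sigmas_s):
        raise ValueError("need one sigma per residual")
    return sum((r / s) ** 2 for r, s in zip(residuals_s, sigmas_s))
```

With this form, `weighted_misfit([1.0, 1.0], [0.5, 2.0])` gives 4.25: the same 1 s residual contributes 4.0 at the precise station (sigma 0.5 s) but only 0.25 at the imprecise one (sigma 2.0 s), so picks from high-quality stations dominate the fit.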
Improved model fit with Bayesloc
Residual trends
The views expressed here do not necessarily reflect the opinion of the United States Government, the United States Department of Energy, or Lawrence Livermore National Laboratory.