Physics-Based Data Mining For Seismic Bulletins

Stephen C. Myers and Gardar Johannesson

Lawrence Livermore National Laboratory, Livermore, CA USA

Correspondence

[email protected]

Abstract

Production of high-quality data sets that are used for seismic calibration remains a painstaking and costly endeavor. The value of meticulous data analysis is undeniable, but the number and geographic coverage of carefully measured arrival times are not sufficient for either development of high-fidelity (3-dimensional) models or comprehensive empirical calibration. Although regional and global bulletins of seismic data provide excellent data coverage, these databases are contaminated by noisy and spurious data. In this study Bayesloc, a stochastic multiple-event seismic location algorithm, is used to mine a database of arrival time measurements (picks) that are predominantly collected from bulletins. A small subset of trusted picks made at Lawrence Livermore National Laboratory is also included. Bayesloc applies data-mining methodologies and constraints afforded by the physics of seismic travel times to model the physical parameters of the multiple-event location system. The Bayesloc method produces a joint probability density function across event locations, event-station travel times, pick precision, and phase labels (including identification of erroneous data). Bayesloc is applied to a collection of approximately 2500 events spanning the Middle East. We find that out of ~387,000 P and Pn picks, ~30,000 (~8%) are not members of the P or Pn population, but are erroneous data. Most importantly for seismic studies, Bayesloc analysis reduces residual standard deviation from 1.6 seconds to 0.8 seconds for P arrivals and from 2.3 seconds to 1.8 seconds for Pn arrivals. Errors due to the 3-dimensional Earth remain in Bayesloc residuals, making the data set ideal for use in 3-dimensional tomographic studies. Data culling on this massive scale with the precision accomplished here was previously impractical if not impossible.

Tomography: bulletin data

The Bayesloc Method

Probabilistic formulation of the multiple-event location problem

$$p(o, x, T, W, \sigma, \tau \mid a, w) = \frac{p(a \mid o, T, W, \sigma)\, p(T \mid F(x), \tau)\, p(W \mid w)\, p(x, o)\, p(\sigma)\, p(\tau)}{p(a)}$$

p(a | o, T, W, σ): probability of arrival times (a) given

o = all origin times

T = all travel times (with corrections)

W = phase labels

σ = collection of arrival time error parameters

p(T | F(x), τ): probability of estimated travel times (T) given

F = model-based travel time prediction

x = all event locations

τ = collection of travel time error parameters

p(W | w): probability of correct phase labels given

w = reported phase labels

p(x, o): probabilistic accuracy of input hypocenter (provided as prior information)

p(σ): probabilistic precision of arrival time measurement

p(τ): probabilistic precision of travel time prediction
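The factorization above can be illustrated with a minimal sketch that sums the log of each term for a single event. Everything specific here is an assumption made for illustration and is not taken from Bayesloc: the one-dimensional geometry, the constant velocity standing in for F, the normal distributions, the independent priors on x and o, and the names `sigma` and `tau` for the arrival time and travel time error parameters. Phase labels and the priors on the error parameters are held fixed and left out.

```python
import math

def log_normal(value, mean, std):
    """Log density of a normal distribution with the given mean and std."""
    z = (value - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))

def predicted_travel_time(x, station_x, velocity=8.0):
    """Hypothetical stand-in for F(x): straight-line time on a 1-D line (km, km/s)."""
    return abs(station_x - x) / velocity

def log_posterior(o, x, T, sigma, tau, arrivals, stations, prior_x, prior_o):
    """Unnormalized log posterior for one event, phase labels held fixed.

    prior_x and prior_o are (mean, std) pairs; treating them as independent
    normals is an assumption of this sketch.
    """
    total = 0.0
    for a_j, T_j, s_j in zip(arrivals, T, stations):
        # p(a | o, T, sigma): arrival time = origin time + travel time + pick error
        total += log_normal(a_j, o + T_j, sigma)
        # p(T | F(x), tau): travel time = model prediction + prediction error
        total += log_normal(T_j, predicted_travel_time(x, s_j), tau)
    # p(x, o): prior information on the input hypocenter
    total += log_normal(x, *prior_x)
    total += log_normal(o, *prior_o)
    return total

# Synthetic example: event at x = 0 km, origin time 0 s, three stations.
stations = [100.0, 300.0, -200.0]
true_T = [predicted_travel_time(0.0, s) for s in stations]
arrivals = list(true_T)

lp_true = log_posterior(0.0, 0.0, true_T, 0.5, 1.0, arrivals, stations,
                        prior_x=(0.0, 50.0), prior_o=(0.0, 5.0))
lp_shifted = log_posterior(0.0, 40.0, true_T, 0.5, 1.0, arrivals, stations,
                           prior_x=(0.0, 50.0), prior_o=(0.0, 5.0))
```

In this toy setup the configuration at the true location scores higher than one shifted by 40 km, because the shifted location makes the travel times inconsistent with the model prediction.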


Joint probability density is inferred using Markov-Chain Monte Carlo (MCMC) methods

Flowchart showing MCMC proposal, acceptance/rejection

Joint probability across

1) Event locations

2) Travel time predictions

3) Arrival-time measurement precision

4) Phase label (e.g. P,Pn,…,erroneous)

Tomography: Bayesloc data

Bulletin data issues are apparent when travel times are plotted

Residuals with respect to distance for bulletin arrivals labeled “P”

Gross mislocations

Large data spread

Improved Accuracy

Locations

Epicenters are consistently shifted to the north-northeast. The magnitude of the shift is ~20 km on average. The direction and magnitude of the epicenter shift are consistent with comparisons of local aftershock deployments and bulletins by Bergman et al. (2009).

Travel Times

[Figure: Posteriori travel time P residuals. Legend: posteriori phase label = P at >= 0.9 probability; posteriori phase label != P]

[Figure: Posteriori travel time Pn residuals. Legend: posteriori phase label = Pn at >= 0.9 probability; posteriori phase label != Pn]

Approximately 8% of measurements labeled P are found to be erroneous. The remaining 92% of the P arrivals fall in the thin black distribution to the right.

Pn travel time prediction errors are known to be larger than P errors. Bayesloc models the broader Pn distribution and allows observations with larger residuals into the Pn distribution.
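The effect of a broader residual distribution on phase labeling can be sketched with a two-component mixture: a normal distribution for valid picks and a flat distribution for erroneous ones. This is an illustrative simplification, not the Bayesloc phase-label model. The standard deviations (0.8 s for P, 1.8 s for Pn) and the 92% valid fraction are taken from the numbers reported here; the flat outlier model, its half-width, and the function names are assumptions.

```python
import math

def normal_pdf(residual, std):
    """Density of a zero-mean normal distribution at the given residual."""
    return math.exp(-0.5 * (residual / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def prob_valid(residual, std, prior_valid=0.92, outlier_halfwidth=30.0):
    """Posterior probability that a pick belongs to the valid-phase population.

    Erroneous picks are modeled as uniform on +/- outlier_halfwidth seconds,
    which is a hypothetical choice made only for this sketch.
    """
    valid = prior_valid * normal_pdf(residual, std)
    outlier = (1.0 - prior_valid) / (2.0 * outlier_halfwidth)
    return valid / (valid + outlier)

P_STD_S = 0.8    # residual standard deviation reported for P
PN_STD_S = 1.8   # residual standard deviation reported for Pn

# The same 4-second residual is treated very differently by the two populations.
p_label_prob = prob_valid(4.0, P_STD_S)
pn_label_prob = prob_valid(4.0, PN_STD_S)
keeps_pn_label = pn_label_prob >= 0.9
keeps_p_label = p_label_prob >= 0.9
```

Under these assumptions a 4-second residual clears the 0.9 threshold when judged against the broader Pn population but not against the narrower P population, which is the behavior described above.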

Earth Models

Post-tomography variance

Bayesloc: 1.10 s²

Bulletin: 1.75 s²

Increased data accuracy and consistency allow better data fit with a physical model.

From: Simmons, Myers, and Ramirez (2009)

Sets of multiple-event parameters are proposed. The proposed parameter set has a high probability of acceptance if the new configuration improves weighted data fit relative to the current configuration. The collection of accepted parameter sets is used to infer the joint probability across the multiple-event system.
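The proposal and acceptance cycle described above can be sketched as a random-walk Metropolis sampler. This is a toy, not the Bayesloc implementation: it samples only one parameter (an event position on a line), and the station positions, velocity, pick uncertainty, step size, and all names are hypothetical. In the standard Metropolis rule used here, a proposal that improves the fit is always accepted and a worse one is accepted with probability equal to the ratio of the two probabilities.

```python
import math
import random

STATIONS_KM = [-200.0, 100.0, 300.0]  # hypothetical station positions on a line
VELOCITY_KM_S = 8.0                   # hypothetical constant velocity
SIGMA_S = 0.5                         # hypothetical pick uncertainty
TRUE_X_KM = 10.0                      # location used to synthesize the arrivals
ARRIVALS_S = [abs(s - TRUE_X_KM) / VELOCITY_KM_S for s in STATIONS_KM]

def log_prob(x):
    """Unnormalized log probability of location x: weighted squared misfit."""
    total = 0.0
    for station, arrival in zip(STATIONS_KM, ARRIVALS_S):
        residual = arrival - abs(station - x) / VELOCITY_KM_S
        total -= 0.5 * (residual / SIGMA_S) ** 2
    return total

def metropolis(log_prob_fn, start, step, n_samples, seed=0):
    """Random-walk Metropolis: propose, then accept or keep the current state."""
    rng = random.Random(seed)
    current, current_lp = start, log_prob_fn(start)
    chain = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_prob_fn(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        # On rejection the current state is recorded again (standard practice).
        chain.append(current)
    return chain

chain = metropolis(log_prob, start=50.0, step=3.0, n_samples=5000)
kept = chain[1000:]  # discard early samples taken while the chain converges
posterior_mean = sum(kept) / len(kept)
```

The retained samples approximate the probability distribution of the event position; their mean and spread give a location estimate and its uncertainty.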

Pick Precision

Bayesloc decomposes measurement precision into station, phase, and event components. Maps shown here are the station component of precision.

Posteriori station precision (Global)

Posteriori station precision (Regional)

Precision is used to weight the importance of data fit. Stations with high-quality picks are identified and up-weighted.
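One common way to let precision weight data fit is inverse-variance weighting, sketched below. Whether Bayesloc uses exactly this form is not stated here, so treat it as an assumption; the residuals, uncertainties, and function names are made up for illustration.

```python
def precision_weights(sigmas):
    """Inverse-variance weights: a smaller pick uncertainty gives a larger weight."""
    return [1.0 / (s * s) for s in sigmas]

def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Two precise picks (0.2 s uncertainty) and one imprecise pick (2.0 s uncertainty).
residuals = [0.1, 0.2, 3.0]
sigmas = [0.2, 0.2, 2.0]

weights = precision_weights(sigmas)
plain = sum(residuals) / len(residuals)
weighted = weighted_mean(residuals, weights)
```

With these numbers the imprecise pick dominates the plain average but contributes very little to the weighted one, which stays close to the two precise picks.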

Improved model fit with Bayesloc

Residual trends

The views expressed here do not necessarily reflect the opinion of the United States Government, the United States Department of Energy, or Lawrence Livermore National Laboratory.
