
Physics-Based Data Mining For Seismic Bulletins

Stephen C. Myers and Gardar Johannesson

Lawrence Livermore National Laboratory, Livermore, CA USA

Correspondence

[email protected]

Abstract

Production of high-quality data sets for seismic calibration remains a painstaking and costly endeavor. The value of meticulous data analysis is undeniable, but the number and geographic coverage of carefully measured arrival times are not sufficient for either the development of high-fidelity (3-dimensional) models or comprehensive empirical calibration. Although regional and global bulletins of seismic data provide excellent data coverage, these databases are contaminated by noisy and spurious data. In this study, Bayesloc, a stochastic multiple-event seismic location algorithm, is used to mine a database of arrival-time measurements (picks) that are predominantly collected from bulletins. A small subset of trusted picks made at Lawrence Livermore National Laboratory is also included. Bayesloc applies data-mining methodologies and constraints afforded by the physics of seismic travel times to model the physical parameters of the multiple-event location system. The Bayesloc method produces a joint probability density function across event locations, event-station travel times, pick precision, and phase labels (including identification of erroneous data). Bayesloc is applied to a collection of approximately 2500 events spanning the Middle East. We find that out of ~387,000 P and Pn picks, ~30,000 (~8%) are not members of the P or Pn population but are erroneous data. Most importantly for seismic studies, Bayesloc analysis reduces the residual standard deviation from 1.6 seconds to 0.8 seconds for P arrivals and from 2.3 seconds to 1.8 seconds for Pn arrivals. Errors due to 3-dimensional Earth structure remain in the Bayesloc residuals, making the data set ideal for use in 3-dimensional tomographic studies. Data culling on this massive scale, with the precision accomplished here, was previously impractical if not impossible.

[Figure: tomography using bulletin data]

The Bayesloc Method

Probabilistic formulation of the multiple-event location problem:

$$
p(o, x, T, W, \sigma, \theta \mid a, w) \propto \frac{p(a \mid o, T, W, \sigma)\, p(T \mid F(x), \theta)\, p(W \mid w)\, p(x, o)\, p(\sigma)\, p(\theta)}{p(a)}
$$

where

p(a | o, T, W, σ): probability of the arrival times a given
  o = all origin times
  T = all travel times (with corrections)
  W = phase labels
  σ = collection of arrival-time error parameters
p(T | F(x), θ): probability of the estimated travel times T given
  F = model-based travel-time prediction
  x = all event locations
  θ = collection of travel-time error parameters
p(W | w): probability of correct phase labels given w = reported phase labels (provided as prior information)
p(x, o): probabilistic accuracy of the input hypocenters (provided as prior information)
p(σ): probabilistic precision of arrival-time measurements (provided as prior information)
p(θ): probabilistic precision of travel-time predictions (provided as prior information)
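To make the factorization concrete, the sketch below evaluates an unnormalized log posterior for a deliberately tiny system, writing σ for the arrival-time error parameters and θ for the travel-time error parameters. It assumes Gaussian arrival-time errors about origin time plus travel time, Gaussian travel-time errors about a model prediction F(x), fixed phase labels, and flat priors; the 1-D geometry, the constant-velocity `predict_travel_time`, and every other name here are hypothetical illustrations, not Bayesloc's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Pick:
    event: int      # index of the event this pick belongs to
    station: float  # hypothetical 1-D station coordinate (km)
    time: float     # observed arrival time a (s)


def predict_travel_time(x: float, station: float, velocity: float = 8.0) -> float:
    """Hypothetical model prediction F(x): straight-line 1-D travel time (s)."""
    return abs(station - x) / velocity


def log_normal_pdf(value: float, mean: float, sd: float) -> float:
    """Log of the Gaussian density N(value; mean, sd^2)."""
    z = (value - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def log_posterior(picks, x, o, T, sigma, theta):
    """Unnormalized log p(o, x, T | a) for the toy system.

    x, o  : per-event locations and origin times
    T     : per-pick latent travel times
    sigma : arrival-time error s.d. (stands in for the arrival-time error parameters)
    theta : travel-time error s.d. (stands in for the travel-time error parameters)
    Flat priors on x and o; p(W | w), p(sigma), p(theta) treated as constants.
    """
    total = 0.0
    for pick, t in zip(picks, T):
        e = pick.event
        # log p(a | o, T, sigma): pick time scatters about origin time + travel time
        total += log_normal_pdf(pick.time, o[e] + t, sigma)
        # log p(T | F(x), theta): travel time scatters about the model prediction
        total += log_normal_pdf(t, predict_travel_time(x[e], pick.station), theta)
    return total
```

A sampler only ever compares values of this quantity between configurations, which is why the normalizing term p(a) can be dropped in practice.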

The joint probability density is inferred using Markov chain Monte Carlo (MCMC) methods.

[Figure: flowchart of the MCMC proposal and acceptance/rejection steps]

Joint probability across:
1) Event locations
2) Travel-time predictions
3) Arrival-time measurement precision
4) Phase labels (e.g., P, Pn, …, erroneous)
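Because the joint density is represented by the collection of accepted MCMC samples, marginal quantities such as the probability that a given pick is a P arrival can be estimated by counting. A minimal sketch, assuming a hypothetical layout in which each sample stores one phase label per pick; the 0.9 cutoff mirrors the threshold used for the travel-time residual plots.

```python
from collections import Counter
from typing import Dict, List, Sequence


def label_probabilities(samples: Sequence[Sequence[str]]) -> List[Dict[str, float]]:
    """Estimate per-pick posterior phase-label probabilities from MCMC samples.

    samples[k][i] is the phase label assigned to pick i in accepted sample k
    (hypothetical layout). Each estimate is the fraction of samples with that label.
    """
    n_samples = len(samples)
    n_picks = len(samples[0])
    probs = []
    for i in range(n_picks):
        counts = Counter(sample[i] for sample in samples)
        probs.append({label: c / n_samples for label, c in counts.items()})
    return probs


def confident_picks(probs: List[Dict[str, float]], label: str,
                    threshold: float = 0.9) -> List[int]:
    """Indices of picks whose posterior probability of `label` is >= threshold."""
    return [i for i, p in enumerate(probs) if p.get(label, 0.0) >= threshold]
```

For example, with ten samples in which a pick is labeled P eight times and "err" twice, its estimated P probability is 0.8, which falls below the 0.9 cutoff.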

[Figure: tomography using Bayesloc data]

Bulletin data issues are apparent when travel times are plotted.

[Figure: residuals with respect to distance for bulletin arrivals labeled “P”; annotations mark gross mislocations and large data spread]

Improved Accuracy

Locations

Relocated epicenters are consistently shifted to the north-northeast, by ~20 km on average. The direction and magnitude of the epicenter shift are consistent with comparisons of local aftershock deployments and bulletins by Bergman et al. (2009).
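A per-event shift like the ~20 km north-northeast offset can be measured as the great-circle distance and azimuth from one epicenter to another. A minimal sketch, assuming a spherical Earth of radius 6371 km and using the haversine formula and the initial great-circle bearing; `epicenter_shift` is a hypothetical helper, not part of Bayesloc.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def epicenter_shift(lat1: float, lon1: float, lat2: float, lon2: float):
    """Distance (km) and azimuth (deg clockwise from north) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine formula for the central angle between the two epicenters
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    distance = 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))
    # Initial bearing of the great circle from point 1 toward point 2
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    azimuth = math.degrees(math.atan2(y, x)) % 360.0
    return distance, azimuth
```

A north-northeast shift corresponds to an azimuth near 22.5°; averaging distances and azimuths over all events summarizes the systematic offset.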

Travel Times

[Figure: posterior travel-time residuals for P and for Pn; arrivals whose posterior phase label is P (or Pn) with probability >= 0.9 are distinguished from arrivals whose posterior phase label is not P (or Pn)]

Approximately 8% of the measurements labeled P are found to be erroneous; the remaining 92% of the P arrivals fall in the thin black distribution to the right. Pn travel-time prediction errors are known to be larger than P errors, so Bayesloc models a broader Pn distribution and admits observations with larger residuals into the Pn population.
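The separation of valid arrivals from erroneous picks, and the wider acceptance window for Pn, can be illustrated with a two-component mixture: a narrow Gaussian for the valid phase and a broad Gaussian for erroneous picks. This is a sketch of the idea only, not Bayesloc's phase-label model; the spreads (1.0 s for P, 2.0 s for Pn), the 10% prior error rate, and the 20 s erroneous-pick spread are assumed values.

```python
import math


def log_normal_pdf(residual: float, sd: float) -> float:
    """Log density of N(0, sd^2) evaluated at residual."""
    return -0.5 * (residual / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def prob_valid(residual: float, sd: float, prior_error: float = 0.1,
               error_sd: float = 20.0) -> float:
    """Posterior probability that a residual belongs to the valid-phase population.

    Illustrative mixture:
      valid picks     ~ N(0, sd^2),        prior weight 1 - prior_error
      erroneous picks ~ N(0, error_sd^2),  prior weight prior_error
    """
    log_valid = math.log(1.0 - prior_error) + log_normal_pdf(residual, sd)
    log_error = math.log(prior_error) + log_normal_pdf(residual, error_sd)
    log_odds = log_valid - log_error
    # Numerically stable logistic function of the log-odds
    if log_odds >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


P_SD, PN_SD = 1.0, 2.0  # hypothetical residual spreads (s) for P and Pn
```

With these assumed values, a 4 s residual has a valid-phase probability of roughly 0.06 under the P spread but roughly 0.93 under the Pn spread, so the same residual fails the 0.9 cutoff as P yet passes it as Pn.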

Earth Models

Post-tomography variance:
  Bayesloc data   1.10 s²
  Bulletin data   1.75 s²

Increased data accuracy and consistency allow a physical model to fit the data better.

From: Simmons, Myers, and Ramirez (2009).
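As a worked comparison of the two post-tomography variances, the fractional reduction obtained with the Bayesloc data set is

$$
1 - \frac{1.10\ \mathrm{s}^2}{1.75\ \mathrm{s}^2} \approx 1 - 0.629 = 0.371,
$$

about a 37% variance reduction, equivalent to a standard deviation of roughly 1.05 s (√1.10) versus 1.32 s (√1.75).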

Sets of multiple-event parameters are proposed. A proposed parameter set has a high probability of acceptance if the new configuration improves the weighted data fit relative to the current configuration. The collection of accepted parameter sets is used to infer the joint probability across the multiple-event system.
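The propose/accept cycle described above follows the Metropolis pattern. A minimal random-walk Metropolis sketch on a generic 1-D log density, assuming symmetric Gaussian proposals (so the proposal densities cancel in the acceptance ratio); the step size, target, and names are illustrative, and Bayesloc's own proposal scheme across its parameter groups is not reproduced here.

```python
import math
import random
from typing import Callable, List


def metropolis(log_density: Callable[[float], float], start: float, step: float,
               n_steps: int, seed: int = 0) -> List[float]:
    """Random-walk Metropolis sampler for an unnormalized 1-D log density."""
    rng = random.Random(seed)
    current, current_lp = start, log_density(start)
    samples = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step)  # symmetric proposal
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)): a proposal that
        # improves the fit is always accepted, a worse one only sometimes.
        log_ratio = proposal_lp - current_lp
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            current, current_lp = proposal, proposal_lp
        samples.append(current)  # a rejected proposal repeats the current state
    return samples
```

For example, sampling the log density of a Gaussian centered at 3.0 with standard deviation 0.5 and discarding the first samples as burn-in yields a sample mean close to 3.0; the accepted states together approximate the target distribution.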

Pick Precision

Bayesloc decomposes measurement precision into station, phase, and event components. The maps shown here are the station component of precision.

[Figure: posterior station precision, global and regional maps]

Precision is used to weight the importance of data fit: stations with high-quality picks are identified and up-weighted.
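One way to picture how decomposed precision enters the fit: give each pick a standard deviation built from station, phase, and event components and weight its residual by the inverse variance. The multiplicative combination below is an assumption made for illustration (the poster does not state how Bayesloc combines the components), and the station names and component values are hypothetical.

```python
from typing import Dict, List, Tuple


def pick_sd(station: str, phase: str, event: int,
            station_f: Dict[str, float], phase_sd: Dict[str, float],
            event_f: Dict[int, float]) -> float:
    """Assumed model: pick s.d. = phase baseline x station factor x event factor."""
    return phase_sd[phase] * station_f[station] * event_f[event]


def weighted_misfit(picks: List[Tuple[str, str, int, float]],
                    station_f: Dict[str, float], phase_sd: Dict[str, float],
                    event_f: Dict[int, float]) -> float:
    """Sum of squared residuals, each divided by its pick variance."""
    total = 0.0
    for station, phase, event, residual in picks:
        sd = pick_sd(station, phase, event, station_f, phase_sd, event_f)
        total += (residual / sd) ** 2
    return total


STATION_F = {"GOOD": 0.5, "NOISY": 2.0}  # hypothetical station factors
PHASE_SD = {"P": 1.0, "Pn": 1.5}         # hypothetical phase baselines (s)
EVENT_F = {0: 1.0}                       # hypothetical event factor
```

Under these values the same 1 s P residual contributes 4.0 to the misfit at the precise station but only 0.25 at the noisy one, so picks from the precise station dominate the fit.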

[Figure: improved model fit with Bayesloc; residual trends]

The views expressed here do not necessarily reflect the opinion of the United States Government, the United States Department of Energy, or Lawrence Livermore National Laboratory.