TRANSCRIPT
Triad-Friendly Approaches to Data Collection Design
Alternative Triad-Friendly Approaches
– Collaborative Data Sets
  • Weight-of-evidence approaches
  • Using lower analytical quality data for search, higher analytical quality data for population characterization
  • Using lower analytical quality data for search, and higher analytical quality data sets for clarification and as QA/QC
  • Blending two different data sets statistically
– Dynamic Data Collection Programs
  • Multi-increment sampling and adaptive compositing strategies
  • Sequential probability ratio test/Barnard’s t test
  • Adaptive cluster sampling
  • GeoBayesian approaches
A Second-Generation Data Quality Model (for Heterogeneous Matrices): Collaborative Data Sets

[Diagram: cheaper/rapid analytical methods (lab or field, standard or non-standard) manage CSM and sampling uncertainty by enabling targeted, high-density sampling; costlier/rigorous analytical methods manage analytical uncertainty by providing low detection limits and analyte specificity.]

Collaborative data sets complement each other so that all sources of data uncertainty are managed, even though neither kind of data used alone would produce reliable information.
Traditional Paradigm vs. Triad

[Diagram contrasting three examples. Ex 1, the 1980’s paradigm: a few expensive fixed-lab analyses ($) manage analytical uncertainty but leave sampling uncertainty unmanaged, so the CSM and cleanup remain incomplete and the process repeats as needed. Ex 2, Triad: sampling uncertainty is controlled through an increased density of cheap, rapid analyses (¢) that segregate populations. Ex 3, Triad: rapid analytical data are paired with a small number of fixed-lab analyses; the remedy removes hot spots, decreasing sampling variability, and the project reaches DONE.]
Collaborative Data Sets

• Collaborative data sets:
  – refer to using data from multiple sources to support decision-making;
  – can be used to support static or dynamic work strategies;
  – usually include a relatively small number of high-quality (but expensive) analyses and a larger number of lower-quality (but much cheaper) analyses.
• Recall what “lower quality” means:
  – higher detection limits (perhaps even higher than the cleanup requirement), and/or
  – greater variability in analytical results, and/or
  – greater potential for interferences and bias, and/or
  – measuring something that is different from, but linked to, the true parameter of concern.
Collaborative Data Sets and Weight of Evidence
• Goal: locate and remove buried waste pits.
• Collaborative data sets:
  – historical air photos
  – non-intrusive geophysics
  – passive soil gas analysis
  – limited intrusive GeoProbe sampling (~40 locations)
• Alternative: a hot spot search approach would have required 137 intrusive sampling locations.
Search and Stratification Using Collaborative Data
• Cheaper, lower-quality analytical data identify areas of concern.
• More expensive, higher-quality analytical data provide more definitive information about population characteristics (e.g., average contaminant concentration).
• The only requirement for the cheaper technique is that it have detection capabilities sufficient to identify areas that would be of concern (not necessarily below cleanup requirements).
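A minimal sketch of this search-then-characterize pattern (the function names, data layout, and flag level are illustrative assumptions, not from the presentation):

```python
import numpy as np

def search_then_characterize(screening, definitive_assay, flag_level):
    """Use cheap screening results to find areas of concern, then spend the
    expensive definitive method only on flagged areas to estimate the
    population characteristic of interest (here, the mean).

    screening        : dict mapping location -> cheap screening result
    definitive_assay : callable(location) -> definitive measurement
    flag_level       : screening value that flags a location as of concern
    """
    flagged = [loc for loc, v in screening.items() if v >= flag_level]
    results = [definitive_assay(loc) for loc in flagged]
    mean = float(np.mean(results)) if results else None
    return flagged, mean
```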
Quantitatively Blending Data for Population Mean Estimation
• Assumptions:
  – data are unbiased,
  – detection limits are below requirements,
  – a linear correlation exists between the two methods, and
  – the goal is to estimate the average concentration level.
• Question: What’s the best combination of the two methods?
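One common answer, sketched below under the stated assumptions (unbiased data, linear correlation): calibrate the cheaper method to the expensive scale with least squares, then combine the two mean estimates by inverse-variance weighting. The function name and data layout are illustrative, not from the presentation.

```python
import numpy as np

def blended_mean(expensive, cheap_paired, expensive_paired, cheap_all):
    """Estimate a population mean from two collaborative data sets.

    expensive        : standalone high-quality results
    cheap_paired     : cheap results collocated with expensive_paired
    expensive_paired : expensive results at the collocated points
    cheap_all        : every cheap result (collocated ones included)
    """
    # Calibrate the cheap method to the expensive scale (linear
    # correlation is assumed, per the slide).
    slope, intercept = np.polyfit(cheap_paired, expensive_paired, 1)
    calibrated = intercept + slope * np.asarray(cheap_all, dtype=float)

    # Mean and variance-of-the-mean from each data set.
    m1 = np.mean(expensive)
    v1 = np.var(expensive, ddof=1) / len(expensive)
    m2 = np.mean(calibrated)
    v2 = np.var(calibrated, ddof=1) / len(calibrated)

    # Inverse-variance weights give the minimum-variance blend.
    w1, w2 = 1.0 / v1, 1.0 / v2
    return (w1 * m1 + w2 * m2) / (w1 + w2)
```

Note that this first approximation ignores the extra variance introduced by the calibration step itself.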
Collaborative Data in a Clarifying Role
• Linear regressions often don’t work:
  – outlier problems,
  – non-linear relationships,
  – non-detects.
• Result: the two data sets cannot be merged quantitatively.
[Scatter plot: Th-230 (pCi/g) vs. gross activity (cpm), R² = 0.5677; gross activity spans roughly 10,000 to 34,000 cpm and Th-230 spans 0 to 700 pCi/g.]
Non-Parametric Techniques Work Well for Establishing Relationships
• Decision focus is yes/no.
• Heavy lifting is done by the cheaper analytical technique.
• More definitive methods clarify inconclusive results.
• Value depends on:
  – the strength of the relationship between the two techniques, and
  – the spatial distribution of contamination.
[Bar chart: Relationship Between Gamma Walkover Data and Thorium-230. Fraction of locations exceeding the Th-230 criterion, by gamma count bin (counts per minute): 4/1266 for 10K-16K, 12/163 for 16K-20K, and 4/40 for 20K+.]
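The chart above is exactly the kind of table the sketch below produces: bin the cheap measurement and tabulate how often the definitive method exceeds the criterion within each bin (inputs are hypothetical):

```python
import numpy as np

def exceedance_by_bin(cheap, definitive, bin_edges, action_level):
    """Non-parametric relationship between a cheap measurement and a
    definitive one: fraction of definitive exceedances per cheap bin."""
    cheap = np.asarray(cheap)
    definitive = np.asarray(definitive)
    table = {}
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (cheap >= lo) & (cheap < hi)
        n = int(in_bin.sum())
        hits = int((definitive[in_bin] > action_level).sum())
        table[f"{lo}-{hi}"] = (hits, n)  # e.g., (12, 163) -> 12/163
    return table
```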
Example: Hot Spot Search and the Scattered Site
• Looking for hot spots (1,500 ppm over 900 square feet).
• Same hot spot search strategy as described before.
• Two methods:
  – a real-time method, at ¼ the cost, used at each point;
  – a standard method used to clarify uncertainty.
• Non-parametric relationship between the two methods.
Collaborative Data Set Performance
• 172 real-time measurements and 9 follow-up standard analyses.
• Same conclusions as traditional method.
• Cost savings of 70%.
Multi-Increment Sampling Can Control Spatial Heterogeneity
• Multi-increment composites can be a very effective budget-stretching tool in both search and population characterization.
• Multi-increment sampling reduces the effects of spatial variability on sampling uncertainty.
• Multi-increment sampling can be used to address both short scale spatial variability (e.g., for searching) as well as longer scale spatial variability (e.g., determining decision unit means).
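A minimal simulation of that variance-reduction effect, using a hypothetical lognormal concentration field (the distribution and its parameters are assumptions, not site data):

```python
import numpy as np

rng = np.random.default_rng(7)

def grab_samples(shape):
    # Hypothetical heterogeneous surface: lognormal grab-sample results.
    return rng.lognormal(mean=6.2, sigma=1.0, size=shape)

# A multi-increment sample averages k increments before analysis, which
# damps short-scale spatial variability by roughly a factor of sqrt(k).
grabs = grab_samples(10_000)
seven_increment = grab_samples((10_000, 7)).mean(axis=1)

print(f"grab-sample CV:  {grabs.std() / grabs.mean():.2f}")
print(f"7-increment CV:  {seven_increment.std() / seven_increment.mean():.2f}")
```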
Multi-Increment Sampling Applied to the Scattered Site
• Looking for hot spots (1,500 ppm over 100 square yards).
• Same hot spot search strategy as described before.
• Seven-increment composites formed at each grid node.
7-Increment Sampling Improves Performance
• 100% of problem areas identified.
• Incorrectly called 2% of “clean” locations “contaminated”.
• Missed 45% of contaminated locations.
• “Compositing” resulted in more hits!
Adaptive Compositing Strategies Can Reduce Sampling Costs
• Applicable when action level is significantly greater than background levels.
• Aggregate samples (single or multi-increment) into larger composites.
• Develop investigation levels for larger composites that indicate when analyses of contributing multi-increment samples are necessary.
Adaptive Compositing Strategies and the Scattered Site
• Goal: identify hot spots.
• Background is 700 ppm; the action level is 1,500 ppm.
• Composites formed from four multi-increment samples.
• Investigation level is 900 ppm.
• Results:
  – same performance as multi-increment sampling;
  – analytical costs reduced by ~50%.
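The 900 ppm investigation level follows directly from the composite arithmetic: if one of the four contributing samples sits at the 1,500 ppm action level and the other three sit at the 700 ppm background, their average is (1,500 + 3 × 700) / 4 = 900 ppm. A one-line sketch (assuming composites are simple averages):

```python
def investigation_level(action, background, m):
    """Composite reading when one of m contributing samples is at the
    action level and the other m - 1 sit at background (composite = average)."""
    return (action + (m - 1) * background) / m

print(investigation_level(1500, 700, 4))  # -> 900.0
```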
Adaptive Composite Strategies Using Multi-Increment Sampling and Collaborative Data Sets
• Brings it all together:
  – multi-increment sampling,
  – adaptive compositing strategies, and
  – collaborative data sets.
• Same performance, from an error perspective, as straight multi-increment sampling for the scattered-site hot spot example.
• The upshot: further cost reductions (~80% compared with a single-sample, standard-analytics hot spot search for the scattered site).
Sequential Probability Ratio Test
• Adaptively addresses population characterization (e.g., estimating the mean).
• Two flavors, depending on whether sample result variability is known ahead of time.
• If it is not, a minimum of 10 samples is required.
• Samples are distributed across an area in a manner that leads to “even” coverage.
• The test can be run at any point in time to determine whether requirements have been met.
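A minimal sketch of the known-variability flavor (Wald’s SPRT for a normal mean; the hypothesized means, sigma, and error rates are placeholders):

```python
import numpy as np

def sprt_mean(samples, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for a normal mean with
    known standard deviation sigma.

    H0: mean = mu0 (e.g., the site average meets the requirement)
    H1: mean = mu1 (e.g., the site average clearly exceeds it)
    """
    upper = np.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = np.log(beta / (1 - alpha))   # accept H0 at or below this
    x = np.asarray(samples, dtype=float)
    # Cumulative log-likelihood ratio for normal data with known variance.
    llr = np.sum((mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2))
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue sampling"  # re-run after the next result arrives
```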
Adaptive Cluster Sampling

• Used to delineate boundaries of contamination within a decision unit, while also providing information about the average contamination level for the unit.
• A grid is laid over the decision unit; an initial number of samples is determined and systematically distributed, and sampling takes place.
• If a result is above the requirement, adjacent grid nodes are then sampled and analyzed. This continues until the final round of samples yields no exceedances.
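A minimal sketch of that expansion rule (grid layout and threshold are hypothetical; a real program would also track edge units and the estimation weights needed for an unbiased mean):

```python
from collections import deque

def adaptive_cluster_sample(grid, initial_nodes, requirement):
    """Adaptive cluster sampling on a 2-D grid.

    grid          : dict mapping (row, col) -> concentration when sampled
    initial_nodes : systematically distributed starting locations
    requirement   : level above which adjacent nodes must also be sampled
    """
    sampled = set()
    queue = deque(initial_nodes)
    while queue:
        node = queue.popleft()
        if node in sampled or node not in grid:
            continue                      # already sampled, or off the unit
        sampled.add(node)
        if grid[node] > requirement:
            # Exceedance: queue the four adjacent grid nodes; expansion
            # stops once a round of samples yields no exceedances.
            r, c = node
            queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return sampled
```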
Adaptive GeoBayesian Approaches
• Used for searching and boundary delineation (yes/no sample results).
• Spatial autocorrelation explicitly addressed.
• Can roll in “soft” information using Bayesian techniques.
• Explicitly addresses decision errors.
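A minimal sketch of the Bayesian update behind such approaches: a grid cell’s prior probability of contamination (seeded from soft information) is revised by a yes/no field result. The sensitivity and specificity figures are placeholder assumptions, and real GeoBayesian tools additionally propagate spatial autocorrelation to neighboring cells.

```python
def update_probability(prior, detect, sensitivity=0.95, specificity=0.95):
    """Bayes' rule for one grid cell given a yes/no sample result.

    prior       : current probability the cell is contaminated
    detect      : True if the field method reported contamination
    sensitivity : P(detect | contaminated) -- assumed method performance
    specificity : P(no detect | clean)     -- assumed method performance
    """
    p_detect_contam = sensitivity if detect else 1 - sensitivity
    p_detect_clean = (1 - specificity) if detect else specificity
    numerator = p_detect_contam * prior
    return numerator / (numerator + p_detect_clean * (1 - prior))

# Example: a cell with a 0.3 prior and a positive field result.
print(round(update_probability(0.3, True), 3))  # -> 0.891
```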
Example: Surface Contamination Event
[Site map: terrain contour lines, two roads, a waste lagoon, and a utility building.]
• Surface soil contamination problem.
• Resulted from spillage from the lagoon.
• 7,940 sq m actually contaminated, an extent unknown to the responsible party.
• Soft information available for the site includes:
  – slope of the land;
  – location of barriers to flow;
  – location of the source.
• The owner will remediate anything with a greater than 20% chance of being contaminated.
Initial Conceptual Site Model

• Based on soft information, assign a probability of contamination being present.
• The map shows this CSM pictorially, along with the boundary that captures everything with a probability > 20% based on the CSM.
• This CSM drives subsequent sampling decisions and becomes an important point of concurrence for stakeholders.
Sampling Progression
• Samples are collected sequentially and analyzed with an appropriate “real-time” method.
• CSM updated with current sampling results.
• CSM drives subsequent sample location selection.
• In this example, locations are selected to maximize the area with less than 0.2 probability of contamination.
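One simple location-selection heuristic consistent with that goal (an illustration only, not necessarily the actual GeoBayesian rule): sample the cell whose contamination probability is most uncertain, i.e., closest to 0.5.

```python
def next_sample_location(csm_probs):
    """csm_probs: dict mapping grid cell -> current probability of
    contamination. Returns the most uncertain cell to sample next."""
    return min(csm_probs, key=lambda cell: abs(csm_probs[cell] - 0.5))

# Example with three hypothetical cells:
print(next_sample_location({(0, 0): 0.05, (0, 1): 0.45, (1, 0): 0.9}))
# -> (0, 1)
```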
[Chart: Classification of Soils at 80% Certainty Level — percent of volume classified as clean, uncertain, or contaminated as a function of the number of samples (0 to 50).]
Sampling Can Continue Until Goals are Achieved
Adaptive Sampling Realities
• Strength is ability to modify sampling program to fit reality as it unfolds.
• This makes answering the question of “How many samples?” harder.
• Requires flexible contracting mechanisms and careful budget forecasting to be effective.