TRANSCRIPT
Triad-Friendly Approaches to Data Collection Design
Alternative Triad-Friendly Approaches
– Collaborative Data Sets
  • Weight-of-evidence approaches
  • Using lower analytical quality data for search, higher analytical quality data for population characterization
  • Using lower analytical quality data for search, and higher analytical quality data sets for clarification and as QA/QC
  • Blending two different data sets statistically
– Dynamic Data Collection Programs
  • Multi-increment sampling and adaptive compositing strategies
  • Sequential probability ratio test/Barnard’s t test
  • Adaptive cluster sampling
  • GeoBayesian approaches
A Second-Generation Data Quality Model (for Heterogeneous Matrices): Collaborative Data Sets

[Diagram: cheaper/rapid analytical methods (lab or field, standard or non-standard) manage CSM and sampling uncertainty by enabling targeted, high-density sampling; costlier/rigorous analytical methods manage analytical uncertainty by providing low detection limits and analyte specificity.]

Collaborative data sets complement each other so that all sources of data uncertainty are managed, even though neither kind of data used alone would produce reliable information.
Traditional Paradigm vs. Triad

[Diagram contrasting three examples. Ex 1, the 1980’s paradigm: a few expensive fixed-lab analyses ($) manage analytical uncertainty but leave sampling uncertainty unmanaged, so the CSM and cleanup remain incomplete and the process repeats as needed. Ex 2, Triad: sampling uncertainty is controlled through an increased density of cheap, rapid analyses (¢) that segregate populations. Ex 3, Triad: rapid analytical data are paired with a small number of fixed-lab analyses; the remedy removes hot spots, decreasing sampling variability, and the project reaches DONE.]
Collaborative Data Sets

• Collaborative data sets:
  – refer to using data from multiple sources to support decision-making;
  – can be used to support static or dynamic work strategies;
  – usually include a relatively small number of high-quality (but expensive) analyses and a larger number of lower-quality (but much cheaper) analyses.
• Recall what “lower quality” means:
  – higher detection limits (perhaps even higher than the cleanup requirement), and/or
  – greater variability in analytical results, and/or
  – greater potential for interferences and bias, and/or
  – measuring something that is different from, but linked to, the true parameter of concern.
Collaborative Data Sets and Weight of Evidence
• Goal: locate and remove buried waste pits.
• Collaborative data sets:
  – historical air photos
  – non-intrusive geophysics
  – passive soil gas analysis
  – limited intrusive GeoProbe sampling (~40 locations)
• Alternative: a hot spot search approach would have required 137 intrusive sampling locations.
Search and Stratification Using Collaborative Data
• Cheaper, lower-quality analytical data identify areas of concern.
• More expensive, higher-quality analytical data provide more definitive information about population characteristics (e.g., average contaminant concentration).
• The only requirement for the cheaper technique is that it have detection capabilities sufficient to identify areas that would be of concern (not necessarily below cleanup requirements).
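A minimal sketch of this search-then-characterize pattern (the function names, data layout, and flag level are illustrative assumptions, not from the presentation):

```python
import numpy as np

def search_then_characterize(screening, definitive_assay, flag_level):
    """Use cheap screening results to find areas of concern, then spend the
    expensive definitive method only on flagged areas to estimate the
    population characteristic of interest (here, the mean).

    screening        : dict mapping location -> cheap screening result
    definitive_assay : callable(location) -> definitive measurement
    flag_level       : screening value that flags a location as of concern
    """
    flagged = [loc for loc, v in screening.items() if v >= flag_level]
    results = [definitive_assay(loc) for loc in flagged]
    mean = float(np.mean(results)) if results else None
    return flagged, mean
```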
Quantitatively Blending Data for Population Mean Estimation
• Assumptions:
  – data are unbiased,
  – detection limits are below requirements,
  – a linear correlation exists between the two methods, and
  – the goal is to estimate the average concentration level.
• Question: What’s the best combination of the two methods?
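One common answer, sketched below under the stated assumptions (unbiased data, linear correlation): calibrate the cheaper method to the expensive scale with least squares, then combine the two mean estimates by inverse-variance weighting. The function name and data layout are illustrative, not from the presentation.

```python
import numpy as np

def blended_mean(expensive, cheap_paired, expensive_paired, cheap_all):
    """Estimate a population mean from two collaborative data sets.

    expensive        : standalone high-quality results
    cheap_paired     : cheap results collocated with expensive_paired
    expensive_paired : expensive results at the collocated points
    cheap_all        : every cheap result (collocated ones included)
    """
    # Calibrate the cheap method to the expensive scale (linear
    # correlation is assumed, per the slide).
    slope, intercept = np.polyfit(cheap_paired, expensive_paired, 1)
    calibrated = intercept + slope * np.asarray(cheap_all, dtype=float)

    # Mean and variance-of-the-mean from each data set.
    m1 = np.mean(expensive)
    v1 = np.var(expensive, ddof=1) / len(expensive)
    m2 = np.mean(calibrated)
    v2 = np.var(calibrated, ddof=1) / len(calibrated)

    # Inverse-variance weights give the minimum-variance blend.
    w1, w2 = 1.0 / v1, 1.0 / v2
    return (w1 * m1 + w2 * m2) / (w1 + w2)
```

Note that this first approximation ignores the extra variance introduced by the calibration step itself.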
Collaborative Data in a Clarifying Role
• Linear regressions often don’t work:
  – outlier problems,
  – non-linear relationships,
  – non-detects.
• Result: the two data sets cannot be merged quantitatively.
[Scatter plot: Th-230 (pCi/g) vs. gross activity (cpm), R² = 0.5677; gross activity spans roughly 10,000 to 34,000 cpm and Th-230 spans 0 to 700 pCi/g.]
Non-Parametric Techniques Work Well for Establishing Relationships
• Decision focus is yes/no.
• Heavy lifting is done by the cheaper analytical technique.
• More definitive methods clarify inconclusive results.
• Value depends on:
  – the strength of the relationship between the two techniques, and
  – the spatial distribution of contamination.
[Bar chart: Relationship Between Gamma Walkover Data and Thorium-230. Fraction of locations exceeding the Th-230 criterion, by gamma count bin (counts per minute): 4/1266 for 10K-16K, 12/163 for 16K-20K, and 4/40 for 20K+.]
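The chart above is exactly the kind of table the sketch below produces: bin the cheap measurement and tabulate how often the definitive method exceeds the criterion within each bin (inputs are hypothetical):

```python
import numpy as np

def exceedance_by_bin(cheap, definitive, bin_edges, action_level):
    """Non-parametric relationship between a cheap measurement and a
    definitive one: fraction of definitive exceedances per cheap bin."""
    cheap = np.asarray(cheap)
    definitive = np.asarray(definitive)
    table = {}
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (cheap >= lo) & (cheap < hi)
        n = int(in_bin.sum())
        hits = int((definitive[in_bin] > action_level).sum())
        table[f"{lo}-{hi}"] = (hits, n)  # e.g., (12, 163) -> 12/163
    return table
```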
Example: Hot Spot Search and the Scattered Site
• Looking for hot spots (1,500 ppm over 900 square feet).
• Same hot spot search strategy as described before.
• Two methods:
  – a real-time method, at ¼ the cost, used at each point;
  – a standard method used to clarify uncertainty.
• Non-parametric relationship between the two methods.
Collaborative Data Set Performance
• 172 real-time measurements and 9 follow-up standard analyses.
• Same conclusions as traditional method.
• Cost savings of 70%.
Multi-Increment Sampling Can Control Spatial Heterogeneity
• Multi-increment composites can be a very effective budget-stretching tool in both search and population characterization.
• Multi-increment sampling reduces the effects of spatial variability on sampling uncertainty.
• Multi-increment sampling can be used to address both short scale spatial variability (e.g., for searching) as well as longer scale spatial variability (e.g., determining decision unit means).
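A minimal simulation of that variance-reduction effect, using a hypothetical lognormal concentration field (the distribution and its parameters are assumptions, not site data):

```python
import numpy as np

rng = np.random.default_rng(7)

def grab_samples(shape):
    # Hypothetical heterogeneous surface: lognormal grab-sample results.
    return rng.lognormal(mean=6.2, sigma=1.0, size=shape)

# A multi-increment sample averages k increments before analysis, which
# damps short-scale spatial variability by roughly a factor of sqrt(k).
grabs = grab_samples(10_000)
seven_increment = grab_samples((10_000, 7)).mean(axis=1)

print(f"grab-sample CV:  {grabs.std() / grabs.mean():.2f}")
print(f"7-increment CV:  {seven_increment.std() / seven_increment.mean():.2f}")
```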
Multi-Increment Sampling Applied to the Scattered Site
• Looking for hot spots (1,500 ppm over 100 square yards).
• Same hot spot search strategy as described before.
• Seven-increment composites formed at each grid node.
7-Increment Sampling Improves Performance
• 100% of problem areas identified.
• Incorrectly called 2% of “clean” locations “contaminated”.
• Missed 45% of contaminated locations.
• “Compositing” resulted in more hits!
Adaptive Compositing Strategies Can Reduce Sampling Costs
• Applicable when action level is significantly greater than background levels.
• Aggregate samples (single or multi-increment) into larger composites.
• Develop investigation levels for larger composites that indicate when analyses of contributing multi-increment samples are necessary.
Adaptive Compositing Strategies and the Scattered Site
• Goal: identify hot spots.
• Background is 700 ppm; the action level is 1,500 ppm.
• Composites formed from four multi-increment samples.
• Investigation level is 900 ppm.
• Results:
  – same performance as multi-increment sampling;
  – analytical costs reduced by ~50%.
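The 900 ppm investigation level follows directly from the composite arithmetic: if one of the four contributing samples sits at the 1,500 ppm action level and the other three sit at the 700 ppm background, their average is (1,500 + 3 × 700) / 4 = 900 ppm. A one-line sketch (assuming composites are simple averages):

```python
def investigation_level(action, background, m):
    """Composite reading when one of m contributing samples is at the
    action level and the other m - 1 sit at background (composite = average)."""
    return (action + (m - 1) * background) / m

print(investigation_level(1500, 700, 4))  # -> 900.0
```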
Adaptive Composite Strategies Using Multi-Increment Sampling and Collaborative Data Sets
• Brings it all together:
  – multi-increment sampling,
  – adaptive compositing strategies, and
  – collaborative data sets.
• Same performance, from an error perspective, as straight multi-increment sampling for the scattered-site hot spot example.
• The upshot: further cost reductions (~80% compared with a single-sample, standard-analytics hot spot search for the scattered site).
Sequential Probability Ratio Test
• Adaptively addresses population characterization (e.g., estimating the mean).
• Two flavors, depending on whether sample result variability is known ahead of time.
• If it is not, a minimum of 10 samples is required.
• Samples are distributed across an area in a manner that leads to “even” coverage.
• The test can be run at any point in time to determine whether requirements have been met.
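A minimal sketch of the known-variability flavor (Wald’s SPRT for a normal mean; the hypothesized means, sigma, and error rates are placeholders):

```python
import numpy as np

def sprt_mean(samples, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for a normal mean with
    known standard deviation sigma.

    H0: mean = mu0 (e.g., the site average meets the requirement)
    H1: mean = mu1 (e.g., the site average clearly exceeds it)
    """
    upper = np.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = np.log(beta / (1 - alpha))   # accept H0 at or below this
    x = np.asarray(samples, dtype=float)
    # Cumulative log-likelihood ratio for normal data with known variance.
    llr = np.sum((mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2))
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue sampling"  # re-run after the next result arrives
```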
Adaptive Cluster Sampling

• Used to delineate boundaries of contamination within a decision unit, while also providing information about the average contamination level for the unit.
• A grid is laid over the decision unit; an initial number of samples is determined and systematically distributed, and sampling takes place.
• If a result is above the requirement, adjacent grid nodes are then sampled and analyzed. This continues until the final round of samples yields no exceedances.
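A minimal sketch of that expansion rule (grid layout and threshold are hypothetical; a real program would also track edge units and the estimation weights needed for an unbiased mean):

```python
from collections import deque

def adaptive_cluster_sample(grid, initial_nodes, requirement):
    """Adaptive cluster sampling on a 2-D grid.

    grid          : dict mapping (row, col) -> concentration when sampled
    initial_nodes : systematically distributed starting locations
    requirement   : level above which adjacent nodes must also be sampled
    """
    sampled = set()
    queue = deque(initial_nodes)
    while queue:
        node = queue.popleft()
        if node in sampled or node not in grid:
            continue                      # already sampled, or off the unit
        sampled.add(node)
        if grid[node] > requirement:
            # Exceedance: queue the four adjacent grid nodes; expansion
            # stops once a round of samples yields no exceedances.
            r, c = node
            queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return sampled
```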
Adaptive GeoBayesian Approaches
• Used for searching and boundary delineation (yes/no sample results).
• Spatial autocorrelation explicitly addressed.
• Can roll in “soft” information using Bayesian techniques.
• Explicitly addresses decision errors.
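A minimal sketch of the Bayesian update behind such approaches: a grid cell’s prior probability of contamination (seeded from soft information) is revised by a yes/no field result. The sensitivity and specificity figures are placeholder assumptions, and real GeoBayesian tools additionally propagate spatial autocorrelation to neighboring cells.

```python
def update_probability(prior, detect, sensitivity=0.95, specificity=0.95):
    """Bayes' rule for one grid cell given a yes/no sample result.

    prior       : current probability the cell is contaminated
    detect      : True if the field method reported contamination
    sensitivity : P(detect | contaminated) -- assumed method performance
    specificity : P(no detect | clean)     -- assumed method performance
    """
    p_detect_contam = sensitivity if detect else 1 - sensitivity
    p_detect_clean = (1 - specificity) if detect else specificity
    numerator = p_detect_contam * prior
    return numerator / (numerator + p_detect_clean * (1 - prior))

# Example: a cell with a 0.3 prior and a positive field result.
print(round(update_probability(0.3, True), 3))  # -> 0.891
```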
Example: Surface Contamination Event
[Site map: terrain contour lines, two roads, a waste lagoon, and a utility building.]
• Surface soil contamination problem.
• Resulted from spillage from the lagoon.
• 7,940 sq m actually contaminated, an extent unknown to the responsible party.
• Soft information available for the site includes:
  – slope of the land;
  – location of barriers to flow;
  – location of the source.
• The owner will remediate anything with a greater than 20% chance of being contaminated.
Initial Conceptual Site Model

• Based on soft information, assign a probability of contamination being present.
• The map shows this CSM pictorially, along with the boundary that captures everything with a probability > 20% based on the CSM.
• This CSM drives subsequent sampling decisions and becomes an important point of concurrence for stakeholders.
Sampling Progression
• Samples are collected sequentially and analyzed with an appropriate “real-time” method.
• CSM updated with current sampling results.
• CSM drives subsequent sample location selection.
• In this example, locations are selected to maximize the area with less than 0.2 probability of contamination.
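One simple location-selection heuristic consistent with that goal (an illustration only, not necessarily the actual GeoBayesian rule): sample the cell whose contamination probability is most uncertain, i.e., closest to 0.5.

```python
def next_sample_location(csm_probs):
    """csm_probs: dict mapping grid cell -> current probability of
    contamination. Returns the most uncertain cell to sample next."""
    return min(csm_probs, key=lambda cell: abs(csm_probs[cell] - 0.5))

# Example with three hypothetical cells:
print(next_sample_location({(0, 0): 0.05, (0, 1): 0.45, (1, 0): 0.9}))
# -> (0, 1)
```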
[Chart: Classification of Soils at 80% Certainty Level — percent of volume classified as clean, uncertain, or contaminated as a function of the number of samples (0 to 50).]
Sampling Can Continue Until Goals are Achieved
Adaptive Sampling Realities
• Strength is ability to modify sampling program to fit reality as it unfolds.
• This makes answering the question of “How many samples?” harder.
• Requires flexible contracting mechanisms and careful budget forecasting to be effective.