scaling considerations for performance assessments
TRANSCRIPT
P&RA CoP – Savannah River Site - October 2018
Neptune and Company, Inc.
October 2018
Scaling Considerations for
Performance Assessments
Probabilistic Risk Assessments –
Performance Assessments
• Probabilistic risk assessment (PRA) can be used to assess human health risk from exposure to contaminant chemicals
• When the chemicals of concern are radionuclides, modeling into the long-term future is often performed
• Long-term human health risks (radioactive doses) can be dominated by radioactive decay and ingrowth of other radionuclides
• Under various regulations/guidance, such PRAs are called Performance Assessments
Performance Assessments
• Probabilistic modeling encompasses fate and
transport of the radionuclides
• This is followed by an assessment of potential risk
(radioactive dose) to humans
• All inputs to the fate and transport, and risk, models
are probability distributions, and Monte Carlo
simulation is used to evaluate the system
PRA and Decision Analysis
• PRA modeling is the foundation of performance
assessment (PA) models, per regulations and guidance
provided by NRC (NUREG-1573), EPA (40 CFR 191,
etc.), and DOE (Order 435.1, where it is perhaps less clear
that PRA is considered necessary)
• PA models can (should) also be embedded in a decision
analysis (DA) that couples the consequences of actions
that might be taken with costs and values that are
associated with those actions
• If DA leads, then models are built to directly support decision
making
Performance Assessment Modeling
• Considering the importance and the large costs
involved in PA-supported decisions, the PA models
must be set up properly
• This requires paying attention to both the model
structure and the probability distributions used as
inputs to the model
• The focus of this work is consideration of the input
distributions used in PA models
PA Input Distributions – Challenges
• Data/information often exist about specific variables
(parameters), but at a spatial and temporal scale
that is inconsistent with the scales of the PA models
• Without appropriate correction to address scaling,
the input distributions could be termed “garbage in”
for their specific purpose, which then leads to the
consequent “garbage out”
• Garbage in, garbage out (GIGO) affects decision making – that is,
scaling is important for making the best decision
Driving My Car to Work
• When I drove to work I looked at the speedometer
and saw that I was going 75 MPH
• Work is 15 miles from home
• Therefore, I got to work in 12 minutes?
My Speed Varies
• At other times on my drive to work I might be
driving 30 MPH, “apparently” taking me 30
minutes to get to work.
• At other times I’m stopped at a traffic light – so
“apparently” I never get to work!
• In general, we might have data for a specific
variable, but we cannot apply the distribution of
those data directly to a model UNLESS the spatial
and temporal scales of the data and the model are
the same (and they hardly ever are)
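The arithmetic behind these "apparent" travel times can be sketched as follows. This is only an illustration: the function name `travel_time_minutes` is hypothetical (not from the presentation), and a speed of zero is treated as never arriving.

```python
def travel_time_minutes(distance_miles, speed_mph):
    """Time to cover a distance at one constant speed; infinite if stopped."""
    if speed_mph <= 0:
        return float("inf")
    return 60.0 * distance_miles / speed_mph

DISTANCE_MILES = 15  # home to work, as stated on the slide

# Momentary speedometer readings mentioned on the slides: 75 MPH, 30 MPH, stopped.
for speed in (75, 30, 0):
    minutes = travel_time_minutes(DISTANCE_MILES, speed)
    print(f"{speed} MPH held for the whole trip -> {minutes} minutes")
```

Each reading is a valid momentary observation, but none of them describes the trip as a whole, which is the scale the question is asked at.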
Averages and Sums
• In most of our models, we want to know the
average (or sum) of effects across time and space
• Scaling adjustments have to be made; otherwise,
apparently, I might never get to work, or I might
get there in 12 minutes
• Scaling mostly needs to be done statistically –
because, usually, we do not have the luxury of
collecting more aggregated data for PAs
Consequences of Simulation Approach
• Some problems are caused by computational complexity
• In principle I can pick a random number from the momentary
speed distribution (data distribution) at every time step, and I
can average those and find that it takes around 21 minutes to
get to work each day
• However, in our PA models, we mostly choose a random
number at the beginning of time, and use that number
throughout time (same basic principle applies to space)
• If we do that here, and we do not scale, then simulations
would suggest that it takes me perhaps 12 minutes to get
to work every day for the next 1,000 years, or that I never
get to work any day in the next 1,000 years
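A rough sketch of the two simulation approaches for the drive to work. The three speeds come from the slides; the probabilities attached to them and the half-minute time step are hypothetical, chosen only so that the resampled average lands near the roughly 21 minutes quoted above.

```python
import random
import statistics

SPEEDS_MPH = [0, 30, 75]        # momentary speeds mentioned on the slides
WEIGHTS = [0.15, 0.45, 0.40]    # hypothetical probabilities (assumption)
DISTANCE_MILES = 15
STEP_MIN = 0.5                  # hypothetical time step in minutes (assumption)

def trip_resampled(rng):
    """Draw a new momentary speed at every time step until work is reached."""
    distance, minutes = 0.0, 0.0
    while distance < DISTANCE_MILES:
        speed = rng.choices(SPEEDS_MPH, weights=WEIGHTS)[0]
        distance += speed * STEP_MIN / 60.0
        minutes += STEP_MIN
    return minutes

def trip_held_constant(rng):
    """Draw one momentary speed at the start and hold it for the whole trip."""
    speed = rng.choices(SPEEDS_MPH, weights=WEIGHTS)[0]
    return float("inf") if speed == 0 else 60.0 * DISTANCE_MILES / speed

rng = random.Random(2018)
resampled = [trip_resampled(rng) for _ in range(2000)]
held = [trip_held_constant(rng) for _ in range(2000)]

mean_resampled = statistics.mean(resampled)
print("resampled every step, mean minutes:", round(mean_resampled, 1))
print("held constant, distinct outcomes:", sorted(set(held)))
```

Holding an unscaled draw constant can only reproduce the extremes (12 minutes, 30 minutes, or never), while resampling at every step averages the momentary speeds over the trip.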
Effect on Decision Making
• This could lead to some interesting (bad) decisions
• If I only pull random numbers at the beginning of
time from my data distribution, then my output
distribution for the (average) amount of time it takes
me to get to work would be between 12 minutes and
never (permanently stopped at a traffic light, or
doing 75 MPH all the way, every day, forever?)
• How, then, do I decide how long before a 9:00
conference call or meeting I need to leave to
get to work?
Separate Science and Preferences
• I allow 25 minutes – this addresses the average
amount of time it takes (science-model), time to
allow me to set up at work, desire for more sleep,
and my willingness (preference, desire) to risk
missing the beginning of a meeting or phone call
• This incorporates scaling the data to the model –
otherwise I might convince myself I need hours or
days to get there (or longer!)
• It incorporates what I think I know, how unsure I am
about it, and my value preferences, which are
incorporated separately.
Basic Approach
• The model is based on “reality”, and
preferences/desires are separated from the model
• I am not adding conservatism to the model to get an
answer – when I am doing 75 MPH, I am not saying if I
call it 50 MPH then I need more time to get to work – I
am saying given my understanding of the system and
my preferences, my optimal decision is to leave home
25 minutes ahead of time
• Arguably, this is how we think in general
• But apparently not when it comes to PA modeling
• This creates a stakeholder communication problem
• Our stakeholders do not understand this either
Mismatched Scale and PA
• Using instantaneous measurements to directly
estimate a long-term or large spatial process is wrong
• Proper spatio-temporal scaling is critical for effective
decision making (otherwise GIGO)
• Data are assembled from disparate sources to inform
dozens and sometimes hundreds of variables
(parameters) in a PA model
• And, there are often models in the PA process that
represent yet different scales (e.g., systems- vs
process-level models)
PA Modeling Context
• PA models are run for multiple realizations, where a set of
randomly chosen input values is held constant over an
entire model run
Why?
• Computational complexity of choosing new random
numbers every time step
• The PA model is a systems-level model
• Changing values of parameters through time for such
complex models complicates interpretation of the results
(SA, UA, etc.)
• With proper scaling, sampling once per realization can give the same
answers as resampling every time step
Simulation Study
• Two Options
1) Sampling every year for 100 years (n = 100)
2) Sampling once with 100 yr average (similar to most PAs)
• Three Cases are considered:
Linear response
Non-linear response (convex function)
Multiplicative response
Basic Model
• The underlying conceptual site model (CSM) concerns ants moving U-238
to the surface over a 100-yr time period:
• Y = W * X * NV / CL * AF * A/V
• Even this simple model is highly multiplicative
• Y is the total mass of U-238 moved to the surface in a year (kg/yr)
• W is mass of U-238 in the inventory (kg)
• X is nest density (1/ha)
• NV is nest volume (m^3)
• CL is lifespan of a colony (yr)
• AF is fraction of total volume occupied by ant nests in each depth layer (unitless)
• A is the area of interest (e.g., area covering the waste zone) (ha)
• V is the volume of interest (e.g., area x depth impacted by ant nests) (m^3)
Simulation Study
• We held most of these inputs constant so that the
basic model is of the form:
Y = c * W * X
• Model built in GoldSim, but reproduced in R for
verification
• For the simple Linear response case, W
(inventory) is also held constant – this is Case 1
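The reduction from the full ant model to Y = c * W * X can be sketched as follows. All numeric values are hypothetical placeholders, not values from the study, and the function name `u238_to_surface` is illustrative. The formula is evaluated left to right, which is the reading under which the units work out to kg/yr.

```python
import math

def u238_to_surface(W, X, NV, CL, AF, A, V):
    """Full model from the slide: Y = W * X * NV / CL * AF * A / V (kg/yr)."""
    return W * X * NV / CL * AF * A / V

# Hypothetical placeholder values, chosen only to exercise the formula.
NV = 0.05      # nest volume (m^3)
CL = 20.0      # colony lifespan (yr)
AF = 0.1       # fraction of volume occupied by nests (unitless)
A = 2.0        # area of interest (ha)
V = 20000.0    # volume of interest (m^3)

# Lump every input that is held constant into a single coefficient c.
c = NV / CL * AF * A / V

W = 100.0      # inventory (kg)
X = 10.0       # nest density (1/ha)

full = u238_to_surface(W, X, NV, CL, AF, A, V)
reduced = c * W * X
print("full model:", full, "reduced model:", reduced)
```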
Simulation Study
[Figure: Ant transport model]
Simulation Study
• For the linear response case the results are the same
for the two options
• Mathematically, what’s going on is (Z is the 100-yr total of Y):
• Option 1 (simulate every year, assume independence):
$E(Z) = 100\,c\,w\,E(X)$, $\mathrm{Var}(Z) = c^2 w^2\,\mathrm{Var}(\sum X)$
• Option 2 (simulate average at beginning of time):
$E(Z) = 100\,c\,w\,E(X)$, $\mathrm{Var}(Z) = 100^2 c^2 w^2\,\mathrm{Var}(\sum X / 100)$
• These are the same (just move the 100 around)
• When everything is simple, averaging works for scaling
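A minimal Monte Carlo sketch of the linear case, assuming X is normal (as the deck says its examples were). The constants, the mean and standard deviation of X, and the number of realizations are hypothetical. This is not the GoldSim or R model from the study.

```python
import random
import statistics

C, W = 0.5, 2.0            # hypothetical constant c and fixed inventory w
MU, SIGMA = 10.0, 2.0      # hypothetical mean and sd of annual nest density X
YEARS, N_REAL = 100, 4000  # 100-yr horizon; number of realizations (assumption)

rng = random.Random(1)

def option_1():
    """Draw a new X every year and sum the annual responses."""
    return sum(C * W * rng.gauss(MU, SIGMA) for _ in range(YEARS))

def option_2():
    """Draw the 100-yr average of X once and hold it for all years."""
    x_bar = rng.gauss(MU, SIGMA / YEARS ** 0.5)
    return YEARS * C * W * x_bar

z1 = [option_1() for _ in range(N_REAL)]
z2 = [option_2() for _ in range(N_REAL)]

mean_1, var_1 = statistics.mean(z1), statistics.variance(z1)
mean_2, var_2 = statistics.mean(z2), statistics.variance(z2)

# Analytic values: E(Z) = 100 c w E(X), Var(Z) = 100 c^2 w^2 Var(X).
theory_mean = YEARS * C * W * MU
theory_var = YEARS * (C * W * SIGMA) ** 2

print("means:", round(mean_1, 1), round(mean_2, 1), "theory:", theory_mean)
print("variances:", round(var_1, 1), round(var_2, 1), "theory:", theory_var)
```

Up to Monte Carlo noise, both options should reproduce the same mean and the same variance, which is the sense in which averaging works as the scaling step for a linear response.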
Simulation Study
[Figure panels: Option 1, Option 2]
Simulation Study
• Case 2 – a univariate non-linear response:
• Y = c * w * X^2
• Ants somehow act synergistically – they want to
“win”, so they all work harder
• The point really is that there are many non-linear
functions in our PA models – this investigates a
relatively simple non-linear model
Simulation Study
• For the univariate non-linear response case the
results are different for the two options:
• That is, if we scale X and then apply the non-linear
function, we do not match results
• If scaling is performed at all, it is usually performed
by simple averaging of the input distribution
• If this is done for a non-linear function then the results
do not match the intent – the scaling has been done
incorrectly
• This approach gives the wrong answers
• Example…
[Figure: Simulated sampling distributions of the expected value of the sum of the general process for Case 2. Panels: Option 1 (1 yr), Option 2 (100 yr)]
[Figure: Simulated sampling distributions of the variance of the sum of the general process for Case 2. Panels: Option 1 (1 yr), Option 2 (100 yr)]
Simulation Study
• Without getting into a lot of math (see the report), the
issue is that we need (E is the expectation operator):
E(f(X)) instead of f(E(X))
• The latter averages first, and then applies the function
– this gives the wrong result
• The former applies the function first, and then
averages – this matches the intent (the right result)
• This might not be as easy to implement directly
• might require external generation of scaled distributions for
PA models
• If X is normal, then f(X) needs to be simulated, and then
averaging needs to be applied to that simulation
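The gap between E(f(X)) and f(E(X)) for the squared response can be sketched as follows, assuming normal X with hypothetical parameters. The "pool" step is one possible reading of generating a scaled distribution externally: simulate f(X) for each year, average over 100 years, and let the model sample that average once. For this f, E(X^2) = mu^2 + sigma^2, while averaging X first only gives mu^2 + sigma^2/100.

```python
import random

C, W = 0.5, 2.0            # hypothetical constants
MU, SIGMA = 10.0, 3.0      # hypothetical mean and sd of annual X
YEARS, N_REAL, N_POOL = 100, 4000, 2000
rng = random.Random(2)

def f(x):
    """Case 2 response: Y = c * w * X^2."""
    return C * W * x ** 2

def option_1():
    """Resample X every year, apply f each year, and sum."""
    return sum(f(rng.gauss(MU, SIGMA)) for _ in range(YEARS))

def option_2_naive():
    """Average X first, then apply f once: 100 * f(X-bar)."""
    x_bar = rng.gauss(MU, SIGMA / YEARS ** 0.5)
    return YEARS * f(x_bar)

# External scaling step: apply f to annual draws first, then average over 100 yr.
pool = [sum(f(rng.gauss(MU, SIGMA)) for _ in range(YEARS)) / YEARS
        for _ in range(N_POOL)]

def option_2_scaled():
    """Sample the pre-averaged f(X) once and hold it for all years."""
    return YEARS * rng.choice(pool)

def mean_of(draw):
    return sum(draw() for _ in range(N_REAL)) / N_REAL

mean_1 = mean_of(option_1)
mean_naive = mean_of(option_2_naive)
mean_scaled = mean_of(option_2_scaled)

theory_1 = YEARS * C * W * (MU ** 2 + SIGMA ** 2)              # function, then average
theory_naive = YEARS * C * W * (MU ** 2 + SIGMA ** 2 / YEARS)  # average, then function

print("option 1:", round(mean_1), "theory:", theory_1)
print("naive option 2:", round(mean_naive), "theory:", theory_naive)
print("scaled option 2:", round(mean_scaled))
```

Under these assumptions the naive average-first version underestimates the mean, while sampling from the externally averaged f(X) tracks Option 1.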
Simulation Study
• Case 3:
• Y = c * W * X
• A simple multiplicative model, obtained by treating
inventory (W) as a random variable
• Assume W, X independent
Simulation Study
• For the multiplicative response case the results are
different for the two options
• Following the non-linear case, we might want:
• E(f(X, W))
• However, W and X will usually require different
scaling, so this is not possible (e.g., spatial vs.
temporal scaling)
• We can get E(f(X))*E(g(W))
• Where f() and g() are linear in this example
• For this multiplicative model, for Options 1 and 2, this
gives the same results for the means, but does not
give the same results for the variance
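A sketch of why the means agree but the variances do not in the multiplicative case. It assumes independent normal W and X with hypothetical parameters, and it assumes that Option 1 redraws both W and X every year; neither detail is confirmed by the slides. The wide spreads are chosen only to make the effect visible and do allow occasional negative draws.

```python
import random
import statistics

C = 0.001                  # hypothetical lumped constant
MU_W, SD_W = 100.0, 50.0   # hypothetical inventory distribution (kg)
MU_X, SD_X = 10.0, 5.0     # hypothetical nest density distribution (1/ha)
YEARS, N_REAL = 100, 5000
rng = random.Random(3)

def option_1():
    """Redraw W and X every year and sum the annual products."""
    return sum(C * rng.gauss(MU_W, SD_W) * rng.gauss(MU_X, SD_X)
               for _ in range(YEARS))

def option_2():
    """Draw the 100-yr averages of W and X once and hold them."""
    w_bar = rng.gauss(MU_W, SD_W / YEARS ** 0.5)
    x_bar = rng.gauss(MU_X, SD_X / YEARS ** 0.5)
    return YEARS * C * w_bar * x_bar

def var_product(mu_w, sd_w, mu_x, sd_x):
    """Variance of W * X for independent W and X."""
    return sd_w ** 2 * sd_x ** 2 + sd_w ** 2 * mu_x ** 2 + sd_x ** 2 * mu_w ** 2

z1 = [option_1() for _ in range(N_REAL)]
z2 = [option_2() for _ in range(N_REAL)]
mean_1, var_1 = statistics.mean(z1), statistics.variance(z1)
mean_2, var_2 = statistics.mean(z2), statistics.variance(z2)

theory_mean = YEARS * C * MU_W * MU_X
theory_var_1 = YEARS * C ** 2 * var_product(MU_W, SD_W, MU_X, SD_X)
theory_var_2 = (YEARS * C) ** 2 * var_product(
    MU_W, SD_W / YEARS ** 0.5, MU_X, SD_X / YEARS ** 0.5)

print("means:", round(mean_1, 2), round(mean_2, 2), "theory:", theory_mean)
print("variance, option 1:", round(var_1, 2), "theory:", round(theory_var_1, 4))
print("variance, option 2:", round(var_2, 2), "theory:", round(theory_var_2, 4))
```

The difference comes from the product-of-variances term, which averaging shrinks by a factor of 100 squared in Option 2 but only by 100 in Option 1.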
[Figure: Simulated sampling distributions of the variance of the sum of the general process for Case 3. Panels: Option 1 (1 yr), Option 2 (100 yr)]
Summary
• Scaling impacts are seen in these simple examples
• PA models are much more complex
• When the system is linear, additive and stationary,
then upscaling = averaging
• This is not the case otherwise
• The effects (differences) could impact decision making
• Scaling is not always obvious – it requires thinking
through inputs and responses to understand how it
should be applied for a specific variable.
Next Steps
• Different options for scaling distributions have impacts
on the averages and variance of risk endpoints of
interest
• These impacts depend on the relationship of the
response to explanatory variables
• Try other input distribution types (examples above used
normal distributions as inputs)
• Try more complicated functions such as differential
equations
• Try larger models (e.g., complete the entirety of the ant
example probabilistically, or try van Genuchten)