scaling considerations for performance assessments

P&RA CoP – Savannah River Site - October 2018

Neptune and Company, Inc.

October 2018

Scaling Considerations for Performance Assessments


Probabilistic Risk Assessments –


• Probabilistic risk assessment (PRA) can be used to assess human health risk from exposure to contaminant chemicals

• When the chemicals of concern are radionuclides, modeling into the long-term future is often performed

• Long-term human health risks (radioactive doses) can be dominated by radioactive decay and ingrowth of other radionuclides

• Under various regulations/guidance, such PRAs are called Performance Assessments




• Probabilistic modeling encompasses fate and transport of the radionuclides

• This is followed by an assessment of potential risk (radioactive dose) to humans

• All inputs to the fate and transport, and risk, models are probability distributions, and Monte Carlo simulation is used to evaluate the system


PRA and Decision Analysis

• PRA modeling is the foundation of performance assessment (PA) models, per regulations and guidance provided by NRC (NUREG-1573), EPA (40 CFR 191, etc.), and DOE (435.1, where it is perhaps less clear that PRA is considered necessary)

• PA models can (should) also be embedded in a decision analysis (DA) that couples the consequences of actions that might be taken with the costs and values associated with those actions

• If DA leads, then models are built to directly support decision making


Performance Assessment Modeling

• Considering the importance and the large costs involved in PA-supported decisions, the PA models must be set up properly

• This requires paying attention to both the model structure and the probability distributions used as inputs to the model

• The focus of this work is consideration of the input distributions used in PA models


PA Input Distributions – Challenges

• Data/information often exist about specific variables (parameters), but at a spatial and temporal scale that is inconsistent with the scales of the PA models

• Without appropriate correction to address scaling, the input distributions could be termed “garbage in” for their specific purpose, which then leads to the consequent “garbage out” (GIGO)

• GIGO has an effect on decision making – that is, scaling is important for making the best decision


Driving My Car to Work

• When I drove to work I looked at the speedometer and saw that I was going 75 MPH

• Work is 15 miles from home

• Therefore, I got to work in 12 minutes?


My Speed Varies

• At other times on my drive to work I might be driving 30 MPH, “apparently” taking me 30 minutes to get to work.

• At other times I’m stopped at a traffic light – so “apparently” I never get to work!

• In general, we might have data for a specific variable, but we cannot apply the distribution of those data directly to a model UNLESS the spatial and temporal scales of the data and the model are the same (and they hardly ever are)
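The "apparent" travel times above all come from time = distance / speed, with a single speedometer reading applied to the whole trip. A minimal sketch of that arithmetic follows; the function name commute_minutes is ours and is not part of the presentation.

```python
def commute_minutes(distance_miles: float, speed_mph: float) -> float:
    """Travel time in minutes if a single speed is held for the whole trip."""
    if speed_mph <= 0:
        return float("inf")  # stopped at the light: the trip never finishes
    return 60.0 * distance_miles / speed_mph


DISTANCE_MILES = 15.0  # home to work, from the slides

# Speedometer readings mentioned on the slides: fast stretch, slow stretch, red light.
for speed in (75.0, 30.0, 0.0):
    print(f"{speed:4.0f} MPH -> {commute_minutes(DISTANCE_MILES, speed)} minutes")
```

Each reading is a valid instantaneous measurement, yet none of the three results is a sensible estimate of the whole commute.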


Averages and Sums

• In most of our models, we want to know the average (or sum) of effects across time and space

• Scaling adjustments have to be made; otherwise, apparently, I might never get to work, or I might get there in 12 minutes

• Scaling mostly needs to be done statistically, because we usually do not have the luxury of collecting more aggregated data for PAs


Consequences of Simulation Approach

• Some problems are caused by computational complexity

• In principle, I can pick a random number from the momentary speed distribution (data distribution) at every time step, average those, and find that it takes around 21 minutes to get to work each day

• However, in our PA models, we mostly choose a random number at the beginning of time, and use that number throughout time (the same basic principle applies to space)

• If we do that here, and we do not scale, then simulations would suggest that it takes me perhaps 12 minutes to get to work every day for the next 1,000 years, or that I never get to work any day in the next 1,000 years
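The contrast between drawing a speed at every time step and drawing one speed for the whole trip can be sketched as below. The presentation does not give the momentary speed distribution, so the three speeds and their weights here are hypothetical; the weights were picked so that the mean speed is 42 MPH, which puts the resampled commute in the neighborhood of the 21 minutes quoted above. The function names are ours.

```python
import random
import statistics

DISTANCE_MILES = 15.0            # from the slides
SPEEDS_MPH = [0.0, 30.0, 75.0]   # momentary speeds mentioned on the slides
WEIGHTS = [0.2, 0.4, 0.4]        # hypothetical probabilities (mean speed 42 MPH)


def commute_resampled(rng: random.Random, step_min: float = 0.25) -> float:
    """Minutes to cover the distance when a new speed is drawn every time step."""
    travelled, elapsed = 0.0, 0.0
    while travelled < DISTANCE_MILES:
        speed = rng.choices(SPEEDS_MPH, weights=WEIGHTS)[0]
        travelled += speed * step_min / 60.0
        elapsed += step_min
    return elapsed


def commute_sampled_once(rng: random.Random) -> float:
    """Minutes to cover the distance when one drawn speed is held for the whole trip."""
    speed = rng.choices(SPEEDS_MPH, weights=WEIGHTS)[0]
    return float("inf") if speed == 0.0 else 60.0 * DISTANCE_MILES / speed


rng = random.Random(2018)
resampled = [commute_resampled(rng) for _ in range(500)]
once = [commute_sampled_once(rng) for _ in range(500)]

print("resampled every step: mean minutes =", round(statistics.mean(resampled), 1))
print("sampled once: distinct outcomes =", sorted(set(once)))
```

Under these assumptions the resampled trips cluster tightly, while the sampled-once trips can only ever take 12 minutes, 30 minutes, or forever, which is the unscaled behavior the slide warns about.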


Effect on Decision Making

• This could lead to some interesting (bad) decisions

• If I only pull random numbers at the beginning of time from my data distribution, then my output distribution for the (average) amount of time it takes me to get to work would be between 12 minutes and never (permanently stopped at a traffic light, or doing 75 MPH all the way, every day, forever?)

• How, then, do I decide how long before a 9:00 conference call or meeting I need to leave to get to work?


Separate Science and Preferences

• I allow 25 minutes – this addresses the average amount of time it takes (science-model), time to allow me to set up at work, desire for more sleep, and my willingness (preference, desire) to risk missing the beginning of a meeting or phone call

• This incorporates scaling the data to the model – otherwise I might convince myself I need hours or days to get there (or longer!)

• It incorporates what I think I know, how unsure I am about it, and my value preferences, which are incorporated separately.


Basic Approach

• The model is based on “reality”, and preferences/desires are separated from the model

• I am not adding conservatism to the model to get an answer. When I am doing 75 MPH, I am not saying that if I call it 50 MPH then I need more time to get to work. I am saying that, given my understanding of the system and my preferences, my optimal decision is to leave home 25 minutes ahead of time

• Arguably perhaps, this is how we think in general

• But apparently not when it comes to PA modeling

• This creates a stakeholder communication problem

• Our stakeholders do not understand this either


Mismatched Scale and PA

• Using instantaneous measurements to directly estimate a long-term or large spatial process is wrong

• Proper spatio-temporal scaling is critical for effective decision making (otherwise GIGO)

• Data are assembled from disparate sources to inform dozens and sometimes hundreds of variables (parameters) in a PA model

• And, there are often models in the PA process that represent yet different scales (e.g., systems- vs. process-level models)


PA Modeling Context

• PA models are run for multiple realizations, where a set of randomly chosen input values is held constant over an entire model run

Why?

• Computational complexity of choosing new random numbers every time step

• The PA model is a systems-level model

• Changing values of parameters through time for such complex models complicates interpretation of the results (SA, UA, etc.)

• Proper scaling can get the same answers


Simulation Study

• Two options:

1) Sampling every year for 100 years (n = 100)

2) Sampling once with the 100-yr average (similar to most PAs)

• Three cases are considered:

Case 1: Linear response

Case 2: Non-linear response (convex function)

Case 3: Multiplicative response


Basic Model

• The underlying CSM concerns ants moving U-238 to the surface over a 100-yr time period:

• Y = W * X * NV / CL * AF * A / V

• Even this simple model is highly multiplicative

• Y is the total mass of U-238 moved to the surface in a year (kg/yr)

• W is mass of U-238 in the inventory (kg)

• X is nest density (1/ha)

• NV is nest volume (m³)

• CL is lifespan of a colony (yr)

• AF is fraction of total volume occupied by ant nests in each depth layer (unitless)

• A is the area of interest (e.g., area covering the waste zone) (ha)

• V is the volume of interest (e.g., area × depth impacted by ant nests) (m³)
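A minimal sketch of the ant transport equation as a function, useful mainly for checking that the units combine to kg/yr and that holding everything except W and X constant collapses the model to Y = c * W * X. All numeric values below are hypothetical and chosen only to exercise the formula; the function names are ours.

```python
import math


def u238_to_surface(W, X, NV, CL, AF, A, V):
    """Y = W * X * NV / CL * AF * A / V, in kg/yr."""
    return W * X * NV / CL * AF * A / V


def lumped_constant(NV, CL, AF, A, V):
    """c in Y = c * W * X, collecting every input that is held constant (ha/yr)."""
    return NV / CL * AF * A / V


# Hypothetical values, not taken from the presentation.
inputs = dict(
    W=100.0,    # kg of U-238 in the inventory
    X=20.0,     # nests per ha
    NV=0.05,    # m^3 per nest
    CL=10.0,    # yr
    AF=0.1,     # unitless
    A=2.0,      # ha
    V=40000.0,  # m^3
)

y_full = u238_to_surface(**inputs)
c = lumped_constant(inputs["NV"], inputs["CL"], inputs["AF"], inputs["A"], inputs["V"])
y_reduced = c * inputs["W"] * inputs["X"]

print("full model Y (kg/yr):", y_full)
print("reduced model c * W * X (kg/yr):", y_reduced)
```

The unit bookkeeping is kg × (1/ha) × m³ / yr × ha / m³ = kg/yr, so the two forms should agree up to floating-point rounding.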


Simulation Study

• We held most of these inputs constant so that the basic model is of the form:

Y = c * W * X

• Model built in GoldSim, but reproduced in R for verification

• For the simple linear response case, W (inventory) is also held constant – this is Case 1


Simulation Study

[Figure: ant transport model]


Simulation Study

• For the linear response case the results are the same for the two options

• Mathematically, what’s going on is (with Z the 100-yr sum of Y, and $X_i$ the value of X in year i):

• Option 1 (simulate every year, assume independence): $E(Z) = 100\,c\,w\,E(X)$, $\mathrm{Var}(Z) = c^2 w^2\,\mathrm{Var}\left(\sum_{i=1}^{100} X_i\right)$

• Option 2 (simulate average at beginning of time): $E(Z) = 100\,c\,w\,E(X)$, $\mathrm{Var}(Z) = 100^2 c^2 w^2\,\mathrm{Var}\left(\sum_{i=1}^{100} X_i / 100\right)$

• These are the same (just move the 100 around)

• When everything is simple, averaging works for scaling
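The equality for the linear case can be checked numerically. The sketch below is not the GoldSim or R model from the study; it assumes X is normal and independent from year to year, so that the 100-yr average of X has standard deviation sigma / sqrt(100). The constants, the distribution parameters, and the function names are all hypothetical.

```python
import random
import statistics

C, W = 0.5, 100.0         # hypothetical constants c and w
MU, SIGMA = 20.0, 5.0     # hypothetical annual mean and sd of X (normal)
YEARS, N_REAL = 100, 4000


def option_1(rng):
    """Draw X every year and sum Y = c * w * X over the 100 years."""
    return sum(C * W * rng.gauss(MU, SIGMA) for _ in range(YEARS))


def option_2(rng):
    """Draw the 100-yr average of X once and hold it for all 100 years."""
    x_bar = rng.gauss(MU, SIGMA / YEARS ** 0.5)
    return YEARS * C * W * x_bar


rng = random.Random(1)
z1 = [option_1(rng) for _ in range(N_REAL)]
z2 = [option_2(rng) for _ in range(N_REAL)]

# Closed-form values under the independence assumption.
mean_exact = YEARS * C * W * MU
var_exact = YEARS * (C * W * SIGMA) ** 2

print("exact mean, variance:", mean_exact, var_exact)
print("Option 1 mean, variance:", statistics.mean(z1), statistics.variance(z1))
print("Option 2 mean, variance:", statistics.mean(z2), statistics.variance(z2))
```

Both options should reproduce the closed-form mean and variance to within Monte Carlo noise, which is the sense in which averaging works as the scaling rule when the response is linear.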


Simulation Study

[Figure: simulation results; panels: Option 1, Option 2]


Simulation Study

• Case 2 – a univariate non-linear response:

• Y = c * w * X^2

• Ants somehow act synergistically – they want to “win”, so they all work harder

• The point really is that there are many non-linear functions in our PA models – this investigates a relatively simple non-linear model


Simulation Study

• For the univariate non-linear response case the results are different for the two options:

• That is, if we scale X and then apply the non-linear function, we do not match results

• If scaling is performed at all, it is usually performed by simple averaging of the input distribution

• If this is done for a non-linear function then the results do not match the intent – the scaling has been done incorrectly

• This approach gives the wrong answers

• Example…
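A short worked version of why the two options disagree for Case 2, under assumptions that are ours rather than the presentation's: the annual values $X_i$ are independent with mean $\mu$ and variance $\sigma^2$, $\bar{X}$ is their 100-yr average, and $Z_1$ and $Z_2$ denote the 100-yr sum of Y under Options 1 and 2.

```latex
\begin{aligned}
\text{Option 1:}\quad E(Z_1) &= \sum_{i=1}^{100} c\,w\,E(X_i^2) = 100\,c\,w\,(\mu^2 + \sigma^2) \\
\text{Option 2:}\quad E(Z_2) &= 100\,c\,w\,E(\bar{X}^2) = 100\,c\,w\,\left(\mu^2 + \tfrac{\sigma^2}{100}\right) \\
E(Z_1) - E(Z_2) &= 99\,c\,w\,\sigma^2
\end{aligned}
```

Averaging X first removes most of its variance before the square is applied, so Option 2 underestimates the expected sum whenever $\sigma^2 > 0$.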


[Figure: Simulated sampling distributions of the expected value of the sum of the general process for Case 2. Panels: Option 1 (1 yr), Option 2 (100 yr)]

[Figure: Simulated sampling distributions of the variance of the sum of the general process for Case 2. Panels: Option 1 (1 yr), Option 2 (100 yr)]


Simulation Study

• Without getting into a lot of math (see the report), the issue is that we need (E is the expectation operator):

E(f(X)) instead of f(E(X))

• The latter averages first, and then applies the function – this gives the wrong result

• The former applies the function first, and then averages – this matches the intent (the right result)

• This might not be as easy to implement directly

• It might require external generation of scaled distributions for PA models

• If X is normal, then f(X) needs to be simulated, and then averaging needs to be applied to that simulation
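One way the external generation of a scaled distribution might look is sketched below: simulate annual values of X, apply f, and only then average over the 100 years. This is an illustration under our own assumptions (X normal and independent between years, f(x) = x^2 with c * w set to 1, hypothetical mean and standard deviation), not the procedure used in the report.

```python
import random
import statistics

MU, SIGMA = 10.0, 3.0     # hypothetical annual mean and sd of X (normal)
YEARS, N_REAL = 100, 2000


def f(x):
    """Case 2 response shape with c * w set to 1."""
    return x * x


def scaled_inputs(rng):
    """Return one draw of (mean of f(X), f(mean of X)) over the same 100 years."""
    xs = [rng.gauss(MU, SIGMA) for _ in range(YEARS)]
    apply_then_average = statistics.fmean(f(x) for x in xs)
    average_then_apply = f(statistics.fmean(xs))
    return apply_then_average, average_then_apply


rng = random.Random(7)
draws = [scaled_inputs(rng) for _ in range(N_REAL)]

# The first column is the externally generated, correctly scaled input distribution.
e_f_x = statistics.fmean(d[0] for d in draws)
f_e_x = statistics.fmean(d[1] for d in draws)

print("apply f, then average (E(f(X)) route):", round(e_f_x, 2))
print("average, then apply f (f(E(X)) route):", round(f_e_x, 2))
```

For a normal X the first route should land near mu^2 + sigma^2 and the second near mu^2 + sigma^2 / 100, so the gap between them grows with the variance of the annual data.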


Simulation Study

• Case 3:

• Y = c * W * X

• A simple multiplicative model, obtained by changing inventory (W) to a variable

• Assume W and X are independent


Simulation Study

• For the multiplicative response case the results are different for the two options

• Following the non-linear case, we might want:

• E(f(X, W))

• However, W and X will usually require different scaling, so this is not possible (e.g., spatial vs. temporal scaling)

• We can get E(f(X))*E(g(W))

• Where f() and g() are linear in this example

• For this multiplicative model, for Options 1 and 2, this gives the same results for the means, but does not give the same results for the variance
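The statement about means and variances can be illustrated with a small simulation. The sketch assumes W and X are normal, independent of each other and between years, and that Option 2 draws each 100-yr average separately with standard deviation sigma / sqrt(100); none of this is confirmed by the presentation. The parameter values are hypothetical and deliberately give very wide distributions (which can go negative, so they are not physical) so that the variance gap is easy to see.

```python
import random
import statistics

C = 1.0                        # hypothetical constant c
MU_W, SD_W = 2.0, 2.0          # hypothetical annual mean and sd of W
MU_X, SD_X = 2.0, 2.0          # hypothetical annual mean and sd of X
YEARS, N_REAL = 100, 4000


def option_1(rng):
    """Draw W and X independently every year and sum Y = c * W * X."""
    return sum(C * rng.gauss(MU_W, SD_W) * rng.gauss(MU_X, SD_X) for _ in range(YEARS))


def option_2(rng):
    """Draw the 100-yr averages of W and X once each, then multiply."""
    w_bar = rng.gauss(MU_W, SD_W / YEARS ** 0.5)
    x_bar = rng.gauss(MU_X, SD_X / YEARS ** 0.5)
    return YEARS * C * w_bar * x_bar


def var_product(mu_a, sd_a, mu_b, sd_b):
    """Variance of the product of two independent random variables."""
    return sd_a**2 * sd_b**2 + sd_a**2 * mu_b**2 + sd_b**2 * mu_a**2


# Closed-form variances of the 100-yr sum under each option.
var_1_exact = YEARS * C**2 * var_product(MU_W, SD_W, MU_X, SD_X)
var_2_exact = (YEARS * C) ** 2 * var_product(
    MU_W, SD_W / YEARS**0.5, MU_X, SD_X / YEARS**0.5
)

rng = random.Random(3)
z1 = [option_1(rng) for _ in range(N_REAL)]
z2 = [option_2(rng) for _ in range(N_REAL)]

print("means:", statistics.mean(z1), statistics.mean(z2))
print("variances (simulated):", statistics.variance(z1), statistics.variance(z2))
print("variances (closed form):", var_1_exact, var_2_exact)
```

Under these assumptions both options share the mean 100 * c * E(W) * E(X), while the term involving the product of the two variances is 100 times larger under Option 1, so the variances differ.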


[Figure: Simulated sampling distributions of the variance of the sum of the general process for Case 3. Panels: Option 1 (1 yr), Option 2 (100 yr)]


Summary

• Scaling impacts are seen in these simple examples

• PA models are much more complex

• When the system is linear, additive and stationary, then upscaling = averaging

• This is not the case otherwise

• The effects (differences) could impact decision making

• Scaling is not always obvious – it requires thinking through inputs and responses to understand how it should be applied for a specific variable.


Next Steps

• Different options for scaling distributions have impacts on the averages and variance of risk endpoints of interest

• These impacts depend on the relationship of the response to explanatory variables

• Try other input distribution types (examples above used normal distributions as inputs)

• Try more complicated functions such as differential equations

• Try larger models (e.g., complete the entirety of the ant example probabilistically, or try van Genuchten)
