Storm Damage Modeling at the University of Connecticut
Brian Hartman, PhD ASA
31 January 2014
Workshop on Insurance Mathematics
Quebec, QC
Acknowledgements
The UConn DPM team
Manos Anagnostou
Marina Astitha
Maria Frediani
David Wanik
Xinxuan Zheng
Alex Baldenko
Tiran Chen
Jichao He
and many others . . .
The Utilities
Northeast Utilities
CL&P
WMECO
NSTAR
Others soon to join
Outline
1. Introduction/Motivation of the Project
2. Data Description
3. Model Description
4. Case Studies
5. Current/Future Work
Motivation for the Project
Major recent CT storm events
Tropical Storm Irene (August 2011)
October Nor’easter (October 2011)
Hurricane Sandy (October 2012)
Consequences
Massive long outages
Series of investigative reports (Witt Associates, Two Storm Panel, Davies Consultants)
CL&P President Jeff Butler resignation
Performance standards and penalties for electric restoration times (CT and MA Attorney General)
1. Introduction/Motivation
Comparison of Extreme Storm Outages
Storm                 Affected   Duration
Storm Irene           671,000    11 days
October Nor’easter    803,000    9 days
Hurricane Sandy       500,000    10 days
What is a “trouble spot?”
A trouble spot is an extended interruption of service (>5 minutes).
Usually requires human intervention
Recorded at the nearest isolating device (transformer, fuse, recloser, or switch)
2. Data Description
CL&P Service Territories (1.2M customers)
[Map of service territories. Other utilities labeled: Bozrah L&P, Norwich Public Utilities, United Illuminating]
Self-reported damage causes
CL&P 2004-Present
Cause                   Percentage
TREE-Tree Related       89%
LTNG-Lightning          6%
EQUP-Equip Failure      3%
Animals/Birds           1%
VHCL-Veh Accident       0.5%
VAND-Vandalism          <0.5%
Weather Forecasts
Simulations are initialized with Global Forecast System (GFS) analysis fields at 0.5-degree resolution and 6-hour intervals.
Nested domains: blue (18 km grid), red (6 km), green (2 km).
Some examples of possible covariates
Physical Weather (GFS)
Wind speed (@10m)
Wind gust
Rainfall
# hours wind > 20 mph
Met Indices (GFS)
CAPE (convective available potential energy; instability)
CIN (convective inhibition; required energy)
BRN (bulk Richardson number; free/forced convection)
Geographic Characteristics (public sources)
Household income
Tree density
Population density
Leaf status
System Characteristics (NU)
Pole count (offset)
Tree trimming (standard/enhanced)
Processing covariates
To summarize the time series generated in the weather forecasts we:
1. Use a 4-hour running window and find the window with the maximum mean wind speed.
2. Over that same window, calculate both the mean and max of each physical weather characteristic and meteorological index over each town.
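The two steps above can be sketched roughly as follows. This is only an illustration, assuming one hourly series per town for each variable; the slides do not say how grid cells within a town are combined, and the function name `summarize_storm` and the sample numbers are made up for the example.

```python
def summarize_storm(wind, other, window=4):
    """Summarize hourly forecast series for one town over the windiest window.

    wind  : list of hourly wind speeds
    other : dict mapping a covariate name to its hourly series (same length)
    """
    if len(wind) < window:
        raise ValueError("series shorter than window")
    # Step 1: find the start hour of the window with the highest mean wind speed.
    means = [sum(wind[i:i + window]) / window
             for i in range(len(wind) - window + 1)]
    start = max(range(len(means)), key=means.__getitem__)
    summary = {"window_start": start, "max_mean_wind": means[start]}
    # Step 2: over that same window, take the mean and max of every other series.
    for name, series in other.items():
        chunk = series[start:start + window]
        summary[name + "_mean"] = sum(chunk) / window
        summary[name + "_max"] = max(chunk)
    return summary


# Made-up hourly values for a single town.
wind = [5, 8, 12, 20, 25, 22, 18, 10]
gust = [9, 14, 20, 33, 41, 37, 30, 16]
result = summarize_storm(wind, {"gust": gust})
```

Here the windiest 4-hour window starts at hour 3, and the gust summaries are taken over those same four hours rather than over the gust series' own peak.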
Candidate Models
We are currently looking at a few models for the damage counts:
Poisson GLM
Negative Binomial GLM
Zero-inflated Poisson
Zero-inflated Negative Binomial
Decision/Regression Trees
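For readers less familiar with the zero-inflated models in this list, here is a minimal sketch of the zero-inflated Poisson probability function. The parameter values are arbitrary; in a regression setting both `lam` and `pi` would depend on covariates, which is not shown here.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson.

    With probability pi the count is a structural zero;
    otherwise it is drawn from Poisson(lam).
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1 - pi) * poisson


lam, pi = 2.0, 0.3  # arbitrary illustrative values
probs = [zip_pmf(k, lam, pi) for k in range(60)]
total = sum(probs)                               # should be very close to 1
mean = sum(k * p for k, p in enumerate(probs))   # equals (1 - pi) * lam
```

The extra mass at zero is what separates this from the plain Poisson GLM: `zip_pmf(0, lam, pi)` is always at least `exp(-lam)`.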
Variable Selection
All told, we have about 90 potential covariates.
How to choose which to keep?
Whenever doing variable selection, first ask “Why am I
making this model?”
For the utilities, their main concern is storm-wise prediction
Town-wise accuracy is secondary
3. Model Description
Predictive Variable Selection
Because of that goal, we choose a model based on the hold-out predictive accuracy.
1. Choose one storm to hold out
2. Fit the model using the other storms with the same leaf
status.
3. Compute some measure of predictive accuracy (MAPE, MSPE, lift, R^2_pred, Gini).
4. Repeat for all storms and all models
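The hold-out loop above might look something like this for a single candidate model. It is a sketch only: the `fit` and `predict` functions are a toy stand-in (damage proportional to wind speed), the storm records are invented, and MAPE is used as the accuracy measure.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


def leave_one_storm_out(storms, fit, predict):
    """Hold out each storm in turn; train on the others with the same leaf status."""
    actual, predicted = [], []
    for held_out in storms:
        train = [s for s in storms
                 if s is not held_out and s["leaf_on"] == held_out["leaf_on"]]
        if not train:
            continue  # no comparable storms to learn from
        model = fit(train)
        actual.append(held_out["damage"])
        predicted.append(predict(model, held_out))
    return mape(actual, predicted)


# Toy stand-in model (hypothetical): damage proportional to wind speed.
def fit(train):
    return sum(s["damage"] for s in train) / sum(s["wind"] for s in train)


def predict(rate, storm):
    return rate * storm["wind"]


# Invented storm-level records.
storms = [
    {"leaf_on": True, "wind": 10, "damage": 100},
    {"leaf_on": True, "wind": 20, "damage": 220},
    {"leaf_on": True, "wind": 30, "damage": 300},
    {"leaf_on": False, "wind": 10, "damage": 50},
    {"leaf_on": False, "wind": 20, "damage": 100},
]
score = leave_one_storm_out(storms, fit, predict)
```

Step 4 then amounts to calling `leave_one_storm_out` once per candidate model and comparing the scores.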
Computational Difficulties
Say each model takes 1 second to predict all storms (70 storms).
Because each variable can either be in or out of the model, we have 2^90 = 1.24 × 10^27 possible models.
That would take approximately 3.93 × 10^19 years, so even with parallel processing you are not going to crack that nut.
This is even assuming that the count and zero models include the same covariates.
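The arithmetic behind those figures can be checked in a few lines. The only assumption beyond the slide is a 365-day year.

```python
n_covariates = 90
seconds_per_model = 1               # from the slide: 1 second to predict all 70 storms
seconds_per_year = 365 * 24 * 3600  # assumes a 365-day year

n_models = 2 ** n_covariates        # each covariate is either in or out
years = n_models * seconds_per_model / seconds_per_year

print(f"{n_models:.2e} models, about {years:.2e} years")
# prints "1.24e+27 models, about 3.93e+19 years"
```

Even a million machines running in parallel would only divide the time by 10^6, leaving it on the order of 10^13 years.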
Variable Selection Steps
1. Prior to hitting it with the heavy computational hammer,
remove other variables by talking with experts, for
example
a. Meteorological Parameters
b. Snow (for non-winter storms)
2. Choose a good set of 15 possibilities from the remaining
variables
3. Continue adjusting the possibility set until the model starts
to converge.
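One plausible reading of step 2 is that a set of 15 candidates is small enough to search exhaustively: 2^15 - 1 = 32,767 non-empty subsets, or roughly 9 hours at 1 second per model. The slides do not say the search is exhaustive, so treat the sketch below as an assumption. The variable names and the `toy_score` function are placeholders; in practice the score would be the hold-out predictive accuracy.

```python
from itertools import combinations


def best_subset(candidates, score):
    """Score every non-empty subset of the candidates; lower score is better."""
    best, best_score, n_tried = None, float("inf"), 0
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            n_tried += 1
            s = score(subset)
            if s < best_score:
                best, best_score = subset, s
    return best, best_score, n_tried


# Placeholder score: pretend three variables matter and the rest are noise.
useful = {"wind_10m", "gust", "trimming"}


def toy_score(subset):
    chosen = set(subset)
    return 10 * len(useful - chosen) + len(chosen - useful)


candidates = [f"x{i}" for i in range(12)] + sorted(useful)  # 15 candidates
best, best_score, n_tried = best_subset(candidates, toy_score)
```

Step 3 would then swap variables in and out of the candidate set and rerun the search until the selected subset stops changing.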
Current Model
Current best model: zero-inflated Poisson including:
Wind speed at 10 meters
Wind gust
Percent of lines trimmed in the last year
Interaction between division and wind speed at 10 meters
Interaction between division and wind gust
Interaction between precipitation rate and wind speed at 10 meters
Case Study: Hurricane Sandy
Forecast lead     Predicted trouble spots
48 hours prior    11,825
24 hours prior    13,330
Actual: 14,325
4a. Case Study: Hurricane Sandy
What really happened
We organized a large portion of the data in
September/October 2012.
We had our first model in late October
no real variable selection/model testing yet
Variables chosen by a single stepwise AIC
Sandy happened in late October.
Using that model, we predicted 52,000 trouble spots
(basically Armageddon)
What really happened cont.
So I (on my desktop, unfortunately) made a really simple
model (wind speed and tree trimming).
No statistical variable selection
No meteorological expertise
Hardly anything more than dumb luck
The utility was very excited by these rather accurate
predictions
Case Study: Summer 2013
We had 5 storms in May and June of 2013 where NU wanted us to run the model.
Each time, we predicted 300-600 trouble spots and the
actuals were 75-150.
Why were we over-predicting?
4b. Case Study: Summer 2013
Reasons for over-prediction
Two reasons:
1. Our database consisted entirely of “major storms,” which are classified as having more than 300 trouble spots.
We first showed them this problem six months earlier.
“Why do you want data on damage when there is not a storm, but a few days before you think there might be?”
2. Our weather forecasts were consistently over-predicting the wind speed.
Wind speed comparison
Worse in the peaks, where most of the damage occurs.
Current Work
1. Clustering the towns
a. Separate
b. Hierarchical
c. Nonparametric Bayes
2. High density (Lidar) data (trimming, HazPix)
5. Current/Future Work
Clustering
Different towns are going to respond in different ways to the
storms.
Areas with underground power lines will be less affected by the
wind
Areas on the coast vs. areas inland
We group the towns into clusters and then fit a separate
model on each cluster.
Models on clustered data
Once we have the towns clustered, we have a few options:
1. Model each cluster individually.
2. Borrow strength using a hierarchical structure.
3. We could combine the clustering and model-fitting into a single step using Bayesian nonparametrics.
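To make the difference between options 1 and 2 concrete, here is a deliberately simplified sketch. It is not the project's model: it estimates a single trouble-spot rate per pole for each cluster, and "borrowing strength" is imitated with a Gamma-Poisson posterior mean whose prior is centered on the pooled rate. The cluster names, counts, and the `prior_strength` value are all invented.

```python
def cluster_rates(clusters, prior_strength=0.0):
    """Trouble spots per pole for each cluster.

    clusters       : dict name -> (total trouble spots, total poles)
    prior_strength : pseudo-poles of pooled data blended into every cluster;
                     0 reproduces fully separate per-cluster estimates.
    """
    total_spots = sum(spots for spots, _ in clusters.values())
    total_poles = sum(poles for _, poles in clusters.values())
    pooled = total_spots / total_poles
    # Gamma-Poisson posterior mean with prior mean equal to the pooled rate.
    return {
        name: (spots + prior_strength * pooled) / (poles + prior_strength)
        for name, (spots, poles) in clusters.items()
    }


# Invented cluster totals: (trouble spots, poles).
clusters = {
    "coastal": (90, 1000),
    "inland": (300, 10000),
    "underground": (1, 100),
}
separate = cluster_rates(clusters)                    # option 1
shrunk = cluster_rates(clusters, prior_strength=500)  # rough stand-in for option 2
pooled_rate = 391 / 11100
```

The small "underground" cluster moves most of the way toward the pooled rate, while the large "inland" cluster barely changes, which is the practical effect of a hierarchical structure.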
Lidar (Light Detection and Ranging)
High-resolution height measurements (1-9 ft^2)
Comparison studies between
Lidar
GIS system map
Observation
Can model on the isolating device level
knowing the proportion of each line
where trees are threatening
Trees, Weather and Infrastructure
[Figure: TS/Asset for various durations of hazardous winds (20 mph) by HazPix/asset. Panels: Low, Medium, and High HazPix/Asset; x-axis: <15 hrs vs. >15 hrs; y-axis: TS/Asset]
Figure 3: HazPix across Eastern CT
Figure 4: HazPix on a distribution circuit