Kostas Kalpakis
Associate Professor
Computer Science and Electrical Engineering Department
University of Maryland Baltimore County
April 5, 2011
Joint work with Shiming Yang and Yaacov Yesha
Improving HYSPLIT Forecasts with Data Assimilation*
*Supported in part by an IBM grant.
Outline
Introduction
  Motivation and Goal
  Data Assimilation
Our approach
  State-Space Models
  The NOAA HYSPLIT Model
  The LETKF Algorithm
Experiments and Evaluation
  CAPTEX
  California wildfires, August 2009
Summary
Motivation
High-volume, real-time sensor data streams for monitoring and forecasting applications are becoming ubiquitous
Bridging the gap between model predictions and real-time observations is needed
Demands for environmental monitoring and hazard prediction are pressing
We need to incorporate measurements from the thousands of sensors that underlie IBM's "Smarter Planet" initiatives into models of various geophysical processes
Goals
Our goals are to:
incorporate a data assimilation capability into HYSPLIT (HYSPLIT is used routinely to produce many data products)
utilize in-situ and remotely sensed observations for improved forecasts
apply the approach to wildfire smoke prediction and monitoring
develop an efficient data assimilation system using the SPADE framework of IBM InfoSphere Streams for distributed high-performance platforms
Data assimilation
Data assimilation is a set of techniques that:
incorporate real-world observations into the model analysis and forecast cycle
help reduce model error growth (through small corrections and short-range forecasts)
improve the estimation of model initial conditions for the next forecast cycle
The state-space model
Model a system by
\[ x_t = M(x_{t-1}) + u_t, \qquad y_t = H(x_t) + v_t \]
where
$x_t$ is the state of the system at time $t$
$y_t$ is the observation of the system at time $t$
$M$ is the model operator
$H$ is the observation operator
$u_t$ and $v_t$ are the model and observation noise random processes, typically white noise with covariances $Q_t$ and $R_t$
Data assimilation in state space
Data assimilation becomes an estimation problem:
Find a maximum likelihood estimate of the trajectory of the system states given a set of observations
The problem reduces to minimizing a cost function
Kalman filters, a recursive method, can minimize this cost function efficiently for low-dimensional state spaces with linear model and observation operators and Gaussian noise processes
Otherwise, the problem is often computationally difficult
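Cost function:
\[ J(x_{t_0}) = \sum_{t} \left\| R_t^{-1/2} \bigl( y_t - H(M_{t_0 \to t}(x_{t_0})) \bigr) \right\|^2 \]
where $M_{t_0 \to t}$ applies the model operator $M$ repeatedly from time $t_0$ to time $t$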
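To make the cost function concrete, the sketch below evaluates J for a hypothetical linear toy model (matrices `M` and `H`) with no model noise, so the whole trajectory is just M applied repeatedly to the initial state. The function name `cost` and the toy data are illustrative assumptions, not part of HYSPLIT or the described system.

```python
import numpy as np


def cost(x0, M, H, ys, Rs):
    """Evaluate J(x0) = sum_t ||R_t^{-1/2} (y_t - H M^(t - t0) x0)||^2.

    Toy setting (hypothetical): linear model matrix M, linear observation
    matrix H, and no model noise, so the trajectory follows from x0 alone.
    ys[k] and Rs[k] are the observation and its error covariance at time
    t0 + k + 1.
    """
    x = np.asarray(x0, dtype=float)
    J = 0.0
    for y, R in zip(ys, Rs):
        x = M @ x                        # advance the trajectory one step
        d = y - H @ x                    # innovation: observation minus model equivalent
        J += d @ np.linalg.solve(R, d)   # d^T R^{-1} d, i.e. ||R^{-1/2} d||^2
    return float(J)


# Usage: a 2-variable state observed directly with unit error variance.
x0 = np.array([1.0, 2.0])
ys = [np.array([2.0, 2.0])]
print(cost(x0, np.eye(2), np.eye(2), ys, [np.eye(2)]))  # 1.0
```

Minimizing this J over `x0` (e.g. with a gradient-based optimizer) is the variational route; the Kalman filter reaches the same estimate recursively in the linear-Gaussian case.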
Data assimilation via Kalman filters
Graphical view of data assimilation using Kalman filters
Prediction step:
\[ x^b_{t+1} = M(x^a_t) \]
Correction step:
\[ x^a_{t+1} = x^b_{t+1} + K_{t+1} \bigl( y_{t+1} - H x^b_{t+1} \bigr) \]
where $K_{t+1}$ is the Kalman gain
[Figure: background states, analysis states, and observations along the time axis]
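In the linear, Gaussian case the prediction and correction steps are the standard Kalman filter. The NumPy sketch below assumes matrix operators `M` and `H` and known noise covariances `Q` and `R`; unlike the slide, it also carries the error covariance `P`, which the Kalman gain needs. The function names are hypothetical.

```python
import numpy as np


def kf_predict(xa, Pa, M, Q):
    """Prediction step: x^b = M x^a and P^b = M P^a M^T + Q."""
    xb = M @ xa
    Pb = M @ Pa @ M.T + Q
    return xb, Pb


def kf_correct(xb, Pb, y, H, R):
    """Correction step: x^a = x^b + K (y - H x^b), with Kalman gain K."""
    S = H @ Pb @ H.T + R                   # innovation covariance
    K = np.linalg.solve(S, H @ Pb).T       # K = P^b H^T S^{-1} (P^b and S are symmetric)
    xa = xb + K @ (y - H @ xb)
    Pa = (np.eye(len(xb)) - K @ H) @ Pb    # analysis error covariance
    return xa, Pa


# Usage: a scalar state with equal background and observation error variances.
xb, Pb = kf_predict(np.array([0.0]), np.eye(1), np.eye(1), np.zeros((1, 1)))
xa, Pa = kf_correct(xb, Pb, np.array([2.0]), np.eye(1), np.eye(1))
print(xa, Pa)  # [1.] [[0.5]]
```

With equal error variances the analysis lands halfway between background and observation, and its variance halves. For a very large state the n-by-n covariance `P` is what makes this approach expensive.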
The NOAA HYSPLIT Model
HYSPLIT: Hybrid Single Particle Lagrangian Integrated Trajectory Model
A modeling system that computes air parcel trajectories and the dispersion and deposition of pollutants
Computes dispersion with either a puff model or a particle model
Needs meteorological data and emission source information
Has been validated using ground-truth observations*
Used routinely for various data products, e.g., the Air Quality Index (AQI) and the Smoke Forecast System (SFS)
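The particle-model idea can be illustrated with a toy random-walk sketch: each particle moves with the mean wind plus a Gaussian turbulent displacement. This is a textbook random-displacement scheme with a uniform wind and a constant diffusivity, both hypothetical; it is not HYSPLIT's turbulence parameterization, and all names are illustrative.

```python
import numpy as np


def move_particles(pos, wind, diffusivity, dt, nsteps, rng):
    """Toy Lagrangian particle model: each step, x += u*dt + sqrt(2*D*dt)*xi.

    pos         : initial particle positions, shape (n, 2), in metres
    wind        : uniform mean wind (u, v) in m/s (hypothetical; a real model
                  interpolates gridded meteorological fields)
    diffusivity : constant eddy diffusivity D in m^2/s (hypothetical)
    rng         : numpy.random.Generator supplying xi ~ N(0, 1)
    """
    pos = np.array(pos, dtype=float)
    wind = np.asarray(wind, dtype=float)
    sigma = np.sqrt(2.0 * diffusivity * dt)   # std. dev. of the random displacement per step
    for _ in range(nsteps):
        pos += wind * dt + sigma * rng.standard_normal(pos.shape)
    return pos


# Usage: 10,000 particles released at the origin, 10 one-minute steps in a 5 m/s wind.
rng = np.random.default_rng(0)
cloud = move_particles(np.zeros((10_000, 2)), (5.0, 0.0), 10.0, 60.0, 10, rng)
print(cloud.mean(axis=0), cloud.std(axis=0))  # mean near (3000, 0) m, spread near 110 m
```

The cloud's centre drifts with the wind while its spread grows like sqrt(2 D t); attaching a mass to each particle is what later lets the particle masses serve as the system state.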
*R. R. Draxler, J. L. Heffter, and G. D. Rolph, "DATEM: Data Archive of Tracer Experiments and Meteorology", August 2001, http://www.arl.noaa.gov/DATEM.php, last checked July 2010.
System design
Data assimilation for HYSPLIT
Utilize HYSPLIT as the model operator in a state-space model and assimilate observations into HYSPLIT
First, we need to carefully define the system state, so that we can extract it, modify it, and restart HYSPLIT from it
Second, since the model operator is nonlinear and the system state is very large, standard extended Kalman filters are an expensive option for data assimilation
We therefore use the LETKF algorithm, an ensemble transform Kalman filter
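The requirement to extract, modify, and restart the system state amounts to a forecast/analysis loop. The Python sketch below uses a hypothetical callback interface; in the system described here the forecast would be a HYSPLIT run and the analysis an LETKF update, but the toy stand-ins in the usage example are purely illustrative.

```python
def assimilation_cycles(state, forecast, extract, analyze, restore, observations):
    """Generic forecast/analysis cycle (a hypothetical interface, not HYSPLIT's).

    forecast(state)   -> model state advanced to the next observation time
    extract(state)    -> system-state vector x (e.g. particle masses)
    analyze(x, y)     -> analysis x^a combining x with observations y
    restore(state, x) -> model state with x written back, ready for a restart
    """
    for y in observations:
        state = forecast(state)      # run the model over one assimilation window
        x = extract(state)           # pull out the part of the state we assimilate
        xa = analyze(x, y)           # data assimilation step (e.g. LETKF)
        state = restore(state, xa)   # modify the model state and restart from it
    return state


# Usage with toy stand-ins: the "model" doubles a scalar mass each window,
# and the "analysis" simply averages the forecast and the observation.
final = assimilation_cycles(
    {"mass": 1.0},
    forecast=lambda s: {**s, "mass": 2.0 * s["mass"]},
    extract=lambda s: s["mass"],
    analyze=lambda x, y: 0.5 * (x + y),
    restore=lambda s, x: {**s, "mass": x},
    observations=[4.0, 5.0],
)
print(final)  # {'mass': 5.5}
```

Keeping `extract` and `restore` separate from the model run is what allows the model to be treated as a black-box operator.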
Data assimilation for HYSPLIT
Use
the mass of the particles in HYSPLIT as the system state
the mapping from particle masses to grid concentrations as the default observation operator
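One way to picture this observation operator is to sum particle masses into grid cells and divide by the cell volume. The NumPy sketch below does that for a single horizontal layer of assumed depth; the function name, the 2-D layout, and the `depth` parameter are hypothetical and are not HYSPLIT's concentration-grid code.

```python
import numpy as np


def grid_concentration(pos, mass, x_edges, y_edges, depth):
    """Observation operator sketch: particle masses -> gridded concentrations.

    pos     : particle positions, shape (n, 2), in metres (hypothetical layout)
    mass    : particle masses, shape (n,), i.e. the system state
    x_edges : cell edges along x, in metres
    y_edges : cell edges along y, in metres
    depth   : assumed depth of the concentration layer, in metres
    Returns mass per unit volume, shape (len(x_edges) - 1, len(y_edges) - 1).
    """
    pos = np.asarray(pos, dtype=float)
    x_edges = np.asarray(x_edges, dtype=float)
    y_edges = np.asarray(y_edges, dtype=float)
    # Sum the masses of the particles that fall in each horizontal cell
    cell_mass, _, _ = np.histogram2d(pos[:, 0], pos[:, 1],
                                     bins=[x_edges, y_edges], weights=mass)
    cell_volume = np.outer(np.diff(x_edges), np.diff(y_edges)) * depth
    return cell_mass / cell_volume


# Usage: three particles on a 2 x 1 grid of 1 m cells, 1 m deep.
pos = [[0.5, 0.5], [0.5, 0.5], [1.5, 0.5]]
c = grid_concentration(pos, [1.0, 2.0, 4.0], [0, 1, 2], [0, 1], 1.0)
print(c)  # concentrations 3.0 and 4.0 in the two cells
```

For fixed particle positions this map is linear in the masses, which is consistent with the linear observation operator that LETKF assumes.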
LETKF Algorithm
LETKF (Local Ensemble Transform Kalman Filter)*
Handles nonlinear model operators and linear observation operators, with Gaussian state and observation noise processes
Reduces implementation cost, since it does not need adjoints
Performs the analysis locally in ensemble space, which is typically low-dimensional (< 100), so it avoids inverting large matrices
Is embarrassingly parallel
We have implemented LETKF in C with MPI, and in IBM InfoSphere Streams
*Brian Hunt, Eric Kostelich, and Istvan Szunyogh, "Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter", Physica D 230, pp. 112-126, 2007.
The LETKF Algorithm
Global steps: maintain an ensemble of K system states
Forward: advance the system state of each ensemble member with the model operator
Analysis: construct the background ensemble, the background observation ensemble, and their means and covariance matrices
Local steps: for each grid point, choose the local observations and the local background system state, then calculate:
  the analysis error covariance
  the perturbation
  the analysis ensemble in ensemble space
  the analysis ensemble in state space
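The local steps can be sketched in NumPy directly from the equations of Hunt et al. (2007), cited above. This is not the authors' C/MPI or InfoSphere Streams code: choosing the local observations for each grid point (localization) is left to the caller, and `rho` is the multiplicative covariance inflation factor from that paper. Each call depends only on its own local inputs, which is what lets the per-grid-point analyses run in parallel.

```python
import numpy as np


def letkf_local_analysis(Xb, Yb, yo, R, rho=1.0):
    """Local LETKF analysis at one grid point, after Hunt et al. (2007).

    Xb  : local background ensemble, shape (n, k); column i is member i
    Yb  : background ensemble in observation space, H(x^b(i)), shape (p, k)
    yo  : local observations, shape (p,)
    R   : local observation-error covariance, shape (p, p)
    rho : multiplicative covariance inflation factor (rho >= 1)
    Returns the local analysis ensemble, shape (n, k).
    """
    k = Xb.shape[1]
    xb_mean = Xb.mean(axis=1)
    yb_mean = Yb.mean(axis=1)
    Xp = Xb - xb_mean[:, None]               # background perturbations X^b
    Yp = Yb - yb_mean[:, None]               # observation-space perturbations Y^b
    C = np.linalg.solve(R, Yp).T             # C = (Y^b)^T R^{-1}, shape (k, p)
    # Analysis error covariance in ensemble space (k x k)
    Pa = np.linalg.inv((k - 1) / rho * np.eye(k) + C @ Yp)
    # Perturbation weights: symmetric square root of (k - 1) * Pa
    evals, evecs = np.linalg.eigh((k - 1) * Pa)
    Wa = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    # Mean weight vector, added to every column of Wa
    wa_mean = Pa @ C @ (yo - yb_mean)
    W = Wa + wa_mean[:, None]                # analysis ensemble in ensemble space
    return xb_mean[:, None] + Xp @ W         # analysis ensemble in state space


# Usage: 5 members, 3 local state variables, 2 local observations (random toy data).
rng = np.random.default_rng(1)
H = rng.standard_normal((2, 3))              # toy linear observation operator
Xb = rng.standard_normal((3, 5))
yo = rng.standard_normal(2)
R = 0.5 * np.eye(2)
Xa = letkf_local_analysis(Xb, H @ Xb, yo, R)
print(Xa.shape)  # (3, 5)
```

Only k-by-k matrices are inverted, never n-by-n ones. For a linear H and `rho = 1`, the mean and sample covariance of the returned ensemble coincide with the Kalman-filter update computed with the ensemble covariance, which is a convenient correctness check.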
Implementation using IBM InfoSphere Streams
InfoSphere Streams is a system developed by IBM for very fast processing of large, high-rate data streams. It supports:
parallel, high-performance stream processing
continuous ingestion and analysis
scaling over a range of hardware capabilities
flexibility with respect to changing user objectives, available data, and computing resource availability
the bursty nature of real-time observations of rapidly evolving physical phenomena
It uses SPADE to describe the stream operators
SPADE Implementation Flowchart
Experiments and evaluation
Experimentally evaluate our approach using the controlled tracer releases available in the DATEM datasets
Demonstrate our approach using in-situ and remotely sensed real data from a California fire in August 2009
Observations are taken from EPA AQS, and from MODIS AOD when available; emission rates are taken from GBBEP
Evaluation metrics
We use HYSPLIT's statmain program to compute evaluation metrics for a HYSPLIT forecast with respect to the ground truth
We report on the following metrics:
The Normalized Mean Squared Error (NMSE) (smaller values are better)
The model rank, an overall measure of model quality (larger values are better; the maximum value is 4)
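The NMSE is defined as
\[ \mathrm{NMSE} = \frac{1}{N \bar{P} \bar{M}} \sum_{i=1}^{N} (P_i - M_i)^2 \]
where $P_i$ and $M_i$ are the predicted and measured concentrations for the $N$ observations, and $\bar{P}$ and $\bar{M}$ are their means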
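The NMSE formula translates directly into code. The rank function is an assumption: it uses the four-term definition commonly given for HYSPLIT/DATEM evaluations (squared correlation, fractional bias, figure of merit in space, and Kolmogorov-Smirnov parameter), each term contributing at most 1, which is consistent with the maximum of 4 stated above. It should be checked against statmain's own output before being relied on.

```python
import numpy as np


def nmse(pred, meas):
    """NMSE = sum_i (P_i - M_i)^2 / (N * mean(P) * mean(M))."""
    P = np.asarray(pred, dtype=float)
    M = np.asarray(meas, dtype=float)
    return np.sum((P - M) ** 2) / (P.size * P.mean() * M.mean())


def model_rank(r, fb, fms, ksp):
    """Rank = R^2 + (1 - |FB|/2) + FMS/100 + (1 - KSP/100), in [0, 4].

    Assumed component definitions (as in DATEM-style evaluations):
    r   : correlation coefficient between predictions and measurements
    fb  : fractional bias, in [-2, 2]
    fms : figure of merit in space, in percent
    ksp : Kolmogorov-Smirnov parameter, in percent
    """
    return r ** 2 + (1.0 - abs(fb) / 2.0) + fms / 100.0 + (1.0 - ksp / 100.0)


# Usage: a forecast that is uniformly twice the measurements, and a perfect rank.
print(nmse([2.0, 2.0], [1.0, 1.0]))        # 0.5
print(model_rank(1.0, 0.0, 100.0, 0.0))    # 4.0
```

Normalizing by the product of the means makes NMSE dimensionless, so scores from different experiments can be compared.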
CAPTEX
CAPTEX (Cross-Appalachian Tracer Experiment)
Time: 2100 UTC Sep 18 to 2100 UTC Oct 29, 1983
Area: U.S. and Canada
6 releases (3 hr duration each) of a perfluorocarbon tracer (PFT); emission sources and rates are those in DATEM
Use the DATEM CAPTEX observations as the ground truth: observations at 84 stations every 3 hr for 48 hr after each release
Run 160 iterations, each simulating a 3 hr period
CAPTEX
[Figure: forecasts with data assimilation after 3 hr, 6 hr, 9 hr, and 12 hr]
CAPTEX
CAPTEX with and without data assimilation
Modified CAPTEX
To assess whether our approach improves the forecasts given inaccurate emission rates, we do the following:
Use the CAPTEX concentrations as ground truth
Run HYSPLIT with a modified emission rate for CAPTEX in two modes (with and without data assimilation)
  For the 2nd release, which begins at 1700 UTC 25 Sep. 1983, use an emission rate of 33.5 kg/h instead of the 67 kg/h given in DATEM
Compare with the unmodified CAPTEX emissions without data assimilation
Modified CAPTEX
California wildfire, August 2009
Experiments to forecast PM2.5 particulate matter concentrations from a wildfire in California in August 2009
Data used:
Ground observations from EPA's Air Quality System (AQS) (hourly)
Satellite observations from
  Terra/Aqua MODIS Aerosol Optical Depth (AOD) (daily)
  Geostationary Operational Environmental Satellite (GOES) East/West AOD (hourly)
Emission rates from GBBEP (GOES-E/W Biomass Burning Emission Product) (hourly)
Data for SO2, NOx, CO, CO2, and relative humidity are also available from these data sources but were not used
California wildfire, August 2009
Experiment using AQS observations and GBBEP emission rates
Time: 2100 UTC Aug 9 to 2100 UTC Aug 20, 2009
Area: California and Nevada
Use hourly AQS data as ground-truth observations
Use GBBEP hourly PM2.5 emissions from 2019 source points; emission rates range from 200 g/hr to 10 kg/hr
Each iteration simulates a 1 hr period
California wildfire, August 2009
AQS + GBBEP
Summary
Our data assimilation system:
demonstrates improvement on statistical metrics, e.g., an average 16.0% improvement in NMSE on DATEM/CAPTEX
uses a state-of-the-art prediction model and assimilation algorithm
shows that LETKF offers good algorithmic efficiency
can easily utilize other models and multiple data sources
uses data from ground sites and satellites for pollutant concentrations and emission rates
can be extended to other domains, e.g., volcanic ash
Demo website: http://bluegrit.cs.umbc.edu/~shiming1/demo/
Acknowledgments
We would like to thank
IBM for its generous support, and the InfoSphere Streams team for its indispensable help
Drs. Ben Kyger and Roland Draxler for providing the HYSPLIT model and answering many of our questions
Dr. Milt Halem for his encouragement and support, and the Multicore Computing Center at UMBC for providing the computing environment
Dr. Hai Zhang of the UMBC Atmospheric Lidar Group, for his help on MODIS AOD
NASA for the MODIS data, NOAA for the GOES, GBBEP, and DATEM data, and EPA for the AQS data
Thank you.