Data Integration: Assessing the Value and Significance of New Observations and
Products
John Williams, NCAR
Haig Iskenderian, MIT LL
NASA Applied Sciences Weather Program Review
Boulder, CO
November 19, 2008
Data Integration
• Goals
– Integrate NASA-funded research into NextGen 4-D data cube for SAS products and decision support
– Evaluate potential of new data to contribute to NextGen product skill, in context of other data sources
– Provide feedback on temporal/spatial scales and operationally significant scenarios where new data may contribute
• Approaches
– Perform physically-informed transformations and forecast system integration, e.g., into a fuzzy logic algorithm
– Use nonlinear statistical analysis to evaluate new data importance in conjunction with other predictor fields
– Implement, evaluate and tune the system
Example of Forecast System Integration:
SATCAST Integration into CoSPA
[Diagram: data sources (numerical forecast models; NEXRAD/TDWR weather radar; LLWAS/ASOS surface weather; lightning; Canadian and satellite data) feed the CoSPA Weather Product Generator, which produces CoSPA 0-2 hour forecasts shown on the CoSPA Situation Display and used by air traffic managers, airline dispatch, and decision support tools.]
Overview of Heuristic Forecast
[Diagram: inputs (satellite, mosaic radar products, surface obs, RUC) pass through feature extraction to produce interest images; the forecast engine combines them into weather analysis products, forecasts, and error statistics.]
Generation of Interest Images
• Interest images:
– Are VIL-like (0-255) images that have a high impact upon the evolution and pattern of future VIL
– Result from combining individual predictor fields using expert meteorological knowledge and image processing for feature extraction
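As a minimal sketch of the idea above, a raw predictor field can be mapped onto a 0-255 VIL-like interest image with a piecewise-linear membership function in the fuzzy-logic style; the breakpoints and interest levels here are illustrative assumptions, not values from the deck.

```python
import numpy as np

def to_interest_image(field, breakpoints, interest_levels):
    """Map a raw predictor field to a 0-255 VIL-like interest image
    via a piecewise-linear membership function (fuzzy-logic style).
    Breakpoints/levels are illustrative, not from the CoSPA system."""
    # np.interp clamps values below/above the breakpoint range
    interest = np.interp(field, breakpoints, interest_levels)
    return np.clip(interest, 0, 255).astype(np.uint8)

# Illustrative example: map a satellite CI score in [0, 1] to interest
ci_score = np.array([[0.0, 0.2], [0.6, 1.0]])
img = to_interest_image(ci_score, breakpoints=[0.0, 0.5, 1.0],
                        interest_levels=[0.0, 128.0, 255.0])
```

In a real system the membership function for each predictor would encode the expert meteorological knowledge the slide mentions.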
Creating Interest Images: Convective Initiation
[Diagram: predictor fields (lower tropospheric winds/speed, cumulus, number of CI indicators and visible imagery, stability mask) pass through image processing and feature extraction to form the CI interest image; regional CI weights are applied by an elliptical kernel whose orientation and elongation are prescribed by the winds, at locations prescribed by CI scores, with the stability mask separating regions favorable and unfavorable for CI; the result feeds the forecast engine.]
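The elliptical kernel described above can be sketched as an anisotropic Gaussian whose long axis follows the low-level wind, growing with wind speed; the `base_sigma` and `stretch` parameters are illustrative assumptions.

```python
import numpy as np

def elliptical_kernel(u, v, base_sigma=2.0, stretch=0.5, size=15):
    """Gaussian kernel elongated along the wind vector (u, v):
    stronger wind -> longer kernel along the wind direction.
    base_sigma/stretch are illustrative, not CoSPA's values."""
    speed = np.hypot(u, v)
    theta = np.arctan2(v, u)                    # wind direction
    sigma_along = base_sigma + stretch * speed  # elongation grows with speed
    sigma_across = base_sigma
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # rotate grid coordinates into the wind-aligned frame
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    k = np.exp(-0.5 * ((xr / sigma_along) ** 2 + (yr / sigma_across) ** 2))
    return k / k.sum()

k = elliptical_kernel(u=10.0, v=0.0)  # westerly wind: kernel stretched in x
```

Convolving CI-score locations with such a kernel spreads interest downstream of developing cells rather than isotropically.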
Feature Extraction: Weather Classification
[Examples of the five weather types: Line, Stratiform, Large Airmass, Small Airmass, Embedded.]
Overview of Heuristic Forecast
[Diagram, repeated: inputs (satellite, mosaic radar products, surface obs, RUC) pass through feature extraction to produce interest images; the forecast engine combines them into weather analysis products, forecasts, and error statistics.]
Forecast Engine: Combine Interest Images

P(t, pixel, wxtype) = Σ_i (weight_i × pixel value_i) / Σ_i weight_i

[Diagram: interest images (VIL, long-term trend, short-term trend, satellite interest, radar boundary, ...) are combined, per the weather type image, into the combined forecast image.]
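The forecast engine's weighted average can be sketched directly; in the real system the per-image weights would come from a lookup keyed on lead time and WxType, but here they are passed in explicitly.

```python
import numpy as np

def combine_interest_images(images, weights):
    """Weighted average of 0-255 interest images, i.e. the slide's
    P(t, pixel, wxtype) = sum_i(weight_i * pixel_i) / sum_i(weight_i).
    In CoSPA the weights depend on lead time and WxType; here they
    are supplied directly for illustration."""
    images = np.asarray(images, dtype=float)    # (n_images, ny, nx)
    weights = np.asarray(weights, dtype=float)  # (n_images,)
    num = np.tensordot(weights, images, axes=1)
    return num / weights.sum()

# Illustrative: two 2x2 interest images with weights 0.7 and 0.3
imgs = [np.full((2, 2), 200.0), np.full((2, 2), 100.0)]
forecast = combine_interest_images(imgs, [0.7, 0.3])  # every pixel 170.0
```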
Example of VIL Interest Evolution
[Plot: interest-image weight (0 to 0.7) versus forecast lead time (15 to 120 minutes) for a VIL pixel value of 100, with one curve per weather type: Line, Large Air, Small Air, Stratiform, No Type.]
Summary of Heuristic Approach and Limitations
• Individual interest images are each 0-255 VIL-like images resulting from a combination of predictor fields and feature extraction
• Forecast is a weighted average of all interest images dependent on lead time and WxType, with weights determined heuristically
– Combines a static set of interest images into 0-2 hour forecasts
– Storm evolution is embedded in the weights, dependent on WxType
• Limitations:
– Integrating a candidate predictor is a manual, time-intensive process
– The utility of a predictor or interest image to the forecast is known only qualitatively
– There may be other helpful predictor fields and interest images that are not currently used
– Interest image weights and evolution functions may not be optimal
– An objective method could help address these issues
Automated Data Importance Evaluation:
Random Forests
Random Forest (RF)
• A nonlinear statistical analysis technique
• Produces a collection of decision trees using a "training set" of predictor variables (e.g., observation and model data features) and associated "truth" values (e.g., future storm intensity)
– Each decision tree's forecast logic is based on a random subset of the data and predictor variables, making it independent of the others
– During training, random forests produce estimates of predictor importance
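A minimal sketch of the RF importance idea, using scikit-learn's RandomForestClassifier on synthetic data (the library choice and data are illustrative assumptions; the deck does not name an implementation):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
# Synthetic "predictor fields": only the first two actually drive truth
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=n)) > 0

# Each tree sees a bootstrap sample and random feature subsets
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)

# Per-predictor importance estimates, produced during training
importances = rf.feature_importances_
ranks = np.argsort(-importances)  # low rank index = most important
```

On data like this, the informative predictors rise to the top of the importance ranking while pure-noise predictors fall to the bottom, which is exactly the feedback sought for candidate NASA data products.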
Example: CoSPA combiner development (focus on 1-hour VIP level prediction)
• Analyzed data collected in summer 2007
– Radar, satellite, RUC model, METAR, MIT-LL feature fields, storm climatology, and satellite-based land use fields
– Transformations
• Distances to VIP thresholds; channel differences
• Disc min, max, mean, and coverage over 5, 10, 20, 40, and 80-km radii
– Used motion vectors to "pull back" +1 hr VIP truth data to align with analysis-time data fields
• For each problem, randomly selected balanced sets of "true" and "false" pixels from the dataset and trained an RF
– VIP 3 (operationally significant convection)
– Initiation at varying distances from existing convection
• Plotted ranks of each predictor (low rank is good) for various scenarios
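The balanced sampling step above can be sketched as follows; the function name and interface are illustrative, not from the CoSPA code.

```python
import numpy as np

def balanced_sample(y, rng, n_per_class=None):
    """Randomly select equal numbers of "true" and "false" pixels,
    as described for building each RF training set. `y` is a boolean
    truth array (e.g., VIP >= 3 one hour ahead, per pixel)."""
    pos = np.flatnonzero(y)
    neg = np.flatnonzero(~y)
    k = n_per_class or min(len(pos), len(neg))
    idx = np.concatenate([rng.choice(pos, k, replace=False),
                          rng.choice(neg, k, replace=False)])
    rng.shuffle(idx)
    return idx

rng = np.random.default_rng(0)
y = np.array([True] * 10 + [False] * 90)   # rare-event truth, 10% positive
idx = balanced_sample(y, rng)              # 10 true + 10 false indices
```

Balancing matters because operationally significant convection is rare at the pixel level; without it the RF would be dominated by "false" pixels.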
Example fields:
VIL8bit, 06/19/2007 23:30
VIL8bit_40kmMax, 06/19/2007 23:00
VIL8bit_40kmPctCov, 06/19/2007 23:30
VIL8bit_distVIPLevel6+, 06/19/2007 23:30
Importance summary for VIP 3 (var. WxType)
[Plot: importance rank of each predictor, grouped by MIT-LL WxType; lower rank means more important.]
Importance summary for initiation 20 km from existing storm
[Plot: importance rank of each predictor, grouped by MIT-LL WxType; lower rank means more important.]
Importance summary for initiation 80 km from existing storm
[Plot: importance rank of each predictor, grouped by MIT-LL WxType; lower rank means more important.]
RF Empirical Model Performance: VIP 3
[Plot: calibration curve of the fraction of instances with VIP >= 3 versus random forest votes for VIP >= 3, with the ROC curve in blue.]
RF empirical model provides a probabilistic forecast performance benchmark
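A sketch of how vote fractions yield both a calibration check and a ROC benchmark, on synthetic stand-in data (scikit-learn and the data construction are assumptions for illustration):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)
# Synthetic stand-ins: RF vote fraction per pixel, and VIP>=3 truth
# constructed so the votes are well calibrated by design
votes = rng.uniform(size=5000)
truth = rng.uniform(size=5000) < votes

# Calibration: observed frequency of VIP>=3 within each vote bin
bins = np.linspace(0.0, 1.0, 11)
which = np.digitize(votes, bins) - 1
obs_freq = np.array([truth[which == b].mean() for b in range(10)])

# ROC curve and area from the same vote fractions
fpr, tpr, _ = roc_curve(truth, votes)
roc_auc = auc(fpr, tpr)
```

If the RF votes are well calibrated, the observed frequency in each bin tracks the vote fraction, and the ROC area quantifies the discrimination that a new predictor set would have to beat.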
Summary and Conclusions
• Developing satellite-based weather products may be only the first step of their integration into an operational forecast system
• Integration into an existing forecast system may require physically-informed transformations and heuristics
• An RF statistical analysis can help evaluate new candidate predictors in the context of others
– Relative importance
– Feedback on scales of contribution
– Also supplies an empirical model benchmark
• Successful operational implementation may require additional funding beyond initial R&D