Exploring Deep Learning Algorithms in Forecasting Severe Haze Events in Southeast Asia
Chien Wang, Laboratoire d’Aérologie (CNRS/UPS)
Ø Why: Clearing land for palm oil plantations; drained peatlands in the area make things even worse
Ø Profits: Low-priced palm oil is used for making numerous daily necessities and food products
Ø Solution: The ultimate one seems quite obvious, though it is perhaps difficult to implement
Actually, fire is not the whole story…
(Lee et al., ACP, 2017, 2018)
So, … it seems that forecasting the occurrence of severe haze ahead of time would be the most practical mitigation measure…
But, can we forecast it with confidence?
Process-based modeling
Example:
• WRF-Chem, a high-resolution regional weather/climate model, including chemistry + aerosols
• Simulation skill: for vis ≤ 10 km events, ~80% (equivalent to training accuracy) with correction based on in-situ aerosol measurements
• Forecast skill: practically zero due to lack of real-time emission estimates
Should we try something else?
Using machine learning algorithms to forecast haze
(Lee et al., 2018, ACP)
Ø Haze event ≡ daily surface visibility < 10 km
Ø Data: abstract derivatives from meteorological data and satellite retrievals
Ø Certain advantages over “traditional” forecast models (e.g., low computational demand); task-centric vs. process-centric
Ø ~93% (training) accuracy in “same-day” forecast using various algorithms
Ø ~84% (training) accuracy in “one-day” forecast
Ø Applications using “standard” ML algorithms often rely on abstract models, while our knowledge (expert opinion) about extreme events is very limited
“Traditional” Machine Learning
Deep learning: an “end-to-end” approach
Deep learning comes into the picture…
Convolutional neural networks: e.g., LeNet-5 (LeCun et al., 1998)
Liu et al., 2016; arXiv:1605.01156v1
2 convolution layers with 8 and 16 filter sets
CNNs have been applied to, e.g., identifying certain weather patterns
Tropical Cyclones: Correctly Classified (True Positive); Misclassified (False Negative)
Atmospheric River: Misclassified (False Negative); Correctly Classified (True Positive)
Forecasting Haze Events using Convolutional Neural Networks
Ø Supervised learning
Ø Training data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events with vis ≤ 7.72 km (p5)
Ø Inputs: 18 features, each a 40x40 or 60x60 map, from ERA-Interim reanalysis, 0.75° x 0.75°
Ø Output: 2 classes of vis. 0: vis > 7.72 km (94.7%); 1: vis ≤ 7.72 km (5.3%)
Ø Output: 3 classes of vis. 0: vis > 9.98 km (p25); 1: 7.72 km < vis ≤ 9.98 km; 2: vis ≤ 7.72 km
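The class definitions above can be sketched as a simple thresholding step. This is only an illustration: the two thresholds (7.72 km and 9.98 km) come from the slide, while the function names and the use of NumPy are assumptions, not the author's code.

```python
import numpy as np

# Thresholds quoted on the slide (Singapore GSOD daily visibility).
P5_KM = 7.72   # 5th percentile of daily visibility
P25_KM = 9.98  # 25th percentile of daily visibility


def label_two_class(vis_km):
    """Hypothetical helper: 1 where vis <= 7.72 km (haze event), else 0."""
    vis = np.asarray(vis_km, dtype=float)
    return (vis <= P5_KM).astype(int)


def label_three_class(vis_km):
    """Hypothetical helper: 0 for vis > 9.98 km, 1 for 7.72 < vis <= 9.98 km,
    2 for vis <= 7.72 km."""
    vis = np.asarray(vis_km, dtype=float)
    labels = np.zeros(vis.shape, dtype=int)
    labels[vis <= P25_KM] = 1
    labels[vis <= P5_KM] = 2  # overrides class 1 for the most severe days
    return labels
```

With labels built this way, class 1 of the two-class setup covers only about 5% of the days, which is why accuracy alone is a weak measure of skill for this task.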
Singapore Surface Visibility in km
From Global Surface Summary of the Day (Smith et al., 2011, BAMS)
mean = 10.52, max = 24.94, min = 1.29 km
“HazeNet”: a 16-layer convolutional neural network
Layers | Size/Conv filters | Kernels
input | 40x40x18 or 60x60x18 |
Conv1 + dropout, Conv2 | 92, 92 | 10x10, 10x10
Maxpool
Conv3 + dropout, Conv4 | 192, 192 | 6x6, 6x6
Maxpool
Conv5 + dropout, Conv6 | 384, 384 | 3x3, 3x3
Maxpool
Conv7 + dropout, Conv8 | 384, 384 | 3x3, 3x3
Maxpool
Conv9 + dropout, Conv10 | 512, 512 | 3x3, 3x3
Maxpool
Conv11 + dropout, Conv12 | 512, 512 | 3x3, 3x3
Maxpool
All-flat
Dense1, Dense2 + dropout | 4096, 4096 |
sigmoid/softmax | 2 or 3 |
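To get a feel for the size of the convolutional part of this network, the weights per layer can be counted from the table. The sketch below assumes standard 2D convolutions with a bias term, where each layer sees all channels of the previous one and max-pooling leaves the channel count unchanged. The slides do not state padding, strides, or pooling sizes, so the dense layers are left out; all names are illustrative.

```python
# (layer name, number of filters, kernel width) as read from the layer table.
CONV_LAYERS = [
    ("Conv1", 92, 10), ("Conv2", 92, 10),
    ("Conv3", 192, 6), ("Conv4", 192, 6),
    ("Conv5", 384, 3), ("Conv6", 384, 3),
    ("Conv7", 384, 3), ("Conv8", 384, 3),
    ("Conv9", 512, 3), ("Conv10", 512, 3),
    ("Conv11", 512, 3), ("Conv12", 512, 3),
]


def conv_params(in_channels, n_filters, kernel):
    """Weights plus biases of one 2D convolution layer (assumed standard form)."""
    return kernel * kernel * in_channels * n_filters + n_filters


def count_conv_params(input_channels=18):
    """Return a list of (layer name, parameter count) for the conv stack."""
    counts = []
    channels = input_channels  # 18 input features act as input channels
    for name, n_filters, kernel in CONV_LAYERS:
        counts.append((name, conv_params(channels, n_filters, kernel)))
        channels = n_filters  # output channels feed the next layer
    return counts


if __name__ == "__main__":
    per_layer = count_conv_params()
    for name, n in per_layer:
        print(f"{name}: {n:,} parameters")
    print(f"Conv total: {sum(n for _, n in per_layer):,}")
```

Even without the dense layers, the count runs into the millions, far more than the 12784 training samples. This is consistent with the emphasis on dropout and on overcoming overfitting.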
Inputs: 18 features, 40x40 map (axes: longitude, latitude)
Example: August 10, 1982, visibility = 7.56 km (data are normalized)
Features: T1000, V10, Z500, LgScPrecip, TCWV, TCW, ConvPrecip, MCloud, Z850, RelHum, HCloud, BLH, U10, LCloud, SWVL3, TCloud, SWVL2, SWVL1
Performance of the Networks: HazeNet-16 with Batch Normalization (averaged over epochs 500-599):
Training accuracy = 0.999
Training loss = 0.056
Validation accuracy = 0.951
Validation loss = 0.413
Importance of the “hyperparameters”
Overcoming the overfitting
Different Network Structures on training accuracy
(H7, 1-Day)
Different Activations on training scores
A closer look at the performance
Note that #class0 >> #class1
P5 Class 1 (vis ≤ 7.72 km)
Test samples (33%) = 4219
For forecast window = 0 day
Mean over last 100 epochs:
Vacc = 0.951 (0.947)
Prec = 0.575
Recall = 0.236
F1 = 0.330
Heidke Skill Score = 0.343
F1 score: F1 = 2 x (precision x recall) / (precision + recall)
HSS = ((tp + tn) - ecr) / (nsample - ecr)
ecr = ((tp + fn)*(tp + fp) + (tn + fn)*(tn + fp)) / nsample
Frequency of class0 ~ the best accuracy of no-skill forecasting
Vacc = validation accuracy
Prec or pre = precision
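The scores defined above can be computed directly from the four confusion-matrix counts. A minimal sketch follows: the formulas are the ones on the slide, while the function name and the counts in the usage example are made up for illustration and are not results from the talk.

```python
def haze_scores(tp, fp, tn, fn):
    """Accuracy, precision, recall, F1 and Heidke Skill Score from counts."""
    nsample = tp + fp + tn + fn
    accuracy = (tp + tn) / nsample
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * (precision * recall) / (precision + recall)
    else:
        f1 = 0.0
    # ecr: number of correct forecasts expected from random chance.
    ecr = ((tp + fn) * (tp + fp) + (tn + fn) * (tn + fp)) / nsample
    hss = ((tp + tn) - ecr) / (nsample - ecr) if nsample != ecr else 0.0
    # Accuracy of always forecasting the more frequent class (no skill).
    no_skill_accuracy = max(tp + fn, tn + fp) / nsample
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "hss": hss,
        "no_skill_accuracy": no_skill_accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the calculation.
    for name, value in haze_scores(tp=30, fp=20, tn=920, fn=30).items():
        print(f"{name} = {value:.3f}")
```

Because class 0 dominates, accuracy sits close to the no-skill value even for a useful classifier; F1 and HSS are the more informative numbers here.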
True Positive (TP)
False Positive (FP)
True Negative (TN)
False Negative (FN)
Rich patterns captured for different events:
Example: V10, class-1, true positive or TP events
What could help us to advance knowledge:
V10: Averaged patterns corresponding to different events of class-1
What could we learn from the machine?
Mean patterns identified for different forecast windows (panels: Fwin = 0, 1, 2, 3, 4, 5 days)
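One way to set up training samples for a given forecast window is to pair the input maps of day t with the visibility class of day t + Fwin. The slides do not spell out how the windows were built, so this pairing, the function name, and the assumption of a gap-free daily series are all illustrative.

```python
import numpy as np


def make_forecast_pairs(features, labels, fwin):
    """Pair inputs at day t with the label at day t + fwin.

    features: array of shape (n_days, ...), e.g. (n_days, 40, 40, 18)
    labels:   array of shape (n_days,), one class per day
    fwin:     forecast window in days (0 = same-day forecast)
    Assumes consecutive days with no gaps in the record.
    """
    features = np.asarray(features)
    labels = np.asarray(labels)
    if len(features) != len(labels):
        raise ValueError("features and labels must cover the same days")
    if fwin < 0 or fwin >= len(labels):
        raise ValueError("fwin must satisfy 0 <= fwin < n_days")
    if fwin == 0:
        return features, labels
    # Drop the last fwin inputs and the first fwin labels so they line up.
    return features[:-fwin], labels[fwin:]
```

Each additional day of lead time removes one sample from the end of the record, which is negligible against 12784 days.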
Panels: TCW, T P, MSL for 06/11/13 (c = 0), 06/14/13 (c = 1), 06/18/13 (c = 2)
Why…?
T=11 T=14 T=18
Deterministic forecasting platform
The same deterministic formula or causal relation: T(t+1) = f[T(t), T(t-1), …, x1, x2, …, xN]
DL forecasting platform
T=11 T=14 T=18
Predictor?
Predictor?
(blank) | Process-Orientated: WRF-Chem | Task-Orientated: Deep ConvNet
Training Time | 25-5 km regional domain: 1d=630 or 1yr=230K core-hr | 27-year daily data: < 2 hr using an Nvidia GPU
Forecasting Time | N/A due to the lack of real-time emission data; otherwise same as above | Negligible
Data Preparation | Initial conditions & 6-hourly boundary conditions; emissions | All “relevant” data
Code | 1 million+ lines FORTRAN | < 40 lines Python (using rich software libs)
Benefits | Understanding detailed process connections | Identifying hidden features
You will have to deal with PARAMETERS in both platforms!
Summary
• Deep CNNs have been deployed to “forecast” severe haze in Southeast Asia; perhaps we can say that this approach has passed the proof-of-concept stage
• The same network has recently been modified to explore forecasting intense lightning activity in Corsica; results are also promising
• Deep learning algorithms can elevate our knowledge base from a few cases to cover all available samples, benefiting our science
• Challenges in using meteorological data: e.g., scale-sensitive features, and rich features but still limited samples
• Current networks still produce a high number of misclassified cases (false negatives)
• New algorithms and network configurations are being proposed and will be tested