Exploring Deep Learning Algorithms in Forecasting Severe Haze Events in Southeast Asia

Chien Wang, Laboratoire d’Aérologie (CNRS/UPS)

• Why: Clearing land for palm oil plantations; drained peatlands in the area make things even worse

• Profits: Low-priced palm oil is used to make numerous daily necessities and food products

• Solution: The ultimate one seems quite obvious, though it is perhaps difficult to implement

Actually, fire is not the whole story…

(Lee et al., ACP, 2017, 2018)

So, it seems that forecasting the occurrence of severe haze ahead of time would be the most practical mitigation measure…

But, can we forecast it with confidence?

Process-based modeling

Example:

• WRF-Chem, a high-resolution regional weather/climate model, including chemistry + aerosols

• Simulation skill: for vis ≤ 10 km events, ~80% (equivalent to training accuracy) with correction based on in-situ aerosol measurements

• Forecast skill: practically zero due to the lack of real-time emission estimates

Should we try something else?

Using machine learning algorithms to forecast haze

(Lee et al., 2018, ACP)

• Haze event ≡ daily surface visibility < 10 km

• Data: abstract derivatives from meteorological data and satellite retrievals

• Certain advantages over “traditional” forecast models (e.g., low computational demand); task-centric vs. process-centric

• ~93% (training) accuracy in “same-day” forecast using various algorithms

• ~84% (training) accuracy in “one-day” forecast

• Applications using “standard” ML algorithms often rely on abstract models, while our knowledge (expert opinion) about extreme events is very limited

“Traditional” Machine Learning

Deep learning: an “end-to-end” approach

Deep learning comes into the picture…

Convolutional neural networks: e.g., LeNet-5 (LeCun et al., 1998)

Liu et al., 2016; arXiv:1605.01156v1

2 convolution layers with 8 and 16 filters

CNN has been applied to, e.g., identify certain weather patterns

[Figure: example classifications for Tropical Cyclones and Atmospheric Rivers, each with correctly classified (true positive) and misclassified (false negative) cases]

• Supervised learning

• Training data: 1982-2016 daily visibility (vis) from GSOD, sample size = 12784, including 679 events with vis ≤ 7.72 km (p5)

• Inputs: 18 40x40 or 60x60 features from ERA-Interim reanalysis, 0.75° x 0.75°

• Output: 2 classes of vis. 0: vis > 7.72 km (94.7%); 1: vis ≤ 7.72 km (5.3%)

• Output: 3 classes of vis. 0: vis > 9.98 km (p25); 1: 7.72 km < vis ≤ 9.98 km; 2: vis ≤ 7.72 km
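
The class definitions above amount to thresholding daily visibility. A minimal sketch, assuming the 7.72 km (p5) and 9.98 km (p25) thresholds quoted above; the function names and the sample visibilities are hypothetical and only serve as illustration:

```python
# Thresholds quoted in the slides (km); everything else here is illustrative.
P5_KM = 7.72
P25_KM = 9.98


def label_two_class(vis_km: float) -> int:
    """Return 1 for a haze event (vis <= p5), otherwise 0."""
    return 1 if vis_km <= P5_KM else 0


def label_three_class(vis_km: float) -> int:
    """Return 0 (vis > p25), 1 (p5 < vis <= p25), or 2 (vis <= p5)."""
    if vis_km <= P5_KM:
        return 2
    if vis_km <= P25_KM:
        return 1
    return 0


# Made-up daily visibilities in km, including the two boundary values.
samples = [12.3, 9.5, 7.56, 7.72, 9.98]
two = [label_two_class(v) for v in samples]
three = [label_three_class(v) for v in samples]
print(two)
print(three)
```

Note that both boundary values fall into the lower-visibility class, following the "≤" in the definitions.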

Forecasting Haze Events using Convolutional Neural Networks

Singapore Surface Visibility in km

From Global Surface Summary of the Day

(Smith et al., 2011, BAMS)

mean = 10.52, max = 24.94, min = 1.29 km

“HazeNet”: a 16-layer convolutional neural network

Layers | Size/Conv filters | Kernels
input | 40x40x18 or 60x60x18 |
Conv1 + dropout, Conv2 | 92, 92 | 10x10, 10x10
Maxpool, Maxpool, Maxpool | |
 | 192, 192 | 6x6, 6x6
 | 384, 384 | 3x3, 3x3
 | 384, 384 | 3x3, 3x3
 | 512, 512 | 3x3, 3x3
 | 512, 512 | 3x3, 3x3
All-flat | |
Dense1, Dense2 + dropout | 4096, 4096 |
sigmoid/softmax | 2 or 3 |
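
To get a feel for the size of the convolutional stack listed above, the weights per layer can be counted. This is a rough sketch, assuming standard 2-D convolutions with one bias per filter, applied in the listed order to 18 input channels; the pairing of filter counts with kernel sizes is read from the table, and the helper name is hypothetical. Pooling, batch normalization, and dense layers are left out because their sizes depend on settings (padding, stride) not given here.

```python
# (filters, kernel_size) for the 12 convolution layers, in the order listed.
CONV_LAYERS = [
    (92, 10), (92, 10),
    (192, 6), (192, 6),
    (384, 3), (384, 3),
    (384, 3), (384, 3),
    (512, 3), (512, 3),
    (512, 3), (512, 3),
]


def conv_param_counts(in_channels, layers):
    """Weights plus biases per layer for square-kernel 2-D convolutions."""
    counts = []
    for filters, k in layers:
        counts.append(k * k * in_channels * filters + filters)
        in_channels = filters  # output channels feed the next layer
    return counts


counts = conv_param_counts(18, CONV_LAYERS)
for i, n in enumerate(counts, start=1):
    print(f"conv {i:2d}: {n:>10,d} parameters")
print(f"total  : {sum(counts):>10,d} parameters")
```

Under these assumptions the parameter count is independent of the 40x40 or 60x60 input size, since convolution weights are shared across grid points.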

Inputs: 18 features, 40x40 map

Example: August 10, 1982, visibility = 7.56 km (data are normalized)

[Figure: 18 longitude-latitude panels: T1000, V10, Z500, LgScPrecip, TCWV, TCW, ConvPrecip, MCloud, Z850, RelHum, HCloud, BLH, U10, LCloud, SWVL3, TCloud, SWVL2, SWVL1]

Performance of the Networks:

HazeNet-16 with Batch Normalization (averaged over epochs 500-599):

Training accuracy = 0.999
Training loss = 0.056
Validation accuracy = 0.951
Validation loss = 0.413

Importance of the “hyperparameters”

Overcoming the overfitting

Different Network Structures on training accuracy (H7, 1-Day)

Different Activations on training scores

A closer look at the performance

Note that #class0 >> #class1

P5 Class 1 (vis ≤ 7.72 km)

Test samples (33%) = 4219

For forecast window = 0 days, mean over the last 100 epochs:

Vacc = 0.951 (0.947)
Prec = 0.575
Recall = 0.236
F1 = 0.330
Heidke Skill Score = 0.343

F1 score, or F1 = 2 x (precision x recall) / (precision + recall)

HSS = ((tp + tn) - ecr) / (nsample - ecr), where

ecr = ((tp + fn) * (tp + fp) + (tn + fn) * (tn + fp)) / nsample

Frequency of class0 ~ the best accuracy of no-skill forecasting
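
The score definitions above, and the point about no-skill accuracy, can be checked with a few lines of code. This is a minimal sketch: the function names are hypothetical, and the confusion-matrix counts are made-up values that merely sum to the 4219 test samples; they are not results from the study.

```python
def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def scores(tp, fp, tn, fn):
    """Accuracy, precision, recall, F1, and Heidke Skill Score from counts."""
    n = tp + fp + tn + fn
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    # ecr: number of correct forecasts expected by chance.
    ecr = ((tp + fn) * (tp + fp) + (tn + fn) * (tn + fp)) / n
    hss = safe_div((tp + tn) - ecr, n - ecr)
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "hss": hss,
    }


# Illustrative counts only (220 events among 4219 samples).
model = scores(tp=50, fp=40, tn=3959, fn=170)
# A forecast that always predicts class 0: high accuracy, no skill.
no_skill = scores(tp=0, fp=0, tn=3999, fn=220)
perfect = scores(tp=220, fp=0, tn=3999, fn=0)

for name, s in [("model", model), ("no-skill", no_skill), ("perfect", perfect)]:
    print(name, {k: round(v, 3) for k, v in s.items()})
```

With a heavily imbalanced sample, the always-class-0 forecast reaches an accuracy close to the class-0 frequency while its HSS is zero, which is why accuracy alone says little about skill here.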

Vacc = validation accuracy

Prec or pre = precision

True Positive (TP)

False Positive (FP)

True Negative (TN)

False Negative (FN)

Rich patterns captured for different events:

Example: V10, class-1, true positive or TP events

What could help us to advance knowledge:

V10: Averaged patterns corresponding to different events of class-1

What could we learn from the machine?

Mean patterns identified for different forecast windows

[Figure panels: Fwin = 0, 1, 2, 3, 4, and 5 days]

[Figure: fields TCW, T P, and MSL on 06/11/13 (c = 0), 06/14/13 (c = 1), and 06/18/13 (c = 2)]

Why…?

T=11 T=14 T=18

Deterministic forecasting platform

The same deterministic formula or causal relation: T(t+1) = f[T(t), T(t-1), …, x1, x2, …, xN]

DL forecasting platform

T=11 T=14 T=18

Predictor?

Predictor?

Platform | Process-Orientated (WRF-Chem) | Task-Orientated (Deep ConvNet)
Training Time | 25-5 km regional domain: 1 d = 630 or 1 yr = 230K core-hr | 27-year daily data: < 2 hr using an Nvidia GPU
Forecasting Time | N/A due to the lack of real-time emission data; otherwise same as above | Negligible
Data Preparation | Initial conditions & 6-hourly boundary conditions; emissions | All “relevant” data
Code | 1 million+ lines FORTRAN | < 40 lines Python (using rich software libs)
Benefits | Understanding detailed process connections | Identifying hidden features

You will have to deal with PARAMETERS in both platforms!

Summary

• Deep CNNs have been deployed to “forecast” severe haze in Southeast Asia; perhaps we can say that the approach has passed the proof-of-concept stage

• The same network has recently been modified to explore forecasting intense lightning activity in Corsica; the results are also promising

• Deep learning algorithms can elevate our knowledge base from a few cases to cover all available samples, benefiting our science

• Challenges remain in using meteorological data, e.g., scale-sensitive features, and rich features with still limited samples

• Current networks still produce a high number of misclassified cases (false negatives)

• New algorithms and network configurations are being proposed and will be tested
