mlos forecasting
Introduction Data processing Methods Results Conclusion and recommendations
Notes on the development of an experimental seasonal MLOS forecasting scheme for the Pacific Islands
Nicolas Fauchereau 1,2, Scott Stephens 1, Nigel Goodhue 1, Rob Bell 1, Doug Ramsay 1
1 NIWA Ltd., Auckland, New Zealand
2 Oceanography Dept., University of Cape Town, Cape Town, South Africa
June 20, 2013
Table of contents
1 Introduction
2 Data processing
    Mean Level of the Sea anomalies (MLOS)
    Predictors sets
        Indices
        SST EOFs
3 Methods
    Regression
    Classification
4 Results
5 Conclusion and recommendations
Introduction
Rationale
Set out in the "White Paper":
- high impact from sea level extremes
- value in developing an "extreme calendar"
- extreme tides + NTR (MLOS + "high frequency")
Goal
Compared to the existing PEAC scheme:
- Extend coverage to non-US affiliated Islands
- Frequency: every month for the coming 3 months (Island Climate Update)
- Performance of the model, type of forecast (probabilistic?)
Introduction
Objective
Provide recommendations on:
- Data processing, predictand
- Choice of the set of predictors
- Statistical methods for prediction
- Operational implementation
Implementation
For 3 Islands in the Pacific (presenting a wide range of variability):
- "Hindcast": forecast for T+1 to T+3 using information at T0 (e.g. May for June-August)
- Different predictors
- Different methods (state-of-the-art Machine Learning)
Sea-Level-records
Guam
- Coordinates (144.7833 E, 13.4500 N)
- 1948-03-10 to 2008-12-31
- proportion of days missing: 12 %
Kiribati, Tarawa
- Coordinates (172.9300 E, 1.3625 N)
- 1974-05-03 to 2012-07-30
- proportion of days missing: 8 %
Cook Islands, Rarotonga
- Coordinates (200.2147 E, 21.2048 S)
- 1977-04-24 to 2011-08-31
- proportion of days missing: 2 %
Sea-Level-records
Hourly sea-level (cm), tidal and high frequency components removed (Scott, Nigel, Rob)
1 Daily then monthly averages
2 Series truncated before 1979-1-1
3 Climatology over 1979-2008
4 3-point running averages of monthly anomalies WRT climatology
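The four processing steps above can be sketched as follows. This is only an illustration with pandas, not the code used for the study: the helper name `seasonal_anomalies`, its arguments and the synthetic hourly series are assumptions, and the input is taken to be an hourly series from which the tidal and high frequency components have already been removed.

```python
import numpy as np
import pandas as pd


def seasonal_anomalies(hourly, start="1979-01-01", clim=("1979", "2008")):
    """Hypothetical helper: hourly residual sea level -> 3-point running-mean anomalies."""
    # 1. Daily, then monthly averages (missing values are skipped by mean()).
    monthly = hourly.resample("D").mean().resample("MS").mean()
    # 2. Drop everything before the start date.
    monthly = monthly.loc[start:]
    # 3. Climatology of each calendar month over the reference period.
    ref = monthly.loc[clim[0]:clim[1]]
    climatology = ref.groupby(ref.index.month).mean()
    # Anomalies with respect to the climatology of the matching calendar month.
    anomalies = monthly - climatology.loc[monthly.index.month].to_numpy()
    # 4. Centred 3-point running average of the monthly anomalies.
    return anomalies.rolling(window=3, center=True).mean()


# Made-up input: a pure seasonal cycle sampled hourly, so all anomalies should vanish.
index = pd.date_range("1978-01-01", "2008-12-31 23:00", freq=pd.Timedelta(hours=1))
values = np.sin(2.0 * np.pi * (index.month.to_numpy() - 1) / 12.0)
hourly = pd.Series(values, index=index)
seasonal = seasonal_anomalies(hourly)
```

Because of the centred window, the first and last months of the result are missing; an operational version would need to decide how to treat the most recent season.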
[Figure: MLOS seasonal time-series, 1979 to 2009, for Guam, Kiribati and Cooks]
Sea-Level-records
5 categories ("labels") for classification algorithms:
1 "well below" = (−inf, −0.15]: labelled -2
2 "below" = (−0.15, −0.05]: labelled -1
3 "normal" = (−0.05, +0.05]: labelled 0
4 "above" = (+0.05, +0.15]: labelled 1
5 "well above" = (+0.15, +inf): labelled 2
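The mapping from anomalies to the five labels can be written compactly. The sketch below uses `numpy.digitize` with right-closed intervals to match the brackets listed above; the function name `to_category` and the example values are illustrative assumptions.

```python
import numpy as np

# Interior bin edges from the category definitions; intervals are closed on the right.
EDGES = [-0.15, -0.05, 0.05, 0.15]


def to_category(anomalies):
    """Map anomalies to integer labels -2 (well below) to 2 (well above)."""
    # digitize returns 0..4 for the five intervals; shift so that "normal" is 0.
    return np.digitize(anomalies, EDGES, right=True) - 2


example = np.array([-0.20, -0.15, -0.10, 0.0, 0.05, 0.12, 0.30])
labels = to_category(example)  # -> [-2, -2, -1, 0, 0, 1, 2]
```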
Predictors sets
Choice of the predictors set is dictated by:
Relevance:
    Need to reflect plausible physical relationships between the Ocean-Climate system and Sea-Level.
Operational constraints:
    Must be available in near real time (within the first 5 days of Month 1 for forecast Season Month 1 - Month 3).
Indices
Indices of SST and Atmospheric variables, monthly time-scale:
- NINOS (1+2, 3.4, 3, 4): from CPC
- Southern Oscillation Index (SOI): calculated by NIWA, data from BoM
- El Niño Modoki Index (EMI): calculated from ERSST dataset
- Seasonal Cycle: (first 3 harmonics on MLOS climatology)
- Regional SST anomalies ...
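For the seasonal cycle predictor, one plausible reading of "first 3 harmonics on MLOS climatology" is a least-squares fit of three annual harmonics to the 12-value monthly climatology. The sketch below follows that reading; the function names and the made-up climatology are assumptions, not the authors' implementation.

```python
import numpy as np


def harmonic_design(months, n_harmonics=3):
    """Columns: constant, then a cos/sin pair for each harmonic of the annual cycle."""
    months = np.asarray(months, dtype=float)
    cols = [np.ones_like(months)]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * (months - 1.0) / 12.0
        cols += [np.cos(angle), np.sin(angle)]
    return np.column_stack(cols)


def smooth_climatology(climatology, n_harmonics=3):
    """Least-squares fit of the harmonics to a 12-value monthly climatology."""
    design = harmonic_design(np.arange(1, 13), n_harmonics)
    coefs, *_ = np.linalg.lstsq(design, climatology, rcond=None)
    return design @ coefs


# Made-up climatology: annual plus semi-annual cycle.
m = np.arange(12)
clim = 0.05 * np.cos(2.0 * np.pi * m / 12.0) + 0.02 * np.sin(4.0 * np.pi * m / 12.0)
seasonal_cycle = smooth_climatology(clim)
```

The fitted value for the calendar month of each forecast season can then be appended to the predictor set.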
Indices: Regional SSTs
Regression of SST anomalies on MLOS anomalies (lead 1 month)
Sea-Surface-Temperatures EOFs
- EOF analysis of monthly anomalies of ERSST SSTs.
- First 9 Principal Components used as predictors
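An EOF analysis of this kind is commonly computed through a singular value decomposition of the (time, space) anomaly matrix. The sketch below shows that route on random data; the function name `eof_analysis` is an assumption, and latitude weighting and land masking, which a real SST analysis would normally include, are left out.

```python
import numpy as np


def eof_analysis(anomalies, n_modes=9):
    """EOFs and PCs of a (time, space) anomaly matrix through an SVD."""
    centred = anomalies - anomalies.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    pcs = u[:, :n_modes] * s[:n_modes]   # principal component time series
    eofs = vt[:n_modes]                  # spatial patterns, one per row
    explained = s**2 / np.sum(s**2)      # fraction of variance per mode
    return pcs, eofs, explained[:n_modes]


rng = np.random.default_rng(0)
sst = rng.standard_normal((360, 50))     # made-up field: 360 months x 50 grid points
pcs, eofs, explained = eof_analysis(sst)
```

The columns of `pcs` are mutually uncorrelated, which is convenient when they are used together as regression or classification predictors.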
Methods
Machine Learning
- Regression: continuous dependent variable
- Classification: discrete, categorical dependent variable
Regression
1 Generalized Linear Models: extension of linear regression for distributions of the exponential family (Normal, Poisson, Binomial, Multinomial, etc.)
    - Ordinary Least Squares (Linear Regression)
    - Penalized Least Squares (Ridge Regression, LARS, LASSO)
    - Logistic Regression
2 Multivariate Adaptive Regression Splines (MARS):
    - Non-parametric multivariate regression method
    - Models non-linearities and interactions between predictors
    - Similarities with stepwise regression and CART (Classification And Regression Trees: recursive partitioning)
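The building block of MARS is a mirrored pair of hinge functions, max(0, x - t) and max(0, t - x), around a knot t. The sketch below is not the MARS algorithm itself (which searches over knots and predictors, then prunes terms); it only shows that, once a knot is fixed, the model is linear in the hinge functions and can be fitted by ordinary least squares. The data and the knot location are made up.

```python
import numpy as np


def hinge_basis(x, knot):
    """Mirrored pair of hinge functions: max(0, x - knot) and max(0, knot - x)."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)


# Made-up data: piecewise-linear response with a kink at x = 0.5.
x = np.linspace(-2.0, 2.0, 81)
y = 0.3 + 1.5 * np.maximum(0.0, x - 0.5) - 0.4 * np.maximum(0.0, 0.5 - x)

# With the knot fixed, fit intercept and the two hinge coefficients by least squares.
right, left = hinge_basis(x, knot=0.5)
design = np.column_stack([np.ones_like(x), right, left])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
```

A single linear term could not reproduce the change of slope at the knot, which is the kind of non-linearity the method is meant to capture.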
Methods
Classification
1 Logistic Regression
    - Binomial or multinomial (categorical) response variable
    - Models the probability of an observation belonging to each class
2 Support Vector Machines (SVM)
    - Optimal hyperplane (2 classes) or set of hyperplanes (k classes)
    - Kernel trick: map data to a higher-dimensional space to deal with non-linearly separable classes
    - Radial Basis Function is a widely used kernel
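Both classifiers are available in scikit-learn; whether that library was used here is not stated, so the sketch below is an assumption about tooling as well as about data. It fits a multinomial logistic regression and an RBF-kernel SVM to made-up predictors with five ordered classes labelled -2 to 2.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Made-up data: 300 "seasons", 4 predictors, 5 classes cut from a noisy linear score.
X = rng.standard_normal((300, 4))
score = X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.standard_normal(300)
y = np.digitize(score, [-1.0, -0.3, 0.3, 1.0]) - 2   # labels -2..2

# Logistic regression gives one probability per class for each sample.
logit = LogisticRegression(max_iter=1000).fit(X, y)
proba = logit.predict_proba(X[:3])

# SVM with a Radial Basis Function kernel; C is the regularization parameter.
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
predicted = svm.predict(X[:3])
```

The rows of `proba` sum to one, which is what makes the logistic model directly usable for a probabilistic forecast; the SVM returns a single class per sample unless probability estimates are explicitly enabled.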
Approach
- All the methods referred to above are tested in turn, using successively the Indices and the SST EOFs sets as predictors
- Applied to Guam, Kiribati and Cooks
- "Best" model selected using objective measures (e.g. R-squared) + cross-validation + expert judgment
- Results for Guam only are presented in detail
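The cross-validation scheme is not described, so the sketch below is an assumed setup: k-fold cross-validation with contiguous blocks in time, scoring an ordinary least-squares model by the R-squared of its out-of-fold predictions. The function names and the data are made up.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def cross_validated_r2(X, y, n_folds=5):
    """R-squared of out-of-fold predictions from ordinary least squares."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    y_pred = np.empty(n)
    # Contiguous folds, so each held-out block is a continuous stretch of time.
    for test_idx in np.array_split(np.arange(n), n_folds):
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        coefs, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        y_pred[test_idx] = design[test_idx] @ coefs
    return r_squared(y, y_pred)


rng = np.random.default_rng(1)
X = rng.standard_normal((360, 3))   # made-up predictors, 360 months
y = X @ np.array([0.08, -0.05, 0.02]) + 0.02 * rng.standard_normal(360)
score = cross_validated_r2(X, y)
```

An in-sample R-squared always looks at least as good as the cross-validated one, so comparing the two is a quick check for over-fitting when ranking candidate models.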
Results for Guam
Notes on the Guam time-series
- 12 % of missing values
- Large gap October 1997 - January 1999, 26 consecutive seasons missing
- trend from about 2002
[Figure: Guam time-series, 1979 to 2004: original time-series, quadratic fit, and time-series minus quadratic fit]
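Removing a quadratic fit from a series with gaps, as in the figure above, can be sketched with `numpy.polyfit`. The function name and the synthetic series (including the position of the gap) are assumptions for illustration.

```python
import numpy as np


def remove_quadratic_trend(series):
    """Subtract a least-squares quadratic fit, ignoring missing values (NaN)."""
    t = np.arange(len(series), dtype=float)
    valid = ~np.isnan(series)
    coefs = np.polyfit(t[valid], series[valid], deg=2)
    fit = np.polyval(coefs, t)   # the fit is evaluated over the gap as well
    return series - fit, fit


# Made-up series: a pure quadratic with a block of missing values.
t = np.arange(360, dtype=float)
series = 1e-6 * (t - 180.0) ** 2 - 0.02
series[225:241] = np.nan
detrended, fit = remove_quadratic_trend(series)
```

Missing values stay missing in the detrended series; only the fitted curve is defined across the gap.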
Results: Logistic regression (Multinomial)
Predictors set = SST PCs + seasonal cycle
Success rate: 66.2 % (random: 20 %)
Probabilistic forecast
[Figure: Example of a Multinomial Logistic regression probabilistic forecast. Horizontal axis: category (well-below, below, normal, above, well-above); vertical axis: time (seasons), 0 to 9; colour scale: probability, 0.0 to 0.8]
Results: MARS
Predictors set = SST PCs + seasonal cycle + damped linear term
R-squared: 0.85
[Figure: Guam MARS model, observed and predicted time-series, 1979 to 2009. Var (R2): 92.50, MSE: 0.0011, GCV: 0.0017, RSQ: 0.8556, GRSQ: 0.7800]
Results: Support Vector Machines
Predictors set = SST PCs + seasonal cycle + damped linear term
Success rate (with intermediate "regularization" parameter): 96 %
Confusion matrix
        WB    B    N    A   WA
WB      14    2    1    0    0
B        0   64    1    0    0
N        0    2  117    1    0
A        0    0    2   85    0
WA       0    0    0    3    4
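The 96 % success rate follows from the confusion matrix as the share of cases on the diagonal. The short check below reproduces that arithmetic; the slide does not say whether rows are observed or predicted classes, so the per-row fractions are given without that interpretation.

```python
import numpy as np

labels = ["WB", "B", "N", "A", "WA"]
confusion = np.array([
    [14,  2,   1,  0, 0],
    [ 0, 64,   1,  0, 0],
    [ 0,  2, 117,  1, 0],
    [ 0,  0,   2, 85, 0],
    [ 0,  0,   0,  3, 4],
])

correct = np.trace(confusion)    # 284 cases on the diagonal
total = confusion.sum()          # 296 cases in all
success_rate = correct / total   # about 0.959, i.e. 96 %

# Share of each row that falls on the diagonal.
row_fraction = np.diag(confusion) / confusion.sum(axis=1)
```

The weakest row is "WA", with 4 of 7 cases on the diagonal, which reflects how few well-above cases the record contains.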
Conclusion and recommendations
- For regression (continuous): MARS with SST EOFs
- For classification (categorical): SVM with SST EOFs
- How to deal with the (non-linear) trend? Here we used a damped linear term, but this is a somewhat ad hoc solution
- Include the Pacific Decadal Oscillation
- Ensemble techniques (Random Forests, bagging, boosting) for classification?
- Hybrid predictor set? EOF on an enhanced indices set
- Length of the time-series (30 years is really a minimum)