Principal Component Regression & Canonical Correlation Analysis
Nachiketa Acharya [email protected]
Big Thanks to Dr. Simon Mason
Training Workshop on Seasonal Prediction of Southwest Monsoon Rainfall: 16th – 18th April, 2018
2 Seasonal Forecasting Using the Climate Predictability Tool
Making Seasonal Forecasts for the Monsoon
There are two main methods in use (and in practice we often combine the two, and/or use a hybrid of them).
I: Models of past statistics – teleconnection
[Figure: predictors used in the statistical model. Labels: EAST ASIA PR ANOMALY; WARM WATER VOLUME; N. ATL SST ANOMALY; EQ. SE INDIAN OCEAN SST ANOMALY; NORTH WEST EUROPE TEMP ANOMALY; NINO 3.4 SST ANOMALY; NATL PR ANOMALY; NCPAC U850 ANOMALY. Courtesy: IMD]
Making Seasonal Forecasts for the Monsoon
II: Models of the physics – causation
• Climate models are computer codes based on fundamental laws of physics.
• However, the output from these ensemble prediction systems cannot be used directly; it requires further calibration in order to produce reliable forecasts.
Seasonal forecasting tool
• The Climate Predictability Tool (CPT) is easy-to-use software for making seasonal forecasts using either empirical predictors or the outputs from GCMs.
• Developed and maintained by Dr. Simon Mason.
• CPT is available as a Windows 95+ version and as a Linux batch version.
How Does CPT Make a Forecast?
Options for Making Seasonal Forecasts in CPT
• Multiple Linear Regression
• Principal Component Regression
• Canonical Correlation Analysis
Multiple Linear Regression
[Figure labels: area-average MAM rainfall for Thailand; ocean-based ENSO indices]
MAM rainfall over Thailand can be predicted using a single predictor such as Feb NIÑO4 SSTs.
$$\hat{y} = \beta_0 + \beta_1 \cdot \mathrm{NINO4}$$
with $\beta_0 = 340$ mm, $\beta_1 = 50$, and $r = 0.48$.
A simple linear regression equation for predicting rainfall has two parameters:
• constant: how much rainfall we can expect on average when the value of the predictor is 0.
• coefficient: how much we can expect rainfall to increase or decrease when the predictor increases by 1.
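The roles of the two parameters can be seen in a minimal sketch. The function name `predict_mam_rainfall` is illustrative, the default constant (340 mm) and coefficient (50) are the values shown on the slide, and the predictor is assumed here to be a Feb NIÑO4 SST anomaly.

```python
def predict_mam_rainfall(nino4_feb, constant=340.0, coefficient=50.0):
    """Simple linear regression: y_hat = constant + coefficient * predictor."""
    return constant + coefficient * nino4_feb


# A predictor value of 0 returns the constant; each unit increase in the
# predictor changes the prediction by the coefficient.
for x in (-1.0, 0.0, 1.0):
    print(f"predictor = {x:+.1f} -> predicted rainfall = {predict_mam_rainfall(x):.0f} mm")
```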
In CPT the MLR (multiple linear regression) option allows for more than one predictor:
Multiple Linear Regression
$$Y = b_0 + \sum_{i=1}^{n} b_i X_i$$
where:
$Y$ = dependent variable
$X_i$ = independent variables
$b_i$ = regression coefficients
$n$ = number of independent variables
Let’s have some equations! The Multiple Regression Model
The coefficients are estimated by least squares, that is, by minimizing the sum of squared differences between the observed and predicted values of Y.
Multiple Linear Regression
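A minimal sketch of a least-squares fit with several predictors, using NumPy. The data are synthetic and purely illustrative (they are not the workshop data), and this is not a description of how CPT implements MLR internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): 50 years, 3 predictors.
n_years, n_pred = 50, 3
X = rng.normal(size=(n_years, n_pred))
true_b0, true_b = 2.0, np.array([1.5, -0.7, 0.3])
y = true_b0 + X @ true_b + rng.normal(scale=0.1, size=n_years)

# Add a column of ones so the constant b0 is estimated with the coefficients.
A = np.column_stack([np.ones(n_years), X])

# Least squares: choose b to minimise sum((y - A @ b) ** 2).
b_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

residual_ss = np.sum((y - A @ b_hat) ** 2)
print("estimated b0, b1..bn:", np.round(b_hat, 2))
print("sum of squared residuals:", round(float(residual_ss), 3))
```

Any other choice of coefficients gives a larger sum of squared residuals on the same data, which is what "least squares" means.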
Problems with Multiple Linear Regression
Multiplicity - Too many predictors from which to choose.
With more than a handful of candidate predictors, the probability of including at least one spurious predictor (and therefore of subsequently making a bad prediction) becomes very high.
Problems with Multiple Linear Regression
• Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a non-trivial degree of accuracy.
• When two X variables are highly correlated, they both convey essentially the same information.
• In this situation the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data.
• When multicollinearity occurs, least squares estimates are unbiased, but their variances are large so they may be far from the true value.
Multicollinearity: Example
MAM Feb5ˆ 340 NINO40y MAM Jan3ˆ 344 NINO48y
MAM Jan Febˆ 332 NINO4 NIN131 O475y
For the first half of the data (1961 – 1985) only:
MAM Jan Febˆ 330 NINO4 NIN17 O419y
Predicting MAM 1961 – 2010 rainfall for Thailand from NIÑO4 SSTs:
Correlation between NINO4Jan and NINO4Feb is 0.97.
This is a perfect example of coefficient estimates changing erratically in response to small changes in the model or the data, because of the strong correlation among the predictors.
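The same behaviour can be reproduced with a hedged sketch on synthetic data. The two predictors below are artificial stand-ins for a pair of highly correlated indices; none of the numbers are the Thailand or NIÑO4 values, and the helper `fit` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Synthetic stand-ins for two highly correlated predictors (e.g. Jan and Feb indices).
x_jan = rng.normal(size=n)
x_feb = x_jan + rng.normal(scale=0.25, size=n)
y = 340.0 + 50.0 * x_feb + rng.normal(scale=60.0, size=n)


def fit(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


X = np.column_stack([x_jan, x_feb])
print("corr(Jan, Feb)      :", round(float(np.corrcoef(x_jan, x_feb)[0, 1]), 2))
print("Feb only            :", np.round(fit(x_feb, y), 1))
print("Jan only            :", np.round(fit(x_jan, y), 1))
print("Jan + Feb           :", np.round(fit(X, y), 1))
print("Jan + Feb, 1st half :", np.round(fit(X[: n // 2], y[: n // 2]), 1))
```

With predictors this strongly correlated, the two-predictor coefficients typically differ a lot from the single-predictor ones and shift again when only half of the record is used, while the constant stays comparatively stable.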
MLR
Principal Components Regression
• Principal components regression is just like standard regression except the independent variables are principal components rather than the original X variables.
• Principal components regression (PCR) is a method for combating multicollinearity, and it can give better estimation and prediction than ordinary least squares when the predictors are highly correlated.
What is principal components analysis?
• Principal components are new variables which are linear combinations of the X's. The new variables are not correlated with each other.
• The principal components transformation is equivalent to a rotation of axes.
Credit: Dave Garen
PCA was invented in 1901 by Karl Pearson. It was later independently developed (and named) by Harold Hotelling in the 1930s.
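A minimal sketch of these properties, assuming a singular value decomposition of the centred data is an acceptable way to obtain the principal components. The data are synthetic and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic, correlated predictors (illustrative only): 40 years x 5 variables.
base = rng.normal(size=(40, 2))
X = base @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(40, 5))

# Centre each variable, then take the singular value decomposition.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

loadings = Vt.T          # columns are the weights; together they form a rotation
scores = Xc @ loadings   # principal components: linear combinations of the X's

# The principal components are uncorrelated: their covariance matrix is diagonal.
cov_scores = np.cov(scores, rowvar=False)
off_diag = cov_scores - np.diag(np.diag(cov_scores))
print("largest off-diagonal covariance:", float(np.abs(off_diag).max()))
print("variance of each PC:", np.round(np.diag(cov_scores), 3))
```

The loadings matrix is orthonormal, which is why the transformation can be read as a rotation of the axes, and the variances come out in decreasing order.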
Principal components
• Principal components analysis is specifically designed as a data reduction technique.
• Principal components analysis is sometimes also known as empirical orthogonal function (EOF) analysis.
Understanding PCA in a simple way
Diagrammatic Representation of Principal Components
Selection of PCs
How many of the new variables should be retained to represent the total variability of the original variables adequately? A stopping rule is required to identify at which point additional principal components are no longer required.
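One simple stopping rule, shown here only as an illustration, keeps the smallest number of components whose cumulative explained variance reaches a chosen fraction. The 90% threshold and the function name `n_components_for_variance` are assumptions; the rule CPT itself applies is not described in this transcript.

```python
import numpy as np


def n_components_for_variance(X, threshold=0.9):
    """Smallest number of PCs whose cumulative explained variance reaches threshold."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    explained = s**2 / np.sum(s**2)        # fraction of variance per component
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(s)), explained


# Synthetic data (illustrative only): two underlying signals spread over 5 variables.
rng = np.random.default_rng(2)
base = rng.normal(size=(40, 2))
X = base @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(40, 5))

k, explained = n_components_for_variance(X, threshold=0.9)
print("explained variance fractions:", np.round(explained, 3))
print("components retained:", k)
```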
Visualization of Principal Components: Example from SST
Scores and loadings for first principal component of February 1961 – 2000 sea-surface temperatures.
A principal component is a weighted sum of a set of original variables, with the weights set so that the principal component has maximum variance.
Principal Components
The score indicates how intensely developed the loading pattern is for each year.
Scores and loadings for second principal component of February 1961 – 2000 sea-surface temperatures.
Separate patterns (“modes”) of variability can be defined. We can use just a few of these modes to represent the SST variability throughout the domain.
Principal Components
Principal Component Regression: Math (boring)
Principal Component
The principal components are orthogonal to each other, that is:
Elimination of Principal Components.
Transformation Back to the Original Variables:
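The equations for these steps did not survive in the transcript, so the following is a sketch of one standard formulation rather than the slide's own derivation. The function `pcr_fit` and the synthetic data are illustrative: components come from an SVD of the centred predictors, trailing components are dropped, the predictand is regressed on the rest, and the result is transformed back to coefficients on the original variables.

```python
import numpy as np


def pcr_fit(X, y, n_keep):
    """Principal components regression keeping the first n_keep components.

    Returns (intercept, coefficients) expressed in terms of the original X.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean

    # Loadings have orthonormal columns, so V.T @ V is the identity matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_keep].T            # eliminate the trailing components
    Z = Xc @ V                   # scores of the retained components

    # Regress y on the retained, mutually uncorrelated components.
    gamma, *_ = np.linalg.lstsq(Z, yc, rcond=None)

    # Transform back to coefficients on the original variables.
    beta = V @ gamma
    intercept = y_mean - x_mean @ beta
    return intercept, beta


# Synthetic, highly correlated predictors (illustrative only).
rng = np.random.default_rng(3)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 0.25 * rng.normal(size=50)])
y = 340.0 + 50.0 * X[:, 1] + rng.normal(scale=60.0, size=50)

print("PCR, 1 component :", pcr_fit(X, y, n_keep=1))
print("PCR, 2 components:", pcr_fit(X, y, n_keep=2))
```

Keeping every component reproduces the ordinary least-squares fit; the benefit of PCR comes from dropping the low-variance components that make the least-squares coefficients unstable.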
Selecting Models in CPT
• MLR can be used when there is one predictor or a very small number of predictors (independent of each other).
• PCR can be used to address problems with MLR that arise when there are many predictors.
• But what if there are many predictands?
Canonical Correlation Analysis
Some terminology of CCA
Visualization of Canonical Correlation Analysis: Example from SST and Rain
[Figure: CCA mode 1, r = 0.73. Panels: Feb SSTs, 1961-2000; MAM rainfall]
Canonical Correlation Analysis
[Figure: CCA mode 2, r = 0.67. Panels: Feb SSTs, 1961-2000; MAM rainfall]
Selecting Models in CPT
• MLR can be used when there is one predictor or a very small number of predictors (independent of each other).
• PCR can be used to address problems with MLR that arise when there are many predictors.
• CCA can be used if there are many predictors AND many predictands. It can also be used even if there are only a few of each.
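For the many-predictors, many-predictands case, a minimal CCA sketch is given below. It assumes more samples than variables, works on the raw variables (any pre-filtering step is omitted), and uses synthetic data. The function name `cca` and the SVD-based approach are choices made for this illustration, not a description of CPT's implementation.

```python
import numpy as np


def cca(X, Y):
    """Canonical correlation analysis via orthonormal bases of centred X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Ux, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Uy, _, _ = np.linalg.svd(Yc, full_matrices=False)
    # Singular values of Ux.T @ Uy are the canonical correlations.
    A, corr, Bt = np.linalg.svd(Ux.T @ Uy, full_matrices=False)
    x_scores = Ux @ A       # canonical variates for the predictors
    y_scores = Uy @ Bt.T    # canonical variates for the predictands
    return corr, x_scores, y_scores


# Synthetic data (illustrative only): one signal shared by 4 predictors and 3 predictands.
rng = np.random.default_rng(4)
shared = rng.normal(size=(40, 1))
X = shared @ rng.normal(size=(1, 4)) + rng.normal(size=(40, 4))
Y = shared @ rng.normal(size=(1, 3)) + rng.normal(size=(40, 3))

corr, x_scores, y_scores = cca(X, Y)
print("canonical correlations by mode:", np.round(corr, 2))
```

Each mode pairs one predictor pattern with one predictand pattern, and the modes are ordered by decreasing correlation, as in the "Mode 1" and "Mode 2" figures.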
Making Probability Forecasts in CPT
• Seasonal forecasts are generally expressed as tercile probabilities.
• Let's do a hands-on exercise to understand this.
Example data: 10.13038, 27.59568, 13.42799, 13.96082, 21.76947, 16.92497, 18.6818, 25.95358, 30.46833, 18.02041,
23.27678, 17.61698, 22.29597, 24.39998, 13.83134, 22.74837, 26.01102, 20.92308, 37.29841, 13.91443, 12.6294, 2.501207, 29.10483, 28.67083, 19.20107, 28.98476, 21.83703, 22.9079, 21.12945, 24.39952
Mean = 21.0205 and SD = 7.0933.
Based on percentiles we can estimate the thresholds: lower bound (33rd percentile) = 18.3511, upper bound (67th percentile) = 23.8382.
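These summary values can be checked with a short sketch using the example data. The midpoint convention for the thresholds is an assumption, chosen because it agrees with the values quoted above; other percentile definitions give slightly different numbers.

```python
import statistics

data = [
    10.13038, 27.59568, 13.42799, 13.96082, 21.76947, 16.92497, 18.6818,
    25.95358, 30.46833, 18.02041, 23.27678, 17.61698, 22.29597, 24.39998,
    13.83134, 22.74837, 26.01102, 20.92308, 37.29841, 13.91443, 12.6294,
    2.501207, 29.10483, 28.67083, 19.20107, 28.98476, 21.83703, 22.9079,
    21.12945, 24.39952,
]

mean = statistics.mean(data)
sd = statistics.stdev(data)   # sample standard deviation (divides by n - 1)

# With 30 values, the terciles split the sorted data into three groups of 10.
# Each threshold is taken midway between the neighbouring groups.
s = sorted(data)
third = len(s) // 3
lower = (s[third - 1] + s[third]) / 2
upper = (s[2 * third - 1] + s[2 * third]) / 2

print(f"mean = {mean:.4f}, SD = {sd:.4f}")
print(f"lower threshold = {lower:.4f}, upper threshold = {upper:.4f}")
```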
Plot the Frequency and Probabilities
$P(X < 18.35)$ and $P(X > 23.83) = 1 - P(X < 23.83)$
Histogram
Probability Distribution Function based on Normal Distribution
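A sketch of how such probabilities can be computed from a fitted normal distribution, using the mean, SD and thresholds quoted for the example data. Treating the data as normally distributed is the slide's own assumption; the variable names are illustrative.

```python
from statistics import NormalDist

# Climatological distribution fitted to the example data.
climatology = NormalDist(mu=21.0205, sigma=7.0933)
lower, upper = 18.3511, 23.8382

p_below = climatology.cdf(lower)          # P(X < lower)
p_above = 1.0 - climatology.cdf(upper)    # P(X > upper) = 1 - P(X < upper)
p_normal = 1.0 - p_below - p_above

print(f"below normal: {p_below:.1%}")
print(f"near normal : {p_normal:.1%}")
print(f"above normal: {p_above:.1%}")
```

Because the thresholds come from the sample and the curve is a fitted normal, the three probabilities are close to, but not exactly, one third each.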
• What will the guess be if no forecast is available?
– The outcome could fall in any of the three categories.
• So what are the chances of getting below normal?
– 33%
• Above normal?
– 33%
• Near normal?
– 33%
• If the forecast always assigns a 33% probability to each of the three categories, we call it a no-skill forecast.
• Now suppose that in one year our models give a mean forecast of 30.
• By various methods we calculate the spread (standard deviation); let's take it to be 6.
• We then generate the forecast PDF.
[Figure: blue is the climatological PDF, red is the forecast PDF. Percentage labels on the figure: 2%, 14%, 84%]
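The tercile probabilities for this forecast can be sketched as follows, assuming a normal forecast distribution with the mean (30) and spread (6) from the example and the climatological thresholds from the example data. The result should be close to, though not identical to, the rounded percentages on the figure.

```python
from statistics import NormalDist

lower, upper = 18.3511, 23.8382             # climatological tercile thresholds
forecast = NormalDist(mu=30.0, sigma=6.0)   # forecast mean and spread from the example

p_below = forecast.cdf(lower)
p_normal = forecast.cdf(upper) - forecast.cdf(lower)
p_above = 1.0 - forecast.cdf(upper)

print(f"below normal: {p_below:.0%}")
print(f"near normal : {p_normal:.0%}")
print(f"above normal: {p_above:.0%}")
```

Compared with the 33% / 33% / 33% no-skill guess, the forecast shifts most of the probability into the above-normal category.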
SO WE LEARNED ABOUT THE METHODS IN CPT. WHAT DO THEY REALLY MEAN?
SORRY, I'M NOT PREPARED FOR IN-DEPTH QUESTIONS
Questions?
web: iri.columbia.edu
@climatesociety
…/climatesociety