Risk Prediction Using Quantified News
Vu Anh Huynh1 Liang Zhang2
1Department of Aeronautics and Astronautics, MIT2Department of Nuclear Science and Engineering, MIT
Acknowledgement: Dan diBartolomeo, Anish Shah, Louis Scott
Northfields 18th Annual Summer Seminar
June 7, 2013
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 1 / 16
Outline
1 Problem Definition
2 Approach
3 Data
4 Regression Model
5 Results
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 2 / 16
Problem Definition
AAPL
Risk Prediction Using Quantified News
Given time-series of prices and intraday volatility of a security,
Given time-series of quantified news for the same security,
Build a model to predict the next intraday volatility.
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 3 / 16
Problem Definition
AAPL
Risk Prediction Using Quantified News
Given time-series of prices and intraday volatility of a security,
Given time-series of quantified news for the same security,
Build a model to predict the next intraday volatility.
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 3 / 16
Mathematical Formulation
Risk Prediction Using Quantified News
{Pt
}0t
: Price time-series,
{�t
}0t
: Intraday volatility time-series:
�t
=p252⇡/8 log(P
high
/Plow
)
{nt
}0t
where n
t
2 RK : News time-series where n
t
= 0 for dateswithout news.
Find a model F such that:
g(�t+1) = F
⇣{P⌧}0⌧t
, {�⌧}0⌧t
, {n⌧}0⌧t
⌘+ ✏
t+1, (1)
where g : R ! R is a function that describes a property of �t+1.
The choice of the function g is important.
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 4 / 16
Mathematical Formulation
Risk Prediction Using Quantified News
{Pt
}0t
: Price time-series,
{�t
}0t
: Intraday volatility time-series:
�t
=p252⇡/8 log(P
high
/Plow
)
{nt
}0t
where n
t
2 RK : News time-series where n
t
= 0 for dateswithout news.
Find a model F such that:
g(�t+1) = F
⇣{P⌧}0⌧t
, {�⌧}0⌧t
, {n⌧}0⌧t
⌘+ ✏
t+1, (1)
where g : R ! R is a function that describes a property of �t+1.
The choice of the function g is important.
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 4 / 16
Approach: Using regression and model selection
There are many approaches to construct such a model F .
Each approach is chosen based on the question that we pose aboutthe prediction ability.
In this work, we use regression and model selection to answer thequestion:What is the numerical value of tomorrow’s intraday volatility?
(i.e. g(x)=x)
�t+1 = F
⇣{P⌧}0⌧t
, {�⌧}0⌧t
, {n⌧}0⌧t
⌘+ ✏
t+1.
We will compare performance against a base model, which is a
random walk model.
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 5 / 16
Approach: Using regression and model selection
There are many approaches to construct such a model F .
Each approach is chosen based on the question that we pose aboutthe prediction ability.
In this work, we use regression and model selection to answer thequestion:What is the numerical value of tomorrow’s intraday volatility?
(i.e. g(x)=x)
�t+1 = F
⇣{P⌧}0⌧t
, {�⌧}0⌧t
, {n⌧}0⌧t
⌘+ ✏
t+1.
We will compare performance against a base model, which is a
random walk model.
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 5 / 16
Approach: Using regression and model selection
There are many approaches to construct such a model F .
Each approach is chosen based on the question that we pose aboutthe prediction ability.
In this work, we use regression and model selection to answer thequestion:What is the numerical value of tomorrow’s intraday volatility?
(i.e. g(x)=x)
�t+1 = F
⇣{P⌧}0⌧t
, {�⌧}0⌧t
, {n⌧}0⌧t
⌘+ ✏
t+1.
We will compare performance against a base model, which is a
random walk model.
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 5 / 16
Data: RavenPack – analytics engine converting news text to quantitative data
News data provided by Ravenpack {nt
}0t
More than half a million pieces of news per month (nearly 1000 newsper hour).
For each news, detailed information is provided, such as type,relevance, event sentiment, novelty, etc. (see next slide)
Entity Relevance ESS CSS AEV NIP AEV
t1company A 100 85 55 77 12 68
t2company B ... ... ... ... ... ...
t3company A ... ... ... ... ... ...
t4company C ... ... ... ... ... ...
t5company B ... ... ... ... ... ...
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 6 / 16
Data
News Data Field Descriptions (highlights)
Entity: company names.
RELEVANCE(N): 0-100, indicates how closely related the entity isto the underlying news, can be converged to number of news per day.
ESS: 0-100, represents the news sentiment, i.e. higher values indicatepostive sentiment, while lower values below 50 show negativesentiment.
AES: 0-100, the percentage of positive events measured over a rolling90 day window.
AEV: the count of events measured over a rolling 90 day window.
ENS: 0-100, represents how ”new” or novel a news is.
CSS: 0-100, sentiment score combined various analysis techniques.
NIP: 0-100, the degree of impact a news flash has on the market overthe following 2-hr period.
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 7 / 16
Regression Model Selection: General Result and Fundamental Limit
Consider nested models: F1 ✓ F2 ✓ F3 ✓ ... used for training toestimate parameters of the models,
Split data into training set and testing set,
Report training error and testing error for each model Fi
:
Implication: Balance between training error and complexity of a model.
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 8 / 16
Regression Model Selection: General Result and Fundamental Limit
Consider nested models: F1 ✓ F2 ✓ F3 ✓ ... used for training toestimate parameters of the models,
Split data into training set and testing set,
Report training error and testing error for each model Fi
:
Implication: Balance between training error and complexity of a model.
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 8 / 16
An Example of Nested Models
1log(�
t+1) = log(�t
) + ✏t+1
2log(�
t+1) = log(�t
) + N
t
+ ✏t+1
3log(�
t+1) = log(�t
) + N
t
+ CSS
t
+ ✏t+1
4log(�
t+1) = log(�t
) ⇤ Nt
+ CSS
t
+ ✏t+1
5log(�
t+1) =⇣log(�
t
) + CSS
t
⌘⇤ N
t
+ ✏t+1
6log(�
t+1) =⇣log(�
t
) + CSS
t
⌘⇤ N
t
+ ENS
t
+ ✏t+1
7log(�
t+1) =⇣log(�
t
) + CSS
t
⌘⇤ N
t
+ ENS
t
+ NIP
t
+ ✏t+1
8log(�
t+1) =⇣log(�
t
)+CSS
t
⌘⇤N
t
+ENS
t
⇤NIPt
+AES
t
+AEV
t
+✏t+1
9log(�
t+1) =⇣log(�
t
) + CSS
t
+ ENS
t
⌘⇤ N
t
+ ENS
t
⇤ NIPt
+ AES
t
⇤ AEVt
+ ✏t+1
10log(�
t+1) =⇣log(�
t
)+CSS
t
+ENS
t
+ESS
t
⌘⇤N
t
+ENS
t
⇤NIPt
+AES
t
⇤AEVt
+✏t+1
Note: A ⇤ B means A+ B + AxB
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 9 / 16
Results
Google (NASDAQ: GOOG)
Jan Mar May Jul Sep Nov
20
60
100
Dates
Intr
a!
day
Vola
tility
(%
) RealizedTrainingTesting
Jan Mar May Jul Sep Nov
20
60
100
DatesIn
tra!
day
Vola
tility
(%
) RealizedTrainingTesting
log(�t+1) = log(�
t
) + ✏t+1 log(�
t+1) =⇣log(�
t
) + CSS
t
⌘⇤ N
t
+ ENS
t
+ NIP
t
+ ✏t+1
R
2 = 1.3% R
2 = 31.4%
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 10 / 16
Results
Google (NASDAQ: GOOG)
5.6
5.8
6
6.2
6.4
6.6
6.8
0 1 2 3 4 5 6 7 8 9 10
11
12
13
14
15
16T
rain
ing e
rror
Test
ing e
rror
Model ID (in order of complexity)
TrainingTesting
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 11 / 16
Results
Exxon Mobil (NASDAQ: XOM)
Jan Mar May Jul Sep Nov
515
25
35
Dates
Intr
a!
day
Vola
tility
(%
) RealizedTrainingTesting
Jan Mar May Jul Sep Nov
515
25
35
DatesIn
tra!
day
Vola
tility
(%
) RealizedTrainingTesting
log(�t+1) = log(�
t
) + ✏t+1 log(�
t+1) =⇣log(�
t
) + AES
t
+ CSS
t
+ ENS
t
⌘⇤ AEV
t
+⇣CSS
t
+ ESS
t
⌘⇤ N
t
+⇣NIP
t
+ ESS
t
⌘⇤ ENS
t
+ ✏t+1
R
2 = �1.8% R
2 = 18.4%
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 12 / 16
Results
Exxon Mobil (NASDAQ: XOM)
3.1
3.2
3.3
3.4
3.5
3.6
3.7
0 2 4 6 8 10 12 14 4.8
4.9
5
5.1
5.2
5.3
5.4T
rain
ing e
rror
Test
ing e
rror
Model ID (in order of complexity)
TrainingTesting
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 13 / 16
Results
Apple (NASDAQ: AAPL)
Jan Mar May Jul Sep Nov
10
30
50
Dates
Intr
a!
day
Vo
latil
ity (
%) Realized
TrainingTesting
9
9.5
10
10.5
11
0 1 2 3 4 5 6 7 8 9 8
8.5
9
9.5
10
10.5
11
11.5
12
Tra
inin
g e
rro
r
Te
stin
g e
rro
r
Model ID (in order of complexity)
TrainingTesting
No model can be better than the simple random walk model (R2 = 24.4%)
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 14 / 16
Statistics of regression models
We investigated di↵erent regression models for over 150 stocks in theyear of 2012There are many stocks that are news neutral, in the sense thatvolatility is indi↵erent to any factors from news analytics.For stocks that are response to news factor, volatility is consistentlycorrelated with certain news factors (as shown in the figures below).
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 15 / 16
Conclusions and Future work
We have investigated the regression approach to model how newsa↵ects intraday volatity of securities.
For certain stocks, we built up nested regression model tosignificantly improve the prediction of intra-volatility .
There are news neutral stocks for which news-incorporated models donot outperform the random walk model.
For many stocks, volatility is positively related to the number of newsas well as the news sentiment.
We are building statistics on a larger dataset to support ourhypothesis.
Huynh, Zhang (MIT) Risk Prediction Using Quantified News June 7, 2013 16 / 16