visualization of big time series data
TRANSCRIPT
-
Visualisation of big time series data
Visualisation of big time series data 1
Rob J Hyndman
with Earo Wang, Nikolay Laptev, Yanfei Kang, Kate Smith-Miles
-
Outline
1 The problem
2 Australian tourism demand
3 M3 competition data
4 Yahoo web traffic
5 What next?
-
Spectacle sales
Monthly sales data from 2000 to 2014
Provided by a large spectacle manufacturer
Split by brand (26), gender (3), price range (6), materials (4), and stores (600)
About a million disaggregated series
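As a quick check on the scale, the category counts listed above can be multiplied together. This is only a sketch: it assumes every combination of brand, gender, price range, material and store actually occurs, which the slide does not state, so the result is an upper bound. The dictionary keys are illustrative names.

```python
from math import prod

# Category counts as listed on the slide (key names are illustrative).
levels = {"brand": 26, "gender": 3, "price_range": 6, "materials": 4, "stores": 600}

# Assumption: every combination occurs, so this is an upper bound on the
# number of bottom-level (fully disaggregated) series.
n_series = prod(levels.values())
print(n_series)  # 1123200
```

The product is a little over 1.1 million, consistent with "about a million".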
-
Fulcher collection
www.comp-engine.org/timeseries
38,190 time series from many sources
Over 20,000 real series from meteorology, medicine, audio, astrophysics, finance, etc.
Over 10,000 simulated series from various chaotic and stochastic models.
-
FRED: research.stlouisfed.org/fred2/
-
Quandl: www.quandl.com
-
How to plot lots of time series?
[Figure: time series plot against Time; both axes run from 0.0 to 1.0]
-
Key idea
Cognostics (John W Tukey)
Computer-produced diagnostics (Tukey and Tukey, 1985).
Called features or characteristics in the machine learning literature.
Examples for time series:
lag correlation
size and direction of trend
strength of seasonality
timing of peak seasonality
spectral entropy
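To make the idea concrete, here is a minimal sketch of reducing each series in a collection to a couple of numbers. It covers only two of the examples above (lag-1 autocorrelation and a least-squares trend slope); the function names and the simulated collection are illustrative assumptions, not part of the talk.

```python
import random

def lag1_autocorrelation(y):
    """Lag-1 autocorrelation r1 of a sequence (0.0 for a constant series)."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    if denom == 0:
        return 0.0
    num = sum((y[t] - mean) * (y[t - 1] - mean) for t in range(1, n))
    return num / denom

def trend_slope(y):
    """Least-squares slope of y regressed on time 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y[t] - y_mean) for t in range(n))
    return sxy / sxx

# Simulated stand-in for a large collection of series.
random.seed(1)
collection = {
    "upward": [0.5 * t + random.gauss(0, 1) for t in range(40)],
    "noise": [random.gauss(0, 1) for _ in range(40)],
}

# One row of features per series: this table is what gets visualised.
features = {
    name: (lag1_autocorrelation(y), trend_slope(y))
    for name, y in collection.items()
}
for name, (r1, slope) in features.items():
    print(f"{name}: r1={r1:.2f}, slope={slope:.2f}")
```

Each series, however long, becomes one point in a low-dimensional feature space, which is far easier to plot than the raw series.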
-
Australian tourism demand
Quarterly data on visitor nights from 1998:Q1 to 2013:Q4
From: National Visitor Survey, based on annual interviews of 120,000 Australians aged 15+, collected by Tourism Research Australia.
Split by 7 states, 27 zones and 76 regions (a geographical hierarchy)
Also split by purpose of travel: Holiday, Visiting friends and relatives (VFR), Business, Other
304 disaggregated series
-
Domestic tourism demand: Victoria
[Figure: time series panels for the Victorian regions (codes BAA to BEG), one panel per region and purpose of travel (Hol, Vis, Bus, Oth)]
-
An STL decomposition
Tourism demand for holidays in Peninsula
$Y_t = S_t + T_t + R_t$, where $S_t$ is periodic with mean 0
[Figure: STL decomposition panels (data, seasonal, trend, remainder) against time, 2000 to 2010]
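The additive split into seasonal, trend and remainder parts can be sketched in a few lines. Note that this is not STL: STL estimates the components with loess smoothing, whereas the sketch below swaps in a classical decomposition (centred moving average for the trend, per-season means for the seasonal part) purely to show the structure $Y_t = S_t + T_t + R_t$ with a mean-zero periodic $S_t$. The function name and the simulated quarterly data are assumptions.

```python
def classical_decompose(y, period):
    """Additive decomposition y = seasonal + trend + remainder.

    A simple stand-in for STL. Assumes an even period (e.g. 4 or 12)
    and at least two full cycles of data.
    """
    n = len(y)
    half = period // 2
    trend = [None] * n  # undefined at the first and last `half` points
    for t in range(half, n - half):
        window = y[t - half : t + half + 1]
        # 2 x period moving average: the two end points get half weight.
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[t] = total / period

    # Average the detrended values for each position in the cycle.
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += y[t] - trend[t]
            counts[t % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    centre = sum(means) / period
    cycle = [m - centre for m in means]  # force the seasonal part to mean 0

    seasonal = [cycle[t % period] for t in range(n)]
    remainder = [
        None if trend[t] is None else y[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return seasonal, trend, remainder

# Simulated quarterly series: linear trend plus a fixed seasonal pattern.
pattern = [2.0, -1.0, -3.0, 2.0]
y = [10 + 0.5 * t + pattern[t % 4] for t in range(24)]
seasonal, trend, remainder = classical_decompose(y, period=4)
print([round(s, 2) for s in seasonal[:4]])
```

On this noise-free example the recovered seasonal cycle matches the simulated pattern and the remainder is essentially zero; on real data the remainder carries whatever the trend and seasonal parts do not explain.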
-
Seasonal stacked bar chart
Place positive values above the origin and negative values below it
Map the bar length to the magnitude
Encode quarters by colours
[Figure: stacked bars of the seasonal component for Holiday travel by region (BAA to BEG), coloured by quarter (Q1 to Q4); vertical axis from -1.0 to 1.0]
-
Seasonal stacked bar chart: VIC
[Figure: seasonal stacked bar charts for Victoria, one panel per purpose of travel (Holiday, VFR, Business, Other), with regions BAA to BEG on the horizontal axis, the seasonal component from -1.0 to 1.0 on the vertical axis, and bars coloured by quarter (Q1 to Q4)]
-
Trend analysis
Linearity: the long-term direction and strength of trend.
Curvature: the changing direction of trend.
Estimate by regression:
$T_t = \beta_0 + \beta_1 \phi_1(t) + \beta_2 \phi_2(t) + e_t$
where $\phi_k(t)$ is a $k$th-degree orthogonal polynomial in time $t$.
This separates the linearity ($\beta_1$) from the curvature ($\beta_2$).
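A minimal sketch of this regression, assuming the orthogonal polynomials are built by Gram-Schmidt on the time index and scaled to unit length. Because the two polynomials are orthonormal and orthogonal to the constant, the least-squares coefficients reduce to inner products with the trend. The function names are illustrative, and the scaling convention is an assumption rather than something the slide specifies.

```python
def orthogonal_polynomials(n):
    """Orthonormal linear and quadratic polynomials on t = 0, ..., n-1 (n >= 3)."""
    t = [float(i) for i in range(n)]
    t_mean = sum(t) / n
    p1 = [v - t_mean for v in t]  # linear term, orthogonal to the constant

    sq = [v * v for v in p1]
    sq_mean = sum(sq) / n
    # Remove the constant and linear parts from the squared term.
    proj = sum(a * b for a, b in zip(sq, p1)) / sum(a * a for a in p1)
    p2 = [s - sq_mean - proj * a for s, a in zip(sq, p1)]

    def unit(v):
        norm = sum(a * a for a in v) ** 0.5
        return [a / norm for a in v]

    return unit(p1), unit(p2)

def linearity_curvature(trend):
    """Return (beta1, beta2) from regressing the trend on phi1 and phi2."""
    phi1, phi2 = orthogonal_polynomials(len(trend))
    beta1 = sum(a * b for a, b in zip(phi1, trend))
    beta2 = sum(a * b for a, b in zip(phi2, trend))
    return beta1, beta2

# A straight line has linearity but no curvature; a parabola centred on
# the middle of the sample has curvature but no linearity.
line = [2.0 * t for t in range(10)]
bowl = [(t - 4.5) ** 2 for t in range(10)]
print(linearity_curvature(line))
print(linearity_curvature(bowl))
```

Using orthogonal polynomials rather than raw powers of $t$ is what lets the two coefficients be read separately: changing one does not alter the estimate of the other.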
-
Trend analysis
[Figure: trend linearity by region (BAA to BEG), one panel per purpose of travel (Holiday, VFR, Business, Other); vertical axis from 0 to 4, with the direction of trend shown by sign]
-
Corrgram of remainder
[Figure: corrgram of the remainder components for all Victorian region and purpose series (labels such as BEEHol, BEFOth, BEEOth), with a colour scale from -1 to 1]
Compute the correlations among the remainder components
Render both the sign and magnitude using a colour mapping of two hues
Order variables according to the first principal component of the correlations.
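The ordering step can be sketched as follows: compute the correlation matrix, find its leading eigenvector, and sort the series by their loadings. The sketch uses plain power iteration to avoid dependencies; the function names and the four toy series are assumptions made for illustration.

```python
def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def first_eigenvector(matrix, iterations=200):
    """Leading eigenvector of a symmetric matrix by power iteration."""
    n = len(matrix)
    v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-zero start
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(a * a for a in w) ** 0.5
        v = [a / norm for a in w]
    return v

def corrgram_order(series):
    """Return series names sorted by loading on the first principal component."""
    names = list(series)
    corr = [[correlation(series[a], series[b]) for b in names] for a in names]
    loadings = first_eigenvector(corr)
    order = [name for _, name in sorted(zip(loadings, names))]
    return order, corr

# Toy remainder series: a and b move together, c moves against them.
series = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "d": [3, 1, 4, 1, 5, 9, 2, 6],
    "c": [-1.2, -1.9, -3.1, -3.8, -5.1, -6.2, -6.8, -8.1],
    "b": [1.1, 2.3, 2.9, 4.2, 4.8, 6.1, 7.2, 7.9],
}
order, corr = corrgram_order(series)
print(order)
```

Sorting by the loadings places similarly behaving series next to each other, so blocks of like-signed correlations become visible in the plotted matrix.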
-
Corrgram of remainder
[Figure: corrgram of the remainder components for the Holiday series only (BDAHol to BEEHol), with a colour scale from -1 to 1]
-
Corrgram of remainder: TAS
[Figure: corrgram of the remainder components for the Tasmanian series (regions FAA, FBA, FBB, FCA, FCB by purpose Hol, Vis, Bus, Oth), with a colour scale from -1 to 1]
-
M3 forecasting competition
"The M3-Competition is a final attempt by the authors to settle the accuracy issue of various time series methods. . . The extension involves the inclusion of more methods/researchers (in particular in the areas of neural networks and expert systems) and more series."
Makridakis & Hibon, IJF 2000
3003 series
All data from business, demography, finance andeconomics.
Series length between 14 and 126.
Either non-seasonal, monthly or quarterly.
All time series positive.
-
Candidate features
STL decomposition: $Y_t = S_t + T_t + R_t$
Seasonal period
Strength of seasonality: $1 - \dfrac{\mathrm{Var}(R_t)}{\mathrm{Var}(Y_t - T_t)}$
Strength of trend: $1 - \dfrac{\mathrm{Var}(R_t)}{\mathrm{Var}(Y_t - S_t)}$
Spectral entropy: $H = -\int_{-\pi}^{\pi} f_y(\lambda) \log f_y(\lambda)\, d\lambda$, where $f_y(\lambda)$ is the spectral density of $Y_t$. Low values of $H$ suggest a time series that is easier to forecast (more signal).
Autocorrelations: $r_1, r_2, r_3, \dots$
Optimal Box-Cox transformation parameter $\lambda$
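A rough sketch of three of these features, assuming the decomposition components are already available as lists. The strength measures follow the variance ratios above. The spectral entropy is estimated here from a normalised periodogram and rescaled to lie between 0 and 1; that estimator and rescaling are assumptions of this sketch, not necessarily what was used in the talk.

```python
import cmath
import math
import random

def variance(x):
    """Sample variance."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / (len(x) - 1)

def strength_of_seasonality(y, trend, remainder):
    """1 - Var(R_t) / Var(Y_t - T_t)."""
    detrended = [a - b for a, b in zip(y, trend)]
    return 1 - variance(remainder) / variance(detrended)

def strength_of_trend(y, seasonal, remainder):
    """1 - Var(R_t) / Var(Y_t - S_t)."""
    deseasonalised = [a - b for a, b in zip(y, seasonal)]
    return 1 - variance(remainder) / variance(deseasonalised)

def spectral_entropy(y):
    """Entropy of the normalised periodogram, scaled to [0, 1].

    Assumes a non-constant series with at least 4 observations.
    """
    n = len(y)
    mean = sum(y) / n
    x = [v - mean for v in y]
    m = n // 2
    power = []
    for k in range(1, m + 1):
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(coef) ** 2)
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(m)

# A pure sine wave has nearly all its power at one frequency (low entropy);
# white noise spreads power across frequencies (high entropy).
sine = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(64)]
print(round(spectral_entropy(sine), 3), round(spectral_entropy(noise), 3))

# With a zero remainder, the seasonal and trend strengths are both 1.
seasonal = [1.0, -1.0] * 4
trend = [float(t) for t in range(8)]
remainder = [0.0] * 8
y = [s + t + r for s, t, r in zip(seasonal, trend, remainder)]
print(strength_of_seasonality(y, trend, remainder), strength_of_trend(y, seasonal, remainder))
```

Both strength measures approach 1 when the remainder is small relative to the component of interest, and fall towards 0 as the remainder dominates.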
-
Candidate features
[Figures: example M3 series illustrating each feature in turn, three panels per feature. Seasonality, Trend, ACF1 and Spectral entropy: panels labelled N0001, N1502 and N3003. Box Cox: panels labelled N0005, N2269 and N3003.]
-
Candidate features
[Figure: scatterplot matrix of the six features: SpecEntr, Trend, Season, Freq, ACF, Lambda]
-
Dimension reduction for time series
Feature calculation, followed by principal component decomposition
[Figure: the collection of series is reduced to a scatterplot matrix of features (SpecEntr, Trend, Season, Freq, ACF, Lambda) and then to a scatterplot of PC2 against PC1]
-
Feature space of M3 data
First two PCs explain 68% of variation.
[Figure: scatterplot of PC2 against PC1 for the M3 series, repeated with points coloured by each feature in turn: Freq, Season, Trend, ACF, SpecEntr, Lambda]
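The projection of a feature table onto its first two principal components can be sketched without any libraries. The sketch standardises each feature, forms the correlation matrix, and extracts the top two eigenvectors by power iteration with deflation; the share of variation explained is the sum of the two eigenvalues divided by the number of features. The function names and the small feature table are made up for illustration, and the method assumes the data have at least two non-degenerate directions of variation.

```python
def standardise(rows):
    """Centre and scale each column to mean 0 and sample sd 1."""
    columns = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = (sum((v - mean) ** 2 for v in col) / (len(col) - 1)) ** 0.5
        columns.append([(v - mean) / sd for v in col])
    return [list(r) for r in zip(*columns)]

def leading_eigen(matrix, iterations=1000):
    """Largest eigenvalue and eigenvector of a symmetric PSD matrix."""
    n = len(matrix)
    v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-zero start
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(a * a for a in w) ** 0.5
        v = [a / norm for a in w]
    # Rayleigh quotient gives the eigenvalue for the converged vector.
    value = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    return value, v

def pca_2d(rows):
    """Scores on the first two PCs and the proportion of variation explained."""
    z = standardise(rows)
    n, p = len(z), len(z[0])
    corr = [
        [sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    l1, v1 = leading_eigen(corr)
    # Deflation: remove the first component, then repeat.
    deflated = [
        [corr[i][j] - l1 * v1[i] * v1[j] for j in range(p)] for i in range(p)
    ]
    l2, v2 = leading_eigen(deflated)
    scores = [
        (sum(a * b for a, b in zip(r, v1)), sum(a * b for a, b in zip(r, v2)))
        for r in z
    ]
    explained = (l1 + l2) / p  # trace of a correlation matrix is p
    return scores, explained

# Made-up feature table: one row per series, one column per feature.
# The second column is an exact linear function of the first.
feature_rows = [(x, 2 * x + 1, z) for x, z in zip(range(1, 7), [2, 7, 1, 8, 2, 8])]
scores, explained = pca_2d(feature_rows)
print(round(explained, 3))
```

In this toy table two of the three features are perfectly correlated, so two components capture essentially all of the variation; with real features the explained share is lower.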
-
Predictability
Three general forecasting methods:
Theta method: best overall in the 2000 M3 competition
ETS: exponential smoothing state space models
STL-AR: AR model applied to the seasonally adjusted series from STL, with the seasonal component forecast using the seasonal naive method.
Compute the minimum MASE from all three methods
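The last step can be sketched as below, assuming the usual definition of MASE: the mean absolute forecast error on the test period divided by the in-sample mean absolute error of a naive forecast (seasonal naive when `period` is greater than 1). The forecasts are made-up numbers standing in for the output of three methods; they are not produced by Theta, ETS or STL-AR, and the method names are placeholders.

```python
def mase(train, test, forecast, period=1):
    """Mean absolute scaled error.

    The scale is the in-sample MAE of the (seasonal) naive forecast,
    which predicts each value by the value `period` steps earlier.
    """
    naive_errors = [
        abs(train[t] - train[t - period]) for t in range(period, len(train))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    errors = [abs(actual - f) for actual, f in zip(test, forecast)]
    return (sum(errors) / len(errors)) / scale

train = [10, 12, 11, 13, 12, 14]
test = [13, 15]

# Hypothetical forecasts from three methods (illustrative values only).
forecasts = {
    "method_a": [14, 14],
    "method_b": [13, 14.5],
    "method_c": [12, 12],
}

scores = {name: mase(train, test, f) for name, f in forecasts.items()}
best = min(scores, key=scores.get)
print(best, scores[best])  # method_b 0.15625
```

The minimum over methods is a single number per series, which can then be used to colour points in the feature space by how predictable each series is.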
-
Predictability
[Figures: forecasts from Theta, ETS and STL-AR shown in turn for three example series, with time axes running 1975 to 1990, 1980 to 1992 and 1984 to 1994]
-
Predictability
[Figure: PC2 against PC1 in three panels, for series with low, medium and high MASE values]
-
Predictability
[Figure: PC2 against PC1 for yearly, quarterly and monthly data, with points coloured by best method (Ets, NoDiff, Stlmar, Theta); for each frequency one panel shows the actual best method and one shows the SVM prediction]
-
Generating new time series
We can use the feature space to:
Generate new time series with similar features to existing series
Generate new time series where there are holes in the feature space.
Let $\{PC_1, PC_2, \dots, PC_n\}$ be a population of time series of specified length and period.
A genetic algorithm uses a process of selection, crossover and mutation to evolve the population towards a target point $T_i$.
Optimize: $\text{Fitness}(PC_j) = -\sqrt{|PC_j - T_i|^2}$.
Initial population random with some series in neighbourhood of $T_i$.
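A toy version of this procedure is sketched below. Several simplifications are assumptions of the sketch rather than the method in the talk: the feature space is a made-up two-dimensional map (lag-1 autocorrelation and average change per step) instead of principal components of the full feature set, the initial population is purely random, and selection simply keeps the fitter half of the population.

```python
import math
import random

def features(y):
    """Hypothetical 2-D feature map: (lag-1 autocorrelation, mean step)."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y) or 1.0  # guard constant series
    r1 = sum((y[t] - mean) * (y[t - 1] - mean) for t in range(1, n)) / denom
    step = (y[-1] - y[0]) / (n - 1)
    return (r1, step)

def fitness(y, target):
    """Negative Euclidean distance from the series' features to the target."""
    f = features(y)
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(f, target)))

def evolve(target, length=30, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.gauss(0, 1) for _ in range(length)] for _ in range(pop_size)
    ]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda y: fitness(y, target), reverse=True)
        history.append(fitness(population[0], target))
        parents = population[: pop_size // 2]  # selection: keep fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: perturb each value with probability 0.2.
            child = [
                v + rng.gauss(0, 0.1) if rng.random() < 0.2 else v
                for v in child
            ]
            children.append(child)
        population = parents + children
    best = max(population, key=lambda y: fitness(y, target))
    return best, history

target = (0.8, 0.1)
best, history = evolve(target)
print(round(history[0], 3), round(fitness(best, target), 3))
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next; how close the evolved series gets to the target depends on the settings and the random seed.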
-
Evolving new time series
[Figure: target points A, B and C marked in the feature space (PC2 against PC1), with each target series plotted alongside the series evolved towards it]
[Figure: target points D, E and F marked in the feature space, with the evolved series D, E and F]
[Figure: feature space panels showing the targets and the evolved yearly, quarterly and monthly data]
-
Questions raised
Can SVM be used to create a forecast selection routine to give better forecasts?
How much do M3 conclusions depend on the particular set of time series involved?
Has the M3 data set biased forecast method development?
What other features should we consider? What difference does it make?
Is PCA the right approach? Perhaps we should use multidimensional scaling? Or something else?
Should we use more than 2 PC dimensions?
-
Yahoo web-traffic
Tens of thousands of time series collected at one-hour intervals over one month.
Consisting of several server metrics (e.g. CPU usage and paging views) from many server farms globally.
Aim: find unusual (anomalous) time series.
-
Yahoo web-traffic
[Figure: three sets of time plots, value against date, 2014-11-09 to 2014-12-12. Panels: busy233, busy271, busy50, busy200, busy369; memory460, memory429, memory147, memory413, memory484; paging53, paging467, paging371, paging337, paging367.]
-
Feature space
ACF1: first order autocorrelation = $\mathrm{Corr}(Y_t, Y_{t-1})$
Strength of trend and seasonality based on STL
Trend linearity and curvature
Size of seasonal peak and trough
Spectral entropy
Lumpiness: variance of block variances (block size 24).
Spikiness: variance of leave-one-out variances of STL remainders.
Level shift: Maximum difference in trimmed means of consecutive moving windows of size 24.
Variance change: Max difference in variances of consecutive moving windows of size 24.
Flat spots: Discretize sample space into 10 equal-sized intervals. Find max run length in any interval.
Number of crossing points of mean line.
Kullback-Leibler score: Maximum of $D_{KL}(P \| Q) = \int P(x) \ln \frac{P(x)}{Q(x)} \, dx$ where P and Q are estimated by kernel density estimators applied to consecutive windows of size 48.
Change index: Time of maximum KL score
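Several of the simpler features can be sketched in a few lines of standard-library Python. This is a rough illustration under stated assumptions, not the code behind the slides: all function names are hypothetical, variances are sample variances, the trimmed mean drops 10% of observations at each end, and lumpiness uses non-overlapping blocks.

```python
from statistics import mean, variance


def acf1(y):
    """First-order autocorrelation, Corr(Y_t, Y_{t-1})."""
    m = mean(y)
    num = sum((a - m) * (b - m) for a, b in zip(y[1:], y[:-1]))
    den = sum((a - m) ** 2 for a in y)
    return num / den


def blocks(y, width):
    """Non-overlapping blocks of `width` values; a trailing partial block is dropped."""
    return [y[i:i + width] for i in range(0, len(y) - width + 1, width)]


def lumpiness(y, width=24):
    """Variance of the block variances (needs at least two full blocks)."""
    return variance([variance(b) for b in blocks(y, width)])


def trimmed_mean(values, trim=0.1):
    """Mean after dropping a share `trim` of the values at each end (assumed 10%)."""
    s = sorted(values)
    k = int(len(s) * trim)
    return mean(s[k:len(s) - k])


def level_shift(y, width=24):
    """Largest gap between trimmed means of adjacent windows (needs len(y) >= 2 * width)."""
    return max(abs(trimmed_mean(y[t:t + width]) - trimmed_mean(y[t - width:t]))
               for t in range(width, len(y) - width + 1))


def variance_change(y, width=24):
    """Largest gap between variances of adjacent windows (needs len(y) >= 2 * width)."""
    return max(abs(variance(y[t:t + width]) - variance(y[t - width:t]))
               for t in range(width, len(y) - width + 1))


def flat_spots(y, bins=10):
    """Longest run of consecutive values falling in the same of `bins` equal-width intervals."""
    lo, hi = min(y), max(y)
    if hi == lo:
        return len(y)
    step = (hi - lo) / bins
    codes = [min(int((v - lo) / step), bins - 1) for v in y]
    longest = run = 1
    for prev, cur in zip(codes, codes[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def crossing_points(y):
    """Number of times the series crosses its mean line."""
    m = mean(y)
    above = [v > m for v in y]
    return sum(a != b for a, b in zip(above, above[1:]))
```

Each function maps one series to one number, so applying all of them to every series gives the feature matrix that the later slides project into two dimensions.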
-
Principal component analysis
[Figure: biplot of the series on standardized PC1 (28.7% explained var.) and standardized PC2 (17.3% explained var.), with loading vectors for ACF1, lumpiness, entropy, lshift, vchange, cpoints, fspots, trend, linearity, curvature, spikiness, season, peak, trough, klscore and change.idx.]
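The projection onto two principal components can be sketched without external libraries. This is an illustrative version only, assuming standardized features, power iteration with deflation in place of a full eigendecomposition, and hypothetical function names; it also assumes no feature column is constant.

```python
import math
import random
from statistics import mean, pstdev


def standardize(rows):
    """Centre and scale each feature column (assumes no column is constant)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def covariance(rows):
    """Covariance matrix of already-centred rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(p)] for i in range(p)]


def top_eigen(matrix, iterations=500):
    """Dominant eigenvalue and eigenvector by power iteration from a random start."""
    p = len(matrix)
    rng = random.Random(0)
    v = [rng.gauss(0, 1) for _ in range(p)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    value = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p))
    return value, v


def pca_2d(rows):
    """Return 2-d scores for each row and the variance share of each component."""
    z = standardize(rows)
    c = covariance(z)
    p = len(c)
    total = sum(c[i][i] for i in range(p))
    components, shares = [], []
    for _ in range(2):
        value, vec = top_eigen(c)
        components.append(vec)
        shares.append(value / total)
        # Deflate: remove the component just found before searching for the next.
        c = [[c[i][j] - value * vec[i] * vec[j] for j in range(p)] for i in range(p)]
    points = [tuple(sum(a * b for a, b in zip(row, vec)) for vec in components) for row in z]
    return points, shares
```

The returned shares play the role of the "explained var." percentages on the axes, and the points are the coordinates that the anomaly rankings operate on.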
-
What is anomalous
[Figure: biplot of the series on standardized PC1 (28.7% explained var.) and standardized PC2 (17.3% explained var.), with loading vectors for the features.]
We need a measure of the anomalousness of a time series.
1 Rank points based on their local density.
2 Rank points based on whether they are within α-convex hulls of different radius.
-
Bivariate kernel density
$\hat{f}(x; H) = \frac{1}{n} \sum_{i=1}^{n} K_H(x - X_i)$
$X_i$: a bivariate random sample $\{X_1, X_2, \dots, X_n\}$
$K_H(x)$ is the standard normal kernel function
$H$ estimated by minimizing the sum of AMISE
Rank points based on $\hat{f}$ values in 2d PCA space.
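A minimal sketch of density-based ranking in two dimensions follows. It departs from the slide in one labelled way: instead of choosing H by minimizing AMISE, it assumes a diagonal bandwidth from Scott's rule of thumb, h_j = sd_j * n^(-1/6). Function names are hypothetical, and both coordinates are assumed to have non-zero spread.

```python
import math
from statistics import stdev


def bandwidths(points):
    """Diagonal bandwidths from Scott's rule for d = 2 (an assumption, not AMISE)."""
    n = len(points)
    factor = n ** (-1 / 6)
    return [stdev(p[j] for p in points) * factor for j in range(2)]


def kde_at(x, points, h):
    """Normal-kernel density estimate at x with H = diag(h[0]^2, h[1]^2)."""
    total = 0.0
    for px, py in points:
        u = (x[0] - px) / h[0]
        v = (x[1] - py) / h[1]
        total += math.exp(-0.5 * (u * u + v * v))
    return total / (len(points) * 2 * math.pi * h[0] * h[1])


def rank_by_density(points):
    """Indices of the points ordered from lowest to highest estimated density."""
    h = bandwidths(points)
    density = [kde_at(p, points, h) for p in points]
    return sorted(range(len(points)), key=lambda i: density[i])


# Hypothetical 2-d scores: a tight cluster plus one remote point.
scores = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)] + [(10.0, 10.0)]
print(rank_by_density(scores)[:5])
```

Points at the front of the returned list sit in the sparsest regions and are the candidates for anomalous series.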
-
Bivariate density ranking
[Figure: scatterplot of pc2 against pc1 with five points labelled 1 to 5.]
[Figure: time plots, value against date, 2015-02-28 to 2015-04-01. Panels: S7793, S8494, S10464, S7833, S1715.]
-
α-convex hulls
The space generated by point pairs that can be touched by an empty disc of radius α.
α = ∞ gives a convex hull.
Points can become isolated when α is small.
We rank points based on the value of α when they become isolated.
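Computing a full α-convex hull needs computational geometry beyond a short example, so the sketch below uses a simplified proxy, stated as an assumption. A disc of radius α can touch two points only if they are at most 2α apart, so a point can have no partner once α falls below half its nearest-neighbour distance. Ranking by that half-distance is a stand-in for the isolation ranking, not the hull computation itself. Function names are hypothetical.

```python
import math


def isolation_radius(points):
    """Half the nearest-neighbour distance of each point (proxy for its isolation alpha)."""
    radii = []
    for i, p in enumerate(points):
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(nearest / 2)
    return radii


def rank_by_isolation(points):
    """Indices ordered so that points isolated at the largest radius come first."""
    radii = isolation_radius(points)
    return sorted(range(len(points)), key=lambda i: radii[i], reverse=True)


# Hypothetical 2-d scores: a unit square of points plus one remote point.
scores = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
print(rank_by_isolation(scores)[0])
```

A point that is already isolated at a large radius is far from everything else, which is why it is ranked as more anomalous.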
-
α-convex hull
[Figure: example of an α-convex hull.]
-
α-convex hull ranking
[Figure: scatterplot of the series with five points labelled 1 to 5.]
[Figure: time plots, value against date, 2015-02-28 to 2015-04-01. Panels: S10464, S7793, S8494, S1715, S7826.]
-
HDR versus α-convex hull
[Figure: two scatterplots side by side, each with five points labelled 1 to 5. Left panel: HDR boxplot (axes pc1, pc2). Right panel: α-convex hull.]
-
Top 5 anomalous time series
[Figure: time plots, value against date, 2015-02-28 to 2015-04-01, of the top five series under each method.]
HDR: S7793, S8494, S10464, S7833, S1715
α-convex hull: S10464, S7793, S8494, S1715, S7826
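The agreement between the two methods can be checked directly from the series labels on this slide. The snippet is a small illustration; the variable names are assumptions.

```python
# Series labels as listed for each method on the slide.
hdr = ["S7793", "S8494", "S10464", "S7833", "S1715"]
alpha_hull = ["S10464", "S7793", "S8494", "S1715", "S7826"]

shared = sorted(set(hdr) & set(alpha_hull))
only_hdr = sorted(set(hdr) - set(alpha_hull))
only_alpha_hull = sorted(set(alpha_hull) - set(hdr))

print(len(shared), "series flagged by both methods:", shared)
print("HDR only:", only_hdr)
print("alpha-convex hull only:", only_alpha_hull)
```

Four of the five series appear in both lists; each method flags one series that the other does not.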
-
What next?
Develop a more comprehensive set of features that are reliable measures and fast to compute, e.g., for finance data.
Consider other dimension reduction methods and more than 2 dimensions.
Develop dynamic and interactive visualization tools.
Make methods available in an R package.
Some of the methods are already available in the anomalous package for R on GitHub.
Papers: robjhyndman.com
Code: github.com/robjhyndman
Email: [email protected]