Accelerating HAC Estimation for Multivariate Time Series
Ce Guo and Wayne Luk
HAC Estimation
Strategy of Acceleration
Hardware Design
Experiments
Summary
Accelerating HAC Estimation for Multivariate Time Series
Ce Guo and Wayne Luk
Imperial College London
June 5, 2013
Multivariate Time Series
Multivariate time series: a sequence of observations taken from a random process at identical time intervals, in the form
\[ y = \langle y_1, y_2, \ldots, y_T \rangle \tag{1} \]
where
$T$: number of data points (positive integer)
$y_i$: a data instance ($D$-dimensional column vector)
\[ y_i = [\, y_{i,1} \;\; y_{i,2} \;\; \ldots \;\; y_{i,D} \,]' \tag{2} \]
$A'$: the transpose of $A$
The $d$-th component:
\[ y_{*,d} = \langle y_{1,d}, y_{2,d}, \ldots, y_{T,d} \rangle \tag{3} \]
Multivariate Time Series: An Example
[Figure: daily price data of two stocks, Pear and Hugesoft. x-axis: Time; y-axis: Stock Price (USD).]
Component 1: stock prices of Pear
Component 2: stock prices of Hugesoft
Multivariate Time Series: Correlation
[Figure: daily price data of two stocks, Pear and Hugesoft. x-axis: Time; y-axis: Stock Price (USD).]
Correlations in Multivariate Time Series
Covariance: degree of correlation of two random variables
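Problem: non-temporal covariance measures cannot be applied to time series, because they ignore the order of the data points
[Figure: two panels plotting Value against Time. (a) Time Series Data, with the time points in their original order 1 to 10; (b) Reordered Data, the same values with the time points shuffled.]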
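The point of the two panels can be checked numerically. The following is a minimal sketch in Python with NumPy, using made-up random-walk data; the function names `sample_cov` and `lag_autocov` are illustrative and do not come from the slides. It shows that the ordinary sample covariance is unchanged when the time points are shuffled, whereas a lagged autocovariance is not.

```python
import numpy as np

def sample_cov(y):
    """Ordinary (non-temporal) D x D sample covariance of a T x D array."""
    u = y - y.mean(axis=0)
    return u.T @ u / len(y)

def lag_autocov(y, h):
    """Lag-h sample autocovariance: (1/T) * sum_t u_t u_{t-h}'."""
    u = y - y.mean(axis=0)
    T = len(y)
    return u[h:].T @ u[:T - h] / T

rng = np.random.default_rng(0)
# Hypothetical 2-component series: two independent random walks
y = np.cumsum(rng.normal(size=(200, 2)), axis=0)
y_shuffled = y[rng.permutation(len(y))]   # same rows, different time order

# The non-temporal measure cannot tell the two series apart ...
same_cov = np.allclose(sample_cov(y), sample_cov(y_shuffled))
# ... while the lag-1 autocovariance depends on the ordering.
same_lag1 = np.allclose(lag_autocov(y, 1), lag_autocov(y_shuffled, 1))
print(same_cov, same_lag1)
```

Any measure that is meant to capture temporal dependence therefore has to use lagged products of this kind, which is what the autocovariance matrices on the following slides do.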
Correlations in Multivariate Time Series
Long-run covariance matrix ($D \times D$): covariances between each pair of components in the long run
\[ S = \sum_{h=-\infty}^{\infty} \Omega_h \tag{4} \]
Autocovariance matrix ($D \times D$): covariances between the 'state of variables at time $t$' and the 'state of variables at time $(t-h)$'
\[ \Omega_h = E[(y_t - E[y_t])(y_{t-h} - E[y_t])'] \tag{5} \]
Standard correlation measure for time series
Impossible to compute directly from the definition: the sum is infinite and the expectations are unknown
HAC Estimation: Newey-West Estimator
Estimating the long-run covariance matrix for a multivariate time series:
HAC estimation: heteroskedasticity and autocorrelation consistent estimation
Newey-West estimator
\[ S = \Omega_0 + \sum_{h=1}^{H} k\!\left(\frac{h}{H+1}\right)(\Omega_h + \Omega_h') \tag{6} \]
where $k(\cdot)$ is a kernel function and $H$ is a truncation parameter (positive integer)
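As an illustration of equation (6), here is a minimal Python/NumPy sketch of a direct evaluation. Two things are assumptions rather than statements from the slides: the kernel is taken to be the Bartlett kernel (the slides leave $k(\cdot)$ unspecified), and each $\Omega_h$ is estimated as $\frac{1}{T}\sum_{t=h+1}^{T} u_t u_{t-h}'$ with centred data $u_t$, which is the normalisation implied by equations (9) and (10) later in the deck. The names `bartlett`, `autocov` and `newey_west` are illustrative.

```python
import numpy as np

def bartlett(x):
    """Bartlett kernel k(x) = 1 - |x| for |x| <= 1, else 0 (assumed choice)."""
    return max(0.0, 1.0 - abs(x))

def autocov(u, h):
    """Omega_h = (1/T) * sum_{t=h+1}^{T} u_t u_{t-h}' for centred u (T x D)."""
    T = u.shape[0]
    return u[h:].T @ u[:T - h] / T

def newey_west(y, H, kernel=bartlett):
    """Direct evaluation of equation (6) for a T x D array y."""
    y = np.asarray(y, dtype=float)
    u = y - y.mean(axis=0)                 # centre each component
    S = autocov(u, 0)                      # Omega_0
    for h in range(1, H + 1):
        omega_h = autocov(u, h)
        S += kernel(h / (H + 1)) * (omega_h + omega_h.T)
    return S

rng = np.random.default_rng(1)
y = rng.normal(size=(50, 3))               # made-up data: T = 50, D = 3
S = newey_west(y, H=4)
print(S.shape)
```

Each call to `autocov` touches the whole series again, which is the access pattern the later slides identify as the bottleneck.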
Newey-West Estimator: An Example
[Figure: example of the Newey-West estimator S with T = 6 and H = 3. For each lag h = 0, 1, 2, 3, the series (time points 1 to 6) is paired with a copy of itself shifted by h to give the covariance Ω0, Ω1, Ω2, Ω3; S is the weighted sum of these matrices.]
HAC Estimation: Computation
Challenges in the computation of the estimator:
Time complexity: $O(D^2 H T)$
Real-world time series can be very long and high-dimensional
Memory bandwidth bottleneck
Problem: Low Memory Efficiency
Original expression of the Newey-West estimator S :
\[ S = \Omega_0 + \sum_{h=1}^{H} k\!\left(\frac{h}{H+1}\right)(\Omega_h + \Omega_h') \tag{7} \]
\[ \Omega_h = E[(y_t - \mu)(y_{t-h} - \mu)'] \tag{8} \]
Time-consuming part: evaluation of Ωh
Evaluation of Ωh
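[Figure: evaluation of Ω2. The data and a shifted copy of the data are multiplied element by element (×) and the products are summed (Σ).]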
Only one multiplication and one addition are executed for every two data access operations
Not suitable for reconfigurable computing
Solution: A Different Expression of S
Proposed expression for S :
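\[ S = \frac{1}{T}(\Psi + \Psi') \tag{9} \]
$\Psi$: accumulative autocovariance matrix ($D \times D$)
\[ \Psi = \sum_{g=0}^{G} \; \sum_{h=gc}^{gc+c-1} w_h \sum_{t=h+1}^{T} u_t u_{t-h}' \tag{10} \]
$G$: maximum group index (integer parameter)
$c$: number of lags in each group
$w_h$: generalised kernel function
\[ w_h = \begin{cases} \frac{1}{2} & \text{if } h = 0 \\ k\!\left(\frac{h}{H+1}\right) & \text{if } 0 < h \le H \\ 0 & \text{otherwise} \end{cases} \tag{11} \]
$u_t$: centralised data instance, $u_t = y_t - \mu$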
Mathematical Transformation: Insights
Properties of the proposed expression of S :
Mathematically equivalent to the original expression
Time complexity unchanged: $O(D^2 H T)$
An algorithm particularly suitable for hardware mapping can be derived
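The equivalence claim can be checked numerically. The sketch below is an illustration in Python/NumPy, not the authors' code. It evaluates the original expression (equation (7), with $\Omega_h$ estimated as $\frac{1}{T}\sum_{t=h+1}^{T} u_t u_{t-h}'$) and the proposed expression (equations (9) to (11)) on made-up data. The Bartlett kernel and the choice $G = \lceil (H+1)/c \rceil - 1$, the smallest value whose groups cover lags $0$ to $H$, are assumptions made for the example.

```python
import math
import numpy as np

def kernel(x):
    # Bartlett kernel, assumed here; the slides leave k(.) unspecified
    return max(0.0, 1.0 - abs(x))

def w(h, H):
    """Generalised kernel w_h of equation (11)."""
    if h == 0:
        return 0.5
    if 0 < h <= H:
        return kernel(h / (H + 1))
    return 0.0

def lag_sum(u, h):
    """sum_{t=h+1}^{T} u_t u_{t-h}' (zero matrix when h >= T)."""
    T, D = u.shape
    if h >= T:
        return np.zeros((D, D))
    return u[h:].T @ u[:T - h]

def s_original(u, H):
    """Equation (7) with Omega_h = lag_sum(u, h) / T."""
    T = u.shape[0]
    S = lag_sum(u, 0) / T
    for h in range(1, H + 1):
        om = lag_sum(u, h) / T
        S += kernel(h / (H + 1)) * (om + om.T)
    return S

def s_proposed(u, H, c):
    """Equations (9)-(10): lags processed in groups of c."""
    T, D = u.shape
    G = math.ceil((H + 1) / c) - 1      # assumed: groups just cover lags 0..H
    psi = np.zeros((D, D))
    for g in range(G + 1):
        for h in range(g * c, g * c + c):
            psi += w(h, H) * lag_sum(u, h)
    return (psi + psi.T) / T

rng = np.random.default_rng(0)
u = rng.normal(size=(30, 2))
u -= u.mean(axis=0)                     # centred, made-up data
match = np.allclose(s_original(u, H=5), s_proposed(u, H=5, c=4))
print(match)
```

The weight of one half at lag 0 is what makes the two forms agree: $\Psi + \Psi'$ counts the symmetric lag-0 term twice, and lags beyond $H$ in the last group receive weight zero.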
Parallelising Arithmetic
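Seeking parallelism: rewrite $\Psi_{i,j}$ in vector form
\[ \Psi_{i,j} = \sum_{g=0}^{G} w_{g,c}\, r_{g,c,i,j} \tag{12} \]
where
\[ w_{g,c} = [\, w_{gc} \;\; w_{gc+1} \;\; \ldots \;\; w_{gc+c-1} \,] \tag{13} \]
\[ r_{g,c,i,j} = \begin{bmatrix} \sum_{t=gc+1}^{T} u_{t,j}\, u_{t-gc,i} \\ \sum_{t=gc+2}^{T} u_{t,j}\, u_{t-(gc+1),i} \\ \vdots \\ \sum_{t=gc+c}^{T} u_{t,j}\, u_{t-(gc+c-1),i} \end{bmatrix} \tag{14} \]
Problem: the operations differ from entry to entry, because each entry of $r_{g,c,i,j}$ is summed over a different range of $t$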
Simplifying Control Logic
Simplification:
\[ r_{g,c,i,j} = \sum_{k=1}^{T-gc} u_{k,i} \begin{bmatrix} u_{k+gc,j} \\ u_{k+gc+1,j} \\ \vdots \\ u_{k+gc+c-1,j} \end{bmatrix} \tag{15} \]
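The simplified form uses a single summation range for all entries, so near the end of the series some indices $k+gc+m$ exceed $T$. The sketch below (Python/NumPy, illustrative names, made-up data) assumes that such out-of-range values are treated as zero; this zero-padding is an inference, as the slides do not state it. Under that assumption the per-entry sums of equation (14) and the single-loop form of equation (15) agree.

```python
import numpy as np

def r_eq14(u, g, c, i, j):
    """Equation (14): entry m sums u[t, j] * u[t - (g*c + m), i] over a
    range that depends on m (1-based t in the slides, 0-based arrays here)."""
    T = u.shape[0]
    r = np.zeros(c)
    for m in range(c):
        h = g * c + m
        for t in range(h, T):              # t = h+1 .. T in 1-based terms
            r[m] += u[t, j] * u[t - h, i]
    return r

def r_eq15(u, g, c, i, j):
    """Equation (15): one loop over k; every entry is updated at every step."""
    T = u.shape[0]
    # Assumption: component j is padded with c - 1 zeros past time T
    uj = np.concatenate([u[:, j], np.zeros(c - 1)])
    r = np.zeros(c)
    for k in range(T - g * c):             # k = 1 .. T - gc in 1-based terms
        r += u[k, i] * uj[k + g * c : k + g * c + c]
    return r

rng = np.random.default_rng(0)
u = rng.normal(size=(7, 2))                # made-up data: T = 7, D = 2
agree = np.allclose(r_eq14(u, 1, 3, 0, 1), r_eq15(u, 1, 3, 0, 1))
print(agree)
```

In `r_eq15` the loop body is identical for every `k` and every entry, which is what removes the per-entry control logic.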
Novel HAC Estimation Algorithm: Framework
for (i, j) ∈ [1..D] × [1..D] do
    Ψ_{i,j} ← 0
    for g ∈ [0..G] do
        w ← [w_{gc}  w_{gc+1}  ...  w_{gc+c-1}]
        r ← Pass(g, c, i, j)
        Ψ_{i,j} ← Ψ_{i,j} + w r
    end for
end for
return (1/T)(Ψ + Ψ′)
Properties:
No complex computation
Query r (c-dimensional column vector) from the data by invoking Pass(g, c, i, j)
Novel HAC Estimation Algorithm: Pass(g,c,i,j)
1: r ← 0_{c×1}
2: for k ∈ [1..(T - gc)] do
3:     r ← r + u_{k,i} [u_{k+gc,j}  u_{k+gc+1,j}  ...  u_{k+gc+c-1,j}]′
4: end for
5: return r
Properties:
No conditionals
Parallelised arithmetic
Potential for data reuse
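The framework and Pass(g, c, i, j) can be put together as a small software model. This is a hedged sketch in Python/NumPy rather than the hardware implementation: the function names, the Bartlett kernel, the zero-padding of the data past time T, and the choice $G = \lceil (H+1)/c \rceil - 1$ are all assumptions made for the example.

```python
import math
import numpy as np

def bartlett(x):
    # Assumed kernel; the slides leave k(.) unspecified
    return max(0.0, 1.0 - abs(x))

def generalised_kernel(h, H, k):
    """w_h: 1/2 at lag 0, k(h/(H+1)) for 0 < h <= H, and 0 otherwise."""
    if h == 0:
        return 0.5
    if 0 < h <= H:
        return k(h / (H + 1))
    return 0.0

def pass_vector(u_pad, T, g, c, i, j):
    """Pass(g, c, i, j): no conditionals, c multiply-accumulates per step."""
    r = np.zeros(c)
    for k in range(T - g * c):
        r += u_pad[k, i] * u_pad[k + g * c : k + g * c + c, j]
    return r

def hac_estimate(y, H, c, k=bartlett):
    y = np.asarray(y, dtype=float)
    T, D = y.shape
    u = y - y.mean(axis=0)
    # Assumption: out-of-range samples read as zero (c - 1 rows of padding)
    u_pad = np.vstack([u, np.zeros((c - 1, D))])
    G = math.ceil((H + 1) / c) - 1         # assumed: groups cover lags 0..H
    psi = np.zeros((D, D))
    for i in range(D):
        for j in range(D):
            for g in range(G + 1):
                w = np.array([generalised_kernel(h, H, k)
                              for h in range(g * c, g * c + c)])
                psi[i, j] += w @ pass_vector(u_pad, T, g, c, i, j)
    return (psi + psi.T) / T

rng = np.random.default_rng(0)
y = rng.normal(size=(40, 3))               # made-up data: T = 40, D = 3
S = hac_estimate(y, H=5, c=4)
print(S.shape)
```

In this model the vector update inside `pass_vector` is the part that maps to parallel hardware: all c entries of `r` are updated with the same operation in each step.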
Novel Algorithm for HAC Estimation: Data Reuse
Example: T = 7, c = 3, h = 4

g    k    operation
...  ...  ...
0    1    r ← r + u_{1,i} [u_{1,j} u_{2,j} u_{3,j}]′
0    2    r ← r + u_{2,i} [u_{2,j} u_{3,j} u_{4,j}]′    iteration (m-1)
0    3    r ← r + u_{3,i} [u_{3,j} u_{4,j} u_{5,j}]′    iteration m
0    4    r ← r + u_{4,i} [u_{4,j} u_{5,j} u_{6,j}]′
...  ...  ...

Iteration m takes most of its data from iteration (m-1): u_{3,j} and u_{4,j} are reused, and only u_{5,j} is new in the vector
Goal: maximise data reuse
Hardware Design: Elementary Unit
Line 3 of Pass(g , c , i , j):
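\[ r \leftarrow r + u_{k,i} \begin{bmatrix} u_{k+gc,j} \\ u_{k+gc+1,j} \\ \vdots \\ u_{k+gc+c-1,j} \end{bmatrix} \]
Structure of a bead: multiplication and accumulation
[Figure: a bead, consisting of a multiplier (×) feeding an accumulating adder (+).]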
Hardware Design: Architecture
[Figure: architecture with c beads b1, b2, b3, ..., bc-1, bc, fed by two data streams through a FIFO buffer and a broadcasting buffer.]
High fan-out of the broadcasting buffer: can be eliminated by tree-structured data pipelining
One new input to the FIFO buffer each cycle: data reuse reduces the required memory bandwidth
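The data flow through the beads can be imitated in software to see where the bandwidth saving comes from. The following is a rough Python model, not a description of the actual circuit. It assumes that the component-j stream passes through the FIFO buffer, that the component-i value is the one broadcast to all beads, and that the stream is zero-padded past its end; the function name `simulate_beads` and the read counter are illustrative.

```python
from collections import deque
import numpy as np

def simulate_beads(ui, uj, g, c):
    """Software model of c beads fed by a FIFO window and a broadcast value.

    Returns the accumulated vector r and the number of stream reads.
    """
    T = len(ui)
    # Stream for the FIFO: component j from offset g*c, zero-padded (assumed)
    stream_j = list(uj[g * c:]) + [0.0] * c
    fifo = deque(stream_j[:c - 1], maxlen=c)   # pre-fill with c - 1 values
    acc = np.zeros(c)                          # one accumulator per bead
    reads = c - 1
    for k in range(T - g * c):
        fifo.append(stream_j[k + c - 1])       # one new FIFO input per cycle
        broadcast = ui[k]                      # one broadcast input per cycle
        reads += 2
        acc += broadcast * np.array(fifo)      # c multiply-accumulates
    return acc, reads

ui = np.arange(1.0, 8.0)                       # made-up component i, T = 7
uj = np.arange(2.0, 9.0)                       # made-up component j
acc, reads = simulate_beads(ui, uj, g=0, c=3)
print(acc, reads)
```

After the pre-fill, each cycle reads two values and performs c multiplications and c additions, so the number of reads stays close to 2(T - gc) regardless of c.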
Advantages of the Proposed Architecture
Improved memory efficiency
Original: one multiplication and one addition after two data access operations
Proposed: c multiplications and c additions after two data access operations (c: number of beads, c = 384 in the experimental implementation)
Highly parallelised
No complex control logic
Experimental settings
FPGA platform:
Maxeler MAX3 with a V6-SXT475 FPGA at 100 MHz
CPU platform:
Intel Xeon CPU at 2.67 GHz
Number representation:
IEEE single-precision floating point numbers
Highlights of Experimental Results: Performance
[Figure: speedup (times) against the lag truncation parameter H, with curves for speedup over 1-core, 4-core, 8-core and 12-core CPU runs. Panel (c): $T = 10^7$, annotated "exceed FPGA capacity, need extra run"; panel (d): $T = 10^8$.]
Highlights of Experimental Results: Scalability
[Figure: computation time (seconds) against the number of beads c. Panel (e) Tested Scenario: curves FPGA and FPGA(P); panel (f) Extrapolated Scenario: curves FPGA(E) and FPGA(P).]
Current and Future Work
Integrating the proposed solution into expert systems
Designing dimension reduction facilities to handle very high-dimensional data
Developing acceleration solutions for other problems on time series
Summary
Novel hardware-oriented HAC estimation algorithm:
Avoiding conditionals
Parallelising arithmetic
Promoting data reuse
Hardware mapping:
Taking full advantage of hardware-friendly properties
Demonstration using V6-SXT475 FPGA:
Up to 111 times speedup over a single-core CPU
Up to 14 times speedup over an 8-core CPU
Performance scales well with the amount of resources