Evaluating Servers using Iso-Metrics: Power, Performance and Programmability Implications
Dimitrios S. Nikolopoulos
School of Electronics, Electrical Engineering and Computer Science, Queen’s University of Belfast
January 20, 2015
D. Nikolopoulos (EEECS@QUB) Iso-Metrics, MULTIPROG’15 January 20, 2015 1 / 30
Outline
1 Motivation: Servers and Micro-Servers
2 Micro-servers for Real-Time Analytics
3 Metrics
4 Field Tests
5 Programmability and Iso-Effort
6 Conclusion
Motivation: Servers and Micro-Servers
Diversity in the Server Landscape
Diversity in Workloads
Diversity in Software
How do we choose the right server?
Micro-servers for Real-Time Analytics
NanoStreams Hardware in a Nutshell
NanoStreams Software in a Nutshell
Showcase: Real-Time Option Pricing
Workloads that are not particularly compute- or data-intensive, but are latency-sensitive and data-parallel
Monte Carlo, Black-Scholes and binomial pricing
Each instance runs in ms or µs and must complete before the next trade
Heavily traded symbols trigger thousands of option pricings per session
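$$\text{Price} = (-1)^p \left( S\,N\big((-1)^p d_1\big) - P e^{-rT} N\big((-1)^p d_2\big) \right) \qquad (1)$$
where S is the spot price, P the strike price, r the risk-free interest rate, T the time to expiry, p = 0 for a call and p = 1 for a put, and N(·) the standard normal cumulative distribution function.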
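Equation (1) is the closed-form Black-Scholes price. The slide does not define d_1 and d_2, so this minimal Python sketch assumes the textbook definitions d_1 = (ln(S/P) + (r + σ²/2)T)/(σ√T) and d_2 = d_1 − σ√T, and computes N(·) from `math.erf`. The function name `black_scholes` is hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(S: float, P: float, r: float, sigma: float, T: float, p: int) -> float:
    """European option price via Eq. (1).

    S: spot, P: strike, r: risk-free rate, sigma: volatility, T: years to expiry,
    p: 0 for a call, 1 for a put. d1/d2 use the textbook definitions (assumption).
    """
    d1 = (math.log(S / P) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    sign = (-1) ** p  # +1 for a call, -1 for a put
    return sign * (S * norm_cdf(sign * d1) - P * math.exp(-r * T) * norm_cdf(sign * d2))
```

For example, `black_scholes(100, 100, 0.05, 0.2, 1.0, 0)` gives the familiar at-the-money call value of about 10.45, and the call and put prices differ by S − P e^{−rT} (put-call parity).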
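$$\text{Price} = \frac{e^{-rT}}{N} \sum_{i=1}^{N} \max\left(0,\; S\,e^{(r - \sigma^2/2)T + \sigma\sqrt{T}\,x_i} - P\right) \qquad (2)$$
where N is the number of Monte Carlo samples, σ the volatility and x_i independent standard normal samples (a European call is shown).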
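Equation (2) estimates the price as the discounted mean payoff over N standard normal samples x_i. A minimal Python sketch for a European call, assuming S is the spot and P the strike as in Eq. (1), so that each sample's simulated terminal price is S e^{(r−σ²/2)T+σ√T x_i}. The function name `monte_carlo_call` and the use of Python's `random.Random.gauss` as the normal generator are illustrative choices, not the NanoStreams implementation.

```python
import math
import random


def monte_carlo_call(S, P, r, sigma, T, n_samples, seed=0):
    """Monte Carlo estimate of a European call price following Eq. (2)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    total = 0.0
    for _ in range(n_samples):
        x_i = rng.gauss(0.0, 1.0)             # standard normal sample
        s_T = S * math.exp(drift + vol * x_i)  # simulated terminal price
        total += max(0.0, s_T - P)             # call payoff
    return math.exp(-r * T) * total / n_samples  # discounted mean payoff
```

The standard error shrinks as 1/√N, which is why the kernel is embarrassingly data-parallel: each sample is independent and only the final sum needs a reduction.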
$$u = e^{\sigma\sqrt{T}} \quad\text{and}\quad d = \frac{1}{u} \qquad (3)$$
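Equation (3) gives the up and down factors of a binomial tree. The slide does not show the rest of the method, so this Python sketch assumes a Cox-Ross-Rubinstein tree: the T in Eq. (3) is taken as the per-step interval Δt = T/n, the risk-neutral up probability is q = (e^{rΔt} − d)/(u − d), and backward induction applies x_i = a x_i + b x_{i+1} with a = e^{−rΔt}(1 − q) and b = e^{−rΔt} q. The function name `binomial_call` is hypothetical.

```python
import math


def binomial_call(S, P, r, sigma, T, steps):
    """European call on an n-step binomial tree (CRR-style assumptions)."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))   # up factor, Eq. (3) with T -> dt
    d = 1.0 / u                           # down factor, Eq. (3)
    q = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-r * dt)
    a, b = disc * (1.0 - q), disc * q
    # Leaf payoffs: x[i] is the node reached with i up-moves.
    x = [max(0.0, S * u ** i * d ** (steps - i) - P) for i in range(steps + 1)]
    # Backward induction, one tree level per iteration; updating in
    # ascending i means x[i + 1] still holds the previous level's value.
    for level in range(steps, 0, -1):
        for i in range(level):
            x[i] = a * x[i] + b * x[i + 1]
    return x[0]
```

The inner loop is O(N²) in the number of steps, so the BT kernel's cost grows quickly with N; with 1000 steps the sketch converges to within about a cent of the closed-form Black-Scholes value for an at-the-money call.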
Metrics
Platform-Agnostic Metrics [1]
Joules/option: provider-side metric, sustained throughout the trading day; reducing it lowers the total cost of ownership (TCO).
Time/option: user-side metric, the end-to-end latency.
QoS: whether an option is priced before the next price update arrives; the deadline is not known in advance.
[1] Charles Gillan, Dimitrios S. Nikolopoulos, Giorgis Georgakoudis, Richard Faloon, George Tzenakis and Ivor Spence: On the Viability of Micro-Servers for Financial Analytics. In: WHPCF’14 at SC’14, New Orleans, LA.
QoS in Detail
[Figure: two panels, FB 7th July 2014 and GOOG 15th July 2014, plotting % successful pricing (all-or-nothing) against processing time (s, 0 to 6), showing the QoS curve and a cumulative Poisson curve (λ=8, bin size 0.25).]
Cumulative frequency distribution of Facebook and Google stock price updates for full trading sessions on July 7th and 15th 2014.
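One way to compute such a curve from a trace, assuming "successful pricing (all-or-nothing)" means a pricing run of duration t succeeds for a gap between two consecutive price updates only if it fits entirely inside that gap: the empirical QoS at t is the percentage of inter-update gaps no shorter than t. This Python sketch takes sorted update timestamps in seconds; the helper names are hypothetical.

```python
from bisect import bisect_left


def inter_update_gaps(timestamps):
    """Gaps in seconds between consecutive price updates (timestamps sorted)."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def empirical_qos(timestamps, processing_times):
    """Percentage of gaps long enough to complete a pricing run of each duration."""
    gaps = sorted(inter_update_gaps(timestamps))
    n = len(gaps)
    result = []
    for t in processing_times:
        # Every gap >= t counts as a success under the all-or-nothing assumption.
        successes = n - bisect_left(gaps, t)
        result.append(100.0 * successes / n)
    return result
```

Sorting the gaps once and using binary search keeps each query O(log n), which matters for full-session traces with many updates.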
Iso-QoS and Energy
[Figure: instantaneous CPU power (Watts) vs. time (seconds).]
$$\mathrm{QoS}(t) = 1 - e^{-\lambda} \sum_{i=0}^{\lfloor t \rfloor} \frac{\lambda^{i}}{i!} \qquad (4)$$
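$$G \ge N_{opt} \times S_{opt} \qquad (5)$$
$$E_{gap} = N_{opt} \times J_{opt} \qquad (6)$$
$$N_{gaps} = \lfloor Y \times \text{session updates} \rfloor \qquad (7)$$
$$E_{QoS=Y} = N_{gaps} \times E_{gap} \qquad (8)$$
where G is the gap between two consecutive price updates, N_opt the number of options priced in a gap, S_opt and J_opt the seconds and joules per option, and Y the target QoS (fraction of session updates served).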
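Equations (4) to (8) compose into a small energy model. This Python sketch assumes Eq. (4) is the complement of a Poisson cumulative distribution with mean λ and t counted in bins (0.25 s in the QoS plots), and that Y is a fraction between 0 and 1; the function names are hypothetical.

```python
import math


def poisson_qos(t_bins, lam):
    """Eq. (4): 1 - P(X <= floor(t)) for X ~ Poisson(lam), t measured in bins."""
    k = math.floor(t_bins)
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))
    return 1.0 - cdf


def fits_in_gap(gap_s, n_opt, s_opt):
    """Eq. (5): a gap of gap_s seconds can hold n_opt pricings of s_opt seconds."""
    return gap_s >= n_opt * s_opt


def iso_qos_energy(y, session_updates, n_opt, j_opt):
    """Eqs. (6)-(8): session energy in joules when a fraction y of gaps is served."""
    e_gap = n_opt * j_opt                     # Eq. (6)
    n_gaps = math.floor(y * session_updates)  # Eq. (7)
    return n_gaps * e_gap                     # Eq. (8)
```

Because E_QoS=Y is linear in N_gaps, fixing Y (iso-QoS) makes platforms directly comparable by J_opt, provided each one also satisfies Eq. (5) for the gaps it serves.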
Field Tests
Experimental Platforms
Intel x86-64 Sandy Bridge: 2 × octo-core Xeon E5-2650 @ 2.00 GHz, 32 GB of DRAM (4 × 8 GB DDR3 @ 1600 MHz), Linux CentOS 6.5, kernel 2.6.32 (2.6.32-431.17.1.el6.x86_64).
Xeon Phi (Knights Corner) 5110P, attached over PCIe: sixty 4-way hyperthreaded cores, each with a 512-bit vector unit, 6 GB GDDR5 DRAM, 1.053 GHz, Linux kernel 2.6.38.8+mpss3.2.1.
Viridis 2U rack-mounted server with 16 micro-servers and 10 Gb/s Ethernet; each micro-server is a Calxeda EnergyCore ECX-1000 with 4 ARM Cortex-A9 cores and 4 GB DRAM, running Ubuntu 12.04 LTS.
Feed Handling
Financial trace data measurement setup
Power Measurement
[Figure: power path PSU → VRM → CPU, with measurement points PRE-PSU and PRE-VRM.]
The path of the current supply to the CPU, showing the points at which we measured power. PSU is the power supply unit and VRM the voltage regulator module.
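The energy of a run is the time integral of the measured power, and dividing by the number of options priced gives the Joules/option metric. A minimal Python sketch using the trapezoidal rule, assuming time-stamped power samples in watts from either measurement point (PRE-PSU or PRE-VRM); no particular sampling rate is assumed and the helper names are hypothetical.

```python
def energy_joules(times_s, power_w):
    """Trapezoidal integral of sampled power (W) over time (s), in joules."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two matching (time, power) samples")
    energy = 0.0
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        energy += 0.5 * (power_w[k] + power_w[k - 1]) * dt  # area of one trapezoid
    return energy


def joules_per_option(times_s, power_w, n_options):
    """Platform-agnostic J/option for one measured run."""
    return energy_joules(times_s, power_w) / n_options
```

The trapezoidal rule tolerates irregular sample spacing, which is useful when the power meter's timestamps jitter.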
Rankings in Monte Carlo
Table: MC kernel (N=0.5M, QoS=10%). Platform configurations read as (sockets, cards or nodes × cores per unit × threads per core).
Platform          VEC TYPE    S/Opt   J/Opt   Energy (kJ)
Viridis(16×4×1)   INTRINSICS  0.0038  0.3830  239.85
Intel(2×8×1)      AUTOVECT    0.0044  0.3794  237.58
Xeon Phi(1×60×1)  KNC512      0.0046  0.2234  139.92
Xeon Phi(1×60×2)  NOVECT      0.0036  0.1856  116.26
Xeon Phi(1×60×4)  INTRINSICS  0.0030  0.1584  99.19
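In the MC table, dividing each Energy entry by its J/Opt gives about 6.26 × 10^5 for every row, which suggests (an inference from the numbers, not stated on the slide) that Energy is J/Opt multiplied by the number of options priced at that QoS, so an energy ranking and a J/Opt ranking coincide. This Python sketch copies the table rows and checks both; the names `MC_ROWS`, `rank_by_energy` and `implied_options` are hypothetical.

```python
# Rows from the MC table: (platform, vec type, S/Opt, J/Opt, energy in kJ).
MC_ROWS = [
    ("Viridis(16×4×1)", "INTRINSICS", 0.0038, 0.3830, 239.85),
    ("Intel(2×8×1)", "AUTOVECT", 0.0044, 0.3794, 237.58),
    ("Xeon Phi(1×60×1)", "KNC512", 0.0046, 0.2234, 139.92),
    ("Xeon Phi(1×60×2)", "NOVECT", 0.0036, 0.1856, 116.26),
    ("Xeon Phi(1×60×4)", "INTRINSICS", 0.0030, 0.1584, 99.19),
]


def rank_by_energy(rows):
    """Iso-QoS ranking: lowest session energy first."""
    return sorted(rows, key=lambda row: row[4])


def implied_options(rows):
    """Energy / (J/Opt) per row: options priced, if Energy = J/Opt x options."""
    return [row[4] * 1000.0 / row[3] for row in rows]
```

The small spread in the implied counts comes from J/Opt being rounded to four decimal places.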
Rankings in BT
Table: BT kernel (N=4000, QoS=80%)
Platform          VEC TYPE    S/Opt   J/Opt   Energy (kJ)
Intel(2×8×1)      AVX256      0.0007  0.0611  306.49
Viridis(16×4×1)   NEON128     0.0006  0.0603  302.41
Intel(1×8×1)      INTRINSICS  0.0013  0.0527  264.32
Xeon Phi(1×60×4)  INTRINSICS  0.0005  0.0131  65.88
Xeon Phi(1×60×2)  INTRINSICS  0.0004  0.0107  53.50
Xeon Phi(1×60×1)  INTRINSICS  0.0004  0.0092  46.27
Rankings in BT
Table: BT kernel (N=5000, QoS=80%)
Platform          VEC TYPE    S/Opt   J/Opt   Energy (kJ)
Intel(2×8×1)      INTRINSICS  0.0015  0.1180  591.65
Intel(1×8×1)      INTRINSICS  0.0022  0.1017  509.69
Viridis(16×4×1)   INTRINSICS  0.0010  0.0912  457.05
Xeon Phi(1×60×1)  INTRINSICS  0.0006  0.0157  78.58
Xeon Phi(1×60×4)  INTRINSICS  0.0006  0.0152  76.23
Xeon Phi(1×60×2)  KNC512      0.0005  0.0139  69.76
BT Ranking
Table: BT kernel (N=7000, QoS=80%)
Platform          VEC TYPE    S/Opt   J/Opt   Energy (kJ)
Intel(2×8×1)      INTRINSICS  0.0032  0.3038  1522.85
Viridis(16×4×1)   INTRINSICS  0.0017  0.1679  841.83
Xeon Phi(1×60×2)  AUTOVECT    0.0007  0.0281  140.84
Xeon Phi(1×60×4)  INTRINSICS  0.0009  0.0275  138.02
Xeon Phi(1×60×1)  KNC512      0.0007  0.0216  108.28
Servers vs. Micro-Servers
[Figure: bar chart of energy (kJ) for Intel(2×8×1) and Viridis(16×4×1) at N = 4000, 5000 and 7000.]
BT kernel energy consumption scaling (at QoS=80%) of Viridis(16×4×1) and Intel(2×8×1).
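The scaling in the chart can be quantified from the three BT tables: the Intel(2×8×1) to Viridis(16×4×1) energy ratio grows from about 1.01 at N=4000 to about 1.29 at N=5000 and 1.81 at N=7000. This Python sketch reproduces those ratios from the table values; the names `BT_ENERGY_KJ` and `energy_ratio` are hypothetical.

```python
# Energy in kJ at QoS = 80%, copied from the BT tables for N = 4000, 5000, 7000.
BT_ENERGY_KJ = {
    4000: {"Intel(2×8×1)": 306.49, "Viridis(16×4×1)": 302.41},
    5000: {"Intel(2×8×1)": 591.65, "Viridis(16×4×1)": 457.05},
    7000: {"Intel(2×8×1)": 1522.85, "Viridis(16×4×1)": 841.83},
}


def energy_ratio(table, numerator, denominator):
    """For each problem size N, energy of `numerator` divided by that of `denominator`."""
    return {n: row[numerator] / row[denominator] for n, row in sorted(table.items())}
```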
Programmability and Iso-Effort
Does programmability matter?
VEC TYPE    Description
AVX256      Assembler code using AVX 256-bit instructions on the Intel Sandy Bridge.
INTRINSICS  Compiler-supplied C intrinsic functions (ARM 128-bit, Intel 256-bit, Xeon Phi 512-bit).
KNC512      Assembler code for the 512-bit vector instruction set of the Xeon Phi.
NEON128     Assembler code for the ARM NEON 128-bit unit.
AUTOVECT    Compiler auto-vectorisation on all platforms.
Vectorization Paradoxes
Table: BT kernel (N=7000, QoS=40%)
Platform          VEC TYPE    S/Opt   J/Opt   Energy (kJ)
Intel(2×8×1)      INTRINSICS  0.0032  0.3038  761.42
Intel(1×8×1)      AVX256      0.0052  0.2526  632.95
Viridis(16×4×1)   INTRINSICS  0.0017  0.1679  420.92
Xeon Phi(1×60×2)  AUTOVECT    0.0007  0.0281  70.42
Xeon Phi(1×60×4)  INTRINSICS  0.0009  0.0275  69.01
Xeon Phi(1×60×1)  KNC512      0.0007  0.0216  54.14
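Comparing this table with the N=7000, QoS=80% table, every configuration present in both has the same S/Opt and J/Opt and uses half the energy at QoS=40%, up to rounding. That is what Eqs. (7) and (8) predict: N_gaps, and therefore E_QoS=Y, scales linearly with Y. This Python sketch checks the halving on the copied values; the names `QOS_PAIRS` and `max_halving_error` are hypothetical.

```python
# (platform, energy in kJ at QoS = 80%, energy in kJ at QoS = 40%), N = 7000.
QOS_PAIRS = [
    ("Intel(2×8×1)", 1522.85, 761.42),
    ("Viridis(16×4×1)", 841.83, 420.92),
    ("Xeon Phi(1×60×2)", 140.84, 70.42),
    ("Xeon Phi(1×60×4)", 138.02, 69.01),
    ("Xeon Phi(1×60×1)", 108.28, 54.14),
]


def max_halving_error(pairs):
    """Largest |E(40%) - E(80%)/2| in kJ across the configurations."""
    return max(abs(e40 - e80 / 2.0) for _, e80, e40 in pairs)
```

Because the energy at every QoS level is the same per-option cost times a different count, the ranking itself does not change between the two tables; only the scale does.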
Vectorization Details
$$x_i = a\,x_i + b\,x_{i+1} \qquad (9)$$
Vectorization of the Binomial Tree kernel
Manual unrolling
Inter-lane shift to reduce memory loads
Critical ISA differences between Xeon and ARM
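The inter-lane shift can be illustrated on the recurrence x_i = a x_i + b x_{i+1}. This Python sketch only emulates W-lane vectors with lists (W=4 is an arbitrary choice) and is not the authors' kernel: the scalar loop loads x[i] and x[i+1] separately, whereas the shifted version builds the x[i+1..i+W] operand from lanes 1..W−1 of the current chunk plus lane 0 of the next chunk, the role an extract-style instruction such as NEON's `vextq_f32` can play. Function names are hypothetical, and the input is assumed to hold n+1 values.

```python
def bt_step_scalar(x, a, b, n):
    """One backward step of x_i = a*x_i + b*x_{i+1} for i < n; returns loads issued."""
    loads = 0
    for i in range(n):
        x[i] = a * x[i] + b * x[i + 1]  # loads x[i] and x[i+1]
        loads += 2
    return loads


def bt_step_shifted(x, a, b, n, width=4):
    """Same step on emulated `width`-lane vectors using an inter-lane shift.

    Each element of x[0..n] is loaded once: the shifted operand reuses the
    current chunk and takes only lane 0 from the next chunk.
    """
    if n % width:
        raise ValueError("sketch assumes n is a multiple of the vector width")
    cur = x[0:width]
    loads = len(cur)
    for start in range(0, n, width):
        nxt = x[start + width:start + 2 * width]  # next chunk, not yet updated
        loads += len(nxt)
        shifted = cur[1:] + nxt[:1]  # emulated inter-lane shift
        x[start:start + width] = [a * c + b * s for c, s in zip(cur, shifted)]
        cur = nxt
    return loads
```

For n = 8 the scalar version issues 16 loads and the shifted version 9, with bit-identical results, since both evaluate the same products in the same order.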
Conclusion
Conclusions
Understanding efficiency through platform-agnostic metrics and methodologies
Iso-metrics are useful for fair ranking of algorithms, architectures and systems
Mathematical formulation of a new QoS metric for streaming analytics
Iso-QoS for fair ranking of servers
Important applications: capacity planning, resource throttling, cost minimisation, profit maximisation, ...
The programmability dilemma: Iso-effort vs. best effort
Credits
EU FP7 Grant 610509, EPSRC Grants L000055/1, L004232/1
Who does the hard work: Charles Gillan, Giorgis Georgakoudis, George Tzenakis, Ahmed Sayed, Ivor Spence, Richard Faloon and the NanoStreams technical team.