TRANSCRIPT

Understanding Big Data Workloads on Modern Processors using BigDataBench

Jianfeng Zhan
Professor, ICT, Chinese Academy of Sciences and University of Chinese Academy of Sciences
Institute of Computing Technology
http://prof.ict.ac.cn/BigDataBench

HPBDC 2015, Ohio, USA
Outline

• BigDataBench Overview
• Workload characterization
• Multi-tenancy version
• Processors evaluation
BigDataBench HPBDC 2015
What is BigDataBench?

• An open-source big data benchmarking project
  - http://prof.ict.ac.cn/BigDataBench
  - Search Google using "BigDataBench"
BigDataBench Details

Methodology
• Five application domains
• Propose benchmark specifications for each domain

Implementation
• 14 real-world data sets & 3 kinds of big data generators
• 33 big data workloads with diverse implementations

Specific-purpose versions
• BigDataBench subset version
Five Application Domains

• Internet services: Search Engine, Social Network, E-commerce, Media Streaming
  - Search engine, social network, and e-commerce take up 80% of internet services, ranked by page views and daily visitors of the top 20 websites
• Multimedia
• Bioinformatics

[Figure: share of internet service domains among the top 20 websites (Search Engine, Social Network, E-commerce, Media Streaming, Others); data-growth examples such as new videos on YouTube, new photos on Flickr, hours of music streamed on Pandora, minutes of voice calls on Skype, and video feeds from surveillance cameras every minute; and growth of the DDBJ/EMBL/GenBank database (entries in millions, nucleotides in billions)]

Sources:
http://www.alexa.com/topsites/global
http://www.oldcolony.us/wp-content/uploads/2014/11/whatisbigdata-DKB-v2.pdf
http://www.ddbj.nig.ac.jp/breakdown_stats/dbgrowth-e.html#dbgrowth-graph
Benchmark Specification

• Guidelines for BigDataBench implementation: data model → workloads
  - Describe the data model
  - Model typical application scenarios
  - Extract important workloads
BigDataBench Details

Methodology
• Five application domains
• Benchmark specification for each domain

Implementation
• 14 real-world data sets & 3 kinds of big data generators
• 33 big data workloads with diverse implementations

Specific-purpose versions
• BigDataBench subset version
BigDataBench Summary

• BDGS (Big Data Generator Suite) for scalable data
• 14 real-world data sets: Wikipedia Entries, Google Web Graph, Facebook Social Network, E-commerce Transaction, Amazon Movie Reviews, ProfSearch Resumes, SoGou Data, ImageNet, Image scene, MNIST, English broadcasting audio, DVD Input Streams, Genome sequence data, Assembly of the human genome
• 33 workloads across five domains: Search Engine, Social Network, E-commerce, Multimedia, Bioinformatics
• Diverse software stacks: Hadoop, Shark, Impala, NoSQL, MPI, DataMPI, Hadoop-RDMA
Big Data Generator Tool

• 3 kinds of big data generators: Text/Graph/Table generators
• Preserving the original characteristics of real data
BigDataBench Details

Methodology
• Five application domains
• Benchmark specification for each domain

Implementation
• 14 real-world data sets & 3 kinds of big data generators
• 33 big data workloads with diverse implementations

Specific-purpose versions
• BigDataBench subset version
BigDataBench Subset

• Motivation
  - Expensive to run all the benchmarks for system and architecture research
  - Multiplied by different implementations, BigDataBench 3.0 provides about 77 workloads
• Method: identify workload characteristics from a specific perspective → eliminate the correlated data via dimension reduction (PCA) → clustering (K-means) → subset
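The subsetting pipeline above (characteristics → PCA → K-means → subset) can be sketched as follows. This is a minimal illustration, not BigDataBench's actual tooling: the 20×6 metric matrix, the number of principal components, and the cluster count k are invented for the example.

```python
import numpy as np

def pca(X, n_components):
    """Project standardized metrics onto the top principal components."""
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardize each metric
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)              # eigenvalues ascending
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return X @ top

def kmeans_subset(X, k, iters=100, seed=0):
    """Naive k-means; the subset is the member closest to each centroid."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    reps = [int(np.argmin(((X - centroids[j]) ** 2).sum(-1))) for j in range(k)]
    return sorted(set(reps))

# Toy matrix: rows = workloads, columns = microarchitectural metrics.
rng = np.random.default_rng(1)
metrics = rng.normal(size=(20, 6))
subset = kmeans_subset(pca(metrics, n_components=2), k=4)
print(subset)  # indices of the representative workloads, one per cluster
```

The representative of each cluster (the workload nearest its centroid) is what ends up in the subset, which is how a suite of ~77 runnable workloads can be reduced to a handful of proxies.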
Why BigDataBench?

Benchmark | Specification | Application domains | Workload types | Workloads | Scalable data sets (from real data) | Multiple implementations | Multi-tenancy version | Subsets | Simulator version
BigDataBench | Y | Five | Four[1] | 33 | 8 | Y | Y | Y | Y
BigBench | Y | One | Three | 10 | 3 | N | N | N | N
CloudSuite | N | N/A | Two | 8 | 3 | N | N | N | Y
HiBench | N | N/A | Two | 10 | 3 | N | N | N | N
CALDA | Y | N/A | One | 5 | N/A | Y | N | N | N
YCSB | Y | N/A | One | 6 | N/A | Y | N | N | N
LinkBench | Y | N/A | One | 10 | N/A | Y | N | N | N
AMP Benchmarks | Y | N/A | One | 4 | N/A | Y | N | N | N

[1] The four workload types are Offline Analytics, Cloud OLTP, Interactive Analytics, and Online Service.
BigDataBench Users

• http://prof.ict.ac.cn/BigDataBench/users/
• Industry users: Accenture, Broadcom, Samsung, Huawei, IBM
• China's first industry-standard big data benchmark suite
  - http://prof.ict.ac.cn/BigDataBench/industry-standard-benchmarks/
• About 20 academic groups have published papers using BigDataBench
BigDataBench Publications

• BigDataBench: a Big Data Benchmark Suite from Internet Services. 20th IEEE International Symposium on High Performance Computer Architecture (HPCA 2014)
• Characterizing data analysis workloads in data centers. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013) (Best Paper Award)
• BigOP: generating comprehensive big data workloads as a benchmarking framework. 19th International Conference on Database Systems for Advanced Applications (DASFAA 2014)
• BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. The Fourth Workshop on Big Data Benchmarking (WBDB 2014)
• Identifying Dwarfs Workloads in Big Data Analytics. arXiv preprint arXiv:1505.06872
• BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads. arXiv preprint arXiv:1504.02205
Outline

• BigDataBench Overview
• Workload characterization
• Multi-tenancy version
• Processors evaluation
System Behaviors

• Diversified system-level behaviors:
  - High CPU utilization & less I/O time
  - Relatively low CPU utilization & lots of I/O time
  - Medium CPU utilization & I/O time

[Figure: CPU utilization and I/O wait ratio (percentage, 0-100%) per workload: H-Grep, S-Kmeans, S-PageRank, H-WordCount, H-Bayes, M-Bayes, M-Kmeans, M-PageRank, H-Read, H-Difference, I-SelectQuery, S-WordCount, S-Project, S-OrderBy, S-Grep, M-Grep, H-TPC-DS, I-OrderBy, S-TPC-DS, S-Sort, M-WordCount, M-Sort, and the AVG_S_BigData average]

[Figure: the average weighted disk I/O time ratio per workload (log scale, 0.01 to 100)]
Workloads Classification

• From the perspective of system behaviors:
  - System behaviors vary across different workloads
  - Workloads are divided into 3 categories:

Type | Workloads
CPU Intensive | H-Grep, S-Kmeans, S-PageRank, H-WordCount, H-Bayes, M-Bayes, M-Kmeans and M-PageRank
I/O Intensive | H-Read, H-Difference, I-SelectQuery, S-WordCount, S-Project, S-OrderBy, M-Grep and S-Grep
Hybrid | H-TPC-DS-query3, I-OrderBy, S-TPC-DS-query10, S-TPC-DS-query8, S-Sort, M-WordCount and M-Sort
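As a rough illustration of this three-way split, a classifier over the two measured signals (CPU utilization and weighted disk I/O time ratio) might look like the sketch below; the threshold values are assumptions chosen for the example, not the ones used in the study.

```python
# Illustrative classifier for the three workload categories above.
# cpu_high and io_high are hypothetical cut-offs, not the paper's values.
def classify(cpu_util, io_time_ratio, cpu_high=0.80, io_high=1.0):
    if cpu_util >= cpu_high and io_time_ratio < io_high:
        return "CPU Intensive"   # busy cores, little time waiting on disk
    if cpu_util < cpu_high and io_time_ratio >= io_high:
        return "I/O Intensive"   # idle cores, lots of time waiting on disk
    return "Hybrid"              # medium CPU utilization and I/O time

print(classify(0.90, 0.1))  # CPU Intensive
print(classify(0.40, 5.0))  # I/O Intensive
print(classify(0.60, 0.5))  # Hybrid
```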
Off-Chip Bandwidth

• Most CPU-intensive workloads have higher off-chip bandwidth (3 GB/s); the maximum is 6.2 GB/s. Other workloads have lower off-chip bandwidth (0.6 GB/s).
• MPI-based workloads need low memory bandwidth.
IPC of BigDataBench vs. Other Benchmarks

[Figure: IPC (0 to 2) per big data workload, plus TPC-C, CloudSuite, HPCC, PARSEC, SPECfp, and SPECint averages]

• The average IPC of the big data workloads is larger than that of CloudSuite, SPECfp and SPECint, similar to PARSEC, and slightly lower than HPCC.
• The average IPC of BigDataBench is 1.3 times that of CloudSuite.
• Some workloads have high IPC (M-Kmeans, S-TPC-DS-Query8).
Instruction Mix of BigDataBench vs. Other Benchmarks

• Big data workloads are data-movement-dominated computing with more branch operations
  - 92% in terms of instruction mix (Load + Store + Branch + data movements of INT)
Pipeline Stalls

• The service workloads have more RAT (Register Allocation Table) stalls
• The data analysis workloads have more RS (Reservation Station) and ROB (ReOrder Buffer) full stalls
• Notable front-end stalls (i.e., instruction fetch stalls)!

[Figure: pipeline stall breakdown for data service vs. data analysis workloads]
Cache Behaviors of BigDataBench

• L1I MPKI
  - Larger than traditional benchmarks, but lower than that of CloudSuite (12 vs. 31)
  - Different among big data workloads: CPU-intensive (8), I/O-intensive (22), and hybrid workloads (9)
  - One order of magnitude differences among diverse implementations: M-WordCount is 2, while H-WordCount is 17
Cache Behaviors

• L2 cache: the I/O-intensive workloads undergo more L2 MPKI
• L3 cache: the average L3 MPKI of the big data workloads is lower than all of the other workloads
• The underlying software stacks impact data locality: MPI workloads have better data locality and fewer cache misses
TLB Behaviors

[Figure: DTLB MPKI and ITLB MPKI (log scale, 0.01 to 10) for CPU-intensive, I/O-intensive, and hybrid workloads, plus TPC-C, CloudSuite, HPCC, PARSEC, SPECfp, and SPECint averages]

• ITLB: I/O-intensive workloads undergo more ITLB MPKI.
• DTLB: CPU-intensive workloads have more DTLB MPKI.
Our Observations from BigDataBench

• Unique characteristics
  - Data-movement-dominated computing with more branch operations (92% in terms of instruction mix)
  - Notable pipeline front-end stalls
• Different behaviors among big data workloads
  - Disparity of IPCs and memory access behaviors
  - CloudSuite is a subclass of Big Data
• Software stack impacts
  - The L1I cache miss rates have one order of magnitude differences among diverse implementations with different software stacks.
Correlation Analysis

• Compute the correlation coefficients of CPI with other micro-architecture-level metrics.
• Pearson's correlation coefficient: r = cov(X, Y) / (σ_X · σ_Y)
• The absolute value (from 0 to 1) shows the dependency: the bigger the absolute value, the stronger the correlation.
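A minimal sketch of the correlation computation described above; the CPI and L2 MPKI sample values are made up for illustration, not measurements from the study.

```python
import math

def pearson(xs, ys):
    """Pearson's r = cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# e.g. correlate per-workload CPI with another metric such as L2 MPKI
cpi     = [1.2, 0.9, 1.8, 1.5, 0.7]   # hypothetical sample values
l2_mpki = [4.0, 2.5, 7.9, 6.1, 1.8]
r = pearson(cpi, l2_mpki)
print(round(r, 3))  # close to 1: strong positive correlation
```

Ranking the absolute values of r across all candidate metrics, per workload, is what produces the "top five coefficients" shown on the next slide.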
Top Five Coefficients

[Figure: the top five correlation coefficients (ranked ① to ⑤) for each workload: Naive Bayes, Grep, WordCount, Kmeans, FKmeans, PageRank, Sort, Hive, IBCF, HMM, SVM]
Insights

• Front-end stall does not have a high correlation coefficient value for most big data analytics workloads
  - Front-end stall is not the factor that affects CPI performance most.
• L2 cache misses and TLB misses have high correlation coefficient values.
  - The long-latency memory accesses (to the L3 cache or memory) affect CPI performance most and should be the optimization point with the highest priority.
Outline

• BigDataBench Overview
• Workload characterization
• Multi-tenancy version
• Processors evaluation
Cloud Data Centers

• Two classes of popular workloads
  - Long-running services: search engines, e-commerce sites
  - Short-term data analytic jobs: Hadoop MapReduce, Spark jobs
Problem

• Existing benchmarks focus on specific types of workload
• Scenarios are not realistic
  - They do not match the typical data center scenario that mixes different percentages of tenants and workloads sharing the same computing infrastructure
Purpose of BigDataBench-MT

• Developing realistic benchmarks to reflect such practical scenarios of mixed workloads
  - Both service and data analytic workloads
  - Dynamic scaling up and down
• The tool is publicly available from http://prof.ict.ac.cn/BigDataBench/multi-tenancyversion
What Can You Do with It?

• We consider two dimensions of the benchmarking scenarios
  - From tenants' perspectives
  - From workloads' perspectives
You Can Specify the Tenants

• The number of tenants
  - Scalability benchmark: how many tenants are able to run in parallel?
• The priorities of tenants
  - Fairness benchmark: how fair is the system, i.e., are the available resources equally available to all tenants? What if tenants have different priorities?
• Time line
  - How do the number and priorities of tenants change over time?
You Can Specify the Workloads

• Data characteristics
  - Data type, source
  - Input/output data volumes, distributions
• Computation semantics
  - Source code
  - Big data software stacks
• Job arrival patterns
  - Arrival rate
  - Arrival sequence
Two Major Challenges

• Heterogeneity of real workloads
  - Different workload types, e.g. CPU- or I/O-intensive workloads
  - Different software stacks, e.g. Hadoop, Spark, MPI
• Workload dynamicity hidden in real-world traces
  - Arrival patterns: request/job submitting time and sequences
  - Job input sizes, e.g. ranging from KB to ZB
Existing Big Data Benchmarks

Benchmarks | Actual workloads | Real workload traces | Mixed workloads
AMPLab benchmark, LinkBench, BigBench, YCSB, CloudSuite | Yes | No | No
GridMix, SWIM | No | Yes | No

• How to generate real workloads on the basis of real workload traces is still an open question.
System Overview

• Three modules
  - Benchmark User Portal: a visual interface
  - Combiner of Workloads and Traces: a matcher of real workloads and traces
  - Multi-tenant Workload Generator: a multi-tenant workload generator
Key Technique: Combination of Real and Synthetic Data Analytic Jobs

• Goal: combining the arrival patterns extracted from real traces with real workloads.
• Problem: workload traces only contain anonymous jobs whose workload types and/or input data are unknown.
Solution: The First Step

• Deriving the workload characteristics of both real and anonymous jobs

TABLE. Metrics to represent workload characteristics
Metric | Description
Execution time | Measured in seconds
CPU usage | Total CPU time per second
Memory usage | Measured in GB
CPI | Cycles per instruction
MAI | The number of memory accesses per instruction
Solution: The Second Step

• Matching both types of jobs whose workload characteristics are sufficiently similar
An Example

• An example of matching Hadoop workloads
  - Mining the Facebook/Google workload trace (exact workload characteristics information)
  - Profiling Hadoop workloads from BigDataBench (collect workload characteristics information)
  - Workload matching using k-means clustering

Matching result: replaying basis
Job type | Input size (GB) | Starting time (minutes)
Bayes | 2 | 10
Sort | 1 | 20
K-means | 0.5 | 25
Bayes | 5 | 30
Sort | 1 | 40
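The matching step in this example can be sketched as a nearest-neighbor search over the five characteristics from the earlier table (execution time, CPU usage, memory usage, CPI, MAI). All metric values below are hypothetical, and this simple per-metric-normalized distance stands in for the k-means-based matching the tool actually uses.

```python
import math

# Hypothetical profiled BigDataBench workloads:
# (execution time s, CPU usage, memory GB, CPI, MAI)
profiled = {
    "Bayes":   (620.0, 0.85, 4.0, 1.4, 0.012),
    "Sort":    (300.0, 0.40, 2.0, 2.1, 0.030),
    "K-means": (450.0, 0.90, 3.0, 0.9, 0.008),
}

def match(anonymous_job):
    """Return the profiled workload whose characteristics are closest
    (Euclidean distance on per-metric range-normalized values)."""
    cols = list(zip(*profiled.values()))
    scale = [max(c) - min(c) or 1.0 for c in cols]
    def dist(a, b):
        return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scale)))
    return min(profiled, key=lambda name: dist(profiled[name], anonymous_job))

# An anonymous trace job with measured characteristics but unknown type:
print(match((310.0, 0.45, 2.1, 2.0, 0.028)))  # matches "Sort"
```

Once every anonymous trace job is matched to a runnable workload this way, the trace's arrival times and input sizes can be replayed with real programs, which is exactly the "replaying basis" table above.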
System Demonstration

• Three steps to generate a mix of search service and Hadoop MapReduce jobs
  - Traces: 24-hour Sogou user query logs and the Google cluster trace
  - Step 1: specification of tested machines and workloads
  - Step 2: selection of benchmarking period and scale
  - Step 3: generation of mixed workloads
Workloads and Traces in BigDataBench-MT

• Multi-tenancy V1.0 releases:

Workloads | Software stack | Workload trace
Nutch Web Search | Apache Tomcat 6.0.26, Search Server (Nutch) | Sogou (http://www.sogou.com/labs/dl/q-e.html)
Hadoop | Hadoop 1.0.2 | Facebook (https://github.com/SWIMProjectUCB/SWIM/wiki)
Shark | Shark 0.8.0 | Google data center (https://code.google.com/p/googleclusterdata/)
Outline

• BigDataBench Overview
• Workload characterization
• Multi-tenancy version
• Processors evaluation
Core Architecture

• Multi brawny-core (Xeon E5645, 2.4 GHz)
  - 6 out-of-order cores
  - Dynamic multiple issue (superscalar)
  - Dynamic overclocking
  - Simultaneous multithreading
• Many wimpy-core architecture (Tile-Gx36, 1.2 GHz)
  - 36 in-order cores
  - Static multiple issue (VLIW)
Experiment Methodology

• Use real hardware instead of simulation
• Real power consumption measurement instead of modeling
• Saturate CPU performance by:
  - Isolating the processor behavior: over-provision the disk I/O subsystem by using a RAM disk
  - Optimizing the benchmarks: tune the software stack parameters and JVM flags for performance
Execution Time

• For Hadoop-based Sort, the performance gap is about 1.08x.
• For the other workloads, gaps of more than 2x exist between Xeon and Tilera.

[Figure: normalized execution time, Xeon vs. Tilera, per workload]

• From the perspective of execution time, the Xeon processor is better than the Tilera processor all the time.
Cycle Counts

• There are huge cycle count gaps between Xeon and Tilera, ranging from 5.3 to 14.
• Tilera needs more cycles to complete the same amount of work.

[Figure: normalized cycles, Xeon vs. Tilera, per workload]
Pipeline Efficiency

• The theoretical IPC:
  - Xeon: 4 instructions per cycle
  - Tilera: 1 instruction bundle per cycle

[Figure: pipeline efficiency (0 to 0.45), Tilera vs. Xeon, per workload]

• Out-of-order pipelines are more efficient than in-order ones.
Power Consumption

• Tilera is power-optimized.
• Xeon consumes more power.

[Figure: normalized power, Tilera vs. Xeon, per workload]
Energy Consumption

• Hadoop-based Sort consumes less energy on Tilera than on Xeon
  - Hadoop Sort is an extremely I/O-intensive workload.
• Tilera consumes more energy than Xeon to complete the same amount of work for most big data workloads
  - The longer execution time offsets the lower-power design.

[Figure: normalized energy, Xeon vs. Tilera, per workload]
Total Cost of Ownership (TCO) Model [*]

• Three-year depreciation cycle
• Hardware costs associated with individual components: CPU, memory, disk, board, power, cooling

[*] K. Lim et al. Understanding and designing new server architectures for emerging warehouse-computing environments. ISCA 2008
Cost Model

• The cost data originate from diverse sources:
  - Different vendors
  - Corresponding official websites
• Power and cooling: an activity factor of 0.75
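Putting the two slides together, a toy version of the TCO comparison might look like this. Every cost, power, cooling, and throughput number below is invented for illustration; only the structure (three-year depreciation, per-component hardware costs, power and cooling scaled by a 0.75 activity factor) follows the slides and the Lim et al. model they cite.

```python
# Illustrative TCO sketch: hardware + 3 years of power (and cooling overhead).
YEARS, ACTIVITY, PRICE_PER_KWH = 3, 0.75, 0.10  # hypothetical electricity price

def tco(component_costs, avg_power_watts, cooling_overhead=0.5):
    """3-year TCO = hardware + electricity, with cooling as a power overhead."""
    hardware = sum(component_costs.values())
    energy_kwh = avg_power_watts * ACTIVITY * 24 * 365 * YEARS / 1000.0
    return hardware + energy_kwh * PRICE_PER_KWH * (1 + cooling_overhead)

# Made-up component costs ($) and average power draws (W) for the two systems:
xeon   = tco({"cpu": 600, "memory": 300, "disk": 150, "board": 250}, avg_power_watts=220)
tilera = tco({"cpu": 400, "memory": 200, "disk": 150, "board": 200}, avg_power_watts=60)

# Performance per TCO = throughput (hypothetical jobs/hour) per dollar.
perf_per_tco_xeon   = 40.0 / xeon
perf_per_tco_tilera = 15.0 / tilera
print(perf_per_tco_xeon > perf_per_tco_tilera)
```

With these made-up numbers the brawny core wins on performance per TCO, mirroring the next slide's finding for most workloads: a lower-power part only pays off when its throughput deficit is small, as in the I/O-bound Hadoop Sort case.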
Performance per TCO

• Hadoop-based Sort has higher performance per TCO on the Tilera.
• For the other workloads, Xeon outperforms Tilera.

[Figure: normalized performance per TCO, Tilera vs. Xeon (Turbo & HT enabled), per workload]
Key Takeaways

• Try the open-source big data benchmark suite from http://prof.ict.ac.cn/BigDataBench
• Big data: data-movement-dominated computing with more branch operations (92% in terms of instruction mix)
• Multi-tenancy version: replaying mixed workloads according to publicly available workload traces
• Wimpy-core processors suit only a part of big data workloads