Understanding Big Data Workloads on Modern Processors using BigDataBench

Jianfeng Zhan, Professor, ICT, Chinese Academy of Sciences and University of Chinese Academy of Sciences
HPBDC 2015, Ohio, USA


Page 1: (source: web.cse.ohio-state.edu/~lu.932/hpbdc2015/slides/hpbdc15-zhan.pdf)

Understanding Big Data Workloads on Modern Processors using BigDataBench

Jianfeng Zhan
Professor, ICT, Chinese Academy of Sciences and University of Chinese Academy of Sciences
Institute of Computing Technology
http://prof.ict.ac.cn/BigDataBench

HPBDC 2015, Ohio, USA

Page 2:

Outline

• BigDataBench Overview
• Workload characterization
• Multi-tenancy version
• Processors evaluation

BigDataBench HPBDC 2015

Page 3:

What is BigDataBench?

• An open source big data benchmarking project
  - http://prof.ict.ac.cn/BigDataBench
  - Search Google using “BigDataBench”

Page 4:

BigDataBench Detail

Methodology
• Five application domains
• Propose benchmark specifications for each domain

Implementation
• 14 real-world data sets & 3 kinds of big data generators
• 33 big data workloads with diverse implementations

Specific-purpose version
• BigDataBench subset version

Page 5:

Five Application Domains

• Internet services: search engine, social network, e-commerce
  - Taking up 80% of internet services according to page views and daily visitors
  - [Pie chart of the top 20 websites. Categories: Search Engine, Social Network, Electronic Commerce, Media Streaming, Others. Slice labels: 40%, 25%, 15%, 5%, 15%]
  - Source: http://www.alexa.com/topsites/global;0
• Multimedia
  - [Infographic of data growth per minute: new videos on YouTube, new photos on Flickr, hours of music streaming on Pandora, video feeds from surveillance cameras, minutes of voice calls on Skype; images, videos, documents, …]
  - Source: http://www.oldcolony.us/wp-content/uploads/2014/11/whatisbigdata-DKB-v2.pdf
• Bioinformatics
  - [Chart: DDBJ/EMBL/GenBank database growth. Axes: nucleotides (billion) and entries (million)]
  - Source: http://www.ddbj.nig.ac.jp/breakdown_stats/dbgrowth-e.html#dbgrowth-graph

Page 6:

Benchmark specification

• Guidelines for BigDataBench implementation
  - Data model
  - Workloads

[Diagram boxes: Describe data model; Model typical application scenarios; Extract important workloads]

Page 7:

BigDataBench Details

Methodology
• Five application domains
• Benchmark specification for each domain

Implementation
• 14 real-world data sets & 3 kinds of big data generators
• 33 big data workloads with diverse implementations

Specific-purpose version
• BigDataBench subset version

Page 8:

BigDataBench Summary

• BDGS (Big Data Generator Suite) for scalable data
• 14 real-world data sets: Facebook Social Network, ImageNet, Wikipedia Entries, E-commerce Transaction, English broadcasting audio, Amazon Movie Reviews, ProfSearch Resumes, DVD Input Streams, Google Web Graph, SoGou Data, Image scene, MNIST, Genome sequence data, Assembly of the human genome
• 33 workloads across Search Engine, Social Network, E-commerce, Multimedia and Bioinformatics
• Software stacks include NoSql, Impala, Shark, MPI, DataMPI and Hadoop RDMA

Page 9:

Big Data Generator Tool

• 3 kinds of big data generators
  - Preserving original characteristics of real data
  - Text/Graph/Table generator

Page 10:

BigDataBench Details

Methodology
• Five application domains
• Benchmark specification for each domain

Implementation
• 14 real-world data sets & 3 kinds of big data generators
• 33 big data workloads with diverse implementations

Specific-purpose version
• BigDataBench subset version

Page 11:

BigDataBench Subset

• Motivation
  - Expensive to run all the benchmarks for system and architecture research
    - Workloads are multiplied by different implementations
    - BigDataBench 3.0 provides about 77 workloads

[Flow diagram]
  1. Identify workload characteristics from a specific perspective
  2. Eliminate the correlation data: dimension reduction (PCA)
  3. Clustering (K-Means)
  4. Subset
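The clustering step of the subsetting flow can be sketched as follows. This is a minimal illustration, not the BigDataBench tooling: it standardizes each metric, runs plain k-means, and keeps the workload closest to each cluster center as the representative. The PCA step is left out for brevity, and the workload names (`w1` to `w6`), metric values, and function names are all hypothetical.

```python
import math

def zscore(rows):
    """Scale every metric (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def kmeans(points, k, max_iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(max_iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append([sum(v) / len(members) for v in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def pick_subset(names, rows, k):
    """Return one representative workload per cluster."""
    pts = zscore(rows)
    labels, centers = kmeans(pts, k)
    subset = []
    for j in range(k):
        members = [i for i, l in enumerate(labels) if l == j]
        if members:
            best = min(members, key=lambda i: math.dist(pts[i], centers[j]))
            subset.append(names[best])
    return subset

# Hypothetical metrics per workload: [CPU utilization %, I/O wait %, L1I MPKI]
names = ["w1", "w2", "w3", "w4", "w5", "w6"]
rows = [[90, 2, 8], [88, 3, 9], [30, 40, 22],
        [28, 45, 20], [60, 15, 9], [62, 14, 10]]
subset = pick_subset(names, rows, 3)
print(subset)
```

Choosing the member nearest to the center is one common way to name a representative; other selection rules would fit the same flow.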

Page 12:

Why BigDataBench?

Benchmark | Specification | Application domains | Workload types | Workloads | Scalable data sets (from real data) | Multiple implementations | Multitenancy | Subsets | Simulator version
BigDataBench | Y | Five | Four [1] | 33 | 8 | Y | Y | Y | Y
BigBench | Y | One | Three | 10 | 3 | N | N | N | N
CloudSuite | N | N/A | Two | 8 | 3 | N | N | N | Y
HiBench | N | N/A | Two | 10 | 3 | N | N | N | N
CALDA | Y | N/A | One | 5 | N/A | Y | N | N | N
YCSB | Y | N/A | One | 6 | N/A | Y | N | N | N
LinkBench | Y | N/A | One | 10 | N/A | Y | N | N | N
AMP Benchmarks | Y | N/A | One | 4 | N/A | Y | N | N | N

[1] The four workload types include Offline Analytics, Cloud OLTP, Interactive Analytics and Online Service

Page 13:

BigDataBench Users

• http://prof.ict.ac.cn/BigDataBench/users/
• Industry users
  - Accenture, BROADCOM, SAMSUNG, Huawei, IBM
• China’s first industry-standard big data benchmark suite
  - http://prof.ict.ac.cn/BigDataBench/industry-standard-benchmarks/
• About 20 academic groups have published papers using BigDataBench

Page 14:

BigDataBench Publications

• BigDataBench: a Big Data Benchmark Suite from Internet Services. 20th IEEE International Symposium on High Performance Computer Architecture (HPCA-2014).
• Characterizing data analysis workloads in data centers. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013) (Best paper award).
• BigOP: generating comprehensive big data workloads as a benchmarking framework. 19th International Conference on Database Systems for Advanced Applications (DASFAA 2014).
• BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. The Fourth Workshop on Big Data Benchmarking (WBDB 2014).
• Identifying Dwarfs Workloads in Big Data Analytics. arXiv preprint arXiv:1505.06872.
• BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads. arXiv preprint arXiv:1504.02205.

Page 15:

Outline

• BigDataBench Overview
• Workload characterization
• Multi-tenancy version
• Processors evaluation

Page 16:

System Behaviors

• Diversified system level behaviors

[Figure: CPU utilization and I/O wait ratio (percentage, 0% to 100%) for each workload, from H-Grep(7) through M-Sort, plus AVG_S_BigData]
[Figure: The average weighted disk I/O time ratio for the same workloads (log scale, 0.01 to 100)]

Page 17:

System Behaviors

• Diversified system level behaviors:
  - High CPU utilization & less I/O time

[Figure: CPU utilization and I/O wait ratio (percentage, 0% to 100%) for each workload, from H-Grep(7) through M-Sort, plus AVG_S_BigData]
[Figure: The average weighted disk I/O time ratio for the same workloads (log scale, 0.01 to 100)]

Page 18:

System Behaviors

• Diversified system level behaviors:
  - High CPU utilization & less I/O time
  - Relatively low CPU utilization and lots of I/O time

[Figure: CPU utilization and I/O wait ratio (percentage, 0% to 100%) for each workload, from H-Grep(7) through M-Sort, plus AVG_S_BigData]
[Figure: The average weighted disk I/O time ratio for the same workloads (log scale, 0.01 to 100)]

Page 19:

System Behaviors

• Diversified system level behaviors:
  - High CPU utilization & less I/O time
  - Relatively low CPU utilization & lots of I/O time
  - Medium CPU utilization & I/O time

[Figure: CPU utilization and I/O wait ratio (percentage, 0% to 100%) for each workload, from H-Grep(7) through M-Sort, plus AVG_S_BigData]
[Figure: The average weighted disk I/O time ratios for the same workloads (log scale, 0.01 to 100)]

Page 20:

Workloads Classification

• From the perspective of system behaviors:
  - System behaviors vary across different workloads
  - Workloads are divided into 3 categories:

Type | Workloads
CPU Intensive | H-Grep, S-Kmeans, S-PageRank, H-WordCount, H-Bayes, M-Bayes, M-Kmeans and M-PageRank
I/O Intensive | H-Read, H-Difference, I-SelectQuery, S-WordCount, S-Project, S-OrderBy, M-Grep and S-Grep
Hybrid | H-TPC-DS-query3, I-OrderBy, S-TPC-DS-query10, S-TPC-DS-query8, S-Sort, M-WordCount and M-Sort
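A three-way split like the one in this table can be expressed as a simple threshold rule over the two system-level metrics shown on the previous slides. The sketch below is illustrative only: the slides do not state the cut-off values, so the thresholds, the function name `classify`, and the sample inputs are assumptions.

```python
def classify(cpu_util, io_time_ratio,
             cpu_high=85.0, cpu_low=60.0, io_high=10.0, io_low=1.0):
    """Classify a workload from CPU utilization (%) and weighted disk I/O
    time ratio. All four thresholds are assumed values, not from the slides."""
    if cpu_util >= cpu_high and io_time_ratio < io_low:
        return "CPU intensive"      # high CPU utilization, little I/O time
    if cpu_util < cpu_low and io_time_ratio >= io_high:
        return "I/O intensive"      # relatively low CPU, lots of I/O time
    return "Hybrid"                 # everything in between

# Hypothetical measurements: (CPU utilization %, weighted I/O time ratio)
samples = {"job-a": (95.0, 0.2), "job-b": (40.0, 25.0), "job-c": (70.0, 3.0)}
classes = {name: classify(*m) for name, m in samples.items()}
print(classes)
```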

Page 21:

Off-Chip Bandwidth

• Most of the CPU-intensive workloads have higher off-chip bandwidth (3 GB/s); the maximum is 6.2 GB/s. Other workloads have lower off-chip bandwidth (0.6 GB/s).
• MPI-based workloads need low memory bandwidth.

Page 22:

IPC of BigDataBench vs. other benchmarks

[Figure: IPC (0 to 2) for each BigDataBench workload, plus AVG_S_BigData, TPC-C, AVG_CloudSuite, Avg_HPCC, Avg_PARSEC, AVG_SPECfp and AVG_SPECint]

• The average IPC of the big data workloads is larger than that of CloudSuite, SPECFP and SPECINT, similar to PARSEC, and slightly lower than HPCC
• The average IPC of BigDataBench is 1.3 times that of CloudSuite
• Some workloads have high IPC (M-Kmeans, S-TPC-DS-query8)

Page 23:

Instructions Mix of BigDataBench vs. other benchmarks

• Big data workloads are data movement dominated computing with more branch operations
  - 92% in terms of instruction mix (Load + Store + Branch + data movements of INT)

Page 24:

Pipeline Stalls

• The service workloads have more RAT (Register Allocation Table) stalls
• The data analysis workloads have more RS (Reservation Station) and ROB (ReOrder Buffer) full stalls
• Notable front end stalls (i.e., instruction fetch stall)!

[Figure panels: Data Service; Data analysis]

Page 25:

Cache Behaviors of BigDataBench

[Figure groups: CPU-intensive, I/O-intensive, hybrid]

• L1I MPKI
  - Larger than traditional benchmarks, but lower than that of CloudSuite (12 vs. 31)
  - Different among big data workloads
    - CPU-intensive (8), I/O-intensive (22), and hybrid workloads (9)
  - One order of magnitude differences among diverse implementations
    - M_WordCount is 2, while H_WordCount is 17
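For readers unfamiliar with the metrics, MPKI (misses per kilo-instructions) and IPC (instructions per cycle) are plain ratios of hardware counter values. The sketch below shows the arithmetic; the counter readings are hypothetical, and only the two WordCount MPKI figures come from the slide.

```python
def mpki(misses, instructions):
    """Misses per thousand retired instructions."""
    return misses * 1000.0 / instructions

def ipc(instructions, cycles):
    """Retired instructions per core cycle (the inverse of CPI)."""
    return instructions / cycles

# Hypothetical counter readings for one run.
instructions = 2_000_000_000
cycles = 2_500_000_000
l1i_misses = 24_000_000

run_mpki = mpki(l1i_misses, instructions)
run_ipc = ipc(instructions, cycles)
run_cpi = 1.0 / run_ipc

# Gap between the two WordCount implementations quoted on the slide
# (H_WordCount L1I MPKI 17, M_WordCount L1I MPKI 2).
wordcount_gap = 17 / 2

print(run_mpki, run_ipc, run_cpi, wordcount_gap)
```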

Page 26:

Cache Behaviors

• L2 Cache:
  - The IO-intensive workloads undergo more L2 MPKI
• L3 Cache:
  - The average L3 MPKI of the big data workloads is lower than all of the other workloads
• The underlying software stacks impact data locality
  - MPI workloads have better data locality and fewer cache misses

Page 27:

TLB Behaviors

[Figure: DTLB MPKI and ITLB MPKI (TLB misses, log scale 0.01 to 10) for each workload, grouped as CPU-intensive, I/O-intensive and hybrid, plus averages for BigData, TPC-C, CloudSuite, HPCC, PARSEC, SPECfp and SPECint]

• ITLB
  - IO-intensive workloads undergo more ITLB MPKI.
• DTLB
  - CPU-intensive workloads have more DTLB MPKI.

Page 28:

Our observations from BigDataBench

• Unique characteristics
  - Data movement dominated computing with more branch operations
    - 92% in terms of instruction mix
  - Notable pipeline frontend stalls
• Different behaviors among Big Data workloads
  - Disparity of IPCs and memory access behaviors
    - CloudSuite is a subclass of Big Data
• Software stacks impacts
  - The L1I cache miss rates have one order of magnitude differences among diverse implementations with different software stacks.

Page 29:

Correlation Analysis

• Compute the correlation coefficients of CPI with other micro-architecture level metrics.
  - Pearson’s correlation coefficient is used
  - The absolute value (from 0 to 1) shows the dependency:
    - The bigger the absolute value, the stronger the correlation.
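The formula on this slide did not survive extraction. The standard definition of Pearson's coefficient for paired samples x and y is r = sum((x_i - mean_x)(y_i - mean_y)) / sqrt(sum((x_i - mean_x)^2) * sum((y_i - mean_y)^2)). A minimal sketch of ranking metrics by |r| against CPI follows; the metric names and sample values are hypothetical and only illustrate the procedure.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def top_correlated(cpi, metrics, k=5):
    """Return up to k (metric, r) pairs with the largest |r| against CPI."""
    scored = [(name, pearson(values, cpi)) for name, values in metrics.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)[:k]

# Hypothetical per-interval samples for one workload.
cpi = [1.0, 1.2, 1.5, 1.9, 2.4]
metrics = {
    "l2_mpki": [4.0, 5.0, 6.5, 8.5, 11.0],           # rises linearly with CPI
    "dtlb_mpki": [0.9, 1.0, 1.4, 1.6, 2.1],          # rises roughly with CPI
    "frontend_stall": [0.30, 0.28, 0.31, 0.29, 0.30],  # nearly flat
    "branch_miss": [5.0, 4.6, 4.0, 3.2, 2.2],        # falls linearly with CPI
}
ranked = top_correlated(cpi, metrics)
for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

Sorting on the absolute value keeps strongly negative correlations near the top, which matches the slide's point that only the magnitude indicates dependency.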

Page 30:

Top five coefficients

[Figure: top five correlation coefficients for each workload: Naive Bayes, Grep, WordCount, Kmeans, FKmeans, PageRank, Sort, Hive, IBCF, HMM, SVM]

Page 31:

Insights

• Frontend stall does not have a high correlation coefficient value for most of the big data analytics workloads
  - Frontend stall is not the factor that affects the CPI performance most.
• L2 cache misses and TLB misses have high correlation coefficient values.
  - The long latency memory accesses (access L3 cache or memory) affect the CPI performance most and should be the optimization point with highest priority.

Page 32:

Outline

• BigDataBench Overview
• Workload characterization
• Multi-tenancy version
• Processors evaluation

Page 33:

Cloud Data Centers

• Two classes of popular workloads
  - Long-running services
    - Search engines, e-commerce sites
  - Short-term data analytic jobs
    - Hadoop MapReduce, Spark jobs

Page 34:

Problem

• Existing benchmarks focus on specific types of workload
• Scenarios are not realistic
  - They do not match the typical data center scenario that mixes different percentages of tenants and workloads sharing the same computing infrastructure

Page 35:

Purpose of BigDataBench-MT

• Developing realistic benchmarks to reflect such practical scenarios of mixed workloads.
  - Both service and data analytic workloads
  - Dynamic scaling up and down
• The tool is publicly available from http://prof.ict.ac.cn/BigDataBench/multi-tenancyversion

Page 36:

What can you do with it?

• We consider two dimensions of the benchmarking scenarios
  - From tenants’ perspectives
  - From workloads’ perspectives

Page 37:

You can specify the tenants

• The number of tenants
  - Scalability benchmark: how many tenants are able to run in parallel?
• The priorities of tenants
  - Fairness benchmark: how fair is the system, i.e., are the available resources equally available to all tenants? What if tenants have different priorities?
• Time line
  - How do the number and priorities of tenants change over time?

Page 38:

You can specify the workloads

• Data characteristics
  - Data type, source
  - Input/output data volumes, distributions
• Computation semantics
  - Source code
  - Big data software stacks
• Job arrival patterns
  - Arrival rate
  - Arrival sequence

Page 39:

Two major challenges

• Heterogeneity of real workloads
  - Different workload types
    - e.g. CPU or I/O intensive workloads
  - Different software stacks
    - e.g. Hadoop, Spark, MPI
• Workload dynamicity hidden in real-world traces
  - Arrival patterns
    - Request/job submitting time and sequences
  - Job input sizes
    - e.g. ranging from KB to ZB

Page 40:

Existing big data benchmarks

Benchmarks | Actual workloads | Real workload traces | Mixed workloads
AMPLab benchmark, LinkBench, BigBench, YCSB, CloudSuite | Yes | No | No
GridMix, SWIM | No | Yes | No

• How to generate real workloads on the basis of real workload traces is still an open question.

Page 41:

System Overview

• Three modules
  - Benchmark User Portal
    - A visual interface
  - Combiner of Workloads and Traces
    - A matcher of real workloads and traces
  - Multi-tenant Workload Generator
    - A multi-tenant workload generator

Page 42:

Key technique: Combination of real and synthetic data analytic jobs

• Goal: Combining the arrival patterns extracted from real traces with real workloads.
• Problem: Workload traces only contain anonymous jobs whose workload types and/or input data are unknown.

Page 43:

Solution: the first step

• Deriving the workload characteristics of both real and anonymous jobs

TABLE. Metrics to represent workload characteristics

Metric | Description
Execution time | Measured in seconds
CPU usage | Total CPU time per second
Memory usage | Measured in GB
CPI | Cycles per instruction
MAI | The number of memory accesses per instruction

Page 44:

Solution: the second step

• Matching both types of jobs whose workload characteristics are sufficiently similar
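One way to picture "sufficiently similar" is a distance test over the five metrics from the first step. The sketch below is a simplification rather than the BigDataBench-MT matcher: it min-max scales each metric and assigns every anonymous trace job to the nearest profiled workload, rejecting matches beyond a cut-off. The metric values, job ids, the `max_distance` threshold, and the function names are all assumptions.

```python
import math

# Metric order: execution time (s), CPU usage, memory usage (GB), CPI, MAI
def normalize(vectors):
    """Min-max scale each metric to [0, 1] so that no unit dominates."""
    lo = [min(col) for col in zip(*vectors)]
    hi = [max(col) for col in zip(*vectors)]
    return [[(x - l) / (h - l) if h > l else 0.0
             for x, l, h in zip(v, lo, hi)] for v in vectors]

def match_jobs(profiled, trace_jobs, max_distance=0.5):
    """Map each anonymous job id to the closest profiled workload name,
    or to None when nothing is within max_distance (assumed cut-off)."""
    names = list(profiled)
    scaled = normalize([profiled[n] for n in names] + list(trace_jobs.values()))
    prof_vecs, job_vecs = scaled[:len(names)], scaled[len(names):]
    result = {}
    for job_id, vec in zip(trace_jobs, job_vecs):
        dist, name = min((math.dist(vec, p), n)
                         for p, n in zip(prof_vecs, names))
        result[job_id] = name if dist <= max_distance else None
    return result

# Hypothetical profiles of real workloads and anonymous trace jobs.
profiled = {
    "Sort": [120, 0.40, 2.0, 1.6, 0.50],
    "Bayes": [300, 0.90, 4.0, 0.9, 0.30],
    "K-means": [200, 0.95, 1.0, 0.7, 0.20],
}
trace_jobs = {
    "job-1": [125, 0.45, 2.1, 1.5, 0.48],
    "job-2": [290, 0.85, 3.8, 1.0, 0.31],
    "job-3": [10, 0.05, 0.1, 3.0, 1.50],   # unlike any profiled workload
}
matches = match_jobs(profiled, trace_jobs)
print(matches)
```

A clustering-based matcher would group jobs first and then label each cluster; nearest-profile assignment is used here only because it is the shortest way to show the idea.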

Page 45:

An Example

• An example of matching Hadoop workloads

[Flow diagram]
  1. Mining the Facebook/Google workload trace (extract workload characteristics information)
  2. Profiling Hadoop workloads from BigDataBench (collect workload characteristics information)
  3. Workload matching using k-means clustering
  4. Matching result: replaying basis

Job type | Input size (GB) | Starting time (minutes)
Bayes | 2 | 10
Sort | 1 | 20
K-means | 0.5 | 25
Bayes | 5 | 30
Sort | 1 | 40
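The matching result above is effectively a replay schedule. A minimal sketch of turning it into timed submissions follows; the job rows come from the table, while the `Job` type, the `replay` function, the `submit` and `sleep` callbacks, and the `speedup` factor are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Job:
    job_type: str
    input_gb: float
    start_min: float

# Rows of the matching result table.
SCHEDULE = [
    Job("Bayes", 2, 10),
    Job("Sort", 1, 20),
    Job("K-means", 0.5, 25),
    Job("Bayes", 5, 30),
    Job("Sort", 1, 40),
]

def replay(schedule, submit, sleep, speedup=1.0):
    """Submit jobs in start-time order, waiting between arrivals.

    submit(job) launches one job and sleep(seconds) blocks; both are
    injected so the schedule can be dry-run without a cluster.
    speedup > 1 compresses the trace timeline.
    """
    clock_min = 0.0
    for job in sorted(schedule, key=lambda j: j.start_min):
        sleep((job.start_min - clock_min) * 60.0 / speedup)
        submit(job)
        clock_min = job.start_min

# Dry run: record waits and submissions instead of sleeping or launching.
waits, submitted = [], []
replay(SCHEDULE, submitted.append, waits.append, speedup=60.0)
print(waits)
print([j.job_type for j in submitted])
```

Passing `time.sleep` and a real launcher in place of the recording callbacks would replay the same arrival pattern against a live system.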

Page 46:

System demonstration

• Three steps to generate a mix of search service and Hadoop MapReduce jobs
  - Traces: 24-hour Sogou user query logs and Google cluster trace.

Step 1: Specification of tested machines and workloads
Step 2: Selection of benchmarking period and scale
Step 3: Generation of mixed workloads

Page 47: nding Big a ds on Modern s using aBenchweb.cse.ohio-state.edu/~lu.932/hpbdc2015/slides/hpbdc15-zhan.pdf · LinkBench N AMP s N BigDataBench 2015 [1] The r workloads types include

Workloads and traces in BigDataBench-MT

• Multi-tenancy V1.0 releases:

Workloads          Software stack                                Workload trace
Nutch Web Search   Apache Tomcat 6.0.26, Search Server (Nutch)   Sogou (http://www.sogou.com/labs/dl/q-e.html)
Hadoop             Hadoop 1.0.2                                  Facebook (https://github.com/SWIMProjectUCB/SWIM/wiki)
Shark              Shark 0.8.0                                   Google data center (https://code.google.com/p/googleclusterdata/)

Page 48:

Outline

• BigDataBench Overview

• Workload characterization

• Multi-tenancy version

• Processors evaluation

Page 49:

Core Architecture

• Multi brawny core (Xeon E5645, 2.4 GHz)
  • 6 out-of-order cores
  • Dynamic multiple issue (superscalar)
  • Dynamic overclocking
  • Simultaneous multithreading

• Many wimpy core architecture (Tile-Gx36, 1.2 GHz):
  • 36 in-order cores
  • Static multiple issue (VLIW)

Page 50:

Experiment methodology

• Use real hardware instead of simulation
• Real power consumption measurement instead of modeling

• Saturate CPU performance by:
  • Isolating the processor behavior
    - Over-provision the disk I/O subsystem by using a RAM disk
  • Optimizing benchmarks
    - Tune the software stack parameters
    - Tune JVM flags for performance

Page 51:

Execution time

• For Hadoop-based sort, the performance gap is about 1.08×.
• For the other workloads, more than 2× gaps exist between Xeon and Tilera.

[Figure: normalized execution time per workload, Xeon vs. Tilera]

• From the perspective of execution time, the Xeon processor is better than the Tilera processor all the time.

Page 52:

Cycle Counts

• There are huge cycle count gaps between Xeon and Tilera, ranging from 5.3 to 14.
• Tilera needs more cycles to complete the same amount of work.

[Figure: normalized cycles per workload, Xeon vs. Tilera]

Page 53:

Pipeline Efficiency

• The theoretical IPC:
  • Xeon: 4 instructions per cycle
  • Tilera: 1 instruction bundle per cycle
• Pipeline efficiency:

[Figure: pipeline efficiency per workload, Tilera vs. Xeon]

• OoO pipelines are more efficient than in-order ones
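The slide's formula for pipeline efficiency did not survive extraction. A common definition, assumed here, is the achieved issue rate divided by the theoretical peak listed above. The sketch below uses that assumed definition; the counter values in the usage line are hypothetical, and for Tilera the "retired" count is taken to be instruction bundles, matching the slide's unit.

```python
# Theoretical peak per cycle, as stated on the slide.
# For Tilera the unit is instruction bundles per cycle, not single instructions.
THEORETICAL_IPC = {"Xeon": 4.0, "Tilera": 1.0}


def pipeline_efficiency(retired, cycles, processor):
    """Achieved issue rate as a fraction of the theoretical peak.

    retired: retired instructions (Xeon) or instruction bundles (Tilera).
    cycles:  elapsed core cycles over the same interval.
    This ratio is an assumed definition, not one confirmed by the talk.
    """
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return (retired / cycles) / THEORETICAL_IPC[processor]


# Hypothetical counter readings, for illustration only.
print(pipeline_efficiency(1_200_000, 1_000_000, "Xeon"))
```

On Linux, the two inputs would typically come from hardware performance counters for retired instructions and core cycles.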

Page 54:

Power Consumption

• Tilera is power optimized.
• Xeon consumes more power.

[Figure: normalized power per workload, Tilera vs. Xeon]

Page 55:

Energy Consumption

• Hadoop-based sort consumes less energy on Tilera than on Xeon
  • Hadoop sort is an extremely I/O-intensive workload.
• Tilera consumes more energy than Xeon to complete the same amount of work for most big data workloads
  • The longer execution time offsets the lower power design

[Figure: normalized energy per workload, Xeon vs. Tilera]
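The trade-off on this slide follows from energy being average power multiplied by execution time. The sketch below works through that arithmetic with ratios of Tilera relative to Xeon. The 1.08× time gap for Hadoop sort comes from the execution-time slide; the power ratio of 0.5 is a hypothetical value chosen only to make the example concrete.

```python
def energy_ratio(power_ratio, time_ratio):
    """Energy of processor A relative to B, given A/B ratios of average
    power and execution time (energy = average power x time)."""
    return power_ratio * time_ratio


def break_even_time_ratio(power_ratio):
    """Slowdown at which the lower-power processor uses the same energy."""
    return 1.0 / power_ratio


TILERA_POWER_RATIO = 0.5  # hypothetical: Tilera assumed to draw half of Xeon's power

# Hadoop sort: about 1.08x slower on Tilera (from the slides) -> less energy.
print(energy_ratio(TILERA_POWER_RATIO, 1.08))
# A workload that runs 3x slower (hypothetical) -> more energy despite lower power.
print(energy_ratio(TILERA_POWER_RATIO, 3.0))
print(break_even_time_ratio(TILERA_POWER_RATIO))
```

Under this assumed power ratio, any workload that runs more than twice as long on the low-power part ends up costing more energy, which matches the shape of the slide's argument.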

Page 56:

Total Cost of Ownership (TCO) Model [*]

• Three-year depreciation cycle
• Hardware costs associated with individual components
  • CPU
  • Memory
  • Disk
  • Board
  • Power
  • Cooling

[*] K. Lim et al. Understanding and designing new server architectures for emerging warehouse-computing environments. ISCA 2008

Page 57:

Cost model

• The cost data originate from diverse sources:
  • Different vendors
  • Corresponding official websites
• Power and cooling:
  • An activity factor of 0.75
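As a rough illustration of how the pieces named on these two slides could combine, the sketch below adds component hardware costs to three years of power and cooling, with the 0.75 activity factor scaling rated power down to average draw. Every price, the rated power, the electricity rate, and the `cooling_overhead` parameter are hypothetical; the actual model follows Lim et al. and may differ in detail.

```python
HOURS_PER_YEAR = 24 * 365


def three_year_tco(hardware_costs, rated_power_watts, price_per_kwh,
                   cooling_overhead=1.0, activity_factor=0.75, years=3):
    """Hardware cost plus power and cooling over the depreciation period.

    hardware_costs:   component name -> purchase price.
    cooling_overhead: assumed cooling cost per unit of power cost
                      (hypothetical parameter, not taken from the slides).
    activity_factor:  fraction of rated power drawn on average (0.75 on the slide).
    """
    hardware = sum(hardware_costs.values())
    average_kw = rated_power_watts * activity_factor / 1000.0
    power_cost = average_kw * HOURS_PER_YEAR * years * price_per_kwh
    cooling_cost = power_cost * cooling_overhead
    return hardware + power_cost + cooling_cost


# Hypothetical server: all figures below are placeholders.
example = three_year_tco(
    {"cpu": 500.0, "memory": 200.0, "disk": 100.0, "board": 200.0},
    rated_power_watts=200.0,
    price_per_kwh=0.10,
)
print(round(example, 2))
```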

Page 58:

Performance per TCO

• Hadoop-based Sort has higher performance per TCO on the Tilera.
• For other workloads, Xeon outperforms Tilera.

[Figure: normalized performance per TCO per workload; legend as extracted: Tilera Xeon Turbo & HT enabled]
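The metric on this slide can be sketched as throughput divided by cost and then normalized to a baseline system. The code below assumes performance is the inverse of execution time, which the slides do not state explicitly. The execution times and TCO figures in the usage line are hypothetical and chosen only to show how a slightly slower but much cheaper system can come out ahead.

```python
def performance_per_tco(execution_time_s, tco):
    """Performance (assumed to be 1 / execution time) per unit of TCO."""
    return (1.0 / execution_time_s) / tco


def normalized_perf_per_tco(candidate, baseline):
    """Ratio of candidate to baseline; each argument is (execution_time_s, tco)."""
    return performance_per_tco(*candidate) / performance_per_tco(*baseline)


# Hypothetical: candidate is 1.08x slower but costs half as much as the baseline.
print(normalized_perf_per_tco((108.0, 1000.0), (100.0, 2000.0)))
```

A value above 1 means the candidate delivers more work per unit of cost than the baseline.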

Page 59:

Key Takeaways

• Try using an open-source big data benchmark suite from http://prof.ict.ac.cn/BigDataBench

• Big Data: data movement dominated computing with more branch operations
  • 92% in terms of instruction mix

• Multi-tenancy version: replaying mixed workloads according to publicly available workload traces.

• Wimpy-core processors only suit a part of big data workloads.
