Hadoop Hardware @Twitter: Size does matter!

@joep and @eecraft, Hadoop Summit 2013, v2.3

Uploaded by hadoop-summit, posted 12-May-2015.

DESCRIPTION

At Twitter we started out with a large monolithic cluster that served most use cases. As usage expanded and the cluster grew accordingly, we realized we needed to split the cluster by access pattern, which lets us tune the access policy, SLA, and configuration for each cluster. We will explain our various use cases, their performance requirements, and operational considerations, and how those are served by the corresponding clusters. We will discuss what our baseline Hadoop node looks like. Various, sometimes competing, considerations all need to be weighed in a trade-off where cost and performance are major factors: storage size, disk IO, CPU throughput, fewer fast cores versus many slower cores, bonded 1 GbE network interfaces versus a single 10 GbE card, 1 TB, 2 TB, or 3 TB disk drives, and power draw. We will show how we arrived at quite different hardware platforms at Twitter, not only saving money but also increasing performance.

TRANSCRIPT

Page 1: Hadoop Hardware @Twitter: Size does matter!

@joep and @eecraft, Hadoop Summit 2013

v2.3

Page 2: About us

@Twitter #HadoopSummit2013

Joep Rottinghuis, Software Engineer @ Twitter
Engineering Manager, Hadoop/HBase team @ Twitter
Follow me @joep

Jay Shenoy, Hardware Engineer @ Twitter
Engineering Manager, HW @ Twitter
Follow me @eecraft

HW & Hadoop teams @ Twitter, and many others

Page 3: Agenda

• Scale of Hadoop clusters
• Single versus multiple clusters
• Twitter Hadoop architecture
• Hardware investigations
• Results

Page 4: Scale

Scaling limits (as the number of nodes grows):

• JobTracker: tens of thousands of jobs per day; tens of thousands of concurrent slots
• Namenode: 250-300 M objects in a single namespace
• Namenode at ~100 GB heap -> full GC pauses
• Shipping job jars to 1,000s of nodes
• JobHistory server at a few hundred thousand job history/conf files

[Chart: scaling limits as a function of # nodes]
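The Namenode holds the entire namespace in heap, so object count drives memory directly. A back-of-the-envelope sketch of that relationship, assuming the commonly quoted ~150 bytes per namespace object rule of thumb (an assumption, not a Twitter figure; real heaps, like the ~100 GB above, run larger once block replicas and JVM overhead are counted):

```python
# Back-of-the-envelope Namenode heap estimate from namespace object count.
# ~150 bytes per object (file or block) is a widely quoted rule of thumb;
# actual usage depends on path lengths, replication, and JVM overhead.
BYTES_PER_OBJECT = 150  # assumption, not a measured Twitter number

def namenode_heap_gb(num_objects: float) -> float:
    """Rough heap (GB) needed to hold num_objects files + blocks in memory."""
    return num_objects * BYTES_PER_OBJECT / 1e9

# 250-300 M objects in a single namespace, as on the slide:
for n in (250e6, 300e6):
    print(f"{n / 1e6:.0f} M objects -> ~{namenode_heap_gb(n):.0f} GB heap floor")
```

Splitting the namespace across multiple clusters shrinks the object count each Namenode must hold, which relieves exactly this heap and GC pressure.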

Page 5: When / why to split clusters?

• In principle, a preference for a single cluster: common logs, shared free space, reduced admin burden, more rack diversity
• Varying SLAs
• Workload diversity: storage intensive, processing (CPU / disk IO) intensive, network intensive
• Data access: hot, warm, cold

Page 6: Cluster Architecture

Page 7: Hardware investigations

Page 8: Service criteria for hardware

• Hadoop does not need live HDD swap
• Twitter DC: no SLA on data nodes
• Rack SLA: only 1 rack down at any time in a cluster

Page 9: Baseline Hadoop Server (~ early 2012)

[Block diagram: dual E56xx sockets with 3 DIMMs each, PCH, GbE NIC, SAS HBA, expander]

Characteristics: standard 2U server, 20 servers / rack
• E5645 CPU, dual 6-core
• 72 GB memory
• 12 x 2 TB HDD
• 2 x 1 GbE

Works for the general cluster, but...
• Need more density for storage
• Potential IO bottlenecks

Page 10: Hadoop Server: Possible evolution

[Block diagram: dual E5-26xx or E5-24xx sockets with 4 DIMMs each, GbE NIC (10 GbE?), SAS HBA, expander with 16 x 2 TB? 16 x 3 TB? 24 x 3 TB?]

Characteristics: + CPU performance? 20 servers / rack. Candidate for DW.

Can deploy into the general DW cluster, but...
• Too much CPU for storage-intensive apps
• Server failure domain too large if we scale up disks

Page 11: Rethinking hardware evolution

Debunking myths:
• Bigger is always better
• One size fits all

Back to Hadoop hardware roots: scale horizontally, not vertically

Twitter Hadoop Server - “THS”

Page 12: THS for backups

[Block diagram: single E3-12xx socket with 2 DIMMs, PCH, GbE NIC, SAS HBA]

Storage focus:
• Cost efficient (single socket, 3 TB drives)
• Less memory needed

Characteristics: + IO performance, few fast cores
• E3-1230 V2 CPU
• 16 GB memory
• 12 x 3 TB HDD
• SSD boot
• 2 x 1 GbE

Page 13: THS variant for Hadoop-Proc and HBase

[Block diagram: single E3-12xx socket with 2 DIMMs, PCH, 10 GbE NIC, SAS HBA]

Processing / throughput focus:
• Cost efficient (single socket, 1 TB drives)
• More disk and network IO per socket

Characteristics: + IO performance, few fast cores
• E3-1230 V2 CPU
• 32 GB memory
• 12 x 1 TB HDD
• SSD boot
• 1 x 10 GbE

Page 14: THS for cold cluster

[Block diagram: single E3-12xx socket with 2 DIMMs, PCH, GbE NIC, SAS HBA]

Combination of the previous 2 use cases:
• Space & power efficient
• Storage dense, with some processing capabilities

Characteristics: disk efficiency, some compute
• E3-1230 V2 CPU
• 32 GB memory
• 12 x 3 TB HDD
• 2 x 1 GbE
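The three THS variants share the same single-socket board and differ only in memory, drive size, and NIC. A small sketch comparing what each variant delivers per node, derived from the specs on the last three slides (the 4-core count is the published spec of the E3-1230 V2):

```python
# Per-node THS variant specs, taken from the three preceding slides.
CORES = 4  # the E3-1230 V2 is a quad-core part

variants = {
    # memory (GB), spindles, TB per spindle, network (Gbps)
    "backups": dict(mem_gb=16, disks=12, tb_per_disk=3, net_gbps=2),
    "proc":    dict(mem_gb=32, disks=12, tb_per_disk=1, net_gbps=10),
    "cold":    dict(mem_gb=32, disks=12, tb_per_disk=3, net_gbps=2),
}

for name, v in variants.items():
    raw_tb = v["disks"] * v["tb_per_disk"]
    print(f"{name:8s} {raw_tb:2d} TB raw, "
          f"{v['mem_gb'] / CORES:.0f} GB RAM/core, "
          f"{v['net_gbps'] / raw_tb:.2f} Gbps network per raw TB")
```

The split is visible in the ratios: backups and cold maximize raw TB per socket, while proc trades capacity for roughly an order of magnitude more network bandwidth per stored TB.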

Page 15: Rack-level view

                     Baseline       THS Backups     THS Proc       THS Cold
Power                ~8 kW          ~8 kW           ~8 kW          ~8 kW
CPU sockets; DRAM    40; 1440 GB    40; 640 GB      40; 1280 GB    40; 1280 GB
Spindles; TB raw     240; 480 TB    480; 1,440 TB   480; 480 TB    480; 1,440 TB
Uplink; internal BW  20; 40 Gbps    20; 80 Gbps     40; 400 Gbps   20; 80 Gbps
TOR switches         1G             1G              10G            1G
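The rack totals above follow directly from the per-server specs (slides 9 and 12-14) and the servers-per-rack counts they imply (20 dual-socket baseline servers, or 40 single-socket THS nodes, per rack); a quick sketch reproducing them:

```python
# Reproduce the rack-level totals from per-server specs.
# (servers/rack, CPU sockets, GB RAM, spindles, TB/spindle, Gbps per server)
racks = {
    "baseline": (20, 2, 72, 12, 2, 2),
    "backups":  (40, 1, 16, 12, 3, 2),
    "proc":     (40, 1, 32, 12, 1, 10),
    "cold":     (40, 1, 32, 12, 3, 2),
}

for name, (n, sockets, ram, spindles, tb, gbps) in racks.items():
    print(f"{name:8s} sockets={n * sockets:3d}  DRAM={n * ram:4d} GB  "
          f"spindles={n * spindles:3d}  raw={n * spindles * tb:4d} TB  "
          f"internal BW={n * gbps:3d} Gbps")
```

At the same ~8 kW per rack, the THS racks double the spindle count, and the proc variant lifts internal bandwidth from 40 to 400 Gbps.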

Page 16: Processing performance comparison

Benchmark                                            Baseline Server    THS (-Cold)
TestDFSIO (write, replication = 1)                   360 MB/s / node    780 MB/s / node
TeraGen (30 TB, replication = 3)                     1:36 hrs           1:35 hrs
TeraSort (30 TB, replication = 3)                    6:11 hrs           4:22 hrs
2 parallel TeraSorts (30 TB each, replication = 3)   10:36 hrs          6:21 hrs
Application #1                                       4:37 min           3:09 min
Application set #2                                   13:3 hrs           10:57 hrs

Performance benchmark setup:
• Each cluster: 102 nodes of the respective type
• Efficient server = 3 racks, Baseline = 5+ racks
• “Dated” stack: CentOS 5.5, Sun 1.6 JRE, Hadoop 2.0.3
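Converting the wall-clock results to minutes gives the THS speedup; a small sketch over the unambiguous hour-scale rows of the table:

```python
# THS speedup over the baseline server, from the benchmark table.
def to_minutes(hhmm: str) -> int:
    """'6:11' (hours:minutes) -> 371 minutes."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

results = {  # benchmark: (baseline, THS) wall-clock, h:mm
    "TeraSort 30 TB":       ("6:11", "4:22"),
    "2 parallel TeraSorts": ("10:36", "6:21"),
}
for name, (base, ths) in results.items():
    print(f"{name:22s} {to_minutes(base) / to_minutes(ths):.2f}x faster on THS")
```

A roughly 1.4-1.7x wall-clock gain from 3 racks instead of 5+ is what the "not only saving money, but also increasing performance" claim rests on.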

Page 17: Results

Page 18: LZO performance comparison

[Chart: LZO performance comparison]

Page 19: Recap

• At a certain scale it makes sense to split into multiple clusters. For us: RT, PROC, DW, COLD, BACKUPS, TST, EXP
• For large enough clusters, depending on the use case, it may be worth choosing different HW configurations

Page 20: Conclusion

@Twitter our “Twitter Hadoop Server” not only saves many $$$, it is also faster!

Page 21: #ThankYou

@joep and @eecraft

Come talk to us at booth 26