Hadoop Hardware @Twitter: Size Does Matter!
DESCRIPTION
At Twitter we started out with a large monolithic cluster that served most of our use cases. As usage expanded and the cluster grew accordingly, we realized we needed to split the cluster by access pattern. This allows us to tune the access policy, SLA, and configuration for each cluster. We will explain our various use cases, their performance requirements, and operational considerations, and how those are served by the corresponding clusters. We will discuss what our baseline Hadoop node looks like. Various, sometimes competing, considerations such as storage size, disk IO, CPU throughput, fewer fast cores versus many slower cores, bonded 1 GbE network interfaces versus a single 10 GbE card, 1 TB, 2 TB, or 3 TB disk drives, and power draw all need to be weighed in a trade-off where cost and performance are the major factors. We will show how we arrived at quite different hardware platforms at Twitter, not only saving money but also increasing performance.
TRANSCRIPT
Hadoop Hardware @Twitter: Size does matter.
@joep and @eecraft
Hadoop Summit 2013
v2.3
About us
• Joep Rottinghuis: Software Engineer @ Twitter; Engineering Manager, Hadoop/HBase team @ Twitter. Follow me @joep
• Jay Shenoy: Hardware Engineer @ Twitter; Engineering Manager, HW @ Twitter. Follow me @eecraft
• HW & Hadoop teams @ Twitter, and many others
Agenda
• Scale of Hadoop clusters
• Single versus multiple clusters
• Twitter Hadoop architecture
• Hardware investigations
• Results
Scale
Scaling limits
• JobTracker: 10s of thousands of jobs per day; 10s of thousands of concurrent slots
• Namenode: 250-300 M objects in a single namespace
• Namenode at ~100 GB heap -> full GC pauses (see the sketch below)
• Shipping job jars to 1,000s of nodes
• JobHistory server at a few 100s of thousands of job history/conf files
[Chart: cluster growth, # nodes]
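To make the Namenode memory pressure concrete, here is a back-of-the-envelope heap estimate. This is a sketch using the commonly quoted ~150 bytes-per-namespace-object heuristic; the constants are illustrative assumptions, not Twitter's measured numbers.

// Back-of-the-envelope Namenode heap estimate.
// Assumes the oft-quoted ~150 bytes of heap per namespace object
// (file, directory, or block); real usage varies by version and layout.
public class NamenodeHeapEstimate {
    public static void main(String[] args) {
        long objects = 300_000_000L;    // files + dirs + blocks
        long bytesPerObject = 150L;     // rule-of-thumb heuristic, not a measurement
        double heapGb = objects * bytesPerObject / 1e9;
        System.out.printf("~%.0f GB of heap just for namespace metadata%n", heapGb);
        // ~45 GB here; with JVM overhead, RPC buffers, and growth headroom the
        // heap climbs toward ~100 GB, where full GC pauses become painful.
    }
}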
When / why to split clusters?
• In principle, a preference for a single cluster: common logs, shared free space, reduced admin burden, more rack diversity
• Varying SLAs
• Workload diversity:
  • Storage intensive
  • Processing (CPU / disk IO) intensive
  • Network intensive
• Data access: hot, warm, cold (see the sketch below)
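One way to picture the hot/warm/cold distinction is routing datasets to cluster tiers by recency of access. This is a hypothetical sketch; the thresholds and the routing-by-age rule are illustrative assumptions, not Twitter's actual policy.

import java.time.Duration;

// Hypothetical illustration: age / access recency, not size,
// picks which cluster tier a dataset lives on.
public class ClusterRouter {
    enum Tier { HOT, WARM, COLD }

    static Tier tierFor(Duration sinceLastAccess) {
        if (sinceLastAccess.toDays() < 1)  return Tier.HOT;   // e.g. a realtime/proc cluster
        if (sinceLastAccess.toDays() < 30) return Tier.WARM;  // e.g. a DW cluster
        return Tier.COLD;                                     // e.g. a cold/backup cluster
    }

    public static void main(String[] args) {
        System.out.println(tierFor(Duration.ofHours(2)));   // HOT
        System.out.println(tierFor(Duration.ofDays(90)));   // COLD
    }
}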
Cluster Architecture
[Diagram: Twitter Hadoop cluster architecture]
Hardware investigations
Service criteria for hardware
• Hadoop does not need live HDD swap
• Twitter DC: no SLA on data nodes
• Rack SLA: only 1 rack down at any time in a cluster
Baseline Hadoop Server (~early 2012)
[Block diagram: dual E56xx sockets, 3 DIMMs each; PCH with GbE NICs; SAS HBA + expander to the drives]
Characteristics:
• Standard 2U server, 20 servers / rack
• E5645 CPU, dual 6-core
• 72 GB memory
• 12 x 2 TB HDD
• 2 x 1 GbE
Works for the general cluster, but...
• Need more density for storage
• Potential IO bottlenecks (see the sketch below)
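To see why 2 x 1 GbE can bottleneck a 12-spindle node, compare aggregate disk bandwidth to network bandwidth. The per-disk throughput figure is a generic 7200 rpm rule of thumb, not a measured Twitter number.

// Rough bandwidth math for the baseline node: aggregate spindle
// bandwidth versus bonded 1 GbE network bandwidth.
public class IoBottleneck {
    public static void main(String[] args) {
        int disks = 12;
        double mbPerSecPerDisk = 100.0;            // assumed sequential rate, 7200 rpm SATA
        double diskBw = disks * mbPerSecPerDisk;   // ~1200 MB/s across spindles
        double nicBw = 2 * 1000 / 8.0;             // 2 x 1 GbE ~= 250 MB/s
        System.out.printf("disk ~%.0f MB/s vs network ~%.0f MB/s (%.1fx gap)%n",
                diskBw, nicBw, diskBw / nicBw);
        // Shuffle- and replication-heavy jobs leave most of the
        // spindle bandwidth stranded behind the NICs.
    }
}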
Hadoop Server: Possible evolution
[Block diagram: dual E5-26xx or E5-24xx sockets, 4 DIMMs each; GbE NIC (10 GbE?); SAS HBA + expander to 16 x 2 TB? 16 x 3 TB? 24 x 3 TB?]
Characteristics:
• + CPU performance?
• 20 servers / rack
• Candidate for DW
Can deploy into the general DW cluster, but...
• Too much CPU for storage-intensive apps
• Server failure domain too large if we scale up disks (see the sketch below)
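A quick feel for the failure-domain concern: how much data is at risk when one dense node dies. The re-replication rate below is an assumption (it depends heavily on cluster size and network), so treat this as an illustrative sketch.

// Data to re-replicate when one dense node dies, and roughly how
// long that takes at an assumed aggregate cluster rate.
public class FailureDomain {
    public static void main(String[] args) {
        double nodeTb = 24 * 3.0;             // the 24 x 3 TB proposal: 72 TB per node
        double reReplicateGbPerSec = 2.0;     // assumed cluster-wide re-replication rate
        double hours = nodeTb * 1000 / reReplicateGbPerSec / 3600;
        System.out.printf("one dead node => %.0f TB to re-replicate, ~%.0f hours%n",
                nodeTb, hours);
        // Versus 12 x 1 TB on a lighter node: 1/6 the data at risk per failure.
    }
}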
Rethinking hardware evolution
• Debunking myths:
  • Bigger is always better
  • One size fits all
• Back to Hadoop hardware roots: scale horizontally, not vertically
• Twitter Hadoop Server - “THS”
THS for backups
[Block diagram: single E3-12xx socket, 2 DIMMs; PCH with GbE NICs; SAS HBA to the drives]
Characteristics:
• + IO performance
• Few fast cores
• E3-1230 V2 CPU
• 16 GB memory
• 12 x 3 TB HDD
• SSD boot
• 2 x 1 GbE
Storage focus:
• Cost efficient (single socket, 3 TB drives)
• Less memory needed
THS variant for Hadoop-Proc and HBase
[Block diagram: single E3-12xx socket, 2 DIMMs; PCH; SAS HBA; 10 GbE NIC]
Characteristics:
• + IO performance
• Few fast cores
• E3-1230 V2 CPU
• 32 GB memory
• 12 x 1 TB HDD
• SSD boot
• 1 x 10 GbE
Processing / throughput focus:
• Cost efficient (single socket, 1 TB drives)
• More disk and network IO per socket
THS for cold cluster
[Block diagram: single E3-12xx socket, 2 DIMMs; PCH with GbE NICs; SAS HBA to the drives]
Characteristics:
• Disk efficiency
• Some compute
• E3-1230 V2 CPU
• 32 GB memory
• 12 x 3 TB HDD
• 2 x 1 GbE
Combination of the previous 2 use cases:
• Space & power efficient
• Storage dense with some processing capability
Rack-level view

                      Baseline       THS Backups     THS Proc        THS Cold
Power                 ~8 kW          ~8 kW           ~8 kW           ~8 kW
CPU sockets; DRAM     40; 1440 GB    40; 640 GB      40; 1280 GB     40; 1280 GB
Spindles; TB raw      240; 480 TB    480; 1,440 TB   480; 480 TB     480; 1,440 TB
Uplink; internal BW   20; 40 Gbps    20; 80 Gbps     40; 400 Gbps    20; 80 Gbps
TOR switch            1G TOR         1G TOR          10G TOR         1G TOR
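The rack rows follow directly from the per-node specs and servers-per-rack counts; here is a small sketch reproducing them. The 40-nodes-per-rack figure for the 1U THS variants is inferred from the socket row of the table itself (the baseline is 20 dual-socket 2U nodes).

// Reproduce the rack-level table from per-node specs.
public class RackMath {
    static void rack(String name, int nodes, int socketsPerNode, int gbRam,
                     int disks, int tbPerDisk, double nicGbps, int nics) {
        System.out.printf("%-8s sockets=%d DRAM=%d GB spindles=%d raw=%d TB internal=%.0f Gbps%n",
                name, nodes * socketsPerNode, nodes * gbRam, nodes * disks,
                nodes * disks * tbPerDisk, nodes * nics * nicGbps);
    }
    public static void main(String[] args) {
        rack("Baseline", 20, 2, 72, 12, 2, 1.0, 2);  // 40; 1440 GB; 240; 480 TB; 40 Gbps
        rack("Backups",  40, 1, 16, 12, 3, 1.0, 2);  // 40; 640 GB; 480; 1440 TB; 80 Gbps
        rack("Proc",     40, 1, 32, 12, 1, 10.0, 1); // 40; 1280 GB; 480; 480 TB; 400 Gbps
        rack("Cold",     40, 1, 32, 12, 3, 1.0, 2);  // 40; 1280 GB; 480; 1440 TB; 80 Gbps
    }
}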
Processing performance comparison
Benchmark                                            Baseline Server    THS (-Cold)
TestDFSIO (write, replication = 1)                   360 MB/s / node    780 MB/s / node
TeraGen (30 TB, replication = 3)                     1:36 hrs           1:35 hrs
TeraSort (30 TB, replication = 3)                    6:11 hrs           4:22 hrs
2 parallel TeraSorts (30 TB each, replication = 3)   10:36 hrs          6:21 hrs
Application #1                                       4:37 min           3:09 min
Application set #2                                   13:3 hrs           10:57 hrs
Performance benchmark setup:
• Each cluster: 102 nodes of the respective type
• Efficient server = 3 racks, baseline 5+ racks
• “Dated” stack: CentOS 5.5, Sun 1.6 JRE, Hadoop 2.0.3
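Converting the table into speedups makes the comparison concrete; this is simple arithmetic on the published numbers, nothing beyond the table itself.

// Speedup ratios from the benchmark table above.
public class Speedup {
    static double hrs(int h, int m) { return h + m / 60.0; }
    public static void main(String[] args) {
        System.out.printf("TestDFSIO: %.2fx MB/s per node%n", 780.0 / 360.0);   // ~2.17x
        System.out.printf("TeraSort: %.2fx faster%n", hrs(6, 11) / hrs(4, 22)); // ~1.42x
        System.out.printf("2 parallel TeraSorts: %.2fx faster%n",
                hrs(10, 36) / hrs(6, 21));                                      // ~1.67x
        // And the THS cluster achieved this in 3 racks versus 5+ for the baseline.
    }
}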
Results
LZO performance comparison
[Chart: LZO performance comparison]
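For context, LZO at Twitter typically means the hadoop-lzo codec, which Twitter maintains. A minimal sketch of enabling it for intermediate map output in a job, assuming hadoop-lzo and its native libraries are installed on the cluster:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.mapreduce.Job;
import com.hadoop.compression.lzo.LzoCodec; // from the hadoop-lzo library

// Minimal sketch: compress intermediate map output with LZO, trading a
// little CPU for less disk and network IO during the shuffle.
public class LzoJobSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                LzoCodec.class, CompressionCodec.class);
        Job job = Job.getInstance(conf, "lzo-example");
        // ... set mapper/reducer and input/output paths, then job.submit()
    }
}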
Recap
• At a certain scale it makes sense to split into multiple clusters. For us: RT, PROC, DW, COLD, BACKUPS, TST, EXP
• For large enough clusters, depending on the use case, it may be worth choosing different HW configurations
Conclusion
@Twitter, our “Twitter Hadoop Server” not only saves many $$$, it is also faster!
#ThankYou
@joep and @eecraft
Come talk to us at booth 26