Hadoop Hardware @Twitter: Size Does Matter!
DESCRIPTION
At Twitter we started out with a large monolithic cluster that served most of our use cases. As usage expanded and the cluster grew accordingly, we realized we needed to split the cluster by access pattern. This allows us to tune the access policy, SLA, and configuration for each cluster. We will explain our various use cases, their performance requirements, and operational considerations, and how those are served by the corresponding clusters. We will discuss what our baseline Hadoop node looks like. Various, sometimes competing, considerations such as storage size, disk IO, CPU throughput, fewer fast cores versus many slower cores, bonded 1 GbE network interfaces versus a single 10 GbE card, 1 TB, 2 TB, or 3 TB disk drives, and power draw all need to be weighed in a trade-off where cost and performance are the major factors. We will show how we arrived at quite different hardware platforms at Twitter, not only saving money but also increasing performance.
TRANSCRIPT
Hadoop Hardware @Twitter: Size does matter.
@joep and @eecraft
Hadoop Summit 2013
v2.3
About us
• Joep Rottinghuis: Software Engineer @ Twitter; Engineering Manager, Hadoop/HBase team @ Twitter. Follow me @joep
• Jay Shenoy: Hardware Engineer @ Twitter; Engineering Manager, HW @ Twitter. Follow me @eecraft
• HW & Hadoop teams @ Twitter, and many others
Agenda
• Scale of Hadoop clusters
• Single versus multiple clusters
• Twitter Hadoop architecture
• Hardware investigations
• Results
Scale
Scaling limits
• JobTracker: 10s of thousands of jobs per day; 10s of thousands of concurrent slots
• Namenode: 250-300 M objects in a single namespace
• Namenode at ~100 GB heap -> full GC pauses (see the sketch below)
• Shipping job jars to 1,000s of nodes
• JobHistory server at a few 100s of thousands of job history/conf files
[Chart: cluster growth, # nodes]
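To make the Namenode memory pressure concrete, here is a back-of-the-envelope heap estimate. This is a sketch using the commonly quoted ~150 bytes-per-namespace-object heuristic; the constants are illustrative assumptions, not Twitter's measured numbers.

// Back-of-the-envelope Namenode heap estimate.
// Assumes the oft-quoted ~150 bytes of heap per namespace object
// (file, directory, or block); real usage varies by version and layout.
public class NamenodeHeapEstimate {
    public static void main(String[] args) {
        long objects = 300_000_000L;    // files + dirs + blocks
        long bytesPerObject = 150L;     // rule-of-thumb heuristic, not a measurement
        double heapGb = objects * bytesPerObject / 1e9;
        System.out.printf("~%.0f GB of heap just for namespace metadata%n", heapGb);
        // ~45 GB here; with JVM overhead, RPC buffers, and growth headroom the
        // heap climbs toward ~100 GB, where full GC pauses become painful.
    }
}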
When / why to split clusters?
• In principle, a preference for a single cluster: common logs, shared free space, reduced admin burden, more rack diversity
• Varying SLAs
• Workload diversity:
  • Storage intensive
  • Processing (CPU / disk IO) intensive
  • Network intensive
• Data access: hot, warm, cold (see the sketch below)
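One way to picture the hot/warm/cold distinction is routing datasets to cluster tiers by recency of access. This is a hypothetical sketch; the thresholds and the routing-by-age rule are illustrative assumptions, not Twitter's actual policy.

import java.time.Duration;

// Hypothetical illustration: age / access recency, not size,
// picks which cluster tier a dataset lives on.
public class ClusterRouter {
    enum Tier { HOT, WARM, COLD }

    static Tier tierFor(Duration sinceLastAccess) {
        if (sinceLastAccess.toDays() < 1)  return Tier.HOT;   // e.g. a realtime/proc cluster
        if (sinceLastAccess.toDays() < 30) return Tier.WARM;  // e.g. a DW cluster
        return Tier.COLD;                                     // e.g. a cold/backup cluster
    }

    public static void main(String[] args) {
        System.out.println(tierFor(Duration.ofHours(2)));   // HOT
        System.out.println(tierFor(Duration.ofDays(90)));   // COLD
    }
}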
Cluster Architecture
[Diagram: Twitter Hadoop cluster architecture]
Hardware investigations
Service criteria for hardware
• Hadoop does not need live HDD swap
• Twitter DC: no SLA on data nodes
• Rack SLA: only 1 rack down at any time in a cluster
Baseline Hadoop Server (~early 2012)
[Block diagram: dual E56xx sockets, 3 DIMMs each; PCH with GbE NICs; SAS HBA + expander to the drives]
Characteristics:
• Standard 2U server, 20 servers / rack
• E5645 CPU, dual 6-core
• 72 GB memory
• 12 x 2 TB HDD
• 2 x 1 GbE
Works for the general cluster, but...
• Need more density for storage
• Potential IO bottlenecks (see the sketch below)
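To see why 2 x 1 GbE can bottleneck a 12-spindle node, compare aggregate disk bandwidth to network bandwidth. The per-disk throughput figure is a generic 7200 rpm rule of thumb, not a measured Twitter number.

// Rough bandwidth math for the baseline node: aggregate spindle
// bandwidth versus bonded 1 GbE network bandwidth.
public class IoBottleneck {
    public static void main(String[] args) {
        int disks = 12;
        double mbPerSecPerDisk = 100.0;            // assumed sequential rate, 7200 rpm SATA
        double diskBw = disks * mbPerSecPerDisk;   // ~1200 MB/s across spindles
        double nicBw = 2 * 1000 / 8.0;             // 2 x 1 GbE ~= 250 MB/s
        System.out.printf("disk ~%.0f MB/s vs network ~%.0f MB/s (%.1fx gap)%n",
                diskBw, nicBw, diskBw / nicBw);
        // Shuffle- and replication-heavy jobs leave most of the
        // spindle bandwidth stranded behind the NICs.
    }
}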
Hadoop Server: Possible evolution
[Block diagram: dual E5-26xx or E5-24xx sockets, 4 DIMMs each; GbE NIC (10 GbE?); SAS HBA + expander to 16 x 2 TB? 16 x 3 TB? 24 x 3 TB?]
Characteristics:
• + CPU performance?
• 20 servers / rack
• Candidate for DW
Can deploy into the general DW cluster, but...
• Too much CPU for storage-intensive apps
• Server failure domain too large if we scale up disks (see the sketch below)
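A quick feel for the failure-domain concern: how much data is at risk when one dense node dies. The re-replication rate below is an assumption (it depends heavily on cluster size and network), so treat this as an illustrative sketch.

// Data to re-replicate when one dense node dies, and roughly how
// long that takes at an assumed aggregate cluster rate.
public class FailureDomain {
    public static void main(String[] args) {
        double nodeTb = 24 * 3.0;             // the 24 x 3 TB proposal: 72 TB per node
        double reReplicateGbPerSec = 2.0;     // assumed cluster-wide re-replication rate
        double hours = nodeTb * 1000 / reReplicateGbPerSec / 3600;
        System.out.printf("one dead node => %.0f TB to re-replicate, ~%.0f hours%n",
                nodeTb, hours);
        // Versus 12 x 1 TB on a lighter node: 1/6 the data at risk per failure.
    }
}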
Rethinking hardware evolution
• Debunking myths:
  • Bigger is always better
  • One size fits all
• Back to Hadoop hardware roots: scale horizontally, not vertically
• Twitter Hadoop Server - “THS”
THS for backups
[Block diagram: single E3-12xx socket, 2 DIMMs; PCH with GbE NICs; SAS HBA to the drives]
Characteristics:
• + IO performance
• Few fast cores
• E3-1230 V2 CPU
• 16 GB memory
• 12 x 3 TB HDD
• SSD boot
• 2 x 1 GbE
Storage focus:
• Cost efficient (single socket, 3 TB drives)
• Less memory needed
THS variant for Hadoop-Proc and HBase
[Block diagram: single E3-12xx socket, 2 DIMMs; PCH; SAS HBA; 10 GbE NIC]
Characteristics:
• + IO performance
• Few fast cores
• E3-1230 V2 CPU
• 32 GB memory
• 12 x 1 TB HDD
• SSD boot
• 1 x 10 GbE
Processing / throughput focus:
• Cost efficient (single socket, 1 TB drives)
• More disk and network IO per socket
THS for cold cluster
[Block diagram: single E3-12xx socket, 2 DIMMs; PCH with GbE NICs; SAS HBA to the drives]
Characteristics:
• Disk efficiency
• Some compute
• E3-1230 V2 CPU
• 32 GB memory
• 12 x 3 TB HDD
• 2 x 1 GbE
Combination of the previous 2 use cases:
• Space & power efficient
• Storage dense with some processing capability
Rack-level view

                      Baseline       THS Backups     THS Proc        THS Cold
Power                 ~8 kW          ~8 kW           ~8 kW           ~8 kW
CPU sockets; DRAM     40; 1440 GB    40; 640 GB      40; 1280 GB     40; 1280 GB
Spindles; TB raw      240; 480 TB    480; 1,440 TB   480; 480 TB     480; 1,440 TB
Uplink; internal BW   20; 40 Gbps    20; 80 Gbps     40; 400 Gbps    20; 80 Gbps
TOR switch            1G TOR         1G TOR          10G TOR         1G TOR
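The rack rows follow directly from the per-node specs and servers-per-rack counts; here is a small sketch reproducing them. The 40-nodes-per-rack figure for the 1U THS variants is inferred from the socket row of the table itself (the baseline is 20 dual-socket 2U nodes).

// Reproduce the rack-level table from per-node specs.
public class RackMath {
    static void rack(String name, int nodes, int socketsPerNode, int gbRam,
                     int disks, int tbPerDisk, double nicGbps, int nics) {
        System.out.printf("%-8s sockets=%d DRAM=%d GB spindles=%d raw=%d TB internal=%.0f Gbps%n",
                name, nodes * socketsPerNode, nodes * gbRam, nodes * disks,
                nodes * disks * tbPerDisk, nodes * nics * nicGbps);
    }
    public static void main(String[] args) {
        rack("Baseline", 20, 2, 72, 12, 2, 1.0, 2);  // 40; 1440 GB; 240; 480 TB; 40 Gbps
        rack("Backups",  40, 1, 16, 12, 3, 1.0, 2);  // 40; 640 GB; 480; 1440 TB; 80 Gbps
        rack("Proc",     40, 1, 32, 12, 1, 10.0, 1); // 40; 1280 GB; 480; 480 TB; 400 Gbps
        rack("Cold",     40, 1, 32, 12, 3, 1.0, 2);  // 40; 1280 GB; 480; 1440 TB; 80 Gbps
    }
}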
Processing performance comparison
Benchmark                                            Baseline Server    THS (-Cold)
TestDFSIO (write, replication = 1)                   360 MB/s / node    780 MB/s / node
TeraGen (30 TB, replication = 3)                     1:36 hrs           1:35 hrs
TeraSort (30 TB, replication = 3)                    6:11 hrs           4:22 hrs
2 parallel TeraSorts (30 TB each, replication = 3)   10:36 hrs          6:21 hrs
Application #1                                       4:37 min           3:09 min
Application set #2                                   13:3 hrs           10:57 hrs
Performance benchmark setup:
• Each cluster: 102 nodes of the respective type
• Efficient server = 3 racks, baseline 5+ racks
• “Dated” stack: CentOS 5.5, Sun 1.6 JRE, Hadoop 2.0.3
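Converting the table into speedups makes the comparison concrete; this is simple arithmetic on the published numbers, nothing beyond the table itself.

// Speedup ratios from the benchmark table above.
public class Speedup {
    static double hrs(int h, int m) { return h + m / 60.0; }
    public static void main(String[] args) {
        System.out.printf("TestDFSIO: %.2fx MB/s per node%n", 780.0 / 360.0);   // ~2.17x
        System.out.printf("TeraSort: %.2fx faster%n", hrs(6, 11) / hrs(4, 22)); // ~1.42x
        System.out.printf("2 parallel TeraSorts: %.2fx faster%n",
                hrs(10, 36) / hrs(6, 21));                                      // ~1.67x
        // And the THS cluster achieved this in 3 racks versus 5+ for the baseline.
    }
}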
Results
LZO performance comparison
[Chart: LZO performance comparison]
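For context, LZO at Twitter typically means the hadoop-lzo codec, which Twitter maintains. A minimal sketch of enabling it for intermediate map output in a job, assuming hadoop-lzo and its native libraries are installed on the cluster:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.mapreduce.Job;
import com.hadoop.compression.lzo.LzoCodec; // from the hadoop-lzo library

// Minimal sketch: compress intermediate map output with LZO, trading a
// little CPU for less disk and network IO during the shuffle.
public class LzoJobSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                LzoCodec.class, CompressionCodec.class);
        Job job = Job.getInstance(conf, "lzo-example");
        // ... set mapper/reducer and input/output paths, then job.submit()
    }
}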
Recap
• At a certain scale it makes sense to split into multiple clusters. For us: RT, PROC, DW, COLD, BACKUPS, TST, EXP
• For large enough clusters, depending on the use case, it may be worth choosing different HW configurations
Conclusion
@Twitter, our “Twitter Hadoop Server” not only saves many $$$, it is also faster!
#ThankYou
@joep and @eecraft
Come talk to us at booth 26