fat-tree data center topology - electrical engineering and ...sugih/courses/eecs589/... ·...




A Scalable, Commodity Data Center Network Architecture

Al-Fares, Loukissas, Vahdat, "A Scalable, Commodity Data Center Network Architecture," Proc. of ACM SIGCOMM '08, 38(4):63-74, Oct. 2008.

Presenter: William Beyer

Paper Goals

•  Point out faults with current data center designs

•  Propose new architecture based on fat-tree
– Scalable interconnection bandwidth
– Economies of scale
– Backward compatibility

A Typical Data Center

•  Data center topology is typically 2-3 level tree of switches and routers

Oversubscription

•  Ratio of worst-case achievable aggregate bandwidth among end-hosts to the total bisection bandwidth of the network topology
– Ability of hosts to fully utilize their uplink capacity

•  1:1 – All hosts can use full uplink capacity

•  5:1 – Only 20% of host bandwidth may be available

•  Typical ratio is 2.5:1 (400 Mbps) to 8:1 (125 Mbps)
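The quoted figures follow from dividing a host's link speed by the oversubscription ratio. A minimal sketch, assuming 1 Gbps host links (implied by the 400 Mbps and 125 Mbps figures, not stated on the slide); the function name is illustrative:

```python
def worst_case_host_bandwidth_mbps(link_mbps: float, ratio: float) -> float:
    """Bandwidth left to each host when every host sends at once.

    `ratio` is the N in an N:1 oversubscription ratio.
    """
    if ratio < 1:
        raise ValueError("oversubscription ratio must be at least 1")
    return link_mbps / ratio


# Assumption: 1 Gbps (1000 Mbps) host links.
for ratio in (1, 2.5, 5, 8):
    mbps = worst_case_host_bandwidth_mbps(1000, ratio)
    print(f"{ratio:g}:1 -> {mbps:g} Mbps ({100 / ratio:g}% of the link)")
```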


Multi-path Routing

•  “Multi-rooted” tree required to communicate at full bandwidth for large clusters
– Otherwise limited to max bandwidth of a single expensive switch (128-port 10 GigE)

•  Use multi-path routing technique such as ECMP
– Performs static load splitting, cannot account for flow sizes
– Routing tables become very large with multiple paths
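To illustrate why ECMP-style splitting is "static", here is a rough sketch of hash-based next-hop selection. The CRC32 hash, the 5-tuple layout, and the uplink names are assumptions for illustration; real switches use their own hash functions and field selections.

```python
import zlib


def ecmp_next_hop(flow, next_hops):
    """Pick a next hop by hashing the flow's header fields.

    Every packet of a flow maps to the same next hop, so packets are not
    reordered, but the choice ignores how much traffic the flow carries.
    """
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


# Hypothetical uplinks and flow: (src IP, dst IP, protocol, src port, dst port).
uplinks = ["uplink-0", "uplink-1", "uplink-2", "uplink-3"]
flow = ("10.0.1.2", "10.2.0.3", 6, 40000, 80)
print(ecmp_next_hop(flow, uplinks))
```

Two large flows can hash to the same uplink while other uplinks sit idle, which is the limitation the slide points out.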

Cost Analysis

Fat-tree Architecture

•  k-ary fat-tree: three-layer topology (edge, aggregation, core)
–  k pods, each consists of (k/2)^2 hosts and two layers (edge/aggregate) each with k/2 k-port switches
–  Each edge switch connects to k/2 hosts and k/2 aggregate switches
–  Each aggregate switch connects to k/2 edge and k/2 core switches
–  (k/2)^2 core switches: each connects to k pods
–  Supports k^3/4 hosts!
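The counts above can be checked with a few lines of arithmetic. This is a small sketch; the function and key names are illustrative, not from the paper.

```python
def fat_tree_sizes(k: int) -> dict:
    """Element counts for a k-ary fat-tree built from k-port switches."""
    if k < 2 or k % 2:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 per pod
        "aggregation_switches": k * half,  # k/2 per pod
        "core_switches": half ** 2,
        "hosts_per_pod": half ** 2,
        "hosts": k ** 3 // 4,
    }


for k in (4, 48):
    print(k, fat_tree_sizes(k))
```

For k = 4 this gives 16 hosts and 4 core switches; for k = 48 it gives 27,648 hosts.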


Fat-tree Topology with k = 4

Issues with Fat-tree Topologies

•  Backwards compatible with IP/Ethernet
– Good thing, but routing algorithms will naively choose a single shortest path to use between subnets
– Leads to bottlenecks quickly
–  (k/2)^2 shortest paths available, should use them all equally

•  Complex wiring due to lack of high speed ports
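The (k/2)^2 path count can be reproduced by enumerating the choices a packet has on its way up. The sketch below assumes a particular wiring convention (aggregation switch a in every pod connects to core switches a*(k/2) through a*(k/2) + k/2 - 1); the tuple labels are hypothetical.

```python
from itertools import product


def inter_pod_paths(k, src, dst):
    """List shortest paths between edge switches in different pods.

    `src` and `dst` are (pod, edge_index) tuples.
    """
    half = k // 2
    (src_pod, src_edge), (dst_pod, dst_edge) = src, dst
    if src_pod == dst_pod:
        raise ValueError("expects edge switches in different pods")
    paths = []
    # Choose one of k/2 aggregation switches, then one of its k/2 core uplinks.
    for agg, offset in product(range(half), repeat=2):
        core = agg * half + offset  # assumed wiring convention
        paths.append([
            ("edge", src_pod, src_edge),
            ("agg", src_pod, agg),
            ("core", core),
            ("agg", dst_pod, agg),
            ("edge", dst_pod, dst_edge),
        ])
    return paths


print(len(inter_pod_paths(4, (0, 0), (1, 1))))  # (4/2)^2 paths
```

Each path crosses a different core switch, so using all of them equally spreads inter-pod traffic over the whole core layer.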

Addressing in Fat-tree

•  Use 10.0.0.0/8 private addressing block

•  Pod switches have address 10.pod.switch.1
– Pod and switch in [0, k-1] based on position

•  Core switches have address 10.k.j.i
–  i and j denote core position in (k/2)^2 core switches

•  Hosts have address 10.pod.switch.ID
–  ID is host ID in switch subnet ([2, (k/2) + 1])
– k < 256, this scheme does not scale indefinitely
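A short sketch of the addressing scheme, generating every address for a given k. Two details are assumptions not stated on the slide: edge switches take switch numbers 0 to k/2 - 1 (aggregation switches the rest), and the core coordinates j and i each run from 1 to k/2.

```python
def fat_tree_addresses(k):
    """Return (pod_switches, core_switches, hosts) address lists."""
    if k % 2 or not 2 <= k < 256:
        raise ValueError("k must be even and fit in one address octet")
    half = k // 2
    pod_switches = [
        f"10.{pod}.{switch}.1" for pod in range(k) for switch in range(k)
    ]
    core_switches = [
        f"10.{k}.{j}.{i}"
        for j in range(1, half + 1)  # assumed range [1, k/2]
        for i in range(1, half + 1)
    ]
    hosts = [
        f"10.{pod}.{edge}.{host_id}"
        for pod in range(k)
        for edge in range(half)            # assumed: edge switches are 0 .. k/2 - 1
        for host_id in range(2, half + 2)  # IDs in [2, k/2 + 1]
    ]
    return pod_switches, core_switches, hosts


pod_switches, core_switches, hosts = fat_tree_addresses(4)
print(core_switches)
print(hosts[0], hosts[-1])
```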

Two-Level Lookup Table

•  Prefixes used for forwarding intra-pod traffic

•  Suffixes used for forwarding inter-pod traffic
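One way to picture the two-level table is a primary prefix table in which an entry either names an output port or points to a secondary suffix table. The sketch below is illustrative only: the data layout, helper names, and the sample entries (an aggregation switch in pod 2 of a k = 4 fat-tree) are assumptions.

```python
import ipaddress


def ip(text):
    return int(ipaddress.IPv4Address(text))


def prefix_matches(addr, prefix, length):
    mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
    return (addr & mask) == (prefix & mask)


def suffix_matches(addr, suffix, length):
    mask = (1 << length) - 1  # compare the low-order bits
    return (addr & mask) == (suffix & mask)


def two_level_lookup(table, dst):
    """Longest-prefix match first; fall through to suffixes if needed."""
    addr = ip(dst)
    hits = [e for e in table if prefix_matches(addr, ip(e["prefix"]), e["length"])]
    if not hits:
        return None
    best = max(hits, key=lambda e: e["length"])
    if "port" in best:  # terminating prefix: intra-pod traffic
        return best["port"]
    suffix_hits = [s for s in best["suffixes"] if suffix_matches(addr, ip(s[0]), s[1])]
    if not suffix_hits:
        return None
    return max(suffix_hits, key=lambda s: s[1])[2]  # inter-pod traffic


# Hypothetical table: ports 0-1 face edge switches, ports 2-3 face the core.
table = [
    {"prefix": "10.2.0.0", "length": 24, "port": 0},
    {"prefix": "10.2.1.0", "length": 24, "port": 1},
    {"prefix": "0.0.0.0", "length": 0,
     "suffixes": [("0.0.0.2", 8, 2), ("0.0.0.3", 8, 3)]},
]
print(two_level_lookup(table, "10.2.1.2"))  # stays in the pod
print(two_level_lookup(table, "10.3.0.3"))  # leaves via an uplink chosen by host ID
```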


Two-Level Lookup Implementation

•  Implemented in hardware using a TCAM
– Can perform parallel lookups across table
– Stores “don’t care” bits, suitable for storing variable-length prefixes

•  Prefixes preferred over suffixes
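A TCAM entry can be modeled as a value plus a mask whose zero bits are "don't care". The loop below is a software stand-in for the hardware's parallel compare, with list order standing in for entry priority; the entries and names are hypothetical.

```python
import ipaddress


def ip(text):
    return int(ipaddress.IPv4Address(text))


# (value, mask, port) in priority order. Prefix entries mask the high bits,
# suffix entries mask the low bits; prefixes come first so they win ties.
ENTRIES = [
    (ip("10.2.0.0"), 0xFFFFFF00, 0),
    (ip("10.2.1.0"), 0xFFFFFF00, 1),
    (ip("0.0.0.2"), 0x000000FF, 2),
    (ip("0.0.0.3"), 0x000000FF, 3),
]


def tcam_lookup(entries, dst):
    """Return the port of the first (highest-priority) matching entry."""
    addr = ip(dst)
    for value, mask, port in entries:
        if (addr & mask) == (value & mask):
            return port
    return None


print(tcam_lookup(ENTRIES, "10.2.1.2"))  # matches a prefix and a suffix; prefix wins
print(tcam_lookup(ENTRIES, "10.0.1.3"))  # matches only a suffix
```

The address 10.2.1.2 matches both the 10.2.1.0 prefix and the 0.0.0.2 suffix, which is why the ordering of prefixes ahead of suffixes matters.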

Routing Algorithm

•  Prefixes in two-level table prevent intra-pod traffic from leaving pod

•  Inter-pod traffic handled by suffix table
– Suffixes based off host IDs, ensures spread of traffic across core switches
– Prevents packet reordering by having static path

•  Each host-to-host communication has a single static path
– Better than having a single path between subnets

Routing Algorithm (cont.)

•  Core switches contain (10.pod.0.0/16, port) entries
–  Statically forwards inter-pod traffic on specified port

•  Aggregate switches contain (10.pod.switch.0/24, port) entries
–  Switch value is the edge switch number

•  Assumes a central entity with full knowledge of topology generates these routing tables
– Also responsible for detecting switch failures and re-routing traffic
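The central entity's job can be sketched as two small table generators. The port numbering (ports 0 to k/2 - 1 face down, the rest face up; core port p faces pod p) and the rotation of host IDs by switch number are assumptions modeled on the scheme described above, not a verbatim copy of the paper's algorithm.

```python
def aggregation_switch_table(k, pod, switch):
    """Prefix and suffix entries for aggregation switch 10.pod.switch.1."""
    half = k // 2
    if not half <= switch < k:
        raise ValueError("aggregation switches are assumed numbered k/2 .. k-1")
    # Intra-pod: one /24 prefix per edge switch, reached on the matching port.
    prefixes = [(f"10.{pod}.{edge}.0/24", edge) for edge in range(half)]
    # Inter-pod: spread destination host IDs over the k/2 upward ports.
    suffixes = [
        (f"0.0.0.{host_id}/8", (host_id - 2 + switch) % half + half)
        for host_id in range(2, half + 2)
    ]
    return prefixes, suffixes


def core_switch_table(k):
    """One /16 prefix per pod; port p is assumed to face pod p."""
    return [(f"10.{pod}.0.0/16", pod) for pod in range(k)]


print(aggregation_switch_table(4, 2, 2))
print(aggregation_switch_table(4, 2, 3))
print(core_switch_table(4))
```

Rotating by the switch number means the two aggregation switches in a pod send the same host ID to different uplinks.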

Routing Algorithm Example


Dynamic Routing Techniques

•  Alternatives to two-level routing table
– Attempt to classify and schedule flows rather than use static routing

•  Flow Classification
–  Periodically reassigns flow output ports
–  Prevents competition between flows for a single port

•  Flow Scheduling
–  Identify large flows and establish reserved paths for them
–  Requires communication between edge switches and a central flow scheduler
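As a rough illustration of flow classification, the sketch below assigns each new flow to the least-loaded upward port and periodically moves one flow from the busiest port to the idlest. The class, its byte-count bookkeeping, and the rebalancing rule are assumptions for illustration, not the paper's implementation.

```python
class FlowClassifier:
    """Toy per-switch flow classifier (hypothetical API)."""

    def __init__(self, uplink_ports):
        self.load = {port: 0 for port in uplink_ports}  # bytes per port
        self.port_of = {}                               # flow -> port
        self.size = {}                                  # flow -> bytes seen

    def forward(self, flow, nbytes):
        if flow not in self.port_of:
            # New flow: pick the currently least-loaded upward port.
            self.port_of[flow] = min(self.load, key=self.load.get)
            self.size[flow] = 0
        port = self.port_of[flow]
        self.size[flow] += nbytes
        self.load[port] += nbytes
        return port

    def rebalance(self):
        """Move the largest flow that fits in the gap between busiest and idlest."""
        busiest = max(self.load, key=self.load.get)
        idlest = min(self.load, key=self.load.get)
        gap = self.load[busiest] - self.load[idlest]
        movable = [f for f, p in self.port_of.items()
                   if p == busiest and self.size[f] < gap]
        if not movable:
            return None
        flow = max(movable, key=self.size.get)
        self.port_of[flow] = idlest
        self.load[busiest] -= self.size[flow]
        self.load[idlest] += self.size[flow]
        return flow


fc = FlowClassifier([2, 3])
for flow, nbytes in [("a", 100), ("b", 50), ("c", 20), ("d", 10), ("e", 60)]:
    fc.forward(flow, nbytes)
print(fc.load, fc.rebalance(), fc.load)
```

Calling `rebalance()` on a timer corresponds to the "periodically reassigns flow output ports" step; a central flow scheduler would make similar decisions with a network-wide view.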

Fault Tolerance

•  Many possible paths between hosts leads to “easy” fault tolerance

•  Each switch maintains Bidirectional Forwarding Detection session with neighbors
– Allows switch to determine when neighbors fail

•  Two primary types of link failure
– Between lower and upper switches
– Between upper and core switches

Router Power and Heat Dissipation

Topology Power/Heat Dissipation


Cafarella, 2013

Hamilton, 2008

Cafarella, 2013

Emerson Network Power, 2007

Cafarella, 2013

EPA, 2007

Software Implementation

•  Validated in software using Click
– Click is a modular software router architecture
–  Implement routers on PCs, supports experimental router designs

•  Click modules called “elements”
– Each element performs a specified task
– Routing table lookup, decrement packet TTL, etc…

•  Implemented elements for two-level table, flow classifier, and flow scheduler


Evaluation Setup

•  Uses a 4-port fat-tree as seen previously
– Two-level table and flow-based schemes analyzed
– Compared against hierarchical tree with oversubscription ratio of 3.6:1

•  Both evaluated using Click
– Emulate switches and hosts on PCs

•  All hosts generate 96 Mbit/s of outgoing traffic
– This value prevents CPU from throttling test

Evaluation Results

•  Percentages indicate aggregate network bandwidth
– Measured as amount of incoming traffic received by hosts
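The percentage metric can be written out as a small calculation. The sketch assumes the 16 hosts of a k = 4 fat-tree each sending 96 Mbit/s, as in the evaluation setup; the measured value in the usage line is a made-up placeholder, not a result from the paper.

```python
def ideal_aggregate_mbps(hosts: int, per_host_mbps: float) -> float:
    """Traffic received if every host's outgoing traffic arrives in full."""
    return hosts * per_host_mbps


def percent_of_ideal(received_mbps: float, hosts: int, per_host_mbps: float) -> float:
    """Aggregate received traffic as a percentage of the ideal."""
    return 100 * received_mbps / ideal_aggregate_mbps(hosts, per_host_mbps)


hosts = 4 ** 3 // 4  # k^3/4 hosts for k = 4
print(ideal_aggregate_mbps(hosts, 96))   # ideal aggregate in Mbit/s
print(percent_of_ideal(768, hosts, 96))  # 768 Mbit/s is a hypothetical measurement
```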

Flow Scheduler Requirements

•  Minimal time and memory requirements for flow scheduler

•  Feasible to use at least until k grows extremely large

Packaging Problem

•  Fat-tree has significant cabling overhead
– 1 GigE switches used to reduce cost
– Lack of 10 GigE ports leads to more cabling

•  Present a packaging solution for k=48
– Generalizes to other values of k
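To get a feel for the cabling overhead, the link counts per layer can be derived from the topology rules (each edge switch has k/2 host links and k/2 uplinks, each aggregation switch has k/2 core uplinks). A small sketch with illustrative names:

```python
def fat_tree_cable_counts(k: int) -> dict:
    """Number of links in each layer of a k-ary fat-tree."""
    if k < 2 or k % 2:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    switches_per_layer = k * half            # edge or aggregation switches
    per_layer = switches_per_layer * half    # each has k/2 links per direction
    return {
        "host_to_edge": per_layer,
        "edge_to_aggregation": per_layer,
        "aggregation_to_core": per_layer,
        "total": 3 * per_layer,
    }


print(fat_tree_cable_counts(48))
```

For k = 48 each layer needs 27,648 cables, which is why packaging becomes a design problem of its own.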


Packaging Solution

Strengths

•  Fat-tree architecture seems to outperform hierarchical solution

•  Excellent power and heat reductions over hierarchical approach

•  Evaluation methods were good overall with tests performed

•  Data centers can easily switch to this new method

Weaknesses

•  Language used in paper was confusing at times
– Referred to pod switches as “aggregate switch”, “upper-layer switch”, and “upper pod switch” at various points

•  Evaluation performed with small value of k=4
– Would have been nice to see higher values of k tested
– Academic project and resources were obviously a factor for evaluation

References

•  Al-Fares, Loukissas, Vahdat, "A Scalable, Commodity Data Center Network Architecture," Proc. of ACM SIGCOMM '08, 38(4):63-74, Oct. 2008.

•  Cafarella, M. (2013, April 20). Datacenters. EECS 485. Lecture conducted from University of Michigan, Ann Arbor.

•  "Energy Efficient Cooling Solutions for Data Centers." Emerson Network Power. 2007. Web. 28 Oct. 2013. <http://www.emersonnetworkpower.com/documents/en-us/latest-thinking/edc/documents/white%20paper/energy_efficient_cooling_solutions_for_data_centers.pdf>.

•  Hamilton, James. "Perspectives - Cost of Power in Large-Scale Data Centers." Perspectives - James Hamilton's Blog. N.p., 28 Nov. 2008. Web. 28 Oct. 2013. <http://perspectives.mvdirona.com/2008/11/28/CostOfPowerInLargeScaleDataCenters.aspx>.

•  "Report to Congress on Server and Data Center Energy Efficiency." Energy Star. U.S. Environmental Protection Agency, 2 Aug. 2007. Web. 28 Oct. 2013. <http://www.energystar.gov/ia/partners/prod_development/downloads/EPA_Datacenter_Report_Congress_Final1.pdf?db729bf5a>.