fat-tree data center topology -...

8
A Scalable, Commodity Data Center Network Architecture Al9Fares, Loukissas, Vahdat, "A Scalable, Commodity Data Center Network Architecture," Proc. of ACM SIGCOMM '08, 38(4):63974, Oct. 2008. Presenter: William Beyer Paper Goals Point out faults with current data center designs Propose new architecture based on fat9tree Scalable interconnecUon bandwidth Economies of scale Backward compaUbility A Typical Data Center Data center topology is typically 293 level tree of switches and routers OversubscripUon RaUo of worst9case achievable aggregate bandwidth among end9hosts to the total bisecUon bandwidth of the network topology Ability of hosts to fully uUlize their uplink capacity 1:1 – All hosts can use full uplink capacity 5:1 – Only 20% of host bandwidth may be available Typical raUo is 2.5:1 (400 Mbps) to 8:1 (125 Mbps)

Upload: trinhque

Post on 13-Sep-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Fat-tree Data Center Topology - web.eecs.umich.eduweb.eecs.umich.edu/~sugih/courses/eecs589/f13/24-FatTree.pdf · Fat-tree Data Center Topology.pptx Author: sugih Created Date: 10/31/2013

A"Scalable,"Commodity"Data"Center"Network"Architecture"

Al9Fares,"Loukissas,"Vahdat,""A"Scalable,"Commodity"Data"Center"Network"Architecture,""Proc."of"ACM"

SIGCOMM"'08,"38(4):63974,"Oct."2008.""

Presenter:"William"Beyer"

Paper"Goals"

•  Point"out"faults"with"current"data"center"designs"

•  Propose"new"architecture"based"on"fat9tree"– Scalable"interconnecUon"bandwidth"– Economies"of"scale"– Backward"compaUbility"

A"Typical"Data"Center"

•  Data"center"topology"is"typically"293"level"tree"of"switches"and"routers"

OversubscripUon"

•  RaUo"of"worst9case"achievable"aggregate"bandwidth"among"end9hosts"to"the"total"bisecUon"bandwidth"of"the"network"topology"– Ability"of"hosts"to"fully"uUlize"their"uplink"capacity"

•  1:1"–"All"hosts"can"use"full"uplink"capacity"•  5:1"–"Only"20%"of"host"bandwidth"may"be"available"

•  Typical"raUo"is"2.5:1"(400"Mbps)"to"8:1"(125"Mbps)"

Page 2: Fat-tree Data Center Topology - web.eecs.umich.eduweb.eecs.umich.edu/~sugih/courses/eecs589/f13/24-FatTree.pdf · Fat-tree Data Center Topology.pptx Author: sugih Created Date: 10/31/2013

MulU9path"RouUng"

•  “MulU9rooted”"tree"required"to"communicate"at"full"bandwidth"for"large"clusters"– Otherwise"limited"to"max"bandwidth"of"a"single"expensive"switch"(1289port"10"GigE)"

•  Use"mulU9path"rouUng"technique"such"as"ECMP"– Performs"staUc"load"splidng,"cannot"account"for"flow"sizes"

– RouUng"tables"become"very"large"with"mulUple"paths"

Cost"Analysis"

Cost"Analysis" Fat9tree"Architecture"

•  k9ary"fat9tree:"three9layer"topology"(edge,"aggregaUon,"core)"–  k"pods,"each"consists"of"(k/2)2"hosts"and"two"layers"(edge/aggregate)"each"with"k/2"k9port"switches"

–  Each"edge"switch"connects"to"k/2"hosts"and"k/2"aggregate"switches"

–  Each"aggregate"switch"connects"to"k/2"edge"and"k/2"core"switches"

–  (k/2)2"core"switches:"each"connects"to"k"pods"–  Supports"k3/4"hosts!"

Page 3: Fat-tree Data Center Topology - web.eecs.umich.eduweb.eecs.umich.edu/~sugih/courses/eecs589/f13/24-FatTree.pdf · Fat-tree Data Center Topology.pptx Author: sugih Created Date: 10/31/2013

Fat9tree"Topology"with"k"="4" Issues"with"Fat9tree"Topologies"

•  Backwards"compaUble"with"IP/Ethernet"– Good"thing,"but"rouUng"algorithms"will"naively"choose"a"single"shortest"path"to"use"between"subnets"

– Leads"to"boilenecks"quickly"–  (k/2)2"shortest"paths"available,"should"use"them"all"equally"

•  Complex"wiring"due"to"lack"of"high"speed"ports"

Addressing"in"Fat9tree"

•  Use"10.0.0.0/8"private"addressing"block"•  Pod"switches"have"address"10.pod.switch.1"– Pod"and"switch"in"[0,"k91]"based"on"posiUon"

•  Core"switches"have"address"10.k.j.i"–  i"and"j"denote"core"posiUon"in"(k/2)2"core"switches"

•  Hosts"have"address"10.pod.switch.ID"–  ID"is"host"ID"in"switch"subnet"([2,"(k/2)"+"1])"– k"<"256,"this"scheme"does"not"scale"indefinitely"

Two9Level"Lookup"Table"

•  Prefixes"used"for"forwarding"intra9pod"traffic"•  Suffixes"used"for"forwarding"inter9pod"traffic"

Page 4: Fat-tree Data Center Topology - web.eecs.umich.eduweb.eecs.umich.edu/~sugih/courses/eecs589/f13/24-FatTree.pdf · Fat-tree Data Center Topology.pptx Author: sugih Created Date: 10/31/2013

Two9Level"Lookup"ImplementaUon"

•  Implemented"in"hardware"using"a"TCAM"– Can"perform"parallel"lookups"across"table"– Stores"don’t"care"bits,"suitable"for"storing"variable"length"prefixes"

•  Prefixes"preferred"over"suffixes"

RouUng"Algorithm"

•  Prefixes"in"two9level"table"prevent"intra9pod"traffic"from"leaving"pod"

•  Inter9pod"traffic"handled"by"suffix"table"– Suffixes"based"off"host"IDs,"ensures"spread"of"traffic"across"core"switches"

– Prevents"packet"reordering"by"having"staUc"path"•  Each"host9to9host"communicaUon"has"a"single"staUc"path"– Beier"than"having"a"single"path"between"subnets"

RouUng"Algorithm"(cont.)"

•  Core"switches"contain"(10.pod.0.0/16,"port)"entries"–  StaUcally"forwards"inter9pod"traffic"on"specified"port"

•  Aggregate"switches"contain"(10.pod.switch.0/24,"port)"entries"–  Switch"value"is"the"edge"switch"number"

•  Assumes"a"central"enUty"with"full"knowledge"of"topology"generates"these"rouUng"tables"– Also"responsible"for"detecUng"switch"failures"and"re9rouUng"traffic"

RouUng"Algorithm"Example"

Page 5: Fat-tree Data Center Topology - web.eecs.umich.eduweb.eecs.umich.edu/~sugih/courses/eecs589/f13/24-FatTree.pdf · Fat-tree Data Center Topology.pptx Author: sugih Created Date: 10/31/2013

Dynamic"RouUng"Techniques"•  AlternaUves"to"two9level"rouUng"table"– Aiempt"to"classify"and"schedule"flows"rather"than"use"staUc"rouUng"

•  Flow"ClassificaUon"–  Periodically"reassigns"flow"output"ports"–  Prevents"compeUUon"between"flows"for"a"single"port"

•  Flow"Scheduling"–  IdenUfy"large"flows"and"establish"reserved"paths"for"them"

–  Requires"communicaUon"between"edge"switches"and"a"central"flow"scheduler"

Fault"Tolerance"

•  Many"possible"paths"between"hosts"leads"to"“easy”"fault"tolerance"

•  Each"switch"maintains"BidirecUonal"Forwarding"DetecUon"session"with"neighbors"– Allows"switch"to"determine"when"neighbors"fail"

•  Two"primary"types"of"link"failure"– Between"lower"and"upper"switches"– Between"upper"and"core"switches"

Router"Power"and"Heat"DissipaUon" Topology"Power/Heat"DissipaUon"

Page 6: Fat-tree Data Center Topology - web.eecs.umich.eduweb.eecs.umich.edu/~sugih/courses/eecs589/f13/24-FatTree.pdf · Fat-tree Data Center Topology.pptx Author: sugih Created Date: 10/31/2013

Cafarella,"2013"

Hamilton,"2008"

Cafarella,"2013"

Emerson"Network"Power,"2007"

Cafarella,"2013"

EPA,"2007"

Sotware"ImplementaUon"

•  Validated"in"sotware"using"Click"– Click"is"a"modular"sotware"router"architecture"–  Implement"routers"on"PCs,"supports"experimental"router"designs"

•  Click"modules"called"“elements”"– Each"element"performs"a"specified"task"– RouUng"table"lookup,"decrement"packet"TTL,"etc…"

•  Implemented"elements"for"two9level"table,"flow"classifier,"and"flow"scheduler"

Page 7: Fat-tree Data Center Topology - web.eecs.umich.eduweb.eecs.umich.edu/~sugih/courses/eecs589/f13/24-FatTree.pdf · Fat-tree Data Center Topology.pptx Author: sugih Created Date: 10/31/2013

EvaluaUon"Setup"

•  Uses"a"49port"fat9tree"as"seen"previously"– Two9level"table"and"flow9based"schemes"analyzed"– Compared"against"hierarchical"tree"with"oversubscripUon"raUo"of"3.6:1"

•  Both"evaluated"using"Click"– Emulate"switches"and"hosts"on"PCs"

•  All"hosts"generate"96"Mbit/s"of"outgoing"traffic"– This"value"prevents"CPU"from"throiling"test"

EvaluaUon"Results"•  Percentages"indicate"aggregate"network"bandwidth"– Measured"as"amount"of"incoming"traffic"received"by"hosts""

Flow"Scheduler"Requirements"

•  Minimal"Ume"and"memory"requirements"for"flow"scheduler"

•  Feasible"to"use"at"least"unUl"k"grows"extremely"large"

Packaging"Problem"

•  Fat9tree"has"significant"cabling"overhead"– 1"GigE"switches"used"to"reduce"cost"– Lack"of"10"GigE"ports"leads"to"more"cabling"

•  Present"a"packaging"soluUon"for"k=48"– Generalizes"to"other"values"of"k"

Page 8: Fat-tree Data Center Topology - web.eecs.umich.eduweb.eecs.umich.edu/~sugih/courses/eecs589/f13/24-FatTree.pdf · Fat-tree Data Center Topology.pptx Author: sugih Created Date: 10/31/2013

Packaging"SoluUon" Strengths"

•  Fat9tree"architecture"seems"to"outperform"hierarchical"soluUon"

•  Excellent"power"and"heat"reducUons"over"hierarchical"approach"

•  EvaluaUon"methods"were"good"overall"with"tests"performed"

•  Data"centers"can"easily"switch"to"this"new"method"

Weaknesses"

•  Language"used"in"paper"was"confusing"at"Umes"– Referred"to"pod"switches"as"“aggregate"switch”,"“upper9layer"switch”,"and"“upper"pod"switch”"at"various"points"

•  EvaluaUon"performed"with"small"value"of"k=4"– Would"have"been"nice"to"see"higher"values"of"k"tested"

– Academic"project"and"resources"were"obviously"a"factor"for"evaluaUon"

References"•  Al9Fares,"Loukissas,"Vahdat,""

A"Scalable,"Commodity"Data"Center"Network"Architecture,""Proc.&of&ACM&SIGCOMM&'08,"38(4):63974,"Oct."2008."

•  Cafarella,"M."(2013,"April"20)."Datacenters."EECS&485."Lecture"conducted"from"University"of"Michigan,"Ann"Arbor."

•  "Energy"Efficient"Cooling"SoluUons"for"Data"Centers.""Emerson&Network&Power."2007"Web."28"Oct."2013."<hip://www.emersonnetworkpower.com/documents/en9us/latest9thinking/edc/documents/white%20paper/energy_efficient_cooling_soluUons_for_data_centers.pdf>."

•  Hamilton,"James.""PerspecUves"9"Cost"of"Power"in"Large9Scale"Data"Centers.""Perspec>ves&@&James&Hamilton's&Blog."N.p.,"28"Nov."2008."Web."28"Oct."2013."<hip://perspecUves.mvdirona.com/2008/11/28/CostOfPowerInLargeScaleDataCenters.aspx>."

•  "Report"to"Congress"on"Server"and"Data"Center"Energy"Efficiency.""Energy&Star."U.S."Environmental"ProtecUon"Agency,"2"Aug."2007."Web."28"Oct."2013."<hip://www.energystar.gov/ia/partners/prod_development/downloads/EPA_Datacenter_Report_Congress_Final1.pdf?db729bf5a>."