
Page 1:

© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Issues in Mainstream Clusters
Stanford EE380, 1 Oct 2008

Richard Kaufmann, Scalable Computing & Infrastructure

Page 2:

Disclaimer

Copyright (c) 2008, Hewlett Packard.

HP makes no warranties regarding the accuracy of this information. HP does not warrant or represent that it will introduce any product to which the information relates. It is presented for sheer entertainment value only. Trademarks remain the property of their original owners. So there.

Page 3:

How Do Server Designers Figure Out New Designs?
• Consider some server use cases (applications, ecosystem, etc.)
• Understand technology trends affecting server designs
• Look at the whole system
• Think of opportunities for doing neat things
• Rinse and repeat…


Page 4:

Use Cases: Workloads
• Search
• Web Tier
• Database Servers
• Traditional App Servers
• Cache Nodes
• “HPC” Compute Nodes
  − e.g. 2*QC, 2 GB/core, 95 W SKUs, one PCI-E 8X slot for IB; the checkpoint/restore model places medium-high value on node reliability; density useful because of large clusters (and sometimes cable lengths)


Page 5:

Requirements Drive Server Designs
• Socket count
  − 2s has a lot of user acceptance, regardless of core counts
  − “The herd” likes to move together
• Memory / socket
  − Going up with core count for some applications
• Slots
  − “Glued down” NICs may reduce the need for slots for many users and many uses
    • Accelerators are a notable exception
• Drives / server: type, RAID or JBOD, internal or external, …


Page 6:

Use Cases: Datacenters
• Datacenters with 10 KW/rack: current “co-lo” sweet spot
  − Many co-los still at 5 KW
  − Some newer centers at ~15 KW
  − PUEs ranging from 1.3 – 3.0 (PUE = total watts / watts doing stuff)
• HPC customers often aim much higher
  − Fully populated blade racks: 32 KW+
• 2011
  − Scenario 1: 50 KW+ in a datacenter. Or maybe not!
  − Scenario 2a: Extremely dense containers. High power density, water-cooled. (PUE of 1.2 or better.)
  − Scenario 2b: Containers cooled with outside air: less dense, even better PUE (ref: Intel experiment)

PUE definition: www.thegreengrid.org
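
As a quick illustration of the PUE arithmetic above, here is a small sketch (the 10 KW rack and the PUE values come from the slide; everything else is just arithmetic, not a measurement):

    # PUE = total facility watts / watts doing useful work.
    def facility_power_kw(it_load_kw, pue):
        """Utility power needed to run a given IT load at a given PUE."""
        return it_load_kw * pue

    rack_kw = 10.0                      # the "co-lo sweet spot" rack from the slide
    for pue in (1.2, 1.3, 2.0, 3.0):
        total = facility_power_kw(rack_kw, pue)
        overhead = total - rack_kw      # cooling, power conversion, lights, ...
        print(f"PUE {pue}: {total:.1f} kW from the utility, {overhead:.1f} kW of overhead")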

Page 7:

Use Cases: Improved RAS, Lowered TCO
• Horses for courses
  − Redundant power supplies/fans, hot-swap disks, CPUs, memories, … can make sense for some applications
  − Other applications want the cheapest node possible, and are willing to accept node failures and even wrong answers up to a point
  − Trend: reliability implemented in software, with less need for reliable hardware
    • Example: cloud file services use replication instead of RAID
    • Hard to do
    • Impractical for some applications
    • Only goes so far…


Page 8:

Constrained Differentiation in the Mainstream
• Customers reward designs that:
  − Minimize TCO via ease of management
  − Integrate technology in a way that increases RAS
  − Use less power and deliver more density
  − Are delivered TTM with the base technology
• Innovation is constrained (but highly efficient) due to the horizontal nature of the business
• Historical perspective: mainstream technology has steamrollered “outliers” with grim efficiency
  − Being different means always being one mistake away from oblivion

Page 9:

Trends: Things are getting a bit toasty…

[Chart: PC Cooling Strategy — “Air”, 2006]

Page 10:

And Why Your Power Bill Is Going Up…
• Blame your gaming habits!
• Three GPUs @ 225 W
• 120 W CPU
  − Before overclocking!
• 60 W disks + memory
• < 80% efficient power supply
• > 1000 W!

www.ocworkbench.com
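
The “> 1000 W” figure follows directly from the component budget above; a minimal sketch of the arithmetic (component numbers from the slide, with the supply assumed to be exactly 80% efficient for the bound):

    # Wall-plug power for the gaming box described above.
    gpus = 3 * 225.0                 # three GPUs at 225 W each
    cpu = 120.0                      # before overclocking
    disks_and_memory = 60.0
    dc_load = gpus + cpu + disks_and_memory      # 855 W at the components
    psu_efficiency = 0.80                        # "< 80% eff." supply
    wall_power = dc_load / psu_efficiency        # ~1069 W, hence "> 1000 W!"
    print(round(wall_power))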


Page 11:

2014 Desktop Heatsink?

Actually a big Van de Graaff generator. Picture source: unknown.

Page 12:

Harder To Find Efficiencies

                                  5 yrs. ago        Today
PUE                               2, 3, higher      1.5 good / 1.3 great
UPS efficiency (part of PUE)      94%               99%
Power supply efficiency           75%               92%
Fan power per 2s node             60+ W             5–10 W
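
One way to read the table is to chain the losses end to end. A rough sketch with illustrative mid-range values from each column (UPS losses are already inside PUE per the table, and fan power is left out for simplicity):

    # Utility watts burned per watt delivered to CPUs/memory/disks, then vs. now.
    def utility_watts_per_useful_watt(psu_eff, pue):
        return (1.0 / psu_eff) * pue

    then = utility_watts_per_useful_watt(psu_eff=0.75, pue=2.5)   # ~3.3 W per useful W
    now  = utility_watts_per_useful_watt(psu_eff=0.92, pue=1.4)   # ~1.5 W per useful W
    print(f"then: {then:.1f}  now: {now:.1f}")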

Page 13:

TCO should be pervasive!
• You really want to be able to “pay it forward,” and select servers with (at least options for) more efficient power supplies, etc.
• Blades are built with TCO (power costs, management costs, etc.) for enterprise apps as their top design goal
• Most customers now buy servers based on 3-year total cost (the initial purchase price is not the biggest driver)

Page 14:

Buy Better Power Supplies
• Configuration option for DL16X: US list price $199 for a power supply upgrade (~90% eff.)
  − Compared to a “free” ~70% efficient supply
  − For a 400 W load, saves $516 ($715 − $199) over three years
  − (Details: $0.10/kWh, $10/W of infrastructure amortized over 10 years)
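
The $516 figure can be reproduced from the stated assumptions; a small sketch:

    # Three-year savings from the ~90%-efficient supply vs. the ~70% one,
    # using the slide's assumptions: 400 W load, $0.10/kWh, and $10 per
    # watt of infrastructure amortized over 10 years.
    load_w = 400.0
    delta_w = load_w / 0.70 - load_w / 0.90               # ~127 extra wall-plug watts

    years = 3
    energy_cost = delta_w / 1000 * 8760 * years * 0.10    # ~$334 of electricity
    infra_cost  = delta_w * (10.0 / 10) * years           # ~$381 of infrastructure
    gross = energy_cost + infra_cost                      # ~$715
    print(round(gross), round(gross - 199))               # ~$715 gross, ~$516 net of the $199 upgrade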

Page 15:

This is one reason why (gratuitous plug: HP’s) blades make sense for many customers
• Engineered for TCO
  − Very efficient power supplies, fans
  − (Management also more efficient)
• Redundancy without efficiency compromise
  − Power supplies run best at full load
    • Example: 3+3. Three power supplies provide the load at 100%; the extra power supplies are only brought on-line when required
  − Effective cooling
    • Ducted fans, “clean sheet of paper” airflow design, baffles, …
  − More connections via etch, not cables
    • 1st-level network switching within the enclosure
• Typical: ~25% power reduction compared to average 1U servers

Page 16:

Trend: CPUs must not draw more power: Moore’s Law → Moore’s Cores
• Moore’s law is about transistor density, not processor speed
  − This has been blurred up to now
  − 65 nm and smaller processes are forcing the issue, as trying to improve single-threaded performance is:
    • Unsustainable due to increased power consumption as frequency goes up
    • Unmanageable due to the increased human complexity of designing and validating ever-more-complicated processors

Page 17:

From An Intel Presentation:

[Figure — Source: Intel]

Page 18:

The ubiquitous foil about multi-core processors…

[Figure: core counts over time — 2007, 2011, 2015, etc. Courtesy: M. McLaren]

Page 19:

It’s Hard To Go Against The Herd
• Everyone is moving towards higher core counts instead of faster single-thread performance
• Why?
  − Putting more than one core on a die:
    • Bounds human complexity: identical cores are replicated
    • Stops the frequency rat race: each core can run at a more leisurely speed
  − Constant power: ~20% fewer Hz, but two cores (see the sketch after this list)
• But will my program run faster?!
  − Requires rethinking algorithms
    • Don’t even think about just using the same binaries!
• Has Computer Architecture Research abandoned us?
  − One bright spot: transactional memory?
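
The “constant power, two cores” point rests on the usual first-order scaling argument (dynamic power ≈ C·V²·f, with voltage scaling roughly with frequency). A hedged sketch of that reasoning, not a statement about any particular part:

    # First-order model: P_dynamic ~ C * V^2 * f, and V scales roughly with f,
    # so power scales roughly with f^3. Slowing a core ~20% frees enough power
    # budget to add a second core.
    def relative_power(freq_ratio):
        return freq_ratio ** 3

    one_fast_core    = relative_power(1.0)       # baseline
    two_slower_cores = 2 * relative_power(0.8)   # ~1.02x the power budget
    throughput_if_parallel = 2 * 0.8             # ~1.6x, if the workload scales
    throughput_if_serial   = 0.8                 # the catch: single threads get slower
    print(two_slower_cores, throughput_if_parallel, throughput_if_serial)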

Page 20:

Plotting The Herd
• Mainstream timeline
  − Dual-core became universal: 2006–7
  − Quad-core became dominant: 2007–8
  − 6-core now; 8-core is in the future
    • With designs for 16 in open discussion
• (Yes, there have been outliers)
  − Tera™ MTA™, Sun CoolThreads™, Tilera™, et al.
    • Each out of balance w.r.t. mainstream apps (e.g. limited or no FP)
  − Cray 6600 I/O processors

Page 21:

But How Will Those FLOPS Be Delivered?
• Per-thread performance will remain somewhat static
  − Perhaps “simplified cores” will enable core-count acceleration beyond what comes with shrinkage
    • Perhaps a few “fast” cores for the stubborn threads
  − Speculation: we’re all going to get tired of “around 3 GHz”
• Floating point units will get a lot more capable
  − Side effect of the GPU arms race
    • TF chips need very wiiiiiiiiiiiiiiiide floating point paths (see the sketch below)
• Core count overwhelms most software
  − Some niches OK (e.g. DCC, crypto)
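
To see why a TF chip needs wide floating point paths, a quick back-of-envelope sketch (the clock rate and the all-FMA assumption are illustrative):

    # How many floating-point results per clock does a 1 TF socket need?
    target_flops = 1e12
    clock_hz = 3e9                                   # "around 3 GHz", per the deck
    flops_per_clock = target_flops / clock_hz        # ~333
    fma_lanes = flops_per_clock / 2                  # ~167 lanes if every op is a fused multiply-add
    print(flops_per_clock, fma_lanes)
    # e.g. roughly 21 cores, each with an 8-wide double-precision FMA unit -- very wide indeed.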

Page 22:

Processor Specialization
• x86
  − “Expensive” CPUs for >2s, more memory bandwidth
  − “Midrange” for 2s
  − “LV” for dense
• Itanium
  − RAS++, >>4s
• Others?
  − Interesting startups (using MIPS or custom cores)
    • Watching, but the non-x86 ecosystem is problematic
  − FPGAs, GPGPUs, custom ASICs
    • Some ready to break into the real world, esp. GPGPUs


Page 23:

How Are System Ratios Affected?
• Takes a little pressure off of memory latency
  − Memory used to grow twice as distant (in clocks) every eighteen months
    • Anybody remember when it was ONE clock?
  − Today’s best case: 65 ns / (1 / 3 GHz) = 195 clocks (see the sketch below)
    • Now it just won’t get worse as quickly!
• Doesn’t take pressure off of memory bandwidth
  − Still needs to track Moore’s Law
  − Possible to do, but will require innovation at the processor–memory interconnect
• Don’t forget the I/O!
  − IB: SDR → DDR → QDR → ???
  − Ethernet: 1GigE → 10GigE → …
  − PCI-X → PCI-E → PCI-E Gen 2 → …
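
The 195-clock figure is just the latency divided by the clock period; a tiny sketch, which also shows why a stalled core clock means the number stops getting worse:

    # Memory latency expressed in CPU clocks: latency / (1 / frequency).
    def latency_in_clocks(latency_ns, freq_ghz):
        return latency_ns * freq_ghz

    print(latency_in_clocks(65, 3.0))   # ~195 clocks, the slide's "today's best case"
    # If core frequency stays near 3 GHz, this ratio flattens out; it only grows
    # again if frequency resumes climbing faster than DRAM latency shrinks.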

Page 24:

Prognostications
• Single thread performance matters, darn it!
  − But x86 core speed will careen around 3 GHz through 2007/8
  − Prediction: compilers will once again have to save the day
    • Expect to hear a lot more about this over the next year or two
    • When will I get to just say “next year”?
• Desktops and gaming will have a lot to say about our future
  − They will be forced to adopt parallelism
    • Multi-core GPUs!
• Convergence of multi-core techniques and accelerators
  − “Alternative processors” incorporated onto the die instead of an array of identical cores

Page 25:

What Can We Expect? A TeraFLOP socket in 2013
• If we assume Moore’s Cores holds, mainstream x86 will achieve (see the sketch below):
  − ~100 GF end-09ish
  − 200 GF 1H11
  − 400 GF 2H12
• Commercially-inspired path for faster memory and I/O:
  − Always around 50% of HPTC’s desires
  − Often more than cloud computing’s needs
  − (That’s why they’re mainstream parts!)
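
The GF-per-socket milestones imply a doubling roughly every 18 months; a sketch that extrapolates from the slide's own numbers (dates as decimal years). Plain doubling lands around 500 GF by 2013, so the "TeraFLOP socket in 2013" appears to lean on the 2X–4X acceleration boost discussed on the next slide:

    # Extrapolate per-socket peak GF from the slide's milestones,
    # assuming a doubling roughly every 18 months (1.5 years).
    base_year, base_gf = 2009.9, 100.0      # "~100 GF end-09ish"

    def projected_gf(year, doubling_years=1.5):
        return base_gf * 2 ** ((year - base_year) / doubling_years)

    for year in (2011.25, 2012.75, 2013.5):
        print(year, round(projected_gf(year)))   # ~190, ~375, ~530 GF
    # A 2X-4X FLOPS/die boost from acceleration on top of ~400-500 GF
    # is what puts a socket into TeraFLOP territory around 2013.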

Page 26:

But, What If Acceleration Takes Off?
• Breaks the “mainstream” rule
• Over-provisioned in some dimension
  − GPUs, FPGAs, custom designs
  − Changes some rules of today’s server processors
• Expected: 2X to 4X boost of FLOPS/die
  − Intel’s Larrabee, Nvidia CUDA, AMD/ATI CTM
  − Interesting tradeoff: memory bandwidth++ in exchange for capacity−−?

Page 27:

Acceleration Giveth…
• Keeping up with Moore’s Cores…
  − GPGPUs are on a faster growth curve
    • Larrabee: convergence of ISAs
    • Sustained performance, startup overhead, applicability
  − Too many “50X” claims
  − FPGAs work best when:
    • Numbers are relatively small
    • Highly parallel dataflow with significant locality of reference
• And taketh away!
  − Numerics
    • 64-bit IEEE floating point, accuracy, …
  − Reliability at scale
    • Remember: large clusters are wonderful particle detectors!
  − Programmability

Page 28:

But What Of The Memory Subsystem?
• Memory bandwidth (still) increasing incrementally over the next few years
  − Gentle frequency bumps
    • DDR → DDR2 → DDR3 → etc.
  − FBD (nothing comes for free)
    • Latency++, power++
    • Bandwidth++, pin count−−
  − BoBs

[Chart: Log(Memory BW) and Log(FLOPS) vs. year, 2002–2014]

A breakthrough is needed (optical!), but won’t happen for 5+ years

Page 29:

Chipsets
• Trend: with direct memory attach, chipsets are a LOT less interesting
• PCI-E Gen2
  − 2.5 → 5 GT/s, enables QDR IB
• Gen3
  − 5 → 8 GT/s
• Coherency features
  − I/O devices can be better system participants
• The 2008 “refresh” from Intel was quite interesting:
  − Seaburg/Stoakley: intended for workstations, delivers a 1600 FSB and PCI-E Gen2
  − San Clemente: low-power focus, “accidentally” delivers excellent performance
• These alternatives allow interesting designs


Page 30:


Mapping use cases to servers: increased density, lower power consumption

[Diagram: server blade with callouts — Server A / Server B]
• Type I mezzanine slots: one x8 per server (both reside on the bottom board)
• Optional SATA HDDs
• Battery resides here when configured for an external storage controller
• 2 CPUs per server
• 4 DIMM slots, DDR2 533/667 MHz
• 2 x embedded 1Gb Ethernet per server
• Server interconnect
• Internal USB
• iLO (one per server)


Page 31:

1,024-core rack (> 12 TF): Revenge of the Mainstream!
• San Clemente chipset: lower latency, lower bandwidth, lower power consumption
• ~5300 W per 32 nodes at 100% utilization, compared to ~7400 W for other blades
• Brings peak FLOPS/watt in alignment with Blue Gene/L, and ~50% of Blue Gene/P and Cell hybrids (see the sketch below)
• Peak FLOPS are the mainstream’s worst metric
  − Luckily, peak FLOPS have very little to do with sustained performance per {watt, dollar, space}
  − Focus on Linpack razzle-dazzle hides the fundamental improvement in x86 technology
• Significantly more memory bandwidth per socket coming soon
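
A heavily hedged sketch of the FLOPS-per-watt comparison. The rack composition is my assumption, not stated on the slide: 1,024 cores taken as 128 two-socket quad-core nodes, i.e. four 32-node enclosures per rack.

    # Peak GF per watt under the assumption above.
    rack_peak_gf = 12_000                      # "> 12 TF" per rack
    enclosures = 4                             # assumed: 4 x 32 nodes = 128 nodes
    this_rack_w    = enclosures * 5300         # ~21.2 kW at 100% utilization
    other_blades_w = enclosures * 7400         # ~29.6 kW for the comparison blades
    print(rack_peak_gf / this_rack_w)          # ~0.57 peak GF/W
    print(rack_peak_gf / other_blades_w)       # ~0.41 peak GF/W
    # As the slide itself warns, peak FLOPS/W is a poor metric; sustained
    # performance per watt/dollar/rack-unit is what actually matters.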

Page 32:

Big Clusters Get Knocked Out Quickly

[Figures labeled June and August]

Page 33:


Little clusters, even faster!
• Mount into standard racks
• Stand-alone pedestal

Page 34:

Rack-Mounted Servers Not Standing Still
• But what shape?
  − Benefits of redundant power supplies or fans?
  − Efficiency of 1U fans?
• “Horses for courses”
  − Memory capacity++
  − Slots
  − Storage
  − FLEXIBILITY
  − (Opinion: always a place for head nodes, highly virtualized environments)


Page 35:

Ethernet, Ethernot
• First, a word from our users:
  − Bandwidth (0.01 – 1.0 B/FLOP) from a node
    • Various chokes for bisection bandwidth (trees, tori, meshes, …)
    • What would a TeraFLOP chip require? 10 GB/s to 1 TB/s per socket!
  − Latency (5 µs good, 1 µs better) for tightly coupled apps
    • E-mail ☺ for others…
  − Cheeeeep! $/B/F
• Now we can get into the horse race
  − “Ethernet always wins”
  − Infiniband has shown remarkable staying power
    • Cheaper in $/B/s (NIC + cable + amortized switch port)
  − There’s (maybe) still room for proprietary interconnects
    • “Sea of Sockets” flexible SMPs
    • Higher bandwidth

Page 36:

Infiniband
• DDR mainstream (20 Gb/s)
• Optical cable ecosystem
  − CX4 E-O-E cables break even at ~10 m
• QDR
  − Rollout time “soon”
  − CX4 → QSFP
• Significant cost benefits over 10GigE
  − Continuation relies on IB maintaining its bandwidth advantage
  − … And Ethernet switch suppliers keeping port costs high
• Takeaway: IB good, but thar be dragons…

Page 37:

Interconnect Problem
• Wide range of requirements (worked out in the sketch below):
  − 0.001 B/FLOP * 100 GF server ≈ 1GigE
  − 0.01 B/FLOP * 100 GF server ≈ 10GigE
  − 1.0 B/FLOP * 100 GF server ≈ 1000GigE!!!
• Rule of thumb: there is no rule of thumb!
  − Hint: more money at 0.001 than at 1.0
• But bandwidth tracks some combination of storage speed and aggregate CPU performance
  − Don’t burden the base platform cost to enable higher bandwidth
    • Ideal: enable higher bandwidth through options
    • Not always possible
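
The three bullets above are straight multiplication; a small sketch that converts B/FLOP ratios into link speeds for a 100 GF node:

    # Required network bandwidth for a 100 GF node at different B/FLOP ratios.
    node_flops = 100e9
    for bytes_per_flop in (0.001, 0.01, 1.0):
        gbytes_per_s = bytes_per_flop * node_flops / 1e9
        gbits_per_s = gbytes_per_s * 8
        print(f"{bytes_per_flop} B/FLOP -> {gbytes_per_s:g} GB/s (~{gbits_per_s:g} Gb/s)")
    # 0.001 B/FLOP ~ 1GigE, 0.01 ~ 10GigE, and 1.0 lands near "1000GigE!!!"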

Page 38:

Optics
• It will eventually replace copper
  − (A really lame prediction without a date!)
• Copper showing remarkable fight
  − Current state: optics win for > 10 m
• Why will optics (eventually) win?
  − 10 Gb/s “wall” for copper links through etch, connectors, cables
  − Nasty cables
    • (Folks will tolerate nastiness for significant cost savings, though!)
  − Power consumption
  − Bandwidth * capacity for memory

Page 39:

One of the E-O-E cables

[Photo]

Page 40:

Petaflops
• Congrats to LANL/AMD/IBM for a hybrid PF Top500 machine. Remember, it’s a Linpack machine!
• 2012: O(TF) on a package
  − Multicore, aggressive floating point
  − “Fusion”
• Challenges
  − Ratios (RMS, etc.)
    • These guys really need optical!
  − Software
    • Application strategy
    • O/S: quiet Linux or enhanced embedded

Page 41:

Let’s look at a sustained* PetaFLOP in 2012 (just a thought exercise)

                                       (Acceleration →)
Required sustained rate (PF)           1         1         1         1
Presumed % of peak                     7%        6%        5%        3%
Required peak performance (PF)         14        17        20        33
Per socket (GF)                        100       200       340       1000
Number of sockets needed               142,858   83,334    58,824    33,334
B/FLOP memory bandwidth                0.35      0.18      0.10      0.04
Bandwidth per socket (GB/s peak)       35        35        35        35
Sockets/SMP                            8         8         8         8
SMP count                              17,858    10,417    7,353     4,167

* Sustained: on a REAL application!
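
The table can be regenerated from three inputs per column: the presumed % of peak, the GF per socket, plus a fixed 35 GB/s per socket and 8 sockets per SMP. A sketch:

    # Reproduce the "sustained PetaFLOP in 2012" thought exercise above.
    SUSTAINED_PF = 1.0
    GBPS_PER_SOCKET = 35.0
    SOCKETS_PER_SMP = 8

    columns = [(0.07, 100), (0.06, 200), (0.05, 340), (0.03, 1000)]  # (% of peak, GF/socket)
    for pct_of_peak, gf_per_socket in columns:
        peak_pf = SUSTAINED_PF / pct_of_peak             # slide rounds these to 14/17/20/33
        sockets = peak_pf * 1e6 / gf_per_socket          # 1 PF = 1e6 GF
        bytes_per_flop = GBPS_PER_SOCKET / gf_per_socket
        smps = sockets / SOCKETS_PER_SMP
        print(f"{gf_per_socket:5d} GF/socket: peak {peak_pf:5.1f} PF, "
              f"{sockets:9.0f} sockets, {bytes_per_flop:.2f} B/FLOP, {smps:7.0f} SMPs")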

Page 42:

Where Are The Problems?
• 0.04 B/F of memory bandwidth would be a bottleneck for “today’s applications”
• Two responses:
  − Change the game: go for “emerging” applications
    • Intel’s “RMS” (Recognition, Mining and Synthesis)
  − Fix the darned memory interface
    • Lots of folks talking about it
    • But…

Page 43:

Petaflops are so, so boring…
• Exaflops are the new petaflops!
• In silicon, an early calculation said O(GW) for just the cores
  − How do you get that much power?
• Other challenges:
  − Transistors not as reliable as one would like
  − And if you thought programming petaflops was hard…

Page 44:

http://a4nr.org/news-and-events/12.30-06-fresnonukeplant

Page 45:

Summary
• Figure out who your users are
• Figure out where they’re going to put their servers
• Figure out the “soft” features they value, e.g. RAS
• Understand the technology trends
  − Worry about threats to the mainstream, e.g. acceleration
• Worry about the whole system
  − Interconnects, storage (not talked about today, but SSDs can keep you up at night), etc.
• Start over


Page 46:


Thank you for listening.

Questions you forgot to ask?

richard.kaufmann (at) hp.com