

Post on 28-Dec-2015



1

From 4-bit micros to Multi-cores: A brief history, Future Challenges, and how CEs can prepare for them

Ganesh Gopalakrishnan, School of Computing, University of Utah

• NSF CSR-SMA: Toward Reliable and Efficient Message Passing Software Through Formal Analysis (the “Gauss” project)
• 2005-TJ-1318 (SRC/Intel Customization), Scaling Formal Methods Towards Hierarchical Protocols in Shared Memory Processors (the “MPV” project)
• Microsoft HPC Innovation Center, “Formal Analysis and Code Generation Support for MPI”


2

Microprocessors are everywhere!


3

“Desktops” turn into supercomputers: 8 dies × 2 CPUs × 2-way execution = a 32-way shared-memory machine!


4

Supercomputers have become fundamental tools that underlie all of engineering

(BlueGene/L - Image courtesy of IBM / LLNL) (Image courtesy of Steve Parker, CSAFE, Utah)


5

From Simulation to Flight: Virtual Roll-out of Boeing 787 “Dreamliner”

Entire Airplane being Designed and Flown inside a Computer (Simulation Program).

The first plane to fly is the real one (not a mockup model).

(Photo courtesy of Boeing.)


6

This Talk
• Some history
  – How the micro came about
  – Past predictions
• The future
  – Multicores
  – Hardware challenges
  – Programming them
  – How I am trying to help (my own research)
• General awareness
  – International matters
  – Tips to survive, … and to excel


7

The birth of the micro

Intel’s 4004 and TI’s TMS-1000 were the first.

[Photo: the 4004 with cover removed (L) and on (R)]

Patent awarded to TI! Intel made a single-chip computer for Datapoint and marketed it as the 8008 when Datapoint did not use the design.


8

Revolution of the 70s and 80s
• Intel: 4004, 4040, 8008, 8080, 8085, 8086, 80186, 80286, 80386, 80486, Pentium, PPro, … now “x86” (also Itanium)
• Motorola: 6800, 6810, 6820, 68000, 68010, 68020, … then PowerPC (in collaboration with IBM)
• Other companies

Burst of activity: EVERY student wanted to build an embedded computer out of a micro in the 70s and 80s.


9

The micro killed the mini

It became amply clear in the 80s that it was going to replace “mainframes”: casual experiments compared a Sun-2 (68020) against Digital’s VAX 11/750 and 780.

The birth of the IBM PC around 1980 started things going the µP’s way!

With the masses having a PC each, the Internet could be meaningfully reborn!


10

… and is in every supercomputer

John Hennessy’s prediction during SC’97 (http://news-service.stanford.edu/news/1997/november19/supercomp1119.html):

John Hennessy: “Today’s microprocessor chipping away at supercomputer market”. Traditionally designed supercomputers will vanish within a decade, and they have! Clusters of micros fill vast rooms now!


11

IBM ASCI White Machine

Released in 2000
• Peak performance: 12.3 teraflops
• Processors used: IBM RS6000 SP Power3, 375 MHz
• There are 8,192 of these processors
• The total amount of RAM is 6 TB
• Two hundred cabinets, covering the area of two basketball courts
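The peak figure can be sanity-checked against the processor count and clock rate. Dividing it out, the numbers imply about 4 floating-point operations per processor per clock cycle; that is consistent with each Power3 completing two fused multiply-adds per cycle, though that microarchitectural detail is an assumption, not something stated on the slide:

```latex
\frac{12.3 \times 10^{12}\ \text{flop/s}}{8192 \times 375 \times 10^{6}\ \text{cycles/s}}
  = \frac{12.3 \times 10^{12}}{3.072 \times 10^{12}}
  \approx 4\ \text{flop per processor per cycle}
```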


12

IBM BlueGene/L

The first machine in the family, Blue Gene/L, is expected to operate at a peak performance of about 360 teraflops (360 trillion operations per second) and to occupy 64 racks, taking up only about the same space as half of a tennis court. Researchers at the Lawrence Livermore National Laboratory (LLNL) plan to use Blue Gene/L to simulate physical phenomena that require computational capability much greater than presently available, such as cosmology and the behavior of stellar binary pairs, laser-plasma interactions, and the behavior and aging of high explosives.


13

Now it’s the era of multi-cores: e.g., the Sun Niagara processor has 8 CPU cores (and Intel has already demoed an 80-core chip…)


14

Energy advantages of multicores

Replace one CPU with two simpler CPUs: each simple CPU achieves 80% of the performance with only 50% of the power, so the chip as a whole gives 1.6x the performance for the same power, PROVIDED we can keep the cores busy.

Simple ways to keep ’em busy: a virus checker in the background while the user computes; Photoshop on one core and Windows on another.
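Spelled out, assuming the baseline is one CPU normalized to performance 1 and power 1, and that both simple cores are fully busy:

```latex
\text{Performance: } 2 \times 0.8 = 1.6\times, \qquad
\text{Power: } 2 \times 0.5 = 1.0\times, \qquad
\frac{\text{Performance}}{\text{Power}} = \frac{1.6}{1.0} = 1.6\times
```

If only one core is kept busy, the gain disappears: the chip then delivers 0.8x the performance, which is why keeping the cores busy is the crux.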

More complex ways to keep multiple cores busy are being investigated


15

So what are the design issues? Lots! Here is a small subset:

Complex cache coherence protocols!

Silicon debugging is becoming a headache!

Programming apps is becoming hard!

The “Digital Divide”


16

1. Dual- and quad-cores are the norm these days. Their caches are visibly central.

(Photo courtesy of Intel Corporation.)

> 80% of chips shipped will be multi-core


17

What is cache coherence?

Illusion of global shared memory is preferred
Need mechanisms to keep caches consistent

Every read must fetch the data written by the latest write

P1            P2
read(a)       write(a,1)
…             …
read(a)       write(a,2)


18

What is cache coherence?

Illusion of global shared memory is preferred
Need mechanisms to keep caches consistent

Every read must fetch the data written by the latest write

P1            P2
read(a,2)     write(a,1)
…             …
read(a,1)     write(a,2)

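With a coherent cache, the indicated outcome is not allowed. Here read(a,v) means the read of a returned v: once P1 has seen the later write (2), it cannot go back to the earlier value (1).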


19

What is cache coherence?

Illusion of global shared memory is preferred
Need mechanisms to keep caches consistent

Every read must fetch the data written by the latest write

P1            P2
read(a,2)     write(a,1)
…             …
read(a,2)     write(a,2)

But this outcome is allowed: P1 may miss the write of 1 entirely and see 2 both times.
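One way to make the last three slides precise: coherence requires all accesses to the single location a to behave as if they happened in one global order that respects each processor's program order. The minimal sketch below enumerates every such order for the slides' example and collects what P1 can read. It assumes that simplified per-location model and an initial value of 0 for a (the slides give none); the function names are hypothetical.

```python
from itertools import combinations


def interleavings(p1, p2):
    """Yield every merge of two op lists that preserves each list's order."""
    n = len(p1) + len(p2)
    for slots in combinations(range(n), len(p1)):
        merged, i, j = [], 0, 0
        for k in range(n):
            if k in slots:          # position k is taken by P1's next op
                merged.append(p1[i])
                i += 1
            else:                   # otherwise by P2's next op
                merged.append(p2[j])
                j += 1
        yield merged


def coherent_outcomes(initial=0):
    """Return the set of (first read, second read) values P1 may observe."""
    p1 = [("read",), ("read",)]
    p2 = [("write", 1), ("write", 2)]
    outcomes = set()
    for schedule in interleavings(p1, p2):
        a = initial                 # the single shared location "a"
        reads = []
        for op in schedule:
            if op[0] == "write":
                a = op[1]
            else:
                reads.append(a)
        outcomes.add(tuple(reads))
    return outcomes


if __name__ == "__main__":
    print(sorted(coherent_outcomes()))
```

(2, 2) shows up in the result and (2, 1) never does, matching the two slides: values seen by P1 can only move forward through P2's writes.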


20

Cache Coherence Protocol Verification

My “MPV” research project develops techniques to ensure that cache coherence protocols are correct

We use an approach called Model Checking

We control the complexity of model checking through the Assume/Guarantee approach

[Figure: directories (dir) and memories (mem) linked by chip-level, inter-cluster, and intra-cluster protocols]
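Explicit-state model checking explores every reachable state of a protocol model and checks an invariant in each one, returning a counterexample trace when the invariant fails. The minimal Python sketch below assumes a hypothetical toy protocol: two caches, MSI states, a single block, and atomic transitions. It is far simpler than the hierarchical protocols on these slides and is not the MPV project's models or tools; the buggy flag deliberately drops invalidations so the checker has something to find.

```python
from collections import deque


def successors(state, buggy=False):
    """Yield (action, next_state) pairs for a toy atomic MSI model."""
    n = len(state)
    for i in range(n):
        if state[i] == "I":
            # Read miss: any M copy is downgraded to S; requester gets S.
            nxt = ["S" if s == "M" else s for s in state]
            nxt[i] = "S"
            yield f"read{i}", tuple(nxt)
        if state[i] != "M":
            # Write: requester gets M and every other copy is invalidated.
            # With buggy=True the invalidation is (deliberately) forgotten.
            nxt = list(state) if buggy else ["I"] * n
            nxt[i] = "M"
            yield f"write{i}", tuple(nxt)
        if state[i] != "I":
            # Eviction: a valid copy may be dropped at any time.
            nxt = list(state)
            nxt[i] = "I"
            yield f"evict{i}", tuple(nxt)


def single_writer(state):
    """Invariant: an M copy excludes every other valid copy."""
    if "M" not in state:
        return True
    return state.count("M") == 1 and state.count("S") == 0


def model_check(init, buggy=False):
    """Breadth-first search of all reachable states.

    Returns (True, number_of_states) if the invariant always holds,
    else (False, shortest trace of states from init to a violation).
    """
    parent = {init: None}
    queue = deque([init])
    while queue:
        st = queue.popleft()
        if not single_writer(st):
            trace = [st]
            while parent[trace[-1]] is not None:
                trace.append(parent[trace[-1]])
            return False, trace[::-1]
        for _action, nxt in successors(st, buggy):
            if nxt not in parent:
                parent[nxt] = st
                queue.append(nxt)
    return True, len(parent)


if __name__ == "__main__":
    print(model_check(("I", "I")))
    print(model_check(("I", "I"), buggy=True))
```

Even this toy shows why hierarchies like the next slide's are hard: the state count grows exponentially with the number of caches, which is what motivates compositional techniques such as assume/guarantee.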


21

A caching hierarchy such as this is too hard to verify

[Figure: a Home Cluster (Global Dir, Main Memory) and Remote Clusters 1 and 2; each cluster has a RAC, an L2 Cache + Local Dir, and two L1 Caches]


22

So we create several “mutually supporting” abstractions

[Figure: the same hierarchy (Global Dir, Main Memory, Home Cluster, Remote Clusters 1 and 2) with one cluster kept in full detail (RAC, L2 Cache + Local Dir, two L1 Caches) and the other two abstracted to a RAC plus an abstracted L2 Cache + Local Dir′]


23

Abstracted Protocol #2

[Figure: another abstraction of the same hierarchy, again with one cluster in full detail (RAC, L2 Cache + Local Dir, two L1 Caches) and the other two reduced to a RAC plus L2 Cache + Local Dir′]


24

Problem 2: Silicon Debugging: Can’t see “inside” CPUs without paying a huge price

• On-chip instrumentation is one way to “see” what is inside
• One must put in several built-in test circuits
• One must design with the option of bypassing new features

[Figure: four CPUs, contrasting invisible “miss” traffic with visible “miss” traffic]


25

Problem 3: Programming apps is hard! E.g., threads: thread and process interactions need to be coordinated

Otherwise something analogous to this will happen!

Teller 1                              Teller 2
Read bank balance ($100)              Read bank balance ($100)
Add $10 on scratch paper ($110)       Subtract $10 on scratch paper ($90)
Enter $110 into account               Enter $90 into account

USER LEFT WITH $90, NOT WITH $100!!
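The tellers' mistake is an unprotected read-modify-write. The sketch below first replays the slide's interleaving step by step (deterministically, without threads) and then shows one standard fix, doing each update while holding a lock, using Python's threading module. The lock is an illustrative choice, not a fix prescribed by the slide, and the names (teller_unsafe_schedule, Account, run_tellers) are hypothetical.

```python
import threading


def teller_unsafe_schedule(balance=100):
    """Replay the slide's interleaving step by step (no real threads)."""
    t1_scratch = balance        # Teller 1 reads $100
    t2_scratch = balance        # Teller 2 reads $100
    t1_scratch += 10            # Teller 1: $110 on scratch paper
    t2_scratch -= 10            # Teller 2: $90 on scratch paper
    balance = t1_scratch        # Teller 1 enters $110
    balance = t2_scratch        # Teller 2 enters $90, overwriting the $110
    return balance


class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def apply(self, amount):
        # Read, modify, and write back while holding the lock, so no other
        # teller can read the old balance in between.
        with self._lock:
            self.balance = self.balance + amount


def run_tellers(rounds=10_000):
    """Two tellers each make `rounds` updates of +$10 / -$10."""
    acct = Account(100)

    def teller(amount):
        for _ in range(rounds):
            acct.apply(amount)

    t1 = threading.Thread(target=teller, args=(+10,))
    t2 = threading.Thread(target=teller, args=(-10,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return acct.balance


if __name__ == "__main__":
    print(teller_unsafe_schedule(), run_tellers())
```

With the lock, the deposits and withdrawals cancel and the account ends at $100 regardless of how the threads are scheduled.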


26

Programming Msg. Passing Supercomputers can be quite tricky

My “Gauss” project (in collaboration with Robert M. Kirby) ensures that supercomputer programs do not contain bugs and also perform efficiently

Virtually all supercomputers are programmed using the “MPI” communication library

Misusing this library can result in bugs that show up only after porting (for example, to a system that provides less message buffering)

P1                          P2
MPI_SEND(to P2, Msg)        MPI_SEND(to P1, Msg)
MPI_RECV(from P2, Msg)      MPI_RECV(from P1, Msg)

If the system does not provide sufficient buffering, the sends may both block, causing a deadlock!
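Whether this exchange deadlocks depends on buffering, which the MPI standard leaves implementation-dependent for a standard-mode MPI_Send. The toy simulator below (plain Python, not MPI and not the Gauss project's tooling) makes that dependence explicit. Each hypothetical process is a list of ("send", peer) / ("recv", peer) steps; a send completes at once if the channel buffer has room, and otherwise only by rendezvous with a matching posted receive. Message contents are not modeled, only counts.

```python
from collections import Counter


def simulate(programs, capacity):
    """Return True if every process finishes, False on deadlock."""
    pc = {p: 0 for p in programs}   # index of each process's next step
    buffered = Counter()            # (src, dst) -> messages sitting in buffer

    def current(p):
        ops = programs[p]
        return ops[pc[p]] if pc[p] < len(ops) else None

    while any(current(p) is not None for p in programs):
        progress = False
        for p in programs:
            op = current(p)
            if op is None:
                continue
            kind, peer = op
            if kind == "send":
                if buffered[(p, peer)] < capacity:
                    buffered[(p, peer)] += 1    # eager send into the buffer
                    pc[p] += 1
                    progress = True
                elif current(peer) == ("recv", p):
                    pc[p] += 1                  # rendezvous: both complete
                    pc[peer] += 1
                    progress = True
            elif kind == "recv" and buffered[(peer, p)] > 0:
                buffered[(peer, p)] -= 1        # take a buffered message
                pc[p] += 1
                progress = True
        if not progress:
            return False                        # everyone is blocked
    return True


# The slide's program: both processes send first.
HEAD_TO_HEAD = {
    "P1": [("send", "P2"), ("recv", "P2")],
    "P2": [("send", "P1"), ("recv", "P1")],
}

# One common fix: P2 receives first.
REORDERED = {
    "P1": [("send", "P2"), ("recv", "P2")],
    "P2": [("recv", "P1"), ("send", "P1")],
}

if __name__ == "__main__":
    for cap in (0, 1):
        print(cap, simulate(HEAD_TO_HEAD, cap), simulate(REORDERED, cap))
```

The head-to-head version completes with one buffer slot per channel but deadlocks with none, which is exactly the kind of bug that can hide on one machine and appear after porting. Reordering P2's calls, or using MPI_Sendrecv in real MPI code, removes the dependence on buffering.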


27

Simulation code that does automatic load balancing is difficult to write and debug

(Photo courtesy NHTSA)


28

LOTS of hard problems remain open

How to provide memory bandwidth?
• Put the multicore CPU chip on top of a highly dense DRAM chip (e.g., 8 GB)
  – Most users will buy just “one of those”
  – Others will buy SDRAM module add-ons
  – Slow access for now
• Optical interconnect is an active research area

Higher-memory-bandwidth solutions are coming, so the real challenge remains programming!

Insights from recent Microsoft visit


29

Emerging Programming Paradigms
• Microsoft’s Task Parallel Library
• Intel’s Threading Building Blocks
• OpenMP, Cluster OpenMP, CUDA, Cilk
• Transactional memories
• Special-purpose paradigms
  – LINQ and PLINQ for relational databases
  – Game programming: roll customized solutions


30

Emerging Programming Paradigms: Transactional Memories!

Users cause too many bugs when programming with locks

Transactional memories allow shared-memory threads to “watch” each other’s read/write actions

Transactions with conflicting accesses can roll back and retry
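The “watch, roll back, retry” idea can be sketched in a few dozen lines as a toy software transactional memory. Each transaction records the version of every variable it reads and buffers its writes; at commit it validates that those versions are unchanged, and otherwise it discards its work and re-runs. This is a minimal illustration under simplifying assumptions (one global lock guards snapshots and commits; the names TVar, Transaction, atomically are hypothetical), not how any particular hardware or library TM works.

```python
import threading


class TVar:
    """A transactional variable: a value plus a version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0


class Conflict(Exception):
    """Raised when another commit invalidated what this transaction read."""


_commit_lock = threading.Lock()  # guards snapshots and commits (simplification)


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed
        self.writes = {}  # TVar -> value buffered until commit

    def read(self, tv):
        if tv in self.writes:
            return self.writes[tv]          # see our own buffered write
        with _commit_lock:                  # consistent (value, version) pair
            value, version = tv.value, tv.version
        if self.reads.setdefault(tv, version) != version:
            raise Conflict                  # changed since our first read
        return value

    def write(self, tv, value):
        self.writes[tv] = value             # invisible to others until commit

    def commit(self):
        with _commit_lock:
            # Validate: everything we read must still be unchanged...
            for tv, version in self.reads.items():
                if tv.version != version:
                    raise Conflict
            # ...then publish all buffered writes together.
            for tv, value in self.writes.items():
                tv.value = value
                tv.version += 1


def atomically(body):
    """Run body(tx) until it commits; on conflict, drop its writes and retry."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Conflict:
            continue  # roll back (discard tx) and retry


def demo(n_threads=4, n_incr=1000):
    """Several threads increment one shared counter transactionally."""
    counter = TVar(0)

    def incr(tx):
        tx.write(counter, tx.read(counter) + 1)

    def worker():
        for _ in range(n_incr):
            atomically(incr)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(demo())
```

No increment is lost: any commit that slips in between a transaction's read and its commit bumps the version, so the stale transaction fails validation and retries instead of overwriting the newer value, unlike the tellers on the earlier slide.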


31

Problem 4 (huge!): The “digital divide”

Need plenty of:
• Outreach
• Better CE / CS projects in SLVSEF
• Mentoring


32

Learn from History: Learn Computer History

“If you want to understand today, you have to search yesterday.” ~Pearl Buck

Things are changing SO fast that basic principles are often being diluted

Get excited by studying computer history and seeing how much better off we are (also be chagrined by all the lost opportunity!)


33

Where to learn computer history?
• Computer History Museum, Mountain View
• Intel Museum, Santa Clara
• Boston Computer Museum
• Many in the UK (Manchester, London, …)

Travel widely – be inspired by what you see!


34

It is important to understand the international scene

Lessons from MSR India:
• Amazing talent pool
• Relatively high availability of talent

Lessons from Intel India:
• The talent pool still lacks the depth and abilities of many of our CEs
• We can stay competitive in hardware for a LONG time to come

Apply for international internships!


35

Gradual loss of manufacturing death
• Lots of manufacturing is happening outside the US
• Fear not: CE / CS jobs are still on the rise
• Huge demand forecast within the US

THE REAL DANGER: Loss of manufacturing kills pride and the incentive to learn; we don’t want that in CE


36

Recipe for success

• The best ideas don’t always work
  – Wait for the world to be ready for the ideas
  – The devil is in the detail
  – Too much established momentum
  – Decide your goal (short-term impact vs. long-term)
• Quiet tenacity
  – Tenacity without ruffling feathers needlessly
• Work hard! Work smart! Learn theory! Be a champion algorithm / program designer! Learn advanced hardware design!
• Learn to write extremely clearly and precisely! Learn to give inspiring talks! (Be inspired first!)