
Page 1

CS 61C: Great Ideas in Computer Architecture (Machine Structures)

Instructors: Randy H. Katz, David A. Patterson
http://inst.eecs.berkeley.edu/~cs61c/fa10

Fall 2010 -- Lecture #9, 9/17/10

Agenda

• Instruction Stages Revisited
• Administrivia
• Technology Break
• Rise of the Warehouse-Scale Computer

Page 2

Agenda

• Instruction Stages Revisited
• Administrivia
• Technology Break
• Rise of the Warehouse-Scale Computer

Instruction Level Parallelism

Pipelined execution over time periods P1-P12, one stage per period:

          P1   P2   P3   P4   P5   P6   P7   P8   P9   P10  P11  P12
Instr 1   IF   ID   ALU  MEM  WR
Instr 2        IF   ID   ALU  MEM  WR
Instr 3             IF   ID   ALU  MEM  WR
Instr 4                  IF   ID   ALU  MEM  WR
Instr 5                       IF   ID   ALU  MEM  WR
Instr 6                            IF   ID   ALU  MEM  WR
Instr 7                                 IF   ID   ALU  MEM  WR
Instr 8                                      IF   ID   ALU  MEM  WR
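As a rough illustration (not part of the lecture), the ideal hazard-free schedule above can be generated in a few lines of Python:

```python
# Minimal sketch of the ideal 5-stage pipeline schedule shown above,
# assuming no hazards and no stalls.
STAGES = ["IF", "ID", "ALU", "MEM", "WR"]

def pipeline_schedule(n_instructions):
    """Map each instruction (numbered from 1) to the period in which
    each stage runs. Instruction i occupies stage s during period
    (i - 1) + s + 1, since a new instruction enters every period."""
    return {
        i + 1: {stage: i + s + 1 for s, stage in enumerate(STAGES)}
        for i in range(n_instructions)
    }

sched = pipeline_schedule(8)
# Instr 1 finishes WR in period 5; Instr 8 finishes WR in period 12,
# matching the diagram: 8 instructions complete in 12 periods rather
# than the 40 an unpipelined machine would need.
```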

Page 3

Conceptual MIPS Datapath

Stages of the Datapath (1/5)

• There is a wide variety of MIPS instructions, so what general steps do they have in common?
• Stage 1: Instruction Fetch
  – No matter what the instruction is, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy)
  – Also, this is where we increment the PC (that is, PC = PC + 4, to point to the next instruction: byte addressing, so +4)

Page 4

Stages of the Datapath (2/5)

• Stage 2: Instruction Decode
  – Upon fetching the instruction, we next gather the data from its fields (decode all necessary instruction data)
  – First, read the opcode to determine the instruction type and field lengths
  – Second, read in the data from all necessary registers
    • For add, read two registers
    • For addi, read one register
    • For jal, no reads are necessary

Stages of the Datapath (3/5)

• Stage 3: ALU (Arithmetic-Logic Unit)
  – The real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt)
  – What about loads and stores?
    • lw $t0, 40($t1)
    • The address we are accessing in memory = the value in $t1 PLUS the value 40
    • So we do this addition in this stage

Page 5

Stages of the Datapath (4/5)

• Stage 4: Memory Access
  – Actually, only the load and store instructions do anything during this phase; the others remain idle during this phase or skip it altogether
  – Since these instructions have a unique step, we need this extra phase to account for them
  – As a result of the cache system, this phase is expected to be fast

Stages of the Datapath (5/5)

• Stage 5: Register Write
  – Most instructions write the result of some computation into a register
  – E.g., arithmetic, logical, shifts, loads, slt
  – What about stores, branches, and jumps?
    • They don't write anything into a register at the end
    • These remain idle during this fifth phase or skip it altogether
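To tie the five stages together, here is a hypothetical Python sketch (not the lecture's actual datapath; the register and memory contents are made-up example values) that walks a single lw through IF, ID, ALU, MEM, and WR:

```python
# Illustrative walk of one MIPS lw instruction through the five
# stages described above. Values are invented for the example.
def execute_lw(pc, regs, mem):
    # Stage 1 (IF): fetch the 32-bit instruction word; increment PC by 4
    pc = pc + 4
    # Stage 2 (ID): decode fields -- lw $t0, 40($t1) reads one register
    base, offset = regs["$t1"], 40
    # Stage 3 (ALU): compute the effective address = $t1 + 40
    addr = base + offset
    # Stage 4 (MEM): loads (and stores) access memory in this phase
    value = mem[addr]
    # Stage 5 (WR): write the loaded value into the destination register
    regs["$t0"] = value
    return pc

regs = {"$t0": 0, "$t1": 0x1000}   # example register file contents
mem = {0x1000 + 40: 99}            # example memory contents
pc = execute_lw(0, regs, mem)
# Afterwards regs["$t0"] holds 99 and pc has advanced to 4
```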

Page 6

Limits to Performance: Latency vs. Bandwidth

• Latency: the time to access the first item
• Bandwidth: the number of items accessed per unit time
• Historically, bandwidth has improved much faster than latency
• Why?

[Chart annotations: "Bandwidth improves faster than latency" vs. "Latency improves faster than bandwidth"]

Latency vs. Bandwidth: Physical Analogy

• Time to first drop
• Time to fill glass
• Water per time

[Figure: a water tank feeding glasses through pipes; the length and diameter of the pipes affect latency and bandwidth]

Page 7

Latency vs. Bandwidth: Which is "Faster"?

• SD → SF, 1 truck, 10 hours, 1000 × 1 TByte disks (1 PByte)
  – Time to first byte:
  – Time to last byte:
  – Bandwidth:
• SD → SF, 100 Gbps fiber link (10 GB per second)
  – Time to first byte:
  – Time to last byte:
  – Bandwidth:

Latency vs. Bandwidth: Which is "Faster"?

• SD → SF, 1 truck, 10 hours, 1000 × 1 TByte disks (1 PByte)
  – Time to first byte: 10 hours
  – Time to last byte: 10 hours
  – Bandwidth: 100 TBytes/hour (222 Gbps)
• SD → SF, 100 Gbps fiber link (10 GB per second)
  – Time to first byte: 2.6 ms (speed of light over ~500 miles!)
  – Time to last byte: 28 hours
  – Bandwidth: 10 GB/s
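The numbers above check out with a little arithmetic; this is a back-of-the-envelope verification, not from the slides:

```python
# Sanity check of the truck-vs-fiber comparison above.
PB = 10**15  # 1 PByte payload in bytes

# Truck: everything arrives after 10 hours
truck_hours = 10
truck_bw_bytes_per_s = PB / (truck_hours * 3600)
truck_bw_gbps = truck_bw_bytes_per_s * 8 / 1e9
# ~222 Gbps -- higher bandwidth than the fiber link, but 10 hours
# of latency to the first byte

# Fiber: 10 GB/s sustained, first byte in milliseconds
fiber_bw = 10 * 10**9  # bytes per second
fiber_hours = PB / fiber_bw / 3600
# ~27.8 hours to the last byte, i.e. about 28 hours
```

So the truck "wins" on bandwidth while the fiber wins on latency, which is exactly the point of the slide.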

Page 8

Agenda

• Stages of an Instruction Revisited
• Administrivia
• Technology Break
• Rise of the Warehouse-Scale Computer

Administrivia

• Due dates for Project 2/First Part (Saturday, 18 September) and Project 2/Second Part (Saturday, 25 September) are at 23:59:59
• Midterm Examination: 6 October, 6-9 PM, 1 Pimentel

Page 9

Agenda

• Stages of an Instruction Revisited
• Administrivia
• Technology Break
• Rise of Warehouse-Scale Computers


Page 10

Growth in Access Devices

The ARM Inside the iPhone

Page 11

iPhone Innards

[Photo: iPhone internals, built around a 1 GHz ARM Cortex A8]

E.g., Google's Oregon Datacenter

Page 12

Energy Proportional Computing

Figure 1. Average CPU utilization of more than 5,000 servers during a six-month period. Servers are rarely completely idle and seldom operate near their maximum utilization, instead operating most of the time at between 10 and 50 percent of their maximum.

It is surprisingly hard to achieve high levels of utilization on typical servers (and your home PC or laptop is even worse).

"The Case for Energy-Proportional Computing," Luiz André Barroso and Urs Hölzle, IEEE Computer, December 2007

Energy Proportional Computing

Figure 2. Server power usage and energy efficiency at varying utilization levels, from idle to peak performance. Even an energy-efficient server still consumes about half its full power when doing virtually no work.

Doing nothing well ... NOT!

Energy Efficiency = Utilization / Power

Page 13

Energy Proportional Computing

Figure 3. CPU contribution to total server power for two generations of Google servers at peak performance (the first two bars) and for the later generation at idle (the rightmost bar).

CPU energy improves, but what about the rest of the server architecture?

Energy Proportional Computing

Figure 4. Power usage and energy efficiency in a more energy-proportional server. This server has a power efficiency of more than 80 percent of its peak value for utilizations of 30 percent and above, with efficiency remaining above 50 percent for utilization levels as low as 10 percent.

Design for a wide dynamic power range and active low-power modes

Doing nothing VERY well

Energy Efficiency = Utilization / Power
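The effect of energy proportionality on Energy Efficiency = Utilization / Power can be illustrated with made-up power curves (the coefficients below are assumptions for illustration, not the paper's measured data):

```python
# Compare energy efficiency (utilization / power) for a conventional
# server that idles at ~50% of peak power (Figure 2's pattern)
# against a more energy-proportional one. Power models are invented.
def power_conventional(util, peak=200.0):
    # Consumes half of peak power even at zero utilization
    return peak * (0.5 + 0.5 * util)

def power_proportional(util, peak=200.0):
    # Power tracks utilization much more closely
    return peak * (0.1 + 0.9 * util)

def efficiency(power_fn, util):
    return util / power_fn(util)

# At 30% utilization -- where real servers spend much of their time --
# the proportional server comes out well ahead:
e_conv = efficiency(power_conventional, 0.3)
e_prop = efficiency(power_proportional, 0.3)
```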

Page 14

Energy Use in Datacenters

Datacenter Energy Overheads

[Figure sources: LBNL; Michael Patterson, Intel]

Datacenter Power

[Figure: breakdown of datacenter peak power %]

Page 15

Nameplate vs. Actual Peak

X. Fan, W.-D. Weber, L. Barroso, "Power Provisioning for a Warehouse-sized Computer," ISCA'07, San Diego (June 2007).

Component      Peak Power   Count   Total
CPU            40 W         2       80 W
Memory         9 W          4       36 W
Disk           12 W         1       12 W
PCI Slots      25 W         2       50 W
Motherboard    25 W         1       25 W
Fan            10 W         1       10 W
System Total                        213 W

Nameplate peak: 213 W. Measured peak: 145 W (power-intensive workload).

In Google's world, for a given DC power budget, deploy as many machines as possible.
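The table's arithmetic, and the headroom it implies, can be reproduced directly (a sketch; the ~47% figure is my own derivation from the slide's numbers, not stated on the slide):

```python
# Reproduce the nameplate-vs-measured-peak arithmetic above.
components = {          # component: (peak watts per unit, count)
    "CPU":         (40, 2),
    "Memory":      (9, 4),
    "Disk":        (12, 1),
    "PCI Slots":   (25, 2),
    "Motherboard": (25, 1),
    "Fan":         (10, 1),
}

nameplate_total = sum(w * n for w, n in components.values())  # 213 W
measured_peak = 145  # W, under a power-intensive workload

# The gap is deployable headroom: provisioning by measured peak
# instead of nameplate lets you fit roughly 47% more machines into
# the same datacenter power budget.
extra_machines_fraction = nameplate_total / measured_peak - 1
```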

Server Innards

Page 16

Server Internals

Google Server

Page 17

Summary

• Five Stages/Phases of an Instruction
  – Instruction Fetch (IF)
  – Instruction Decode (ID)
  – Execute (ALU)
  – Memory (MEM)
  – Write Results (WR)
• Bandwidth vs. Latency
  – It is easier to increase bandwidth than to reduce latency
• Rise of the Warehouse-Scale Computer
  – Energy Proportional Computing
    • Power (Watts) vs. Energy (Power × Time, Watt-hours)
    • Subject to responsiveness goals, drive nodes to higher utilization to achieve better energy efficiency