
Lecture 01: Introduction

CSE 564 Computer Architecture, Summer 2017

Department of Computer Science and Engineering
Yonghong Yan

[email protected]/~yan

Copyright and Acknowledgement

•  Most slides were adapted from lecture notes of the two textbooks, with copyright of the publisher or the original authors, including Elsevier Inc, Morgan Kaufmann, David A. Patterson and John L. Hennessy.
•  Some slides were adapted from the following courses:
    –  UC Berkeley course “Computer Science 252: Graduate Computer Architecture” of David E. Culler, Copyright 2005 UCB
        •  http://people.eecs.berkeley.edu/~culler/courses/cs252-s05/
    –  Great Ideas in Computer Architecture (Machine Structures) by Randy Katz and Bernhard Boser
        •  http://inst.eecs.berkeley.edu/~cs61c/fa16/
•  I also refer to the following courses and lecture notes when preparing materials for this course:
    –  Computer Science 152: Computer Architecture and Engineering, Spring 2016, by Dr. George Michelogiannakis from UC Berkeley
        •  http://www-inst.eecs.berkeley.edu/~cs152/sp16/
    –  Computer Science 252: Graduate Computer Architecture, Fall 2015, by Prof. Krste Asanović from UC Berkeley
        •  http://www-inst.eecs.berkeley.edu/~cs252/fa15/
    –  Computer Science S250: VLSI Systems Design, Spring 2016, by Prof. John Wawrzynek from UC Berkeley
        •  http://www-inst.eecs.berkeley.edu/~cs250/sp16/
    –  Computer System Architecture, Fall 2005, by Dr. Joel Emer and Prof. Arvind from MIT
        •  http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-823-computer-system-architecture-fall-2005/
    –  Synthesis Lectures on Computer Architecture
        •  http://www.morganclaypool.com/toc/cac/1/1
•  The slides of this course are for educational purposes only and should be used only in conjunction with the textbook. Derivatives of the slides must acknowledge the copyright notices of this and the originals. Permission for commercial purposes should be obtained from the original copyright holder and the successive copyright holders, including myself.

Contents

•  Computers and computer components
•  Computer architectures and great ideas in history and now
•  Performance

The Computer Revolution

•  Progress in computer technology
    –  Underpinned by Moore’s Law
•  Makes novel applications feasible
    –  Computers in automobiles
    –  Cell phones
    –  Human genome project
    –  World Wide Web
    –  Search Engines
•  Computers are pervasive

Classes of Computers

•  Personal Mobile Device (PMD)
    –  e.g. smart phones, tablet computers
    –  Emphasis on energy efficiency and real-time
•  Desktop Computing
    –  Emphasis on price-performance
•  Servers
    –  Emphasis on availability, scalability, throughput
•  Clusters / Warehouse Scale Computers
    –  Used for “Software as a Service (SaaS)”
    –  Emphasis on availability and price-performance
    –  Sub-class: Supercomputers, emphasis: floating-point performance and fast internal networks
•  Embedded Computers
    –  Emphasis: price

The Post PC Era

•  Personal Mobile Device (PMD)
    –  Battery operated
    –  Connects to the Internet
    –  Hundreds of dollars
    –  Smart phones, tablets, electronic glasses
•  Cloud computing
    –  Warehouse Scale Computers (WSC)
    –  Software as a Service (SaaS)
    –  Portion of software run on a PMD and a portion run in the Cloud
    –  Amazon and Google

Old School Computer

New School Computer (#1): Personal Mobile Devices

New School “Computer” (#2)

Components of a Computer

The BIG Picture

•  Same components for all kinds of computer
    –  Desktop, server, embedded
•  Input/output includes
    –  User-interface devices
        •  Display, keyboard, mouse
    –  Storage devices
        •  Hard disk, CD/DVD, flash
    –  Network adapters
        •  For communicating with other computers

Inside the Processor (CPU)

•  Functional units: perform computations
•  Datapath: performs operations on data
•  Control: sequences datapath, memory, ...
•  Cache memory
    –  Small fast SRAM memory for immediate access to data

[Figure: Apple A5]

A Safe Place for Data

•  Volatile main memory
    –  Loses instructions and data when power off
•  Non-volatile secondary memory
    –  Magnetic disk
    –  Flash memory
    –  Optical disk (CDROM, DVD)

Contents

•  Computers and computer components
•  Computer architectures and great ideas in history and now
•  Performance

What is “Computer Architecture”?

[Figure: layered view of a computer system. Layer labels: Applications; Operating System; Compiler; Firmware; Instruction Set Architecture; Instr. Set Proc.; I/O system; Datapath & Control; Digital Design; Circuit Design; Layout & fab; Semiconductor Materials]

The Instruction Set: a Critical Interface

[Figure: the instruction set is the interface between software and hardware]

•  Properties of a good abstraction
    –  Lasts through many generations (portability)
    –  Used in many different ways (generality)
    –  Provides convenient functionality to higher levels
    –  Permits an efficient implementation at lower levels

Elements of an ISA

•  Set of machine-recognized data types
    –  bytes, words, integers, floating point, strings, ...
•  Operations performed on those data types
    –  Add, sub, mul, div, xor, move, ...
•  Programmable storage
    –  regs, PC, memory
•  Methods of identifying and obtaining data referenced by instructions (addressing modes)
    –  Literal, reg., absolute, relative, reg + offset, ...
•  Format (encoding) of the instructions
    –  Opcode, operand fields, ...

Computer Architecture: How things are put together in design and implementation

•  Capabilities & Performance Characteristics of Principal Functional Units
    –  (e.g., Registers, ALU, Shifters, Logic Units, ...)
•  Ways in which these components are interconnected
•  Information flows between components
•  Logic and means by which such information flow is controlled
•  Choreography of FUs to realize the ISA

Great Ideas in Computer Architectures

1.  Design for Moore’s Law

2.  Use abstraction to simplify design

3.  Make the common case fast

4.  Performance via parallelism

5.  Performance via pipelining

6.  Performance via prediction

7.  Hierarchy of memories

8.  Dependability via redundancy


Great Idea: “Moore’s Law”

Gordon Moore, co-founder of Intel
•  1965: since the integrated circuit was invented, the number of transistors per square inch in these circuits roughly doubled every year; this trend would continue for the foreseeable future
•  1975: revised - circuit complexity doubles every two years

Image credit: Intel
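To make the difference between the 1965 and 1975 statements concrete, the following is a minimal sketch that compares ten years of growth under each doubling period. The function names are illustrative only, and the model assumes perfectly regular doubling, which real process generations only approximate.

```python
def doublings(years, period_years):
    """Number of doublings in `years` if density doubles every `period_years`."""
    return years / period_years


def growth_factor(years, period_years):
    """Multiplicative growth in transistor density over `years`."""
    return 2.0 ** doublings(years, period_years)


if __name__ == "__main__":
    for period in (1.0, 2.0):  # 1965 statement vs. 1975 revision
        factor = growth_factor(10, period)
        print(f"doubling every {period:g} yr -> x{factor:,.0f} in 10 years")
```

Under these assumptions a decade gives about 1024x growth with yearly doubling but only 32x with doubling every two years.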

Microprocessor Transistor Counts 1971-2011 & Moore's Law

https://en.wikipedia.org/wiki/Transistor_count

Moore’s Law trends

•  More transistors = ↑ opportunities for exploiting parallelism at the instruction level (ILP)
    –  Pipeline, superscalar, VLIW (Very Long Instruction Word), SIMD (Single Instruction Multiple Data) or vector, speculation, branch prediction
•  General path of scaling
    –  Wider instruction issue, longer pipeline
    –  More speculation
    –  More and larger registers and cache
•  Increasing circuit density ~= increasing frequency ~= increasing performance
•  Transparent to users
    –  An easy job of getting better performance: buying faster processors (higher frequency)
•  We have enjoyed this free lunch for several decades, however (TBD) …

Great Idea: Pipeline
Fundamental Execution Cycle

Instruction Fetch:    Obtain instruction from program storage
Instruction Decode:   Determine required actions and instruction size
Operand Fetch:        Locate and obtain operand data
Execute:              Compute result value or status
Result Store:         Deposit results in storage for later use
Next Instruction:     Determine successor instruction

[Figure: Processor (regs, F.U.s) connected to Memory (program, Data); the connection between them is the von Neumann bottleneck]
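The six steps of the fundamental execution cycle can be sketched as an interpreter loop. This is a toy model, not any real ISA: the four-register machine, the tuple instruction format `(op, x, y, z)`, and the three operations `load`, `add`, and `store` are all assumptions made for illustration.

```python
# Hypothetical toy program: mem[2] = mem[0] + mem[1]
PROGRAM = [
    ("load", 0, 0, None),    # r0 <- mem[0]
    ("load", 1, 1, None),    # r1 <- mem[1]
    ("add", 2, 0, 1),        # r2 <- r0 + r1
    ("store", 2, 2, None),   # mem[2] <- r2
    ("halt", None, None, None),
]


def run(program, memory):
    """Execute `program` on a 4-register toy machine; returns the registers."""
    regs = [0, 0, 0, 0]
    pc = 0
    while pc < len(program):
        instr = program[pc]                 # 1. Instruction Fetch
        op, x, y, z = instr                 # 2. Instruction Decode
        if op == "halt":
            break
        if op == "load":                    # 3. Operand Fetch
            operands = (memory[y],)
        elif op == "store":
            operands = (regs[x],)
        else:                               # "add"
            operands = (regs[y], regs[z])
        result = sum(operands)              # 4. Execute
        if op == "store":                   # 5. Result Store
            memory[y] = result
        else:
            regs[x] = result
        pc += 1                             # 6. Next Instruction
    return regs


if __name__ == "__main__":
    mem = [3, 4, 0]
    print(run(PROGRAM, mem), mem)
```

Every instruction and every operand crosses the processor-memory connection in this loop, which is where the bottleneck in the figure comes from.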

Pipelined Instruction Execution

[Figure: four instructions (vertical axis: Instr. Order) overlapped in time (horizontal axis: Time in clock cycles, Cycle 1 to Cycle 7); each instruction passes through Ifetch, Reg, ALU, DMem, Reg, starting one cycle after the previous instruction]
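The benefit of overlapping instructions can be quantified with a short sketch. It assumes an ideal pipeline (one cycle per stage, no stalls or hazards) and reuses the stage names from the figure; the function names are illustrative.

```python
STAGES = ("Ifetch", "Reg", "ALU", "DMem", "Reg")


def unpipelined_cycles(n_instr, n_stages):
    """Each instruction finishes all stages before the next one starts."""
    return n_instr * n_stages


def pipelined_cycles(n_instr, n_stages):
    """Ideal pipeline: fill once, then complete one instruction per cycle."""
    if n_instr == 0:
        return 0
    return n_stages + (n_instr - 1)


def diagram(n_instr, stages=STAGES):
    """Rows of the pipeline diagram: row i holds instruction i's stage per cycle."""
    total = pipelined_cycles(n_instr, len(stages))
    rows = []
    for i in range(n_instr):
        row = ["."] * total
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i enters the pipeline at cycle i
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in diagram(4):
        print(" ".join(f"{cell:>6}" for cell in row))
    print(unpipelined_cycles(4, 5), "cycles unpipelined vs",
          pipelined_cycles(4, 5), "cycles pipelined")
```

As the instruction count grows, the ideal speedup approaches the number of stages, since the fill time is paid only once.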

Great Idea: Abstraction (Levels of Representation/Interpretation)

High Level Language Program (e.g., C)
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
        ↓ Compiler
Assembly Language Program (e.g., MIPS)
    lw $t0, 0($2)
    lw $t1, 4($2)
    sw $t1, 0($2)
    sw $t0, 4($2)
        ↓ Assembler
Machine Language Program (MIPS)
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
        ↓ Machine Interpretation
Hardware Architecture Description (e.g., block diagrams)
        ↓ Architecture Implementation
Logic Circuit Description (Circuit Schematic Diagrams)

Anything can be represented as a number, i.e., data or instructions
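To illustrate the assembler step, here is a minimal sketch that encodes the four MIPS `lw`/`sw` instructions of the swap example as 32-bit words, using the standard MIPS I-type layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate). The helper name `encode_itype` and the small lookup tables are assumptions for this example and cover only what the swap needs. The resulting words need not match the bit pattern shown on the slide, which appears to be illustrative.

```python
OPCODES = {"lw": 0x23, "sw": 0x2B}        # MIPS I-type opcodes
REGS = {"$2": 2, "$t0": 8, "$t1": 9}      # register numbers used by the swap


def encode_itype(op, rt, offset, rs):
    """Encode `op rt, offset(rs)` as opcode | rs | rt | 16-bit immediate."""
    return ((OPCODES[op] << 26)
            | (REGS[rs] << 21)
            | (REGS[rt] << 16)
            | (offset & 0xFFFF))


SWAP = [
    ("lw", "$t0", 0, "$2"),
    ("lw", "$t1", 4, "$2"),
    ("sw", "$t1", 0, "$2"),
    ("sw", "$t0", 4, "$2"),
]

WORDS = [encode_itype(*instr) for instr in SWAP]

if __name__ == "__main__":
    for instr, word in zip(SWAP, WORDS):
        print(f"{instr[0]} {instr[1]}, {instr[2]}({instr[3]})  ->  "
              f"0x{word:08X}  {word:032b}")
```

The same 32-bit pattern is an instruction when fetched by the processor and plain data when loaded by a program, which is the point of the last line of the slide.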

The Memory Abstraction

•  Association of <name, value> pairs
    –  typically named as byte addresses
    –  often values aligned on multiples of size
•  Sequence of Reads and Writes
•  Write binds a value to an address
•  Read of addr returns most recently written value bound to that address

[Figure: memory interface signals: address (name), command (R/W), data (W), data (R), done]
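The memory abstraction can be sketched as a small class. This is a model of the semantics only: the class name, the choice that unwritten bytes read as 0, and the little-endian word layout are assumptions, not properties stated on the slide.

```python
class Memory:
    """Byte-addressed memory as a mapping from address (name) to value."""

    def __init__(self):
        self._cells = {}

    def write(self, addr, value):
        """Bind a byte value to an address."""
        self._cells[addr] = value & 0xFF

    def read(self, addr):
        """Return the most recently written byte (assumed 0 if never written)."""
        return self._cells.get(addr, 0)

    def write_word(self, addr, value, size=4):
        """Write a `size`-byte value, aligned on a multiple of its size."""
        if addr % size != 0:
            raise ValueError(f"address {addr} is not {size}-byte aligned")
        for i in range(size):  # little-endian layout (an assumption)
            self.write(addr + i, (value >> (8 * i)) & 0xFF)

    def read_word(self, addr, size=4):
        """Read a `size`-byte value from an aligned address."""
        if addr % size != 0:
            raise ValueError(f"address {addr} is not {size}-byte aligned")
        return sum(self.read(addr + i) << (8 * i) for i in range(size))


if __name__ == "__main__":
    m = Memory()
    m.write_word(16, 0xDEADBEEF)
    m.write(16, 0x00)  # a later write rebinds the value at address 16
    print(hex(m.read_word(16)))
```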

Processor-DRAM Memory Gap (latency)

[Figure: performance (log scale, 1 to 1000) versus time (1980 to 2000) for CPU and DRAM, labeled “Joy’s Law”]
•  µProc: 60% / yr. (2X / 1.5 yr)
•  DRAM: 9% / yr. (2X / 10 yrs)
•  Processor-Memory Performance Gap: grows 50% / year
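The roughly 50% per year gap growth follows from the two improvement rates. The sketch below compounds them; the constant and function names are illustrative, and the rates are simply the ones quoted on the slide.

```python
CPU_RATE = 0.60    # processor performance improvement per year
DRAM_RATE = 0.09   # DRAM latency improvement per year


def gap_after(years):
    """Ratio of processor to DRAM performance after `years` of compounding."""
    return ((1 + CPU_RATE) / (1 + DRAM_RATE)) ** years


if __name__ == "__main__":
    print(f"gap grows about {100 * (gap_after(1) - 1):.0f}% per year")
    for years in (5, 10, 20):
        print(f"after {years:2d} years the gap is about x{gap_after(years):,.0f}")
```

The yearly ratio is 1.60 / 1.09, about 1.47, which the slide rounds to 50% per year.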

The Principle of Locality

•  The Principle of Locality:
    –  Programs access a relatively small portion of the address space at any instant of time.
•  Two Different Types of Locality:
    –  Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
    –  Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
•  Last 30 years, HW relied on locality for speed

[Figure: P (processor), $ (cache), MEM (memory)]
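The effect of locality on a cache can be sketched with a tiny direct-mapped cache simulator. The cache parameters (64 lines of 16 bytes) and the three access patterns are assumptions chosen for illustration, not measurements of any real machine.

```python
def hit_rate(addresses, num_lines=64, block_size=16):
    """Fraction of accesses that hit in a direct-mapped cache."""
    lines = [None] * num_lines           # block number currently held per line
    hits = 0
    for addr in addresses:
        block = addr // block_size       # which memory block holds this byte
        index = block % num_lines        # which cache line that block maps to
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block         # miss: fetch the whole block
    return hits / len(addresses)


# Spatial locality: walk an array of 4-byte words in order.
SEQUENTIAL = list(range(0, 4096, 4))
# No spatial locality: every access touches a new 16-byte block.
STRIDED = list(range(0, 4096 * 4, 16))
# Temporal locality: loop ten times over 1 KB that fits in the cache.
REPEATED = list(range(0, 1024, 4)) * 10

if __name__ == "__main__":
    for name, trace in (("sequential", SEQUENTIAL),
                        ("strided", STRIDED),
                        ("repeated", REPEATED)):
        print(f"{name:10s} hit rate = {hit_rate(trace):.3f}")
```

In this model the sequential walk hits on three of every four accesses because each miss brings in four words, while the strided walk never reuses a fetched block.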

Great Idea: Memory Hierarchy
Levels of the Memory Hierarchy

Level           Capacity           Access Time              Cost
CPU Registers   100s Bytes         << 1s ns
Cache           10s-100s K Bytes   ~1 ns                    $1s / MByte
Main Memory     M Bytes            100 ns - 300 ns          $< 1 / MByte
Disk            10s G Bytes        10 ms (10,000,000 ns)    $0.001 / MByte
Tape            infinite           sec-min                  $0.0014 / MByte

Transfer between levels   Staging Xfer Unit   Managed by       Unit size
Registers <-> Cache       Instr. Operands     prog./compiler   1-8 bytes
Cache <-> Memory          Blocks              cache cntl       8-128 bytes
Memory <-> Disk           Pages               OS               512-4K bytes
Disk <-> Tape             Files               user/operator    Mbytes

Upper levels are faster; lower levels are larger.

Jim Gray’s Storage Latency Analogy: How Far Away is the Data?

Storage level          Latency (ns)   Analogy       Travel time
Registers              1              My Head       1 min
On Chip Cache          2              This Room
On Board Cache         10             This Campus   10 min
Main Memory            100            Lansing       1.5 hr
Disk                   10^6           Pluto         2 Years
Tape / Optical Robot   10^9           Andromeda     2,000 Years

Jim Gray, Turing Award; B.S. Cal 1966, Ph.D. Cal 1969
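The analogy appears to scale 1 ns of latency to 1 minute of travel time. Under that assumption, the sketch below converts each latency into human terms; the function name and rounding choices are illustrative, and the slide's own figures (1.5 hr, 2,000 years) are rounder than the computed ones.

```python
MIN_PER_HOUR = 60
MIN_PER_YEAR = 60 * 24 * 365


def human_time(latency_ns):
    """Travel time if 1 ns of latency is scaled to 1 minute (assumed scale)."""
    minutes = latency_ns
    if minutes < MIN_PER_HOUR:
        return f"{minutes:g} min"
    if minutes < MIN_PER_YEAR:
        return f"{minutes / MIN_PER_HOUR:.1f} hr"
    return f"{minutes / MIN_PER_YEAR:,.0f} years"


LEVELS = [
    ("Registers", 1),
    ("On Chip Cache", 2),
    ("On Board Cache", 10),
    ("Main Memory", 100),
    ("Disk", 10**6),
    ("Tape / Optical Robot", 10**9),
]

if __name__ == "__main__":
    for name, ns in LEVELS:
        print(f"{name:22s} {ns:>13,} ns  ->  {human_time(ns)}")
```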

The Cache Design Space

•  Several interacting dimensions
    –  cache size
    –  block size
    –  associativity
    –  replacement policy
    –  write-through vs write-back
•  The optimal choice is a compromise
    –  depends on access characteristics
        •  workload
        •  use (I-cache, D-cache, TLB)
    –  depends on technology / cost
•  Simplicity often wins

[Figure: design space with axes Cache Size, Block Size, Associativity; a Good-to-Bad curve plotted against Factor A (Less) and Factor B (More)]
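Three of the dimensions above (cache size, block size, associativity) determine how an address is split into tag, set index, and block offset. The sketch below shows that arithmetic; it assumes all sizes are powers of two, and the function names and the example configuration are illustrative.

```python
def cache_geometry(cache_bytes, block_bytes, ways):
    """Return (num_sets, offset_bits, index_bits); sizes must be powers of two."""
    num_sets = cache_bytes // (block_bytes * ways)
    offset_bits = block_bytes.bit_length() - 1   # log2(block size)
    index_bits = num_sets.bit_length() - 1       # log2(number of sets)
    return num_sets, offset_bits, index_bits


def split_address(addr, cache_bytes, block_bytes, ways):
    """Split a byte address into (tag, set index, block offset)."""
    _, offset_bits, index_bits = cache_geometry(cache_bytes, block_bytes, ways)
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    # Hypothetical 32 KiB cache with 64-byte blocks at several associativities.
    for ways in (1, 4, 8):
        sets, off, idx = cache_geometry(32 * 1024, 64, ways)
        print(f"{ways}-way: {sets} sets, {off} offset bits, {idx} index bits")
    print(split_address(0x12345678, 32 * 1024, 64, 4))
```

Raising associativity at a fixed cache size reduces the number of sets, so fewer address bits select the set and more are kept in the tag.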

Great Idea: Parallelism

Defining Computer Architecture

•  “Old” view of computer architecture:
    –  Instruction Set Architecture (ISA) design
    –  i.e. decisions regarding:
        •  registers, memory addressing, addressing modes, instruction operands, available operations, control flow instructions, instruction encoding
•  “Real” computer architecture:
    –  Specific requirements of the target machine
    –  Design to maximize performance within constraints: cost, power, and availability
    –  Includes ISA, microarchitecture, hardware

Computer Architecture Topics

[Figure: topic map with the following labels]
•  Instruction Set Architecture
•  Pipelining and Instruction Level Parallelism
    –  Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation, Vector, Dynamic Compilation
•  Memory Hierarchy
    –  L1 Cache, L2/L3 Cache, DRAM
    –  Addressing, Protection, Exception Handling
    –  Coherence, Bandwidth, Latency
    –  Interleaving, Bus protocols
•  Input/Output and Storage
    –  Disks, WORM, Tape
    –  RAID
    –  Emerging Technologies
•  VLSI
•  Network Communication
•  Other Processors

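Why is Architecture Exciting Today?

[Figure: CPU clock speed over time; an earlier period labeled “CPU Clock Speed +15% / year” followed by a period labeled “CPU Speed Flat”]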

Problems of traditional ILP scaling

•  Fundamental circuit limitations [1]
    –  delays ⇑ as issue queues ⇑ and multi-port register files ⇑
    –  increasing delays limit performance returns from wider issue
•  Limited amount of instruction-level parallelism [1]
    –  inefficient for codes with difficult-to-predict branches
•  Power and heat stall clock frequencies

[1] The case for a single-chip multiprocessor, K. Olukotun, B. Nayfeh, L. Hammond, K. Wilson, and K. Chang, ASPLOS-VII, 1996.

ILP impacts

Simulations of 8-issue Superscalar

Power/heat density limits frequency

•  Some fundamental physical limits are being reached

We will have this…

Revolution is happening now

•  Chip density is continuing to increase ~2x every 2 years
    –  Clock speed is not
    –  Number of processor cores may double instead
•  There is little or no hidden parallelism (ILP) to be found
•  Parallelism must be exposed to and managed by software
    –  No free lunch

Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond)

Single Processor Performance

[Figure: single processor performance over time; annotations: RISC, Move to multi-processor]

The trends

[Figure: peak performance versus year, 1950 to 2010; vertical axis from 1 KFlop/s (10^3) through 1 MFlop/s (10^6), 1 GFlop/s (10^9), 1 TFlop/s (10^12) to 1 PFlop/s (10^15); eras labeled Scalar, Super Scalar, Vector, Parallel, Super Scalar/Vector/Parallel; annotation: 2X Transistors/Chip Every 1.5 Years]

Machines shown: EDSAC 1, UNIVAC 1, IBM 7090, CDC 6600, IBM 360/195, CDC 7600, Cray 1, Cray X-MP, Cray 2, TMC CM-2, TMC CM-5, Cray T3D, ASCI Red, ASCI White Pacific, IBM BG/L

Year   Floating Point operations / second (Flop/s)
1941   1
1945   100
1949   1,000 (1 KiloFlop/s, KFlop/s)
1951   10,000
1961   100,000
1964   1,000,000 (1 MegaFlop/s, MFlop/s)
1968   10,000,000
1975   100,000,000
1987   1,000,000,000 (1 GigaFlop/s, GFlop/s)
1992   10,000,000,000
1993   100,000,000,000
1997   1,000,000,000,000 (1 TeraFlop/s, TFlop/s)
2000   10,000,000,000,000
2005   131,000,000,000,000 (131 TFlop/s)
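The performance table implies an average doubling time that can be compared with the "2X every 1.5 years" annotation. The following sketch computes it from the first and last rows; the function name is illustrative, and the calculation assumes smooth exponential growth between the two endpoints.

```python
import math


def doubling_time(year0, flops0, year1, flops1):
    """Average years per doubling, assuming steady exponential growth."""
    return (year1 - year0) / math.log2(flops1 / flops0)


if __name__ == "__main__":
    # Endpoints taken from the table: 1 Flop/s in 1941, 131 TFlop/s in 2005.
    years = doubling_time(1941, 1, 2005, 131e12)
    print(f"peak performance doubled about every {years:.2f} years")
```

Under these assumptions the result is a little under 1.4 years per doubling, in the same range as the transistor-count annotation.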

Recent multicore processors

Recent manycore GPU processors

An Overview of the GK110 Kepler Architecture

Kepler GK110 was built first and foremost for Tesla, and its goal was to be the highest performing parallel computing microprocessor in the world. GK110 not only greatly exceeds the raw compute horsepower delivered by Fermi, but it does so efficiently, consuming significantly less power and generating much less heat output.

A full Kepler GK110 implementation includes 15 SMX units and six 64-bit memory controllers. Different products will use different configurations of GK110. For example, some products may deploy 13 or 14 SMXs.

Key features of the architecture that will be discussed below in more depth include:

•  The new SMX processor architecture
•  An enhanced memory subsystem, offering additional caching capabilities, more bandwidth at each level of the hierarchy, and a fully redesigned and substantially faster DRAM I/O implementation.
•  Hardware support throughout the design to enable new programming model capabilities

[Figure: Kepler GK110 Full chip block diagram]

Streaming Multiprocessor (SMX) Architecture

Kepler GK110's new SMX introduces several architectural innovations that make it not only the most powerful multiprocessor we've built, but also the most programmable and power efficient.

SMX: 192 single-precision CUDA cores, 64 double-precision units, 32 special function units (SFU), and 32 load/store units (LD/ST).

Kepler Memory Subsystem / L1, L2, ECC

Kepler's memory hierarchy is organized similarly to Fermi. The Kepler architecture supports a unified memory request path for loads and stores, with an L1 cache per SMX multiprocessor. Kepler GK110 also enables compiler-directed use of an additional new cache for read-only data, as described below.

64 KB Configurable Shared Memory and L1 Cache

In the Kepler GK110 architecture, as in the previous generation Fermi architecture, each SMX has 64 KB of on-chip memory that can be configured as 48 KB of shared memory with 16 KB of L1 cache, or as 16 KB of shared memory with 48 KB of L1 cache. Kepler now allows for additional flexibility in configuring the allocation of shared memory and L1 cache by permitting a 32 KB / 32 KB split between shared memory and L1 cache. To support the increased throughput of each SMX unit, the shared memory bandwidth for 64b and larger load operations is also doubled compared to the Fermi SM, to 256B per core clock.

48 KB Read-Only Data Cache

In addition to the L1 cache, Kepler introduces a 48 KB cache for data that is known to be read-only for the duration of the function. In the Fermi generation, this cache was accessible only by the Texture unit. Expert programmers often found it advantageous to load data through this path explicitly by mapping their data as textures, but this approach had many limitations.

•  ~3k cores
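The "~3k cores" figure can be checked against the per-SMX numbers quoted in the GK110 overview. This is a small arithmetic sketch; the constant and function names are illustrative, and the counts come from the excerpt above it.

```python
CUDA_CORES_PER_SMX = 192   # single-precision CUDA cores per SMX
DP_UNITS_PER_SMX = 64      # double-precision units per SMX


def total_units(smx_count, units_per_smx):
    """Total execution units on a chip with `smx_count` SMX units."""
    return smx_count * units_per_smx


if __name__ == "__main__":
    for smx in (13, 14, 15):   # product configurations mentioned in the text
        print(f"{smx} SMX: {total_units(smx, CUDA_CORES_PER_SMX)} CUDA cores, "
              f"{total_units(smx, DP_UNITS_PER_SMX)} double-precision units")
```

A full 15-SMX chip gives 2880 single-precision cores, which rounds to the roughly 3k cores noted on the slide.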

Current Trends in Architecture

•  Cannot continue to leverage Instruction-Level parallelism (ILP)
    –  Single processor performance improvement ended in 2003
•  New models for performance:
    –  Data-level parallelism (DLP)
    –  Thread-level parallelism (TLP)
    –  Heterogeneity
•  These require explicit restructuring of the application

Parallelism

•  Classes of parallelism in applications:
    –  Data-Level Parallelism (DLP)
    –  Task-Level Parallelism (TLP)
•  Classes of architectural parallelism:
    –  Instruction-Level Parallelism (ILP)
    –  Vector architectures / Graphic Processor Units (GPUs)
    –  Thread-Level Parallelism
    –  Heterogeneity

Architectural Challenges

•  Massive (ca. 4X) increase in concurrency
    –  Multicore (4 - <100) → Manycores (100s - 1ks)
•  Heterogeneity
    –  System-level (accelerators) vs chip level (embedded)
•  Compute power and memory speed challenges (two walls)
    –  500x compute power and 30x memory of 2PF HW
    –  Memory access time lags further behind

Three Eras of Processor Performance

[Figure: three performance-versus-time curves, each marked “we are here”]

Single-Core Era
•  Single-thread performance vs. time
•  Enabled by: …, Voltage Scaling, MicroArchitecture
•  Constrained by: Power, Complexity

Multi-Core Era
•  Throughput performance vs. time (# of processors)
•  Enabled by: …, Desire for Throughput, 20 years of SMP arch
•  Constrained by: Power, Parallel SW availability, Scalability

Heterogeneous Systems Era
•  Targeted application performance vs. time (data-parallel exploitation)
•  Enabled by: …, Abundant data parallelism, Power efficient GPUs
•  Currently constrained by: Programming models, Communication overheads

Source: Chuck Moore, Data Processing in ExaScale-Class Computer Systems, Salishan, April 2011

Exercise: Inspect ISA for sum

•  cp ~yan/sum.c ~ (copy the sum.c file from my home folder to your home folder)
•  gcc -save-temps sum.c -o sum
•  ./sum 102400
•  vi sum.c
•  vi sum.s
•  Or check from:
    –  https://passlab.github.io/CSE564/exercises/sum/
•  View them from H drive
•  Other system commands:
    –  cat /proc/cpuinfo to show the CPU and # cores
    –  top command to show system usage and memory

Backup

New-School Machine Structures

•  Parallel Requests
    Assigned to computer, e.g., Search “cats”
•  Parallel Threads
    Assigned to core, e.g., Lookup, Ads
•  Parallel Instructions
    >[email protected]., 5 pipelined instructions
•  Parallel Data
    >[email protected]., Add of 4 pairs of words
•  Hardware descriptions
    All gates functioning in parallel at same time

Harness Parallelism & Achieve High Performance

[Figure: software/hardware stack from Smart Phone and Warehouse-Scale Computer down to Logic Gates; a Computer contains Core … Core, Memory (Cache), Input/Output, and Main Memory; a Core contains Instruction Unit(s) and Functional Unit(s); the functional units compute A3+B3, A2+B2, A1+B1, A0+B0]
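The "Parallel Data" level (add of 4 pairs of words, A0+B0 through A3+B3) can be modeled as one operation applied to every lane. This sketch captures only the semantics of such an operation; a Python list comprehension does not run the lanes in parallel, and the function name and 32-bit wrap-around are assumptions.

```python
def simd_add(a_lanes, b_lanes, lane_bits=32):
    """Lane-wise add: one operation applied to every pair of words."""
    if len(a_lanes) != len(b_lanes):
        raise ValueError("operands must have the same number of lanes")
    mask = (1 << lane_bits) - 1          # each lane wraps independently
    return [(a + b) & mask for a, b in zip(a_lanes, b_lanes)]


if __name__ == "__main__":
    A = [1, 2, 3, 4]          # A0..A3
    B = [10, 20, 30, 40]      # B0..B3
    print(simd_add(A, B))     # four sums produced by a single "instruction"
```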

Coping with Failures

•  4 disks/server, 50,000 servers
•  Failure rate of disks: 2% to 10% / year
    –  Assume 4% annual failure rate
•  On average, how often does a disk fail?
    a)  1 / month
    b)  1 / week
    c)  1 / day
    d)  1 / hour

Answer: d) 1 / hour
    50,000 x 4 = 200,000 disks
    200,000 x 4% = 8000 disks fail per year
    365 days x 24 hours = 8760 hours
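The disk-failure arithmetic can be written as a small function so that other failure rates in the 2% to 10% range are easy to try. The function name is illustrative, and the model assumes failures are spread evenly over the year.

```python
HOURS_PER_YEAR = 365 * 24   # 8760


def failures_per_hour(servers, disks_per_server, annual_failure_rate):
    """Expected disk failures per hour across the whole installation."""
    disks = servers * disks_per_server
    failures_per_year = disks * annual_failure_rate
    return failures_per_year / HOURS_PER_YEAR


if __name__ == "__main__":
    for rate in (0.02, 0.04, 0.10):
        per_hour = failures_per_hour(50_000, 4, rate)
        print(f"{rate:.0%} annual failure rate -> {per_hour:.2f} failures/hour")
```

At a 4% annual rate this gives 8000 / 8760, about 0.91 failures per hour, so roughly one disk fails every hour.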

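Great Idea: Dependability via Redundancy

•  Redundancy so that a failing piece doesn’t make the whole system fail

[Figure: three units compute 1+1; two answer 1+1=2 and one answers 1+1=1 (FAIL!); 2 of 3 agree, so the result is 1+1=2]

Increasing transistor density reduces the cost of redundancy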
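The "2 of 3 agree" scheme can be sketched as a majority vote over the outputs of redundant units. The function name and the choice to raise an error when no strict majority exists are assumptions for this example.

```python
from collections import Counter


def vote(results):
    """Return the value a strict majority of redundant units agree on."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among redundant results")
    return value


if __name__ == "__main__":
    # Three units compute 1+1; the third one is faulty.
    print(vote([2, 2, 1]))
```

With three units a single faulty result is outvoted; two simultaneous faults that disagree leave no majority and are reported as an error in this sketch.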

Great Idea: Dependability via Redundancy

•  Applies to everything from datacenters to storage to memory to instructors
    –  Redundant datacenters so that we can lose 1 datacenter but Internet service stays online
    –  Redundant disks so that we can lose 1 disk but not lose data (Redundant Arrays of Independent Disks / RAID)
    –  Redundant memory bits so that we can lose 1 bit but no data (Error Correcting Code / ECC Memory)
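How redundant disks survive the loss of one disk can be sketched with XOR parity, the idea behind single-parity RAID levels. This is a simplified model under stated assumptions: equal-length blocks, one parity block, and exactly one missing data block whose position is known. The function names are illustrative.

```python
def parity(blocks):
    """XOR equal-length blocks byte by byte to form a parity block."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def rebuild(surviving_blocks, parity_block):
    """Recover the single missing block from the survivors and the parity."""
    return parity(list(surviving_blocks) + [parity_block])


if __name__ == "__main__":
    data = [b"arch", b"comp", b"cse5"]      # blocks on three data disks
    p = parity(data)                        # block on the parity disk
    lost = data[1]                          # pretend disk 1 fails
    print(rebuild([data[0], data[2]], p) == lost)
```

XOR-ing the parity with the surviving blocks cancels them out and leaves the missing block, so one extra disk protects against any single disk failure.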

Understanding Computer Architecture

Image credit: de.pinterest.com

End of Moore’s Law?

Cost per transistor is rising as transistor size continues to shrink
