TRANSCRIPT
Lecture 01: Introduction
CSE 564 Computer Architecture, Summer 2017
Department of Computer Science and Engineering
Yonghong Yan
[email protected]/~yan
1
Copyright and Acknowledgement
• Most slides were adapted from lecture notes of the two textbooks, with copyright of the publisher or the original authors, including Elsevier Inc., Morgan Kaufmann, David A. Patterson and John L. Hennessy.
• Some slides were adapted from the following courses:
  – UC Berkeley course "Computer Science 252: Graduate Computer Architecture" of David E. Culler, Copyright 2005 UCB
    • http://people.eecs.berkeley.edu/~culler/courses/cs252-s05/
  – Great Ideas in Computer Architecture (Machine Structures) by Randy Katz and Bernhard Boser
    • http://inst.eecs.berkeley.edu/~cs61c/fa16/
• I also refer to the following courses and lecture notes when preparing materials for this course:
  – Computer Science 152: Computer Architecture and Engineering, Spring 2016 by Dr. George Michelogiannakis from UC Berkeley
    • http://www-inst.eecs.berkeley.edu/~cs152/sp16/
  – Computer Science 252: Graduate Computer Architecture, Fall 2015 by Prof. Krste Asanović from UC Berkeley
    • http://www-inst.eecs.berkeley.edu/~cs252/fa15/
  – Computer Science 250: VLSI Systems Design, Spring 2016 by Prof. John Wawrzynek from UC Berkeley
    • http://www-inst.eecs.berkeley.edu/~cs250/sp16/
  – Computer System Architecture, Fall 2005 by Dr. Joel Emer and Prof. Arvind from MIT
    • http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-823-computer-system-architecture-fall-2005/
  – Synthesis Lectures on Computer Architecture
    • http://www.morganclaypool.com/toc/cac/1/1
• The uses of the slides of this course are for educational purposes only and should be used only in conjunction with the textbook. Derivatives of the slides must acknowledge the copyright notices of this and the originals. Permission for commercial purposes should be obtained from the original copyright holder and the successive copyright holders, including myself.
2
Contents
• Computers and computer components
• Computer architectures and great ideas in history and now
• Performance
3
The Computer Revolution
• Progress in computer technology
  – Underpinned by Moore's Law
• Makes novel applications feasible
  – Computers in automobiles
  – Cell phones
  – Human genome project
  – World Wide Web
  – Search engines
• Computers are pervasive
4
Classes of Computers
• Personal Mobile Device (PMD)
  – e.g., smartphones, tablet computers
  – Emphasis on energy efficiency and real-time
• Desktop Computing
  – Emphasis on price-performance
• Servers
  – Emphasis on availability, scalability, throughput
• Clusters / Warehouse Scale Computers
  – Used for "Software as a Service (SaaS)"
  – Emphasis on availability and price-performance
  – Sub-class: supercomputers; emphasis: floating-point performance and fast internal networks
• Embedded Computers
  – Emphasis: price
5
The Post-PC Era
6
The Post-PC Era
• Personal Mobile Device (PMD)
  – Battery operated
  – Connects to the Internet
  – Hundreds of dollars
  – Smartphones, tablets, electronic glasses
• Cloud computing
  – Warehouse Scale Computers (WSC)
  – Software as a Service (SaaS)
  – A portion of the software runs on a PMD and a portion runs in the Cloud
  – Amazon and Google
7
Old School Computer
8
New School Computer (#1)
Personal Mobile Devices
9
New School "Computer" (#2)
10
Components of a Computer
• Same components for all kinds of computer
  – Desktop, server, embedded
• Input/output includes
  – User-interface devices
    • Display, keyboard, mouse
  – Storage devices
    • Hard disk, CD/DVD, flash
  – Network adapters
    • For communicating with other computers
The BIG Picture
Inside the Processor (CPU)
• Functional units: perform computations
• Datapath: performs operations on data
• Control: sequences datapath, memory, ...
• Cache memory
  – Small, fast SRAM memory for immediate access to data
Apple A5
12
A Safe Place for Data
• Volatile main memory
  – Loses instructions and data when power is off
• Non-volatile secondary memory
  – Magnetic disk
  – Flash memory
  – Optical disk (CD-ROM, DVD)
Contents
• Computers and computer components
• Computer architectures and great ideas in history and now
• Performance
14
What is "Computer Architecture"?
15
[Layer diagram, top to bottom, with the ISA as the interface in the middle:]
Applications
Operating System
Compiler / Firmware
-- Instruction Set Architecture --
Instr. Set Proc. / I/O system
Datapath & Control
Digital Design / Circuit Design
Layout & fab / Semiconductor Materials
16
The Instruction Set: a Critical Interface
[Diagram: the instruction set is the interface between software and hardware]
• Properties of a good abstraction
  – Lasts through many generations (portability)
  – Used in many different ways (generality)
  – Provides convenient functionality to higher levels
  – Permits an efficient implementation at lower levels
17
Elements of an ISA
• Set of machine-recognized data types
  – Bytes, words, integers, floating point, strings, ...
• Operations performed on those data types
  – Add, sub, mul, div, xor, move, ...
• Programmable storage
  – Registers, PC, memory
• Methods of identifying and obtaining data referenced by instructions (addressing modes)
  – Literal, register, absolute, relative, register + offset, ...
• Format (encoding) of the instructions (see the decoder sketch below)
  – Opcode, operand fields, ...
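To make the encoding idea concrete, here is a minimal C sketch that unpacks the opcode and operand fields of one 32-bit MIPS-style R-type instruction word. The field layout follows the classic MIPS format; the sample word is my own illustrative choice, not from the slides.

    /* Decode a MIPS-like 32-bit instruction word, illustrating
     * "format (encoding) of the instructions": opcode and operand
     * fields packed into fixed bit positions. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t insn = 0x014B4820;          /* add $t1, $t2, $t3 (R-type) */
        uint32_t opcode = insn >> 26;        /* bits 31..26: 0 for R-type  */
        uint32_t rs = (insn >> 21) & 0x1F;   /* source register 1 ($t2=10) */
        uint32_t rt = (insn >> 16) & 0x1F;   /* source register 2 ($t3=11) */
        uint32_t rd = (insn >> 11) & 0x1F;   /* destination reg   ($t1=9)  */
        uint32_t funct = insn & 0x3F;        /* ALU function code (add=32) */

        printf("opcode=%u rs=%u rt=%u rd=%u funct=%u\n",
               opcode, rs, rt, rd, funct);
        return 0;
    }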
Computer Architecture: How things are put together in design and implementation
• Capabilities & performance characteristics of principal functional units
  – (e.g., registers, ALU, shifters, logic units, ...)
• Ways in which these components are interconnected
• Information flows between components
• Logic and means by which such information flow is controlled
• Choreography of FUs to realize the ISA
18
Great Ideas in Computer Architectures
1. Design for Moore’s Law
2. Use abstraction to simplify design
3. Make the common case fast
4. Performance via parallelism
5. Performance via pipelining
6. Performance via prediction
7. Hierarchy of memories
8. Dependability via redundancy
19
Great Idea: "Moore's Law"
Gordon Moore, founder of Intel
• 1965: since the integrated circuit was invented, the number of transistors/inch² in these circuits roughly doubled every year; this trend would continue for the foreseeable future
• 1975: revised: circuit complexity doubles every two years
20
Image credit: Intel
Microprocessor Transistor Counts 1971-2011 & Moore's Law
21
https://en.wikipedia.org/wiki/Transistor_count
Moore's Law trends
• More transistors = more opportunities for exploiting parallelism at the instruction level (ILP)
  – Pipeline, superscalar, VLIW (Very Long Instruction Word), SIMD (Single Instruction Multiple Data) or vector, speculation, branch prediction
• General path of scaling
  – Wider instruction issue, longer pipeline
  – More speculation
  – More and larger registers and caches
• Increasing circuit density ~= increasing frequency ~= increasing performance
• Transparent to users
  – An easy job of getting better performance: buying faster processors (higher frequency)
• We have enjoyed this free lunch for several decades; however (TBD)...
22
Great Idea: Pipeline
Fundamental Execution Cycle
23
• Instruction Fetch: obtain instruction from program storage
• Instruction Decode: determine required actions and instruction size
• Operand Fetch: locate and obtain operand data
• Execute: compute result value or status
• Result Store: deposit results in storage for later use
• Next Instruction: determine successor instruction
[Diagram: Processor (regs, F.U.s) connected to Memory (program, data); the processor-memory path is the von Neumann bottleneck]
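As a concrete illustration of this cycle, here is a minimal C sketch of a software interpreter for a made-up accumulator ISA; the opcodes and encoding are hypothetical, invented for this sketch. Each loop iteration performs fetch, decode, operand fetch, execute, result store, and next-instruction selection.

    #include <stdio.h>
    #include <stdint.h>

    enum { OP_HALT, OP_LOAD, OP_ADD, OP_STORE };   /* toy opcodes */

    int main(void) {
        uint16_t mem[16] = {
            (OP_LOAD  << 8) | 10,   /* acc = mem[10]  */
            (OP_ADD   << 8) | 11,   /* acc += mem[11] */
            (OP_STORE << 8) | 12,   /* mem[12] = acc  */
            (OP_HALT  << 8)
        };
        mem[10] = 3; mem[11] = 4;

        uint16_t pc = 0, acc = 0;
        for (;;) {
            uint16_t insn = mem[pc];          /* instruction fetch     */
            uint8_t  op   = insn >> 8;        /* decode: opcode field  */
            uint8_t  addr = insn & 0xFF;      /* decode: address field */
            if (op == OP_HALT) break;
            uint16_t operand = mem[addr];     /* operand fetch         */
            if (op == OP_LOAD)  acc = operand;         /* execute      */
            if (op == OP_ADD)   acc = acc + operand;
            if (op == OP_STORE) mem[addr] = acc;       /* result store */
            pc = pc + 1;                      /* next instruction      */
        }
        printf("mem[12] = %u\n", mem[12]);    /* prints 7 */
        return 0;
    }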
Pipelined Instruction Execution
24
[Figure: four instructions in program order flowing through the pipeline stages Ifetch, Reg, ALU, DMem, Reg; each instruction starts one clock cycle after the previous one, so their stages overlap across cycles 1-7]
Great Idea: Abstraction (Levels of Representation/Interpretation)
High Level Language Program (e.g., C):
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
        | Compiler
Assembly Language Program (e.g., MIPS):
    lw  $t0, 0($2)
    lw  $t1, 4($2)
    sw  $t1, 0($2)
    sw  $t0, 4($2)
        | Assembler
Machine Language Program (MIPS):
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
        | Machine Interpretation
Hardware Architecture Description (e.g., block diagrams)
        | Architecture Implementation
Logic Circuit Description (Circuit Schematic Diagrams)
Anything can be represented as a number, i.e., data or instructions
25
The Memory Abstraction
• Association of <name, value> pairs
  – Typically named as byte addresses
  – Often values aligned on multiples of size
• Sequence of Reads and Writes
• Write binds a value to an address
• Read of an address returns the most recently written value bound to that address
26
[Diagram: interface signals between processor and memory: address (name), command (R/W), data (W), data (R), done]
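A minimal C sketch of this abstraction, assuming nothing beyond the slide's definition: writes bind values to addresses, and a read returns the most recently written value for that address.

    #include <stdio.h>
    #include <stdint.h>

    static uint8_t memory[256];              /* byte-addressed storage */

    static void mem_write(uint8_t addr, uint8_t value) { memory[addr] = value; }
    static uint8_t mem_read(uint8_t addr)               { return memory[addr]; }

    int main(void) {
        mem_write(0x10, 42);
        mem_write(0x10, 99);                 /* rebinds address 0x10 */
        printf("%u\n", mem_read(0x10));      /* 99: most recent write wins */
        return 0;
    }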
Processor-DRAM Memory Gap (latency)
27
[Figure: performance (log scale, 1 to 1000) vs. time, 1980-2000: CPU performance ("Joy's Law") improves ~60%/year (2X/1.5 yr), DRAM latency improves ~9%/year (2X/10 yrs); the processor-memory performance gap grows ~50% per year]
The Principle of Locality
• The Principle of Locality:
  – Programs access a relatively small portion of the address space at any instant of time
• Two different types of locality:
  – Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  – Spatial Locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
• For the last 30 years, HW has relied on locality for speed
28
[Diagram: Processor <-> $ (cache) <-> MEM]
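A small C example of both kinds of locality: the scalar sum is reused on every iteration (temporal locality), and the sequential walk over a[] touches adjacent addresses (spatial locality), so each cache block fetched is fully used.

    #include <stdio.h>

    #define N 1000000

    static double a[N];

    int main(void) {
        double sum = 0.0;                /* reused every iteration: temporal */
        for (int i = 0; i < N; i++) {
            a[i] = i * 0.5;
        }
        for (int i = 0; i < N; i++) {
            sum += a[i];                 /* sequential addresses: spatial */
        }
        printf("sum = %f\n", sum);
        return 0;
    }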
Great Idea: Memory Hierarchy
Levels of the Memory Hierarchy
29

Level (staging unit)         Capacity         Access Time            Cost            Xfer unit, managed by
Registers (instr. operands)  100s of bytes    << 1 ns                                1-8 bytes, program/compiler
Cache (blocks)               10s-100s of KB   ~1 ns                  $1s/MByte       8-128 bytes, cache controller
Main Memory (pages)          MBytes           100-300 ns             < $1/MByte      512-4K bytes, OS
Disk (files)                 10s of GB        10 ms (10,000,000 ns)  $0.001/MByte    MBytes, user/operator
Tape                         infinite         sec-min                $0.0014/MByte

Upper levels are faster; lower levels are larger.
Jim Gray's Storage Latency Analogy: How Far Away is the Data?
30
[Figure: latencies in ns mapped to distance, scaled so one register access is one minute:]
Registers             1        My Head      1 min
On-Chip Cache         2        This Room
On-Board Cache        10       This Campus  10 min
Main Memory           100      Lansing      1.5 hr
Disk                  10^6     Pluto        2 years
Tape / Optical Robot  10^9     Andromeda    2,000 years
Jim Gray, Turing Award; B.S. Cal 1966, Ph.D. Cal 1969
The Cache Design Space
• Several interacting dimensions
  – Cache size
  – Block size
  – Associativity
  – Replacement policy
  – Write-through vs. write-back
• The optimal choice is a compromise
  – Depends on access characteristics
    • Workload
    • Use (I-cache, D-cache, TLB)
  – Depends on technology/cost
• Simplicity often wins
31
[Figure: the design space over associativity, cache size, and block size; varying one factor (A or B) trades off between "good" and "bad" regions]
Great Idea: Parallelism
32
Defining Computer Architecture
• "Old" view of computer architecture:
  – Instruction Set Architecture (ISA) design
  – i.e., decisions regarding:
    • Registers, memory addressing, addressing modes, instruction operands, available operations, control flow instructions, instruction encoding
• "Real" computer architecture:
  – Specific requirements of the target machine
  – Design to maximize performance within constraints: cost, power, and availability
  – Includes ISA, microarchitecture, hardware
33
Computer Architecture Topics
34
[Diagram of course topics:]
• Instruction Set Architecture
• Pipelining and Instruction Level Parallelism: pipelining, hazard resolution, superscalar, reordering, prediction, speculation, vector, dynamic compilation
• Memory Hierarchy: L1 cache, L2/L3 cache, DRAM; addressing, protection, exception handling
• Input/Output and Storage: disks, WORM, tape; RAID; emerging technologies, interleaving, bus protocols
• Network communication with other processors: coherence, bandwidth, latency
• VLSI
Why is Architecture Exciting Today?
35
[Figure: CPU clock speed grew ~15%/year for decades, but CPU clock speed is now flat]
Problems of traditional ILP scaling
• Fundamental circuit limitations [1]
  – Delays grow as issue queues and multi-ported register files grow
  – Increasing delays limit performance returns from wider issue
• Limited amount of instruction-level parallelism [1]
  – Inefficient for codes with difficult-to-predict branches
• Power and heat stall clock frequencies
36
[1] The case for a single-chip multiprocessor, K. Olukotun, B. Nayfeh, L. Hammond, K. Wilson, and K. Chang, ASPLOS-VII, 1996.
ILP impacts
37
Simulations of 8-issue Superscalar
38
Power/heat density limits frequency
39
• Some fundamental physical limits are being reached
We will have this...
40
41
Revolution is happening now
• Chip density is continuing to increase ~2x every 2 years
  – Clock speed is not
  – The number of processor cores may double instead
• There is little or no hidden parallelism (ILP) to be found
• Parallelism must be exposed to and managed by software
  – No free lunch
Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond)
Single Processor Performance
42
[Figure: growth in processor performance, 1950-2010, on a log scale from 1 KFlop/s to 1 PFlop/s, through scalar (EDSAC 1, UNIVAC 1, IBM 7090, CDC 6600, IBM 360/195, CDC 7600), vector (Cray 1, Cray X-MP, Cray 2), superscalar (RISC), and parallel (TMC CM-2, TMC CM-5, Cray T3D, ASCI Red, ASCI White Pacific, IBM BG/L) machines; 2X transistors/chip every 1.5 years; the move to multi-processor]

1941  1                        (Floating Point operations / second, Flop/s)
1945  100
1949  1,000                    (1 KiloFlop/s, KFlop/s)
1951  10,000
1961  100,000
1964  1,000,000                (1 MegaFlop/s, MFlop/s)
1968  10,000,000
1975  100,000,000
1987  1,000,000,000            (1 GigaFlop/s, GFlop/s)
1992  10,000,000,000
1993  100,000,000,000
1997  1,000,000,000,000        (1 TeraFlop/s, TFlop/s)
2000  10,000,000,000,000
2005  131,000,000,000,000      (131 TFlop/s)

The trends
43
Recent multicore processors
44
Recent manycore GPU processors
45
An Overview of the GK110 Kepler Architecture
Kepler GK110 was built first and foremost for Tesla, and its goal was to be the highest performing parallel computing microprocessor in the world. GK110 not only greatly exceeds the raw compute horsepower delivered by Fermi, but it does so efficiently, consuming significantly less power and generating much less heat output.
A full Kepler GK110 implementation includes 15 SMX units and six 64-bit memory controllers. Different products will use different configurations of GK110. For example, some products may deploy 13 or 14 SMXs.
Key features of the architecture that will be discussed below in more depth include:
• The new SMX processor architecture
• An enhanced memory subsystem, offering additional caching capabilities, more bandwidth at each level of the hierarchy, and a fully redesigned and substantially faster DRAM I/O implementation
• Hardware support throughout the design to enable new programming model capabilities
[Figure: Kepler GK110 full-chip block diagram]

Streaming Multiprocessor (SMX) Architecture
Kepler GK110's new SMX introduces several architectural innovations that make it not only the most powerful multiprocessor we've built, but also the most programmable and power efficient.
SMX: 192 single-precision CUDA cores, 64 double-precision units, 32 special function units (SFU), and 32 load/store units (LD/ST).

Kepler Memory Subsystem / L1, L2, ECC
Kepler's memory hierarchy is organized similarly to Fermi. The Kepler architecture supports a unified memory request path for loads and stores, with an L1 cache per SMX multiprocessor. Kepler GK110 also enables compiler-directed use of an additional new cache for read-only data, as described below.

64 KB Configurable Shared Memory and L1 Cache
In the Kepler GK110 architecture, as in the previous-generation Fermi architecture, each SMX has 64 KB of on-chip memory that can be configured as 48 KB of shared memory with 16 KB of L1 cache, or as 16 KB of shared memory with 48 KB of L1 cache. Kepler now allows for additional flexibility in configuring the allocation of shared memory and L1 cache by permitting a 32 KB / 32 KB split between shared memory and L1 cache. To support the increased throughput of each SMX unit, the shared memory bandwidth for 64b and larger load operations is also doubled compared to the Fermi SM, to 256B per core clock.

48 KB Read-Only Data Cache
In addition to the L1 cache, Kepler introduces a 48 KB cache for data that is known to be read-only for the duration of the function. In the Fermi generation, this cache was accessible only by the Texture unit. Expert programmers often found it advantageous to load data through this path explicitly by mapping their data as textures, but this approach had many limitations.
• ~3K cores
Current Trends in Architecture
• Cannot continue to leverage instruction-level parallelism (ILP)
  – Single-processor performance improvement ended in 2003
• New models for performance:
  – Data-level parallelism (DLP)
  – Thread-level parallelism (TLP)
  – Heterogeneity
• These require explicit restructuring of the application
46
Parallelism
• Classes of parallelism in applications (see the sketch below):
  – Data-Level Parallelism (DLP)
  – Task-Level Parallelism (TLP)
• Classes of architectural parallelism:
  – Instruction-Level Parallelism (ILP)
  – Vector architectures / Graphics Processor Units (GPUs)
  – Thread-Level Parallelism
  – Heterogeneity
47
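A sketch contrasting the two application-level classes, assuming POSIX threads are available (compile with -pthread): the loop in vec_add applies one operation across many data elements (DLP, which a compiler may vectorize), while the two threads run independent work concurrently (TLP).

    #include <stdio.h>
    #include <pthread.h>

    #define N 1024
    static float a[N], b[N], c[N];

    static void *vec_add(void *arg) {            /* DLP: one op, many data */
        (void)arg;
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];
        return NULL;
    }

    static void *other_task(void *arg) {         /* TLP: independent task */
        (void)arg;
        printf("doing unrelated work in parallel\n");
        return NULL;
    }

    int main(void) {
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }
        pthread_t t1, t2;
        pthread_create(&t1, NULL, vec_add, NULL);
        pthread_create(&t2, NULL, other_task, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("c[10] = %f\n", c[10]);           /* 30.0 */
        return 0;
    }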
Architectural Challenges
• Massive (ca. 4X) increase in concurrency
  – Multicore (4 to <100 cores) to manycore (100s to 1000s of cores)
• Heterogeneity
  – System-level (accelerators) vs. chip-level (embedded)
• Compute power and memory speed challenges (two walls)
  – 500x the compute power and 30x the memory of 2 PF hardware
  – Memory access time lags further behind
48
Three Eras of Processor Performance
• Single-Core Era
  – Single-thread performance over time ("we are here")
  – Enabled by: Moore's Law, voltage scaling, microarchitecture
  – Constrained by: power, complexity
• Multi-Core Era
  – Throughput performance over time (# of processors)
  – Enabled by: Moore's Law, desire for throughput, 20 years of SMP architecture
  – Constrained by: power, parallel SW availability, scalability
• Heterogeneous Systems Era
  – Targeted application performance over time (data-parallel exploitation)
  – Enabled by: Moore's Law, abundant data parallelism, power-efficient GPUs
  – Currently constrained by: programming models, communication overheads
Source: Chuck Moore, Data Processing in ExaScale-Class Computer Systems, Salishan, April 2011
Exercise: Inspect ISA for sum
• cp ~yan/sum.c ~ (copy the sum.c file from my home folder to your home folder)
• gcc -save-temps sum.c -o sum
• ./sum 102400
• vi sum.c
• vi sum.s
• Or check from:
  – https://passlab.github.io/CSE564/exercises/sum/
• View them from the H drive
• Other system commands:
  – cat /proc/cpuinfo to show the CPU and # of cores
  – top command to show system usage and memory
49
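The actual sum.c lives in the instructor's home folder and its contents are not shown in these slides; the following is only a hypothetical sketch of what such a program might look like (array size taken from the command line, as in ./sum 102400). With gcc -save-temps, the compiler keeps the intermediate files, so sum.s holds the generated assembly you are asked to inspect.

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int n = (argc > 1) ? atoi(argv[1]) : 102400;  /* element count */
        int *data = malloc(n * sizeof(int));
        long sum = 0;
        for (int i = 0; i < n; i++) data[i] = i;      /* fill the array */
        for (int i = 0; i < n; i++) sum += data[i];   /* sum it up */
        printf("sum of %d elements = %ld\n", n, sum);
        free(data);
        return 0;
    }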
Backup
50
New-School Machine Structures
• Parallel Requests: assigned to computer, e.g., search "cats"
• Parallel Threads: assigned to core, e.g., lookup, ads
• Parallel Instructions: > 1 instruction @ one time, e.g., 5 pipelined instructions
• Parallel Data: > 1 data item @ one time, e.g., add of 4 pairs of words
• Hardware descriptions: all gates functioning in parallel at the same time
51
[Diagram: the software/hardware stack harnessing parallelism to achieve high performance, from warehouse-scale computer and smartphone down through computer (cores, memory/cache, input/output), core (instruction unit(s), functional unit(s): A0+B0, A1+B1, A2+B2, A3+B3), main memory, to logic gates]
Coping with Failures
• 4 disks/server, 50,000 servers
• Failure rate of disks: 2% to 10% / year
  – Assume a 4% annual failure rate
• On average, how often does a disk fail?
  a) 1/month
  b) 1/week
  c) 1/day
  d) 1/hour
Coping with Failures
• 4 disks/server, 50,000 servers
• Failure rate of disks: 2% to 10% / year
  – Assume a 4% annual failure rate
• On average, how often does a disk fail?
  a) 1/month  b) 1/week  c) 1/day  d) 1/hour
50,000 x 4 = 200,000 disks
200,000 x 4% = 8,000 disks fail per year
365 days x 24 hours = 8,760 hours
8,000 failures in 8,760 hours is roughly one disk failure every hour, so the answer is d)
53
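The arithmetic above, spelled out in a few lines of C: 8,000 failures spread over 8,760 hours is about 0.9 failures per hour, i.e., on average about one disk fails every hour.

    #include <stdio.h>

    int main(void) {
        int disks = 50000 * 4;                    /* 200,000 disks       */
        double fails_per_year = disks * 0.04;     /* 8,000 failures/year */
        double hours_per_year = 365 * 24;         /* 8,760 hours         */
        printf("failures per hour: %.2f\n", fails_per_year / hours_per_year);
        printf("hours between failures: %.2f\n", hours_per_year / fails_per_year);
        return 0;
    }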
Great Idea: Dependability via Redundancy
• Redundancy so that a failing piece doesn't make the whole system fail
[Figure: three redundant units each compute 1+1; two produce 2 and one faulty unit produces 1 (FAIL!); 2-of-3 voting agrees on 1+1=2 and masks the failure]
Increasing transistor density reduces the cost of redundancy
54
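A minimal C sketch of the 2-of-3 voting shown above (triple modular redundancy): three redundant units compute the same result, and the majority vote masks a single faulty unit.

    #include <stdio.h>

    static int vote(int a, int b, int c) {
        if (a == b || a == c) return a;   /* a agrees with at least one */
        if (b == c) return b;             /* a is the odd one out       */
        return -1;                        /* no majority: flag failure  */
    }

    int main(void) {
        /* unit 3 is faulty and returns 1 instead of 2 */
        printf("voted result: %d\n", vote(1 + 1, 1 + 1, 1));   /* 2 */
        return 0;
    }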
Great Idea: Dependability via Redundancy
• Applies to everything from data centers to storage to memory to instructors
  – Redundant data centers so that we can lose one data center but the Internet service stays online
  – Redundant disks so that we can lose one disk but not lose data (Redundant Arrays of Independent Disks / RAID)
  – Redundant memory bits so that we can lose one bit but lose no data (Error Correcting Code / ECC memory)
55
Understanding Computer Architecture
56
de.pinterest.com
End of Moore's Law?
57
Cost per transistor is rising as transistor size continues to shrink