embedded computer architecture - engbloms.se · »interrupt latency ... 29 november 2002 embedded...
TRANSCRIPT
Jakob Engblom, PhD
Uppsala University & Virtutech Inc.
[email protected]@[email protected]@virtutech.com
Embedded Systems Computer Architecture
29 November 2002 Embedded Computer Architecture 2
Embedded Systems

Embedded Systems
It is a snake!
No, a wall!
No, a pillar!
No, it is a tree trunk!
You’re all wrong, it is a fan!
Now what is this elephant thing?

Embedded Systems
“A computer that doesn’t look like a computer”
Interacts with the world
Primitive or no user interface
Part of other products

Embedded Systems
Single purpose products
  Not general purpose like desktop PCs
  Do one thing very efficiently
Software very important:
  Gives character to product
  Used to differentiate inside a “platform”
  Can be changed late
  Processor cheaper than special HW
  Today, dominates dev cost

Processor Market
Embedded = most processors!
  200 million PC and server
  8000 million embedded
[Pie chart: "Desktop" 2%, "Embedded" 98%]

Processor Market
Processors: 50% of all semiconductor revenue
  Explains why everyone wants to do processors
32-bit dominant
  30% of total semiconductors
PC processors: 50% of CPU revenue
  15% of total semiconductors
  AMD and Intel share it
[Chart: market share of 32-bit, 16-bit, 8-bit, 4-bit, and DSP processors, 0% to 100%, by units and by money]

Real-Time System
Timing as important as result
Hard real-time:
  Hard deadlines
  Dead if missed deadline
  Worst-case
Soft real-time:
  Fuzzier deadlines
  Can miss some deadlines
  Average-case
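The hard/soft distinction above can be sketched as two different acceptance checks over a set of response times. This is only an illustration: the function names, the sample values, and the miss-ratio threshold are assumptions, and a real hard real-time argument needs a worst-case bound from analysis, not just measured samples.

```python
def meets_hard_deadline(response_times_ms, deadline_ms):
    """Hard real-time view: the worst observed case must meet the deadline."""
    return max(response_times_ms) <= deadline_ms


def meets_soft_deadline(response_times_ms, deadline_ms, max_miss_ratio):
    """Soft real-time view: a bounded fraction of misses is tolerated."""
    misses = sum(1 for t in response_times_ms if t > deadline_ms)
    return misses / len(response_times_ms) <= max_miss_ratio


# Hypothetical measurements (milliseconds); one sample overruns a 10 ms deadline.
samples = [4.0, 5.0, 4.5, 12.0, 5.5]

hard_ok = meets_hard_deadline(samples, deadline_ms=10.0)
soft_ok = meets_soft_deadline(samples, deadline_ms=10.0, max_miss_ratio=0.25)
```

With these sample values the same system fails the hard check but passes the soft one, since one miss in five is within the assumed 25% tolerance.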

Real-Time Systems
Embedded and Real-Time Synonymous?
  Most embedded systems are real-time
  Most real-time systems are embedded
[Venn diagram: "embedded" and "real-time" sets overlapping in "embedded real-time"]

Simple Embedded Systems
8-bit Hitachi H8/300, 32 kB ROM, 32 kB RAM
Standard microcontroller chip
Byte-code machine, sensor drivers, …
8-bit Intel 8051, standard microcontroller
Behavior, talk, IR communications

Fun App: Smart Beer Glass
8-bit, 8-pin PIC processor
Capacitive sensor for fluid level
Inductive coil for RF ID activation & power
CPU and reading coil in the table. Reports the level of fluid in the glass, alerts servers when close to empty
Contactless transmission of power and readings

No Upgrades Possible
Once a product ships…
…it often cannot be serviced
  No download ability
  No writable persistent storage
  No disks
  No loader
Software is write-once
(There are exceptions)

Consumer Electronics
Heterogeneous multiprocessor
  8-bit Atmel AVR for UI, games, …
  16-bit fixed-point TI C54 DSP for GSM coding, radio interface, …
  32-bit ARM7 in Bluetooth module
  + maybe ARM7 in IrDA interface
All in custom chips
Software is large:
  16 MB of code in control part
  Plus signal processing code

Automotive
Multiple networks
  CAN for body electronics: 30+ nodes
  CAN for engine control: few nodes
  LIN for instruments
Many processors
  Up to 100
Large diversity in processor types:
  8-bit CPUs (PIC, HC08) for door locks, lights, etc.
  16-bit CPUs (C167, HC11, HC12) for most functions
  32-bit CPUs (PPC, V850) for engine control, airbags
Total amount of code: 40-50 MB

Automotive
Form follows function
  Processing where the action is
  Architecture given by application
  Sensors and actuators distributed
Heterogeneous systems
  Many different makes of CPUs
  Standardized at the network/bus

Timing Aspects
Interrupt latency
  Important criterion for embedded
  A few clock cycles at most
  Measure of RTOS performance
Real-Time = predictability
  In-order pipelines
  SRAM instead of caches
  Lockable caches
  Several small CPUs instead of one big

Military Shipboard
Standard multiprocessor UltraSparc servers for radar, target tracking, combat control, …
Many CPUs in missiles, gun controls, engines, …

Mobile Phone Base Station
Handle signals
  Data streams to and from phones
  Massively parallel system
  Thousands of DSP tasks
  Perfect parallel scalability
Custom or standard DSPs
  Up to 8 DSPs on a single chip

Embedded Processing

Integration
A single chip:
  CPU Core
  Integrated memory
  Integrated peripherals
  Integrated services
Goal:
  System on one chip
  No external HW
  Fit application “perfectly”
[Diagram: single chip with CPU core, RAM (small), ROM (big), UART, A/D, timer, and LCD blocks, connected to the outside world]

Processors: Wide Span
Bitwidths: 4 to 64 bits
  Most common: 8 bit (4G units)
  32-bit growing fastest
  32/64-bit outnumbers desktop
Frequency: DC to 2 GHz
Memory: From 0.5 kB to 5 MB
Power: mW (and up)
1/30 to 10 instructions per cycle

Devices on the Chip
Interface with the world
  Digital I/O
  Analog/Digital conversion
  Digital/Analog conversion
Communications
  CAN networks
  Ethernet networks
  Radio
  Serial ports (UART, USART)
  USB, FireWire, ...

Devices on the Chip
Timers
  Trigger interrupts
  Watchdogs
Graphics
  LCD drivers
  2D/3D graphics acceleration
Buses
  On-chip, between devices: AMBA, …
  Off-chip: PCI, HyperTransport, RapidIO, …

Trends
Market
  32-bit market is growing fast
  DSPs are growing fast
Technology
  Configurable processors
  Configurable logic as help
  Fusion of DSP and microcontrollers
  More complex architectures
  Higher integration on each chip
  Multiprocessors on-a-chip

Trends
Hardware to software
  Increase flexibility, lower cost
  Software on fast processor can equal HW
Software to hardware
  Better power consumption & performance
  Design custom hardware for application
Hardware-software codesign
  Delay division HW/SW to late in project
  Obtain “optimal” HW/SW division

Control vs Data
Control plane:
  Microcontrollers
  Decision-making
  “Integer applications”
  UI of a phone, packet routing, …
Data plane:
  Move or process data
  Performance is key
  Signal processing, multimedia, …
  Floating/fixed point

System-on-a-chip
Integration extreme
Thanks to modern semiconductors
Entire product on a chip
  One or more processors, accelerators, …
[Diagram: DSP, LCD driver, CPU, Bluetooth, GSM radio, code memory, and data memory connected by an on-chip bus]

Packaging

Packaging of Processing
Microprocessor
  Standard stand-alone processor chip
Microcontroller
  Processor plus devices
ASIP
  Application-Specific Integrated Processor
ASIC
  Application-Specific Integrated Circuit
FPGA
  Field-Programmable Gate Array

Microcontrollers
Classic embedded hardware
Standard parts
  Quite broad application domains
  Sold in large series
  Defined by hardware vendors
  As cheap as a single dollar
Single processor + devices
  Huge number of variants
  Usually intended for control plane

Example: PIC 12CE674
Memory arch:     Harvard
Program memory:  2048 x 14 (OTP/Flash)
EEPROM:          16 bytes
RAM:             128 bytes
ADC channels:    4 (8 bits)
I/O ports:       6
Timers:          One 8-bit, one WDT
Clock:           on-chip crystal, 10 MHz
Package:         8 pins (Pentium 4: 700 pins)
Cost:            <$1.00 (Pentium 4: >$200.00)

Example: AT91M42800A
ARM7TDMI 32-bit core
  Static design: 0 to 33 MHz
Memory
  8 kB SRAM on chip
  External memory interface, 8/16 bit interface
Devices
  6 timers
  2 serial ports
JTAG debug interface
About 0.5 W power
About 18 USD
144 pin package
One of 13 AT91 variants

ASIPs / ASSPs
Application-specific integrated/standard processor
Targeting a particular niche market
  More targeted than microcontroller
  Domain-specific accelerators
Usually more upscale
  32-bit processors
  Multiprocessors
  Expensive peripherals
  External memory assumed
  Higher performance, includes data-plane

Example: PowerQUICC III
Motorola
Target market
  Communications
Processing
  PowerPC e500
  666-1000 MHz
  256 kB L2 cache
Networking
  CPM module, RISC-based microcode
About 160 USD
Features (count, unit):
  4    Serial Communications Controller (SCC)
  3    Fast Communications Controller (FCC)
  2    Multi-Channel Controller (MCC2)
  2    Serial Management Controller (SMC)
  1    Serial Peripheral Interface (SPI)
  1    I2C controller
  1    DDR Memory controller
  1    PCI-X/PCI controller
  1    RapidIO controller
  2    Ethernet 10/100/1000 controller
Capabilities (count, interface):
  4    Ethernet, 10 (from SCC)
  3    Ethernet, 10/100 (from FCC)
  2    Ethernet 10/100/1000
  2    Utopia II ATM (from FCC)
  256  Multichannel HDLC (from MCC2)

Example: C167CS
Infineon
Target Market
  Automotive control
Processing
  16-bit C16x core
  4-stage simple pipeline
  40 MHz operation
  16 MB memory space, including ROM, RAM, devices
144 pin package
Tolerates -40 C to +125 C
About 25 USD
Memory:
  32 kB  ROM
  3 kB   Fast General Internal RAM (IRAM)
  8 kB   Extension Internal RAM (XRAM)
Devices:
  2      CAN 2.0b controllers
  5      General-Purpose Timers (GPT)
  1      Watch-Dog Timer (WDT)
  1      Pulse-Width Modulator (PWM)
  24+8   Analog-Digital Converter Channels
  1      USART
  2x16   Capture/Compare Channels
  1      Synchronous Serial Comms (SSC)
External Ports:
  2      CAN interfaces
  8      8-bit ports from devices
  1      16-bit ports from devices

Example: TI OMAP 5910
Texas Instruments
Target market
  Data-intense real-time
  Audio, biometrics, etc.
Processing
  Dual-core chip
  ARM925T 150 MHz
  TI C55 DSP 150 MHz
Power 230 mW
Price 32 USD
Devices
  USB 1.1
  LCD controller
  MMC/SD card interface
  Camera interface
  Keyboard interface
  RTC
  I2C
  8 serial ports
  3 UARTs
  14 GPIO pins
[Block diagram: C55x DSP core (24k I$, 64k data SRAM, 96k instr SRAM) and ARM925 CPU core (16k I$, 8k D$, MMU), with 192k shared SRAM, memory controller (75 MHz), LCD controller, and groups of ARM private, ARM shared, DSP private, DSP shared, and system devices]

ASICs
Application-specific integrated circuit
Fully custom hardware
  Custom for your application
  As small or large as necessary
Characteristics
  Expensive to develop
    10s of engineers, often 100s
  Large series necessary to pay off
    At least 100 000 units necessary on average
  Mostly for large companies
  Typically, they become SoCs

ASIC Components: “IP”
IP Blocks
  Intellectual Property
  Companies sell pieces of hardware
Examples:
  CPU Cores
  Memory
  Buses
  Network interfaces
  Accelerator circuits
[Diagram: SoC built from IP blocks: DSP, LCD driver, CPU, Bluetooth, GSM radio, code memory, data memory]

CPU Cores
The biggest “IP” business
Biggest players:
  ARM (best-selling 32-bit architecture)
  MIPS (and its licensees)
Crowded field
  New companies appear monthly
  “Fabless” semiconductor companies
  Tuned for a particular application

Hard vs Soft IP
Hard IP:
  Customer buys a core as black box
  Examples: ARM & MIPS
  Gives good performance
  Hides trade secrets
Soft IP:
  Get HDL code for the component
  Examples: ARC & Tensilica
  Integrate with own or other logic
  Loses some performance

IP Core: ARM 926EJ-S
Core “macrocell”
  CPU core, caches, bus interface, MMU as a package
Instruction sets:
  Von Neumann architecture
  32-bit ARM v5TE ISA
  16-bit THUMB ISA
  Java bytecodes via Jazelle
Processing power:
  Five stage pipeline, scaling to 180-270 MHz
  8 kB icache and 8 kB dcache
Power: 0.2 to 0.9 mW/MHz (P4: >35 mW/MHz)
MMU: for Symbian, Windows CE, Linux

IP Core: MIPS 24k
Macrocell like the ARM926
  Processor, cache, memory interface, MMU, TLBs
Instruction sets:
  MIPS16e
  MIPS32
  User extensions possible, via “CorExtend”
Performance:
  8-stage scalar pipeline, up to 550 MHz
  Configurable cache, up to 64 kB L1 I$ & D$
  Dynamic branch prediction
Aimed at multiprocessor SoCs
  Cache coherency protocol standard
  Almost equivalent to a 1990s server processor

Example: Ericsson Bluetooth
ARM for protocol stack
Memory for the code
Special hardware for RF parts
USB and serial connections
Market
  Aiming at huge volumes
  Component in mobile phones etc.

Producing Your ASIC
Old way: “In-house”
  Build your own fab (everyone did!)
New way: “Silicon Foundries”
  Fabs are getting very expensive
  Specialized fab companies
  Sell manufacturing capacity
  Examples: TSMC, UMC, IBM, TI
  Customers: Nvidia, ATI, Sun, Cisco
= Rise of “fabless” companies

Full-Custom Systems
Volumes are high enough
Needs are special enough
In-house processor design
Examples:
  Ericsson APZ (now defunct)
  Cisco Toaster3 network proc (NPU)
  Ericsson FlexASIC DSP

Cisco Toaster3
8 clusters of 2 processors each
Each TMC is a VLIW machine with 74-bit instructions, 2k instructions in local memory
Two 32-bit ALUs and three control/data movement units per TMC
Total capacity: about 5 GOps, at around 160 MHz
Image from Microprocessor Report, Oct 2002

Cisco Toaster3
Massive multiprocessing
  16 cores on a chip
  4 chips in serial
  Routing:
    10 Gbps @ 20 Mpackets/s
    1000 ops per packet passing through
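The Toaster3 figures on these two slides can be cross-checked with a little arithmetic. The sketch below assumes that the "about 5 GOps" figure counts only the two ALUs per processor at one operation per cycle each; that interpretation is an assumption, not something the slides state.

```python
# Figures taken from the slides.
CLOCK_HZ = 160_000_000          # around 160 MHz
CORES_PER_CHIP = 16             # 8 clusters of 2 processors
ALUS_PER_CORE = 2               # two 32-bit ALUs per TMC
CHIPS_IN_SERIES = 4
PACKETS_PER_S = 20_000_000      # 20 Mpackets/s
LINE_RATE_BPS = 10_000_000_000  # 10 Gbps

# Assumption: one operation per ALU per cycle.
ops_per_chip = CLOCK_HZ * CORES_PER_CHIP * ALUS_PER_CORE            # 5.12e9, "about 5 GOps"
ops_per_packet = ops_per_chip * CHIPS_IN_SERIES / PACKETS_PER_S     # roughly 1000
bytes_per_packet = LINE_RATE_BPS / PACKETS_PER_S / 8                # average packet size implied
```

Under that assumption the numbers are consistent: four chips give about 20 GOps, which spread over 20 Mpackets/s is roughly 1000 operations per packet, and the line rate implies packets of about 62.5 bytes.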

FPGA
Field Programmable Gate Array
  Reconfigurable hardware: “soft logic”
“Program” is circuit layout
  Can be changed after initial load
Kilos to Megs of “gates” available
Competitor to ASICs
  More expensive per unit, but no start-up cost for manufacturing
  Less flexible, slightly slower
  Perfect for low-volume products

FPGA Architecture
Computation cells
  Programmable function
    Adder, Logic funcs, ...
    Memory, Registers, ...
Input/Output cells
Interconnect
  Reconfigurable
  Programmable

FPGA Architecture
Computation cells
  Look-Up Table
    Arbitrary 4-input, 1-output function
  Coarse-grained
    Lots of functionality
    Several LUTs
    Plus flip-flops etc.
  Fine-grained
    Little functionality
[Diagram: configuration RAM feeding a LUT]
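A 4-input look-up table can be modeled as a 16-bit configuration word: the four inputs form an index, and the output is the configuration bit at that index. The sketch below illustrates why such a cell can implement an arbitrary 4-input, 1-output function; the function names and the bit ordering (input `a` as the least significant index bit) are assumptions made for the example, not the layout of any particular FPGA.

```python
def make_lut4(func):
    """Build the 16-bit configuration word for a 4-input, 1-output function."""
    config = 0
    for index in range(16):
        # Unpack the index into four input bits, a = bit 0 ... d = bit 3.
        a, b, c, d = ((index >> i) & 1 for i in range(4))
        if func(a, b, c, d):
            config |= 1 << index
    return config


def lut4_eval(config, a, b, c, d):
    """Evaluate the LUT: the inputs select one bit of the configuration word."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (config >> index) & 1


# Two different "programs" for the same hardware cell.
and4 = make_lut4(lambda a, b, c, d: a & b & c & d)
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
```

Changing the function only changes the 16 configuration bits, which is the sense in which the "program" of an FPGA is its circuit layout.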

FPGA with CPU Cores
CPU on-board FPGA
  HW accelerate critical tasks in FPGA fabric
  Data pumps in FPGA
  Control in CPU
Cool new possibilities
  Reconfigure FPGA online
  Adapt to workloads
[Diagram: FPGA with on-board CPU]

Soft CPUs in FPGAs
Processor in the FPGA fabric
  “Soft” processor
  Special design considerations
Examples
  Altera Nios
  Xilinx MicroBlaze
  Research projects
    Västerås ARM clone
    Leon processor also prototyped

Examples
Altera Apex 20kC: “Volume”
  30k to 1.5M gates
Xilinx Virtex II: “High-end”
  1-4 PPC405 cores (optional)
  10M gates
  Price at about $1000
Altera Stratix: “Advanced”
  10 Mbit RAM
  28 DSP elements
  100000 LE
  1300 user I/O pins
  Optimized for Nios
ATMEL FPSLIC: “Low-end”
  AVR 8-bit CPU
  50k gates

Instruction Sets

IS Architectures
New life for old architectures
  Z80, 6502, 8051, PIC, …, 68000-ColdFire
New career for failed desktops
  MIPS, PowerPC
Fresh architectures
  AVR, dsPIC, V850, SH, …
Digital signal processing
  C5xxx, BlackFin, MSA, 56000, Oak, ...

Instruction Sets
Code size important
Variable instruction length
  Common instructions short
  Short and long branches
  RISC machines with 16-64 bit instructions
  Limited immediate operand sizes
  Two-operand rather than three-operand
Compact and powerful instructions
  Push/pop multiple
  Switch

Instruction Sets
Special-purpose instructions
  Digital Signal Processing
  Bit-manipulation
    Set bit in memory, test bit in memory
  Several memory accesses per instruction
Application-specific
  Fuzzy logic support (68HC12)
  Table interpolation (68300)
  Or even designed by customers!
Do useful things = powerful

Instruction Sets
Compressed instruction sets
  ARM/Thumb & MIPS16
  16-bit encoding of (parts of) 32-bit instruction sets
  Performs better on narrow buses
  Limitations in ARM/Thumb:
    Only access to 8 registers
    No system operations
    No multiply-accumulate
    No general conditional execution

Instruction Sets: Code Size
Some data on code size (second row of each pair: size relative to ARM):

            Thumb     ARM     386    8088   68020   SPARC
eqntott     10608   16768   17640   19106   20542   22256
             0.63    1.00    1.05    1.14    1.23    1.33
xlisp       26388   40768   28097   29401   46746   44648
             0.65    1.00    0.69    0.72    1.15    1.10
espresso    72596  109923  125686  137194  131854  142752
             0.66    1.00    1.14    1.25    1.20    1.30
Source: Microprocessor Report, March 1995
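The ratio rows in the table are each program's code size divided by the ARM size for the same program. A minimal sketch that recomputes them from the absolute sizes on the slide (the dictionary layout and the function name are just illustrative choices):

```python
# Code sizes from the table above, per program and instruction set.
SIZES = {
    "eqntott":  {"Thumb": 10608, "ARM": 16768,  "386": 17640,  "8088": 19106,  "68020": 20542,  "SPARC": 22256},
    "xlisp":    {"Thumb": 26388, "ARM": 40768,  "386": 28097,  "8088": 29401,  "68020": 46746,  "SPARC": 44648},
    "espresso": {"Thumb": 72596, "ARM": 109923, "386": 125686, "8088": 137194, "68020": 131854, "SPARC": 142752},
}


def relative_to_arm(sizes):
    """Return each instruction set's code size as a fraction of the ARM size."""
    return {isa: round(size / sizes["ARM"], 2) for isa, size in sizes.items()}


ratios = {program: relative_to_arm(sizes) for program, sizes in SIZES.items()}
```

Across the three programs Thumb comes out at 0.63 to 0.66 of the ARM size, matching the ratio rows in the table.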

Instruction Sets: Code Size
ARM Thumb: fixed 16-bit size
  Saves 28% compared to 32-bit ARM
  Runs 20% slower than 32-bit ARM
ARM Thumb 2: mixed 16/32
  Saves 26% compared to 32-bit ARM
  Runs 2% slower than 32-bit ARM
  (Note that some new instructions are introduced)
Conclusion: mixed length good!
Source: Microprocessor Report, June 2003

Instruction Sets: Code Size
Compiler makes a difference

           Compiler
Program        A       B       C       D
1           4316    4929    4974    5214
2          16826   18176   26705   15968
3           1632    2594    3450    3244
4           5514   13804   22694  15000+
Source: IAR Internal Benchmarking

Instruction Sets: SIMD
Many applications see gains from SIMD/Vector computation
Add SIMD to regular ISA
  Motorola AltiVec
  ARM SIMD extensions
  MIPS have it too
  x86 MMX-SSE-SSE2-3DNow!
  SPARC VIS

Instruction Sets: SIMD
Target
  Motorola PPC 7455 (G4+)
  1 GHz
EEMBC Telemark suite
Networking suite
OOTB: Out-of-the-box
OPT: Manually tuned to use AltiVec
Overall/Average:
  3-4 times speed up can be expected
[Bar chart: OOTB vs OPT results, axis 0 to 10 with one off-scale bar labeled 35,1, for Autocorr 1, Convolution 1, Bit alloc 1, FFT 1, Viterbi 1, OSPF 1, Route 1, Packet 512]

Instruction Sets: DSP
Pure DSPs
  Not additions to regular ISAs
Very specialized for DSP work
  Known & narrow class of problems
  Optimize for particular algorithms
Categories
  VLIW vs. Regular
  Fixed vs. Floating Point
  Stationary vs. Mobile

Instruction Sets: DSP
TI C64xx
  Fixed-point, 8-way VLIW
  700-1000 MHz, “Fastest DSP”
  Stationary applications
TI C55xx
  Single pipeline, complex instructions
  Up to 300 MHz approx.
  Mobile phones
29 November 2002 Embedded Computer Architecture 66
Instruction Sets: DSP
Assume very regular workloads
  Zero-overhead loop instructions
  Built to wade through large data sets
Register sets
  Accumulators (often 40 bits)
  Data registers (often 16 bits)
  Address registers (16 to 32 bits)
Addressing modes
  Index registers
  Post- & preincrement
  Bit-reverse addressing
  Goal: more parallelizable work per instruction
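Bit-reverse addressing is commonly used to reorder FFT input or output data. The following is a minimal software sketch of the index sequence such an addressing mode steps through; the function names are illustrative and not taken from any DSP tool chain, and a real DSP produces these addresses in hardware as part of the memory access.

```python
def bit_reverse(index, bits):
    """Return `index` with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # shift in the lowest bit of index
        index >>= 1
    return result


def bit_reversed_order(n):
    """Return the bit-reversed visiting order for a buffer of n elements.

    n is assumed to be a power of two, as in a radix-2 FFT.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]


print(bit_reversed_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

On a general-purpose ISA this reordering costs several instructions per element, which is why DSPs fold it into the addressing mode.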
29 November 2002 Embedded Computer Architecture 67
Instruction Sets: DSP
Example instructions from C55:
“Finite impulse response filter”
  FIRSADD Xmem, Ymem, Cmem, ACx, ACy
  Operation:
    ACy = ACy + (ACx * Cmem)
    ACx = (Xmem << #16) + (Ymem << #16)
“Conditional add or sub”
  ADDSUBCC Smem, ACx, TCx, ACy
  Operation:
    If TCx = 1, then ACy = ACx + (Smem << #16)
    If TCx = 0, then ACy = ACx - (Smem << #16)
Cmem, Xmem, Ymem: memory accesses + address updating
C55 DSP has three independent data buses, X, Y, and C
Special condition register
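To make the two operations concrete, here is a rough Python model that follows the formulas on the slide literally. It is a sketch under simplifying assumptions: plain unbounded integers stand in for the accumulators, so accumulator width, saturation, and the address updates of Xmem, Ymem, and Cmem are not modeled, and both FIRSADD results are assumed to be computed from the old value of ACx.

```python
def firsadd(xmem, ymem, cmem, acx, acy):
    """Model of FIRSADD as written on the slide; returns (new_acx, new_acy)."""
    new_acy = acy + acx * cmem              # ACy = ACy + (ACx * Cmem)
    new_acx = (xmem << 16) + (ymem << 16)   # ACx = (Xmem << #16) + (Ymem << #16)
    return new_acx, new_acy


def addsubcc(smem, acx, tcx):
    """Model of ADDSUBCC as written on the slide; returns the new ACy."""
    if tcx == 1:
        return acx + (smem << 16)
    return acx - (smem << 16)


print(firsadd(1, 2, 3, 4, 5))   # (196608, 17)
print(addsubcc(1, 0, 1))        # 65536
print(addsubcc(1, 0, 0))        # -65536
```

Each call corresponds to a single instruction on the DSP, which does the multiply, the adds, and up to three memory reads together.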
29 November 2002 Embedded Computer Architecture 68
Instruction Sets: Configure
Configurable instruction sets
  Adapt to needs of application
  User can specialize the processor
  Less waste on generality
  Fast evolution of instruction sets
Traditionally:
  Chip manufacturers determine instruction sets aimed at some niche
  Slow evolution of instruction sets
29 November 2002 Embedded Computer Architecture 69
Instruction Sets: Configure
Subsetting
  There is a limited and predefined set of instructions available
  Easy to compile for: restrict code gen
  Remove instructions to simplify core
Addition
  Freedom to invent instructions
  Tool chain: assembly, C compilers
  Genuine development of ISAs
29 November 2002 Embedded Computer Architecture 70
Configurable Instruction Sets
Tight integration:
  Add to regular pipeline
  Additional functional units
  Adding fine-grained instructions
Loose integration:
  Coprocessor interface
  Slower communication
  Offloading of macro-scale tasks
  Method to invoke accelerator circuits
29 November 2002 Embedded Computer Architecture 71
Configurability Trend
Pioneers
  Tensilica Xtensa
  ARC ARCtangent
  Configurability as key selling point
Added to general architectures
  MIPS: “CorExtend”
  PowerPC: “BookE ASU”
  Usually less tight integration
29 November 2002 Embedded Computer Architecture 72
Benefit of Configurability
Target
  Xtensa III, 200 MHz
EEMBC Telemark suite, Networking suite
OOTB: Out-of-the-box
  25k gate core
OPT: Tuned code
  25k base core gates
  18k extra instr gates
  100k DSP coproc
  37k config gates
Speedups
Benchmark         OOTB   OPT
Telemark overall     1    37
Autocorr             1     9
Convolution          1  1249
Bit alloc            1    34
FFT                  1    24
Viterbi GSM          1    14
29 November 2002 Embedded Computer Architecture 73
Configuration Tools
[Screenshot of a configuration tool: instruction set choices; gate and memory size counters]
29 November 2002 Embedded Computer Architecture 74
Memory Systems
29 November 2002 Embedded Computer Architecture 75
Microcontroller Memory
RAM:
  Small (32 bytes and up)
  Stacks, variables, loaded code
ROM (or FLASH):
  Large (2 kB and up)
  Programs, constant data
Non-volatile memory
  Settings, writable code areas
29 November 2002 Embedded Computer Architecture 76
Typical Memory Types
[Block diagram: CPU core; Icache; Dcache; TCM (two blocks); special memory, like CAM; ROM / FLASH; RAM; EEPROM; L2 cache; external FLASH; external RAM]
29 November 2002 Embedded Computer Architecture 77
Caches
Used on high-end parts
  32-bit, 64-bit, DSPs
Not like desktop caches
  Smaller, usually only single level
  Often high associativity (128 ways)
    Due to locking
Lockable
  Sets or lines can be locked in cache
  Improve predictability
29 November 2002 Embedded Computer Architecture 78
Tightly-Coupled Memory
Used on high-end parts
  Holds data and/or instructions
Instead of/in addition to caches
  Programmer-controlled
  Fast & close like caches
  In memory map, or tagged like caches
Multiple banks
  Better bandwidth
  Work in one, DMA data to/from other
Special memories:
  Content-addressable memory (CAM)
  Needed for particular applications
29 November 2002 Embedded Computer Architecture 79
On-Chip RAM
High-end parts
  Usually cached
  Faster & cheaper than off-chip memory
Low-end parts
  Only data memory available
  Special “zero-page” memory
Zero-page on 8-bit MCUs
  Small memory with single-cycle access
  Usually 8-bit index, contains 256 bytes
  Small, fast instructions access the memory
  Often usable as extension to register set
29 November 2002 Embedded Computer Architecture 80
FLASH/ROM
Code storage on-chip
FLASH:
  Speed like regular RAM
  Rewritable, typically 1000 times or more
ROM:
  Must be put in silicon masks
  Longer turn-around time
  Guaranteed not to change
FLASH is replacing ROMs, fast
29 November 2002 Embedded Computer Architecture 81
EEPROM
“Electrically-Erasable Programmable Read-Only Memory”
Only writable persistent memory until FLASH appeared
  Infinitely rewritable
Store persistent data
  User settings
  Encryption keys
  Phone numbers etc.
Being replaced with FLASH
  Faster & easier to write
  Cheaper to manufacture, higher capacity
29 November 2002 Embedded Computer Architecture 82
Memory Architecture
Narrow buses
  Saves silicon area, power, complexity
  Especially to off-chip memory
  (Not true for some high-performance parts)
Small registers = small pointers
  16-bit register can only hold 16 bits
  Extend addressing using tricks
    Banks (separate bank register)
    Segments (base + offset)
    Memory remapping (virtual memory)
29 November 2002 Embedded Computer Architecture 83
Hierarchy of Pointers
[Diagram: memory ranges reached by each pointer type]
  Tiny: 8 bits = on-chip zero-page
  Near: 16 bits
  Far: 24 bits
8-bit/16-bit
Design for small pointers
Visible to programmer
  Data placement
  Pointer types
  Linker & compiler
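The pointer widths on this slide translate directly into reachable address ranges. The sketch below works through that arithmetic; the names `POINTER_BITS` and `smallest_pointer` are illustrative only, and the tiny/near/far widths are the 8, 16, and 24 bits shown on the slide.

```python
# Pointer widths in bits, as shown on the slide.
POINTER_BITS = {"tiny": 8, "near": 16, "far": 24}


def addressable_bytes(bits):
    """Number of distinct byte addresses a pointer of this width can hold."""
    return 1 << bits


def smallest_pointer(address):
    """Pick the narrowest pointer type that can hold `address`."""
    for name, bits in sorted(POINTER_BITS.items(), key=lambda item: item[1]):
        if 0 <= address < addressable_bytes(bits):
            return name
    raise ValueError("address does not fit in any pointer type")


for name, bits in POINTER_BITS.items():
    print(name, addressable_bytes(bits))  # tiny 256, near 65536, far 16777216

print(smallest_pointer(0x80))     # tiny
print(smallest_pointer(0x1234))   # near
print(smallest_pointer(0x12345))  # far
```

This is the choice that data placement makes visible to the programmer: data placed in the first 256 bytes can be reached with the cheapest pointer, while anything above 64 KB needs the widest one.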
29 November 2002 Embedded Computer Architecture 84
Memory Architecture
Harvard
  Two or more address spaces
    Program
    Data
  Accompanied by physical separation
Sometimes even more divided
NULL pointer implementation?
  All addresses are valid...
29 November 2002 Embedded Computer Architecture 85
Memory Architecture
Example: ATMEL AVR 8-bit MCU
[Diagram: code space and data space; labels Far, Near, Tiny, 256 B RAM, Registers, I/O]
Read constants?
  Different instructions!
  Very slow process
  Hard to compile for
  Copy to RAM?
29 November 2002 Embedded Computer Architecture 86
Banked Memory
Extend addressing beyond N bits
  Like 8086/80286 segments
Concept:
  Separate memories
  Mapped to same set of addresses
  One memory at a time accessible
  Easier for code than for data
Selecting banks:
  Write value in bank-switch register
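The concept can be modeled in a few lines. The following is a toy simulation, not a description of any particular chip: the class `BankedMemory` and its method names are hypothetical, and the bank count and window size are arbitrary example values.

```python
class BankedMemory:
    """Several separate memories mapped to the same address window."""

    def __init__(self, num_banks, window_size):
        self.banks = [bytearray(window_size) for _ in range(num_banks)]
        self.bank_register = 0  # models the bank-switch register

    def select_bank(self, bank):
        """Write a value to the bank-switch register."""
        if not 0 <= bank < len(self.banks):
            raise ValueError("no such bank")
        self.bank_register = bank

    def read(self, address):
        # Only the currently selected bank is visible at this address.
        return self.banks[self.bank_register][address]

    def write(self, address, value):
        self.banks[self.bank_register][address] = value


mem = BankedMemory(num_banks=4, window_size=256)
mem.select_bank(0)
mem.write(0x10, 0xAA)
mem.select_bank(1)
mem.write(0x10, 0x55)   # same address, different physical memory
mem.select_bank(0)
print(hex(mem.read(0x10)))  # 0xaa
```

The model also hints at why banking is easier for code than for data: every access to data in another bank needs an extra `select_bank` step, and a pointer alone no longer identifies a location.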
29 November 2002 Embedded Computer Architecture 87
Banked Memory
Example: Microchip PIC family
[Diagram: Bank 0 to Bank 3 and code memory; pointer types _bank0, _bank1, _bank2, _bank3, __bank, __constptr; pointer widths 8 bits and 16 bits]
Hardware pointer = efficient
Synthetic pointer to any data bank
29 November 2002 Embedded Computer Architecture 88
Power Aspects
29 November 2002 Embedded Computer Architecture 89
Why is Power Important?
Battery-powered applications
  Longer battery life is very desirable
Automotive applications
  Electronics consumes up to 30% of fuel!
Low-maintenance applications
  Power = heat = cooling = moving parts
Server farms
  Cooling & electricity are big costs
29 November 2002 Embedded Computer Architecture 90
CMOS Power
Power = area * clock * voltage^2
  Area: transistors that are switching
  Clock: speed of switching
  Voltage: to keep running
Save power by minimizing
  Clock speed
  Active area
  Feed voltage
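A small sketch of the proportionality above, using relative (unitless) values where 1.0 is the baseline design. The function name `relative_power` is illustrative; the formula is the one on the slide and ignores constants of proportionality and static leakage.

```python
def relative_power(area, clock, voltage):
    """Dynamic CMOS power relative to a baseline of (1.0, 1.0, 1.0)."""
    return area * clock * voltage ** 2


print(relative_power(1.0, 1.0, 1.0))  # 1.0  baseline
print(relative_power(0.5, 1.0, 1.0))  # 0.5  half the active area
print(relative_power(1.0, 0.5, 1.0))  # 0.5  half the clock speed
print(relative_power(1.0, 1.0, 0.5))  # 0.25 half the feed voltage
```

Voltage is the most rewarding of the three knobs because it enters squared.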
29 November 2002 Embedded Computer Architecture 91
Area Reduction
Simpler chips use less power
  8-bit CPUs
  Simple RISC like ARM
  VLIW instead of superscalar
Turn off inactive units
  Pipelines that are not used
  On-chip memory and caches
  Sleep/Nap/Idle modes on CPU
Remove unnecessary features
29 November 2002 Embedded Computer Architecture 92
Clock Speeds
Clock and voltage related
  Higher operating frequency requires higher voltage
Use lower clock speeds
  Reduce speed until app barely works
Use more processors
  1/2 speed = 1/4 power
  2 CPUs @ 100 MHz = 1 CPU @ 200 MHz, but requires half the power
29 November 2002 Embedded Computer Architecture 93
Dynamic Voltage Scaling
Adjust CPU speed to workload
  Reduce operating voltage when clock speed is reduced
  Cubic power savings possible!
  Analyze load to determine speed
  More advanced than sleep modes
Special hardware required:
  Transmeta Crusoe was a pioneer
  Intel XScale, getting common
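The "cubic" claim and the comparison with sleep modes can be illustrated with an idealized model. The assumptions here are mine, not the slide's: voltage can be scaled in direct proportion to clock speed, idle (sleep) power is zero, and `load` is the fraction of full-speed capacity the workload needs. Real parts have a minimum voltage and nonzero idle power, so actual savings are smaller.

```python
def sleep_mode_power(load):
    """Average relative power when running at full speed and sleeping when idle."""
    if not 0.0 < load <= 1.0:
        raise ValueError("load must be in (0, 1]")
    return load * 1.0  # full power for `load` of the time, zero otherwise


def dvs_power(load):
    """Average relative power when clock and voltage are scaled down to the load."""
    if not 0.0 < load <= 1.0:
        raise ValueError("load must be in (0, 1]")
    clock_scale = load
    voltage_scale = clock_scale  # idealized: voltage tracks clock linearly
    return clock_scale * voltage_scale ** 2  # power = clock * voltage^2


print(sleep_mode_power(0.5))  # 0.5
print(dvs_power(0.5))         # 0.125
```

Under these assumptions, a half-loaded CPU uses a quarter of the power with voltage scaling compared with simply sleeping half the time.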
29 November 2002 Embedded Computer Architecture 94
Power, Voltage, Frequency
Samsung Halla
  ARM 1020E core
  6-stage pipeline (!)
  0.13 um process
  Clock: 400 MHz to 1.2 GHz
[Chart: Voltage (mV) and Power (mW); axis ticks 0 to 1800 and 400 to 1200]
3x clock freq, 9x power!
(source: Microprocessor Report, Oct 16, 2002)
29 November 2002 Embedded Computer Architecture 95
Other Factors
Manufacturing process:
  Smaller features = lower power
    (0.13 micron mobile PIII, for example)
  Tweak process for lower power
Development effort:
  Tweak the low-level chip layout
    (Classic StrongARM)
  “Assembly language programming”
  Cannot be synthesized efficiently = Not possible for IP blocks
29 November 2002 Embedded Computer Architecture 96
System Issue
Much more than CPU
  Displays, LEDs, …
  Memory, Disks, …
  Radio interfaces, Networks, …
Turn off unused peripherals
  GSM phones: 300 hours standby vs. 60 minutes talk time
  Ericsson: reduce frequency of LED blink
29 November 2002 Embedded Computer Architecture 97
Memory & Power
Large area = high power:
  Use smallest possible memory
Talking to DRAM is expensive
  Use on-chip SRAM/ROM
  Reduce external memory activity
  Use caches to keep activity internal
Use energy-efficient RAMs
  Low-power DRAM is coming!
  RAMBUS is horrible
29 November 2002 Embedded Computer Architecture 98
Memory & Power
Energy for a LOAD instruction on an ARM7 development board
code memory  data memory  energy (nJ)  ratio
off-chip     off-chip     115.8        100%
off-chip     on-chip      51.6         44.6%
on-chip      off-chip     76.5         66.1%
on-chip      on-chip      16.4         14.2%
Source: Compilation Techniques for Energy-, Code-Size-, and Run-Time Efficient Embedded Software (Marwedel et al 2001)
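The ratio column can be reproduced from the energy figures. This short check uses only the numbers in the table; the variable names are illustrative.

```python
# Energy per LOAD in nJ, keyed by (code memory, data memory), from the table.
ENERGY_NJ = {
    ("off-chip", "off-chip"): 115.8,
    ("off-chip", "on-chip"): 51.6,
    ("on-chip", "off-chip"): 76.5,
    ("on-chip", "on-chip"): 16.4,
}

baseline = ENERGY_NJ[("off-chip", "off-chip")]

# Each ratio is the energy relative to the all-off-chip case, in percent.
ratios = {
    placement: round(100.0 * energy / baseline, 1)
    for placement, energy in ENERGY_NJ.items()
}

for (code, data), percent in ratios.items():
    print(f"code {code:8s} data {data:8s} {percent:5.1f}%")
```

Keeping both code and data on-chip cuts the energy per load by roughly a factor of seven relative to the all-off-chip case.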
29 November 2002 Embedded Computer Architecture 99
Future Issues: Static
Dynamic Power
  Dissipated when circuits active
  Discussion so far
  Dominant down to 0.13µ
Static Power
  “Leakage current”
  Becoming significant at < 0.13µ
  Much harder to reduce
Major problem looming!
29 November 2002 Embedded Computer Architecture 100
Closing Remarks
29 November 2002 Embedded Computer Architecture 101
This is where the action is!
Fragmented market
  No dominant big player like PCs
  Incredibly wide span of products
Tailor-made, not mass-produced
  Everybody searches for perfect fit
High innovation in comp arch
Large number of new players
29 November 2002 Embedded Computer Architecture 102
Abbreviations
DSP   Digital Signal Processor
NPU   Network Processing Unit
MCU   Microcontroller Unit
ASIC  Application-Specific Integrated Circuit
FPGA  Field-Programmable Gate Array