low-energy system designbwrcs.eecs.berkeley.edu/faculty/jan/jansweb... · 2000. 1. 3. · why...
Low-Energy System Design
Jan M. Rabaey
BWRC
University of California @ Berkeley
http://bwrc.eecs.berkeley.edu
Why Low-Energy Design? A holistic perspective
Energy = upper bound on the amount of available computation
– Total Energy of Milky Way Galaxy: 10^59 J
– Minimum switching energy for digital gate (1 electron @ 100 mV): 1.6 × 10^-20 J (limited by thermal noise)
– Upper bound on number of digital operations: 6 × 10^78
– Operations/year performed by 1 billion 100 MOPS computers: 3 × 10^24
– Energy consumed in 180 years assuming a doubling of computational requirements every year.
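The bound above is plain arithmetic, and a short sketch makes the chain of numbers easy to check. The constants are the slide's own figures; the variable names and the 365-day year are assumptions made for illustration.

```python
import math

# Figures quoted on the slide.
GALAXY_ENERGY_J = 1e59        # total energy of the Milky Way galaxy
SWITCH_ENERGY_J = 1.6e-20     # one electron moved through 100 mV
COMPUTERS = 1e9               # 1 billion machines
OPS_PER_SECOND_EACH = 100e6   # 100 MOPS per machine

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

# Upper bound on the number of digital operations.
max_operations = GALAXY_ENERGY_J / SWITCH_ENERGY_J

# Operations performed per year at today's rate.
ops_per_year = COMPUTERS * OPS_PER_SECOND_EACH * SECONDS_PER_YEAR

# Years of annual doubling until a single year's demand reaches the bound.
years_to_bound = math.log2(max_operations / ops_per_year)

print(f"max operations  : {max_operations:.2e}")
print(f"operations/year : {ops_per_year:.2e}")
print(f"years to bound  : {years_to_bound:.1f}")
```

Under these assumptions the doubling argument lands at roughly 180 years, in line with the slide.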
Why Low-Energy Design? More down to earth
• Projected energy per digital operation (2004): 50 pJ
• Lithium-Ion: 220 Watt-hours/kg == 800 Joules/gr
• At 50 pJ/operation: 10 teraOps/gr!
– Equivalent to continuous operation at 100 MOPS for 30 hours (or average power dissipation of 6 mW)
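A rough check of the battery arithmetic, as a sketch only. The inputs are the slide's 220 Wh/kg and 50 pJ/operation; the variable names are illustrative. The unrounded results land within a factor of two of the slide's round figures (10 teraOps/gr, 30 hours, 6 mW); the slide does not state how it rounded or whether it derated the battery.

```python
# Figures quoted on the slide.
BATTERY_WH_PER_KG = 220.0      # lithium-ion energy density
ENERGY_PER_OP_J = 50e-12       # projected energy per digital operation (2004)
RATE_OPS_PER_S = 100e6         # continuous operation at 100 MOPS

# 1 Wh = 3600 J and 1 kg = 1000 g.
joules_per_gram = BATTERY_WH_PER_KG * 3600.0 / 1000.0

# Operations one gram of battery can pay for.
ops_per_gram = joules_per_gram / ENERGY_PER_OP_J

# How long one gram lasts at the stated rate, and the power drawn.
hours_per_gram = ops_per_gram / RATE_OPS_PER_S / 3600.0
avg_power_w = RATE_OPS_PER_S * ENERGY_PER_OP_J

print(f"energy density : {joules_per_gram:.0f} J/g")
print(f"operations/g   : {ops_per_gram:.2e}")
print(f"hours at rate  : {hours_per_gram:.1f}")
print(f"average power  : {avg_power_w * 1e3:.1f} mW")
```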
The Battery Limitation
The Distributed Approach to Information Processing
Source: Richard Newton
The Changing Metrics
Flexibility
Power
Cost
Performance as a Functionality Constraint (“Just-in-Time Computing”)
Why Low-Energy Design? The wired perspective
• Electronics becoming sizable chunk of world's energy budget (> 10% in US)
– Major impact on building cost (HVAC)
– Important load on environment
• Americans spend more than 3 B$ each year to power their home electronics when they are switched off! (Source: Energy Star®)
No relief in sight ...
[Figure, two plots. Power density in Watts/cm^2 (axis 0 to 70) for the 386, 486, Pentium (R), and Pentium Pro (R). Power in Watts (log axis, 1 to 10,000) versus year, 1985 to 2010, annotated "100-2,000W" and "Due to 30% Vdd scaling". Source: Intel]
Contradictory to common beliefs that the problem is solved
Surpassed hot-plate power density (10 W/cm^2) in 0.6 µm technology
Major challenge to system cost and reliability
Summary and Perspective
• Power and/or Energy have become dominant drivers
– Cost and reliability limiting factor in wall-plugged applications
– Enabler for wide-spread use of distributed computing and data access
• Major inroads only possible when considered from a systems viewpoint
• Energy reduction requires joint optimization process between application and implementation
A Case Study: The Smart Home
Security, environment monitoring and control, object tagging, identification
Dense network of sensor and monitor nodes
A Case Study: PicoRadio
Sensor and monitor networks for the Smart Home
Properties:
• Stringent requirements on size (< 10 cm^3) and cost (< 25$) per node
• Wired solution too labor intensive; prevents penetration and expansion
• Energy consumption per node must be kept to an absolute minimum; time-between-recharging > years
• System should be self-assembling; and operation should be foolproof
Specifications:
• Large numbers of nodes (between 0.05 and 1 nodes/m^2)
• Limited operation range of network (maximum 50-100 m)
• Low data rates (1 - 10 kbit/sec)
Two Design Extremes
• Option 1: A generic wireless network shared with Multimedia Networking (stream based) and Internet Data Browsing (burst mode)
– Example: 802.11
Incompatibility between requirements makes it impossible to reach stated energy goals
• Option 2: A dedicated hardwired and function-specific sensor node
– Example: Current wireless security systems
Only optimized for one operation point; sub-optimal in real environment with changing conditions
PicoRadio Energy Optimization: The Cost of Communication
[Figure: Transmit Power and Transceiver Power (axes from -70 dBm to 90 dBm) versus Distance (1 m, 10 m, 100 m, 1 km, 10 km) at 100 Kbps. Assumes R^-4 loss due to ground wave (@ 1 GHz)]
Bluetooth goal: 700 Kbps, 10 m, 1 mW Tx
1 megawatt for 100 Kbps!
PicoRadio Energy Optimization: The Varying Communication Distance
[Figure: Transmit Power and Transceiver Power (axes from -70 dBm to 90 dBm) versus Distance (1 m, 10 m, 100 m, 1 km, 10 km) at 100 Kbps. Assumes R^-4 loss due to ground wave (@ 1 GHz)]
Communication versus Computation
• Computation cost (2004): 60 pJ/operation
• Communication cost (thermal energy minimum):
– 100 m distance: 20 nJ/bit @ 1.5 GHz
– 10 m distance: 2 pJ/bit @ 1.5 GHz
• Computation versus Communications
– 100 m distance: 300 operations == 1 bit
– 10 m distance: 0.03 operation == 1 bit
Computation/Communication requirements vary with distance, data type, and environment
Requires Adaptive and Time-Varying Solution
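The operations-per-bit equivalence follows directly from the two cost figures. A minimal sketch, using only the slide's numbers (names are illustrative); it also checks that the two communication costs are consistent with the R^-4 loss assumed elsewhere in the deck.

```python
import math

# Figures quoted on the slide.
COMPUTE_J_PER_OP = 60e-12                 # computation cost (2004)
COMM_J_PER_BIT = {100: 20e-9, 10: 2e-12}  # distance in m -> J/bit @ 1.5 GHz


def ops_equivalent_to_one_bit(distance_m):
    """Number of operations costing as much energy as sending one bit."""
    return COMM_J_PER_BIT[distance_m] / COMPUTE_J_PER_OP


# With R^-4 path loss, 10x the distance costs 10^4 x the energy per bit.
scaled_10m_cost = COMM_J_PER_BIT[100] / (100 / 10) ** 4

for d in (100, 10):
    print(f"{d:>3} m: {ops_equivalent_to_one_bit(d):.3g} operations per bit")
print("R^-4 consistent:", math.isclose(scaled_10m_cost, COMM_J_PER_BIT[10]))
```

The unrounded ratios are about 333 and 0.033, which the slide rounds to 300 and 0.03.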
Communicating over Long Distances: Multi-hop Networks
[Figure: multi-hop route from Source to Dest]
Example:
• 1 hop over 50 m: 1.25 nJ/bit
• 5 hops of 10 m each: 5 × 2 pJ/bit = 10 pJ/bit
• Multi-hop reduces transmission energy by a factor of 125! (ignoring overhead and cost of retransmissions)
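The example can be reproduced with a one-line path-loss model. This sketch assumes energy per bit grows as distance^4 (the R^-4 loss stated elsewhere in the deck), anchored at the slide's 2 pJ/bit for 10 m; function and constant names are illustrative, and overhead and retransmissions are ignored as on the slide.

```python
import math

REF_DISTANCE_M = 10.0          # reference hop length from the slide
REF_ENERGY_J_PER_BIT = 2e-12   # 2 pJ/bit at 10 m
PATH_LOSS_EXPONENT = 4         # assumption: R^-4 loss


def hop_energy(distance_m):
    """Transmission energy per bit for one hop of the given length."""
    return REF_ENERGY_J_PER_BIT * (distance_m / REF_DISTANCE_M) ** PATH_LOSS_EXPONENT


def route_energy(total_distance_m, hops):
    """Energy per bit for a route split into equal-length hops."""
    return hops * hop_energy(total_distance_m / hops)


single_hop = route_energy(50.0, 1)   # 1 hop over 50 m
five_hops = route_energy(50.0, 5)    # 5 hops of 10 m each
print(f"1 hop : {single_hop * 1e9:.2f} nJ/bit")
print(f"5 hops: {five_hops * 1e12:.0f} pJ/bit")
print(f"saving: {single_hop / five_hops:.0f}x")
```

With an exponent of 4, splitting a route into n equal hops divides transmission energy by n^3, which is where the factor 125 = 5^3 comes from.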
Energy-Optimizing Multi-hop Networks
Optimal number of hops needed for free space path loss:
[Equation garbled in extraction: hops_optimal^fs is given as ceil(...) of an expression in dist_Total and 10^γ, where γ is defined in terms of α and β, and ceil is the ceiling function]
α: A constant relating the energy required to transmit a bit successfully for a given set of parameters.
β: A constant relating the computational cost for receiving the bit.
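To illustrate why an optimal hop count exists at all, here is a sketch under an assumed energy model; it is not the slide's formula. The assumption is that each hop costs alpha * d^2 + beta per bit, with alpha scaling the free-space transmit energy and beta a fixed per-hop receive/processing cost. All names and the sample values are hypothetical.

```python
import math


def route_energy_per_bit(total_dist_m, hops, alpha, beta, exponent=2):
    """Assumed model: each hop costs alpha * d**exponent + beta per bit."""
    hop_dist = total_dist_m / hops
    return hops * (alpha * hop_dist ** exponent + beta)


def best_hop_count(total_dist_m, alpha, beta, exponent=2, max_hops=1000):
    """Brute-force search for the integer hop count with minimum energy."""
    return min(
        range(1, max_hops + 1),
        key=lambda n: route_energy_per_bit(total_dist_m, n, alpha, beta, exponent),
    )


# Hypothetical values: 1 pJ/bit/m^2 transmit constant, 100 pJ/bit per hop.
ALPHA = 1e-12
BETA = 1e-10
DIST = 100.0

n_best = best_hop_count(DIST, ALPHA, BETA)
# For exponent 2 the continuous optimum of this model is D * sqrt(alpha / beta).
n_continuous = DIST * math.sqrt(ALPHA / BETA)
print(n_best, n_continuous)
```

More hops shorten each link but pay the fixed per-hop cost more often; the minimum sits where the two effects balance.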
OPNET Network Simulator
Simulations consist of a network of nodes, each defined by its Node Model. The behavior of each block within the Node Model is then described by a state transition diagram defined as the Process Model.
OPNET also features an Analysis Viewer to quickly evaluate data with custom or existing data filters. Less time extracting data and writing post-processing scripts!
OPNET also has editors for PDFs, Packet Formats, Data Probes, Antenna Patterns, Modulation Curves, and Link Models, and has animation capability to visualize dynamic behavior.
[Figure panels: Analysis Viewer, Network Model, Node Model, Process Model]
Example: Table-Driven Network Routing
Assumptions
• Max # nodes represented in single update = 50
• Checkerboard Placement
• Mobiles Enter Stable Network Simultaneously
• No Packet Loss
• Num of Nodes 50 - 55
• Update to Neighbors Only
• DSDV Routing
Network maintains routing information proactively
[Figure panels: Additional Updates Required; Time to Disseminate New Info]
Other options: Source Initiated or Reactive Routing
Adding the Activity Factor
• Energy = activity * cost * intensity_level^n
• Activity in sensor networks is low and random: major opportunity for power management
• Best addressed at the media-access (MAC) layer of the protocol stack
– Non-active nodes should be in sleep mode as much as possible
– Media-access should be such that collisions and retransmissions are minimized
Energy-Efficient Media Access
Example: Carrier-sense multiple access (CSMA) with overlaid locally-synchronized TDMA framing
[Figure: timeline (time axis) for Sender 1 and Sender 2 using CSMA, with RX/TX in sleep mode]
Evaluation tools: statistical analysis, performance simulation (NS, OPNET)
Refinement-based Network Design Methodology
[Figure labels: Communication Request (Data type, BW, latency, BER) between Source (xs,ys) and Dest (xd,yd); Network Layer (Point-to-Point, multi-hop, star); Media Access Layer (T-C-F-DMA); Physical Layer (Band, Modulation); Functional specification of protocol stack in VCC® (Cadence), CFSM Model of Computation; Implementation in hard- and software]
• Based on well-defined abstraction layers
• Step-wise refinement (partitioning, resource mapping and sharing) enables correctness verification
• Automatic synthesis of adaptive protocols in hard- and software
The Implementation Challenge: System-on-a-Chip
[Figure: SOC anno 2010. Blocks: RAM; 500 k Gates FPGA + 1 Gbit DRAM (Preprocessing); Multi-Spectral Imager; µC system + 2 Gbit DRAM (Recognition); Analog; 64 SIMD Processor Array + SRAM (Image Conditioning, 100 GOPS)]
• Embedded applications where cost, performance, and energy are the real issues!
• DSP and control intensive
• Mixed-mode
• Combines programmable and application-specific modules
• Software plays crucial role
The System-on-a-Chip Nightmare
[Image: “Femme se coiffant”, Pablo Ruiz Picasso, 1940]
The System-on-a-Chip Nightmare
[Figure: the “Board-on-a-Chip” approach. Blocks DMA, CPU, DSP, Mem Ctrl., MPEG, C, I, O, O connected through a System Bus, Bridge, Peripheral Bus, Control Wires, and Custom Interfaces. Courtesy of Sonics, Inc]
System-on-a-Chip: A Renaissance in Design
Convergence
• Applications: Multimedia, Consumer, Communications
• Implementation Fabrics: Silicon substrate, Silicon fabrics
• Design Methodology: Hard+Soft
Aart De Geus, DAC’99
The Single-Chip PicoNode
[Figure labels. Layers: Physical + RF, Mac/Data Link, Network, Application. Data: Data Acquisition, Data Encoding, Data Formatting, Mod/Demod, Radio. Control: UI, Synchronization, Slot Allocation, Call Setup. Data and Time Granularity: nsec, µsec, msec, sec; bits, packets, streams, source data]
Yet needs adaptivity and flexibility at all levels of granularity
The Software Radio
[Figure labels: A/D Converter, D/A Converter, DSP]
• Idea: Digitize (wideband) signal at antenna and use signal processing to extract desired signal
• Leverages advances in technology, circuit design, and signal processing
• Software solution enables flexibility and adaptivity, but at huge price in power and cost
• 16 bit A/D converter at 2.2 GHz dissipates 1 to 10 W
The Mostly Digital Radio
[Figure: RF input (fc = 2 GHz), RF filter, and LNA, mixed with cos[2π(2 GHz)t] and sin[2π(2 GHz)t] into I and Q paths, each sampled by an A/D at 50 MS/s and fed to a Digital Baseband Receiver; the chip boundary and the Analog/Digital split are marked]
Architectural Choices
[Figure: Flexibility (vertical axis) versus 1/Efficiency (horizontal axis) for four options: General Purpose µP (Software), Programmable DSP, Hardware Reconfigurable Processor, and Direct Mapped Hardware (Dedicated Logic). Block diagrams show µP, Prog Mem, MAC Unit, Addr Gen, and Satellite Processors]
An Architectural Renaissance
Embedded ARM-8 Microprocessor (Hard IP)
Tensilica Synthesized and Configurable µProcessor (Soft IP)
Courtesy of ARM, Tensilica Inc
An Architectural Renaissance
MorphICs Dynamically Reconfigurable Architecture (DRA) Processor
[Figure labels: DSP Core, Memory, MCU Core, and fixed logic for WCDMA, CDMA, IS-136, GSM, …; DRA Processor (software programmable, hardware reconfigurable); Software Download of WCDMA (mode, param), CDMA (mode, param), WTDMA (mode, param), TDMA (mode, param)]
• SIM Card
• Handset Memory
• POS Programming
• Network Download
• OTA Download
Realizes cost, size and power targets similar to traditional core+hardwired
An Architectural Renaissance
Philips Nexperia NX-2700: a programmable HDTV media processor
Combines Trimedia VLIW with configurable media co-processors
Implementation Fabrics for Data Processing
Adaptive Multi-User Detection: A Direct Mapping Approach
[Figure labels: Digital Baseband Receiver; Data In; Acquisition and Timing Recovery; Adaptive Pilot Correlators C0 ... CL-1, each with a Signal Update Block; Channel Coefficient Estimates; Sk; Adaptive Data Correlator; Data Out]
300 million multiplications/sec, 357 million add-sub’s/sec
Power and area are dominated by MACs and multiplies
Only 36% of power of DSP-processor solution going into arithmetic
The Energy-Flexibility Gap
[Figure: Energy Efficiency in MOPS/mW (or MIPS/mW), log axis from 0.1 to 1000, versus Flexibility (Coverage)]
• Embedded Processors: SA110, 0.4 MIPS/mW
• ASIPs, DSPs: 2 V DSP, 3 MOPS/mW
• Reconfigurable Processor/Logic: Pleiades, 10-80 MOPS/mW
• Dedicated HW
Implementation Fabrics for Protocols
A protocol = Extended FSM
[Figure: Intercom TDMA MAC. Labels: Memory Slot_Set_Tbl (2x16), addr, BUF, slot_set<31:0>, Slot_no<5:0>, W_ENA, R_ENA; signals Slotstart, Pktend, RACHreq, RACHakn, update; FSM states idle, write, read, slotset, RACH]
Intercom TDMA MAC: Implementation alternatives
• ASIC: 1 V, 0.25 µm CMOS process
• FPGA: 1.5 V, 0.25 µm CMOS low-energy FPGA
• ARM8: 1 V, 25 MHz processor; n = 13,000
• Ratio: 1 - 8 - >> 400

         ASIC         FPGA         ARM8
Power    0.26 mW      2.1 mW       114 mW
Energy   10.2 pJ/op   81.4 pJ/op   n*457 pJ/op

Idea: Exploit model of computation: concurrent finite state machines, communicating through message passing
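The quoted ratio of 1 - 8 - >> 400 can be checked against the power row of the table. A small sketch using only the table's numbers (dictionary and variable names are illustrative); the ARM8 energy entry depends on n and is left out.

```python
# Power figures from the table, in mW.
POWER_MW = {"ASIC": 0.26, "FPGA": 2.1, "ARM8": 114.0}

# Energy per operation from the table, in pJ/op (ARM8 omitted: it scales with n).
ENERGY_PJ_PER_OP = {"ASIC": 10.2, "FPGA": 81.4}

# Normalize everything to the ASIC implementation.
power_ratio = {name: mw / POWER_MW["ASIC"] for name, mw in POWER_MW.items()}
energy_ratio = {name: pj / ENERGY_PJ_PER_OP["ASIC"] for name, pj in ENERGY_PJ_PER_OP.items()}

for name, ratio in power_ratio.items():
    print(f"{name}: {ratio:.1f}x ASIC power")
print(f"FPGA: {energy_ratio['FPGA']:.1f}x ASIC energy per op")
```

The FPGA comes out at about 8x and the ARM8 at over 400x the ASIC power, matching the ratio line.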
The Communications Perspective
[Figure: the bus-based design (DSP, CPU, MPEG, Mem Ctrl., C, I, O, O, DMA, Bridge) redrawn with DSP, MPEG, CPU, DMA, C, MEM, I, O attached through Open Core Protocol™ interfaces and Silicon Backplane Agent™ blocks]
Example: “The Silicon Backplane” (Sonics, Inc)
Communications-based Design: Guaranteed Bandwidth, Arbitration
Reconfigurable Computing: Merging Efficiency and Versatility
“Hardware” customized to specifics of problem.
Direct map of problem-specific dataflow, control.
Circuits “adapted” as problem requirements change.
Spatially programmed connection of processing elements.
Multi-granularity Reconfigurable Architecture: The Berkeley Pleiades Architecture
[Figure labels: Control Processor; Satellite Processors (Arithmetic Processor, Configurable Datapath, Configurable Logic, Dedicated Arithmetic); Communication Network; Network Interface; Configuration Bus; Configuration]
• Computational kernels are “spawned” to satellite processors
• Control processor supports RTOS and reconfiguration
• Order(s) of magnitude energy-reduction over traditional programmable architectures
Matching Computation and Architecture
[Figure: Convolution mapped onto AddressGen, Memory, and MAC blocks (two of each) with a Control Processor; labels L, C, G]
Two models of computation: communicating processes + data-flow
Two architectural models: sequential control + data-driven
Example: Covariance Matrix Computation
for (i = 1; i <= length; i++) {
  for (k = i; k <= length; k++) {
    phi[i][k] = phi[i-1][k-1]
              + in[NP-i] * in[NP-k]
              - in[NA-1-i] * in[NA-1-k];
  }
}
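A runnable sketch of the same recurrence, for checking it against the direct definition. The slide shows only the inner update; this version assumes row 0 of phi is seeded with direct sums, and that phi[i][k] is the sum of x[n-i] * x[n-k] for n from NP to NA-2, which is the range the recurrence implies. Function names and the test signal are illustrative.

```python
def covariance_direct(x, NP, NA, i, k):
    """Direct sum over n = NP .. NA-2 (the range implied by the recurrence)."""
    return sum(x[n - i] * x[n - k] for n in range(NP, NA - 1))


def covariance_recursive(x, NP, NA, length):
    """Fill the upper triangle of phi using the slide's diagonal recurrence."""
    phi = [[0] * (length + 1) for _ in range(length + 1)]
    # Assumption: row 0 is seeded directly; the slide does not show this step.
    for k in range(length + 1):
        phi[0][k] = covariance_direct(x, NP, NA, 0, k)
    for i in range(1, length + 1):
        for k in range(i, length + 1):
            # Slide one step along the diagonal: add the sample pair that
            # enters the window, drop the pair that leaves it.
            phi[i][k] = (phi[i - 1][k - 1]
                         + x[NP - i] * x[NP - k]
                         - x[NA - 1 - i] * x[NA - 1 - k])
    return phi


# Arbitrary integer test signal; requires NP >= length and len(x) >= NA - 1.
signal = [(7 * n) % 11 - 5 for n in range(40)]
phi = covariance_recursive(signal, NP=10, NA=40, length=10)
```

Each entry costs two multiplies and two adds instead of a full inner sum, which is why the kernel maps onto a small set of address generators, memories, a multiplier, and ALUs.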
[Figure: mapping of the kernel onto satellites: AddrGen, Mem: in, MPY, AddrGen, Mem: phi, ALU, ALU]
Adaptive Multi-User Detector for W-CDMA: Pilot Correlator Unit Using LMS
[Figure: block diagram with Filter and Coefficient Update sections built from AG, MEM, MUL, MAC, ACC, ADD, SUB, and alt blocks; signals s_r, s_i, y_r, y_i, Zmf_r, Zmf_i]
Architecture Comparison: LMS Correlator at 1.67 MSymbols Data Rate
Complexity: 300 Mmult/sec and 357 Macc/sec
Note: TMS implementation requires 36 parallel processors to meet data rate - validity questionable
16 Mmacs/mW!
Data-driven Synchronization Based on Finite Streams
• “Smart” satellites able to handle data inputs of different types
• Support of multi-dimensional signal processing
• Introduction of data types: scalars, vectors, matrices
[Figure: MPY and MAC satellites with inputs and outputs labeled 1 or n]
Interconnect network
Generalized Mesh
• A switch box at each cross-point
• Compared to cross-bar:
» Shorter average interconnect length
» Fewer switches per connection
Hierarchical Generalized Mesh
• A mesh structure within local clusters
• A higher-level mesh to connect clusters
• Compared to pure mesh:
» Smaller switch sizes and fewer switches per connection
» Fewer wires and switchboxes
Maia: Reconfigurable Baseband Processor for Wireless
• 0.25 µm tech: 4.5 mm x 6 mm
• 1.2 Million transistors
• 40 MHz at 1V
• 1 mW VCELP voice coder
• Hardware
• 1 ARM-8
• 8 SRAMs & 8 AGPs
• 2 MACs
• 2 ALUs
• 2 In-Ports and 2 Out-Ports
• 14x8 FPGA
The Software-Defined Radio
[Figure labels: Reconfigurable DataPath; FPGA; Embedded uP; Dedicated FSM; Dedicated DSP]
TCI - A First Generation PicoNode
[Figure labels: Tensilica Embedded Proc.; Memory Sub-system; Programmable Protocol Stack; Configurable Logic (Physical Layer); Baseband Processing; Sonics Backplane]
Architecture Design Methodology
• Requires architecture exploration over heterogeneous implementation fabrics
• Should support refinement and co-design of behavior and architecture, as well as hardware and software
• Communication analysis is as important as computation
• Should consider all important metrics, and present PDA (Power-Delay-Area) perspective
Merging Behavior and Architecture
Fast Design Space Exploration: Architecture Models
Example: Retargetable estimation [Ghazal]
[Figure labels: Application (Generic C code); Designer’s Input: Architect, Architectural Choices, Architecture Parameters; Parameterized Architecture Model; Profiler; Retargetable Estimator; Output: Estimate, Profile]
Fast Design Space Exploration: Interconnect Models
[Figure: interconnect options between modules: Multi-Bus (N Inputs, B Buses, M Outputs), Mesh, and Hierarchical Mesh (clusters)]
Model:
• Interconnect energy and delay model
• Algorithm mapping
• Graph-based place and route
Pleiades Mapping Flow
[Figure labels: Algorithms (C++); Kernel Detection; Estimation/Exploration; Power & Timing Estimation of Various Kernel Implementations; PDA Models; Premapped Kernels; Partitioning; Software Compilation; Reconfig. Hardware Mapping; Interface Code Generation; Behavioral C++ Module Libraries; SUIF + C-IF; µproc & Accelerator]
Abstract Architecture Design enables Circuit Innovation
[Figure: Dhrystone 2.1 MIPs (axis 0 to 100) versus Energy in mW/MIPs (axis 0.0 to 6.0), with curves for Entire System and CPU Only, using an integrated dc-dc converter; 90% converter efficiency @ high speed, 80% converter efficiency @ lowest speed]
Energy Scavenging: The Holy Grail in Low-Energy System Design
Integrated micro-vibrator provides 10-100 µW of free power (equivalent to 2340 free DSP operations/sec) [Amirtharajah & Chandrakasan, DISPS99]
Other options: solar energy (1 mW/cm^2), acoustic and mechanical vibrations, pressure ...
Summary
• Low-energy design ascends to prime time, forced mainly by the last-meter problem
• Design for low-energy impacts all stages of the design process: the earlier the better
• Energy reduction requires clear communication and computation abstractions
• Efficient and abstract modeling of energy at behavior and architecture level is crucial
• Low-energy embedded system design causes the emergence of innovative and non-intuitive implementation paradigms