platform-based design: an automotive example
TRANSCRIPT
Page 1
Platform-based Design: an Automotive ExamplePlatform-based Design: an Automotive Example
Alberto Sangiovanni VincentelliAlberto Sangiovanni Vincentelli
The Edgar L. and Harold H. Buttner Chair ofThe Edgar L. and Harold H. Buttner Chair of Electr Electr. Eng. And Comp.. Eng. And Comp.ScienceScience
University of California at BerkeleyUniversity of California at Berkeley
Co-Founder, Chief Technology Advisor and Board MemberCo-Founder, Chief Technology Advisor and Board Member
Cadence Design SystemCadence Design System
Founder and Scientific DirectorFounder and Scientific Director
PARADES (Cadence, Magneti-Marelli, ST)PARADES (Cadence, Magneti-Marelli, ST)
Work in collaboration withWork in collaboration with
A.Ferrari, M. Baleani, L. A.Ferrari, M. Baleani, L. Mangeruca Mangeruca (PARADES)(PARADES)
And the Magneti-Marelli/ST Design TeamsAnd the Magneti-Marelli/ST Design Teams
PA
RA
DE
S
June 9, 2000 2
HW Micro-controller Architecture basic componentsHW Micro-controller Architecture basic components
�� Processing Units (CPU)Processing Units (CPU)
�� MemoriesMemories
�� I/O UnitsI/O Units
I/O sub system
CPU
Memory
Page 2
June 9, 2000 3
An Example of Platform space: Micro-controllersAn Example of Platform space: Micro-controllers
I/O
Memory
CPU
40 MIPS
30 MIPS
10 MIPS
June 9, 2000 4
The criteria for selection are manyThe criteria for selection are many......
�� CostCost
�� PerformancePerformance
�� FlexibilityFlexibility
�� ReliabilityReliability
�� SizeSize
�� PowerPower
�� Re-useRe-use
�� ……
Page 3
June 9, 2000 5
Power-Train Control SystemPower-Train Control System
�� Electronic device controlling an internal combustion engine and aElectronic device controlling an internal combustion engine and a
gearboxgearbox
�� The goalThe goal
�� offer appropriate driving performance (e.g. torque, comfort, safety)offer appropriate driving performance (e.g. torque, comfort, safety)
�� minimize fuel consumption and emissionsminimize fuel consumption and emissions
�� Relevant characteristicsRelevant characteristics�� strictly coupled with mechanical partsstrictly coupled with mechanical parts
�� hard real-time constraintshard real-time constraints
�� complex algorithms for controlling fuel injection, spark ignition, throttlecomplex algorithms for controlling fuel injection, spark ignition, throttleposition, gear shift …position, gear shift …
June 9, 2000 6
Engine Management: BehaviorEngine Management: Behavior
�� Failure detection and recovery of input sensors (6 CFSMs + 1Failure detection and recovery of input sensors (6 CFSMs + 1
Timer)Timer)
�� Computation ofComputation of�� engine phase, status and angle (6 CFSMs + 6 Timers)engine phase, status and angle (6 CFSMs + 6 Timers)
�� crankshaft revolution speed and acceleration (3 CFSMs + 1 Timer)crankshaft revolution speed and acceleration (3 CFSMs + 1 Timer)
�� Injection and ignition control law (Injection and ignition control law (18 CFSMs)18 CFSMs)
�� Injection and ignition actuation drivers (Injection and ignition actuation drivers (56 CFSMs + 48 Timers)56 CFSMs + 48 Timers)
Page 4
June 9, 2000 7
Behavioral ValidationBehavioral Validation
InjTop
InjectionParametSync
IgnTop
IgnitionParameteSync
IgnDrvTopCyl1Coil
Cyl1RealAdvance
Cyl2Coil
Cyl2RealAdvance
Cyl3Coil
Cyl3RealAdvance
Cyl4Coil
Cyl4RealAdvance
EngineAngle
EngineSyncStatu
IgnitionParamete
sync
InjDrvTopCyl1Inj
Cyl2Inj
Cyl3Inj
Cyl4Inj
EngineAngle
EngineSyncStatus
InjectionParameter
sync
SyncSysEngineAngle
SystemStop
SystemSyncStatus
sync_t
Synchronization& Data
Acquisition
Angle-driven
DataAcquisition
Time-driven
Ign&InjControl
AlgorithmsInjector&Coil
Drivers
MSE on RPM = 0.29rpmSimulationSimulation––I/O traces of 4s of engine managementI/O traces of 4s of engine management––Fault injection (sensors)Fault injection (sensors)––about 20s on a 450MHz PII CPU with 256MBabout 20s on a 450MHz PII CPU with 256MBRAMRAM
June 9, 2000 8
Single-Core Architectural ModelingSingle-Core Architectural Modeling
Hitachi SH2 or ARM7TDMI core based µ-controller
CPU FUZZY I.S.
RTOSSCHEDULINGPOLICY ANDOVERHEAD ISR
OVERHEAD
TIME UNIT(ATU)
Page 5
June 9, 2000 9
Mapping to SH2: SW Execution timeMapping to SH2: SW Execution time
020406080
100120140160180
Time
GAP TDC MTDC
Task Name
Execution Time (microsec)
Emulation Board
VCC Estimation
8.50%
9.00%
9.50%
10.00%
10.50%
11.00%
11.50%
Error
GAP TDC MTDC
Task
Estimation Error %
Error %
CPU Load = 2%CPU Load Error = 11.3%
June 9, 2000 10
Core Exploration: ARM7TDMICore Exploration: ARM7TDMI
0.0%
5.0%
10.0%
15.0%
20.0%
25.0%
30.0%
35.0%
Speed Up %
GAP TDC MTDC
Task
Speed Up
Speed Up
020406080
100120140160180
Time (microsec)
GAP TDC MTDC
Task
Task Execution Time
SH7055
ARM7TDMI
CPU Load = 1.44%
Page 6
June 9, 2000 11
ARM7TDMI Code SizeARM7TDMI Code Size
Code Size (published, publicly available, benchmarks):Code Size (published, publicly available, benchmarks):
�� ARM7 (32 bit mode) is 25-35% larger than ARM7TDMIARM7 (32 bit mode) is 25-35% larger than ARM7TDMI
�� Hitachi SH2 is 20% larger than ARM7TDMIHitachi SH2 is 20% larger than ARM7TDMI
�� CPU32 (68K) is 30% larger than ARM7TDMICPU32 (68K) is 30% larger than ARM7TDMI
�� PPC is 60-100% larger than ARM7TDMIPPC is 60-100% larger than ARM7TDMI
�� ST10/C167 is 15-35% larger than ARM7TDMIST10/C167 is 15-35% larger than ARM7TDMI
�� M-Core same code sizeM-Core same code size
June 9, 2000 12
I/O exploration (Sensor Management)I/O exploration (Sensor Management)
�� Mapping A Mapping A�� maximize software re-use and portability (i.e. not maximize software re-use and portability (i.e. not optimizedoptimized for Hitachi for Hitachi
architecture, only “essential” timers included)architecture, only “essential” timers included)
�� 1313 CFSMs CFSMs + 5 timers + 5 timers
�� A1 two tasks, A2 three tasksA1 two tasks, A2 three tasks
�� Mapping B Mapping B�� optimizeoptimize for the Hitachi architecture (I.e., utilize special purpose peripheral for the Hitachi architecture (I.e., utilize special purpose peripheral
called ATU that makes more timers available)called ATU that makes more timers available)
�� 1515 CFSMs CFSMs + 7 timers + 7 timers
�� B1 two tasks, B2 three tasksB1 two tasks, B2 three tasks
�� Mapping C Mapping C�� minimize hw/minimize hw/swsw communication introducing a new virtual hardware component communication introducing a new virtual hardware component
(I.e. replaces ATU with additional functionality)(I.e. replaces ATU with additional functionality)
�� 1818 CFSMs CFSMs + 6 timers + 6 timers
Page 7
June 9, 2000 13
Mapping: performanceMapping: performance
Mapping C
CPU load * 1.4%
1528
6
1673
B2B1A2A1
IRQs
Task switching
Task number
6.8%
8626
3
7745
6.7%
8626
2
7616
10.3%
7739
3
7745
10.2%
7739
2
7616
Cpu load%
Taskswitching
IRQs
* Deferred ISR scheme used
June 9, 2000 14
Flash and Code Size TrendFlash and Code Size Trend
2048
16384
131072
1048576
4096
32768
64
512
2048 24576
4640
10
100
1000
10000
100000
1000000
10000000
1990 1995 2000 2005
Kbi
ts
Discrete Flash Embedded Flash Code Size
Page 8
June 9, 2000 15
Flash SpeedFlash Speed
0
20
40
60
0K]
1995 2000
0HPRU\�6SHHG�
Embedded Flash Discrete Flash
Embedded: 60Mhz (ISSCC@1999)External: 20-30Mhz (AMD/STM/INTEL)
June 9, 2000 16
Platform CostPlatform Cost
�� External FlashExternal Flash::�� Cost = Cost(Flash IC) + Cost(MCU) + Cost = Cost(Flash IC) + Cost(MCU) + IntegrationCostIntegrationCost(Flash,MCU)(Flash,MCU)
�� Micro-controller likely to be I/O boundedMicro-controller likely to be I/O bounded
�� Embedded FlashEmbedded Flash::�� Cost=Cost(MCU) + Cost=Cost(MCU) + IntegrationCostIntegrationCost(MCU)(MCU)
Other issues: speed, reliability, re-use, ...Other issues: speed, reliability, re-use, ...
Page 9
June 9, 2000 17
Embedded Flash SolutionEmbedded Flash Solution
Embedded Flash:Embedded Flash:
��Speed: embedded is fasterSpeed: embedded is faster
��Cost: embedded is cheaperCost: embedded is cheaper
��Size: micro-controllers with higher code density reduceSize: micro-controllers with higher code density reducesystem cost since size is mostly dictated by Flashsystem cost since size is mostly dictated by Flash
June 9, 2000 18
Power-Train Control SystemPower-Train Control System
Solutions:Solutions:�� Increase Clock FrequencyIncrease Clock Frequency
� Performance bounded by memory (FLASH)� A cache memory must be introduced.� Higher clock frequency (80Mhz) might have a worse EMI impact.
�� Increase ParallelismIncrease Parallelism� Instruction Level: VLIW (larger code size)� Processor Level: Multiprocessor
Extend behavior to gearbox control
ARM7/SH2 core overloaded
Page 10
June 9, 2000 19
The Dual-Arm ArchitectureThe Dual-Arm Architecture
A symmetric dual processor architecture with a high-bandwidthA symmetric dual processor architecture with a high-bandwidthinterconnection network among processors, memory, and I/Ointerconnection network among processors, memory, and I/Osub-systemssub-systems
The micro-controller has been designed as a collaborative effortThe micro-controller has been designed as a collaborative effortamong:among:
�� PARADES for architecture conceptsPARADES for architecture concepts
�� Magneti-Marelli for peripherals and requirementsMagneti-Marelli for peripherals and requirements
�� ST for detailed architecture and IC designST for detailed architecture and IC design
�� Accent for X-bar switch and Interrupt Controller detailed designAccent for X-bar switch and Interrupt Controller detailed design
�� Cadence for system exploration and evaluation tools and methodologyCadence for system exploration and evaluation tools and methodology
Excellent example of supply-chain integration in electronic system design forExcellent example of supply-chain integration in electronic system design fornext generation platforms!next generation platforms!
June 9, 2000 20
CPU
WatchDo
IC
FirstLevelDecoder
CPU
WatchDo
IC
FirstLevelDecoder
ARBITRER
FLASH2ARBITRER
FLASH1
RAM1
ARBITRER
ARBITRER
RAM2
ARBITRER
ARM7NATIVEBUS
APB
AHB
A2D1 A2D2 TimingUnit
SPICAN IPCP
IPCP IPCP
Generic
CPU1 CPU2
DMA I/O
IRQ I/OFLASHRAM
DMAPortP_36P_37
DMAInterface
P_38
P_39
P_38
P_39DMAInterface
P_38
P_39DMAInterface
Output
DMAInterface
IPCP SPICAN
Dual-ARM ArchitectureDual-ARM Architecture
Page 11
June 9, 2000 21
Why dual-core ?Why dual-core ?
A simple back of the envelope calculationA simple back of the envelope calculation
�� Estimated size:Estimated size:�� (32KB.RAM + 512KB.FLASH) ~ 5.8M (32KB.RAM + 512KB.FLASH) ~ 5.8M TrsTrs. equivalent to 65-75% of. equivalent to 65-75% of
chip areachip area
�� (ARM7TDMI+Debug+IC+...) < 200K (ARM7TDMI+Debug+IC+...) < 200K TrsTrs. equivalent to less than 4%. equivalent to less than 4%of chip areaof chip area
�� The MCU size is driven by FLASH sizeThe MCU size is driven by FLASH size
Dual-core solution:Dual-core solution:�� 4% increment of MCU cost4% increment of MCU cost
�� twofold performancetwofold performance
June 9, 2000 22
Dual-ARM VCC Architectural ModelDual-ARM VCC Architectural Model
PeripheralGatew ay
IRQ
CANIRQ
IRQ
Peripherals
Generic
IRQ
SCI
PERIPHERALS
MasterGateway
DMA
Fast A/DExternalInterface
RAMFLASH
RAM
FLASH
MicroRTOS parent
ARM
IBUS
DBUS
IRQ
MicroRTOS parent
IRQA/D
ARM
IBUS
DBUS
IRQ
RTOS
SchedulerTask
InterruptHandlers
SchedulerStatic
SchedulerStatic
SchedulerStatic
SchedulerStatic
RTOS
SchedulerTask
InterruptHandlers
SchedulerStatic
SchedulerStatic
SchedulerStatic
SchedulerStatic
Page 12
June 9, 2000 23
Sensors
Actuators
HardwareNet
BIOS
Sensors
Actuators
HardwareNet
BIOS
RT
OS
Behavioral MappingBehavioral Mapping
�� Engine Management Engine Management �� CPU1 (as single core) CPU1 (as single core)SeleSpeeed SeleSpeeed �� CPU2 CPU2
Transformation
controls
CPU2CPU1 Transformation
controls
RT
OS
Com
mun
icat
ion
Engine Management
SeleSpeed
June 9, 2000 24
Interrupt LatencyInterrupt Latency
�� XBAR arbitration policy: round robin and 1 clock ownerXBAR arbitration policy: round robin and 1 clock owner
cyclecycle
�� Best Case: 15 Best Case: 15 clksclks
�� Worst Case with 1 master: Worst Case with 1 master: 26 26 clksclks
�� Worst Case with 2 masters: 37 Worst Case with 2 masters: 37 clksclks
�� Worst Case with 3 masters: 42 Worst Case with 3 masters: 42 clksclks
�� A ‘typical ‘ value of 27 A ‘typical ‘ value of 27 clk clk (by simulation)(by simulation)
Page 13
June 9, 2000 25
Interrupt Latency (2)Interrupt Latency (2)
June 9, 2000 26
Output DevicesInput devices
Hardware Platform
I O
Hardware
network
DUAL-CORE
RT
OS
BIOS
Device Drivers
Net
wor
k C
omm
unic
atio
n
DUAL-CORE
Architectural Space (Performance)
Application Space (Features)
Platform DefinitionPlatform Definition
Platform Instance
Application Instances
SystemPlatform(no ISA)
Platform Design SpaceExploration
PlatformSpecification
Platform API
Software Platform
Output DevicesInput devices
Hardware Platform
I O
Hardware
network
HITACHI
RT
OS
BIOS
Device Drivers
Net
wor
k C
omm
unic
atio
n
HITACHI
RT
OS
BIOS
Device Drivers
Net
wor
k C
omm
unic
atio
n
Output DevicesInput devices
Hardware Platform
I O
Hardware
network
ST10
RT
OS
BIOS
Device Drivers
Net
wor
k C
omm
unic
atio
n
ST10
Application Software
Application Software
Page 14
June 9, 2000 27
ConclusionsConclusions
�� Complex application of system design methodology:Complex application of system design methodology:
automotive power-train controlautomotive power-train control
�� Several architectures (different CPU, different I/Os,Several architectures (different CPU, different I/Os,
different memory organization, e.g. external vs. internaldifferent memory organization, e.g. external vs. internal
flash) evaluatedflash) evaluated
�� New dual-core architecture developed and validated forNew dual-core architecture developed and validated for
Magneti Marelli applications by joint design teamsMagneti Marelli applications by joint design teams
(PARADES, Magneti-Marelli, ST)(PARADES, Magneti-Marelli, ST)
June 9, 2000 28
The Past, the Present and the Future of the Dual-The Past, the Present and the Future of the Dual-core Platformcore Platform
�� Architecture Conceived in PARADES in March 1999Architecture Conceived in PARADES in March 1999
�� First Discussion with M.M. and ST in April 1999First Discussion with M.M. and ST in April 1999
�� Specification Team with M.M. and ST started in Sept’99Specification Team with M.M. and ST started in Sept’99
�� Performance exploration started in Dec’99Performance exploration started in Dec’99
�� Close specification in Jan’00 (IRQ list)Close specification in Jan’00 (IRQ list)
�� FPGA Prototype in Oct ’00FPGA Prototype in Oct ’00
�� Tape out Jan ‘01 (0.18 Tape out Jan ‘01 (0.18 µµm)m)
�� First Silicon 2001First Silicon 2001
�� Architecture spins for different applications 2001-2002Architecture spins for different applications 2001-2002
�� MM power-train controller Start-Of-Production mid-end 2003MM power-train controller Start-Of-Production mid-end 2003