Chong-Min Kyung (경종민)
In-System Design Verification of Processors
Introduction
• Design Hierarchy
– Macro Instruction Level Simulator (Behavioral)
• General Purpose Register, Memory
– Micro-code Level Verifier + Internal Bus
– Verilog Hardware Model + Clock-cycle Accurate Description
[Figure: a macro instruction (SUB) refined into micro-code (ADD ... end); model levels shown: ISS, Cycle-based, Verilog (HDL)]
What is ISV?
• ISV = In-System Verification
• When is ISV required?
– 1) Design refinement down along the design hierarchy
• Comparison between design levels
[Figure: specification compared directly with the final model Cn, vs. stepwise comparison specification → C1 → C2 → C3 → ... → Cn, where C1: ISS (Instruction Set Simulator), C2: Cycle-based Model, C3: RTL Model]
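What is ISV? (cont'd)
– 2) In-system operation: confirm correct behavior in the system environment
[Figure: the chip and the rest of the system, connected through the chip I/F, realized in four ways: (a) simulation, (b) all-software, (c) emulation with the chip in HW (FPGA) and a slowed-down HW system, (d) Virtual Chip with the chip model in SW and the system in HW]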
Simulation
• Consistency check between models of different abstraction levels
– Instruction Set Simulator (behavioral)
– RTL model (structural)
• Test Vector
– Test Pattern
– Random Pattern
– Test Program
– Application Program
[Figure: SW chip model driven by stimulus at the I/F]
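The consistency check between abstraction levels can be illustrated with a toy version of the macro-SUB example from the design hierarchy. This is a minimal C sketch, assuming a hypothetical machine whose only architectural state is a 16-bit accumulator: the ISS executes SUB in one step, while the micro-code level expands it into a negate followed by an ADD. Both models are driven with the same operands and compared after every macro instruction.

```c
#include <stdint.h>

/* Hypothetical toy machine: a single 16-bit accumulator is the only
 * architectural state that both abstraction levels must agree on. */
typedef struct {
    uint16_t acc;
} ArchState;

/* ISS level: the macro instruction SUB executes in one step. */
static void iss_sub(ArchState *s, uint16_t operand) {
    s->acc = (uint16_t)(s->acc - operand);
}

/* Micro-code level: SUB is expanded into two micro-operations that use
 * only an adder: negate the operand (two's complement), then ADD. */
static void mc_sub(ArchState *s, uint16_t operand) {
    uint16_t tmp = (uint16_t)(~operand + 1u);  /* micro-op 1: tmp = -operand  */
    s->acc = (uint16_t)(s->acc + tmp);         /* micro-op 2: acc = acc + tmp */
}

/* Consistency check: start both models from the same state, feed them the
 * same operand stream, and compare architectural state after every macro
 * instruction. Returns the index of the first mismatch, or -1 if none. */
static int check_consistency(uint16_t init, const uint16_t *ops, int n) {
    ArchState ref = { init };   /* reference model: ISS */
    ArchState dut = { init };   /* model under test      */
    for (int i = 0; i < n; i++) {
        iss_sub(&ref, ops[i]);
        mc_sub(&dut, ops[i]);
        if (ref.acc != dut.acc)
            return i;
    }
    return -1;
}
```

A real flow would compare registers, flags and memory rather than one accumulator, and would draw its operands from the test vector categories listed above.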
Various Levels of Design Verification (Test Vectors in Simulation)
• Test pattern
– Advantage: high efficiency, where efficiency = (# of bugs detected) / (size of test vector)
– Disadvantage: confined to the designer's understanding
• Random pattern
– Advantage: covers rare cases; automatic generation
– Disadvantage: coverage not reliable
• Test program
– Advantage: available as benchmarks; good compromise between coverage & efficiency
– Disadvantage: requires many programs to obtain sufficient coverage
• Application program
– Advantage: simulates real situations; high coverage
– Disadvantage: excessive verification; low efficiency
All-Software Approach
• Modeling System Part in Software
• Test Vector
– System Software (BIOS, OS)
– Application Programs
• Applicable to compatible processor design
• Helps detect bugs
– when the situation is difficult to reproduce with random patterns (e.g., meaningful instruction behavior requires some prior setup of processor state)
– when instruction behavior is complex, e.g., CISC instructions
• Modeling system parts is difficult when
– no source code for the application programs is available
[Figure: SW (system) connected to SW (chip)]
Emulation
• Mapping Gate-level Model in FPGA-based System
• Fast ISV
– in simulation speed
– in design stage
[Figure: HW in FPGA connected to a slowed-down system]
[Figure: time needed to run the same workload at different speed-up factors: 10^7: 1 second; 10^6: 10 seconds; 10^5: 2 minutes; 10^4: 16 minutes; 10^3: 3 hours; 10^2: 1 day; 10^1: 12 days; 1: 3 months. Bands are marked for actual hardware (fastest), logic emulation, and software simulation (slowest); the distance between hardware and simulation speed is labeled the verification gap.]
Concurrent Verification: Early to Market!!
[Figure: development schedules compared. Without emulation (sequential verification): SW design, code, build; HW design, chip fab, chip debug; then hardware integration & debug and system integration & debug, plus back-annotation time. With emulation (concurrent verification): SW design, code, build; HW design, chip fab, chip debug; HW emulation, HW integration & HW debug, and system integration & SW debug proceed alongside chip fabrication, followed by final integration debug.]
Virtual Chip
• Validates the functionality and evaluates the performance of the algorithm in real situations, i.e., with real-world vectors and a real hardware environment
– verifies the algorithm in the early design stage
• Concept of Virtual Chip
[Figure: the system description is split into a SW functional model, written in a programming language (C, C++, ...) and run on a processor, and a HW bus model, written in an HDL (Verilog, ...) and mapped onto an FPGA; the bus model is the bridge between SW and HW]
Virtual Chip [DAC98]
• [DAC98] "Virtual Chip: Making Functional Models Work on Real Target System"
• Example: simulating an ISS with the real target system
– ISV with application programs in the early design stage
[Figure: labels: chip model, host computer, cable, PSG (pin signal generator), daughter board, target board]
Why Virtual Chip?
• No need to model the external system in software, as in the all-software approach
• Inexpensive solution compared to an emulator
– small number of FPGAs
• HW slow-down is not necessary
– no need to modify the target system for emulation
[Figure: Hardware emulation: bus model slowed, target board slowed. Virtual Chip: bus model slowed, target board at normal speed, with a buffer between them.]
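The buffer in the Virtual Chip picture decouples a slowed-down model from a target board running at normal speed. Below is a minimal C sketch of such a decoupling buffer, assuming the software model produces bus transactions that a bus model consumes at the target's pace; the BusTxn fields, the depth of 8 and the stall/idle policy are illustrative assumptions, not the DAC98 design.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bus transaction produced by the software chip model. */
typedef struct {
    uint32_t addr;
    uint32_t data;
    bool     write;
} BusTxn;

#define FIFO_DEPTH 8u   /* assumed depth; must be a power of two */

typedef struct {
    BusTxn   slot[FIFO_DEPTH];
    unsigned head;   /* total pops so far (bus model side)  */
    unsigned tail;   /* total pushes so far (SW model side) */
} TxnFifo;

static void fifo_init(TxnFifo *f) { f->head = f->tail = 0; }

/* Unsigned wrap-around keeps tail - head correct even after overflow. */
static unsigned fifo_count(const TxnFifo *f) { return f->tail - f->head; }

/* Called by the slow functional model; returns false when the buffer is
 * full, in which case the model must stall. */
static bool fifo_push(TxnFifo *f, BusTxn t) {
    if (fifo_count(f) == FIFO_DEPTH)
        return false;
    f->slot[f->tail % FIFO_DEPTH] = t;
    f->tail++;
    return true;
}

/* Called by the bus model at target speed; returns false when nothing is
 * pending, in which case the bus model drives idle cycles to the board. */
static bool fifo_pop(TxnFifo *f, BusTxn *out) {
    if (fifo_count(f) == 0)
        return false;
    *out = f->slot[f->head % FIFO_DEPTH];
    f->head++;
    return true;
}
```

In hardware the consumer side would sit in the FPGA bus model; keeping it as C here only shows the flow-control logic.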
Benefit in Design Time
[Figure: Conventional design flow: architectural model → RTL model → gate-level model → H/W emulation → verification with H/W; the H/W prototype (H/W emulation) is available only late, so board design/H/W design and application S/W sit idle until then. Virtual-Chip-based design flow: the H/W prototype (Virtual Chip) is built from the architectural model, so board design/H/W design and application S/W start early and proceed in parallel with RTL model → gate-level model → H/W emulation → verification with H/W.]
Design time is drastically reduced.
x86-compatible Microprocessor Design
1. HK386: The first step to x86 (1994)
• 300,000 transistors
• 5V, 0.8 µm DLM CMOS technology
• die size: 1cm x 1cm
2. K486: The attempt at full custom (1997)
• 1,000,000 transistors
• 8KB on-chip cache: full-custom design
• die size: 1.5cm x 1.5cm
3. Marcia: Superscalar architecture (1997)
• 3,000,000 transistors
• 3V, 0.6 µm TLM CMOS technology
• die size: 1.2cm x 1.2cm
Overall Functional Verification Flow
[Figure: Architecture Define → Microcode Description → Microcode Verifier → RTL Description (Verilog HDL) → RTL Simulation → Synthesis → Gate Level Simulation → Hardware Emulation → Verification Completed; annotation: for version control]
Design Verification Methodology
[Figure: CPU models at increasing levels of refinement: instruction behavior in C (Polaris), micro-architecture in C, RT level in Verilog, gate level in Verilog, and real H/W; peripherals modeled as the Virtual PC in C language (VPC), in HDL, or as the real mother-board. Labels: MCV, FlexPC, Virtual Chip using PLI.]
• MCV: Microcode Verifier
• PLI: Programming Language Interface
Polaris: ISS (Instruction Set Simulator)
• ISS for x86 processors: Polaris
– a standard reference model for the design of x86 processors
– about 10,000 lines of code written in C
– Polaris can execute all the programs that run on real PCs
– Polaris is used for verifying the functionality of each instruction
• Polaris helps microcode design and debugging by serving as a verified reference model
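At its core, an instruction set simulator such as Polaris is a fetch-decode-execute loop over architectural state. The following is a minimal C sketch of that loop, assuming a hypothetical two-byte toy ISA rather than x86; real x86 decoding, with prefixes, ModR/M bytes and protected mode, is far more involved.

```c
#include <stdint.h>

/* Hypothetical toy ISA (NOT x86): one opcode byte followed by one operand byte. */
enum { OP_HALT = 0, OP_LOADI = 1, OP_ADDI = 2, OP_STORE = 3, OP_JNZ = 4, OP_SUBI = 5 };

typedef struct {
    uint8_t  acc;        /* general-purpose register */
    uint8_t  pc;         /* program counter          */
    uint8_t  mem[256];   /* unified memory           */
    unsigned retired;    /* instructions executed    */
} Cpu;

/* Execute one macro instruction: fetch, decode, execute.
 * Returns 0 when HALT (or an unknown opcode) is reached, 1 otherwise. */
static int iss_step(Cpu *c) {
    uint8_t op  = c->mem[c->pc];                    /* fetch  */
    uint8_t arg = c->mem[(uint8_t)(c->pc + 1)];
    c->pc = (uint8_t)(c->pc + 2);
    c->retired++;
    switch (op) {                                   /* decode */
    case OP_LOADI: c->acc = arg; break;             /* execute */
    case OP_ADDI:  c->acc = (uint8_t)(c->acc + arg); break;
    case OP_SUBI:  c->acc = (uint8_t)(c->acc - arg); break;
    case OP_STORE: c->mem[arg] = c->acc; break;
    case OP_JNZ:   if (c->acc != 0) c->pc = arg; break;
    case OP_HALT:
    default:       return 0;
    }
    return 1;
}

/* Run until HALT or until max_steps instructions have been executed. */
static void iss_run(Cpu *c, unsigned max_steps) {
    while (max_steps-- > 0 && iss_step(c)) {
    }
}
```

Used as a reference model, such a loop is stepped in lockstep with a more detailed model and the architectural state is compared at each instruction boundary.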
MCV (Micro-Code Verifier)
• Behavior simulation at micro-operation level
• Debugging features
– trace of each micro-operation result
– backward operation
– source code trace
[Figure: MCV debugging environment: DOS simulation window; symbolic microcode in execution; internal states (registers and buses); states before executing a microcode can be restored]
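MCV's backward operation, restoring the state that existed before a micro-operation executed, can be obtained by saving the internal state before each step. This is a minimal C sketch, assuming a hypothetical micro-operation format (dst <- src_a + src_b routed through the internal bus) and a fixed-depth history; it shows one way to realize the feature, not necessarily how MCV implements it.

```c
#include <stdint.h>

/* Hypothetical internal state shown in the debugging window. */
typedef struct {
    uint16_t reg[4];   /* registers    */
    uint16_t bus;      /* internal bus */
} MState;

/* Hypothetical micro-operation: reg[dst] <- reg[src_a] + reg[src_b]. */
typedef struct { int dst, src_a, src_b; } MicroOp;

#define HIST_MAX 64

typedef struct {
    MState cur;
    MState hist[HIST_MAX];   /* state saved before each executed micro-op */
    int    depth;
} Tracer;

/* Save the current state, then execute one micro-op.
 * Returns 0 (and does nothing) if the history is full. */
static int step_forward(Tracer *t, MicroOp u) {
    if (t->depth == HIST_MAX)
        return 0;
    t->hist[t->depth++] = t->cur;
    t->cur.bus = (uint16_t)(t->cur.reg[u.src_a] + t->cur.reg[u.src_b]);
    t->cur.reg[u.dst] = t->cur.bus;
    return 1;
}

/* Undo the most recent micro-op by restoring its saved state.
 * Returns 0 if there is nothing to undo. */
static int step_backward(Tracer *t) {
    if (t->depth == 0)
        return 0;
    t->cur = t->hist[--t->depth];
    return 1;
}
```

Because a micro-operation's internal state is small, a full copy per step is cheap; a larger model might store only the fields each micro-operation modifies.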
StreC (Structural Level C Model)
• RTL Model using C language
– A cycle is levelized into 4 phases
– Static scheduling of logic behavior
– No timing delay
• Cycle-based simulator
– High simulation speed (1.4 KHz)
• Structural Analysis of Design
– Signal Flow Graph
– Static timing verification
– Resource estimation at RTL
– RTL floorplan
[Figure: one StreC cycle levelized into four phases, with the evaluation functions called in each phase:
P1_EDGE: CP1_EDGE(); DP1_EDGE(); FP1_EDGE(); SP1_EDGE(); KPP1_EDGE(); BP1_EDGE(); XP1_EDGE();
P1_LEVEL: CP1_LEVEL(); DP1_LEVEL(); FP1_LEVEL(); SP1_LEVEL(); KPP1_LEVEL(); BP1_LEVEL(); XP1_LEVEL();
P2_EDGE: CP2_EDGE(); DP2_EDGE(); FP2_EDGE(); SP2_EDGE(); KPP2_EDGE(); BP2_EDGE(); XP2_EDGE();
P2_LEVEL: XP2_LEVEL_1(); KPP2_LEVEL_1(); CP2_LEVEL_1(); DP2_LEVEL_1(); CP2_LEVEL_2(); FP2_LEVEL_1(); DP2_LEVEL_2(); SP2_LEVEL_1(); FP2_LEVEL_2(); KPP2_LEVEL_2(); BP2_LEVEL_1(); XP2_LEVEL_2();]
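The four-phase, statically scheduled evaluation can be illustrated with a tiny two-block design. This C sketch assumes the phase order P1 edge, P1 level, P2 edge, P2 level; the function names follow the naming pattern of the schedule above, but the blocks (a counter C and a doubler D) and their bodies are hypothetical.

```c
#include <stdint.h>

/* Hypothetical two-block design. Edge functions latch state elements;
 * level functions evaluate combinational logic. */
typedef struct {
    uint8_t c_reg, c_next;   /* block C: counter register and its next value */
    uint8_t d_reg, d_in;     /* block D: register and its input              */
    uint8_t out;             /* combinational output of block D              */
} Design;

static void CP1_EDGE(Design *s)  { s->c_reg = s->c_next; }
static void CP1_LEVEL(Design *s) { s->c_next = (uint8_t)(s->c_reg + 1); s->d_in = s->c_reg; }
static void DP2_EDGE(Design *s)  { s->d_reg = s->d_in; }
static void DP2_LEVEL(Design *s) { s->out = (uint8_t)(s->d_reg * 2); }

/* One clock cycle, levelized into four phases. The call order is fixed at
 * compile time (static scheduling): every level function runs after the
 * values it reads are final, so no event queue and no delays are needed. */
static void cycle(Design *s) {
    CP1_EDGE(s);    /* phase 1: P1 edge  */
    CP1_LEVEL(s);   /* phase 2: P1 level */
    DP2_EDGE(s);    /* phase 3: P2 edge  */
    DP2_LEVEL(s);   /* phase 4: P2 level */
}
```

Avoiding an event queue is what lets a cycle-based model run much faster than an event-driven Verilog simulator, at the price of having no timing information inside the cycle.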
RTL C Model (StreC™)
• RTL description in C
– Functional verification
– Cycle-based simulation
• About 100 times speed-up compared to VCS
– Translated to a Verilog RTL model
– Reduces total simulation time
Simulation speed, and time to run Windows 3.1:
• Polaris: 210 KHz, 20 min.
• MCV: 50 KHz, 50 min.
• StreC: 1.4 KHz, 2 days
• VCS: 17 Hz, 120 days
• Chip: 33 MHz, 12 sec.
[Figure: Conventional method: Verilog simulation checks function and timing together until the Verilog code works. C model method: StreC simulation checks function and static timing verification checks timing, producing working C code, which is then converted into working Verilog code.]
VPC (Virtual PC) library
• Library of PC chipset models
– software model of the PC board
– capable of interfacing to a CPU model at any level
– provides interfaces for the workstation platform
• keyboard, graphic card: X Windows
• floppy disk, hard disk: UNIX file system
– C code of 20,000 lines
• BIOS code
– mostly consists of x86 assembly programs
– speed-critical parts are implemented as C functions
• disk and graphic routines
• register values are transferred via I/O ports
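A PC chipset model that interfaces to a CPU model at any abstraction level can expose a single port-I/O entry point and dispatch to device models by port range, which is also a natural way for a BIOS stub to hand register values to a C routine. The following is a minimal C sketch of that idea; the port numbers 0xE0/0xE1, the doubling "BIOS helper" and the 0xFF value for unmapped ports are hypothetical stand-ins, not VPC's actual interface.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t (*InFn)(void *dev, uint16_t port);
typedef void    (*OutFn)(void *dev, uint16_t port, uint8_t val);

typedef struct {
    uint16_t first, last;   /* inclusive port range claimed by a device */
    InFn     in;
    OutFn    out;
    void    *dev;           /* device model state */
} PortRange;

#define MAX_RANGES 16
static PortRange ranges[MAX_RANGES];
static int n_ranges;

static int io_register(uint16_t first, uint16_t last, InFn in, OutFn out, void *dev) {
    if (n_ranges == MAX_RANGES)
        return 0;
    ranges[n_ranges++] = (PortRange){ first, last, in, out, dev };
    return 1;
}

static const PortRange *io_find(uint16_t port) {
    for (int i = 0; i < n_ranges; i++)
        if (port >= ranges[i].first && port <= ranges[i].last)
            return &ranges[i];
    return NULL;
}

/* Entry points called by the CPU model for IN and OUT instructions. */
static uint8_t io_in(uint16_t port) {
    const PortRange *r = io_find(port);
    return (r != NULL && r->in != NULL) ? r->in(r->dev, port) : 0xFF;  /* assumed open-bus value */
}

static void io_out(uint16_t port, uint8_t val) {
    const PortRange *r = io_find(port);
    if (r != NULL && r->out != NULL)
        r->out(r->dev, port, val);
}

/* Hypothetical helper device: a BIOS stub writes an argument to the port,
 * a C routine computes the result, and the stub reads it back. */
typedef struct { uint8_t result; } BiosHelper;

static void bios_out(void *dev, uint16_t port, uint8_t val) {
    (void)port;
    ((BiosHelper *)dev)->result = (uint8_t)(val * 2);  /* stand-in for a speed-critical C routine */
}

static uint8_t bios_in(void *dev, uint16_t port) {
    (void)port;
    return ((BiosHelper *)dev)->result;
}
```

Because the CPU model only ever calls io_in and io_out, the same chipset model can sit behind an ISS, a micro-code model or an HDL model.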
VPC (Virtual PC) Environment
[Figure: a CPU model (e.g., Intel i386) connects through an x86 interface to the Virtual PC, which contains memory, the PC chipset model, the BIOS (assembly and C routines), debugging features, and interface routines; a platform interface connects the Virtual PC to X Window, the keyboard (via Xlib), and the UNIX file system. Layers: CPU model, PC model, platform; purpose: simulation & debugging.]
HK386
• Design Specification
– compatibility: instruction-level and pin-to-pin compatible with i386
– performance: similar to i386
– operation speed: 40 MHz
– process: 0.8 µm DLM CMOS
• Test Programs
– MS-DOS 6.0, Windows 3.1, Office 4.0
– CAD tools, games, etc.
[Screenshots: MS Windows 3.1, MS Office, MaxPlus II]
HK387
• Design specification
– compatibility: instruction-level and pin-to-pin compatible with i387
– operation speed: 33 MHz
– process: 0.8 µm 2LM CMOS
– performance
• PC Magazine coprocessor benchmark [ops/sec]
– Intel 387: 3206.30
– ULSI 387: 3646.34
– HK 387: 3950.20
– Cyrix 387: 4533.28
[Screenshots: AutoCAD R11, Mathematica 3.0, Design Center, ...]
Simulation Input Vector
• Off-the-shelf test vectors
– Regression test
– Intensive instruction test programs: more than 500 programs
• Random test vector generator (Pandora)
– Template based
– Improves the test coverage
• Real applications
– DOS, Windows
[Figure: Pandora; labels: sequence of testing, determine type of instructions, processor status]
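A template-based random generator like Pandora fixes the shape of each test instruction and randomizes only the fields the template leaves open. Here is a minimal C sketch, assuming a hypothetical template made of a mnemonic, a register set and an immediate range, and using xorshift32 so that a failing test can be regenerated from its seed; Pandora's real template language and its use of processor status are not described in the slides.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static const char *const REG_NAMES[] = { "eax", "ebx", "ecx", "edx" };

/* Hypothetical template: fixed mnemonic, randomized operands. */
typedef struct {
    const char *mnemonic;
    unsigned    n_regs;    /* destination drawn from REG_NAMES[0..n_regs-1]; n_regs <= 4 */
    int32_t     imm_min;   /* immediate range, inclusive; the span must be < 2^32       */
    int32_t     imm_max;
} Template;

/* xorshift32: small deterministic PRNG; the seed must be non-zero. */
static uint32_t next_rand(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill the template's open fields with random values and print one
 * instruction (Intel syntax) into buf. Returns the snprintf result. */
static int gen_instance(const Template *t, uint32_t *seed, char *buf, size_t len) {
    unsigned reg = next_rand(seed) % t->n_regs;
    uint32_t span = (uint32_t)((int64_t)t->imm_max - t->imm_min) + 1u;
    int32_t imm = (int32_t)((int64_t)t->imm_min + next_rand(seed) % span);
    return snprintf(buf, len, "%s %s, %ld", t->mnemonic, REG_NAMES[reg], (long)imm);
}
```

Seeding each generated test explicitly is what makes a random-vector failure reproducible: rerunning with the same seed regenerates the same instruction stream.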
Saver with ‘Modify and Restart’ Capability
• Conventional Saver
– Dumps all running information at arbitrary time points.
– Any modification forces the simulation to be rewound to the beginning.
• Proposed Saver
– Finds the nearest suitable point to save a snapshot, then saves only internal states rather than the whole simulation context.
– Can be restarted at any save point by triggering a signal, even after design modification.
[Figure: conventional vs. proposed; in the proposed saver, the save point is actively adjusted to a stable point]
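The proposed saver's two ideas, deferring the snapshot to the nearest stable point and restarting a modified design from that snapshot, can be sketched in C as follows. The "stable" criterion (no instruction in flight), the state layout and the two design versions are hypothetical, and restarting with a modified design only works if the modification leaves the saved state layout unchanged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical internal state of the simulated design. */
typedef struct {
    uint32_t cycle;
    uint32_t acc;
    int      in_flight;   /* instructions currently in the pipeline */
} SimState;

/* One simulated cycle. The design is passed in as a function so that a
 * bug-fixed version can be swapped in without rewinding to cycle 0. */
typedef void (*DesignFn)(SimState *s);

static bool is_stable(const SimState *s) { return s->in_flight == 0; }

/* Run to the requested cycle, continue to the nearest stable point, and save
 * only the internal state there. Assumes the design reaches a stable point
 * eventually. Returns the cycle at which the snapshot was taken. */
static uint32_t run_and_save(SimState *s, DesignFn step, uint32_t request,
                             SimState *snapshot) {
    while (s->cycle < request || !is_stable(s))
        step(s);
    *snapshot = *s;
    return s->cycle;
}

/* Restart from a save point with a (possibly modified) design for n cycles. */
static void restart(SimState *s, const SimState *snapshot, DesignFn step, uint32_t n) {
    *s = *snapshot;
    for (uint32_t i = 0; i < n; i++)
        step(s);
}

/* Example designs: the pipeline is empty on every third cycle. */
static void design_v1(SimState *s) {
    s->cycle++;
    s->in_flight = (int)(s->cycle % 3);
    s->acc += 1;
}

static void design_v2(SimState *s) {   /* "bug-fixed" version */
    s->cycle++;
    s->in_flight = (int)(s->cycle % 3);
    s->acc += 2;
}
```

Saving only at a stable point keeps the snapshot free of half-finished pipeline state, which is what allows it to be reused after the design logic changes.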
Reduction of Simulation Time
[Figure: timing overhead for a bug-fix without a saver, with a conventional saver, and with the proposed saver, from simulation start through bug detection, signal dump generation for debugging, and debugging. Without a saver, each failed bug-fix requires resimulation from the beginning, which enlarges the debugging loop. Terms in the figure: T_BD, T_SD, T_DBG.]
x86 Emulation Configuration
[Figure: Quickturn hardware emulator connected through a target interface board and a probe module to a slow-down PC]
Debugging Progress Traces
[Figure: instructions executed (thousands, 0 to 20,000) vs. time (weeks, 0 to 41), first under HDL simulation (with the HDL saver attached) and then under hardware emulation (setup, version updates 1 to 3), with DOS and Windows milestones marked]
Caught-Bug Categories
• Test program: typical bugs detected: errors in normal operation; ~30% of running time; ~75% of bugs
• Random test vector: data-related errors; ~5% of running time; > 1% of bugs
• Software emulation: errors at exceptional cases; ~50% of running time; ~20% of bugs
• Hardware emulation: protocol errors in external bus operation; ~20% of running time; ~5% of bugs
1. Test programs and random test vectors are verified concurrently.
2. Exceptional cases of complex instructions are hard to verify fully with test vectors alone.
Conclusions
• ISV (In-System Verification) is a MUST for ensuring that APPLICATION programs work successfully on the WHOLE SYSTEM, and for reducing Time-to-Market.
• We have presented various approaches for in-system verification of microprocessors and DSP processors.
Reference
• J.H. Yang et al., "MetaCore: An Application-Specific DSP Development System," 1998 DAC Proceedings, pp. 800-803.
• J.H. Yang et al., "MetaCore: An Application-Specific Programmable DSP Development System," IEEE Trans. VLSI Systems, vol. 8, April 2000, pp. 173-183.
• B.W. Kim et al., "MDSP-II: 16-bit DSP with Mobile Communication Accelerator," IEEE JSSC, vol. 34, March 1999, pp. 397-404.
Part I : ASIP in general
• ASIP is a compromise between a GPP (General-Purpose Processor), which can be used anywhere but with low performance, and a full-custom ASIC, which fits only a specific application but with very high performance.
• In order of increasing performance and decreasing adaptability: GPP, DSP, ASIP, FPGA, ASIC (sea of gates), CBIC (standard cell-based IC), and full-custom ASIC.
• Recently, ASICs as well as FPGAs contain processor cores.
Cost, Performance, Programmability, and TTM (Time-to-Market)
• ASIP (Application-Specific Instruction set Processor)
– ASIP is a tradeoff between the advantages of a 'general-purpose processor' (flexibility, short development time) and those of an 'ASIC' (fast execution time).
[Figure: ASIC, ASIP, and general-purpose processor compared by execution time, development time, cost (NRE + chip area), and rigidity; annotation: depends on volume of product]
Comparison of Typical Development Time
[Figure: typical development time for MetaCore (ASIP), a general-purpose processor, and an ASIC, split into chip manufacturer time and customer time; segments: MetaCore development, core generation, core generation + application code development, application code development; durations shown: 20 months, 20 months, 10 months, 2 months, 3 months]
Issues in ASIP Design
• For high execution speed, flexibility and small chip area:
– an optimal selection of micro-architecture & instruction set is required, based on diverse exploration of the design space.
• For short design turnaround time:
– an efficient means of transforming a higher-level specification into a lower-level implementation is required.
• For friendly support of application program development:
– fast development of a suite of supporting software, including a compiler and an ISS (Instruction Set Simulator), is necessary.
Various ASIP Development Systems
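• EPICS (Philips), 1993
– instruction set customization by selection from a predefined superset: yes
– user-defined instructions: no
– application programming level: assembly
• CD2450 (Clarkspur), 1995
– selection from a predefined superset: yes
– user-defined instructions: no
– application programming level: assembly
• PEAS-I (Univ. Toyohashi), 1991
– selection from a predefined superset: yes
– user-defined instructions: no
– application programming level: C language
• ASIA (USC), 1993
– generates a proper instruction set based on a predefined datapath
– application programming level: C language
• MetaCore (KAIST), 1997
– selection from a predefined superset: yes
– user-defined instructions: yes
– application programming level: C language
• Micro-architecture styles in the comparison: RISC-like (register-based operation) and DSP-oriented (memory-based operation)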
Part II : MetaCore System
• Verification with co-generated compiler and ISS
• MetaCore system
– ASIP development environment
– Re-configurable fixed-point DSP architecture
– Retargetable system software
• C-compiler, ISS, assembler
– MDSP-II: a 16-bit DSP targeted for GSM applications.
The Goal of MetaCore System
• Supports an efficient design methodology for ASIPs targeted at the DSP application field.
[Figure: diverse design exploration → performance/cost-efficient design; automatic design generation → short chip/core design turnaround time; in-situ generation of application program development tools]
Overview: How to Obtain a DSP Core from MetaCore System
[Figure: design flow. Library inputs: instructions (primitive class: add, sub, and, or, ...; optional class: min, max, mac, ...), functional blocks (adder, multiplier, shifter, ...), and an architecture template (pipeline model, bus structure, data-path structure). Guided by benchmark programs, the designer selects instructions, functional blocks, and architectural parameters, then runs simulation. If the result is not OK: add or delete instructions, add or delete functional blocks, or modify the architecture, and simulate again. If OK: HDL code generation, then logic synthesis.]
System Library & Generator Set: Key Components of MetaCore System
[Figure: the processor specification and the system library feed the generator set (ISS generator, compiler generator, HDL generator), which produces an ISS, a C compiler, and synthesizable HDL code. Benchmark programs are simulated for evaluation, and the designer either accepts the result or modifies the specification. System library contents: architecture template (pipeline model, bus structure, data-path structure); set of instructions (instruction's definition, related functional block); set of functional blocks (parameterized HDL code, I/O port information, gate count). Library entries can be added or modified.]
Processor Specification (example)
• Specification of target core
– defines the instruction set & hardware configuration.
– is easy for designers to use & modify due to high-level abstraction.
//Specification of EM1
(hardware ACC 1 AR 4 pmem 2k, [2047: 0]
)
(def_inst ADD (operand type2) (ACC <= ACC + S1) (extension sign) (flag cvzn) (exestage 1)
...
...
[Annotations: hardware configuration (the hardware form); instruction set definition (the def_inst forms)]
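One way an ISS generator could consume def_inst entries is to turn each one into a row of a dispatch table holding the instruction's semantics, the flags it may update and its execution stage. This is a minimal C sketch with a hand-written table, assuming a 16-bit accumulator and the flag letters c, v, z, n from the example; the AND row is a hypothetical second entry added to show the flag mask, and operand types, sign extension and pipeline timing are not modeled.

```c
#include <stdint.h>
#include <string.h>

/* Flag bits for the letters in "(flag cvzn)". */
enum { F_C = 1, F_V = 2, F_Z = 4, F_N = 8 };

typedef struct {
    uint16_t acc;      /* accumulator (16-bit width assumed) */
    unsigned flags;    /* current c, v, z, n flags           */
} DspState;

/* Semantics: compute the result and every flag the operation produces. */
typedef uint16_t (*SemFn)(uint16_t acc, uint16_t s1, unsigned *flags);

typedef struct {
    const char *name;
    SemFn       op;
    unsigned    flag_mask;   /* flags the instruction may update */
    int         exestage;    /* pipeline stage where it executes */
} InstDef;

/* (ACC <= ACC + S1) with carry, overflow, zero and negative flags. */
static uint16_t sem_add(uint16_t a, uint16_t b, unsigned *f) {
    uint32_t wide = (uint32_t)a + b;
    uint16_t r = (uint16_t)wide;
    *f = 0;
    if (wide > 0xFFFFu)              *f |= F_C;
    if ((a ^ r) & (b ^ r) & 0x8000u) *f |= F_V;   /* result sign differs from both operands */
    if (r == 0)                      *f |= F_Z;
    if (r & 0x8000u)                 *f |= F_N;
    return r;
}

/* Hypothetical second entry: bitwise AND, updating only z and n. */
static uint16_t sem_and(uint16_t a, uint16_t b, unsigned *f) {
    uint16_t r = (uint16_t)(a & b);
    *f = 0;
    if (r == 0)      *f |= F_Z;
    if (r & 0x8000u) *f |= F_N;
    return r;
}

static const InstDef INST_TABLE[] = {
    { "ADD", sem_add, F_C | F_V | F_Z | F_N, 1 },
    { "AND", sem_and, F_Z | F_N,             1 },
};

static const InstDef *lookup(const char *name) {
    for (size_t i = 0; i < sizeof INST_TABLE / sizeof INST_TABLE[0]; i++)
        if (strcmp(INST_TABLE[i].name, name) == 0)
            return &INST_TABLE[i];
    return NULL;
}

/* Apply the semantics, then update only the flags named in the definition. */
static void execute(DspState *s, const InstDef *d, uint16_t s1) {
    unsigned f;
    s->acc = d->op(s->acc, s1, &f);
    s->flags = (s->flags & ~d->flag_mask) | (f & d->flag_mask);
}
```

Keeping the flag mask as data rather than code mirrors the (flag ...) clause: changing which flags an instruction touches means editing the specification, not the simulator.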
Benchmark analysis
• is necessary for deciding the instruction set.
• produces information on
– the frequency of each instruction, to obtain a cost-effective instruction set.
– frequent sequences of contiguous instructions, to be reduced to application-specific instructions.
Frequent sequence of contiguous instructions:
abs a0, ar1 ; a0=|mem[ar1]|
clr a1 ; a1=0
add a1, ar2 ; a1=a1+|mem[ar2]|
cmp a1, a0
bgtz L1 ; if(a1>a0) pc=L1
clr a1 ; a1=0
add a1, a0 ; a1=a1+a0
L1:
Application-specific instruction:
abs a0, ar1
clr a1
add a1, ar2
max a1, a0 ; a1=max(a1, a0)
L1:
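The two statistics the benchmark analysis produces, per-instruction frequency and frequent sequences of contiguous instructions, can be collected from an instruction trace in one pass. This minimal C sketch counts mnemonics and finds the most frequent adjacent pair; longer sequences, such as the four-instruction cmp/bgtz/clr/add pattern, would be found the same way with a longer window. Matching by mnemonic string and the fixed table size are simplifying assumptions.

```c
#include <string.h>

#define MAX_OPS 32

/* Per-mnemonic occurrence counts. */
typedef struct {
    const char *name[MAX_OPS];
    int         count[MAX_OPS];
    int         n;
} Histogram;

/* Return the index of op in the histogram, adding it if new; -1 if full. */
static int intern(Histogram *h, const char *op) {
    for (int i = 0; i < h->n; i++)
        if (strcmp(h->name[i], op) == 0)
            return i;
    if (h->n == MAX_OPS)
        return -1;
    h->name[h->n] = op;
    h->count[h->n] = 0;
    return h->n++;
}

/* Count each mnemonic and each adjacent pair; pair[i][j] counts opcode i
 * immediately followed by opcode j. The most frequent pair is returned in
 * (*best_a, *best_b), or (-1, -1) if the trace has fewer than two entries. */
static void analyze(const char *const *trace, int len, Histogram *h,
                    int pair[MAX_OPS][MAX_OPS], int *best_a, int *best_b) {
    int prev = -1, best = 0;
    h->n = 0;
    memset(pair, 0, sizeof(int) * MAX_OPS * MAX_OPS);
    *best_a = *best_b = -1;
    for (int k = 0; k < len; k++) {
        int cur = intern(h, trace[k]);
        if (cur < 0) {          /* table full: skip and break the sequence */
            prev = -1;
            continue;
        }
        h->count[cur]++;
        if (prev >= 0 && ++pair[prev][cur] > best) {
            best = pair[prev][cur];
            *best_a = prev;
            *best_b = cur;
        }
        prev = cur;
    }
}
```

Instruction counts drive removal of rarely used instructions, while high pair (or longer-window) counts point to candidates for new application-specific instructions.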
HDL Code Generator
[Figure: the HDL code generator turns the processor specification into synthesizable HDL code through three functions:
– Macro-block generation: instantiates the parameter variables of each functional block (memory size, address space, bit-width of functional blocks)
– Connectivity synthesis: connects the I/O and control ports of each functional block to buses and control signals
– Control-path synthesis: generates decoder logic for each pipeline stage
Target core blocks: ALU, multiplier, shifter, register file, data memory 0, data memory 1, AGU1, program memory, peripherals (timer, SIO), controller, BMU]
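Macro-block generation amounts to binding each functional block's parameter variables (bit width, memory size, address space) and emitting an instantiation of the parameterized HDL module from the system library. This minimal C sketch emits one Verilog instantiation with named parameter overrides; the module, instance and parameter names are hypothetical, and port connections are left as a placeholder for the connectivity-synthesis step.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical parameter binding and block description. */
typedef struct { const char *name; long value; } Param;

typedef struct {
    const char  *module;     /* parameterized HDL module from the library */
    const char  *instance;   /* instance name in the generated core       */
    const Param *params;
    size_t       n_params;
} BlockSpec;

/* Append formatted text at buf + *used; returns -1 if it would not fit. */
static int append(char *buf, size_t len, size_t *used, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + *used, len - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= len - *used)
        return -1;
    *used += (size_t)n;
    return 0;
}

/* Emit "module #(.P1(v1), .P2(v2)) instance ( ... );" into buf.
 * Returns the text length, or -1 if buf is too small. */
static int emit_instance(const BlockSpec *b, char *buf, size_t len) {
    size_t used = 0;
    if (len == 0)
        return -1;
    buf[0] = '\0';
    if (append(buf, len, &used, "%s #(", b->module))
        return -1;
    for (size_t i = 0; i < b->n_params; i++)
        if (append(buf, len, &used, "%s.%s(%ld)", i ? ", " : "",
                   b->params[i].name, b->params[i].value))
            return -1;
    if (append(buf, len, &used, ") %s ( /* ports: connectivity synthesis */ );\n",
               b->instance))
        return -1;
    return (int)used;
}
```

Named parameter overrides (.WIDTH(16)) keep the generated text independent of the order in which parameters are declared in the library module.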
Design Example (MDSP-II)
• GSM (Global System for Mobile communication)
• Benchmark programs
– C programs (each algorithm constituting GSM)
• Procedure of design refinement: EM0 → EM1 → EM2 (MDSP-II)
– EM0: initial design containing all predefined instructions
– EM0 → EM1: remove infrequent instructions based on instruction usage count
– EM1 → EM2: turn frequent sequences of contiguous instructions into new instructions
– EM2: final design containing application-specific instructions
Evolution of MDSP-II Core from the Initial Machine
Machine: number of clock cycles (for 1 sec. of voice data processing), gate count
• EM0 (initial): 53.0 million, 18.1K
• EM1 (intermediate): 53.1 million, 15.0K
• EM2 (MDSP-II): 27.5 million, 19.3K
[Figure: number of clock cycles (10M to 50M) vs. gate count (5K to 20K) for EM0, EM1, and EM2 (MDSP-II)]
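If the table is read as a real-time requirement (processing 1 s of voice data must take at most 1 s, an assumption not stated in the slides), each cycle count is also the minimum clock frequency. That puts the refinement in concrete terms. The calculation below uses only the table figures and the 55 MHz operating frequency quoted for the MDSP-II chip:

```latex
\begin{aligned}
f_{\min}(\mathrm{EM0}) &= \frac{53.0\times10^{6}\ \text{cycles}}{1\ \text{s}} = 53.0\ \text{MHz},
\qquad f_{\min}(\mathrm{EM2}) = 27.5\ \text{MHz}\\
\frac{27.5\ \text{M}}{53.0\ \text{M}} &\approx 0.52 \quad (\text{about } 48\%\ \text{fewer cycles}),
\qquad \frac{19.3\ \text{K}}{18.1\ \text{K}} \approx 1.07 \quad (\text{about } 7\%\ \text{more gates})\\
\text{load at } 55\ \text{MHz} &= \frac{27.5\ \text{MHz}}{55\ \text{MHz}} = 0.50
\end{aligned}
```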
Design Turnaround Time (MDSP-II)
• Design turnaround is significantly reduced due to the reduction of HDL design & functional simulation time.
• Only hardware blocks for application-specific instructions, if any, need to be designed by the user.
[Figure: design progress with MetaCore over about 3 months up to tape-out: application analysis 5 weeks; HDL design and functional simulation 1 week; layout and timing simulation 7 weeks]
Overview of EM2 (MDSP-II)
[Figure: block diagram: MCAU, program memory, data memory, PU (SIO, timer), DALU, PCU, AGU]
• MCAU (Mobile Comm. Acceleration Unit): consists of functional blocks for application-specific instructions
– 16x16 multiplier, 32-bit adder
• DALU (Data Arithmetic Logic Unit)
– 16x16 multiplier, 32-bit adder, 16-bit barrel shifter, data switch network
• PCU (Program Control Unit)
• AGU (Address Generation Unit): supports linear, modulo and bit-reverse addressing modes
• PU (Peripheral Unit): serial I/O, timer
• 16-bit fixed-point DSP optimized for GSM; 0.6 µm CMOS (TLM); 9.7mm x 9.8mm; 55 MHz @ 5.0V
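The AGU's three addressing modes can be summarized behaviorally. This minimal C sketch assumes 16-bit addresses, a post-modify step per access, and a circular buffer described by a base address and a length; the slides do not give MDSP-II's register widths or how a mode is selected.

```c
#include <stdint.h>

/* Linear addressing: post-modify by a signed step. */
static uint16_t agu_linear(uint16_t addr, int16_t step) {
    return (uint16_t)(addr + step);
}

/* Modulo (circular-buffer) addressing: the address wraps inside
 * [base, base + length). Assumes addr is inside the buffer and |step| < length. */
static uint16_t agu_modulo(uint16_t addr, int16_t step, uint16_t base, uint16_t length) {
    int32_t off = (int32_t)(addr - base) + step;
    if (off >= length)
        off -= length;
    else if (off < 0)
        off += length;
    return (uint16_t)(base + off);
}

/* Bit-reverse addressing, used to reorder FFT data: reverse the low
 * `bits` bits of the index. */
static uint16_t agu_bitrev(uint16_t index, unsigned bits) {
    uint16_t r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (uint16_t)((r << 1) | (index & 1u));
        index >>= 1;
    }
    return r;
}
```

Modulo addressing lets FIR filter delay lines run without explicit wrap-around instructions, and bit-reverse addressing removes the reordering pass of an in-place FFT, which is why DSP AGUs implement them in hardware.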
Conclusions
• MetaCore, an effective ASIP design methodology for DSP, is proposed.
1) Benchmark-driven, high-level abstraction of the processor specification enables performance/cost-effective design.
2) The generator set with the system library enables short design turnaround time.