Page 1
Validating A Modern Microprocessor
Bob Bentley
Intel Corporation
Enterprise Microprocessor Group
Hillsboro, Oregon U.S.A.
Page 2
Moore’s Law - 1965
Source: Intel Museum
Page 3
Moore’s Law - 40 Years Later
A new process every two years

Process Name      P854     P856     P858     Px60     P1262   P1264   P1266
1st Production    1995     1997     1999     2001     2003    2005    2007
Lithography       0.35 µm  0.25 µm  0.18 µm  0.13 µm  90 nm   65 nm   45 nm
Gate Length       0.35 µm  0.20 µm  0.13 µm  <70 nm   <50 nm  <35 nm  <25 nm
Wafer Size (mm)   200      200      200      200/300  300     300     300

Source: Intel
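The lithography row can be checked with a little arithmetic. The sketch below assumes, as an idealization not stated on the slide, that the area of a fixed layout scales with the square of the lithography dimension; the name `area_ratios` is hypothetical.

```python
# Lithography dimensions from the table above, converted to nanometres.
NODES_NM = [350, 250, 180, 130, 90, 65, 45]


def area_ratios(nodes_nm):
    """Return the relative area of a fixed layout after each process step.

    Assumption: area scales with the square of the lithography dimension.
    """
    return [(new / old) ** 2 for old, new in zip(nodes_nm, nodes_nm[1:])]


if __name__ == "__main__":
    steps = zip(NODES_NM, NODES_NM[1:])
    for (old, new), ratio in zip(steps, area_ratios(NODES_NM)):
        print(f"{old} nm -> {new} nm: area x{ratio:.2f}, density x{1 / ratio:.2f}")
```

Under that assumption each step shrinks area to roughly half, which is the per-generation doubling the talk attributes to Moore’s Law.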
Page 4
300mm Semiconductor Economics
Fab                 $3 billion
Pilot line          $1-2 billion
R&D process team    $0.5-1 billion
$5 billion investment requires high volume to achieve reasonable unit cost
Source: Intel
Page 5
The Validation Challenge
Microprocessor validation continues to be driven by the economics of Moore’s Law
– Each new process generation doubles the number of transistors available to microprocessor architects and designers
– Some of this increase is consumed by larger structures (caches, TLB, etc.), which have no significant impact on validation
– The rest goes to increased complexity:
    – Out-of-order, speculative execution machines
    – Deeper pipelines
    – New technologies (Hyper-Threading, 64-bit extensions, virtualization, security, …)
    – Multi-core designs
– Increased complexity => increased validation effort and risk
High volumes magnify the cost of a validation escape
Page 6
Microprocessor Design
Page 7
Microprocessor Design Scope
Typical lead CPU design requires:
– 500+ person design team:
    – logic and circuit design
    – physical design
    – validation and verification
    – design automation
– 2-2½ years from start of RTL development to A0 tapeout
– 9-12 months from A0 tapeout to production qual (may take longer for workstation/server products)
One design cycle = 2 process generations
Page 8
Pentium® 4 Processor
RTL coding started: 2H’96
– First cluster models released: late ’96
– First full-chip model released: Q1’97
RTL coding complete: Q2’98
– “All bugs coded for the first time!”
RTL under full ECO control: Q2’99
RTL frozen: Q3’99
A-0 tapeout: December ’99
First packaged parts available: January 2000
First samples shipped to customers: Q1’00
Production ship qualification granted: October 2000
Page 9
RTL – A Moving Target
[Chart: RTL check-in activity over time, 1996-02 to 2001-04. Series: # Files Checked In, Total # Lines of RTL, # Lines Changed. Vertical axes: 0 to 4E+06 and 0 to 8000.]
Chart annotations:
– First Full-Chip RTL Model
– RTL Coding Complete
– A0 tapeout
– 250K lines changed in one week
– Phase labels: Functionality Focused, Timing Focused
3000 files, 1.3M lines total (including comments, white space)
Page 10
Microprocessor Validation
Page 11
RTL validation environment
RTL model is MUCH slower than real silicon
– A full-chip simulation with checkers runs at ~20 Hz on a Pentium® 4 class machine
– We use a compute farm containing ~6K CPUs running 24/7 to get tens of billions of simulation cycles per week
– The sum total of Pentium® 4 RTL simulation cycles run prior to A0 tapeout < 1 minute on a single 2 GHz system
Pre-silicon validation has some advantages …
– Fine-grained (cycle-by-cycle) checking
– Complete visibility of internal state
– APIs to allow event injection
… but no amount of dynamic validation is enough
– A single dyadic extended-precision (80-bit) FP instruction has O(10**50) possible combinations
– Exhaustive testing is impossible, even on real silicon
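The figures on this slide can be cross-checked with back-of-envelope arithmetic. The sketch below uses only the slide's numbers (20 Hz, ~6K CPUs, a 2 GHz system, 80-bit operands). It assumes every farm CPU runs full-chip simulation all week, so the result is an upper bound, and it assumes one operation per cycle on silicon. It counts raw operand pairs only and does not attempt to reproduce the slide's O(10**50) estimate.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600

SIM_RATE_HZ = 20            # full-chip RTL simulation speed, from the slide
FARM_CPUS = 6_000           # approximate farm size, from the slide
SILICON_HZ = 2_000_000_000  # the 2 GHz system used for comparison

# Upper bound: assumes every CPU runs full-chip simulation all week.
cycles_per_week = SIM_RATE_HZ * FARM_CPUS * SECONDS_PER_WEEK

# The same number of cycles expressed as run time on real silicon.
silicon_seconds_per_week = cycles_per_week / SILICON_HZ

# Raw operand pairs for one two-operand (dyadic) 80-bit instruction.
operand_pairs = 2 ** (2 * 80)

# Time to try every pair at one operation per cycle (an assumption).
years_on_silicon = operand_pairs / SILICON_HZ / (365 * 24 * 3600)

if __name__ == "__main__":
    print(f"farm cycles per week (upper bound): {cycles_per_week:.3e}")
    print(f"equivalent 2 GHz silicon time: {silicon_seconds_per_week:.1f} s")
    print(f"operand pairs: {operand_pairs:.3e}")
    print(f"years to enumerate on silicon: {years_on_silicon:.3e}")
```

Even at the upper bound, a week of farm time corresponds to well under a minute of silicon time, and enumerating the operand pairs of a single instruction is out of reach by many orders of magnitude.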
Page 12
Pentium® 4 Formal Verification
First large-scale effort at Intel (~60 person years) to apply formal verification techniques to CPU design
– Applying FV to a moving target is a big challenge!
Mostly model checking, with some later work using theorem proving to connect FP proofs to IEEE 754
More than 14,000 properties in key areas:
– FP Execution units
– Instruction decode
– Out-of-order control mechanisms
Found ~20 “high quality” bugs that would have been hard to detect by dynamic testing
No silicon bugs found to date in areas proved by FV
Page 13
Sources of Bugs
[Chart: bug counts by quarter, Q4'96 to Q4'99, split by source: CTE, Full-chip, Inspection, Other. Vertical axis: 0 to 1200.]
Page 14
Unit-Level Coverage
Page 15
Pre-silicon Bug Rate
[Chart: pre-silicon bug rate over time. Horizontal axis labelled by week number and year, 40 '98 to 51 '99, in steps of three weeks. Vertical axis: 0 to 100.]
Page 16
Formal Verification
Page 17
Pentium® 4 Formal Property Verification
Objective:
– Complement other validation activities
– Correctness, not bug hunting
Strategy:
– Prove unit properties first, then multiple-unit protocols & assumptions
– Maintain properties in face of an evolving design
Tools:
– Model checking
– Symbolic simulation
– Theorem proving
[Figure: Pentium® 4 Basic Block Diagram. Labels: System Bus, Bus Unit, Level 2 Cache, Memory Subsystem, Level 1 Data Cache, Execution Units, Integer and FP Execution Units, Fetch/Decode, Trace Cache, Microcode ROM, BTB/Branch Prediction, Front End, Out-of-order execution logic, Retirement, Out-of-order Engine, Branch History Update]
Page 18
Floating Point Verification - Multiplication
Verified adherence to IEEE 754 spec including results, flags & faults
Design specified as very low-level RTL
Huge capacity challenge for traditional MC techniques
Verification required multiple person years and iterations
[Figure: multiplier block diagram. Labels: Exponent datapath, Mantissa datapath, Booth Encoder, Partial Products generator, Wallace Tree Adder Network, Rounder logic, CONTROL, S1, S2]
Ultimate solution: combined bit-vector level checked results using theorem proving
Proof framework used on subsequent designs
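To make the capacity problem concrete, here is a minimal integer-only sketch of the front part of such a datapath: Booth recoding, partial product generation, and a sum checked exhaustively against a reference at 8 bits. Radix-4 recoding is an assumption (the slide names only a "Booth Encoder"), a plain sum stands in for the Wallace tree, and exponent handling, rounding, flags and faults are all left out. The function names are hypothetical, and `bits` must be even.

```python
def booth_digits(y, bits):
    """Recode a signed `bits`-wide multiplier into radix-4 Booth digits (-2..2)."""
    y &= (1 << bits) - 1  # view the operand as a two's-complement bit pattern

    def bit(j):
        return 0 if j < 0 else (y >> j) & 1

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
            for i in range(bits // 2)]


def booth_multiply(x, y, bits=8):
    """Multiply signed operands by summing Booth partial products."""
    partial_products = [digit * x * 4 ** i
                        for i, digit in enumerate(booth_digits(y, bits))]
    # A plain sum stands in for the Wallace tree adder network.
    return sum(partial_products)


def first_mismatch(bits=8):
    """Compare every operand pair with Python's multiply; return a failing pair or None."""
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    for x in range(lo, hi):
        for y in range(lo, hi):
            if booth_multiply(x, y, bits) != x * y:
                return x, y
    return None


if __name__ == "__main__":
    print("first mismatch at 8 bits:", first_mismatch(8))
```

At 8 bits the check covers 65,536 operand pairs. Each extra operand bit multiplies that count by four, so the same brute-force approach is hopeless at real mantissa widths.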
Page 19
Combining Formal and Dynamic Verification
On Pentium® 4, we treated FPV and dynamic verification as essentially independent activities
Current projects are seeking to exploit synergy between the two techniques
Currently using SAT solver technology as a bridge between the two worlds
– SAT solvers provide much greater capacity, reducing or eliminating the need for problem decomposition
– Allows us to do bug hunting (falsification) in addition to verification
– Use dynamic validation to confirm or refute counter-examples
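The falsify-then-replay flow can be sketched on a toy example. The code below is an illustration under stated assumptions, not Intel's tooling: a tiny textbook DPLL procedure stands in for an industrial SAT solver, the "design" is a one-bit sum with a deliberately planted bug, and all names are hypothetical. A specification and an implementation are encoded as clauses, the solver is asked for an input on which they differ, and that input is then replayed through an ordinary simulation.

```python
def dpll(clauses, assignment=None):
    """Tiny DPLL search: return a model {var: bool} or None if unsatisfiable."""
    assignment = dict(assignment or {})
    simplified = []
    for clause in clauses:
        remaining, satisfied = [], False
        for lit in clause:
            var, want = abs(lit), lit > 0
            if var not in assignment:
                remaining.append(lit)
            elif assignment[var] == want:
                satisfied = True
                break
        if satisfied:
            continue
        if not remaining:
            return None  # every literal is false: conflict
        simplified.append(remaining)
    if not simplified:
        return assignment
    # Prefer a unit clause, otherwise branch on the first open literal.
    lit = next((c[0] for c in simplified if len(c) == 1), simplified[0][0])
    for value in (lit > 0, not lit > 0):
        model = dpll(simplified, {**assignment, abs(lit): value})
        if model is not None:
            return model
    return None


# Tseitin encodings: each gate constrains `out` to equal its function.
def gate_and(out, a, b):
    return [[-out, a], [-out, b], [out, -a, -b]]

def gate_or(out, a, b):
    return [[out, -a], [out, -b], [-out, a, b]]

def gate_xor(out, a, b):
    return [[-out, a, b], [-out, -a, -b], [out, -a, b], [out, a, -b]]


A, B, SPEC, IMPL, MITER, T1, T2 = range(1, 8)  # CNF variable numbers

BUGGY_IMPL = gate_or(IMPL, A, B)  # planted bug: OR used instead of XOR
FIXED_IMPL = gate_or(T1, A, B) + gate_and(T2, A, B) + gate_and(IMPL, T1, -T2)


def find_counterexample(impl_clauses):
    """Return inputs (a, b) where the implementation differs from the spec, or None."""
    cnf = (gate_xor(SPEC, A, B) + impl_clauses
           + gate_xor(MITER, SPEC, IMPL) + [[MITER]])
    model = dpll(cnf)
    if model is None:
        return None  # no differing input exists: equivalence holds
    return model.get(A, False), model.get(B, False)


def replay(a, b):
    """Dynamic confirmation: simulate the spec and the buggy design on one input."""
    return (a != b), (a or b)


if __name__ == "__main__":
    cex = find_counterexample(BUGGY_IMPL)
    print("counterexample:", cex, "spec/impl outputs:", replay(*cex))
    print("fixed design:", find_counterexample(FIXED_IMPL))
```

An unsatisfiable result is a proof that the two circuits agree on every input. A satisfying assignment is a concrete test that a designer can rerun in simulation.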
Page 20
High-Level Formal Verification
Formal Design Process: Work with architects & designers while design is being developed
– Abstract model of architecture built and formally checked
– Fast focus on algorithm problems
– Intensive and quick feedback for every definition change
– Find errors in early design approaches
Goal: Completely verify architectural correctness
– Avoid finding architectural bugs late in design cycle, when fixing them is much more difficult and costly
– Drive rigor, completeness and correctness of the high level design
– Speed up design phase based upon complete specification
– Cleaner, simpler design – fewer warts
Page 21
Verifying Architectural Correctness
Top Level Specification
Architectural Feature Model (AFM)
– Declare players, describe their roles in upholding top-level specification
– Model check for violations
Microarchitectural Block Model (UBM)
– Player event oriented detail: case-by-case description of how each block reacts to communications
– Verify by inserting UBM into AFM
– Fully expose and understand uarch and role in supporting top level
– Analyze uarch alternatives: UBMs can be developed fairly quickly
Microarchitectural State Model (USM)
– One-to-one correspondence with final set of UBMs
– Describe uarch in a state-centric way
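The "declare players, model check for violations" step can be illustrated with a small explicit-state search. The sketch below is a hypothetical toy, not the AFM notation used at Intel: the players are two caches holding one line in state I, S or M, the top-level specification is that a Modified line has no other valid copy, and a breadth-first search over reachable states either finds a violating event trace or exhausts the state space.

```python
from collections import deque


def successors(state, invalidate_on_write):
    """Yield (event, next_state) pairs for a two-cache toy protocol."""
    for i in (0, 1):
        other = 1 - i
        # Read: an Invalid line becomes Shared; a Modified peer is downgraded.
        nxt = list(state)
        if state[i] == "I":
            nxt[i] = "S"
            if state[other] == "M":
                nxt[other] = "S"
        yield f"cache{i} read", tuple(nxt)
        # Write: the line becomes Modified; the peer should be invalidated.
        nxt = list(state)
        nxt[i] = "M"
        if invalidate_on_write:
            nxt[other] = "I"
        yield f"cache{i} write", tuple(nxt)


def coherent(state):
    """Top-level specification: a Modified line has no other valid copy."""
    return "M" not in state or sorted(state) == ["I", "M"]


def model_check(invalidate_on_write=True):
    """Breadth-first search of reachable states; return a violating trace or None."""
    start = ("I", "I")
    traces = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for event, nxt in successors(state, invalidate_on_write):
            if nxt in traces:
                continue  # already explored
            traces[nxt] = traces[state] + [event]
            if not coherent(nxt):
                return traces[nxt]
            queue.append(nxt)
    return None


if __name__ == "__main__":
    print("correct protocol:", model_check(invalidate_on_write=True))
    print("missing invalidate:", model_check(invalidate_on_write=False))
```

Because the search is breadth-first, the trace it reports for the broken variant is a shortest one, which is the kind of quick feedback on a definition change that the high-level flow is after.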
Page 22
Future Opportunities & Challenges
Page 23
Tools & Methodology
Full verification with SAT-based model checkers
– Currently only feasible on simple examples
– Overlapping loops/feedback in real designs cannot be handled
– Explicit induction schemes
Accumulate design coverage for SAT-based techniques
– Need to integrate formal and dynamic (simulation) coverage to fully understand coverage holes
Expand formal methods beyond functional correctness
– Other areas of concern include performance, power, timing …
– These areas may be amenable to formal analysis
Other tool opportunities:
– Parallel and Distributed Checkers
– Debugging, interactive Model Checking
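Integrating formal and dynamic coverage comes down to a set computation once both techniques report against the same list of coverage points. The sketch below assumes such a shared list exists; the function name and the coverage point names are made up for illustration.

```python
def coverage_holes(all_points, hit_in_simulation, proved_formally):
    """Return the coverage points reached by neither technique."""
    return set(all_points) - (set(hit_in_simulation) | set(proved_formally))


if __name__ == "__main__":
    # Hypothetical coverage point names, for illustration only.
    points = {"decode.prefix_combo", "fp_mul.round_up", "ooo.replay", "bus.snoop_hit"}
    simulated = {"decode.prefix_combo", "bus.snoop_hit"}
    proved = {"fp_mul.round_up", "decode.prefix_combo"}
    print(sorted(coverage_holes(points, simulated, proved)))
```

The hard part in practice is agreeing on what a coverage point means for a proof, which this sketch takes as given.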
Page 24
Methodology drivers
Regression
– RTL is “live”, and changes frequently until the very last stages of the project
– Model checking automation at lower levels allows regression to be automated and provides robustness in the face of ECOs
Debugging
– Need to be able to demonstrate FV counter-examples to designers and architects
– Designers want a dynamic test that they can simulate
– Waveform viewers, schematic browsers, etc. can help to bridge the gap
Verification in the large
– Proof design: how do we approach the problem in a systematic fashion?
– Proof engineering: how do we write maintainable and modifiable proofs?
Page 25
Reasoning
Integrated reasoning within checker
– Current status:
    – Theorem provers have integrated decision procedures, model checkers don’t have reasoning capability
    – Not much improvement in exhaustive MC capacity, need mechanical assistance for problem decomposition
– Want lightweight framework to direct MC/BMC proof strategy
    – Case split, abstraction, symmetry detection, “what if”, …
    – User guided problem decomposition
– Standard HDLs make it difficult for FV to automatically identify symmetry
Reasoning about changes
– Want to fully verify design during development, but real designs go through many iterations
Page 26
Other Challenges
Dealing with constantly-changing specifications
– Specification changes are a reality in design
– Properties and proofs should be readily adapted
– How to engineer agile and robust regressions?
Protocol Verification
– This problem has always been hard
– Getting harder (more MP) and more important (intra-die protocols make it more expensive to fix bugs)
Verification of embedded software
– S/W for large SoCs has impact beyond functional correctness (power, performance, …)
– Not all S/W verification techniques apply because H/W abstraction is less feasible
– One example is microcode verification