1
Scaling Formal Methods toward Hierarchical Protocols in Shared Memory Processors: Annual Review Presentation – April 2007
Presenters: Ganesh Gopalakrishnan and Xiaofang Chen
School of Computing, University of Utah, Salt Lake City, UT
Intel SRC Customization Award 2005-TJ-1318
2
Project Personnel
IBM Mentor: Dr. Steven M. German
Intel Mentor: Dr. Ching-Tsun Chou
Primary Student: Xiaofang Chen
  Summer internship planned at IBM T.J. Watson (6/07), where the research discussed here in Project 2 will be furthered
Other SRC Student: Robert Palmer (work involving TLA+ modeling of communication libraries)
  Defense May 10; expected to join Intel (6/07)
3 other PhD students, 1 MS student, and 2 undergraduates in FV, all working on FV of threading / message-passing software
3
Multicores are the future! Their caches are visibly central…
(photo courtesy of Intel Corporation.)
> 80% of chips shipped will be multi-core
4
…and the number of possible multiprocessor cache organizations is mind-boggling (e.g., imagine 80 cores and deeper hierarchies).
[Figure: Clusters 1, 2, and 3, each with two L1 caches, an L2 cache + local directory, and an interface, connected to a global directory and main memory. Design choices: shared / private, inclusive / exclusive.]
5
Protocol design happens in “the thick of things” (many interfaces; performance, power, and testability constraints).
From “High-throughput coherence control and hardware messaging in Everest,” by Nanda et al., IBM J. R&D 45(2), 2001.
6
Future Coherence Protocols
Cache coherence protocols that are tuned for the contexts in which they operate can significantly increase performance and reduce power consumption [Liqun Cheng]
Producer-consumer sharing-pattern-aware protocol [Cheng, HPCA07]
  21% speedup and 15% reduction in network traffic
Interconnect-aware coherence protocols [Cheng, ISCA06]
  Heterogeneous interconnect
  Improve performance AND reduce power
  11% speedup and 22% wire power savings
Bottom line: protocols are going to get more complex!
7
Designers have poor conceptual tools (e.g., informal message sequence chart (MSC) drawings). Better notations and tools are needed.
[Figure: informal MSC among L1-1, L1-2, LDir, and GDir, with messages Req_S, Swap, Broadcast, NAck, Fwd_Req, Gnt_S, Gnt_S and state annotations (S), (I), (S: L1-1), (S: L1-2).]
8
Design Abstractions in More Modern Flows
An interleaving protocol model (Murphi or TLA+ are the languages of choice here)
  FV here eliminates concurrency bugs
Detailed HDL model
  FV here eliminates implementation bugs; however, correspondence with the interleaving model is lost
Need more detailed models anyhow
  Interleaving models are very abstract
Monolithic verification of HDL code does not scale
Design optimizations are captured at the HDL level
  The interleaving model becomes increasingly obsolete
Need an integrated flow:
  Interleaving -> High-level HW view -> Final HDL
9
Related Work in Formal HW Design
Bluespec
  High-level design is expressed using atomic transactions
  Synthesizes high-level designs into hardware implementations
  Automatic scheduling of high-level design steps in hardware
  May not meet performance goals
Malik et al.
  Formal architecture and microarchitecture modeling for verification
  Meant for instruction-set processors
Need a formal theory of refinement from interleaving models to high-level HW models
10
Our Goals
Develop a methodology to verify “realistic” interleaving models
  Useful benchmarks for others
  Our particular contributions are toward hierarchical protocols
  Largely inspired by Chou et al.’s work (FMCAD’04)
  Xiaofang Chen’s PhD is wrapping up a nice story here!
Develop a language and formal theory for higher-level HW specification & refinement
  Ideas largely due to German & Janssen
  Xiaofang Chen’s PhD work is taking ideas from the initial proposal all the way to practical realization!
11
A summary of our work over Y1-2
1. Three progressively better approaches to verify hierarchical cache coherence protocols at the interleaving level
   1. A/G method of complementary abstractions (FMCAD’06)
   2. Extensions to non-inclusive hierarchies (TR 06-014)
   3. Abstract each level separately (to be submitted)
   4. Error-trace checking (to be submitted)
2. A theory of transaction-based design and verification (write-up finished; initial experiments finished)
3. Modular verification of transactions (write-up in progress; initial experiments finished)
We number the projects 1.1, 1.2, 1.3, 1.4, 2, and 3.
12
Project 1.[1-4] Timeline
1.1: FMCAD’06 results
1.2: Another hierarchical benchmark (non-inclusive)
1.3: Abstraction per level (more scalable)
1.4: Automatic Recognition of spurious/real bugs
13
1.[1-4]: Hierarchical Protocols
[Figure: hierarchical protocol with a home cluster and remote clusters 1 and 2. Each cluster has two L1 caches, an L2 cache + local directory, and a RAC; the clusters connect to a global directory and main memory.]
14
Abstracted Protocol #1
[Figure: one cluster kept in full detail (two L1 caches, L2 Cache+Local Dir, RAC); the other two clusters abstracted to RAC + L2 Cache+Local Dir’. Clusters: home cluster, remote cluster 1, remote cluster 2; plus global dir and main memory.]
15
Abstracted Protocol #2
[Figure: one cluster kept in full detail (two L1 caches, L2 Cache+Local Dir, RAC); the other two clusters abstracted to RAC + L2 Cache+Local Dir’. Clusters: home cluster, remote cluster 1, remote cluster 2; plus global dir and main memory.]
16
Non-Circular Assume/Guarantee
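We can’t verify the full protocol directly due to state explosion:
  h ║ r1 ║ r2 ⊨ Coh
Instead, we check (lowercase: cluster in full detail; uppercase: abstracted cluster):
  Check-1: h ║ R1 ║ R2 ⊨ Coh1 ∧ Guarant1
  Check-2: H ║ r1 ║ R2 ⊨ Coh2 ∧ Guarant2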
17
1.2: We applied the non-circular A/G method to a non-inclusive hierarchical protocol…
Protocol features
  Broadcast channels
  Non-imprecise local dir
Verification challenges
  A/G cannot infer the local dir from intra-cluster state alone
  Coherence may involve multiple L1 caches
18
Verifying Non-Inclusive Protocols
Inferring “L2.State = Excl”: from outside the cluster vs. from inside the cluster
Use history variables to convert non-inclusive protocols into inclusive ones
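The history-variable idea can be sketched in Python as a minimal single-line model. It rests on our own assumptions, not the benchmark's Murphi code. A cluster has a list of L1 states and a non-inclusive L2 that may drop a line an L1 still holds exclusively. The names `Cluster`, `hist_excl`, and `invariant` are hypothetical. The history variable is never read by protocol rules. It only preserves the fact an inclusive L2 would have kept, so a property checked from outside the cluster can still refer to it.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Cluster:
    """Hypothetical single-line model of one cluster with a non-inclusive L2."""
    l1: List[str] = field(default_factory=lambda: ["Invld", "Invld"])
    l2: str = "Invld"
    # History variable: never read by protocol rules, only by properties.
    # It records whether some L1 in the cluster holds the line exclusively.
    hist_excl: bool = False

    def grant_excl_to_l1(self, i: int) -> None:
        self.l1[i] = "Excl"
        self.l2 = "Excl"
        self.hist_excl = True

    def evict_from_l2(self) -> None:
        # Non-inclusive: the L2 may drop the line while an L1 still holds it,
        # so "L2.State = Excl" is no longer visible from outside the cluster.
        self.l2 = "Invld"          # hist_excl is intentionally left unchanged

    def invalidate_all(self) -> None:
        self.l1 = ["Invld"] * len(self.l1)
        self.l2 = "Invld"
        self.hist_excl = False

def invariant(c: Cluster) -> bool:
    """The history variable agrees with the L1 contents."""
    return c.hist_excl == ("Excl" in c.l1)

c = Cluster()
c.grant_excl_to_l1(0)
c.evict_from_l2()   # L2 forgets the line; hist_excl still records Excl
```

After the eviction, `c.l2` is `"Invld"` while `c.hist_excl` is still `True`, and the invariant holds.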
19
Experimental Results
Protocol     # of States        Mem (GB)   Model Check Completed?
Hierarchy    > 1,521,900,000    20         No
Abs-1        234,478,105        20         Yes
Abs-2        283,124,383        20         Yes
State-space reduction is over 65%
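One reading of the 65% figure can be checked with a little arithmetic. We assume, though the slide does not say so, that the two abstracted runs are summed and compared against the (lower-bound) state count of the full hierarchy. Under that reading the saving comes out just above 65%.

```python
from typing import List

# Numbers from the table above. The hierarchy count is a lower bound,
# since model checking the full protocol did not complete.
HIERARCHY_STATES = 1_521_900_000
ABS_STATES = [234_478_105, 283_124_383]   # Abs-1, Abs-2

def reduction(full: int, parts: List[int]) -> float:
    """Fraction of states saved by checking all parts instead of the full model."""
    return 1 - sum(parts) / full

r = reduction(HIERARCHY_STATES, ABS_STATES)
print(f"{r:.1%}")   # 66.0%
```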
20
1.3: We then tried a “split hierarchy per level” approach to using non-circular A/G
[Figure: the hierarchy split into three abstracted protocols (ABS #1, ABS #2, ABS #3): an inter-cluster level (global dir, main memory, and each cluster abstracted to RAC + L2 Cache+Local Dir’) and intra-cluster levels (L2 Cache+Local Dir with two L1 caches).]
21
A Sample Scenario
Home Cluster, Remote Cluster 1, Remote Cluster 2
1. Req_Ex
2. Fwd Req_Ex
3. Fwd Req_Ex
4. Fwd Req_Ex
5. Grant
6. Grant
States: Excl, Invld
22
Map to Abstracted Protocols
Remote Cluster 1, Remote Cluster 2
2. Fwd Req_Ex
3. Fwd Req_Ex
5. Grant
6. Grant
1. Req_Ex
4. Fwd Req_Ex
States: Invld, Excl
23
Experimental Results
Protocol     # of States      Exec Time (s)   Mem (GB)   Model Check Completed?
Hierarchy    > 438,120,000    > 125,799       18         No
Inter        1,500,621        269             2          Yes
Intra-1      564,878          48              2          Yes
Intra-2      188,842          18              2          Yes
State-space reduction is over 95%!
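The per-level totals can be tallied from the table. As before, we assume the three levels are summed and compared with the unfinished full-hierarchy run. Because the hierarchy entries are lower bounds, both the state reduction and the time speedup computed here are lower bounds too.

```python
# Numbers from the table above; the "Hierarchy" entries are lower bounds.
HIER_STATES, HIER_TIME_S = 438_120_000, 125_799
levels = {                       # name: (states, exec time in seconds)
    "Inter":   (1_500_621, 269),
    "Intra-1": (564_878, 48),
    "Intra-2": (188_842, 18),
}

total_states = sum(s for s, _ in levels.values())
total_time_s = sum(t for _, t in levels.values())
state_reduction = 1 - total_states / HIER_STATES   # at least this much
speedup = HIER_TIME_S / total_time_s                # at least this much

print(total_states, total_time_s, f"{state_reduction:.2%}", f"{speedup:.0f}x")
# 2254341 335 99.49% 376x
```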
24
Project 1.4: Automatic Recognition of Spurious / Real Bugs in these approaches
Problem statement: given an error trace of an ABS protocol, is it a real bug of the original protocol?
Solution: in the original protocol, use BFS to guide model checking to match the error trace
Reason: our abstraction is just a projection, so each state of the error trace constrains the original-protocol states on the retained variables
25
Basic Idea of Automatic Recognition
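[Figure: on the left, the error trace of the abstract protocol over (v1, v2): (0,0), (1,2), (6,8), and so on. On the right, a directed BFS of the original protocol over (v1, v2, v3), with states such as (0,0,0), (1,2,1), (0,0,3), and (3,1,0) marked “keep” or “drop” according to whether their projection onto (v1, v2) matches the current trace state.]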
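The directed search can be sketched in Python. This is a minimal illustration under our own assumptions, not the project's tool:

- States are hashable tuples.
- `successors` enumerates next states of the original protocol.
- `project` implements the projection used by the abstraction.
- Steps that change only hidden variables may occur between consecutive trace states.

`match_trace`, `toy_successors`, and `toy_project` are hypothetical names. At each trace index, the search keeps the states whose projection matches and drops the rest. An empty layer means the abstract error is spurious.

```python
from collections import deque

def match_trace(init, successors, project, trace):
    """Directed BFS: find an original-protocol path whose projection follows `trace`.

    Returns the concrete path as a list of states, or None if the trace is spurious.
    """
    if not trace:
        return []
    # parent maps (trace index, state) -> predecessor node, or None for initial states.
    parent = {}
    layer = []
    for s in init:
        if project(s) == trace[0] and (0, s) not in parent:
            parent[(0, s)] = None
            layer.append(s)
    for k in range(len(trace)):
        if not layer:
            return None  # no original state matches trace[k]: spurious error
        # Steps that change only hidden variables keep the projection at trace[k].
        seen, queue = set(layer), deque(layer)
        while queue:
            s = queue.popleft()
            for t in successors(s):
                if t not in seen and project(t) == trace[k]:
                    seen.add(t)
                    parent[(k, t)] = (k, s)
                    queue.append(t)
        if k == len(trace) - 1:
            node = (k, layer[0])
            break
        # Keep successors that match the next abstract state; drop the rest.
        layer = []
        for s in seen:
            for t in successors(s):
                if project(t) == trace[k + 1] and (k + 1, t) not in parent:
                    parent[(k + 1, t)] = (k, s)
                    layer.append(t)
    path = []
    while node is not None:
        path.append(node[1])
        node = parent[node]
    return path[::-1]


# Hypothetical toy protocol over (v1, v2, v3); the abstraction keeps (v1, v2).
def toy_successors(s):
    v1, v2, v3 = s
    nxt = []
    if v3 < 2:
        nxt.append((v1, v2, v3 + 1))      # hidden step: only v3 changes
    if v3 == 1:
        nxt.append((v1 + 1, v2 + 2, v3))  # visible step
    return nxt

def toy_project(s):
    return s[:2]
```

For example, `match_trace([(0, 0, 0)], toy_successors, toy_project, [(0, 0), (1, 2)])` finds a real path through the hidden step (0, 0, 1). The trace `[(0, 0), (3, 1)]` yields `None`, because no reachable original state projects to (3, 1).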
26
Y3 Plans for Project 1
Considerable experience gained
  Three large benchmark protocols (each 3000+ lines of Murphi code) on the web
  Have reduced verification complexity of hierarchical protocols by 90%
  Can identify spurious errors automatically
All finite-state
  Not parameterized; no plans for parameterized verification
Y3 plans: build a tool to support this methodology
27
Summary of Projects 2 and 3
1. Three progressively better approaches to verify hierarchical cache coherence protocols at the interleaving level
   1. A/G method of complementary abstractions (FMCAD’06)
   2. Extensions to deeper and non-inclusive hierarchies (TR 06-014)
   3. Latest method that abstracts each level separately (to be submitted)
   4. Error-trace checking (to be submitted)
2. A theory of transaction-based design and verification (write-up finished)
3. Modular verification of transactions (write-up in progress)
28
Transaction-Level HW Modeling
The problem addressed: bridging the gap between high-level specifications and RTL implementations
Global properties cannot be formally verified at the RTL level!
Specifications can be verified, but do they correctly represent the implementations?
29
Driving Design Benchmark due to German and Geert Janssen
30
What changes when moving from a spec to an implementation?
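Atomicity
Concurrency
Granularity in modeling
[Figure: spec transition 1 between client and home; in the implementation it becomes low-level transitions 1.1, 1.2, 1.3 among client, router buffer, and home.]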
31
General mappings between high-level transitions and the transactions that help implement them
[Figure: high-level transition 1 and the low-level transitions 1.1, 1.2, 1.3 that help realize it.]
High-level transitions take some non-zero (conceptual) unit of time
Each low-level transition takes one clock cycle
32
High-Level and Low-Level Computations
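[Figure: a high-level computation 1, 2, 3 and the corresponding low-level computation: 1.1, 1.2, 1.3 realize 1; 2.1, 2.2 realize 2; 3.1, 3.2, 3.3 realize 3.]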
33
Specification of High and Low Levels
[Figure: high-level transition 1 and the low-level transitions 1.1, 1.2, 1.3 that realize it.]
High level: in Murphi, as a guard/action rule
Low level: in HMurphi, as multiple guard/action rules enclosed in a Begin Transaction / End Transaction block
The guards decide when each low-level transition can fire
In each clock tick, the maximal set of low-level transitions enabled in the current state fires concurrently
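The per-tick firing discipline can be sketched in Python. This is a minimal model of our own, not HMurphi's semantics as implemented by Muv. Rules are hypothetical (name, guard, action) triples over a dictionary state. Every guard and action reads the pre-tick state. Two enabled rules writing the same variable are reported as a write/write conflict rather than resolved.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
# A low-level rule: (name, guard, action); the action returns the variables it writes.
Rule = Tuple[str, Callable[[State], bool], Callable[[State], Dict[str, int]]]

def tick(state: State, rules: List[Rule]) -> Tuple[State, List[str]]:
    """One clock tick: fire every rule whose guard holds, all reading the pre-state."""
    enabled = [(name, action) for name, guard, action in rules if guard(state)]
    updates: Dict[str, int] = {}
    for name, action in enabled:
        for var, val in action(state).items():
            if var in updates:
                raise ValueError(f"write/write conflict on {var!r} at rule {name}")
            updates[var] = val
    return {**state, **updates}, [name for name, _ in enabled]

# Hypothetical rules named after the slides' 1.1, 1.2 (one transaction) and 2.1.
rules: List[Rule] = [
    ("1.1", lambda s: s["req"] == 1, lambda s: {"req": 0, "buf": 1}),
    ("1.2", lambda s: s["buf"] == 1, lambda s: {"buf": 0, "home": 1}),
    ("2.1", lambda s: s["x"] == 0, lambda s: {"x": 1}),
]

s0 = {"req": 1, "buf": 0, "home": 0, "x": 0}
s1, fired1 = tick(s0, rules)   # fires 1.1 and 2.1 together
s2, fired2 = tick(s1, rules)   # then fires 1.2
```

The two ticks fire {1.1, 2.1} and then {1.2}, mirroring the grouping of low-level steps in an Impl execution.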
34
Transaction
A transaction is a set of transitions in Impl that corresponds to a transition in Spec
Transaction
  Rule 1
  …
  Rule n
Endtransaction;
35
Executions
Spec: interleaving; one enabled transition fires at each step
Impl: concurrent; all enabled transitions fire at each step
Spec execution: … 1, 2, 3 …
Impl execution: … {1.1, 2.1}, {1.2}, {2.2, 3.1, 3.2} …
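The Spec side of this contrast can be sketched in Python. Under interleaving semantics, each enabled transition fires alone, so independent transitions appear in every order. The rule triples and the names `spec_successors` and `executions` are hypothetical, chosen only to illustrate the semantics.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Rule = Tuple[str, Callable[[State], bool], Callable[[State], Dict[str, int]]]

def spec_successors(state: State, rules: List[Rule]) -> List[Tuple[str, State]]:
    """Interleaving: each enabled rule fires alone, giving one successor per rule."""
    return [(name, {**state, **action(state)})
            for name, guard, action in rules if guard(state)]

def executions(state: State, rules: List[Rule], depth: int) -> List[List[str]]:
    """All rule-name sequences of `depth` steps (shorter if no rule is enabled)."""
    succ = spec_successors(state, rules)
    if depth == 0 or not succ:
        return [[]]
    return [[name] + rest
            for name, nxt in succ
            for rest in executions(nxt, rules, depth - 1)]

# Hypothetical high-level transitions 1, 2, 3: 1 and 2 are independent, 3 needs both.
spec_rules: List[Rule] = [
    ("1", lambda s: s["a"] == 0, lambda s: {"a": 1}),
    ("2", lambda s: s["b"] == 0, lambda s: {"b": 1}),
    ("3", lambda s: s["a"] == 1 and s["b"] == 1 and s["c"] == 0, lambda s: {"c": 1}),
]
runs = executions({"a": 0, "b": 0, "c": 0}, spec_rules, 3)
```

Here `runs` contains both orders, 1, 2, 3 and 2, 1, 3. A concurrent Impl would instead take the corresponding low-level steps of 1 and 2 in the same clock tick whenever both are enabled.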
36
A Few Notations
Observable variables VH: the variables used in both Spec and Impl
Impl also has additional internal variables
A variable v is inactive at a state s if all transactions in Impl that can write to v are quiescent at s
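The definition of inactive variables translates directly into a set computation. In this Python sketch, the write sets and the set of quiescent transactions are taken as given inputs. How quiescence is determined is not specified here. `inactive_vars` and the transaction names are hypothetical.

```python
from typing import Dict, Iterable, Set

def inactive_vars(variables: Iterable[str],
                  writers: Dict[str, Set[str]],
                  quiescent: Set[str]) -> Set[str]:
    """Variables v such that every Impl transaction that can write v is quiescent.

    writers[v] names the transactions that can write v; a variable with no
    writers is (vacuously) inactive. `quiescent` is the set of transactions
    quiescent in the current state s, assumed to be computed elsewhere.
    """
    return {v for v in variables if writers.get(v, set()) <= quiescent}

# Hypothetical example: T1 writes State, T2 writes Data; only T1 is quiescent.
writers = {"State": {"T1"}, "Data": {"T2"}}
now_inactive = inactive_vars(["State", "Data"], writers, quiescent={"T1"})
```

Here `now_inactive` is `{"State"}`. `Data` stays active because its writer T2 is still in progress.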
37
A Formal Notion of Simulation
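For every concurrent execution of Impl, there exists an interleaving execution of Spec such that each pair of corresponding states (li, hi) agrees on the variables in VH ∩ inactive(li)
[Figure: Impl run l0, l1, l2, … with concurrent steps {…}; Spec run h0, h1, h2, … with transitions t0, t1, t2, ….]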
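The matching condition can be written as a predicate over one pair of aligned runs. This Python sketch assumes a one-to-one alignment of li with hi, as drawn in the figure, and takes `inactive_of` as a given function. It checks one candidate Spec run and does not search for the existential witness. All names are hypothetical.

```python
from typing import Callable, Dict, List, Set

State = Dict[str, object]

def states_match(li: State, hi: State, observable: Set[str], inactive: Set[str]) -> bool:
    """li and hi agree on every variable in VH ∩ inactive(li)."""
    return all(li[v] == hi[v] for v in observable & inactive)

def runs_match(impl_run: List[State], spec_run: List[State], observable: Set[str],
               inactive_of: Callable[[State], Set[str]]) -> bool:
    """Check aligned runs l0, l1, ... and h0, h1, ... pairwise (alignment assumed)."""
    return len(impl_run) == len(spec_run) and all(
        states_match(li, hi, observable, inactive_of(li))
        for li, hi in zip(impl_run, spec_run))

# Hypothetical example: State is observable, buf is internal to Impl,
# and State is treated as active exactly while buf is non-empty.
def inactive_of(li: State) -> Set[str]:
    return {"State"} if li["buf"] == 0 else set()

impl = [{"State": "I", "buf": 0}, {"State": "I", "buf": 1}, {"State": "E", "buf": 0}]
spec = [{"State": "I"}, {"State": "E"}, {"State": "E"}]
ok = runs_match(impl, spec, {"State"}, inactive_of)   # mismatch at l1 is allowed
```

The runs disagree on `State` at l1. That state is not checked, because `State` is active there.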
38
Simulation Checks
[Figure: an Impl transaction takes I to I’; the matching Spec transition takes Spec(I) to Spec(I’).]
I is a reachable state where the commit guard is true
The guard for the Spec transition must hold at Spec(I)
Observable vars changed by either Spec or Impl must match
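The per-transaction checks can be sketched in Python. We treat "I is reachable and its commit guard is true" as a precondition supplied by the caller, since the slides do not define the commit guard further. `spec_of` stands in for the map from I to Spec(I); here it is a plain projection onto the observable variables. `check_commit`, `spec_of`, and the example state are hypothetical.

```python
from typing import Callable, Dict, Iterable

State = Dict[str, object]

def check_commit(I: State, I_next: State,
                 spec_of: Callable[[State], State],
                 spec_guard: Callable[[State], bool],
                 spec_action: Callable[[State], State],
                 observable: Iterable[str]) -> bool:
    """Simulation check for one Impl transaction from I to I_next.

    Precondition (not checked here): I is reachable and its commit guard is true.
    """
    h = spec_of(I)
    if not spec_guard(h):
        return False                            # Spec transition not enabled at Spec(I)
    h_next = {**h, **spec_action(h)}
    for v in observable:
        changed = h_next[v] != h[v] or I_next[v] != I[v]
        if changed and h_next[v] != I_next[v]:
            return False                        # Spec and Impl disagree on a changed var
    return True

# Hypothetical example: grant exclusive access, copying Mem into Data.
VH = ("State", "Data", "Mem")

def spec_of(s: State) -> State:
    return {v: s[v] for v in VH}

def guard(h: State) -> bool:
    return h["State"] == "Invld"

def action(h: State) -> State:
    return {"State": "Excl", "Data": h["Mem"]}

I = {"State": "Invld", "Data": 0, "Mem": 7, "buf": 7}
good = {"State": "Excl", "Data": 7, "Mem": 7, "buf": 0}
bad = {"State": "Excl", "Data": 0, "Mem": 7, "buf": 0}   # forgot to copy the data
```

`check_commit(I, good, spec_of, guard, action, VH)` passes. The `bad` transaction fails, because it changes `State` but leaves `Data` different from what the Spec transition writes.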
39
Model Checking Approaches
Monolithic: cross-product construction
Compositional: abstraction; assume/guarantee
40
Compositional Approach
Abstraction
  Change a read into an access of an input var
  Self-sourced read: add all transitions that write to the var
Assume/Guarantee
  Require all writes to a var to guarantee property P
  Assume P holds on all reads
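The two halves of the assume/guarantee step for a single variable can be sketched in Python.

- **Guarantee:** every write must produce a value satisfying P.
- **Assume:** every read is replaced by a free input restricted to values satisfying P.

This is a minimal illustration under our own assumptions. The variable `count`, its property `P`, and the writers are hypothetical. The states passed to the guarantee check stand in for the states in which writes can occur, which a real flow would obtain by model checking.

```python
from typing import Callable, Dict, Iterable, List

State = Dict[str, int]

def guarantee_holds(writers: Iterable[Callable[[State], int]],
                    states: Iterable[State],
                    prop: Callable[[int], bool]) -> bool:
    """Guarantee side: every write to the variable yields a value satisfying prop."""
    states = list(states)          # allow reuse across all writers
    return all(prop(write(s)) for write in writers for s in states)

def abstract_read(domain: Iterable[int], prop: Callable[[int], bool]) -> List[int]:
    """Assume side: a read becomes a free input, restricted to values satisfying prop."""
    return [d for d in domain if prop(d)]

# Hypothetical variable `count` with property P: 0 <= count <= 2.
def P(x: int) -> bool:
    return 0 <= x <= 2

writers = [lambda s: min(s["count"] + 1, 2),   # saturating increment
           lambda s: 0]                        # reset
states = [{"count": c} for c in range(3)]      # states in which the writes may occur
```

Both writers preserve P, so a reader of `count` may assume only the values `abstract_read(range(-2, 5), P)`, which is `[0, 1, 2]`. A non-saturating increment would break the guarantee at `count == 2`.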
41
Example of Abstraction
Transaction
  …
  Rule (v1 = d1) => …
  …
Endtransaction
Transaction 1
Transaction 2
…
Transaction n
42
Example of Assume/Guarantee
…
Transaction 1: Request granted
Transaction 2: Update Cache
State := Excl
Data := d
Impl.State = Spec.State
43
Benchmarks
High level: from the FMCAD’04 tutorial
Low level: provided by German and Janssen
Sizes: 1 home node, 1 remote node
  Sizes are constrained by the accessible VHDL tools!
44
Implementations
Muv: HMurphi to VHDL, written by German
Mud: static analyzer for possible conflicts / dependencies
VHDL verifier: IBM RuleBase
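A conservative static write/write conflict check can be sketched in Python. This is our own illustration of the idea, not Mud's actual algorithm. Each rule's write set is given. A pair of rules is flagged when both may be enabled in the same tick and they write a common variable. The may-co-enable relation is assumed to be an over-approximation computed elsewhere, for example from guard analysis. `ww_conflicts` and the rule names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

def ww_conflicts(write_sets: Dict[str, Set[str]],
                 may_coenable: Set[FrozenSet[str]]) -> List[Tuple[str, str, Set[str]]]:
    """Rule pairs that may fire in the same tick and write at least one common variable."""
    out = []
    for a, b in combinations(sorted(write_sets), 2):
        common = write_sets[a] & write_sets[b]
        if common and frozenset((a, b)) in may_coenable:
            out.append((a, b, common))
    return out

# Hypothetical rules: 1.1 and 1.2 both write buf and might be enabled together.
write_sets = {"1.1": {"req", "buf"}, "1.2": {"home", "buf"}, "2.1": {"x"}}
may_coenable = {frozenset({"1.1", "1.2"}), frozenset({"1.1", "2.1"})}
conflicts = ww_conflicts(write_sets, may_coenable)
```

Here only (1.1, 1.2) on `buf` is reported. The pair (1.1, 2.1) can be co-enabled but writes disjoint variables.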
45
Preliminary Results
Approach                     # Flip-Flops   # Gates   Time (min)
Monolithic                   212            8574      17
Decomposed: W/W conflicts    108            5763      11
Decomposed: closures         89             2194      3
* Datapath = 1 bit; Intel Xeon CPU 3.0 GHz, 2 GB memory
46
When datapath > 1 bit
  Cannot check the monolithic approach
    RuleBase academic license restriction: 300 F-F
  Decomposed approach
    W/W checks not affected
Datapath Bits   # of F-F   # of Gates
1               89         2194
2               97         2380
26              289        6659
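The flip-flop counts in this table lie exactly on a straight line, which gives a rough sense of headroom under the 300 F-F license limit. The following Python arithmetic is our own extrapolation, not a reported result. It assumes the same linear growth continues beyond 26 bits for this (closures) check. The gate counts do not fit a line as neatly, so only flip-flops are modeled.

```python
# (datapath bits, flip-flops) from the table above, decomposed approach.
points = [(1, 89), (2, 97), (26, 289)]

(b0, f0), (b1, f1) = points[0], points[-1]
per_bit = (f1 - f0) / (b1 - b0)          # flip-flops added per extra datapath bit
base = f0 - per_bit * b0                 # extrapolated count at 0 bits

# All three measured points lie on this line.
assert all(base + per_bit * b == f for b, f in points)

LIMIT = 300                              # RuleBase academic license limit (from the slide)
max_bits = int((LIMIT - base) // per_bit)
print(per_bit, base, max_bits)           # 8.0 81.0 27
```

Under this linear assumption, roughly 27 datapath bits would fit within the license limit for this check.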
47
Future Work
Reduce the cost of W/W conflict checking
  Localized reasoning
Apply to pipelines
More benchmarks
Try other VHDL tools
  SixthSense, etc.
48
Publications, Software, Models
  FMCAD 2006 paper
  Presentation at Intel
  Journal version of hierarchical coherence protocol verification (under prep)
  TR on theory of transaction-based specification and verification (under prep)
  Detailed VHDL-level German protocol developed
  Analysis framework for HMurphi developed
  Preliminary verification experiments using Cadence IFV, IBM RuleBase, and IBM SixthSense
  Xiaofang Chen’s summer internship at IBM T.J. Watson Research Center
  Robert’s SRC poster / Techcon 2007 submission
There will be more publications during 2007-8, following a hiatus due to infrastructure build-up (many delays!)