Scaling Formal Methods Toward Hierarchical Protocols in Shared Memory Processors
Presenters: Ganesh Gopalakrishnan and Xiaofang Chen
School of Computing, University of Utah, Salt Lake City, UT 84112
{ganesh, xiachen}@cs.utah.edu
http://www.cs.utah.edu/formal_verification
GRC CADTS Review, Berkeley, March 18, 2008
Supported by SRC Contract TJ-1318 (Intel Customization)
2
Multicores are the future! Their caches are visibly central…
(Photo courtesy of Intel Corporation.)
> 80% of chips shipped will be multi-core
3
Hierarchical Cache Coherence Protocols will play a major role in multi-core processors
Chip-level protocols
Inter-cluster protocols
Intra-cluster protocols
[Figure: clusters with directory (dir) and memory (mem) blocks]
State Space grows multiplicatively across the hierarchy!
Verification will become harder
4
Protocol design happens in “the thick of things” (many interfaces, constraints of performance, power, testability).
From “High-throughput coherence control and hardware messaging in Everest,” by Nanda et al., IBM J. R&D 45(2), 2001.
5
Future Coherence Protocols
Cache coherence protocols that are tuned for the contexts in which they are operating can significantly increase performance and reduce power consumption [Liqun Cheng]
Producer-consumer sharing pattern-aware protocol [Cheng et al., HPCA07]
  21% speedup and 15% reduction in network traffic
Interconnect-aware coherence protocols [Cheng et al., ISCA06]
  Heterogeneous interconnect
  Improve performance AND reduce power
  11% speedup and 22% wire power savings
Bottom line: Protocols are going to get more complex!
6
Main Result #1 : Hierarchical
[Figure: a home cluster and two remote clusters, each with L1 caches, an L2 cache + local directory, and a RAC; a global directory and main memory; intra-cluster and inter-cluster protocols]
Developed a way to reduce the verification complexity of hierarchical (CMP) protocols using assume-guarantee (A/G) reasoning
7
Main Result #2 : Refinement
Developed a way to verify a proposed refinement of ONE unit into its low-level (RTL) implementation
[Figure labels: Murphi, HMurphi]
11
Differences in Modeling: Specs vs. Impls
One step in high-level: an atomic guarded command (home, remote)
Multiple steps in low-level (home, router, buf, remote)
12
Our Refinement Check
[Figure: a multi-step Impl transaction takes I to I’; the corresponding Spec transition takes Spec(I) to Spec(I’)]
I is a reachable Impl state
Guard for Spec transition must hold
Observable vars changed by either must match
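The three conditions above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration and not taken from the slides: states are dictionaries, the hypothetical function `abstract` plays the role of Spec(·) by projecting onto the observable variables, and the toy message transfer (one atomic Spec step, two Impl steps through a buffer) is invented.

```python
# Toy refinement check: one multi-step Impl transaction vs. one atomic Spec transition.
# All names (abstract, spec_guard, impl_transaction, ...) are hypothetical.

OBSERVABLE = ("src_valid", "dst_valid")

def abstract(impl_state):
    """Spec(I): project an Impl state onto the observable variables."""
    return {v: impl_state[v] for v in OBSERVABLE}

def spec_guard(s):
    """Guard of the atomic Spec transition: a message is waiting and dst is free."""
    return s["src_valid"] and not s["dst_valid"]

def spec_action(s):
    """Atomic Spec transition: move the message from src to dst in one step."""
    return dict(s, src_valid=False, dst_valid=True)

def impl_transaction(i):
    """Two low-level steps: src -> buffer, then buffer -> dst."""
    step1 = dict(i, buf_valid=True, src_valid=False)
    step2 = dict(step1, buf_valid=False, dst_valid=True)
    return step2

def buggy_impl_transaction(i):
    """Loses the message: src is cleared but dst is never filled."""
    return dict(i, src_valid=False)

def refines(impl_state, transaction):
    before = abstract(impl_state)
    if not spec_guard(before):            # guard for Spec transition must hold
        return False
    after_impl = abstract(transaction(impl_state))
    after_spec = spec_action(before)
    return after_impl == after_spec       # observable vars changed by either must match

reachable = {"src_valid": True, "dst_valid": False, "buf_valid": False}
ok = refines(reachable, impl_transaction)
bad = refines(reachable, buggy_impl_transaction)
print(ok, bad)
```

In this sketch the check is run on a single hand-picked state; a real check would have to cover every reachable Impl state.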
13
Workflow of Our Refinement Check
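[Workflow figure: the Hardware Murphi Impl model and the Murphi Spec model are combined into a product model in Hardware Murphi; Muv translates it into a product model in VHDL, on which a property check establishes that the implementation meets the specification]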
14
Anticipated Future Result
Develop a way to verify a proposed refinement of the ENTIRE hierarchy
15
Anticipated Future Result
Deal with pipelining
[Figure: sequential interaction compared with pipelined interaction]
16
Anticipated Future Result
Develop ways to “tease apart” protocols that are “blended in”, e.g. for power-down or post-silicon observability enhancement
More protocols… do they interfere?
17
Basics
PI: Ganesh Gopalakrishnan
Industrial Liaisons: Ching Tsun Chou (Intel), Steven M. German (IBM), John W. O’Leary (Intel), Jayanta Bhadra (Freescale), Alper Sen (Freescale), Aseem Maheshwari (TI)
Primary Student: Xiaofang Chen
Graduation Date: Writing PhD dissertation; on the job market
Other Students: Yu Yang (PhD), Guodong Li (PhD), Michael DeLisi (BS/MS)
Anticipated Results:
  Hierarchical: Methodology for hierarchical (cache coherence) protocol verification, with emphasis on complexity reduction (was in original SRC proposal)
  Refinement: Methodology for expressing and verifying refinement of higher-level protocol descriptions (not in original SRC proposal)
18
Basics
Deliverables (Papers, Software, Xiaofang’s Dissertation)
Hierarchical:
  Methodology for applying A/G reasoning for complexity reduction
  Verified protocol benchmarks: inclusive, non-inclusive, snoopy (large benchmarks)
  Automatic abstraction tool in support of A/G reasoning
Refinement:
  Muv language design (for expressing designs)
  Refinement checking theory and methodology
  Complete Muv tool implementation
19
What’s Going On
Accomplishments during the past year
Hierarchical:
  Finishing non-inclusive hierarchical protocol verification
  Developing and verifying a hierarchical protocol with a snoopy first level
20
Experimental Results on One Hierarchical Protocol
Approach    Model           # of states    Model check time (sec)  Memory used (GB)  Model check passed
Monolithic  Original model  > 438,120,000  > 125,410               18                Nonconclusive
FMCAD’06    Abs. model 1    284,088,425    44,978                  18                Yes
FMCAD’06    Abs. model 2    636,613,051    66,249                  18                Yes
HLDVT’07    Abs. model 1    1,500,621      270                     1.8               Yes
HLDVT’07    Abs. model 2    574,198        50                      1.8               Yes
HLDVT’07    Abs. model 3    198,162        21                      1.8               Yes
21
A Snoopy Multicore Protocol
Motivation: snoop protocols are commonly used in the first level of caches
We have applied our approach to directory protocols
How about snoop protocols?
[Figure: two clusters, each with L1 caches, an L2 cache, and a RAC, plus a global directory and main memory]
22
Applying Our Approach
[Figure: the two-cluster snoopy protocol (L1 caches, L2 cache, and RAC per cluster; global directory; main memory) and the abstracted protocols derived from it]
Abstracted protocols
Experimental results

Approach      Model           # of states  Model check time (sec)  Memory used (GB)  Model check passed
Monolithic    Original model  552,375      86                      1.8               Yes
Our approach  Abs. intra      474          6                       1.8               Yes
Our approach  Abs. inter      15,371       7                       1.8               Yes
23
What’s Going On
Accomplishments during the past year (contd.)
Refinement:
  HMurphi was fleshed out in great detail
  Most of Muv was implemented (a large portion during an IBM T.J. Watson internship), joint work with Steven German and Geert Janssen
24
What’s Going On
Future directions
Hierarchical + Refinement
  Develop ways to verify hierarchies of interacting HMurphi modules
  Pipelining
  Teasing out protocols supporting non-functional aspects
    Power-down protocols
    Protocols to enhance post-silicon observability
Architectural Characterization
  How do we describe the “ISA” of future multi-core machines?
  How do we make sure that this ISA has no hidden inconsistencies?
25
What’s Going On
Technology Transfer & Industrial Interactions
  With liaisons
Publications
  FMCAD 06, FMCAD 07, HLDVT 07
  TECHCON 07 (best session paper award)
  Journal paper and dissertation (in preparation)
A request to IBM to open-source Muv has been placed
26
Overview of “Hierarchical”
1. Given a protocol to verify, create a verification model that models a small number of clusters acting on a single cache line
[Figure: verification model with invariant P, containing Home and Remote clusters and the global directory]
27
2. Exploit Symmetries
Model “home” and the two “remote”s (one remote,
in case of symmetry)
Verification Model
Inv P
28
3. Initial abstraction will be extreme; slowly back off from this extreme…
[Figure: abstract models checked against Inv P1, Inv P2, Inv P3]
When P1 fails, diagnose the failure:
  Bug: report to user
  False alarm: diagnose where a guard is overly weak, add a strengthening guard, and introduce a lemma to ensure soundness of the strengthening
29
Overview of Theory Involved
Original model:
  rule g1 ==> a1;
  rule g2 ==> a2;
  invariant P;
Abstract model 1:
  rule g1 ==> a1;
  rule g2 /\ cond2 ==> a2;
  invariant P /\ (g1 => cond1);
Abstract model 2:
  rule g1 /\ cond1 ==> a1;
  rule g2 ==> a2;
  invariant P /\ (g2 => cond2);
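The circular guard-strengthening scheme above can be tried out on a toy system. The following Python sketch is illustrative only: the three-variable request/acknowledge protocol, the helper `check`, and the candidate conditions are all hypothetical, and no abstraction is actually performed (both models keep the full state), so it shows only the shape of the reasoning, not the complexity reduction.

```python
from collections import deque

def check(init, rules, invariant):
    """Explicit-state BFS; True if the invariant holds in every reachable state."""
    seen, todo = {init}, deque([init])
    while todo:
        s = todo.popleft()
        if not invariant(s):
            return False
        for guard, action in rules:
            if guard(s):
                t = action(s)
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
    return True

# Hypothetical toy protocol; a state is a tuple (req, ack, data).
init = (False, False, 0)
g1 = lambda s: not s[0]                  # no request sent yet
a1 = lambda s: (True, s[1], 1)           # send request, set data to 1
g2 = lambda s: s[0] and not s[1]         # request pending, not yet acknowledged
a2 = lambda s: (s[0], True, s[2])        # acknowledge
P = lambda s: (not s[1]) or s[2] == 1    # invariant: ack => data == 1

def assume_guarantee(cond1, cond2):
    """Check both strengthened models, each proving the lemma the other one uses."""
    implies = lambda g, c: (lambda s: (not g(s)) or c(s))
    strengthen = lambda g, c: (lambda s: g(s) and c(s))
    lemma1, lemma2 = implies(g1, cond1), implies(g2, cond2)
    model1 = check(init, [(g1, a1), (strengthen(g2, cond2), a2)],
                   lambda s: P(s) and lemma1(s))
    model2 = check(init, [(strengthen(g1, cond1), a1), (g2, a2)],
                   lambda s: P(s) and lemma2(s))
    return model1 and model2

good = assume_guarantee(cond1=lambda s: not s[1], cond2=lambda s: s[2] == 1)
bad = assume_guarantee(cond1=lambda s: not s[1], cond2=lambda s: s[2] == 2)
original = check(init, [(g1, a1), (g2, a2)], P)
print(good, bad, original)
```

With the second, deliberately wrong candidate (`data == 2`), the model that must prove `g2 => cond2` fails, so the strengthening is rejected instead of silently hiding behaviour.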
30
3. Create Abstract Models (three models in this example)
Inv P
Inv P1 Inv P2
Inv P3
31
Step 1 of Refinement
Inv P1 Inv P2
Inv P3
Inv P1 Inv P2
Inv P3’
32
Step 2 of Refinement
Inv P1 Inv P2
Inv P3
Inv P1 Inv P2
Inv P3’
Inv P1 Inv P2’
Inv P3’
33
Final Step of Refinement
Inv P1 Inv P2
Inv P3
Inv P1 Inv P2
Inv P3’
Inv P1’ Inv P2’
Inv P3’
Inv P1 Inv P2’
Inv P3’’
34
Detailed Presentation of Refinement
Note: Three examples have been presented in full detail at
http://www.cs.utah.edu/formal_verification/muv
36
Basic Features of Hardware Murphi vs Murphi
…
signal s1, s2 …
s1 <= …
chooserule rules; end; …
firstrule rules; end; …
transaction
rule-1; rule-2; …
end; …
37
Language Extensions to Hardware Murphi (I)
Directives:
  --include spec.m
Joint variables correspondence:
  correspondence
    u1[0..7] :: v1[1..8];
    u1 :: v2;
  end;
38
Language Extensions to Hardware Murphi (II)
Transactionset:
  transactionset p1:T1; p2:T2 do
    transaction …
  end;
Rules with IDs:
  rule:id guard ==> action;
  ruleset p1:T1; p2:T2 do
    rule:id …
  end;
39
Language Extensions to Hardware Murphi (III)
Execute a rule by ID:
  << id.guard() >>;
  << id.action() >>;
  << id[v1][v2].guard() >>; …
Fine-grained assignments for write-write conflicts:
  var[i] <:= data;
40
cache_common.m

...
const
  num_nodes: 2;
  num_addr: 2;
  ...
type
  cache_state: enum {cache_invalid, cache_shared, cache_exclusive};
  ...
var
  msg_0: message_type;
  ...
signal
  random_data: node_memory_type;

procedure reset_data_type(var param: data_type);
begin
  for i: data_range do
    param[i] := false;
  end;
end;
…
function router_turn_next(turn: node_id): node_id;
var ret: node_id;
begin
  ...
end;
…
...

cache_spec.m

var
  spec_nodes: array [node_id] of spec_node_type;

startstate "initialize"
  for i: node_id do
    reset_memory_type(spec_nodes[i].memory);
    …
  end;
end;

ruleset src: node_id; ch: chan_id; dest: node_id do
  rule:R1 "1. Transfer msg from src via ch"
    spec_nodes[src].outchan[ch].valid
    & dest = spec_nodes[src].outchan[ch].msg.dest
    & ! spec_nodes[dest].inchan[ch].valid
  ==>
  begin
    spec_nodes[dest].inchan[ch] := spec_nodes[src].outchan[ch];
    reset_outchan(spec_nodes[src].outchan[ch]);
  endrule;
endruleset;
...
41
cache_impl.m

--include cache_common.m
--include cache_spec.m

var
  router: router_unit_type;
  nodes: array [node_id] of node_unit_type;

signal
  nodes_io: nodes_io_type;
  ...

correspondence "joint vars"
  spec_nodes[0..1].mem :: nodes[0..1].home.mem;
  ...
end;

assign
  nodes_io[0].buf[1].data_in <= router_io.chans_out[0][1].msg;

startstate "initialize"
  router.turn := 0;
  ...
end;

transactionset src: node_id; dest: node_id do
  transaction "transfer msg from src via ch-1"
    rule "buf deliver msg via chan-1"
      nodes_internal[dest].buf[1].update
      & nodes_internal[dest].buf[1].n_valid
      & nodes_internal[dest].buf[1].n_msg.src = src
    ==>
    begin
      nodes[dest].buf[1].msg_buf.msg := nodes_internal[dest].buf[1].n_msg;
      nodes[dest].buf[1].msg_buf.valid := nodes_internal[dest].buf[1].n_valid;
      << R1[src][1][dest].guard() >>;
      << R1[src][1][dest].action() >>;
    endrule;

    rule "local reset src node outchan-1"
      nodes_io[src].local.reset_out1
    ==>
    begin
      reset_message_buf_type(nodes[src].local.buf1);
      reset_transaction;
    endrule;
  endtransaction;
endtransactionset;
...
42
Our Extensions to Muv
Language extensions support
Automatic assertion generation for refinement
  Ensure exclusive write to a var in each clock cycle
  Serializability check for spec rules
  Enableness for spec rules
  Joint vars equivalence when inactive
Many done with static analysis
43
Refinement Extensions to Muv (I)
No write-write conflicts
Each assignment
  v := d;
is instrumented as
  for i: s1..s2 do
    assert (update_bits[i] = false);
  end;
  v := d;
  for i: s1..s2 do
    update_bits[i] := true;
  end;
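The update-bit bookkeeping on this slide can be mimicked in a few lines of Python. This is only an illustrative sketch: the class `CycleVar` and its methods are hypothetical names, and clearing the flags in `next_cycle` is an assumption about how a per-clock-cycle check would be modelled.

```python
class CycleVar:
    """A variable whose bits may each be written at most once per clock cycle."""

    def __init__(self, width):
        self.bits = [False] * width
        self.update_bits = [False] * width    # one flag per bit

    def write(self, lo, hi, value):
        """Assign value to bits lo..hi (inclusive); fail on a second write in the same cycle."""
        for i in range(lo, hi + 1):
            assert not self.update_bits[i], f"write-write conflict on bit {i}"
        for i in range(lo, hi + 1):
            self.bits[i] = value
            self.update_bits[i] = True

    def next_cycle(self):
        """Assumed reset of the flags at the clock edge."""
        self.update_bits = [False] * len(self.bits)

v = CycleVar(8)
v.write(0, 3, True)        # first writer in this cycle
v.write(4, 7, True)        # disjoint bits: no conflict
try:
    v.write(2, 5, False)   # overlaps bits already written in this cycle
    conflict = False
except AssertionError:
    conflict = True
v.next_cycle()
v.write(2, 5, False)       # allowed again in the next cycle
print(conflict)
```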
44
Refinement Extensions to Muv (II)
Serializability for specification rules
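[Figure: transactions t1, t2, t3 overlapping between states S0 and S1, compared with a serial execution through intermediate states S’1 and S’2]
Obtain read and write sets of variables of each rule
Analyze read-write dependency
Check for cycles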
45
Check for Dependency Cycles
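[Figure: transactions t1, t2, t3 overlapping between states S0 and S1, compared with a serial execution through intermediate states S’1 and S’2]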
t3 write v2, read v3
t1 read v1, write v3
t2 write v1, read v2
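A minimal Python sketch of the cycle check, using the read and write sets listed above. The edge rule used here (an edge from a to b whenever a writes a variable that b reads) is an assumption about what "read-write dependency" means on the slide, and the function names are hypothetical.

```python
def dependency_graph(rw):
    """Edge a -> b when a writes a variable that b reads (a != b)."""
    return {a: {b for b in rw if b != a and rw[a]["write"] & rw[b]["read"]}
            for a in rw}

def has_cycle(graph):
    """Depth-first search with white/grey/black colouring."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in graph}

    def visit(n):
        colour[n] = GREY
        for m in graph[n]:
            # A grey successor is on the current DFS path, so we found a cycle.
            if colour[m] == GREY or (colour[m] == WHITE and visit(m)):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)

# Read and write sets taken from the slide.
rw = {
    "t1": {"read": {"v1"}, "write": {"v3"}},
    "t2": {"read": {"v2"}, "write": {"v1"}},
    "t3": {"read": {"v3"}, "write": {"v2"}},
}
graph = dependency_graph(rw)
print(graph, has_cycle(graph))
```

For these sets the graph is t1 -> t3 -> t2 -> t1, so the check reports a cycle.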
46
Refinement Extensions to Muv (III)
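Enableness of specification rules
Each specification rule with an ID
  rule:id guard ==> action;
is translated into
  bool function id_guard() {…}
  void procedure id_action(…) {…}
Each execution of a rule by ID
  << id.guard() >>;
  << id.action() >>;
is translated into
  assert id_guard();
  id_action();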
47
Refinement Extensions to Muv (IV)
Joint variables equivalence when inactive
For each joint variable v
  When all transactions that write to v are inactive, v must be equivalent in Impl and Spec
…
transaction T1 …
transaction T2 …
…
assert inactive(T1) & inactive(T2) => v = v’;
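The generated assertion can be read as a plain implication. A small Python sketch, with hypothetical names and values (the set of active transactions and the two copies of v are passed in by hand here):

```python
def joint_var_assertion(writers, active, impl_value, spec_value):
    """inactive(T1) & inactive(T2) & ... => impl_value == spec_value."""
    all_inactive = not any(t in active for t in writers)
    return (not all_inactive) or impl_value == spec_value

writers_of_v = {"T1", "T2"}
mid_transaction = joint_var_assertion(writers_of_v, {"T1"}, 5, 0)  # T1 active: no constraint
quiescent_ok = joint_var_assertion(writers_of_v, set(), 5, 5)      # all inactive, values agree
quiescent_bad = joint_var_assertion(writers_of_v, set(), 5, 0)     # all inactive, values differ
print(mid_transaction, quiescent_ok, quiescent_bad)
```

The implication constrains v only while no writer of v is in flight, which is why the first call passes even though the two copies differ.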
48
The Cache Coherence Protocol Benchmark
S. German and G. Janssen, IBM Research Tech Report 2006
[Figure: two nodes connected by a Router; each node contains Local, Home, and Remote units with Dir, Cache, and Mem, plus three Buf units]
49
Details of the Cache Example
Hardware Murphi model: ~2500 LOC, 15 transactionsets
Generated VHDL: ~1000 assertions, of which ~800 are write-write conflict check assertions
Took ~16 min with SixthSense for all assertions
Took ~13 min without the write-write conflict checks
50
Bugs Found with Refinement Check
Benchmark satisfies cache coherence already
Bugs still found
  Bug 1: router unit loses messages
  Bug 2: home unit replies twice for one request
  Bug 3: cache unit gets updated twice from one reply
Refinement check is an automatic way of constructing such checks
51
Model Checking Approaches
Monolithic: straightforward property check
Compositional: divide and conquer
52
Compositional Refinement Check
Reduce the verification complexity
Basic techniques
  Abstraction: removing details to make verification easier
  Assume-guarantee: a simple form of induction which introduces assumptions and justifies them
53
Experimental Results
[Chart: verification time against datapath width (1-bit to 10-bit) for the monolithic and compositional approaches; time-axis marks at 30 min and 1 day]
Configurations: 2 nodes, 2 addresses, SixthSense
54
A Simple 2-Stage Pipelined Stack
pipelined pushes pipelined pops
overlapped pop & push
Push: increase counter + push data
Pop: decrease counter + pop data
55
Future Work
Muv-like refinement check for interaction modules
  RTL modules interaction via communication protocols
  Interfaces involving buffers and pipelining
Refinement of initial RTL protocols
  Power-down issues
  Post-silicon validation support
  Runtime verification support
Safe augmentation of verified protocols
  Cheap re-verification