![Page 1: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/1.jpg)
Parallel Methods for Verifying the Consistency of
Weakly-Ordered Architectures
Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader
![Page 2: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/2.jpg)
Challenges of Design Verification
• Contemporary hardware designs require
millions of lines of RTL code
– More lines of code written for verification than for
the implementation itself
• Tradeoff between performance and design
complexity
– Speculative execution, shared caches, instruction
reordering
– Performance wins out
GTC 2016, San Jose, CA 2
![Page 3: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/3.jpg)
Performance vs. Design Complexity
• Programmer burden
– Requires correct usage of
synchronization
• Time to market
– Earlier remediation of bugs is less costly
– Re-spins on tapeout are expensive
• Significant time spent on verification
– Verification techniques are often NP-complete
![Page 4: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/4.jpg)
Memory Consistency Models
• Contract between SW and HW regarding the
semantics of memory operations
• Classic example: Sequential Consistency (SC)
– All processors observe the same ordering of
operations serviced by memory
– Too strict for modern optimizations/architectures
• Nomenclature
– ST[A] → 1 “Wrote a value of 1 to location A”
– LD[B] ← 2 “Read a value of 2 from location B”
![Page 5: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/5.jpg)
ARM Idiosyncrasies
• Our focus: ARMv8
• Speculative Execution is allowed
• Threads can reorder reads and writes
– Assuming no dependency exists
• Writes are not guaranteed to be
simultaneously visible to other cores
![Page 6: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/6.jpg)
Problem Setup
1. Construct an initial graph
– Vertices represent load, store,
and barrier insts
– Edges represent memory
ordering
• Based on architectural rules
2. Iteratively infer additional
edges to the graph
– Based on existing
relationships
3. Check for cycles
– If one exists: contradiction!
LD[B] ← 92
LD[A] ← 2
ST[B] → 92
LD[B] ← 93
LD[B] ← 92
CPU 0
CPU 1
ST[B] → 90
• Given an inst. trace from a simulator, RTL, or silicon
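Step 3 of the setup above can be sketched with a standard topological-sort cycle test (Kahn's algorithm). This is a minimal illustration, not the tool's implementation: the function name `has_cycle`, the integer vertex ids, and the edge-list representation are all assumptions made for the example.

```python
from collections import deque

def has_cycle(num_vertices, edges):
    """Return True if the directed ordering graph contains a cycle.

    Assumed representation: vertices are 0..num_vertices-1 and `edges`
    is an iterable of (u, v) pairs meaning "u is ordered before v".
    """
    successors = [[] for _ in range(num_vertices)]
    in_degree = [0] * num_vertices
    for u, v in edges:
        successors[u].append(v)
        in_degree[v] += 1

    # Repeatedly remove vertices that have no remaining predecessors.
    ready = deque(v for v in range(num_vertices) if in_degree[v] == 0)
    removed = 0
    while ready:
        u = ready.popleft()
        removed += 1
        for v in successors[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                ready.append(v)

    # Vertices that could never be removed lie on (or behind) a cycle.
    return removed < num_vertices
```

A cycle means some instruction would have to be ordered before itself, which is the contradiction the slide refers to.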
![Page 7: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/7.jpg)
TSOtool
• Hangal et al., ISCA ’04
– Designed for SPARC, but portable to ARM
• Each store writes a unique value to memory
– Easily map a load to the store that wrote its data
• Tradeoff between accuracy and runtime
– Polynomial time, but false positives are possible
– If a cycle is found, a bug indeed exists
– If no cycles are found, execution appears consistent
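Because each store writes a unique value, the load-to-store mapping is a plain dictionary lookup. The sketch below assumes a trace of `(kind, location, value)` tuples and keys uniqueness on the (location, value) pair; the tuple layout and the name `map_loads_to_stores` are hypothetical.

```python
def map_loads_to_stores(trace):
    """Map each load's trace index to the index of the store it read from.

    `trace` is an assumed list of (kind, location, value) tuples with
    kind in {"ST", "LD"}. A load maps to None when no store in the
    trace wrote its value (for example, it read the initial contents).
    """
    writer = {}
    for index, (kind, location, value) in enumerate(trace):
        if kind == "ST":
            if (location, value) in writer:
                raise ValueError("store values must be unique per location")
            writer[(location, value)] = index

    reads_from = {}
    for index, (kind, location, value) in enumerate(trace):
        if kind == "LD":
            reads_from[index] = writer.get((location, value))
    return reads_from

# Values borrowed from the Problem Setup slide.
example = [("ST", "B", 90), ("ST", "B", 92), ("LD", "B", 92), ("LD", "A", 2)]
mapping = map_loads_to_stores(example)  # {2: 1, 3: None}
```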
![Page 8: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/8.jpg)
Need for Scalability
• Must run many tests to maximize coverage
– Stress different portions of the memory subsystem
• Longer tests put supporting logic in more interesting
states
– Many instructions are required to build history in an LRU
cache, for instance
• Using a CPU cluster does not suffice
– The results of one set of tests dictate the structure of the
ensuing tests
– Faster tests help with interactivity!
• Solution: Efficient algorithms and parallelism
![Page 9: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/9.jpg)
Inferred Edge Insertions (Rule 6)
• S can reach X
• X does not load data
from S
W: ST[A] → 2
X: LD[A] ← 2
S: ST[A] → 1
![Page 10: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/10.jpg)
Inferred Edge Insertions (Rule 6)
• S can reach X
• X does not load data
from S
• S comes before the
node that stored X’s
data
W: ST[A] → 2
X: LD[A] ← 2
S: ST[A] → 1
![Page 11: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/11.jpg)
Inferred Edge Insertions (Rule 7)
• S can reach X
• Loads read data from S, not X
L: LD[A] ← 1
X: ST[A] → 2
S: ST[A] → 1
M: LD[A] ← 1
![Page 12: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/12.jpg)
Inferred Edge Insertions (Rule 7)
• S can reach X
• Loads read data from S, not X
• Loads came before X
L: LD[A] ← 1
X: ST[A] → 2
S: ST[A] → 1
M: LD[A] ← 1
![Page 13: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/13.jpg)
Initial Algorithm for Inferring Edges
for_each(store vertex S)
{
    for_each(reachable vertex X from S) // Getting this set is expensive!
    {
        if(location[S] == location[X])
        {
            if((type[X] == LD) && (data[S] != data[X]))
            {
                // Add Rule 6 edge from S to W, the store that X read from
            }
            else if(type[X] == ST)
            {
                for_each(load vertex L that reads data from S)
                {
                    // Add Rule 7 edge from L to X
                }
            } // End if instruction type is store
        } // End if location
    } // End for each reachable vertex
} // End for each store
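One pass of the pseudocode above can be written as runnable Python. This is a sketch under assumed data structures (a `vertices` dict of `(kind, location, value)` tuples, a set of edges, and a `reads_from` map from each load to its store); none of these names come from TSOtool. Reachability is recomputed with a breadth-first search per store, which is exactly the expensive step the comment warns about.

```python
from collections import deque

def reachable_from(start, successors):
    """All vertices reachable from `start` along one or more edges."""
    seen, queue = set(), deque(successors.get(start, ()))
    while queue:
        v = queue.popleft()
        if v not in seen:
            seen.add(v)
            queue.extend(successors.get(v, ()))
    return seen

def infer_edges_naive(vertices, edges, reads_from):
    """Return the Rule 6 / Rule 7 edges implied by the current graph."""
    successors, readers = {}, {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)
    for load, store in reads_from.items():
        readers.setdefault(store, []).append(load)

    inferred = set()
    for s, (kind_s, loc_s, val_s) in vertices.items():
        if kind_s != "ST":
            continue
        for x in reachable_from(s, successors):
            kind_x, loc_x, val_x = vertices[x]
            if loc_x != loc_s:
                continue
            if kind_x == "LD" and val_x != val_s:
                w = reads_from.get(x)        # store that X read from
                if w is not None:
                    inferred.add((s, w))     # Rule 6: S before W
            elif kind_x == "ST":
                for load in readers.get(s, ()):
                    inferred.add((load, x))  # Rule 7: L before X
    return inferred - set(edges)

# Rule 6 example from the slides: S reaches X, but X read from W.
rule6 = infer_edges_naive(
    {"S": ("ST", "A", 1), "W": ("ST", "A", 2), "X": ("LD", "A", 2)},
    {("S", "X"), ("W", "X")},
    {"X": "W"},
)  # {("S", "W")}
```

The full algorithm would repeat this pass, adding the returned edges each time, until no new edges appear.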
![Page 14: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/14.jpg)
Virtual Processors (vprocs)
• Split instructions from physical to virtual processors
• Each vproc is sequentially consistent
– Program order ↔ Memory order
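The slide does not spell out the splitting rule, so the following is only an illustrative sketch: a greedy first-fit pass that appends each instruction to the first vproc whose last instruction is known to be ordered before it. The `ordered_after` predicate is a stand-in for the architectural ordering rules; the `same_location` predicate used in the example is an assumption chosen to mirror the slide's CPU 0 figure.

```python
def split_into_vprocs(trace, ordered_after):
    """Greedily partition one CPU's program-order trace into vprocs.

    `ordered_after(prev, cur)` is an assumed predicate that returns True
    when `cur` is guaranteed to be ordered after `prev` in memory order.
    Each returned list is a chain in which program order matches memory
    order, provided the predicate is transitive.
    """
    vprocs = []
    for inst in trace:
        for chain in vprocs:
            if ordered_after(chain[-1], inst):
                chain.append(inst)
                break
        else:
            vprocs.append([inst])  # no existing chain fits: open a new vproc
    return vprocs

def same_location(prev, cur):
    """Illustrative rule only: accesses to one location stay ordered."""
    return prev[1] == cur[1]

cpu0 = [("ST", "B", 91), ("ST", "A", 1), ("LD", "A", 2), ("ST", "B", 92)]
vprocs = split_into_vprocs(cpu0, same_location)
# [[ST B 91, ST B 92], [ST A 1, LD A 2]]
```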
ST[B] → 91
ST[A] → 1
LD[A] ← 2
ST[B] → 92
VPROC 0
ST[A] → 1
LD[A] ← 2
VPROC 1
VPROC 2
ST[B] → 91
ST[B] → 92
CPU 0
![Page 15: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/15.jpg)
Reverse Time Vector Clocks (RTVC)
• Consider the RTVC of ST[B] = 90
Purple: ST[B] = 92
Blue: NULL
Green: LD[B] = 92
Orange: LD[B] = 92
• Track the earliest
successor from each
vertex to each vproc
– Captures transitivity
LD[B] ← 92
LD[A] ← 2
ST[B] → 92
LD[B] ← 93
LD[B] ← 92
CPU 0
CPU 1
ST[B] → 90
Complexity of inferring edges: $O(n^2 p^2 d_{\max})$
![Page 16: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/16.jpg)
Updating RTVCs
• Computing RTVCs once is fast
– Process vertices in the reverse
order of a topological sort
– Check neighbors directly, then
their RTVCs
• Every time a new edge is
inserted, RTVC values need to
change
– # of edge insertions ≈ 𝑚
• TSOtool implements both vprocs and RTVCs
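A one-shot RTVC computation, as described above, might look like the sketch below. It assumes every vertex belongs to exactly one vproc, treats consecutive instructions in a vproc as implicitly ordered, and uses Python's `graphlib` for the reverse topological order; the function name and data layout are hypothetical.

```python
from graphlib import TopologicalSorter

def compute_rtvcs(vprocs, edges):
    """Return rtvc[v][q]: the earliest vertex of vproc q reachable from v.

    `vprocs` is a list of vertex lists in program order; `edges` holds the
    explicit (u, v) ordering edges. Entries are None when v reaches
    nothing in that vproc. Raises graphlib.CycleError on a cyclic graph.
    """
    vproc_of, position_of, successors = {}, {}, {}
    for q, chain in enumerate(vprocs):
        for pos, v in enumerate(chain):
            vproc_of[v], position_of[v] = q, pos
            successors[v] = set()
        # Implicit ordering: each instruction precedes the next in its vproc.
        for u, v in zip(chain, chain[1:]):
            successors[u].add(v)
    for u, v in edges:
        successors[u].add(v)

    rtvc = {}
    # Passing successors where graphlib expects predecessors makes
    # static_order() yield each vertex after all of its successors.
    for v in TopologicalSorter(successors).static_order():
        clock = [None] * len(vprocs)
        for w in successors[v]:
            # Check the neighbor directly, then the neighbor's RTVC.
            for c in [w] + [c for c in rtvc[w] if c is not None]:
                q = vproc_of[c]
                if clock[q] is None or position_of[c] < position_of[clock[q]]:
                    clock[q] = c
        rtvc[v] = clock
    return rtvc

rtvc = compute_rtvcs(
    [["a0", "a1"], ["b0", "b1"], ["c0"]],
    [("a1", "b0"), ("b1", "c0")],
)
# rtvc["a0"] == ["a1", "b0", "c0"]: a0 reaches c0 only transitively.
```

Each vertex stores at most one entry per vproc, so a traversal inspects at most p candidates per vertex instead of the full reachable set.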
![Page 17: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/17.jpg)
Facilitating Parallelism
• Repeatedly updating RTVCs is expensive
– For $k$ edge insertions, RTVC updates take $O(kpn)$ time
• $k = O(n^2)$, but is usually a small multiple of $n$
• Idea: Update RTVCs once per iteration rather than
per edge insertion
– For $i$ iterations, RTVC updates take $O(ipn)$ time
• $i \ll k$ (less than 10 for all test cases)
– Less communication between threads
• Complexity of inferring edges: $O(n^2 p)$
![Page 18: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/18.jpg)
Correctness
• Inferred edges found by our approach will not be the
same as the edges found by TSOtool
– Might not infer an edge that TSOtool does
• RTVC for TSOtool can change mid-iteration
– Might infer an edge that TSOtool does not
• Our approach will have “stale” RTVC values
• Both approaches make forward progress
– Number of edges monotonically increases
• Any edge inserted by our approach could have been
inserted by the naïve approach [Thm 1]
• If TSOtool finds a cycle, we will also find a cycle [Thm 2]
![Page 19: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/19.jpg)
Parallel Implementations
• OpenMP
– Each thread keeps its own partition of added
edges
– After each iteration of inferring edges, reduce
• CUDA
– Assign threads to each store instruction
– Threads independently traverse the vprocs of this
store
– Atomically add edges to a preallocated array in
global memory
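The OpenMP strategy (private edge partitions, then a reduction after each iteration) can be illustrated structurally in Python. This is only an analogy: it uses `concurrent.futures` threads rather than OpenMP, and the callback `infer_for_store` is a hypothetical stand-in for the per-store Rule 6 / Rule 7 traversal.

```python
from concurrent.futures import ThreadPoolExecutor

def infer_edges_parallel(stores, infer_for_store, num_workers=4):
    """Run one inference iteration with per-worker edge partitions.

    `infer_for_store(store)` is an assumed callback returning the edges
    inferred for one store. It must only read shared state, since the
    RTVCs are held fixed for the whole iteration.
    """
    chunks = [stores[w::num_workers] for w in range(num_workers)]

    def work(chunk):
        local_edges = set()  # private partition: no locking needed
        for store in chunk:
            local_edges.update(infer_for_store(store))
        return local_edges

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partitions = list(pool.map(work, chunks))

    merged = set()
    for partition in partitions:  # the reduction after the iteration
        merged |= partition
    return merged

# Toy callback: each store s "infers" the single edge (s, s + 1).
edges = infer_edges_parallel(list(range(10)), lambda s: {(s, s + 1)})
```

Holding the RTVCs fixed during an iteration is what lets workers run without synchronizing on every edge insertion.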
![Page 20: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/20.jpg)
Experimental Setup
• Intel Core i7-2600K CPU
– Quad core, 3.4GHz, 8MB LLC, 16GB DRAM
• NVIDIA GeForce GTX Titan
– 14 SMs, 837 MHz base clock, 6GB DRAM
• ARM system under test
– Cortex-A57, quad core
• Instruction graphs range from $n = 2^{18}$ to $n = 2^{22}$
vertices, $n \approx m$
– Sparse, high-diameter, low-degree
– Tests vary by their distribution of LD/ST/DMB
instructions, # of vprocs, and inst dependencies
![Page 21: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/21.jpg)
Importance of Scaling
• 512K instructions
per core
• 2M total
instructions
![Page 22: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/22.jpg)
Speedup over TSOtool (Application)
| Graph Size | # of tests | Lazy RTVC | OMP 2 | OMP 4 | GPU |
|---|---|---|---|---|---|
| 64K × 4 = 256K | 27 | 5.64x | 7.62x | 9.43x | 10.79x |
| 128K × 4 = 512K | 27 | 5.31x | 7.12x | 8.90x | 10.76x |
| 256K × 4 = 1M | 23 | 6.30x | 9.05x | 12.13x | 15.47x |
| 512K × 4 = 2M | 10 | 3.68x | 6.41x | 10.81x | 24.55x |
| 1M × 4 = 4M | 2 | 3.05x | 5.58x | 9.97x | 37.64x |
• GPU is always best; scales much better to larger tests
• Extreme case: 9 hours using TSOtool → under 10
minutes using our GPU approach
• Avg. Parallel speedups over our improved sequential
approach:
– 1.92x (OMP 2), 3.53x (OMP 4), 5.05x (GPU)
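Two quick derived figures help put these numbers in context. The arithmetic below uses only values quoted on this slide; "parallel efficiency" (speedup divided by thread count) is a standard metric added here for illustration, not one the slide reports.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup achieved by `workers` threads."""
    return speedup / workers

omp2_efficiency = parallel_efficiency(1.92, 2)  # 0.96
omp4_efficiency = parallel_efficiency(3.53, 4)  # 0.8825

# 9 hours reduced to under 10 minutes is more than a 54x speedup.
extreme_case_lower_bound = (9 * 60) / 10
```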
![Page 23: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/23.jpg)
Summary
• Relaxing the updates to RTVCs led to a better
sequential approach and facilitated parallel
implementations
– Tradeoff between redundant work and parallelism
• Faster execution leads to interactive bug-finding
• The GPU scales well to larger problem instances
– Helpful for corner case bugs that slip through pre-silicon
verification
• For the twelve largest test cases our GPU
implementation achieves a 26.36x average
application speedup
![Page 24: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/24.jpg)
Acknowledgments
• Shankar Govindaraju and Tom Hart, for their
help in understanding NVIDIA’s
implementation of TSOtool for ARM
![Page 25: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/25.jpg)
Questions
“To raise new questions, new possibilities, to
regard old problems from a new angle, requires
creative imagination and marks real advance in
science.”– Albert Einstein
![Page 26: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/26.jpg)
Backup
![Page 27: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/27.jpg)
Sequential Consistency Examples
• Valid (time axis: t=0, t=1, t=2)
P1: ST[x]→1
P2: LD[x]←1 LD[x]←2
P3: LD[x]←1 LD[x]←2
P4: ST[x]→2
– ST[x]→1 handled before ST[x]→2
• Invalid (same time axis)
P1: ST[x]→1
P2: LD[x]←1 LD[x]←2
P3: LD[x]←2 LD[x]←1
P4: ST[x]→2
– Writes propagate to P2 and P3 in a different order
– Valid for weaker memory models
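The two executions can be checked mechanically with a brute-force search over interleavings. This sketch assumes per-processor lists of `(kind, location, value)` tuples and an initial memory value of 0; it is exponential in trace length and meant only to make the definition concrete, not as a practical checker.

```python
def is_sequentially_consistent(threads, initial=0):
    """True if some single interleaving explains every load's value.

    `threads` is an assumed list of per-processor operation lists, each
    operation a (kind, location, value) tuple in program order.
    """
    def search(next_op, memory):
        if all(i == len(t) for i, t in zip(next_op, threads)):
            return True  # every operation has been placed
        for p, thread in enumerate(threads):
            i = next_op[p]
            if i == len(thread):
                continue
            kind, location, value = thread[i]
            if kind == "LD" and memory.get(location, initial) != value:
                continue  # this load cannot be serviced next
            new_memory = dict(memory)
            if kind == "ST":
                new_memory[location] = value
            advanced = next_op[:p] + (i + 1,) + next_op[p + 1:]
            if search(advanced, new_memory):
                return True
        return False

    return search((0,) * len(threads), {})

valid = [[("ST", "x", 1)],
         [("LD", "x", 1), ("LD", "x", 2)],
         [("LD", "x", 1), ("LD", "x", 2)],
         [("ST", "x", 2)]]
invalid = [[("ST", "x", 1)],
           [("LD", "x", 1), ("LD", "x", 2)],
           [("LD", "x", 2), ("LD", "x", 1)],
           [("ST", "x", 2)]]
results = (is_sequentially_consistent(valid),
           is_sequentially_consistent(invalid))  # (True, False)
```

In the invalid case no single order of the two stores satisfies both P2 and P3, so the search exhausts every interleaving and fails.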
![Page 28: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/28.jpg)
Weaker Models
• SC is intuitive, but is too strict
– Prevents common compiler/arch. optimizations
• Commercial products use weaker models
– x86: Total Store Order (TSO)
– Power/ARM: Relaxed Memory Ordering (RMO)
• Weaker models allow for greater optimization
opportunities
– Cost: More complicated semantics
![Page 29: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/29.jpg)
Initial Algorithm: Weaknesses
• Expensive to compute
– $O(n^3)$, assuming edges can be inserted in $O(1)$ time
– Repeated iteratively until a fixed point is reached
• Requires the transitive closure of the graph
– Expensive to store
– Capturing $n^2$ relationships (does vertex $i$ reach
vertex $j$?)
• Adds lots of redundant edges
– Should leverage transitivity when possible
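To give a sense of the storage cost, the arithmetic below sizes a dense reachability matrix for the graph sizes quoted in the experimental setup ($n = 2^{18}$ to $2^{22}$). One bit per ordered vertex pair is an assumption made for the estimate; the slides do not state how a closure would be stored.

```python
GIB = 2**30

def closure_bytes(num_vertices):
    """Bytes for a dense n x n reachability matrix at one bit per pair."""
    return num_vertices * num_vertices // 8

smallest_gib = closure_bytes(2**18) // GIB  # 8 GiB
largest_gib = closure_bytes(2**22) // GIB   # 2048 GiB (2 TiB)
```

Even the smallest test graph would need several gigabytes under this assumption, and the largest would need terabytes.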
![Page 30: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/30.jpg)
Reverse Time Vector Clocks (RTVCs)
• vprocs provide implicit orderings
ST[B] → 91
ST[B] → 92
ST[A] → 1
![Page 31: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/31.jpg)
Reverse Time Vector Clocks (RTVCs)
• vprocs provide implicit orderings
• Reverse Time Vector Clock
– Track the earliest successor from each vertex to each
vproc
• Bounds the number of reachable edges to be
inspected by 𝑝, the number of vprocs
– No need to compute or store the transitive closure!
ST[B] → 91
ST[B] → 92
ST[A] → 1
![Page 32: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/32.jpg)
Reverse Time Vector Clocks (RTVCs)
• Track the earliest successor from each vertex to
each vproc
– Captures transitivity
• Traverse vprocs rather than the graph itself
– No need to check every reachable vertex
• Bounds the number of reachable edges to be
inspected by 𝑝, the number of vprocs
– No need to compute or store the transitive closure!
![Page 33: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/33.jpg)
Superfluous work?
• Our approach tends
to add more edges
than TSOtool, some of
which are redundant
– Worst case: 36%
additional edges
– The redundancy is well
worth the
performance benefits
![Page 34: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/34.jpg)
Test Info
| $n = \lvert V \rvert$ | $m = \lvert E \rvert$ | TSOtool Inferred | Iterations | ST/LD/BAR (%) |
|---|---|---|---|---|
| 2,097,963 | 3,799,254 | 4,487,224 | 5 | 76/24/0 |
| 2,098,219 | 3,686,624 | 4,411,887 | 4 | 79/21/0 |
| 1,977,832 | 4,453,340 | 5,179,108 | 5 | 46/53/1 |
| 2,097,741 | 3,875,831 | 4,635,852 | 7 | 77/23/0 |
| 1,936,321 | 5,109,990 | 5,236,671 | 5 | 44/54/2 |
| 2,098,321 | 2,491,062 | 4,257,077 | 6 | 80/20/0 |
| 2,097,809 | 4,321,793 | 4,404,753 | 7 | 78/21/1 |
| 1,871,831 | 3,660,617 | 4,861,044 | 6 | 44/54/2 |
| 2,097,809 | 4,434,120 | 4,418,555 | 5 | 80/20/0 |
| 4,195,405 | 6,934,725 | 9,338,902 | 7 | 76/23/1 |
| 4,194,961 | 7,960,567 | 8,963,281 | 6 | 78/22/0 |
![Page 35: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/35.jpg)
Speedup over TSOtool (Inferring edges)
| Graph Size | # of tests | Lazy RTVC | OMP 2 | OMP 4 | GPU |
|---|---|---|---|---|---|
| 64K × 4 = 256K | 27 | 15.09x | 29.31x | 53.45x | 57.90x |
| 128K × 4 = 512K | 27 | 16.41x | 31.49x | 57.34x | 76.98x |
| 256K × 4 = 1M | 23 | 14.51x | 27.98x | 51.68x | 72.32x |
| 512K × 4 = 2M | 10 | 4.01x | 7.52x | 14.19x | 42.90x |
| 1M × 4 = 4M | 2 | 3.08x | 5.70x | 10.39x | 45.16x |
• Number of tests decreases with test size because
of industrial time constraints
– Motivation for this work
• Avg. Parallel speedups over our improved
sequential approach:
– 1.92x (OMP 2), 3.53x (OMP 4), 5.05x (GPU)
![Page 37: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/37.jpg)
Importance of Scaling
• 128K instructions
per core
• 512K total
instructions
![Page 38: Parallel Methods for Verifying the Consistency of Weakly ...€¦ · Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader. Challenges of Design Verification •Contemporary](https://reader033.vdocuments.net/reader033/viewer/2022051910/600041dad70242242108d8d1/html5/thumbnails/38.jpg)
Importance of Scaling
• 256K instructions
per core
• 1M total
instructions