Mark Batty, University of Kent
Mechanised industrial concurrency specification: C/C++ and GPUs
It is time for mechanised industrial standards
Specifications are written in English prose: this is insufficient
Write mechanised specs instead (formal, machine-readable, executable)
This enables verification, and can identify important research questions
Writing mechanised specifications is practical now
A case study: industrial concurrency specification
Shared memory concurrency
Multiple threads communicate through a shared memory
[Figure: threads (Thread, Thread, …) all connected to one shared memory]
Most systems use a form of shared memory concurrency
An example programming idiom
data, flag, r initially zero
Thread 1:
data = 1; flag = 1;
Thread 2:
while (flag==0) {}; r = data;
In the end r==1
Sequential consistency: simple interleaving of concurrent accesses
Reality: more complex
Relaxed concurrency
data, flag, r initially zero
Thread 1:
data = 1; flag = 1;
Thread 2:
while (flag==0) {}; r = data;
Expected: in the end r==1
Memory is slow, so it is optimised (buffers, caches, reordering…)
e.g. IBM’s machines allow reordering of unrelated writes
(so do compilers, ARM, Nvidia…)
Sometimes, in the end r==0, a relaxed behaviour
Many other behaviours like this, some far more subtle, leading to trouble
With Thread 1’s unrelated writes reordered, the program behaves as if it were:
Thread 1:
flag = 1; data = 1;
Thread 2:
while (flag==0) {}; r = data;
Thread 2 can then see flag==1 before data is written, so r==0
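The difference between the two write orders can be checked by brute force. The following is a minimal sketch, not part of the talk: it enumerates every sequentially consistent interleaving of the two threads and collects the possible final values of r. The names `State`, `explore` and `outcomes` are hypothetical, and the `flag_first` switch only imitates reordering by swapping Thread 1's writes in program order; it does not model real hardware.

```cpp
#include <cassert>
#include <set>

// Hypothetical model of the two-thread idiom under sequential consistency.
struct State {
    int pc1;   // number of Thread 1 writes already performed (0..2)
    int pc2;   // Thread 2: 0 = spinning on flag, 1 = about to read data, 2 = done
    int data, flag, r;
};

// Explore every interleaving of the two threads from state s.
// flag_first == false: Thread 1 runs data = 1; flag = 1;
// flag_first == true:  Thread 1 runs flag = 1; data = 1; (writes swapped)
void explore(State s, bool flag_first, std::set<int>& results) {
    if (s.pc1 == 2 && s.pc2 == 2) {
        results.insert(s.r);               // both threads finished
        return;
    }
    if (s.pc1 < 2) {                       // Thread 1 takes a step
        State n = s;
        bool write_flag = flag_first ? (n.pc1 == 0) : (n.pc1 == 1);
        if (write_flag) n.flag = 1; else n.data = 1;
        ++n.pc1;
        explore(n, flag_first, results);
    }
    if (s.pc2 == 0 && s.flag != 0) {       // Thread 2 leaves the spin loop
        State n = s;
        n.pc2 = 1;
        explore(n, flag_first, results);
    } else if (s.pc2 == 1) {               // Thread 2 executes r = data
        State n = s;
        n.r = n.data;
        n.pc2 = 2;
        explore(n, flag_first, results);
    }
    // While flag == 0, a Thread 2 step leaves the state unchanged, so it is skipped.
}

// Set of final values of r over all interleavings.
std::set<int> outcomes(bool flag_first) {
    std::set<int> results;
    explore(State{0, 0, 0, 0, 0}, flag_first, results);
    return results;
}
```

Calling `outcomes(false)` covers the program as written, and `outcomes(true)` covers the variant with the writes swapped, where r==0 becomes reachable.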
Relaxed behaviour leads to problems
Bugs in deployed processors: on Power/ARM processors, unintended relaxed behaviour is observable on shipped machines [AMSS10]
Many bugs in compilers: errors in key compilers (GCC, LLVM) mean compiled programs could behave outside of spec [MPZN13, CV16]
Bugs in language specifications: the C and C++ standards had bugs that allowed unintended behaviour [BOS+11, BMN+15]
Bugs in operating systems: confusion among operating system engineers leads to bugs in the Linux kernel [McK11, SMO+12]
Current engineering practice is severely lacking!
Vague specifications are at fault
Relaxed behaviours are subtle, difficult to test for and often unexpected, yet allowed for performance
Specifications try to define what is allowed, but English prose is untestable, ambiguous, and hides errors
A diverse and continuing effort
Modelling of hardware and languages: build mechanised executable formal models of specifications [AFI+09, BOS+11, BDW16, FGP+16, LDGK08, OSP09]
Simulation tools and reasoning principles: provide tools to simulate the formal models, to explain their behaviours to non-experts; provide reasoning principles to help in the verification of code [BOS+11, SSP+, BDG13]
Empirical testing of current hardware: run a battery of tests to understand the observable behaviour of the system and check it against the model [AMSS’11]
Verification of language design goals: explicitly stated design goals should be proved to hold [BMN+15]
Test and verify compilers: test to find the relaxed behaviours introduced by compilers and verify that optimisations are correct [MPZN13, CV16]
Feedback to industry (specs and test suites): specifications should be fixed when problems are found; test suites can ensure conformance to formal models [B11]
I will describe my part:
The C and C++ memory model
Acknowledgements
S. Owens, S. Sarkar, P. Sewell, T. Weber, K. Memarian, M. Dodds, A. Gotsman, K. Nienhuis, J. Pichon-Pharabod
C and C++
The medium for system implementation
Defined by WG14 and WG21 of the International Organization for Standardization (ISO)
The ’11 and ’14 revisions of the standards define relaxed memory behaviour
I worked with WG21, formalising and improving their concurrency design
C++11 concurrency design
A contract with the programmer: they must avoid data races, two threads competing for simultaneous access to a single variable
Beware: violate the contract and the compiler is free to allow anything: catch fire!
data initially zero
Thread 1:
data = 1;
Thread 2:
r = data;
Atomics are excluded from the requirement, and can order non-atomics, preventing simultaneous access and races
data, r, atomic flag, initially zero
Thread 1:
data = 1; flag = 1;
Thread 2:
while (flag==0) {}; r = data;
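In C++11 source the idiom with an atomic flag might look like the sketch below. The slide does not say which memory order the flag accesses use; release/acquire is an assumption here (the default `memory_order_seq_cst` would also work), and the function name `run_message_passing` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int data = 0;                  // ordinary (non-atomic) variable
std::atomic<int> flag{0};      // atomic flag that orders the accesses to data

// Runs the two threads once and returns the value Thread 2 read from data.
int run_message_passing() {
    data = 0;
    flag.store(0);
    int r = 0;

    std::thread t1([] {
        data = 1;                                   // non-atomic write
        flag.store(1, std::memory_order_release);   // publish it
    });
    std::thread t2([&r] {
        while (flag.load(std::memory_order_acquire) == 0) {}  // wait for flag
        r = data;                                   // ordered after the write: no race
    });

    t1.join();
    t2.join();
    return r;
}
```

Because the acquire load that reads 1 synchronises with the release store, the write of data happens before the read, so the program is race-free and `run_message_passing()` returns 1.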
Design goals in the standard
The design is complex but the standard claims a powerful simplification:
C++11/14, §1.10p21: It can be shown that programs that correctly use mutexes and memory_order_seq_cst operations to prevent all data races and use no other synchronization operations behave [according to] “sequential consistency”.
This is the central design goal of the model, called DRF-SC (data-race freedom implies sequential consistency)
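One way to see what DRF-SC promises is the classic store-buffering test, which is not on the slides and is offered here only as an illustrative sketch. All shared accesses use `memory_order_seq_cst`, so the program is race-free, and under any sequentially consistent interleaving at least one thread must see the other's store. The function name `run_store_buffering` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Runs the store-buffering test once and returns (r1, r2).
std::pair<int, int> run_store_buffering() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });

    t1.join();
    t2.join();
    // Whichever store comes first in the single total order of seq_cst
    // operations is visible to the other thread's later load, so
    // r1 == 0 && r2 == 0 is ruled out.
    return {r1, r2};
}
```

With weaker memory orders on the same accesses, the outcome r1==0, r2==0 is permitted by the standard, which is why the DRF-SC statement is restricted to mutexes and `memory_order_seq_cst`.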
Implicit design goals
Compilers like GCC, LLVM map C/C++ to pieces of machine code
C/C++ load acquire maps to:
Power: ld; cmp; bc; isync
ARM: ldr; dmb
x86: MOV (from memory)
Each mapping should preserve the behaviour of the original program
[Figure: C/C++11 compiled to Power, ARM and x86]
We formalised a draft of the standard
A mechanised formal model, close to the standard text
In total, several thousand lines of Lem [MOG+14]
C++11 standard §1.10p12:
An evaluation A happens before an evaluation B if:
• A is sequenced before B, or
• A inter-thread happens before B.
The implementation shall ensure that no program execution demonstrates a cycle in the “happens before” relation.
The corresponding formalisation:
let happens_before sb ithb = sb ∪ ithb
let consistent_hb hb = isIrreflexive (transitiveClosure hb)
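The two Lem definitions can be mirrored directly in executable code. The sketch below is an illustration in C++, not the authors' formalisation: events are plain integers, a relation is a set of pairs, and the names `Rel`, `transitive_closure` and `is_irreflexive` are hypothetical stand-ins for the Lem library functions.

```cpp
#include <cassert>
#include <set>
#include <utility>

// A binary relation over events, with events identified by integers.
using Rel = std::set<std::pair<int, int>>;

// happens_before sb ithb = sb ∪ ithb
Rel happens_before(const Rel& sb, const Rel& ithb) {
    Rel hb = sb;
    hb.insert(ithb.begin(), ithb.end());
    return hb;
}

// Smallest transitive relation containing r (naive fixed-point iteration).
Rel transitive_closure(Rel r) {
    bool changed = true;
    while (changed) {
        changed = false;
        Rel extra;
        for (const auto& p : r)
            for (const auto& q : r)
                if (p.second == q.first)
                    extra.insert(std::make_pair(p.first, q.second));
        for (const auto& e : extra)
            if (r.insert(e).second) changed = true;   // new edge added
    }
    return r;
}

// True when no event is related to itself.
bool is_irreflexive(const Rel& r) {
    for (const auto& p : r)
        if (p.first == p.second) return false;
    return true;
}

// consistent_hb hb = isIrreflexive (transitiveClosure hb)
bool consistent_hb(const Rel& hb) {
    return is_irreflexive(transitive_closure(hb));
}
```

A cycle in happens-before shows up as a reflexive edge in the transitive closure, which is exactly what the irreflexivity check rejects.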
Communication with WG21 and WG14
Issues were discussed in N-papers and Defect Reports
Explicit Initializers for Atomics
ISO/IEC JTC1 SC22 WG21 N3057 = 10-0047 - 2010-03-11
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3057.html
Paul E. McKenney, Mark Batty, Clark Nelson, N.M. Maclaren, Hans Boehm, Anthony Williams, Peter Dimov, Lawrence Crowl
Introduction
Mark Batty recently undertook a partial formalization of the C++ memory model, which Mark summarized in N2955. This paper summarizes the discussions on Mark's paper, both verbal and email, recommending appropriate actions for the Library Working Group. Core issues are dealt with in a companion N3074 paper.
This paper is based on N3045, and has been updated to reflect discussions in the Concurrency subgroup of the Library Working Group in Pittsburgh. This paper also carries the C-language side of N3040, which was also discussed in the Concurrency subgroup of the Library Working Group in Pittsburgh.
Library Issues
Library Issue 1: 29.3p1 Limits to Memory-Order Relaxation (Non-Normative)
Add a note stating that memory_order_relaxed operations must maintain indivisibility, as described in the discussion of 1.10p4. This must be considered in conjunction with the resolution to LWG 1151, which is expected to be addressed by Hans Boehm in N3040.
Library Issue 2: 29.3p11 Schedulers, Loops, and Atomics (Normative)
The second sentence of this paragraph, “Implementations shall not move an atomic operation out of an unbounded loop”, does not add anything to the first sentence, and, worse, can be interpreted as restricting the meaning of the first sentence. This sentence should therefore be deleted. The Library Working Group discussed this change during the Santa Cruz meeting in October 2009, and agreed with this deletion.
Library Issue 3: 29.5.1 Uninitialized Atomics and C/C++ Compatibility (Normative)
This topic was the subject of a spirited discussion among a subset of the participants in the C/C++-compatibility effort this past October and November.
Unlike C++, C has no mechanism to force a given variable to be initialized. Therefore, if C++ atomics are going to be compatible with those of C, either C++ needs to tolerate uninitialized atomic objects, or C needs to require that all atomic objects be initialized. There are a number of cases to consider:
Major problems fixed, key properties verified
DRF-SC:
The central design goal was false: the standard permitted too much
Fixed the model and then proved (in HOL4) that the goal is now true
Fixes were incorporated, pre-ratification, and are in C++11/14
Compilation mappings:
Efficient x86, Power mappings are sound [BOS+11,BMO+12,SMO+12]
Reasoning:
Developed a reasoning principle for proving programs correct [BDG13]
A fundamental problem uncovered
x, y, r1, r2 initially zero
// Thread 1
r1 = x;
if (r1==1) y = 1;
// Thread 2
r2 = y;
if (r2==1) x = 1;
Can we observe r1==1, r2==1 at the end?
The write of y is dependent on the read of x
The write of x is dependent on the read of y
This will never occur in compiled code, and ought to be forbidden
“[ Note: […] However, implementations should not allow such behavior. — end note ]”
ISO: notes carry no force, and “should” imposes no constraint, so yes!
Why? Dependencies are ignored to allow dependency-removing optimisations
The specification should respect the dependencies left over after optimisation
We have proved that no fix exists in the structure of the current specification
This identifies a difficult research problem
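The example can be written as a real C++11 program. This is a sketch under one loud assumption: x and y are taken to be atomics accessed with `memory_order_relaxed`, which is the setting in which this problem is usually discussed [BMN+15]; the slide itself shows plain variables. The function name `run_load_buffering` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Runs the two dependent threads once and returns (r1, r2).
std::pair<int, int> run_load_buffering() {
    std::atomic<int> x{0}, y{0};   // assumption: relaxed atomics
    int r1 = 0, r2 = 0;

    std::thread t1([&] {
        r1 = x.load(std::memory_order_relaxed);
        if (r1 == 1) y.store(1, std::memory_order_relaxed);  // depends on the read of x
    });
    std::thread t2([&] {
        r2 = y.load(std::memory_order_relaxed);
        if (r2 == 1) x.store(1, std::memory_order_relaxed);  // depends on the read of y
    });

    t1.join();
    t2.join();
    return {r1, r2};
}
```

Nothing in the program ever writes 1 unless a 1 has already been read, so on compiled code we would expect every run to return (0, 0). The point of the slide is that the normative text of the standard does not rule out (1, 1).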
Timing was everything
Achieved direct impact on the standard
C++11 was a major revision, so the ISO was receptive to change
Making this work was partly a social problem
GPU concurrency
Acknowledgements
J. Alglave, T. Sorensen, A. Donaldson, J. Wickerson, D. Poetzl, G. Gopalakrishnan, J. Ketema, B. Beckmann
Graphics processors
Alternate design path: throughput over latency, thousands of threads
Forecast for use in critical applications: Audi-Nvidia Drive partnership
Hardware and specs under rapid development (GPU computing is only 10 years old)
An opportunity for lightweight verification at the design phase
Many fronts of progress
Empirical testing of GPU behaviour: observed ‘surprising’ relaxed behaviours that break algorithms in the literature, e.g. the Cederman and Tsigas queue; the same holds for programming idioms in vendor-supported tutorials [ABD+15]
Refinement of an AMD GPU design: direct collaboration with AMD; modelled a prototype GPU design; found bugs and refined the design; an early concept, so change is cheap [WBDB15]
Formalisation of OpenCL concurrency: OpenCL is an extension of C11 to CPU-GPU systems; extended the C11 model to OpenCL; verified an AMD compiler mapping [BDW16, WBDB15]
Direct engagement with Nvidia: helping to develop an internal specification for a next-generation architecture; verifying a compilation mapping in the HOL4 theorem prover
Conclusion
Mechanised industrial specification is practical and can have major impact
It can guide us to future research questions
This is a necessary step in formal verification
Formalisation can inform good hardware and language specifications
Bibliography
[ABD+15] J. Alglave, M. Batty, A. Donaldson, G. Gopalakrishnan, J. Ketema, D. Poetzl, T. Sorensen, J. Wickerson. GPU concurrency: weak behaviours and programming assumptions. ASPLOS’15
[AFI+09] J. Alglave, A. Fox, S. Ishtiaq, M. O. Myreen, S. Sarkar, P. Sewell, and F. Zappa Nardelli. The semantics of Power and ARM multiprocessor machine code. DAMP’09
[AMSS10] J. Alglave, L. Maranget, S. Sarkar, and P. Sewell. Fences in weak memory models. CAV’10.
[AMSS’11] J. Alglave, L. Maranget, S. Sarkar, and P. Sewell. Litmus: Running tests against hardware. TACAS’11/ETAPS’11.
[B11] P. Becker, editor. Programming Languages — C++. 2011. ISO/IEC 14882:2011. A non-final version is available at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf.
[BDG13] M. Batty, M. Dodds, A. Gotsman. Library Abstraction for C/C++ Concurrency. POPL’13
[BDW16] M. Batty, A. Donaldson, J. Wickerson. Overhauling SC atomics in C11 and OpenCL. POPL’16
[BMN+15] M. Batty, K. Memarian, K. Nienhuis, J. Pichon, P. Sewell. The Problem of Programming Language Concurrency Semantics. ESOP’15
[BMO+12] M. Batty, K. Memarian, S. Owens, S. Sarkar, and P. Sewell. Clarifying and compiling C/C++ concurrency: from C++0x to POWER. POPL’12
[BOS+11] M. Batty, S. Owens, S. Sarkar, P. Sewell, and T. Weber. Mathematizing C++ concurrency. POPL’11
[CV16] S. Chakraborty, V. Vafeiadis. Validating optimizations of concurrent C/C++ programs. CGO’16
[FGP+16] S. Flur, K. E. Gray, C. Pulte, S. Sarkar, A. Sezgin, L. Maranget, W. Deacon, P. Sewell. Modelling the ARMv8 Architecture, Operationally: Concurrency and ISA. PLDI’16
[LDGK08] G. Li, M. Delisi, G. Gopalakrishnan, and R. M. Kirby. Formal specification of the MPI-2.0 standard in TLA+. PPoPP’08
[McK11] P. E. McKenney. [patch rfc tip/core/rcu 0/28] preview of RCU changes for 3.3, November 2011. https://lkml.org/lkml/2011/11/2/363
[MOG+14] D. P. Mulligan, S. Owens, K. E. Gray, T. Ridge, and P. Sewell. Lem: reusable engineering of real-world semantics. ICFP’14
[MPZN13] R. Morisset, P. Pawan, F. Zappa Nardelli. Compiler testing via a theory of sound optimisations in the C11/C++11 memory model. PLDI’13
[OSP09] S. Owens, S. Sarkar, and P. Sewell. A better x86 memory model: x86-TSO. TPHOLs’09.
[SMO+12] S. Sarkar, K. Memarian, S. Owens, M. Batty, P. Sewell, L. Maranget, J. Alglave, and D. Williams. Synchronising C/C++ and POWER. PLDI’12
[SSP+] S. Sarkar, P. Sewell, P. Pawan, L. Maranget, J. Alglave, D. Williams, F. Zappa Nardelli. The PPCMEM Web Tool. www.cl.cam.ac.uk/~pes20/ppcmem/
[WBDB15] J. Wickerson, M. Batty, B. Beckmann, A. Donaldson. Remote-Scope Promotion: Clarified, Rectified, and Verified. OOPSLA’15