1
A Framework for Reasoning About Inherent Parallelism in Modern Object-Oriented Languages
Presented by A. Craik (5-Jan-12)
Research supported by funding from Microsoft Research and the Queensland State Government
Introduction
2
Procedural Algorithm
Sequential Implementation w/ Injected Parallelism
Procedural Algorithm
Sequential Implementation
Parallel Algorithm
Explicitly Parallel Implementation
Semantic Analysis
Dependency Analysis
3
• Inherent Parallelism:
    a = 1;
    b = 2;
    c = a + b;

    for (int i=0; i<max; ++i) a[i] = a[i] + 1;
• Three steps for finding & exploiting:
    1. Find the inherent parallelism in the program
    2. Decide which inherent parallelism is worth exploiting
    3. Choose an implementation technology to expose the selected parallelism
Introduction
4
• Dependencies impose ordering constraints
• Sequential consistency required
• Two forms
    – Control – which statements will run
    – Data – reads & writes of shared state
• Control well studied and easier to handle inter-procedurally
    – Example: Java checked exceptions
Introduction
5
• Flow Dependence (Read-After-Write)
    int a = 1;
    a = 2;
    int b = a + 1;
• Output Dependence (Write-After-Write)
    int a = 1;
    a = 4;
    a = 5;
• Anti-Dependence (Write-After-Read)
    int a = 1;
    int b = a + 1;
    a = 2;
Data Dependencies
6
for (int i=0; i < 3; ++i) {
    for (int j=0; j < i+1; ++j) {
        a[i,j] = b[i,j] + c[i,j];
        b[i,j] = a[i,j+1];
    }
}
• Pair-wise analysis of statements and expressions
• Can a, b or c refer to the same array?
Traditional Approach
7
for (int i=0; i < 3; ++i) {
    for (int j=0; j < i+1; ++j) {
        a[i,j] = b[i,j] + c[i,j];
        b[i,j] = a.readIandJInc(i,j);
    }
}
• What does a.readIandJInc(i,j) do?
• Examine ALL possible implementations!
Traditional Approach
8
class Holder {
    public static int value;
}
class Array {
    // No side effects: only reads the array's own state
    public int readIandJInc(int i, int j) {
        return this[i,j+1];
    }
}
Side-Effects
9
class Holder {
    public static int value;
}
class Array {
    // Side effect: writes another element of the same array
    public int readIandJInc(int i, int j) {
        this[0,0] = i + j;
        return this[i,j];
    }
}
Side-Effects
10
class Holder {
    public static int value;
}
class Array {
    // Side effect: writes static state outside the array
    public int readIandJInc(int i, int j) {
        Holder.value++;
        return this[i,j];
    }
}
Side-Effects
11
• Traditional:
    – Focused on analyzing complex tight loops
    – Poor abstraction and composition
    – Too complex for programmers to use without tool support
[Diagram comparing Traditional Approach and My Approach; labels: Kernels, Less precise, Inter-procedural]
Limitations of Current Techniques
12
• Goal:
    – Simplify inter-procedural dependency analysis
• Idea:
    – Ensure safety
    – Make reasoning modular and composable
The Idea
13
• Specify effects on method signature:
    public int getReads() reads<> writes<>
• What goes in the angle brackets?
    – Abstract effect description
    – Composable descriptions
    – Verifiable
The Idea
14
The Idea
15
• Encapsulation representation hierarchy
    Person
        name: String
        dateOfBirth: Date
        employer: Company
Object-Orientation
16
The Idea
17
• Can 2 arbitrary pieces of code execute in parallel safely?
• Type rules specify computation of effect sets
• Look for overlaps in the read & write effect sets to find possible data dependencies
    Block 1 {...} reads <a,b> writes <c,d>
    Block 2 {...} reads <w,x> writes <y,z>
Safe Parallelism
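The overlap check on this slide can be sketched in Java as follows. This is only an illustration under simplifying assumptions: effects are modelled as flat sets of region names rather than ownership contexts, and `EffectSummary` and `mayConflict` are hypothetical names, not part of the presented system.

```java
import java.util.Set;

// Hypothetical summary of a code block: the named regions it reads and writes.
final class EffectSummary {
    final Set<String> reads;
    final Set<String> writes;

    EffectSummary(Set<String> reads, Set<String> writes) {
        this.reads = reads;
        this.writes = writes;
    }

    // Two blocks may conflict if either one writes a region the other reads or writes.
    // Overlapping reads alone do not create a data dependency.
    static boolean mayConflict(EffectSummary x, EffectSummary y) {
        return overlaps(x.writes, y.reads)
            || overlaps(x.writes, y.writes)
            || overlaps(x.reads, y.writes);
    }

    private static boolean overlaps(Set<String> p, Set<String> q) {
        for (String region : p) {
            if (q.contains(region)) return true;
        }
        return false;
    }
}
```

With the effect sets shown for Block 1 and Block 2, `mayConflict` returns false, so the two blocks would be candidates for parallel execution.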
18
• Dependency exists where two triangles of representation overlap
• Triangles can only be nested
• Becomes a check for a parent-child relationship; disjointness means no dependency
Dependencies using Effect Sets
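A minimal sketch of the parent-child check, assuming contexts form a tree in which each context stores a reference to its parent. The class and method names (`OwnershipContext`, `encloses`, `disjoint`) are hypothetical and chosen only for this example.

```java
// Hypothetical ownership context: each context records its parent, forming a tree.
final class OwnershipContext {
    final String name;
    final OwnershipContext parent; // null for the root ("world") context

    OwnershipContext(String name, OwnershipContext parent) {
        this.name = name;
        this.parent = parent;
    }

    // True if this context is `other` itself or one of its ancestors.
    boolean encloses(OwnershipContext other) {
        for (OwnershipContext c = other; c != null; c = c.parent) {
            if (c == this) return true;
        }
        return false;
    }

    // In a tree, two subtrees overlap only when one root encloses the other.
    static boolean disjoint(OwnershipContext a, OwnershipContext b) {
        return !a.encloses(b) && !b.encloses(a);
    }
}
```

Because the triangles can only nest, the test never has to compare the contents of two subtrees; walking the parent chain from each context is enough.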
19
• Task Parallelism
    – Run 2+ separate ops. at same time
• Loop Parallelism
    – Execute loop iterations in parallel
• Pipeline Parallelism
    – Stage loop body execution so that iteration execution overlaps safely
Types of Parallelism
20
class Demo {
    void op1() reads<a,b> writes<c,d> {…}
    void op2() reads<w,x> writes<y,z> {…}
}
• Can we execute calls to op1 and op2 in parallel?
• Determine the overlap in the effect sets; no overlap means no data dependencies
• Realization using one-way calls or futures
Task Parallelism
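One possible realization with futures, sketched in Java using the standard `CompletableFuture` class. The `TaskDemo` class and its fields are hypothetical; the sketch assumes the two operations have already been shown to have non-overlapping effects.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical class whose two operations write disjoint fields.
final class TaskDemo {
    int c; // written only by op1
    int y; // written only by op2

    void op1() { c = 1 + 2; }
    void op2() { y = 3 * 4; }

    // Starts both operations asynchronously and waits for both to finish.
    void runBoth() {
        CompletableFuture<Void> f1 = CompletableFuture.runAsync(this::op1);
        CompletableFuture<Void> f2 = CompletableFuture.runAsync(this::op2);
        CompletableFuture.allOf(f1, f2).join();
    }
}
```

The `join()` call acts as the synchronization point: code after `runBoth()` sees the results of both operations, as it would after the sequential calls.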
21
• Data parallel loops are a major source of parallelism in imperative programs
• Start with a simple data parallel loop in the form of a foreach loop:
    foreach (T element in collection) element.operation();
Loop Parallelism Conditions
22
• Condition 1: Areas holding the representations of the objects returned by the enumerator are all disjoint from one another
Foreach Loop Conditions
23
• Condition 2: The operation only mutates the representation of its “own” element and does not read the state owned by any of the other elements
Foreach Loop Conditions
24
• Condition 3: There are no control dependencies which would prevent loop parallelization
Foreach Loop Conditions
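A loop that meets the three conditions can be run in parallel, for instance with a Java parallel stream. The sketch below assumes every list entry is a distinct object (Condition 1), `operation()` touches only its own element (Condition 2), and nothing exits the loop early (Condition 3). `Counter` and `ParallelForeach` are hypothetical names.

```java
import java.util.List;

// Hypothetical element type: operation() reads and writes only this element's state.
final class Counter {
    private int value;

    Counter(int value) { this.value = value; }

    void operation() { value = value + 1; }

    int value() { return value; }
}

final class ParallelForeach {
    // Applies operation() to every element; iterations may run on different threads.
    static void applyAll(List<Counter> collection) {
        collection.parallelStream().forEach(Counter::operation);
    }
}
```

If the same `Counter` appeared twice in the list, Condition 1 would be violated and the parallel version could lose an update.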
25
• So far we have looked at
    foreach (T element in collection)
        element.operation();
• Question: How do we generalize this to an arbitrary loop body?
    foreach (T element in collection) {
        // sequence of statements
        // including local var defs
        // and a read of a context r
    }
Arbitrary Loop Bodies
26
• Loop becomes:
    foreach (T elem in collection) elem.loopBody(this);
• Where loopBody is:
    class T {
        void loopBody(Foo me) {
            // same sequence of statements
            // replace all elem by this
            // and all this by me
        }
    }
Loop Body Rewriting
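The rewriting can be illustrated on a small Java example. The classes `Item` and `Scaler` are hypothetical stand-ins for `T` and `Foo`; the sketch only shows that the inline body and the extracted `loopBody` method compute the same result.

```java
import java.util.List;

// Hypothetical element type playing the role of T.
final class Item {
    int amount;
    int total;

    Item(int amount) { this.amount = amount; }

    // Former loop body: `elem` became `this`, and the enclosing `this` became `me`.
    void loopBody(Scaler me) {
        int scaled = this.amount * me.factor; // local variable stays local
        this.total = scaled;
    }
}

// Hypothetical enclosing class playing the role of Foo.
final class Scaler {
    final int factor;

    Scaler(int factor) { this.factor = factor; }

    // Original form: the body is written inline against `elem` and `this`.
    void scaleInline(List<Item> items) {
        for (Item elem : items) {
            int scaled = elem.amount * this.factor;
            elem.total = scaled;
        }
    }

    // Rewritten form: the loop now has the shape `elem.operation()`.
    void scaleRewritten(List<Item> items) {
        for (Item elem : items) elem.loopBody(this);
    }
}
```

Once the loop has this shape, the foreach loop conditions can be checked against the effects of `loopBody` alone.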
27
• Encapsulation representation hierarchy
    Person
        name: String
        dateOfBirth: Date
        employer: Company
Object-Orientation
28
• Designed to enforce encapsulation
• Adapted to validate encapsulation
• Type parameters to capture memory referencing permissions
    class Person [o,c] {
        private String|this| Name;
        private Date|this| DateOfBirth;
        private Company|c| Employer;
        …
    }
Ownership Types
29
class Company[o] {
    public string name;
    …
}
class Person[o,c] {
    private Company|c| Employer;
    public string employerName() reads<this,c> writes<> {
        return Employer.name;
    }
    …
}
Ownerships & Effects
30
• Analyze & apply sufficient conditions
• All pairs of context relations need to be known
• Need some basis for believing that the relationships between contexts hold
Contexts and Dependencies
31
• Statically know some relationships
    – The owner of an object is a parent of the object’s this context
    – The world context is a parent of all contexts
• Relationship may only be known dynamically
• Optionally track at runtime to allow runtime conditions
Reasons for a Runtime System
32
Source loop:
    for (T<c> e in collection) {
        e.operation(arguments);
    }
• disjoint(r,c) always true:
    parallel for (T<c> e in collection) {
        e.operation(arguments);
    }
• disjoint(r,c) always false:
    serial for (T<c> e in collection) {
        e.operation(arguments);
    }
• disjoint(r,c) unknown:
    if (disjoint(r,c)) {
        parallel version
    } else {
        sequential version
    }
Conditional Parallelism
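The "unknown" case can be sketched in Java as a runtime test that selects between a parallel and a sequential loop. This is an illustration only: `Ctx`, `ConditionalLoop`, and the use of `int[]` cells as elements are assumptions made for the example, and the disjointness test simply walks owner pointers.

```java
import java.util.List;

// Hypothetical runtime context with a pointer to its owner (null for the root).
final class Ctx {
    final Ctx owner;

    Ctx(Ctx owner) { this.owner = owner; }

    boolean isAncestorOrSelfOf(Ctx other) {
        for (Ctx c = other; c != null; c = c.owner) {
            if (c == this) return true;
        }
        return false;
    }

    // Contexts are disjoint when neither is an ancestor of (or equal to) the other.
    static boolean disjoint(Ctx r, Ctx c) {
        return !r.isAncestorOrSelfOf(c) && !c.isAncestorOrSelfOf(r);
    }
}

final class ConditionalLoop {
    // Increments every cell; returns true if the parallel version was chosen.
    static boolean run(Ctx r, Ctx c, List<int[]> collection) {
        if (Ctx.disjoint(r, c)) {
            collection.parallelStream().forEach(e -> e[0]++); // parallel version
            return true;
        }
        for (int[] e : collection) e[0]++;                    // sequential version
        return false;
    }
}
```

Both branches produce the same result; the runtime test only decides whether the parallel version is safe to use for this particular pair of contexts.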
33
• We do not know the relationships between all contexts at compile time.
• May vary from one object or method invocation to another
• Reasons:
    – Separate Compilation
    – Dynamic Linking
    – Complex Data Flows
Reasons for a Runtime System
34
• Type system provides support for specifying context relationships that the programmer asserts must be true
    void oper1[r]() reads<r,c…> writes<…> where r # c {
        …
        foreach (T|c| elem in collection) {…}
        …
    }
Reasons for a Runtime System
35
• Naïve implementation – each object keeps a pointer to its owner
Runtime System Implementation
36
[Diagram: structure of the formal results; node labels as extracted]
• AFJO Soundness
• Subject Reduction
• Progress
• Effect Soundness
• Owner Invariance
• Effect Completeness
• Contexts form a Tree
• Cast Safety
• Context Disjointness Implies Effect Disjointness
• Disjoint effects imply no data dependencies
• Update Dependency Preservation Sufficient for Parallelization
• Sequential Consistency
• Task Parallelism Sufficient Conditions
• Data Parallelism Sufficient Conditions
• Pipeline Parallelism Sufficient Conditions
• Disjointness Test Correct
• Static Context Relations
• Well Formed Heap
• Context Parameters do not survive
37
• Added my system to C# 3.5
• Extended GPC# compiler
• Added infrastructure to support arbitrary type parameters
• Implemented runtime ownership tracking system (~1,000 lines)
Implementation – Zal
Metric   Total    GPC#     Extensions   Extensions (% Total)
SLOC-P   39,444   27,888   12,156       30.8%
SLOC-L   22,201   14,957   7,244        32.7%
38
Implementation – Zal
[Diagram: Zal source → Zal Compiler → C# source → Microsoft C# Compiler → CIL Program w/ Ownership Tracking; together with the Runtime Ownership Libraries → Executing Program with Automatic Parallelization]
39
Implementation – Zal
[Diagram: Zal compiler pipeline. Stages pass Tokens and ASTs between them. Legend: C# compilation step, Zal compilation step, I/O, AST]
Files shown: Source Code Files, Dynamic Linked Libraries, C# Source File, Bytecode File
• Scanner (generated by GPLex) – Scanner.scan()
    Reads a stream of characters and processes them into tokens
• Parser (generated by Coco/R) – Parser.parse()
    Converts stream of tokens into an Abstract Syntax Tree
• Type Checker – TypeCheck()
    Resolves all TypeRefs to TypeDefs & checks type correctness
• Code Generation – Output(), Emit
    Generates C# or CIL implementation of AST
• Effect Computation – computeEffects(), LocalEffects()
    Computes heap & stack effects for AST nodes
• Parallelization – Parallelize()
    Checks sufficient conditions for parallelism and implements them
• Ownership Implementation – BuildOwnershipImplementation()
    Implements Zal features in C# by modifying AST
40
• Have applied my system to a number of realistic applications
• Overall annotation requires modification to 20% of the source
• Ownership tracking overhead:
    – Execution time: 10% to 20%
    – Memory usage: 15% to 30%
• Implementation not fully optimized
Validation
41
Validation – Speedup
42
Validation – Speedup
43
• Focus on providing tools to express parallelism
• No support for validating correctness of parallelization
• Assumed programmer knowledge of parallel programming constructs
• Examples: Fortress, Chapel, X10
Related Work – Prog. Langs.
44
• Have proposed effect systems, but only suggested application to parallelism
• Data race and deadlock detection for locking – very different reasoning
• Deterministic Parallel Java (late 2009)
    – Modified ownerships
    – Focused on kernels
    – Lost composition & abstraction to do so
Related Work – Ownership
45
• Abstract and composable system for reasoning about effects based on Ownership Types.
• Effect and reasoning systems applied to a real language and real program examples
• Real parallelism detected and exploited automatically
Contributions
46
• Developed and proved sufficient conditions for a number of different forms of parallelism
• Runtime system to support static reasoning.
Contributions
47
A. Craik and W. Kelly. Using Ownership to Reason About Inherent Parallelism in Imperative Object-Oriented Programs. International Conference on Compiler Construction. ed. R. Gupta, LNCS 6011, pp. 145-164, Springer-Verlag Berlin Heidelberg, 2010.
W. Reid, W. Kelly, and A. Craik. Reasoning about Parallelism in Modern Object-Oriented Languages. Australasian Computer Science Conference. 2008.
+3 technical reports on various versions of the reasoning system in e-prints
Publications
48
• System for reasoning about data dependencies and parallelism
• Abstract & composable
• Usable by both programmers & automated tools
• Question of when & how to exploit still open
• Demonstration that this automated reasoning is possible w/ prototype
Conclusion
49
Q & A
50
• Ownerships traditionally for encapsulation
• Stack not considered by these works
• Stack & stack referencing models vary from language to language
• I consider a restricted stack model:
    – Stack and heap are disjoint
    – Stack locations can be differentiated by name
Ownership & The Stack
51
• Stack model fits Java, C#, and VB .NET
• Dereferencing to read the heap causes an ownership effect
• Stack location names are unique and cannot be aliased without de-referencing
Ownership & The Stack