mama: mostly automatic management of atomicity
DESCRIPTION
MAMA: Mostly Automatic Management of Atomicity. Christian DeLozier, Joseph Devietti, Milo M. K. Martin University of Pennsylvania. March 2 nd , 2014. Start with a serial problem. Find and express the parallelism. Coordinate the parallel execution (synchronization). Don’t mess up!. - PowerPoint PPT PresentationTRANSCRIPT
MAMA: Mostly Automatic Management of Atomicity
Christian DeLozier, Joseph Devietti, Milo M. K. MartinUniversity of Pennsylvania
March 2nd, 2014
2
Start with a serial problem
Christian DeLozier - 2014
3
Find and express the parallelism
Christian DeLozier - 2014
4
Coordinate the parallel execution (synchronization)
Christian DeLozier - 2014
5
Don’t mess up!
Christian DeLozier - 2014
6
Is there another way to do this?
• Programmer currently has to:1. Express the parallelism (Hard)
2. Coordinate the parallelism (Hard)
• Alternative:1. Programmer expresses the parallelism
2. Machine handles coordination
Christian DeLozier - 2014
7
Coordinating Parallel Execution
• Atomicity vs. Ordering• Types of concurrency bugs [Lu et al., ASPLOS 2008]
• Atomicity: Locks, transactions• Ordering: Barriers, fork/join, blocking on a queue, etc.
• Atomicity constraints are more common than ordering constraints
• Difficult to infer ordering constraints
Christian DeLozier - 2014
8
Mostly Automatic Management of Atomicity
• Toward automatically providing atomicity for parallel programs
• Program either executes atomically – or deadlocks
• Protect every shared variable with its own lock
• Restore progress and performance when necessary (with help from the programmer)
Christian DeLozier - 2014
9
Related Work
• Automatic Parallelization• [Bernstein, IEEE Transactions 1966]• …
• Data Centric Synchronization• [Vaziri et. al, POPL 2006]• [Ceze et. al, HPCA 2007]
• Transactional Memory• [Herlihy and Moss, ISCA 1993]• …
Christian DeLozier - 2014
10
Lock-Based Atomic Sections
• What lock do we acquire?
• When do we acquire the lock?
• When should we release the lock?
Christian DeLozier - 2014
11
What lock do we acquire?
• Associate a lock with each variable
• Trade-off between parallelism and overhead
• Coarse-grained vs. Fine-grained• Coarse-grained: 1 lock per object, 1 lock per array• Fine-grained: 1 lock per field, 1 lock per array element
• Mutex vs. Reader-writer lock
Christian DeLozier - 2014
12
MAMA Prototype
• Uses fine-grained locking• More parallelism
• Especially for arrays• Optimization: Divide arrays into N chunks, 1 lock per
chunk
• Uses reader-writer locks• More parallelism
• Read sharing is common
Christian DeLozier - 2014
13
Lock-Based Atomic Sections
• What lock do we acquire?• One reader-writer lock per variable (fine-grained)
• When do we acquire the lock?• Acquire before the first dynamic access
• When should we release the lock?
Christian DeLozier - 2014
14
T1T1
When should we release the lock?
• Simple case: After the owning thread has exited
Christian DeLozier - 2014
T1 T2
Write A
Exit
Write A
T2T2
Exited
Write A
15
When should we release the lock?
• When the owning thread is waiting for another thread to make progress (e.g. join, barrier)
Christian DeLozier - 2014
T1T1
T1 T2
Write A
Join T2
Write A T2T2
Joined
Write A
Exit
Spawn T2
16
When should we release the lock?
• Other deadlocks cannot be safely broken
• Need help from the programmer• Trusted annotations to sanction breaking a deadlock
• MAMA_release(object)• Also used to improve performance when threads are
over-serializedChristian DeLozier - 2014
T1T1
T1 T2
Write A
Write B
Write B T2T2
Write B
Write A
Write A
17
Lock-Based Atomic Sections
• What lock do we acquire?• One reader-writer lock per variable (fine-grained)
• When do we acquire the lock?• Acquire before the first dynamic access
• When should we release the lock?• At thread exit• When waiting for another thread to make progress• Or, at programmer sanctioned program points
Christian DeLozier - 2014
18
What can deadlocks tell us?• When a thread cannot acquire a lock:
• Perform distributed deadlock detection [Bracha and Toueg, Distributed Computing 1987]
Christian DeLozier - 2014
void f(){ A = 1; B = 2;}
void g(){ B = 1; A = 2;}
T1T1 T2T2
Write B
Write AT2
T1
19
MAMA Prototype
• Implemented as a RoadRunner tool [Flanagan and Freund, PASTE 2010]
• Dynamic instrumentation for Java byte-code
• Evaluated on the Java Grande benchmarks and selected DaCapo benchmarks
• Running on one socket (8 cores) of a 4 socket Nehalem system with 128 GB RAM
• Removed all synchronized blocks and java.util.concurrent constructs from benchmarks
• Ensure that MAMA is providing all of the atomicity
Christian DeLozier - 2014
20
Evaluating MAMA
• Can we execute parallel programs correctly?
• How many annotations need to be added for progress and performance?
• How is the performance of the program affected?• Does MAMA permit thread to execute in parallel?
Christian DeLozier - 2014
21
Benchmark Lines of CodeProgress
AnnotationsPerformance Annotations
crypt 314 0 0
lufact 461 1 4
lusearch 124105 0 4
matmult 187 0 0
moldyn 487 3 0
montecarlo 1165 0 28
pmd 60062 0 4
series 180 0 0
sor 186 1 0
sunflow 21970 1 3
xalan 172300 0 0
Annotation Burden
Christian DeLozier - 2014
22
Performance
Christian DeLozier - 2014
• MAMA incurs overhead due to locking and serial execution• But, MAMA still allows some parallel execution as
compared to serialization
23x
23
Performance Breakdown
Christian DeLozier - 2014
• Many benchmarks have significant portions that run in parallel• Checking whether or not a lock is already owned incurs
significant overhead on some benchmarks
24
Memory Usage
Christian DeLozier - 2014
• Fine-grained locking incurs significant memory overheads• Could be optimized to save space via chunking arrays or
decreasing the size of the lock
25
Future Directions
• Does this approach apply to other languages?
• How do we test programs running with MAMA?• Find uncommon deadlocks• Gain more confidence in trusted annotations
• How can we reduce the performance overheads?
• How can we infer ordering constraints?
Christian DeLozier - 2014
26
MAMA
• Provides atomicity for parallel programs• Some help via annotations from programmer
• A step toward programming without worrying about atomicity
• Programmer expresses parallelism• Machine provides atomicity automatically
Christian DeLozier - 2014
27
Thank you for listening!
Christian DeLozier - 2014