OpenMP Examples - Part 2
Mirto Musci, PhD Candidate
Department of Computer Science, University of Pavia
Processors Architecture Class, Fall 2011
Outline
1 Introduction
2 Parallel Odd-Even Sorting
  Serial Algorithm
  Straightforward Parallelization
  Trying to Push the Concurrency Higher
  Final Algorithm
Introduction
This time, work in class will be different
  There will be only one example provided
  Substantially more difficult
  It will introduce you to the final project
First steps
Start looking at the serial algorithm
Implement a main function to showcase the algorithm
Declare an appropriate array to use for testing
Use printf to check for correctness
Set up for measurement
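The first steps above can be sketched as a small test driver. This is only a minimal sketch: the helper names is_sorted, now_seconds and run_demo are hypothetical, the input is a reversed array chosen for illustration, and the sort routine is the serial version presented later in these slides. Timing uses omp_get_wtime when compiled with OpenMP and falls back to clock otherwise.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Serial odd-even transposition sort (same logic as the serial slide). */
void OddEvenSort(int *A, int N)
{
    int exch = 1, start = 0, i, temp;
    while (exch || start) {
        exch = 0;
        for (i = start; i < N - 1; i += 2) {
            if (A[i] > A[i + 1]) {
                temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                exch = 1;
            }
        }
        start = !start;
    }
}

/* Hypothetical helper: returns 1 if A[0..N-1] is in non-decreasing order. */
int is_sorted(const int *A, int N)
{
    for (int i = 0; i + 1 < N; i++)
        if (A[i] > A[i + 1]) return 0;
    return 1;
}

/* Hypothetical helper: wall-clock seconds with OpenMP, CPU seconds without. */
double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical driver: sorts a reversed array of N items, prints the first
   few values and the elapsed time, and returns 1 if the result is sorted. */
int run_demo(int N)
{
    assert(N > 0);
    int *A = malloc((size_t)N * sizeof *A);
    if (A == NULL) return 0;
    for (int i = 0; i < N; i++) A[i] = N - i;   /* reversed input */

    double t0 = now_seconds();
    OddEvenSort(A, N);
    double t1 = now_seconds();

    for (int i = 0; i < N && i < 10; i++) printf("%d ", A[i]);
    printf("\n%d elements sorted in %.6f s\n", N, t1 - t0);

    int ok = is_sorted(A, N);
    free(A);
    return ok;
}
```

A main function would simply call run_demo with the desired size, for example run_demo(1000), and report failure if it returns 0.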
Odd-Even Sort
The odd-even transposition sort is a sorting algorithm
  It can be adapted to a generic collection of items
  For simplicity, we'll use an array of integers
Like Bubblesort, it compares adjacent items and exchanges them if they are found to be out of order.
Unlike Bubblesort, it compares disjoint pairs by using alternating odd and even index values
The odd and even phases are repeated until no exchanges of data are required
What's the worst case?
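One way to explore the worst-case question is to count phases. The sketch below is a variant of the serial routine with a phase counter added; the name OddEvenSortCountPhases is hypothetical. A reversed array is a natural worst-case candidate, because the smallest item must travel across the whole array and can move at most one position per phase. Odd-even transposition sort is known to need at most N phases to sort N items, each with about N/2 comparisons, so the work grows quadratically.

```c
#include <assert.h>

/* Serial odd-even sort that also returns the number of phases executed.
   The name and the counter are illustrative additions. */
int OddEvenSortCountPhases(int *A, int N)
{
    int exch = 1, start = 0, phases = 0, i, temp;
    assert(A != 0 || N == 0);
    while (exch || start) {
        exch = 0;
        for (i = start; i < N - 1; i += 2) {
            if (A[i] > A[i + 1]) {
                temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                exch = 1;
            }
        }
        start = !start;   /* alternate between even and odd phases */
        phases++;
    }
    return phases;
}
```

For the reversed array {4, 3, 2, 1} the routine runs 6 phases: four that perform exchanges and a final even/odd pair that confirms nothing is left to do. An already sorted array needs only that confirming pair.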
Odd and Even Phases
Serial Code
void OddEvenSort(int *A, int N)
{
    int exch = 1, start = 0, i;
    int temp;

    while (exch || start) {
        exch = 0;
        for (i = start; i < N-1; i += 2) {
            if (A[i] > A[i+1]) {
                temp = A[i]; A[i] = A[i+1]; A[i+1] = temp;
                exch = 1;
            }
        }
        if (start == 0) start = 1;
        else start = 0;
    }
}
How does it work?
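The loop continues until a complete pass generates no exchange
  We must run through at least one odd and one even phase to consider the first and last items
The start variable has two purposes!
  Comparisons start on alternating element indexes (0 for an even phase, 1 for an odd phase)
  In the while condition, it keeps the loop from exiting right after an even phase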
Key Point
Each comparison within a given phase can be done concurrently.
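The key point can be checked on a small scale: since the pairs of one phase are disjoint, the order in which they are visited should not matter. The sketch below runs the same phase left to right and right to left so the two results can be compared. All function names are hypothetical, and this is only a serial illustration of order independence, not a parallel implementation.

```c
#include <assert.h>

/* Compare-exchange on the pair (i, i+1). */
static void compare_exchange(int *A, int i)
{
    if (A[i] > A[i + 1]) {
        int temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
    }
}

/* One phase, visiting the pairs from left to right. */
void PhaseLeftToRight(int *A, int N, int start)
{
    assert(start == 0 || start == 1);
    for (int i = start; i < N - 1; i += 2)
        compare_exchange(A, i);
}

/* The same phase, visiting the same pairs from right to left. */
void PhaseRightToLeft(int *A, int N, int start)
{
    assert(start == 0 || start == 1);
    for (int i = N - 2; i >= start; i--)
        if ((i - start) % 2 == 0)      /* only pairs that belong to this phase */
            compare_exchange(A, i);
}

/* Returns 1 if both arrays hold the same N values in the same order. */
int same_array(const int *X, const int *Y, int N)
{
    for (int i = 0; i < N; i++)
        if (X[i] != Y[i]) return 0;
    return 1;
}
```

Applying both functions to copies of the same array and comparing them with same_array should always report equality, which is what makes the iterations of the inner loop safe to distribute among threads.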
How to parallelize?
Ideal candidate for data decomposition design
  Dividing the array into chunks keeps most comparisons inside a chunk.
  For comparisons on the edge, we can simply let the thread with the lower-indexed element do the comparison.
Key Features
No contention: any item in either phase is touched only once.
Simple: divide the inner for loop with a parallel for construct.
Implicit barrier: all the comparisons are completed before launching the next odd or even phase.
Parallel Code - First Version
void OddEvenSort(int *A, int N)
{
    int exch = 1, start = 0, i;
    int temp;

    while (exch || start) {
        exch = 0;
        #pragma omp parallel for private(temp)
        for (i = start; i < N-1; i += 2) {
            if (A[i] > A[i+1]) {
                temp = A[i]; A[i] = A[i+1]; A[i+1] = temp;
                exch = 1;
            }
        }
        if (start == 0) start = 1;
        else start = 0;
    }
}
How does it work?
temp is a private work variable; exch and start are shared.
Shared Resources Analysis
start does not need any protection
  Read within the parallel region but updated only outside
exch is updated within the loop and then read outside
but does not need protection either!
  Each thread updates it with the same value: benign data race
  (A count of the number of exchanges would need synchronization)
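The parenthetical remark about counting exchanges can be made concrete. The sketch below is one possible way to do it, using the OpenMP reduction clause rather than an explicit lock; the function names OddEvenPhaseCount and OddEvenSortCounted are hypothetical. Each thread accumulates into a private copy of count, and the copies are summed when the loop ends.

```c
#include <assert.h>

/* Runs one phase in parallel and returns the number of exchanges performed. */
int OddEvenPhaseCount(int *A, int N, int start)
{
    int i, temp, count = 0;
    assert(start == 0 || start == 1);
    #pragma omp parallel for private(temp) reduction(+:count)
    for (i = start; i < N - 1; i += 2) {
        if (A[i] > A[i + 1]) {
            temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
            count++;              /* private partial count, summed at the end */
        }
    }
    return count;
}

/* Sorts A and stores the total number of exchanges in *total. */
void OddEvenSortCounted(int *A, int N, long *total)
{
    int start = 0, exch = 1;
    *total = 0;
    while (exch || start) {
        exch = OddEvenPhaseCount(A, N, start);  /* nonzero if anything moved */
        *total += exch;
        start = !start;
    }
}
```

Because every exchange swaps one adjacent out-of-order pair, the total equals the number of inversions in the input; the reversed array {4, 3, 2, 1} has 6. Compiled without OpenMP support the pragma is ignored and the code runs serially with the same result.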
Performance optimization
The parallel region is within the body of the while loop
  Each iteration will cost the execution some overhead from starting and stopping threads
Can we move the parallelism to a "higher" level to prevent repeated waking and sleeping?
We need to be sure that each thread will execute the same number of iterations
  Otherwise we can end up in deadlock!
  Some threads at the end of the parallel region
  Others at some implicit barrier in the body of the loop
Parallel Code - Second Version
#pragma omp parallel
{
    int temp;
    while (exch || start) {
        exch = 0;
        #pragma omp for
        for (i = start; i < N-1; i += 2) {   // N-1: the last pair is (N-2, N-1)
            if (A[i] > A[i+1]) {
                temp = A[i]; A[i] = A[i+1]; A[i+1] = temp;
                exch = 1;
            }
        }
        #pragma omp single
        if (start == 0) start = 1;
        else start = 0;
    }
}
Review: Start Variable
The loop reads start and the single construct updates it
  Using a critical region would be a mistake.
  Each thread would be given access, in turn, to update start.
  start would ping-pong back and forth between 0 and 1.
Instead of allowing one thread at a time to update start, we need one thread alone to update this variable.
Added bonus: single ends with an implicit barrier (it replaces the implicit barrier previously provided by the end of the parallel region)
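The difference between the two constructs can be observed directly. The sketch below is a hypothetical demo (the function CountUpdates is not part of the original example) that counts how many times the body of a critical construct and the body of a single construct run inside one parallel region.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical demo: reports the team size and how many times the bodies
   of a critical and of a single construct were executed. */
void CountUpdates(int *team_size, int *critical_runs, int *single_runs)
{
    int threads = 1, crit = 0, sing = 0;
    assert(team_size && critical_runs && single_runs);

    #pragma omp parallel shared(threads, crit, sing)
    {
        #pragma omp critical
        crit++;              /* every thread runs this, one at a time */

        #pragma omp single
        {
            sing++;          /* exactly one thread runs this */
#ifdef _OPENMP
            threads = omp_get_num_threads();
#endif
        }                    /* implicit barrier at the end of single */
    }

    *team_size = threads;
    *critical_runs = crit;
    *single_runs = sing;
}
```

The critical body runs once per thread, so a toggle placed there would flip start as many times as there are threads and leave it unchanged for an even team size. The single body runs once, which is the behavior the toggle needs.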
Review: Exch Variable
Does reading of exch need protection?
Unfortunately, we can still run into a catastrophic data race!
Consider the initial entry into the parallel region.
  The shared value of exch is 1, and all threads should enter the while loop
  First thing inside the loop: each thread resets exch to 0.
  Any thread that does not evaluate the while condition before another thread resets exch will exit the parallel region
  In the worst case, all but one thread will be sidelined.
Deadlock Situation
Any thread inside the loop waits at the barrier of the for construct
All others wait at the implicit barrier of the parallel region
Bug Fix
Can we fix this? Simply protecting the reset of exch with some form of synchronization won't do the trick
We need to set up exch to allow all threads to pass into the while loop if any of the threads has performed an exchange
Idea
Assign to exch the total number of threads, instead of 1
The reset becomes a protected decrement
exch will not reach 0 until every thread has entered
Parallel Code - Bug Fix
#pragma omp parallel
{
    int temp;
    while (1) {
        if (exch == 0 && start == 0) break;
        #pragma omp critical
        exch--;
        #pragma omp for
        for (i = start; i < N-1; i += 2) {
            if (A[i] > A[i+1]) {
                temp = A[i]; A[i] = A[i+1]; A[i+1] = temp;
                #pragma omp critical
                exch = omp_get_num_threads(); // KEY!
            }
        }
        #pragma omp single
        if (start == 0) start = 1;
        else start = 0;
    }
}
Review
We turned the while into an infinite loop
  Break when there have been no exchanges
  Why? We cannot protect the read access in the while conditional.
No need to protect the access of exch in the if
  If exch is 0: all threads will break and exit the parallel region.
  If exch is the number of threads, only the last thread will bring exch down to 0
  No thread will fail to enter the loop region
Decrement and update are protected by the same critical region:
  They are mutually exclusive
  Possible conflicts avoided
Bug alert!
Still, there is a subtle bug
  Some threads enter the for construct and do a swap, while others are still waiting at the first critical region
  exch could be set to an incorrect value and lead to deadlock...
Fix: an explicit barrier after the critical region decrement
Problem
Our original goal was to remove overhead...
But we are introducing far more!
Minimize The Overhead
We can't efficiently minimize thread sleeping and waking
  Without adding a bunch of other overheads
We have to revert to straightforward parallelization, but...
Idea
We can cut down on the number of times we need to start and stop threads
Inside each parallel region, rather than doing one odd or one even phase, we can do one of each
Parallel Code - Double Phase I
void OddEvenSort(int *A, int N)
{
    int exch0, exch1 = 1, trips = 0, i;

    while (exch1) {
        exch0 = 0;
        exch1 = 0;
        #pragma omp parallel
        {
            int temp;
            #pragma omp for
            for (i = 0; i < N-1; i += 2) {
                if (A[i] > A[i+1]) {
                    temp = A[i]; A[i] = A[i+1]; A[i+1] = temp;
                    exch0 = 1;
                }
            }
Parallel Code - Double Phase II
            if (exch0 || !trips) {
                #pragma omp for
                for (i = 1; i < N-1; i += 2) {
                    if (A[i] > A[i+1]) {
                        temp = A[i]; A[i] = A[i+1]; A[i+1] = temp;
                        exch1 = 1;
                    }
                }
            } // if exch0
        } // end parallel
        trips = 1;
    }
}
How does it work?
Algorithm
Uses two exchange flags: exch0 and exch1.
The two flags are reset and the threads enter the parallel region
The first for construct divides up the array. If any thread performs an exchange, it sets exch0.
After the implicit barrier, each thread tests exch0.
  If the flag has been set (or this is the first trip, tracked by trips), the thread enters the second for construct
If any thread does an exchange in the second loop, it sets exch1.
Check exch1... and continue...
Performance Considerations
Result
Through unrolling we change granularity
With double phase we cut the number of entries and exits from the parallel region in half
Reduction of overhead is worth the extra coding.
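The halving claim can be checked by instrumenting both versions with a counter of parallel-region entries. The sketch below is an illustration under a few assumptions: the function names and the regions counter are hypothetical, and the exchange flags are combined with a reduction clause instead of the plain shared write used in the slides, so that the sketch contains no data race.

```c
#include <assert.h>

/* Single-phase version: one parallel region per odd or even phase.
   Returns the number of parallel regions entered. */
int SinglePhaseRegions(int *A, int N)
{
    int exch = 1, start = 0, i, temp, regions = 0;
    assert(A != 0 || N == 0);
    while (exch || start) {
        exch = 0;
        regions++;
        #pragma omp parallel for private(temp) reduction(|:exch)
        for (i = start; i < N - 1; i += 2) {
            if (A[i] > A[i + 1]) {
                temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                exch = 1;
            }
        }
        start = !start;
    }
    return regions;
}

/* Double-phase version: an even and an odd phase per parallel region.
   Returns the number of parallel regions entered. */
int DoublePhaseRegions(int *A, int N)
{
    int exch0, exch1 = 1, trips = 0, regions = 0;
    assert(A != 0 || N == 0);
    while (exch1) {
        exch0 = 0;
        exch1 = 0;
        regions++;
        #pragma omp parallel
        {
            int i, temp;
            #pragma omp for reduction(|:exch0)
            for (i = 0; i < N - 1; i += 2) {
                if (A[i] > A[i + 1]) {
                    temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                    exch0 = 1;
                }
            }
            /* exch0 is complete after the implicit barrier of the loop */
            if (exch0 || !trips) {
                #pragma omp for reduction(|:exch1)
                for (i = 1; i < N - 1; i += 2) {
                    if (A[i] > A[i + 1]) {
                        temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                        exch1 = 1;
                    }
                }
            }
        }
        trips = 1;
    }
    return regions;
}
```

Running both functions on copies of the same input and comparing the returned counts gives a direct measure of how many region entries the double-phase version saves.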
Question
Could more unrolling be worth it?
For Further Reading
Clay Breshears
The Art of Concurrency
O'Reilly, 2009.