
Page 1: OpenMP Examples - Part 2 - unipv · PDF fileOpenMP Examples - Part 2 Mirto Musci, ... Odd-Even Sort Theodd-even ... Mirto Musci, PhD Candidate OpenMP Examples - rtPa 2

IntroductionParallel Odd-Even Sorting

OpenMP Examples - Part 2

Mirto Musci, PhD Candidate

Department of Computer Science, University of Pavia

Processors Architecture Class, Fall 2011

Mirto Musci, PhD Candidate OpenMP Examples - Part 2


Outline

1 Introduction

2 Parallel Odd-Even Sorting

Serial Algorithm

Straightforward Parallelization

Trying to Push the Concurrency Higher

Final Algorithm


Introduction

This time, work in class will be different

There will be only one example provided

It is substantially more difficult

It will introduce you to the final project

First steps

Start looking at the serial algorithm

Implement a main function to showcase the algorithm

Declare an appropriate array to use for testing

Use printf to check for correctness

Set up for measurement
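A minimal sketch of these first steps, assuming a plain C build: the serial odd-even sort (the same algorithm as the serial code later in this deck), a test array filled with pseudo-random values, printf output for checking, and a timer. The helper names print_array, is_sorted and run_demo are hypothetical. Timing uses the standard clock(), which reports processor time; on Linux, for instance, that time is summed over all threads, so for the OpenMP versions the wall-clock timer omp_get_wtime() is the more appropriate choice.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Serial odd-even transposition sort. */
void OddEvenSort(int *A, int N) {
    int exch = 1, start = 0;
    while (exch || start) {
        exch = 0;
        for (int i = start; i < N - 1; i += 2) {
            if (A[i] > A[i + 1]) {
                int temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                exch = 1;
            }
        }
        start = !start;               /* alternate even and odd phases */
    }
}

/* Print a (small) array on one line, to check correctness by eye. */
void print_array(const char *label, const int *A, int N) {
    printf("%s:", label);
    for (int i = 0; i < N; i++) printf(" %d", A[i]);
    printf("\n");
}

/* Return 1 if A is in non-decreasing order, 0 otherwise. */
int is_sorted(const int *A, int N) {
    for (int i = 0; i + 1 < N; i++)
        if (A[i] > A[i + 1]) return 0;
    return 1;
}

/* What main() could do: build a test array of N pseudo-random values,
   print it before and after (if verbose) and time the sort with clock().
   Stores the elapsed processor time in *seconds; returns 1 if the result
   is sorted, 0 otherwise (or if allocation fails). */
int run_demo(int N, unsigned seed, int verbose, double *seconds) {
    int *A = malloc((size_t)N * sizeof *A);
    if (A == NULL) return 0;
    srand(seed);
    for (int i = 0; i < N; i++) A[i] = rand() % 1000;
    if (verbose) print_array("before", A, N);
    clock_t t0 = clock();
    OddEvenSort(A, N);
    *seconds = (double)(clock() - t0) / CLOCKS_PER_SEC;
    if (verbose) print_array("after", A, N);
    int ok = is_sorted(A, N);
    free(A);
    return ok;
}
```

A main function could then call, say, run_demo(20000, 1u, 0, &secs), check the return value and print secs.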


Serial AlgorithmStraightforward ParallelizationTrying to Push the Concurrency HigherFinal Algorithm


Odd-Even Sort

The odd-even transposition sort is a sorting algorithm

It can be adapted to a generic collection of items; for simplicity, we'll use an array of integers

Like Bubble Sort, it compares adjacent items and exchanges them if they are found to be out of order

Unlike Bubble Sort, it compares disjoint pairs, alternating between odd and even starting indexes

The odd and even phases are repeated until no exchanges of data are required

What's the worst case?
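One way to explore the question empirically, as a sketch: the serial algorithm instrumented to count phases (OddEvenSortCountPhases and fill_reversed are hypothetical names), run on reverse-sorted input. A known result for odd-even transposition sort is that N phases always suffice for N items; with this loop's exit test, at most two extra phases are then spent confirming that nothing moves, so at most N + 2 phases in total, each doing about N/2 comparisons, i.e. O(N^2) comparisons overall.

```c
/* Serial odd-even sort, instrumented: returns the number of phases
   executed, including the final ones that find nothing to swap. */
int OddEvenSortCountPhases(int *A, int N) {
    int exch = 1, start = 0, phases = 0;
    while (exch || start) {
        exch = 0;
        for (int i = start; i < N - 1; i += 2) {
            if (A[i] > A[i + 1]) {
                int temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                exch = 1;
            }
        }
        phases++;
        start = !start;
    }
    return phases;
}

/* Fill A with N, N-1, ..., 1 (reverse order). */
void fill_reversed(int *A, int N) {
    for (int i = 0; i < N; i++) A[i] = N - i;
}
```

For {4, 3, 2, 1} the function returns 6: four phases that move data, then an even and an odd phase that find nothing to swap.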


Odd and Even Phases
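The figure for this slide did not survive extraction. As a textual substitute, here is a sketch (trace_odd_even and run_phase are hypothetical names) that prints, for every phase, the index pairs compared and the array afterwards. It assumes the naming suggested by the serial code: the phase with start = 0, pairs (0,1), (2,3), ..., is called "even", and the phase with start = 1 is called "odd".

```c
#include <stdio.h>

/* Compare-exchange the disjoint pairs (i, i+1), i = start, start+2, ...
   Returns 1 if any pair was swapped. */
static int run_phase(int *A, int N, int start) {
    int exch = 0;
    for (int i = start; i < N - 1; i += 2) {
        if (A[i] > A[i + 1]) {
            int temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
            exch = 1;
        }
    }
    return exch;
}

/* Serial odd-even sort that prints, for every phase, the pairs it
   compares and the array after the phase. Returns the phase count. */
int trace_odd_even(int *A, int N) {
    int exch = 1, start = 0, phase = 0;
    while (exch || start) {
        printf("phase %d (%s):", ++phase, start ? "odd" : "even");
        for (int i = start; i < N - 1; i += 2) printf(" (%d,%d)", i, i + 1);
        exch = run_phase(A, N, start);
        printf("  ->");
        for (int i = 0; i < N; i++) printf(" %d", A[i]);
        printf("\n");
        start = !start;
    }
    return phase;
}
```

For the input {5, 1, 4, 2, 3}, the first line printed is "phase 1 (even): (0,1) (2,3)  -> 1 5 2 4 3".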


Serial Code

void OddEvenSort(int *A, int N) {
    int exch = 1, start = 0, i;
    int temp;

    while (exch || start) {
        exch = 0;
        /* start == 0: even phase, pairs (0,1), (2,3), ...
           start == 1: odd phase,  pairs (1,2), (3,4), ... */
        for (i = start; i < N - 1; i += 2) {
            if (A[i] > A[i + 1]) {
                temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                exch = 1;
            }
        }
        if (start == 0) start = 1;
        else start = 0;
    }
}


How does it work?

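The loop stops after an odd phase (start = 1) that makes no exchange: the even phase before it left every even pair in order, and the odd phase changed nothing, so every adjacent pair is in order

We must run through at least one even and one odd phase, so that every adjacent pair, including those holding the first and last items, gets compared

The start variable has two purposes!

It makes comparisons start on alternating element indexes (0, then 1, then 0, ...)

In the while condition, it forces an odd phase after every even phase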

Key Point

Each comparison within a given phase can be done concurrently.
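A small sketch of why this holds: within one phase the pairs are disjoint, so visiting them in any order, or all at once, gives the same result. The hypothetical helper phase_order_independent runs the same phase left-to-right and right-to-left on copies of an array and compares the outcomes. A Bubble Sort sweep, whose pairs (i, i+1) overlap, would not pass such a test in general.

```c
#include <stdlib.h>
#include <string.h>

/* One phase, pairs visited left to right. */
static void phase_forward(int *A, int N, int start) {
    for (int i = start; i < N - 1; i += 2)
        if (A[i] > A[i + 1]) { int t = A[i]; A[i] = A[i + 1]; A[i + 1] = t; }
}

/* The same phase, pairs visited right to left. */
static void phase_backward(int *A, int N, int start) {
    if (N - 2 - start < 0) return;              /* no pair in this phase */
    for (int k = (N - 2 - start) / 2; k >= 0; k--) {
        int i = start + 2 * k;                  /* last valid i first */
        if (A[i] > A[i + 1]) { int t = A[i]; A[i] = A[i + 1]; A[i + 1] = t; }
    }
}

/* Returns 1 if both visiting orders give the same array, 0 otherwise
   (also 0 if allocation fails). The input array is not modified. */
int phase_order_independent(const int *src, int N, int start) {
    int *a = malloc((size_t)N * sizeof *a);
    int *b = malloc((size_t)N * sizeof *b);
    int same = 0;
    if (a != NULL && b != NULL) {
        memcpy(a, src, (size_t)N * sizeof *a);
        memcpy(b, src, (size_t)N * sizeof *b);
        phase_forward(a, N, start);
        phase_backward(b, N, start);
        same = memcmp(a, b, (size_t)N * sizeof *a) == 0;
    }
    free(a);
    free(b);
    return same;
}
```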


How to parallelize?

Ideal candidate for data decomposition design

Divide the array into chunks and keep each comparison within its chunk. For a comparison that straddles the edge between two chunks, simply let the thread holding the lower-indexed element do it.

Key Features

No contention: in either phase, each item is touched by only one comparison.

Simple: split the inner for loop with a parallel for construct.

Implicit barrier: all comparisons of one phase are completed before the next (odd or even) phase is launched.
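A sketch of the chunked decomposition written out by hand, assuming OpenMP (without it the pragma is ignored and the code runs serially). The hypothetical parameter nchunks plays the role of the number of threads: chunk c covers a contiguous index range, and pair (i, i+1) belongs to the chunk that holds i, so a pair straddling a chunk edge is handled by the chunk with the lower-indexed element. In practice the parallel for construct does this splitting for us.

```c
/* One phase, split by hand into nchunks contiguous chunks.
   Pair (i, i+1) belongs to the chunk containing i.
   Returns 1 if any swap happened. */
int phase_chunked(int *A, int N, int start, int nchunks) {
    int exch = 0;
    #pragma omp parallel for reduction(||:exch)
    for (int c = 0; c < nchunks; c++) {
        int lo = (int)((long long)N * c / nchunks);        /* first index of chunk c */
        int hi = (int)((long long)N * (c + 1) / nchunks);  /* one past its last index */
        int i = lo;
        if ((i - start) % 2 != 0) i++;    /* first i >= lo with the phase's parity */
        for (; i < hi && i < N - 1; i += 2) {
            if (A[i] > A[i + 1]) {
                int temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                exch = 1;
            }
        }
    }
    return exch;
}

/* Driver loop: same exit test as the serial code. */
void OddEvenSortChunked(int *A, int N, int nchunks) {
    int exch = 1, start = 0;
    if (nchunks < 1) nchunks = 1;
    while (exch || start) {
        exch = phase_chunked(A, N, start, nchunks);
        start = !start;
    }
}
```

Because pair ownership follows the lower index, the element at a chunk's first index may be written by the chunk to its left, but never by two chunks in the same phase.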


Parallel Code - First Version

void OddEvenSort(int *A, int N) {
    int exch = 1, start = 0, i;
    int temp;

    while (exch || start) {
        exch = 0;
        /* i, the loop variable of the parallel for, is private automatically */
        #pragma omp parallel for private(temp)
        for (i = start; i < N - 1; i += 2) {
            if (A[i] > A[i + 1]) {
                temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                exch = 1;
            }
        }
        if (start == 0) start = 1;
        else start = 0;
    }
}


How does it work?

temp is a private work variable; exch and start are shared (the loop index i is made private by the parallel for).

Shared Resources Analysis

start does not need any protection

It is read within the parallel region but updated only outside it

exch is updated within the loop and then read outside it

It does not need protection either! Every thread writes the same value (1): a benign data race. (If we counted the number of exchanges instead, we would need synchronization.)
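The parenthetical case, counting exchanges, sketched with an OpenMP reduction clause (phase_count_swaps and OddEvenSortCountSwaps are hypothetical names): each thread counts into a private copy, and the copies are summed at the end of the loop, so no update is lost. The same idea with reduction(||:exch) would also remove the flag race; strictly speaking, the OpenMP memory model treats even the benign race as a data race, with unspecified results. Since every swap of an adjacent out-of-order pair removes exactly one inversion, the total count equals the number of inversions in the input.

```c
/* One phase that counts its swaps. reduction(+:swaps) gives every
   thread a private counter and adds them up at the end of the loop. */
int phase_count_swaps(int *A, int N, int start) {
    int swaps = 0;
    #pragma omp parallel for reduction(+:swaps)
    for (int i = start; i < N - 1; i += 2) {
        if (A[i] > A[i + 1]) {
            int temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
            swaps++;
        }
    }
    return swaps;
}

/* Whole sort; returns the total number of swaps performed. */
long OddEvenSortCountSwaps(int *A, int N) {
    long total = 0;
    int exch = 1, start = 0;
    while (exch || start) {
        int s = phase_count_swaps(A, N, start);
        total += s;
        exch = (s > 0);
        start = !start;
    }
    return total;
}
```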


Performance Optimization

The parallel region is within the body of the while loop

Each iteration adds the overhead of starting and stopping the threads

Can we move the parallelism to a "higher" level to avoid repeatedly waking threads and putting them to sleep?

We need to be sure that every thread executes the same number of iterations

Otherwise we can end up in deadlock! Some threads wait at the end of the parallel region, others at an implicit barrier in the body of the loop
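One way to satisfy the "same number of iterations" requirement, sketched here as an alternative rather than the approach these slides go on to take: keep the loop decision in a shared variable (the hypothetical go) that only a single construct writes, so that its implicit barrier publishes the same value to every thread, and combine the per-thread exchange flags with reduction(||:exch). Each phase still costs two barriers (at the end of the for and of the single), so whether it beats the straightforward version has to be measured.

```c
/* The whole sort inside one parallel region. Every thread runs the
   same number of while-iterations because "go" is written only inside
   a single construct, whose implicit barrier makes the new value
   visible to all threads before any of them reads it again. */
void OddEvenSortOneRegion(int *A, int N) {
    int start = 0;   /* 0: even phase, 1: odd phase */
    int exch = 0;    /* did the current phase swap anything? */
    int go = 1;      /* shared loop decision */
    #pragma omp parallel shared(start, exch, go)
    {
        while (go) {
            /* each thread ORs its private flag into exch at the end */
            #pragma omp for reduction(||:exch)
            for (int i = start; i < N - 1; i += 2) {
                if (A[i] > A[i + 1]) {
                    int temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                    exch = 1;
                }
            }   /* implicit barrier: phase and reduction complete */
            #pragma omp single
            {
                start = !start;
                go = exch || start;   /* same test as the serial while */
                exch = 0;             /* ready for the next phase */
            }   /* implicit barrier: all threads see go, start, exch */
        }
    }
}
```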


Parallel Code - Second Version

/* Fragment from OddEvenSort(int *A, int N), with
   int exch = 1, start = 0, i; declared before the region */
#pragma omp parallel
{
    int temp;
    while (exch || start) {
        exch = 0;
        #pragma omp for
        for (i = start; i < N - 1; i += 2) {   /* N - 1 keeps A[i + 1] in bounds */
            if (A[i] > A[i + 1]) {
                temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                exch = 1;
            }
        }
        #pragma omp single
        if (start == 0) start = 1;
        else start = 0;
    }
}


Review: Start Variable

The loop reads start, and the single construct updates it

Using a critical region would be a mistake: each thread would be given access, in turn, to update start, so start would ping-pong back and forth between 0 and 1

Instead of allowing one thread at a time to update start, we need one thread alone to update this variable

Added bonus: single ends with an implicit barrier, which takes the place of the implicit barrier at the end of the parallel region in the first version
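A tiny experiment contrasting the two constructs, with hypothetical function names: under critical every thread of the team flips the flag once, so it ends up equal to the team size modulo 2; under single it is flipped exactly once. The team size is counted with an atomic increment, so the code also compiles, and runs with one thread, without OpenMP.

```c
/* Every thread flips *start once, one thread at a time. */
void flip_with_critical(int *start, int *team) {
    *start = 0;
    *team = 0;
    #pragma omp parallel
    {
        #pragma omp atomic
        *team += 1;
        #pragma omp critical
        *start = !*start;
    }
}

/* Exactly one thread flips *start; the others wait at single's barrier. */
void flip_with_single(int *start, int *team) {
    *start = 0;
    *team = 0;
    #pragma omp parallel
    {
        #pragma omp atomic
        *team += 1;
        #pragma omp single
        *start = !*start;
    }
}
```

With four threads, for example, the critical version leaves start at 0 (flipped four times), which is exactly the ping-pong described above.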


Review: Exch Variable

Does the read of exch need protection?

Unfortunately, we can still run into a catastrophic data race!

Consider the initial entry into the parallel region.

The shared value of exch is 1, and all threads should enter the while loop. The first thing a thread does inside is reset exch to 0. Any thread that has not yet evaluated the while condition when another thread resets exch sees exch == 0 (with start == 0) and skips the loop, heading for the end of the parallel region. In the worst case, all but one thread will be sidelined.

Deadlock Situation

Threads inside the loop wait at the implicit barrier of the for construct

All the others wait at the implicit barrier at the end of the parallel region


Bug Fix

Can we fix this? Simply protecting the reset of exch with some form of synchronization won't do the trick

We need exch to let all the threads into the while loop whenever any of them has performed an exchange

Idea

Assign to exch the total number of threads, instead of 1

The reset becomes a protected decrement

exch does not reach 0 until every thread has entered the loop


Parallel Code - Bug Fix

/* Fragment: exch starts out equal to the number of threads;
   omp_get_num_threads() is declared in <omp.h> */
#pragma omp parallel
{
    int temp;
    while (1) {
        /* <= 0: two phases in a row without swaps push exch below 0 */
        if (exch <= 0 && start == 0) break;
        #pragma omp critical
        exch--;
        #pragma omp for
        for (i = start; i < N - 1; i += 2) {
            if (A[i] > A[i + 1]) {
                temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                #pragma omp critical
                exch = omp_get_num_threads(); // KEY!
            }
        }
        #pragma omp single
        if (start == 0) start = 1;
        else start = 0;
    }
}


Review

We turned the while loop into an infinite loop

It breaks when there have been no exchanges. Why? We cannot protect the read access in the while conditional.

No need to protect the access to exch in the if statement

If exch is 0, all threads break and exit the parallel region. If exch equals the number of threads, only the last thread to decrement it brings it back to 0, so no thread fails to enter the loop body.

The decrement and the update are protected by the same (unnamed) critical region:

They are mutually exclusive, so possible conflicts are avoided


Bug alert!

Still, there is a subtle bug

Some threads may enter the for construct and perform a swap while others are still waiting at the first critical region. exch could then be set to an incorrect value by the late decrements, leading to deadlock...

Fix: an explicit barrier after the critical-region decrement

Problem

Our original goal was to remove overhead...

But we are introducing far more!
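For reference, a sketch of the counter scheme with the barrier fix applied, assuming an OpenMP build (team_size is a hypothetical helper that returns 1 without OpenMP). It uses atomic constructs instead of critical, so that the read in the exit test is synchronized as well. The exit test is exch <= 0: two phases in a row without swaps decrement the counter below zero, and an already sorted input would otherwise never stop.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of threads in the current team (1 without OpenMP). */
static int team_size(void) {
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

void OddEvenSortCounter(int *A, int N) {
    int exch = 0, start = 0;
    #pragma omp parallel shared(exch, start)
    {
        #pragma omp single
        exch = team_size();              /* the number of threads, instead of 1 */
        /* implicit barrier at the end of single */
        while (1) {
            int e;
            #pragma omp atomic read
            e = exch;
            if (e <= 0 && start == 0) break;
            #pragma omp atomic update
            exch--;
            #pragma omp barrier          /* the fix: no swap before every decrement */
            #pragma omp for
            for (int i = start; i < N - 1; i += 2) {
                if (A[i] > A[i + 1]) {
                    int temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                    int nt = team_size();
                    #pragma omp atomic write
                    exch = nt;           /* let every thread through next time */
                }
            }                            /* implicit barrier */
            #pragma omp single
            start = !start;              /* implicit barrier */
        }
    }
}
```

Every phase now pays for two barriers plus the one after the decrement, which is the extra overhead the slide warns about.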


Minimize The Overhead

We can't efficiently minimize thread sleeping and waking

Without adding a bunch of other overheads

We have to revert to the straightforward parallelization, but

Idea

We can cut down on the number of times we need to start and stop threads

Inside each parallel region, rather than doing one odd or one even phase, we can do one of each


Parallel Code - Double Phase I

void OddEvenSort(int *A, int N) {
    int exch0, exch1 = 1, trips = 0, i;

    while (exch1) {
        exch0 = 0;
        exch1 = 0;
        #pragma omp parallel
        {
            int temp;
            #pragma omp for
            for (i = 0; i < N - 1; i += 2) {
                if (A[i] > A[i + 1]) {
                    temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                    exch0 = 1;
                }
            }


Parallel Code - Double Phase II

            if (exch0 || !trips) {
                #pragma omp for
                for (i = 1; i < N - 1; i += 2) {
                    if (A[i] > A[i + 1]) {
                        temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                        exch1 = 1;
                    }
                }
            } // if exch0
        } // end parallel
        trips = 1;
    }
}


How does it work?

Algorithm

Uses two exchange flags: exch0 and exch1.

Both flags are reset, then the threads enter the parallel region.

The first for construct divides up the array for the even phase. If any thread performs an exchange, it sets exch0.

After the implicit barrier, each thread tests exch0.

If the flag has been set (or trips is still 0, i.e. on the first trip), the thread enters the second for construct.

If any thread performs an exchange in the second loop, it sets exch1.

The while loop then checks exch1 and continues if it is set.


Performance Considerations

Result

Through unrolling we change the granularity

With the double phase, we halve the number of entries into and exits from the parallel region

The reduction in overhead is worth the extra coding.

Question

Could more unrolling be worth it?
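One way to experiment with the question, as a sketch: a hypothetical OddEvenSortUnrolled that runs k alternating phases per parallel region. It is simpler than the double-phase code: it never skips a phase, and it stops once a whole region performs no swap, which is safe for k >= 2 because that region then contains an even and an odd phase that changed nothing. The price is up to k - 1 wasted phases at the end, so the best k presumably depends on N, the thread count and the machine, and only measurement can tell.

```c
/* k alternating phases (even, odd, even, ...) per parallel region.
   The sort stops after a region in which no phase swapped anything. */
void OddEvenSortUnrolled(int *A, int N, int k) {
    if (k < 2) k = 2;        /* a region must hold an even and an odd phase */
    int swapped = 1;
    while (swapped) {
        swapped = 0;
        #pragma omp parallel shared(swapped)
        for (int p = 0; p < k; p++) {          /* every thread runs all k phases */
            #pragma omp for reduction(||:swapped)
            for (int i = p % 2; i < N - 1; i += 2) {
                if (A[i] > A[i + 1]) {
                    int temp = A[i]; A[i] = A[i + 1]; A[i + 1] = temp;
                    swapped = 1;
                }
            }   /* implicit barrier between phases */
        }
    }
}
```

Every thread executes the same k iterations of the p loop, so all threads meet at the same implicit barriers and the deadlock discussed earlier cannot occur.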
