
Toward using higher-level abstractions to teach Parallel Computing

May 20, 2013

Clayton Ferner, University of North Carolina Wilmington
601 S. College Rd., Wilmington, NC 28403 USA
[email protected]

Barry Wilkinson, University of North Carolina Charlotte
9201 Univ. City Blvd., Charlotte, NC 28223 USA
[email protected]

Barbara Heath, East Main Evaluation & Consulting, LLC
P.O. Box 12343, Wilmington, NC 28405 USA
[email protected]

(c) Copyright 2013 Clayton S. Ferner, UNC Wilmington

Outline

- Problem Addressed: raising the level of abstraction in parallel programming courses
- Patterns
- Seeds Framework
- Paraguin Compiler
- Surveys/Results
- Conclusions
- Future Work

Problem Addressed

Parallel computing has typically been treated as an advanced topic in computer science.

Most parallel computing classes use low-level tools:
- MPI for distributed-memory systems
- OpenMP for shared-memory systems
- CUDA/OpenCL for high-performance GPU computing
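To illustrate the level of abstraction involved, the sketch below (ours, not from the slides) shows roughly the minimum MPI code for a master to hand one integer of work to each worker; every send must be matched by a receive, and getting this pairing wrong is exactly how deadlock arises.

/* Illustrative sketch: minimal MPI master/worker exchange */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size, work = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which process am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many processes? */
    if (rank == 0) {
        /* master: send each worker its task number */
        for (int i = 1; i < size; i++)
            MPI_Send(&i, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
    } else {
        /* worker: receive a task from the master */
        MPI_Recv(&work, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank %d got task %d\n", rank, work);
    }
    MPI_Finalize();
    return 0;
}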

Problem Addressed

Using these low-level tools:
- does not give students skills to tackle larger problems
- does not give students skills in computational thinking for parallel applications
- forces students to deal with issues such as deadlock, race conditions, and mutual exclusion

We need to raise the level of abstraction.

Pattern Programming Concept

- The programmer begins with established computational patterns that provide a structure
- Patterns are reusable solutions to commonly occurring problems
- Patterns provide a guide to "best practices," not a final implementation
- Patterns provide a good, scalable design structure
- One can reason more easily about programs
- There is potential for automatic conversion into executable code, avoiding low-level programming
- This is particularly useful for the complexities of parallel/distributed computing

What Kind of Patterns Are We Talking About?

- Low-level algorithmic patterns: fork-join, broadcast/scatter/gather
- Higher-level algorithmic patterns: workpool, pipeline, stencil, map-reduce

We concentrate on the higher-level patterns, illustrated below.

Patterns (Workpool)

[Diagram: a master (source/sink) with two-way connections to a pool of compute nodes]

Patterns (Pipeline)

[Diagram: compute nodes arranged as Stage 1 -> Stage 2 -> Stage 3, with one-way connections between stages and a master (source/sink) at the ends]

Patterns (Stencil)

[Diagram: a grid of compute nodes with two-way connections between neighboring nodes; a master (source/sink) coordinates]

Patterns (Divide and Conquer)

[Diagram: a tree of compute nodes rooted at the master (source/sink); one-way connections divide work down the tree and merge results back up]

Patterns (All-to-All)

[Diagram: compute nodes with two-way connections between every pair of nodes; a master (source/sink) coordinates]

Two Approaches

The first approach uses a new software environment called Seeds:
- Developed at UNCC
- Creates a higher level of abstraction for parallel and distributed programming, based upon a pattern programming approach

The second approach uses compiler directives (the Paraguin compiler):
- Developed at UNCW
- Similar to OpenMP, but creates MPI code for a distributed-memory system

Seeds Framework

The programmer:
- selects the pattern
- specifies the data to be sent to and from the processors/processes
- specifies the computation to be performed by the processors/processes

The framework will:
- automatically distribute tasks across distributed computers and processors
- self-deploy on distributed computers, clusters, and multicore processors, or a combination of distributed- and shared-memory computers

Advantages

- The programmer does not program using low-level message-passing APIs
- The programmer is relieved of concerns about message-passing deadlock
- The programmer has a very simple programming interface

Seeds Framework

The programmer implements 3 methods:
- Diffuse method: distributes pieces of data
- Compute method: performs the actual computation
- Gather method: gathers the results

In addition, the programmer completes a "bootstrap" class to deploy and start the framework with the selected pattern.
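The workpool example that follows parallelizes Monte Carlo estimation of π: sample random points (x, y) in the unit square and count the fraction that lands inside the quarter-circle x*x + y*y <= 1; that fraction approaches π/4. As a reference point, here is the underlying serial computation as a plain C sketch of ours (not from the slides); the Seeds version below splits the same loop into 3000 job units of 1000 points each.

/* Illustrative sketch (ours): serial Monte Carlo estimation of pi */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const long samples = 3000L * 1000L;   /* matches 3000 job units x 1000 points */
    long inside = 0;
    for (long i = 0; i < samples; i++) {
        double x = (double) rand() / RAND_MAX;
        double y = (double) rand() / RAND_MAX;
        if (x * x + y * y <= 1.0)
            inside++;                     /* point fell inside the quarter-circle */
    }
    printf("pi is approximately %f\n", 4.0 * inside / samples);
    return 0;
}

In the Seeds version, DiffuseData hands each job unit a random seed, Compute counts the hits for one batch of 1000 points, and GatherData sums the counts.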

Example: Monte Carlo Method for Estimating π

package edu.uncc.grid.example.workpool;
... // import statements
public class MonteCarloPiModule extends Workpool {
    private static final long serialVersionUID = 1L;
    private static final int DoubleDataSize = 1000;
    double total;          // running count of points inside the quarter-circle
    int random_samples;    // number of job units
    Random R;

    public MonteCarloPiModule() {
        R = new Random();
    }

    public void initializeModule(String[] args) {
        total = 0;
        random_samples = 3000; // random samples
    }

    // Compute: the actual computation performed by each worker
    public Data Compute(Data data) {
        DataMap<String, Object> input = (DataMap<String, Object>) data;
        DataMap<String, Object> output = new DataMap<String, Object>();
        Long seed = (Long) input.get("seed");
        Random r = new Random();
        r.setSeed(seed);
        Long inside = 0L;
        for (int i = 0; i < DoubleDataSize; i++) {
            double x = r.nextDouble();
            double y = r.nextDouble();
            double dist = x * x + y * y;
            if (dist <= 1.0)
                ++inside;
        }
        output.put("inside", inside);
        return output;
    }

    // DiffuseData: distribute pieces of data to the workers
    public Data DiffuseData(int segment) {
        DataMap<String, Object> d = new DataMap<String, Object>();
        d.put("seed", R.nextLong());
        return d; // returns a random seed for each job unit
    }

    // GatherData: gather the results from the workers
    public void GatherData(int segment, Data dat) {
        DataMap<String, Object> out = (DataMap<String, Object>) dat;
        Long inside = (Long) out.get("inside");
        total += inside; // aggregate answer from all the worker nodes
    }

    public double getPi() {
        // fraction of points inside the quarter-circle, times 4
        double pi = (total / (random_samples * DoubleDataSize)) * 4;
        return pi;
    }

    public int getDataCount() {
        return random_samples;
    }
}

The "bootstrap" class deploys and starts the framework with the selected pattern:

package edu.uncc.grid.example.workpool;
... // import statements
public class RunMonteCarloPiModule {
    public static void main(String[] args) {
        try {
            MonteCarloPiModule pi = new MonteCarloPiModule();
            Seeds.start("/path-to-seeds-folder", false);
            PipeID id = Seeds.startPattern(new Operand(
                (String[]) null,
                new Anchor("hostname", Types.DataFlowRoll.SINK_SOURCE),
                pi));
            System.out.println(id.toString());
            Seeds.waitOnPattern(id);
            System.out.println("The result is: " + pi.getPi());
            Seeds.stop();
        } catch ... // exceptions
    }
}

Compiler Directive Approach (Paraguin Compiler)

The Paraguin compiler is a compiler being developed at UNCW that will produce parallel code:
- Suitable for distributed-memory systems
- Uses MPI
- Interface is through directives (#pragmas)
- Similar to OpenMP

Example: Matrix Multiplication

#pragma paraguin begin_parallel
#pragma paraguin bcast a b
#pragma paraguin forall C p i j k \
    0x0 -1 1 0x0 0x0 \
    0x0 1 -1 0x0 0x0
#pragma paraguin gather 1 C i j k \
    0x0 0x0 0x0 1 \
    0x0 0x0 0x0 -1

for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        for (k = 0; k < N; k++) {
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }
    }
}
#pragma paraguin end_parallel

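For comparison (ours, not from the slides): on a shared-memory machine, OpenMP parallelizes the same loop nest with a single directive, which is the style of interface Paraguin aims for while targeting distributed memory via MPI. N, matmul, and the array declarations are our assumptions.

/* Illustrative sketch: the same multiplication with OpenMP */
#include <omp.h>
#define N 512
double a[N][N], b[N][N], c[N][N];

void matmul(void) {
    int i, j, k;
    #pragma omp parallel for private(j, k)  /* split the i loop across threads */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < N; k++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
}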

Surveys

- During the Fall 2012 semester, we administered 3 surveys (pre-, mid-, and post-course)
- Feedback was collected by the external evaluator
- Students who provided consent and completed each of the three surveys were entered in a drawing for one of eight $25 Amazon gift cards
- For each survey, 58 invitations were sent to students at both campuses
- The response rates for the three surveys were 36%, 29%, and 28%, respectively

Pre- and Post-test Survey Questions

- The purpose of the pre- and post-semester surveys was to assess the degree to which the students learned the material
- A set of seven items was developed for this purpose
- The items were presented on a six-point Likert scale from "strongly disagree" (1) through "strongly agree" (6)

Pre- and Post-test Survey Questions

- I am familiar with the topic of parallel patterns for structured parallel programming.
- I am able to use the pattern programming framework to create a parallel implementation of an algorithm.
- I am familiar with the CUDA parallel computing architecture.
- I am able to use CUDA parallel computing architecture.
- I am able to use MPI to create a parallel implementation of an algorithm.
- I am able to use OpenMP to create a parallel implementation of an algorithm.
- I am able to use the Paraguin compiler (with compiler directives) to create a parallel implementation of an algorithm.

Results of Likert Questions

(Items abbreviated; full wording on the previous slide.)

Item                                Pre (N=21)    Post (N=16)
                                    Mean (sd)     Mean (sd)
Familiar with parallel patterns     2.74 (1.59)   4.44 (1.09)
Able to use pattern framework       2.38 (1.60)   4.25 (0.86)
Familiar with CUDA                  2.29 (1.55)   4.63 (0.72)
Able to use CUDA                    1.95 (1.43)   4.44 (0.89)
Able to use MPI                     2.24 (1.26)   4.88 (0.81)
Able to use OpenMP                  2.19 (1.12)   5.06 (1.24)
Able to use Paraguin                1.76 (0.89)   4.13 (1.15)

Results

- Beginning of the semester: students indicated that they mostly did not feel able to use the tools to implement algorithms in parallel
- End of the semester: students were mostly confident in their ability to implement parallel algorithms using the tools
- Naturally, this is expected

Results

What is not expected: students indicated greater confidence in using the lower-level parallel tools (MPI, OpenMP, and CUDA) than in using our new approaches (patterns and the Paraguin compiler).

There are two possible explanations for this:
1) the tools need improvement to be easier to use; and
2) students preferred the flexibility and control of the lower-level tools.

Based upon the next set of data, both explanations are true.

Open-Ended Questions

The students were asked to provide open-ended comments comparing and contrasting the various methods:

- Describe the benefits and drawbacks between the following methods: Pattern Programming (Assignment 1) and MPI (Assignment 2).
- Describe the benefits and drawbacks between the following methods: Pattern Programming (Assignment 1) and Paraguin Compiler Directives (Assignment 3).
- Describe the benefits and drawbacks between the following methods: MPI (Assignment 2) and Paraguin Compiler Directives (Assignment 3).

Results of Open-Ended Questions

All of the open-ended answers are included in the paper. This answer is representative of the rest:

"Using the seeds framework for pattern programming made it easy implement patterns for workflow. However, seeds works at such a high level that I do not understand how it implements the patterns. MPI gave me much more control over how program divides the workflow, but it can often be difficult to write a complex program that requires a lot of message passing between systems."

Results of Open-Ended Questions

- Computer science students are accustomed to having control
- The Seeds framework takes control from the user when the user is working within the "basic" layer (which the students were using)
- The framework is constructed in three layers:
  - the "basic" layer
  - the "advanced" layer
  - the "expert" layer

Results of Open-Ended Questions

Another representative answer from the paper:

"I found Paraguin to be useful because it eliminated the need for me to write the more complex message passing routines in MPI. However, I found it difficult to debug errors in the code and also determine the correct loop dependencies."

Results of Open-Ended Questions

- Some found the Paraguin compiler easier to use and some did not
- The Paraguin compiler generates MPI code, so programmers can see how it is implemented and are free to modify it
- We feel that the concerns about Paraguin being complicated are legitimate:
  - The compiler was designed to provide significant flexibility over partitioning nested loops
  - This flexibility overly complicated the partitioning directive

Relative Difficulty

Students were asked to rate the relative difficulty of using the Seeds framework, MPI, and the Paraguin compiler, from (1) very difficult to (6) very easy. Students felt that the Seeds framework was the easiest, while Paraguin was the most difficult.

                                  Mean (sd)
Pattern Programming               3.63 (0.89)
MPI                               3.25 (1.13)
Paraguin Compiler Directives      2.56 (1.26)

Conclusions

- Students would benefit from using the "advanced" layer of the Seeds framework:
  - Students would see how their code is implemented
  - Students would feel "control" over the implementation
- The compiler directives of the Paraguin compiler need to be redesigned to make them easier to use (work is almost complete)
- Given the feedback, we feel confident we can overcome the obstacles and achieve our goal

Future Work

- Our course will be offered again in Fall 2013, with changes to address what we have learned
- We will again measure the effectiveness of our methods
- Spring 2013 will serve as a "control":
  - Parallel Computing taught with traditional methods at UNCC
  - Similar surveys were administered

Future Work

Paraguin directives have been redesigned and reimplemented to make them easier to use:

#pragma paraguin begin_parallel
#pragma paraguin scatter A B
#pragma paraguin forall ...
#pragma paraguin gather C
#pragma paraguin end_parallel
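The slides do not show the redesigned forall syntax, so as an illustration only, here is how the earlier matrix multiplication might look under the new directives. The directive names come from the list above, but their placement and the choice of scatter versus broadcast for a and b are our assumptions.

/* Sketch only (assumed semantics): matrix multiplication with the
   redesigned directives; the original example broadcast a and b,
   and the slide does not specify how the new scatter behaves here */
#pragma paraguin begin_parallel
#pragma paraguin scatter a b
#pragma paraguin forall
for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
        for (k = 0; k < N; k++)
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
#pragma paraguin gather c
#pragma paraguin end_parallel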

Future Work

Paraguin compiler directives to describe patterns will be introduced:

#pragma paraguin pattern(workpool)
#pragma paraguin begin_master
...
#pragma paraguin end_master

#pragma paraguin begin_worker
...
#pragma paraguin end_worker

Questions?

Clayton Ferner

Department of Computer Science

University of North Carolina Wilmington

[email protected]

http://people.uncw.edu/cferner/
