Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Post on 23-Dec-2015


Page 1: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Parallel & GPU computing in MATLAB ITS Research Computing

Lani Clough

Page 2: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Objectives
• Introductory-level MATLAB course for people who want to learn parallel and GPU computing in MATLAB.

• Help participants determine when to use parallel computing and how to use MATLAB parallel & GPU computing on their local computer & on the Research Computing clusters (Killdevil/Kure)

Page 3: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Logistics

• Course Format
• Overview of MATLAB topics with Lab Exercises
• UNC Research Computing
– http://its.unc.edu/research

Page 4: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Agenda
• Parallel computing (1hr 10min)
– What is it?
– Why use it?
– How to write MATLAB code in parallel (1hr)
• GPU computing (20 min)
– What is it & why use it?
– How to write MATLAB code for GPU computing (15 min)
• How to run MATLAB parallel & GPU code on the UNC cluster (20 min)
– Quick introduction to the UNC cluster (Kure)
– bsub commands and what they mean
• Questions (10 min)

Page 5: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Parallel Computing

Page 6: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Generally, computer code is written in serial
– 1 task is completed after another until the script is finished, with only 1 task completing at a time
– Concept: the computer only has 1 CPU

What is Parallel Computing?

Source: https://computing.llnl.gov/tutorials/parallel_comp/

Page 7: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Parallel Computing: Using multiple central processing units (CPUs) to solve a problem at the same time

• The compute resources might be: computer with multiple processors or networked computers

What is Parallel Computing? (cont.)

Source: https://computing.llnl.gov/tutorials/parallel_comp/

Page 8: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Save time & money (commodity components)
• Provide concurrency
• Solve larger problems
• Use non-local resources
– UNC compute cluster
– SETI: 2.9 million computers
– Folding (Stanford): 450,000 CPUs

Why use Parallel Computing

Source: https://computing.llnl.gov/tutorials/parallel_comp/

Page 9: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• The computational problem should be able to:
– Be broken into discrete parts that can be solved simultaneously and independently
– Be solved in less time with multiple compute resources than with a single compute resource.

How to write code in parallel

Page 10: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Parallel Computing in MATLAB

Page 11: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• MATLAB Parallel Computing Toolbox (available for use at UNC)
– Provides twelve workers (MATLAB computational engines) to execute applications on a multicore system.
– Built-in functions for parallel computing
• parfor loop (for running task-parallel algorithms on multiple processors)
• spmd (handles large datasets and data-parallel algorithms)

Parallel Computing in MATLAB

Page 12: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Allows MATLAB to run as many workers on a remote cluster of computers as licensing allows.

• OR run more than 12 workers on a local machine.

• UNC does not have a license for this toolbox- it’s extremely $$$$$$$$

• More information: http://www.mathworks.com/products/distriben/

• Course will not go over this toolbox

Matlab Distributed Computing Toolbox

Page 13: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• findResource
• matlabpool
– open
– close
– size
• parfor (for loop)
• spmd (distributed computing for datasets)
• batch jobs (run job in background)

Primary Parallel Commands

Page 14: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Find available parallel computing resources
• out = findResource()

findResource

Page 15: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough
Page 16: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• lsf_sched = findResource('scheduler','type','LSF')

– Find the Platform LSF scheduler on the network.

• local_sched = findResource('scheduler','type','local')

– Create a local scheduler that will start workers on the client machine for running your job.

• jm1 = findResource('scheduler','type','jobmanager','Name','ClusterQueue1');

– Find a particular job manager by its name.

findResource Examples

Page 17: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• http://www.mathworks.com/help/toolbox/distcomp/findresource.html

More Resources for findResource

Page 18: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• matlabpool open
– Begins a parallel work session

• Options for open matlab pool

Matlabpool

Page 19: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• These three examples of matlabpool open each have the same result: each opens a local pool of 4 workers
– 1:
– 2:
– 3:

Matlabpool open

Page 20: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• matlabpool(x)
– Request the number of workers you'd like, e.g. matlabpool(4)
• matlabpool('size')
– Tells you the number of workers available in matlabpool

– i.e.

Matlabpool

Page 21: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Request too many workers, get an error

Matlabpool

Can only request 4 workers on this machine!

Page 22: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Use matlabpool close to end parallel session

• Options
– matlabpool close force
• deletes all pool jobs for current user in the cluster specified by default profile (including running jobs)
– matlabpool close force <profilename>
• deletes all pool jobs run in the specified profile

Matlabpool Close

Page 23: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• parfor loops can execute for-loop-like code in parallel, which can significantly improve performance

• Must consist of code broken into discrete parts that can be solved simultaneously (i.e. it can’t be serial)

Parallel for Loops (parfor)

Page 24: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Will work in parallel; loop iterations are not dependent on each other:
matlabpool open local 2
j=zeros(100,1); %pre-allocate vector
parfor i=2:100
    j(i,1)=5*i;
end
matlabpool close

Parfor example

Makes the loop run in parallel

Page 25: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Won't work in parallel; it's serial:
j=zeros(100,1); %pre-allocate vector
j(1)=5;
for i=2:100
    j(i,1)=j(i-1)+5;
end

Serial Loop example

j(i-1) needed to calculate j(i,1) serial!!!
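A dependent loop like this one can sometimes be rewritten so that each iteration stands alone. The following is a minimal sketch, assuming the recurrence has a known closed form (here j(i) = 5*i); the names jSerial, jPar and sameResult are introduced only for this illustration. Depending on the MATLAB release, parfor may simply run on the client if no pool of workers is open.

```matlab
% Serial version: each iteration needs the result of the previous one.
jSerial = zeros(100,1);
jSerial(1) = 5;
for i = 2:100
    jSerial(i,1) = jSerial(i-1) + 5;
end

% Independent version: iteration i depends only on i (closed form 5*i),
% so the iterations can run in any order and parfor accepts the loop.
jPar = zeros(100,1);
parfor i = 1:100
    jPar(i,1) = 5*i;
end

% Both approaches should give the same vector.
sameResult = isequal(jSerial, jPar);
```

Not every recurrence has a closed form; when it does not, the loop generally has to stay serial.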

Page 26: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Can not nest parfor loops within parfor loops

parfor i=1:10
    parfor j=1:10
        x(i,j)=1;
    end;
end;

Parallel for Loops (parfor)
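One common way around the nesting restriction is to keep parfor on the outer loop only and use an ordinary for loop inside. This is a hedged sketch rather than the course's own solution; the temporary variable row is an assumption added so that each outer iteration writes one whole row of x.

```matlab
n = 10;
x = zeros(n, n);          % pre-allocate the result
parfor i = 1:n            % only the outer loop is parallel
    row = zeros(1, n);    % temporary variable, local to each iteration
    for j = 1:n           % the inner loop is an ordinary for loop
        row(j) = 1;
    end
    x(i, :) = row;        % write one full row of the sliced variable x
end
```

Parallelizing the outer loop is usually the better choice, because each worker then receives a larger chunk of work per iteration.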

Page 27: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• If a function with multiple outputs is used within a parfor loop, MATLAB will have difficulty figuring out how to run the parfor loop.
e.g.
parfor i=1:10
    [x{i}(:,1), x{i}(:,2)]=functionName(z,w)
end

Parallel for Loops (parfor)

Page 28: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Use this instead

parfor i=1:10
    [x1, x2]=functionName(z,w);
    x{i}=[x1 x2];
end

Parallel for Loops (parfor)
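The same pattern can be tried with a real two-output function. The sketch below uses the built-in max (which returns a maximum and its index) as a stand-in for functionName; the data values and the names data, out, mx and ix are made up for illustration.

```matlab
data = {[3 1 2], [4 9 5], [7 7 8]};   % hypothetical inputs, one per iteration
out  = cell(1, numel(data));          % pre-allocate the output cell array
parfor i = 1:numel(data)
    % Capture both outputs in temporary variables first...
    [mx, ix] = max(data{i});
    % ...then store them with a single indexed assignment.
    out{i} = [mx ix];
end
```

Each out{i} holds the maximum and its position, and the loop body contains only one simple assignment into the output variable.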

Page 29: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

For parallel computing to be worth your time: the task must be solved in less time with multiple compute resources than with a single compute resource.

Parallel for Loops (parfor)

Page 30: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Use MATLAB's tic & toc functions
– tic starts a timer
– toc tells you the number of seconds since the tic function was called

Test the efficiency of your parallel code

Page 31: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Simple Example
tic;
parfor i=1:10
    z(i)=10;
end;
toc

Tic & Toc

Page 32: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

clear;clc;
k=(zeros(10,3));
m=1;
i=1;
while i<1e8
    [time1 time2]=testParfor(i); %testParfor opens and closes its own matlabpool
    k(m,:)= [i time1 time2];
    m=m+1;
    i=i*10;
end;

Check efficiency of simple parfor loop

Page 33: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

function [t1e, t2e]=testParfor(x)

A=ones(x,1).*4;
B=zeros(x,1);
t1s=tic;
matlabpool(4)
parfor i = 1:length(A)
    B(i) = sqrt(A(i));
end
t1e=toc(t1s);
matlabpool close

B=zeros(x,1);
t2s=tic;
for i = 1:length(A)
    B(i) = sqrt(A(i));
end
t2e=toc(t2s);

Check efficiency of simple parfor loop

Page 34: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• For loop is much more efficient than parfor loop: more resources do not necessarily equate to a faster run time!!

Result of Check Efficiency of parfor

Page 35: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Previous example is not an effective use of a parfor loop because it takes more time to evaluate than a for loop.
– Data transfer is the issue
– Parfor is more effective with long running calculations within the loop
– Generally more iterations increase the efficiency of a parfor loop

Parfor Efficiency
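To see the effect of heavier per-iteration work, one possible experiment is sketched below. It assumes a pool of workers is already open (otherwise pool start-up time may be counted in the parfor timing), and the problem sizes and variable names are arbitrary choices for illustration. No timings are quoted here because they depend entirely on the machine.

```matlab
n = 64;                        % number of independent tasks (arbitrary)
A = rand(60, 60, n);           % one 60-by-60 matrix per task
rSerial = zeros(n, 1);
rPar    = zeros(n, 1);

tS = tic;
for i = 1:n
    % eig is far more expensive than sqrt of a single number
    rSerial(i) = max(abs(eig(A(:,:,i))));
end
tSerial = toc(tS);

tP = tic;
parfor i = 1:n
    rPar(i) = max(abs(eig(A(:,:,i))));
end
tPar = toc(tP);

fprintf('for: %.3f s, parfor: %.3f s\n', tSerial, tPar);
```

Increasing the matrix size or n shifts the balance further toward computation and away from data transfer.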

Page 36: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Lab exercise:
– Turn a non-parallel function into a function that can run in parallel
– Go through each section of the code and determine if it can be written in parallel and, if so, how (%% denotes a new section)

Lab Exercise with parfor

Page 37: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

function N=calcNeighNp(neighPoly,manzPoly,manzPop93, manzPop05, manzID)

%Find the manzanas which don't have an associated population in 93, but a population in 05

j=1;
for i=1:length(manzPop93)
    if manzPop93(i,1)==0 && manzPop05(i,1)>0
        no93manzPopID(j,1)=manzID(i,1);
        j=j+1;
    end;
end;
%%

Lab Exercise

matlabpool(x) %start matlabpool

%parfor can't be used here because it’s serial

Page 38: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

%%
%Calculate the average monthly population change (excluding the data pts with no pop in 1993);
MonthsC=(2005-1993);
j=1;
count=0;
TotalPopC=0;
for i=1:length(manzPop93)
    if manzID(i,1)~=no93manzPopID(j,1)
        TotalPopC=TotalPopC+((manzPop05(i,1)-manzPop93(i,1))/MonthsC);
        count=count+1;
    else
        j=j+1;
    end;
end;
%%

Lab Exercise
%parfor can't be used here because it's serial

Page 39: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

%%
meanPopChangeM=TotalPopC/count;
PopChangeMmanz=zeros(length(manzPop05),1);
%Calculate the monthly population change for all the manzanas
for i=1:length(manzPop05)
    for j=1:length(no93manzPopID)
        if manzID(i,1)==no93manzPopID(j,1)
            PopChangeMmanz(i,1)=meanPopChangeM;
            break;
        else
            PopChangeMmanz(i,1)=(manzPop05(i,1)-manzPop93(i,1))/MonthsC;
        end;
    end;
end;
%%

Lab Exercise

parfor i=1:length(manzPop05)

%% break must be deleted, not permitted in parfor
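One way to avoid needing break at all is to do the ID lookup before the loop. The sketch below is an alternative under stated assumptions, not the course's solution: it uses the built-in ismember in place of the inner search loop, and the small data set, the value of meanPopChangeM, and the names isNo93 and popChange are made up for illustration.

```matlab
% Hypothetical data for five manzanas
manzID    = (1:5)';
manzPop93 = [10; 0; 30; 0; 50];
manzPop05 = [22; 5; 30; 8; 74];
MonthsC   = 12;
meanPopChangeM = 1.5;                        % assumed value for illustration

% IDs with no population in 93 but a population in 05
no93manzPopID = manzID(manzPop93 == 0 & manzPop05 > 0);

% Look up membership once, outside the loop, so no inner search is needed
isNo93 = ismember(manzID, no93manzPopID);

popChange = zeros(length(manzPop05), 1);
parfor i = 1:length(manzPop05)
    if isNo93(i)
        popChange(i,1) = meanPopChangeM;     % use the average change
    else
        popChange(i,1) = (manzPop05(i,1) - manzPop93(i,1)) / MonthsC;
    end
end
```

With the lookup done up front, every iteration touches only its own element, which is the pattern parfor handles best.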

Page 40: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

%%
%Now calculate the midpoint population
midPop=manzPop93+(PopChangeMmanz*9.5);
%turn the neighs and manz clockwise to calc pop
for i=1:length(neighPoly)
    [neighClock{i}(:,1) neighClock{i}(:,2)] = poly2cw(neighPoly{i}(:,1),neighPoly{i}(:,2));
end;

for i=1:length(manzPoly)
    [manzClock{i}(:,1) manzClock{i}(:,2)]= poly2cw(manzPoly{i}(:,1),manzPoly{i}(:,2));
end;
%%

Lab Exercise

parfor i=1:length(neighPoly)
    [temp1, temp2] = poly2cw(neighPoly{i}(:,1),neighPoly{i}(:,2));
    neighClock{i}=[temp1 temp2];
end;

parfor i=1:length(manzPoly)
    [temp1, temp2] = poly2cw(manzPoly{i}(:,1),manzPoly{i}(:,2));
    manzClock{i}=[temp1 temp2];
end;

Page 41: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

%%
%calculate the areas of the manzanas;
polyAreaR=zeros(length(manzClock),1);
for i=1:length(manzClock)
    polyAreaR(i,1)=calcArea(manzClock{i}(:,1),manzClock{i}(:,2));
end;

%%

Lab Exercise

parfor i=1:length(manzClock)
    polyAreaR(i,1)=calcArea(manzClock{i}(:,1),manzClock{i}(:,2));
end;

Page 42: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

%calculate the population for each of the neighs as function of the manzanas & sum calculated pop

N=zeros(length(neighClock),1); %pre-allocate the vector;
for i=1:length(neighClock)
    m=0;
    Ntemp=zeros(length(manzClock),1);
    for j=1:length(manzClock)
        [tempx tempy]=polybool('intersection',neighClock{i}(:,1),neighClock{i}(:,2),manzClock{j}(:,1),manzClock{j}(:,2));
        if isempty(tempx)==0;
            m=m+1;
            Ntemp(m,1)=(calcArea(tempx,tempy)/polyAreaR(j,1))*midPop(j);
        end;
    end;
    N(i,1)=(sum(Ntemp));
end;

Lab Exercise

parfor i=1:length(neighClock)

Page 43: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Loren Shure's blog entry on parfor
– http://blogs.mathworks.com/loren/2009/10/02/using-parfor-loops-getting-up-and-running/
• Advanced parfor topics (MATLAB online help)
– http://www.mathworks.com/help/toolbox/distcomp/brdqtjj-1.html#bq_of7_-1

• Loren Shure (MATLAB engineer)

More parfor resources

Page 44: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• All functions are included in the online Parallel MATLAB program files

• Parfor progress monitor (user created)
– http://www.mathworks.com/matlabcentral/fileexchange/24594-parfor-progress-monitor
• Parallel Profiler (built-in function)
– http://www.mathworks.com/help/toolbox/distcomp/bra51qt-1.html#brcrm_t

Functions to support parfor performance

Page 45: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• All functions are included in the online Parallel MATLAB program files

• User-created codes

• Parfor progress monitor (user created)
– http://www.mathworks.com/matlabcentral/fileexchange/24594-parfor-progress-monitor

Functions to support parfor performance

Page 46: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Parallel Profiler (built-in function)
– http://www.mathworks.com/help/toolbox/distcomp/bra51qt-1.html#brcrm_t
• partictoc
– You can also use this user-created function, partictoc, to examine the efficiency of your parallel code
– Download at: http://www.mathworks.com/matlabcentral/fileexchange/27472-partictoc

Functions to support parfor performance

Page 47: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Used to Partition large data sets

• Excellent when you want to work with an array too large for your computer’s memory

Spmd

Page 48: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Spmd distributes the array among MATLAB workers (each worker contains a part of the array)

• However, still can operate on entire array as 1 entity

• Workers automatically transfer data between themselves when necessary, e.g. for matrix multiplication.

Spmd

Page 49: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Format
matlabpool(4)
spmd
    statements
end

• Simple Example
matlabpool(4)
spmd
    j=zeros(1e7,1);
end;

Spmd Format

Page 50: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Result j is a Composite with 4 parts!

Spmd Examples

Page 51: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• It's an object used for data distribution in MATLAB
• A Composite object has one entry for each worker
– matlabpool(12) creates?
– matlabpool(6) creates?

MATLAB Composites

12X1 composite
6X1 composite

Page 52: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• You can create a composite in two ways:
– spmd
– c = Composite();
• This creates a composite that does not contain any data, just placeholders for data
• Also, one element per matlabpool worker is created for the composite
• Use spmd or indexing to populate a composite created this way

MATLAB Composites

Page 53: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Example

c = Composite(); % One element per lab in the pool
for ii = 1:length(c)
    % Set the entry for each lab to zero
    c{ii} = 0; % Value stored on each lab
end

MATLAB Composites

Page 54: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Using the j Composite from Previous slide

Composite indexing

Page 55: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Assign the values of a composite to a matrix

• All composites are turned into MATLAB cell arrays

Composite indexing

Page 56: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

%Perform a simple calculation in parallel, and plot the results:

matlabpool(4)
spmd
    % build magic squares in parallel
    q = magic(labindex + 2); %labindex- index of the lab/worker (e.g. 1)
end

for ii=1:length(q)
    % plot each magic square
    figure, imagesc(q{ii}); %plot a matrix as an image
end
matlabpool close

Another spmd Example- creating graphs

Page 57: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Results

Another spmd Example- creating graphs

Page 58: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Extensive documentation online for using spmd and composites
– http://www.mathworks.com/help/toolbox/distcomp/brukbno-1.html
– Spmd specific documentation:
• http://www.mathworks.com/help/toolbox/distcomp/spmd.html

MATLAB help documents on spmd

Page 59: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Run independent parallel jobs on a worker, not on a compute cluster
– Batch in cluster language ≠ batch in MATLAB language

Run jobs in batch

Page 60: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

%Construct a parallel job object using the default configuration.

pjob = createParallelJob();
%Add the task to the job.
createTask(pjob, 'rand', 1, {4});
%Set the number of workers required for parallel execution.
set(pjob,'MinimumNumberOfWorkers',4);
set(pjob,'MaximumNumberOfWorkers',4);
%Run the job.
submit(pjob);
%Wait for the job to finish running, and retrieve the job results.
waitForState(pjob);
out = getAllOutputArguments(pjob);
%Display the random matrices.
celldisp(out);
%Destroy the job.
destroy(pjob);

Run jobs in batch

Page 61: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Results from previous batch job

Running jobs in batch

Page 62: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• More information at: http://www.mathworks.com/help/toolbox/distcomp/f1-6010.html#f1-7659

Run jobs in batch

Page 63: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

GPU Computing

Page 64: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• GPU computing is the use of a GPU (graphics processing unit) with a CPU to accelerate performance

• Offloads compute-intensive portions of an application to the GPU, and the remainder of the code runs on the CPU

What is GPU computing?

Page 65: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• CPUs consist of a few cores optimized for serial processing

• GPUs consist of thousands of smaller cores designed for parallel performance (i.e. more memory bandwidth and cores)

What is GPU computing?

Source: http://www.nvidia.com/object/what-is-gpu-computing.html

Page 66: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Serial portions of the code run on the CPU while parallel portions run on the GPU

• From a user's perspective, applications in general run significantly faster

What/Why GPU computing?

Page 67: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Transfer data between the MATLAB workspace & the GPU
– Accomplished by a GPUArray
• Data stored on the GPU.
– Use gpuArray function to transfer an array from the MATLAB workspace to the GPU

Write GPU computing codes in MATLAB

Page 68: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Examples
N = 6;
M = magic(N);
G = gpuArray(M); %create an array stored on GPU

• G is a MATLAB GPUArray object representing magic square data on the GPU.

X = rand(1000);
G = gpuArray(X); %array stored on GPU

Write GPU computing codes in MATLAB

Page 69: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• gpuArray requires nonsparse data types: 'single', 'double', 'int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64', or 'logical'.

Write GPU computing codes in MATLAB

Page 70: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Static GPUArrays allow users to directly construct arrays on GPUs, without transfers

• Include:

Static GPUArrays

Page 71: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Construct an Identity Matrix on the GPU
II = parallel.gpu.GPUArray.eye(1024,'int32');
size(II)
    1024 1024

• Construct a Multidimensional Array on the GPU
G = parallel.gpu.GPUArray.ones(100, 100, 50);
size(G)
    100 100 50
classUnderlying(G)
    double %double is default, so don't need to specify it

Static Array Examples

Page 72: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• For a complete list of available static methods in any release, type methods('parallel.gpu.GPUArray')

• For help on any one of the constructors, type help parallel.gpu.GPUArray/functionname

• For example, to see the help on the colon constructor, type help parallel.gpu.GPUArray/colon

More Resources for GPU Arrays

Page 73: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Use gather function
– Makes data stored on the GPU available in the MATLAB workspace (CPU)

• Use isequal to verify that you get the correct data back:

Retrieve Data from the GPU

Page 74: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Example G = gpuArray(ones(100, 'uint32')); %array stored only on GPU

D = gather(G); %bring the data in G to the CPU/MATLAB workspace as D

OK = isequal(D, ones(100, 'uint32')) %check to see if the array on the GPU is the same as the array brought to the CPU

Retrieve Data from the GPU
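The round trip above can be wrapped so that it also runs on a machine without a supported GPU. This is a minimal sketch: the try/catch fallback and the flag usedGpu are assumptions added for illustration and are not part of the original example.

```matlab
X = ones(100, 'uint32');      % data in the MATLAB workspace (CPU)
usedGpu = true;
try
    G = gpuArray(X);          % copy the data to the GPU
    D = gather(G);            % bring the data back to the workspace
catch
    % No supported GPU (or no toolbox): fall back to the CPU copy
    usedGpu = false;
    D = X;
end
OK = isequal(D, X);           % check that the data survived the round trip
```

Checking usedGpu afterwards shows whether the transfer actually happened or the fallback was taken.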

Page 75: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• You can also examine a GPUArray's underlying characteristics using the following built-in functions:

GPUArray Characteristics

Page 76: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Example
– To examine the size of the GPUArray object G, type:
G = gpuArray(rand(100));
s = size(G)
    100 100

GPUArray Characteristics

Page 77: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Example uses the fft and real functions, arithmetic operators + and *.

• Calculations are performed on the GPU, gather retrieves data from the GPU to workspace.

Ga = gpuArray(rand(1000, 'single'));

%array on GPU & next operations performed on GPU

Gfft = fft(Ga);
Gb = (real(Gfft) + Ga) * 6;
G = gather(Gb); %brings G to the CPU

Calling Functions with GPU Objects

Page 78: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• The whos command is instructive for showing where each variable's data is stored.

whos

• All arrays are stored on the GPU (GPUArray), except G, because it was “gathered”

Calling Functions with GPU Objects

Name    Size         Bytes      Class
G       1000x1000    4000000    single
Ga      1000x1000    108        parallel.gpu.GPUArray
Gb      1000x1000    108        parallel.gpu.GPUArray
Gfft    1000x1000    108        parallel.gpu.GPUArray

Page 79: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Call arrayfun with a function handle to the MATLAB function as the first input argument:

result = arrayfun(@myFunction, arg1, arg2);

• Subsequent arguments provide inputs to the MATLAB function.

• Input arguments can be workspace data or GPUArray.
– If any input argument is a GPUArray, arrayfun executes on the GPU and returns a GPUArray.
– Otherwise arrayfun executes on the CPU

Running Functions on GPU

Page 80: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Example: function applies correction to an array
function c = myCal(rawdata, gain, offst)
c = (rawdata .* gain) + offst;

• Function performs only element-wise operations when applying a gain factor and offset to each element of the rawdata array.

Running Functions on GPU

Page 81: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Create some nominal measurement:
meas = ones(1000)*3; % 1000-by-1000 matrix

• Function allows the gain and offset to be arrays of the same size as rawdata, so unique corrections can be applied to individual measurements.

• Typically keep the correction data on the GPU so you do not have to transfer it for each application:

Running Functions on GPU

Page 82: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

% Runs on the GPU because the input arguments gn and offs are in GPU memory;

gn = gpuArray(rand(1000))/100 + 0.995;
offs = gpuArray(rand(1000))/50 - 0.01;
corrected = arrayfun(@myCal, meas, gn, offs);

% Retrieve the corrected results from the GPU to the MATLAB workspace;

results = gather(corrected);

Running Functions on GPU
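A small self-contained variant of the correction example is sketched below. It is an illustration under assumptions: an anonymous function applyCal stands in for myCal so that no separate file is needed (GPU support for anonymous functions in arrayfun may depend on the release), the gain and offset values are fixed rather than random, and a CPU fallback is added for machines without a supported GPU.

```matlab
meas = ones(4) * 3;               % nominal measurements (small for clarity)
gain = ones(4) * 1.01;            % hypothetical gain correction
offs = ones(4) * 0.5;             % hypothetical offset correction

% Element-wise correction, equivalent in form to myCal
applyCal = @(r, g, o) (r .* g) + o;

try
    % GPUArray inputs make arrayfun run on the GPU; gather brings the result back
    corrected = gather(arrayfun(applyCal, meas, gpuArray(gain), gpuArray(offs)));
catch
    % Fallback: plain workspace inputs, so arrayfun runs on the CPU
    corrected = arrayfun(applyCal, meas, gain, offs);
end

expected = meas .* gain + offs;   % the same correction, fully vectorized
```

Comparing corrected with expected is a quick way to confirm that the GPU and CPU paths agree to within rounding.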

Page 83: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• If you have only one GPU in your computer, that GPU is the default.

• If you have more than one GPU card in your computer, you can use the following functions to identify and select which card you want to use:

Identify & Select GPU

Page 84: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Identify & Select GPU
• This example shows how to identify and select a GPU for your computations
– First, determine the number of GPU devices on your computer using gpuDeviceCount

Page 85: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Identify & Select GPU
• In this case, you have 2 devices, thus the first is the default.
– To examine its properties type gpuDevice

Page 86: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Identify & Select GPU
• If the previous GPU is the device you want to use, then you can just proceed with the default

• To use another device call gpuDevice with the index of the card and view its properties to verify you want to use it. Here is an example where the second device is chosen
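The identify-and-select steps might be scripted roughly as follows. This is a sketch under assumptions: the try/catch guards and the names nGpu and names are additions for illustration, so the snippet also runs on a machine with no GPU. Note that calling gpuDevice(idx) selects that device, so after the loop the last listed device is the selected one.

```matlab
nGpu = 0;
try
    nGpu = gpuDeviceCount;            % number of GPU devices in this computer
catch
    % Toolbox or GPU driver not available; leave nGpu at 0
end

names = cell(nGpu, 1);
for idx = 1:nGpu
    try
        d = gpuDevice(idx);           % select device idx and get its properties
        names{idx} = d.Name;
    catch
        names{idx} = '(not usable)';  % device present but not supported
    end
    fprintf('GPU %d: %s\n', idx, names{idx});
end

if nGpu == 0
    disp('No GPU device found.');
end
```

To use a specific card afterwards, call gpuDevice with its index once more, as described above.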

Page 87: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• MATLAB's extensive online help documents for GPU computing
– http://www.mathworks.com/help/toolbox/distcomp/bsic3by.html

More Resources for GPU computing

Page 88: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Parallel & GPU Computing on the cluster

Page 89: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Node
– A standalone "computer in a box". Usually comprised of multiple CPUs/processors/cores. Nodes are networked together to comprise a cluster.
• Processor / Core
– Individual CPUs are subdivided into multiple "cores", each being a unique execution unit (processor).
• The result is a node with multiple CPUs, each containing multiple cores.

Cluster Jargon

Page 90: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Using MATLAB on the computer Cluster
• What??
– UNC provides researchers and graduate students with access to extremely powerful computers to use for their research.
– Kure is a Linux based computing system with >1,800 cores
– Killdevil is a Linux based computing system with >6,000 cores
• Why??
– The cluster is an extremely fast and efficient way to run LARGE MATLAB programs (fewer "Out of Memory" errors!)
– You can get more done! Your programs run on the cluster, which frees your computer for writing and debugging other programs!!!

Page 91: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Using MATLAB on the computer Cluster

• Where and When??
– The cluster is available 24/7 and you can run programs remotely from anywhere with an internet connection!

Page 92: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Using MATLAB on the computer Cluster

• Overview of how to use the computer cluster
– It would be helpful to take the following courses:
• Getting Started on Kure & Killdevil
• Introduction to Linux
– For presentations & help documents, visit:
• Course presentations: http://its2.unc.edu/divisions/rc/training/scientific/
• Help documents: http://its.unc.edu/research/its-research-computing/computing-resources/

Page 93: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Run your job on the cluster (1 job, not parallel)
• 1. Log in to the SSH file transfer client
• 2. Transfer the files you want to work with
• 3. Log in to the SSH client
• 4. Change your working directory to the folder you want to work in, e.g. cd /netscr/myonyen/
• 5. Type ls to make sure your program is located in the correct folder
• 6. Type bmatlab <yourProgram.m>
• Optional: to see your program running, type bhist or bjobs

Using MATLAB on the computer Cluster

Page 94: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Have access to:
– 8 workers on Kure
– 12 workers for each job on Killdevil

Parallel MATLAB on Cluster

Page 95: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Start a cluster job with this command, which gives you 1 job that is NOT parallel OR GPU
– bsub /nas02/apps/matlab-2011a/matlab -nodisplay -nosplash -singleCompThread -r <filename>
o "filename" is the name of your MATLAB script with the .m extension left off
o singleCompThread
o ALWAYS use this option unless you are requesting an entire node for a serial (i.e. not using the Parallel Computing Toolbox) MATLAB job or using GPUs!!!!!!

Bsub commands for parallel & GPU

Page 96: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Log file options (always created for jobs)
• Sent to your email by default; it is possible to output this to a file located in your job's current working directory.
• ALWAYS PUT additional BSUB OPTIONS AFTER bsub & BEFORE the executable name!!!!!!!
• See examples on next slides!!

Bsub commands for parallel & GPU

Page 97: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Add these additional logfile options:
o -o logfile.%J
o Does not send your MATLAB logfile to your email; it instead puts this information in a file called logfile.%J, where %J is the job's ID number.
o Use this when your MATLAB output (all the resulting unsuppressed output from your job) is too large to send over email.

Bsub commands for parallel & GPU

Page 98: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Add these additional options:
o -x
o Requests the use of an entire node
o -M
o Requests more than 4GB of memory for your job
o -n
o Requests the number of workers you'd like for your job

Bsub commands for parallel & GPU

Page 99: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• ALL parallel MATLAB jobs must run on 1 host!
• LSF option to use with parallel MATLAB jobs: -R "span[hosts=1]"
o -R "span[hosts=1]"
o Sends your job to one host.

Bsub commands for parallel & GPU

Page 100: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Intermediate & Introductory MATLAB course PPTs have step by step instructions to get started using the cluster & using a basic MATLAB command to run a simple job
– http://its2.unc.edu/divisions/rc/training/scientific/
• UNC cluster help files (LSF, the load sharing system that tells the cluster how to run MATLAB jobs; all the commands shown before were LSF commands)
– https://help.unc.edu/6273
– Must use onyen to log in!

More information on using the cluster

Page 101: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Run a parallel MATLAB job with 96GB of Memory and 12 workers on 1 host
– This can only run on KillDevil!!!
– bsub -n12 -M96 -R "span[hosts=1]" /nas02/apps/matlab-2011a/matlab -nodisplay -nosplash -singleCompThread -r <filename>
• Run a parallel MATLAB job on 2 hosts, with 8 workers
– Can't DO this!! All parallel MATLAB jobs must run on 1 host!!!

Bsub Parallel MATLAB Exercise

Page 102: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Run a parallel MATLAB job on 1 host, with 8 workers
– Use either Kure or KillDevil
• bsub -n8 -R "span[hosts=1]" /nas02/apps/matlab-2011a/matlab -nodisplay -nosplash -singleCompThread -r <filename>

Bsub Parallel MATLAB Exercise

Page 103: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Run a parallel MATLAB job with 6 workers, give the log file a specified name, don't send output to email and don't include .m on filename
– Use either Kure or KillDevil
• bsub -o out.%J -n6 -R "span[hosts=1]" /nas02/apps/matlab-2011a/matlab -nodisplay -nosplash -singleCompThread -r <filename> -logfile <logName>

Bsub Parallel MATLAB Exercise

Page 104: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Can only use KillDevil to run GPU jobs
• bsub script is straightforward
• Only request 1 CPU because you are only using 1 CPU and the multiple GPU processors
– Use -q gpu -a gpuexcl_t
– E.g.
• bsub -q gpu -a gpuexcl_t /nas02/apps/matlab-2011a/matlab -nodisplay -nosplash -r <filename>

Bsub GPU MATLAB commands

Page 105: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Will not use the following options
o -x
o -M
o -n
o singleCompThread

• Can use all other bsub commands introduced

• More information:– https://help.unc.edu/CCM3_034792

Bsub GPU MATLAB commands

Page 106: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

• Make sure your written MATLAB code has the following information:
– matlabpool close
– matlabpool(x)

Cluster Command Reminders!

Page 107: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Questions?

Page 108: Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

Questions and Comments?

• For assistance with MATLAB, please contact the Research Computing Group:
Email: [email protected]
Phone: 919-962-HELP
Submit help ticket at http://help.unc.edu