
1

© 2013 The MathWorks, Inc.

Parallel Computing with MATLAB

Sung Ho-Hyun, Deputy General Manager
Senior Application Engineer, MathWorks Korea

2

Questions to Consider

Have you already optimized your serial code?

3

Serial MATLAB Code: Best Practices

Techniques for addressing performance (see the sketch below)
– Vectorization
– Preallocation

Consider readability and maintainability
– Looping vs. matrix operations
– Subscripted vs. linear vs. logical indexing
– etc.
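The two techniques above can be seen side by side in a minimal sketch; the array size and the function being evaluated are illustrative, not from the seminar:

% Preallocation: reserve the output before the loop instead of growing it
n = 1e6;
y = zeros(1, n);
for k = 1:n
    y(k) = sin(k) * exp(-k/n);
end

% Vectorization: express the same computation as whole-array operations
k = 1:n;
yVec = sin(k) .* exp(-k/n);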

4

Addressing Bottlenecks

Identify bottlenecks using the Profiler (see the sketch below)

Focus on top bottlenecks
– Total number of function calls
– Time per function call

Classes of bottlenecks
– File I/O
– Displaying output
– Computationally intensive code
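A minimal sketch of the Profiler workflow; the profiled computation here is just a stand-in for your own code:

profile on                 % start collecting timing data
A = rand(2000);
s = svd(A);                % stand-in for the code you want to analyze
profile viewer             % open the report; sort by self-time to find hot spots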

5

Questions to Consider

Have you already optimized your serial code? Do you need to reduce your run-time? Do you need to solve larger problems?

If so, do you have…
– A multi-core or multi-processor computer?
– A graphics processing unit (GPU)?
– Access to a computer cluster?

6

Agenda

– Introduction to parallel computing tools
– Using multicore/multi-processor computers
– Using graphics processing units (GPUs)
– Scaling up to a cluster

7

Utilizing Additional Processing Power

Built-in multithreading
– Automatically enabled in MATLAB since R2008a
– Multiple threads in a single MATLAB computation engine

Parallel computing using MATLAB workers
– Parallel Computing Toolbox, MATLAB Distributed Computing Server
– Multiple computation engines with inter-process communication

GPU use: directly from MATLAB
– Parallel Computing Toolbox
– Perform MATLAB computations on GPUs

www.mathworks.com/discovery/multicore-matlab.html

8

Going Beyond Serial MATLAB Applications

[Diagram: MATLAB with Toolboxes and Blocksets dispatching work to multiple MATLAB workers]

9

Parallel Computing Toolbox for the Desktop

Speed up parallel applications

Take advantage of GPUs

Prototype code for your cluster

[Diagram: desktop computer running Parallel Computing Toolbox]

10

Scale Up to Clusters, Grids, and Clouds

[Diagram: desktop computer with Parallel Computing Toolbox connecting through a scheduler to a computer cluster running MATLAB Distributed Computing Server]

11

Agenda

– Introduction to parallel computing tools
– Using multicore/multi-processor computers
– Using graphics processing units (GPUs)
– Scaling up to a cluster

12

Programming Parallel Applications (CPU)

[Diagram: spectrum from Ease of Use to Greater Control]

13

Programming Parallel Applications (CPU)

Built-in support with Toolboxes

[Diagram: spectrum from Ease of Use to Greater Control]

14

Example: Optimizing Cell Tower Position
Built-in parallel support

With Parallel Computing Toolbox, use built-in parallel algorithms in Optimization Toolbox

Run optimization in parallel

Use pool of MATLAB workers
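The slide's code is not reproduced in this transcript, so the following is only a hedged sketch of pointing an Optimization Toolbox solver at a pool of workers in this release era. The objective function, bounds, and pool size are placeholders, and the exact option syntax (optimset 'UseParallel' vs. the newer optimoptions) varies by release:

matlabpool open 4                                % R2013a-era syntax (parpool in later releases)

% Placeholder objective; the seminar's cell-tower coverage model is not shown here
objective = @(x) sum((x - 3).^2) + sin(5*x(1));

opts = optimset('fmincon');
opts = optimset(opts, 'UseParallel', 'always');  % gradient estimation runs on the workers

x0 = [0 0];
[xOpt, fval] = fmincon(objective, x0, [], [], [], [], ...
                       [-10 -10], [10 10], [], opts);

matlabpool close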

15

Tools Providing Parallel Computing Support

– Optimization Toolbox, Global Optimization Toolbox
– Statistics Toolbox
– Signal Processing Toolbox
– Neural Network Toolbox
– Image Processing Toolbox
– …

[Diagram: Toolboxes and Blocksets dispatching work to multiple MATLAB workers]

Directly leverage functions in Parallel Computing Toolbox
www.mathworks.com/products/parallel-computing/builtin-parallel-support.html

16

Programming Parallel Applications (CPU)

Built-in support with Toolboxes

Simple programming constructs: parfor, batch, distributed

[Diagram: spectrum from Ease of Use to Greater Control]

17

Independent Tasks or Iterations

– Ideal problem for parallel computing
– No dependencies or communications between tasks
– Examples: parameter sweeps, Monte Carlo simulations

[Diagram: independent tasks executing in parallel over time]

blogs.mathworks.com/loren/2009/10/02/using-parfor-loops-getting-up-and-running/

18

Example: Parameter Sweep of ODEs
Parallel for-loops

Parameter sweep of ODE system
– Damped spring oscillator
– Sweep through different values of damping and stiffness
– Record peak value for each simulation

Convert for to parfor

Use pool of MATLAB workers

5x'' + bx' + kx = 0,   b = 1, 2, …,   k = 1, 2, …
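A hedged sketch of such a sweep is below; the parameter ranges, time span, and initial conditions are illustrative, not the seminar's exact values:

m = 5;                             % fixed mass
bVals = 1:0.5:5;                   % damping values to sweep (illustrative range)
kVals = 1:0.5:5;                   % stiffness values to sweep (illustrative range)
[B, K] = meshgrid(bVals, kVals);
peakVals = zeros(size(B));

matlabpool open                    % start a pool of workers (parpool in later releases)
parfor idx = 1:numel(B)
    b = B(idx);  k = K(idx);
    odefun = @(t, y) [y(2); -(b*y(2) + k*y(1))/m];   % m*x'' + b*x' + k*x = 0
    [~, y] = ode45(odefun, [0 25], [0 1]);
    peakVals(idx) = max(y(:,1));   % record peak displacement for this (b, k) pair
end
matlabpool close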

19

Programming Parallel Applications (CPU)

Built-in support with Toolboxes

Simple programming constructs: parfor, batch, distributed

Advanced programming constructs: createJob, labSend, spmd

[Diagram: spectrum from Ease of Use to Greater Control]

20

Example: MPI-based Functions
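The slide's screenshot is not reproduced in this transcript; the following is only a hedged sketch of the kind of MPI-style message passing these functions enable (run inside an open worker pool):

spmd
    % Each worker computes its own piece of data
    localValue = labindex^2;

    if labindex > 1
        labSend(localValue, 1);                  % send this worker's value to lab 1
    else
        total = localValue;
        for src = 2:numlabs
            total = total + labReceive(src);     % lab 1 collects from every other lab
        end
        fprintf('Sum across %d labs: %d\n', numlabs, total);
    end
end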

21

Parallel Profiler

Profiles the execution time for a function
– Similar to the MATLAB Profiler
– Includes information about the communication between labs: time spent in communication, and amount of data passed between labs

Benefits
– Identify the bottlenecks in your parallel algorithm
– Understand which operations require communication
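A minimal hedged sketch of turning the parallel profiler on around some communicating code; the codistributed sum is just an example workload, and where exactly mpiprofile viewer is invoked (inside the spmd block or via pmode) varies by release:

matlabpool open 4
spmd
    mpiprofile on                       % start the parallel profiler on each lab
    A = codistributed(rand(2000));      % example workload with inter-lab communication
    colSums = sum(A, 1);
    mpiprofile viewer                   % per-lab report, including communication time
end
matlabpool close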

22

Parallel Computing Toolbox: Developments

R2012a
– New programming interface
– Distributed arrays: more MATLAB functions enabled

R2012b
– Diary output available while a task is running
– Distributed arrays: more MATLAB functions enabled

R2013a
– Automatic detection and transfer of files in batch and interactive workflows
– Distributed arrays: more MATLAB functions enabled

www.mathworks.com/help/distcomp/using-matlab-functions-on-codistributed-arrays.html

23

Agenda

– Introduction to parallel computing tools
– Using multicore/multi-processor computers
– Using graphics processing units (GPUs)
– Scaling up to a cluster

24

What is a Graphics Processing Unit (GPU)

Originally for graphics acceleration, now also used for scientific calculations

Massively parallel array of integer and floating-point processors
– Typically hundreds of processors per card
– GPU cores complement CPU cores

Dedicated high-speed memory

* Parallel Computing Toolbox requires NVIDIA GPUs with Compute Capability 1.3 or greater, including NVIDIA Tesla 10-series and 20-series products. See www.nvidia.com/object/cuda_gpus.html for a complete listing

25

Performance Gain with More Hardware

Using More Cores (CPUs) vs. Using GPUs

[Diagram: a multicore CPU (cores sharing a cache) alongside a GPU with its own device memory]

26

Programming Parallel Applications (GPU)

Built-in support with Toolboxes

[Diagram: spectrum from Ease of Use to Greater Control]

27

Programming Parallel Applications (GPU)

Built-in support with Toolboxes

Simple programming constructs: gpuArray, gather

[Diagram: spectrum from Ease of Use to Greater Control]

28

Example: Solving 2D Wave Equation
GPU Computing

Solve 2nd-order wave equation using spectral methods:
∂²u/∂t² = ∂²u/∂x² + ∂²u/∂y²

Run both on CPU and GPU

Using gpuArray and overloaded functions

www.mathworks.com/help/distcomp/using-gpuarray.html#bsloua3-1
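The spectral solver itself is not reproduced here; the following is just a hedged sketch of the gpuArray/gather round trip, with an overloaded function standing in for the real computation:

A = rand(2048, 'single');        % data created on the CPU
G = gpuArray(A);                 % transfer it to GPU memory

F = fft2(G);                     % overloaded function: executes on the GPU
magF = abs(F);                   % result stays on the GPU

mag = gather(magF);              % bring the result back to host memory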

29

Benchmark: Solving 2D Wave Equation
CPU vs. GPU

Intel Xeon Processor X5650, NVIDIA Tesla C2050 GPU

Grid Size        CPU (s)    GPU (s)   Speedup
64 x 64           0.1004     0.3553    0.28
128 x 128         0.1931     0.3368    0.57
256 x 256         0.5888     0.4217    1.4
512 x 512         2.8163     0.8243    3.4
1024 x 1024      13.4797     2.4979    5.4
2048 x 2048      74.9904     9.9567    7.5

30

Programming Parallel Applications (GPU)

Built-in support with Toolboxes

Simple programming constructs: gpuArray, gather

Advanced programming constructs: arrayfun, bsxfun, spmd

Interface for experts: CUDAKernel

[Diagram: spectrum from Ease of Use to Greater Control]
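As a hedged sketch of the arrayfun construct on GPU data (the element-wise function is illustrative):

x = gpuArray.rand(1e6, 1);                % data created directly on the GPU
y = gpuArray.rand(1e6, 1);

% arrayfun applies the element-wise function on the GPU
r = arrayfun(@(a, b) sqrt(a.^2 + b.^2), x, y);

result = gather(r);                       % copy back only when needed on the CPU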

31

GPU Performance – not all cards are equal

– Tesla-based cards will provide the best performance
– Realistically, expect 4x to 10x speedup (Tesla) vs. CPU
– See GPUBench on MATLAB Central for examples

Laptop GPU: GeForce
Desktop GPU: GeForce / Quadro
High Performance Computing GPU: Tesla / Quadro

www.mathworks.com/matlabcentral/fileexchange/34080-gpubench

32

Criteria for Good Problems to Run on a GPU

Massively parallel:
– Calculations can be broken into hundreds or thousands of independent units of work
– Problem size takes advantage of many GPU cores

Computationally intensive:
– Computation time significantly exceeds CPU/GPU data transfer time

Algorithm consists of supported functions:
– Growing list of toolboxes with built-in support
  www.mathworks.com/products/parallel-computing/builtin-parallel-support.html
– Subset of core MATLAB for gpuArray, arrayfun, bsxfun
  www.mathworks.com/help/distcomp/using-gpuarray.html#bsloua3-1
  www.mathworks.com/help/distcomp/execute-matlab-code-elementwise-on-a-gpu.html#bsnx7h8-1

33

GPU Support: Developments

R2012a
– New supported functions and function enhancements
– Asynchronous GPU calculations
– Ability to reset and deselect the GPU device

R2012b
– New support for toolboxes: Neural Network, Signal Processing
– New supported functions and function enhancements
– Auto-detection and selection for multiple-GPU systems

R2013a
– New support for toolboxes: Image Processing, Phased Array System
– New supported functions and function enhancements
– Ability to use GPU arrays from MEX functions

34

Agenda

– Introduction to parallel computing tools
– Using multicore/multi-processor computers
– Using graphics processing units (GPUs)
– Scaling up to a cluster

35

Use MATLAB Distributed Computing Server

1. Prototype code

[Diagram: your MATLAB code runs on the desktop computer with Parallel Computing Toolbox, using the local profile]

36

Use MATLAB Distributed Computing Server

1. Prototype code
2. Get access to an enabled cluster

[Diagram: a computer cluster running MATLAB Distributed Computing Server behind a scheduler, described by a cluster profile]

37

Use MATLAB Distributed Computing Server

1. Prototype code
2. Get access to an enabled cluster
3. Switch cluster profile

[Diagram: your MATLAB code moves from the desktop computer (Parallel Computing Toolbox, local profile) to the computer cluster (MATLAB Distributed Computing Server, cluster profile) via the scheduler]

38

Example: Migrate from Desktop to Cluster
Scale to a computer cluster

Desktop interface
– Set defaults
– Discover clusters
– Manage profiles
– Monitor jobs

Command-line API
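A hedged sketch of the command-line side of this workflow; 'MyCluster' is a placeholder profile name:

profiles = parallel.clusterProfiles();          % list the profiles known on this machine

parallel.defaultClusterProfile('MyCluster');    % make the cluster profile the default
c = parcluster('MyCluster');                    % cluster object for that profile

j = batch(c, @rand, 1, {1000});                 % submit work against the cluster profile
wait(j);
out = fetchOutputs(j);
delete(j);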

39

Take Advantage of Cluster Hardware

Offload computation:
– Free up desktop
– Access better computers

Scale speed-up:
– Use more cores
– Go from hours to minutes

Scale memory:
– Utilize distributed arrays
– Solve larger problems without re-coding

[Diagram: desktop computer with Parallel Computing Toolbox connected to a computer cluster running MATLAB Distributed Computing Server]

40

Offload Computations with batch

[Diagram: work submitted with batch from MATLAB (with Toolboxes and Blocksets) to a pool of workers; the result is returned to the client]

41

Example: Parameter Sweep of ODEs
Scale Scheduled Processing

Offload processing to workers (sketch below)
– batch
– matlabpool

Monitor progress of scheduled job
– Job Monitor

Retrieve results from job
– fetchOutputs

5x'' + bx' + kx = 0,   b = 1, 2, …,   k = 1, 2, …
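A hedged sketch of that batch workflow; the script name 'paramSweep' and the pool size are placeholders, and the 'matlabpool' batch argument reflects R2013a-era syntax:

% Submit the sweep script as a batch job that itself opens a pool of workers
j = batch('paramSweep', 'matlabpool', 15);   % 'paramSweep' is a placeholder script name

% Watch the job in the Job Monitor, or block until it finishes
wait(j, 'finished');

% Retrieve results and clean up
load(j);            % pulls the script's workspace variables into the client
% fetchOutputs(j)   % would be used instead for a function-based batch job
delete(j);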

42

Benchmark: Parameter Sweep of ODEs
Scaling case study with a cluster

Processor: Intel Xeon E5-2670, 16 cores per node

Workers   Job (minutes)   Main (minutes)   Job - Main
1         –               149              –
16        10.3            10.2             0.1
32        5.1             5.0              0.1
64        2.8             2.4              0.4
96        2.0             1.7              0.3
128       1.7             1.3              0.6
160       1.4             1.0              0.4
192       1.4             0.9              0.5
224       1.3             0.8              0.5
256       1.4             0.7              0.7

[Plot: relative speed-up vs. number of workers (1 to 256) for the simulation of 10^6 combinations; curves for Job, Main (core computation), and a linear reference]

43

Distributing Large Data

Distributed array lives on the workers

Remotely manipulate the array from client MATLAB

[Diagram: a distributed array whose columns are partitioned across the workers, manipulated from the client MATLAB with Toolboxes and Blocksets]

44

Distributed Arrays and SPMD

Distributed arrays
– Hold data remotely on workers running on a cluster
– Manipulate directly from client MATLAB (desktop)
– Use MATLAB functions directly on distributed arrays

www.mathworks.com/help/distcomp/using-matlab-functions-on-codistributed-arrays.html

spmd
– Execute blocks of code on workers
– Explicitly communicate between workers with message passing
– Mix parallel and serial code in the same program
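A hedged sketch of both constructs; the matrix sizes and pool size are illustrative:

matlabpool open 4                          % parpool in later releases

% Distributed array: the data lives on the workers, but is used from the client
D = distributed.rand(8000);                % 8000x8000 matrix spread across the pool
b = distributed.ones(8000, 1);
x = D \ b;                                 % overloaded backslash runs on the workers
err = gather(max(abs(D*x - b)));           % only a small scalar comes back to the client

% spmd: run a block on every worker, with explicit communication when needed
spmd
    localPiece = labindex * ones(1, 3);    % each worker builds its own piece
    allPieces  = gcat(localPiece, 2);      % concatenate the pieces across the workers
end

matlabpool close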

45

MATLAB Distributed Computing Server

Extension of Parallel Computing Toolbox

Complete pre-built solution
– Framework and infrastructure
– Communication between computers

Cost-effective
– License for the number of cores you will use
– Simplified maintenance

[Diagram: computer cluster running MATLAB Distributed Computing Server behind a scheduler]

46

Dynamic Licensing Model

Users have access to their licensed products

Server does not check out any licenses on the client

User can exit MATLAB once the job is queued

[Diagram: computer cluster running MATLAB Distributed Computing Server behind a scheduler]

47

Job Schedulers

• Direct support for existing scheduler: MDCS is simply another application

• Open API to support other schedulers (generic integration scripts)

• MathWorks Job Scheduler: turn-key solution for MATLAB-only clusters

www.mathworks.com/products/distriben/supported

[Diagram: scheduler options on a spectrum from Ease of Use to Greater Control]

48

Best Practices for Scalability and Portability

Prototype for portability on the desktop

Use functions (avoid scripts)

Use MATLAB workflows to access more cores (a sketch follows below)
– batch
– createJob, createTask

Avoid large data transfers through the desktop interface
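A hedged sketch of the createJob/createTask workflow against the default profile; the task function and counts are illustrative:

c = parcluster();                     % cluster object for the current default profile

j = createJob(c);
for k = 1:8
    createTask(j, @rand, 1, {1000});  % one independent task per iteration
end

submit(j);
wait(j);
out = fetchOutputs(j);                % 8x1 cell array, one result per task
delete(j);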

49

MATLAB Distributed Computing Server: Developments

R2012a
– New Cluster Profile Manager
– See Parallel Computing Toolbox developments

R2012b
– Detection of available enabled clusters through the Profile Manager
– See Parallel Computing Toolbox developments

R2013a
– See Parallel Computing Toolbox developments

50

Summary: Parallel Computing Toolbox and MATLAB Distributed Computing Server

Speed up your MATLAB execution with more hardware

Easily learn parallel MATLAB without being a parallel programming expert

51

For more information

Visit

www.mathworks.com/parallel-computing

52

Example: Filtering an Image
Built-in parallel support

With Parallel Computing Toolbox, use built-in parallel algorithms in Image Processing Toolbox

Run median filtering in parallel

Use pool of MATLAB workers

From http://hirise.lpl.arizona.edu/ – NASA/JPL/University of Arizona

Noisy Image

Filtered Image
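The toolbox's built-in parallel path is not shown in this transcript, so the following is a plainly different, hedged alternative: median-filter horizontal strips of the image in a parfor loop. The image name and strip count are illustrative, and the strips would need a 1-pixel overlap to avoid seam artifacts:

% NOTE: a simple parfor-over-strips alternative, not the seminar's built-in parallel path
I = imnoise(imread('coins.png'), 'salt & pepper', 0.05);   % sample noisy image

nStrips = 8;
edges = round(linspace(0, size(I, 1), nStrips + 1));
strips = cell(1, nStrips);

parfor s = 1:nStrips
    rows = edges(s)+1 : edges(s+1);
    strips{s} = medfilt2(I(rows, :), [3 3]);   % 2-D median filter on each strip
end

J = vertcat(strips{:});                        % reassemble the filtered image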

53

2D Median Filter

54

Example: Parameter Sweep of ODEs
Parallel for-loops with Simulink

Parameter sweep of ODE system
– Damped spring oscillator in Simulink
– Sweep through different values of damping and stiffness
– Record peak value for each simulation

Convert for to parfor

Use pool of MATLAB workers

5x'' + bx' + kx = 0,   b = 1, 2, …,   k = 1, 2, …

55

Scaling Up to Run on Multiple GPUs

[Diagram: MATLAB with Toolboxes and Blocksets dispatching work to multiple workers, each driving its own GPU]

56

Running on Multiple GPUs

Single GPU:

N = 1000;                         % Number of iterations
A = gpuArray(A);                  % Transfer data to the GPU
for ix = 1:N
    % Do the GPU-based calculation
    X = myGPUFunction(ix, A);
    % Gather data
    Xtotal(ix,:) = gather(X);
end

Multiple GPUs:

N = 1000;                         % Number of iterations
spmd
    % Assign each worker a different GPU
    gpuDevice(labindex);
    A = gpuArray(A);              % Transfer data
end
parfor ix = 1:N
    % Do the GPU-based calculation
    X = myGPUFunction(ix, A);
    % Gather data
    Xtotal(ix,:) = gather(X);
end