Intro to Matlab GPU Programming

Upload: modlyzko

Post on 02-Jun-2018


  • 8/11/2019 Intro to Matlab GPU Programming

    1/35

    Matlab Optimization, parallelism,

    and GPU computing.

    Kai Mollerud

    CEMS IT office

  • 2/35

    What I'll Cover

    The basics: what parallel computing is, and why GPUs are so good at it.

    When a GPU is better than a CPU.

    What you'll need and how to use the GPU; the development process.

    Learning to write fast, non-GPU programs.

    Turning non-GPU programs into GPU programs.

  • 3/35

    Parallelism

    This is the key idea behind all high-powered computing, especially GPU computing.

    Parallelism can be difficult to fully understand, because people don't often do things in parallel.

    Here is an image of some real-world parallel problem solving:

  • 4/35

    Analogue Parallelism

  • 5/35

    How is This Parallelism?

    The chalk holder is performing what's called a SIMD operation: single instruction, multiple data.

    Each piece of data (chalk) must be of the same type to fit in the array, but they can have different values (color, length).

    Likewise, a computer can perform the same operation on each element of an array simultaneously.
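    As a minimal sketch of this idea in Matlab (with hypothetical chalk lengths), one instruction applied to a whole array at once:

    ```matlab
    % SIMD in practice: one subtraction applied to every element at once.
    chalk = [3.0 5.5 2.0 4.75];   % hypothetical data: chalk lengths in cm
    worn  = chalk - 0.5;          % same operation, all elements simultaneously
    % worn is [2.5 5.0 1.5 4.25]
    ```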

  • 6/35

    So, why GPU computing?

  • 7/35

    GPU vs. CPU

    A modern CPU has between 2 and 16 processing cores.

    CPUs are designed to handle a wide array of tasks, often performing several heterogeneous operations at once.

    A modern GPU, on the other hand, can have up to 2048 stream processors.

    A GPU's usual job is to decide what color each of the pixels on your monitor is; a 1080p monitor has 2,073,600 pixels that can change color ~60 times a second.

  • 8/35

    Parallel Problems

    Not all problems are well suited to parallel computation.

    There are 3 levels of parallelism, determined by how much the operations involved depend on each other: fine-grained, coarse-grained, and embarrassingly parallel.

    Put simply, GPU computing is best suited to embarrassingly parallel problems, and sometimes usable for problems with coarse-grained parallelism.

    The technical reasoning here revolves around memory performance; ask me later if you would like a more detailed explanation.

  • 9/35

    When to use GPU computing

    Just because a problem is parallel doesn't mean GPU computing is the right choice.

    CPUs can do multiple heterogeneous operations at once, and run at much higher clock speeds than GPUs.

    Where GPUs really shine is on problems that are parallel and have very large amounts of data to process.

    Deciding whether or not a problem will really benefit from GPU computing isn't always obvious until you have actually written the program. Luckily, Matlab makes it easy to write a program for the CPU first, then adapt it to the GPU to see if it's worth it.

  • 10/35

    The Development Process

    Step 1) Write a program

    Step 2) Make the program fast

    Step 3) Adapt the program to use the GPU

  • 11/35

    Step 1) Write a program

    When you start writing a program,

    performance is not important.

    Try to focus on good organization of your program: make it easy to read and modify.

    Keeping things organized will make the next 2

    steps much easier.

    Personally, I start by writing comments to describe each block of code.

  • 12/35

    Example Code #1

    first_draft.m

    1. Populates an array with some floating point values
    2. Calculates the mean value of the array
    3. Performs an operation on each element
    4. Repeats steps 1-3 1000 times

    This obviously isn't a useful calculation, but it is computationally similar to some programs I have seen researchers using.
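    The actual first_draft.m isn't reproduced in this transcript; a hypothetical sketch matching the four steps described above might look like:

    ```matlab
    % Hypothetical sketch of the four steps (not the actual first_draft.m).
    N = 1000;                       % array size (assumed)
    for trial = 1:1000              % step 4: repeat steps 1-3 1000 times
        a = rand(1, N);             % step 1: populate with floating point values
        m = mean(a);                % step 2: mean value of the array
        for i = 1:N                 % step 3: an operation on each element
            if a(i) > m
                a(i) = a(i) - m;    % arbitrary per-element work
            end
        end
    end
    ```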

  • 13/35

    Step 2) Make it fast

    This is not a simple subject: computers are complex, and making a program run quickly means understanding how the computer runs the program.

    An inefficient program won't get better just because you run it on the GPU.

    Rather than tell you every trick I know for speeding up programs, I'll show you how to experiment and learn.

    I'll also show you a few tricks.

  • 14/35

    Optimization tools

    Code profiler
      Programs run a bit slower in the profiler.
      You can save the output of the profiler as an HTML file to look at later; this is useful when measuring performance changes.

    Control your runtime
      You will need to run your code again and again. Scale down the simulation detail, comment out plotting functions, etc.
      If it's part of a larger program, find a way to isolate it from the rest.

    tic + toc
      The code profiler does this for you, but sometimes you just want one number to look at, and these are easy to use.

    Use a fast computer
      If your group runs simulations, you should think about getting a dedicated computer to run them on.
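    tic and toc wrap a block of code and report the elapsed wall-clock time; a minimal usage sketch (the matrix multiply is just hypothetical work to measure):

    ```matlab
    tic;                          % start the stopwatch
    A = rand(2000);               % some work worth measuring
    B = A * A;
    elapsed = toc;                % seconds since the matching tic
    fprintf('run took %.3f s\n', elapsed);
    ```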

  • 15/35

    Optimization techniques

    Avoid nesting loops if at all possible.

    Use for loops instead of while loops.
      Not necessarily faster, but cleaner and easier to parallelize.

    Avoid conditionals.
      Use the find() function.
      If you use an if/else, put the most common case first.
      Consider using a switch statement.

    Avoid calling functions inside loops.

    Think about MEX functions for very big calculations.
      Lets you use C programs from Matlab; C is a lot faster than Matlab.

    Don't use the mean() function, it's slow. Use sum()/numel().
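    A sketch of two of these tricks together, on hypothetical data: find() in place of an element-wise if, and sum()/numel() in place of mean():

    ```matlab
    a = rand(1, 1e6);
    idx = find(a > 0.5);          % indices where the condition holds
    a(idx) = a(idx) - 0.5;        % one vector operation replaces a loop of ifs
    m = sum(a) / numel(a);        % same result as mean(a), computed faster
    ```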

  • 16/35

    Example code #2

    second_draft.m

    About 92% faster than #1.

    Uses find() to avoid conditionals.

    Eliminates the nested loops by using vector operations.

    Replaces the mean() function with sum()/numel().

  • 17/35

    Step 3) Using the GPU

    Matlab uses vectors for everything, and GPUs are built for vector operations. This makes the conversion really easy.

    To do GPU computing in Matlab you will need:

    The Parallel Computing Toolbox (the university has this licensed).

    An nVidia graphics card with compute capability version 1.3 or higher; entry cost of about $150 for a decent card.

  • 18/35

    GPU functions

    Performing a calculation on the GPU involves 2-3 steps:

    1. Put the data you need into GPU memory
    2. Call a GPU-enabled function on that data
    3. Move the results from GPU memory to CPU memory
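    The three steps, sketched with sin() as a stand-in for any GPU-enabled function (requires the Parallel Computing Toolbox and a supported card):

    ```matlab
    a = rand(1000);         % data in CPU memory
    g = gpuArray(a);        % step 1: put the data into GPU memory
    s = sin(g);             % step 2: a GPU-enabled function runs on the device
    r = gather(s);          % step 3: move the result back to CPU memory
    ```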

  • 19/35

    Putting data on the GPU

    Matlab's Parallel Computing Toolbox provides the gpuArray data type.

    Any gpuArray variable is stored in GPU memory.

    gpuArray supports most data types, and behaves more or less the same as a normal array.

    Any operation on a gpuArray variable will return a gpuArray variable.

  • 20/35

    Putting data on the GPU

    You can create gpuArrays in 2 ways:

    Copy a variable from CPU memory to GPU

    memory

    Create a variable directly on the GPU

  • 21/35

    Copying a variable to the GPU

  • 22/35

    Copying a variable to the GPU

    a and b are independent; subsequent operations on one do not affect the other.

    a must be nonsparse, and must be of type single, double, int/uint 8/16/32/64, or logical (i.e. no custom data types).

    b has a 108-byte placeholder in CPU memory, and uses 1600 bytes of GPU memory.

    Transferring takes time; don't do it inside a loop.
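    The code slide itself didn't survive the transcript, but an example consistent with the 1600 bytes quoted above (200 doubles × 8 bytes) would be:

    ```matlab
    a = rand(10, 20);       % 200 doubles = 1600 bytes, in CPU memory
    b = gpuArray(a);        % copy to GPU memory; b is a gpuArray
    a(1) = 0;               % a and b are independent: b is unchanged
    ```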

  • 23/35

    Creating data on a GPU directly

  • 24/35

    Creating data on a GPU directly

    You can use: ones, zeros, inf, nan, true, false, eye, colon, rand, randi, randn, linspace, logspace.

    This avoids the time cost of transferring from

    CPU memory to GPU memory.
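    A sketch of creating arrays directly in GPU memory:

    ```matlab
    g1 = gpuArray.rand(1000);           % 1000x1000 random matrix, born on the GPU
    g2 = zeros(500, 'gpuArray');        % zeros created directly on the device
    g3 = gpuArray.linspace(0, 1, 100);  % same pattern for linspace, ones, eye, ...
    ```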

  • 25/35

  • 26/35

    Example code #3

    third_draft.m

    Almost identical to #2

    Turns the array into a gpuArray so the operations

    are run on the GPU

    Actually a bit slower than #2

    That is, slower when using the same parameters.

    More on this shortly.

  • 27/35

    Bringing GPU data back

    The gather() function takes in a gpuArray and

    copies it to CPU memory.

    Again, this takes time; try to leave data on the GPU as long as you can and transfer all of it back at once.

    I can go into detail about GPU vs. CPU memory behavior later if there's time/interest; otherwise ask me / email me.
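    A sketch of that pattern, with a hypothetical update loop: intermediate results stay on the GPU and gather() is called once at the end.

    ```matlab
    g = gpuArray.rand(1, 1e6);
    for k = 1:10
        g = g .* 0.99 + 0.01;   % intermediate results stay in GPU memory
    end
    result = gather(g);         % one transfer back at the end, not ten
    ```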

  • 28/35

    Using the GPU in your code

    Knowing how to use the GPU is half the battle; the rest is knowing when.

    There's a simple way to learn this: take some code, change something to a gpuArray, and see how the runtime changes.
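    A hypothetical version of that experiment, using fft() as the operation under test; wait() makes sure the GPU has actually finished before toc reads the clock.

    ```matlab
    a = rand(4000);
    tic; b = fft(a); t_cpu = toc;               % time the CPU version

    g = gpuArray(a);
    tic; h = fft(g); wait(gpuDevice); t_gpu = toc;   % time the GPU version

    fprintf('CPU: %.3f s   GPU: %.3f s\n', t_cpu, t_gpu);
    ```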

  • 29/35

  • 30/35

    Quantitative example

    I wrote 3 programs to do the same task. The task exhibits

    coarse-grained parallelism, and has a deterministic run-time.

    Naive.m is a simple, non-parallel implementation. It isn't exceptionally bad, but no effort has been made to make it run efficiently.

    CPU.m is a CPU-only, parallel implementation that is essentially as fast as it can be.

    GPU.m is very similar to CPU.m, but uses GPU operations wherever possible.

    I recorded performance metrics from these 3 programs across

    a range of inputs, increasing the size of the input data each

    time.

  • 31/35

    Testing details

    The tests were run on a Dell OptiPlex 990:

    Intel i5-2400, 4 cores @ 3.1GHz (3.3GHz with Turbo Boost)

    4GB 1333MHz RAM

    nVidia GeForce GTX 650 Ti: 1GB GDDR5 memory @ 5400MHz, 768 stream processors @ 941MHz

    Windows 7 Enterprise, 64-bit

    The numbers I gathered are unique to this computer. Your results will vary, but should follow similar trends.

  • 32/35

    Runtime Vs. array size

  • 33/35

    Elements per second

  • 34/35

    Coding for the GPU

    Try not to move data between CPU and GPU very often.

    Replace conditional logic with set theory (loops and if statements vs. vector ops and find()).

    Try to isolate variables.
      Storing values in an array to look at later can replace random accesses to those values while calculating them.

    Be clever.
      You may need to change your entire approach to a problem to get the most out of GPU computing.

  • 35/35

    Questions?