Parallel Computing with GPU


Upload: rohit-khatana

Post on 17-Nov-2014


Page 1: Parallel computing with Gpu

Parallel Computing With GPU

Rohit Khatana 4344

Seminar guide: Prof. Aparna Joshi

ARMY INSTITUTE OF TECHNOLOGY

Page 2: Parallel computing with Gpu

Content

1. What is parallel computing?

2. GPU

3. CUDA

4. Application

Page 3: Parallel computing with Gpu

What is Parallel Computing?

Performing or executing a task or program on more than one machine or processor at the same time.

Put simply: dividing a job among the members of a group.
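As a rough illustration of "dividing a job among a group", the sketch below splits n work items into contiguous index ranges, one per worker. The names `Range` and `splitWork` are hypothetical and not from the slides; each range could then be handed to a separate processor or thread.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open index range [first, second) assigned to one worker.
using Range = std::pair<std::size_t, std::size_t>;

// Split n items among `workers` workers as evenly as possible.
// The first (n % workers) workers receive one extra item.
std::vector<Range> splitWork(std::size_t n, std::size_t workers) {
    assert(workers > 0);
    std::vector<Range> ranges;
    ranges.reserve(workers);
    const std::size_t base = n / workers;
    const std::size_t extra = n % workers;
    std::size_t begin = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t len = base + (w < extra ? 1 : 0);
        ranges.push_back({begin, begin + len});
        begin += len;
    }
    return ranges;
}
```

For 10 items and 4 workers this yields the ranges [0, 3), [3, 6), [6, 8) and [8, 10).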

Page 4: Parallel computing with Gpu

For example

Page 12: Parallel computing with Gpu

What kind of processors will we build?

(Major design constraint: power)

CPU: complex control hardware

• Flexibility + performance

• Expensive in terms of power

GPU: simpler control hardware

• More hardware devoted to computation

• Potentially more power-efficient (ops/watt)

• More restrictive programming model

Page 13: Parallel computing with Gpu

A modern GPU has many more ALUs than a CPU

Page 14: Parallel computing with Gpu

Graphics Logical Pipeline

• The GPU receives geometry information from the CPU as an input and provides a picture as an output

• Let’s see how that happens

Page 15: Parallel computing with Gpu

Host Interface

• The host interface is the communication bridge between the CPU and the GPU

• It receives commands from the CPU and also pulls geometry information from system memory

• It outputs a stream of vertices in object space with all their associated information (normals, texture coordinates, per-vertex color, etc.)

Page 16: Parallel computing with Gpu

Vertex Processing

• The vertex processing stage receives vertices from the host interface in object space and outputs them in screen space

• This may be a simple linear transformation, or a complex operation involving morphing effects

• No new vertices are created in this stage, and no vertices are discarded (input/output has 1:1 mapping)
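The simple linear-transformation case can be sketched in plain C++ as follows. This is a CPU-side illustration, not GPU code; the names `Vec4`, `Mat4`, `identity`, `mul`, `ScreenVertex` and `toScreen` are hypothetical, and the y-down viewport convention is an assumption. Note the 1:1 mapping: one vertex in, one vertex out.

```cpp
#include <array>
#include <cassert>

struct Vec4 { float x, y, z, w; };
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major 4x4 matrix

// Identity transform, handy as a starting point.
Mat4 identity() {
    Mat4 m{};  // value-initialised to all zeros
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

// Matrix * column vector.
Vec4 mul(const Mat4& m, const Vec4& v) {
    return {
        m[0][0] * v.x + m[0][1] * v.y + m[0][2] * v.z + m[0][3] * v.w,
        m[1][0] * v.x + m[1][1] * v.y + m[1][2] * v.z + m[1][3] * v.w,
        m[2][0] * v.x + m[2][1] * v.y + m[2][2] * v.z + m[2][3] * v.w,
        m[3][0] * v.x + m[3][1] * v.y + m[3][2] * v.z + m[3][3] * v.w};
}

struct ScreenVertex { float x, y, depth; };

// Object space -> clip space -> normalised device coords -> screen space.
// Assumption: screen origin is the top-left corner, so y is flipped.
ScreenVertex toScreen(const Mat4& mvp, const Vec4& objectPos,
                      int width, int height) {
    const Vec4 clip = mul(mvp, objectPos);
    assert(clip.w != 0.0f);
    const float ndcX = clip.x / clip.w;  // perspective divide
    const float ndcY = clip.y / clip.w;
    const float ndcZ = clip.z / clip.w;
    return {(ndcX * 0.5f + 0.5f) * static_cast<float>(width),
            (1.0f - (ndcY * 0.5f + 0.5f)) * static_cast<float>(height),
            ndcZ};
}
```

With the identity matrix and a 640x480 viewport, the object-space origin lands at the screen centre (320, 240).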

Page 17: Parallel computing with Gpu

Triangle Setup

• In this stage geometry information becomes raster information (screen space geometry is the input, pixels are the output)

• Prior to rasterization, triangles that are backfacing or located outside the viewing frustum are rejected
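A minimal sketch of the two rejection tests, working on screen-space positions only. The names are hypothetical, and two simplifications are assumed: front faces are wound counter-clockwise in a y-up coordinate system, and the frustum test is reduced to a conservative 2D viewport check (real hardware clips in 3D clip space).

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// Twice the signed area of triangle (a, b, c).
// Positive for counter-clockwise winding when y points up.
float signedArea2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Assumption: counter-clockwise triangles face the viewer.
// Degenerate (zero-area) triangles are rejected as well.
bool isBackFacing(Vec2 a, Vec2 b, Vec2 c) {
    return signedArea2(a, b, c) <= 0.0f;
}

// Conservative reject: true only when all three vertices lie beyond
// the same viewport edge, so the triangle cannot touch the screen.
bool isOutsideViewport(Vec2 a, Vec2 b, Vec2 c, float width, float height) {
    assert(width > 0.0f && height > 0.0f);
    return (a.x < 0.0f && b.x < 0.0f && c.x < 0.0f) ||
           (a.x > width && b.x > width && c.x > width) ||
           (a.y < 0.0f && b.y < 0.0f && c.y < 0.0f) ||
           (a.y > height && b.y > height && c.y > height);
}
```

A triangle that passes both tests moves on to rasterization; swapping any two of its vertices reverses the winding and makes `isBackFacing` return true.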

Page 18: Parallel computing with Gpu

Triangle Setup

• A fragment is generated if and only if its center is inside the triangle

• Every fragment generated has its attributes computed as the perspective-correct interpolation of the attributes of the three vertices that make up the triangle
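Both bullets can be sketched with edge functions and barycentric weights. This is an illustrative CPU version under stated assumptions: the names `Vertex`, `edge` and `coverAndInterpolate` are hypothetical, a single scalar attribute stands in for normals, texture coordinates and so on, and samples lying exactly on an edge are accepted (real rasterizers apply a tie-breaking rule for shared edges).

```cpp
#include <cassert>

// Screen-space position, 1/w from the perspective divide, one attribute.
struct Vertex { float x, y, invW, attr; };

// Edge function: twice the signed area of (a, b, p).
float edge(float ax, float ay, float bx, float by, float px, float py) {
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

// Tests the centre of pixel (px, py) against the triangle. If the centre
// is inside, writes the perspective-correct attribute and returns true.
bool coverAndInterpolate(const Vertex& v0, const Vertex& v1, const Vertex& v2,
                         int px, int py, float* out) {
    const float cx = static_cast<float>(px) + 0.5f;  // pixel centre
    const float cy = static_cast<float>(py) + 0.5f;
    const float area = edge(v0.x, v0.y, v1.x, v1.y, v2.x, v2.y);
    assert(area != 0.0f);  // degenerate triangles are rejected earlier

    // Barycentric weights; dividing by area handles either winding.
    const float b0 = edge(v1.x, v1.y, v2.x, v2.y, cx, cy) / area;
    const float b1 = edge(v2.x, v2.y, v0.x, v0.y, cx, cy) / area;
    const float b2 = edge(v0.x, v0.y, v1.x, v1.y, cx, cy) / area;
    if (b0 < 0.0f || b1 < 0.0f || b2 < 0.0f) return false;  // centre outside

    // attr/w and 1/w vary linearly in screen space, so interpolate those
    // and divide at the end to recover the attribute itself.
    const float num = b0 * v0.attr * v0.invW + b1 * v1.attr * v1.invW +
                      b2 * v2.attr * v2.invW;
    const float den = b0 * v0.invW + b1 * v1.invW + b2 * v2.invW;
    *out = num / den;
    return true;
}
```

Calling this once per pixel inside the triangle's bounding box produces the fragment stream that the next stage consumes.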

Page 19: Parallel computing with Gpu

Fragment Processing

• Each fragment provided by triangle setup is fed into fragment processing as a set of attributes (position, normal, texcoord, etc.), which are used to compute the final color for this pixel

• The computations taking place here include texture mapping and math operations
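As one possible example of "texture mapping and math operations", the sketch below looks up a texel and scales it by a simple diffuse lighting term. Everything here is an assumption made for illustration: the type and function names, the nearest-neighbour lookup, and the Lambert-style shading are not taken from the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Color { float r, g, b; };
struct Vec3 { float x, y, z; };

struct Texture {
    int width, height;
    std::vector<Color> texels;  // row-major, width * height entries
};

// Nearest-neighbour texture lookup with u, v clamped to [0, 1].
Color sample(const Texture& t, float u, float v) {
    assert(t.width > 0 && t.height > 0);
    assert(t.texels.size() ==
           static_cast<std::size_t>(t.width) * static_cast<std::size_t>(t.height));
    u = std::min(std::max(u, 0.0f), 1.0f);
    v = std::min(std::max(v, 0.0f), 1.0f);
    const int x = std::min(static_cast<int>(u * static_cast<float>(t.width)), t.width - 1);
    const int y = std::min(static_cast<int>(v * static_cast<float>(t.height)), t.height - 1);
    return t.texels[static_cast<std::size_t>(y) * t.width + x];
}

// Interpolated attributes arriving from triangle setup.
struct Fragment { Vec3 normal; float u, v; };

// Final colour = texel * max(0, N . L). Both vectors are assumed to be
// unit length, with lightDir pointing from the surface toward the light.
Color shade(const Fragment& f, const Texture& t, Vec3 lightDir) {
    const float nDotL = std::max(0.0f, f.normal.x * lightDir.x +
                                       f.normal.y * lightDir.y +
                                       f.normal.z * lightDir.z);
    const Color c = sample(t, f.u, f.v);
    return {c.r * nDotL, c.g * nDotL, c.b * nDotL};
}
```

On a GPU, a function like `shade` would be run once per fragment, with many fragments processed in parallel.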

Page 20: Parallel computing with Gpu

Memory Interface

• Fragments provided by the last step are written to the framebuffer.

• Before the final write occurs, some fragments are rejected by the z-buffer (depth), stencil and alpha tests
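The z-buffer rejection can be sketched as below. This is a simplified CPU model with hypothetical names; it assumes a "less-than" depth test where smaller z means closer to the viewer, and it leaves out the stencil and alpha tests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct Framebuffer {
    int width, height;
    std::vector<std::uint32_t> color;  // packed RGBA per pixel
    std::vector<float> depth;          // closest depth seen so far per pixel

    Framebuffer(int w, int h)
        : width(w), height(h),
          color(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 0u),
          depth(static_cast<std::size_t>(w) * static_cast<std::size_t>(h),
                std::numeric_limits<float>::infinity()) {
        assert(w > 0 && h > 0);
    }
};

// Writes the fragment only if it is closer than what is already stored.
// Returns true when the framebuffer was updated.
bool writeFragment(Framebuffer& fb, int x, int y, float z, std::uint32_t rgba) {
    assert(x >= 0 && x < fb.width && y >= 0 && y < fb.height);
    const std::size_t i = static_cast<std::size_t>(y) * fb.width + x;
    if (z >= fb.depth[i]) return false;  // hidden behind an earlier fragment
    fb.depth[i] = z;
    fb.color[i] = rgba;
    return true;
}
```

Because the depth buffer starts at infinity, the first fragment to reach a pixel always passes; later fragments overwrite it only if they are nearer.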

Page 21: Parallel computing with Gpu

Memory Model of GPU

Page 22: Parallel computing with Gpu

Basic Architecture of GPU

Page 23: Parallel computing with Gpu

CUDA (Compute Unified Device Architecture)

• CUDA is a parallel computing platform and programming model.

• Created by NVIDIA and implemented by the GPUs that they produce.

Page 24: Parallel computing with Gpu

CUDA

• CUDA gives developers access to the virtual instruction set and memory of the parallel computational elements in CUDA GPUs.

• CUDA supports standard programming languages, including C++, Python and Fortran.

Page 25: Parallel computing with Gpu

Programming Model

• Threads are organized into blocks.

• Blocks are organized into a grid.

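• Each block runs on a single multiprocessor; a multiprocessor can hold several blocks at once if its resources allow.

• A warp is the group of threads within a block that the multiprocessor executes together in lockstep.

• A warp contains 32 threads.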
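The index arithmetic implied by this hierarchy can be written out in plain C++ for a one-dimensional grid. The helper names are hypothetical; inside a real kernel the same values come from the built-in variables `blockIdx.x`, `blockDim.x` and `threadIdx.x`.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;  // threads per warp

// Position of a thread in a 1D grid: which block, and which thread in it.
struct ThreadCoord { int block, thread; };

// Global index, i.e. blockIdx.x * blockDim.x + threadIdx.x in a kernel.
int globalIndex(ThreadCoord c, int threadsPerBlock) {
    assert(c.thread >= 0 && c.thread < threadsPerBlock);
    return c.block * threadsPerBlock + c.thread;
}

// Which warp of its block a thread belongs to, and its lane in that warp.
int warpInBlock(int thread) { return thread / kWarpSize; }
int laneInWarp(int thread) { return thread % kWarpSize; }

// Blocks needed to cover n elements with one thread each (rounded up).
int blocksNeeded(int n, int threadsPerBlock) {
    assert(n >= 0 && threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Warps occupied by one block; a partly filled last warp still counts.
int warpsPerBlock(int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}
```

For example, thread 5 of block 2 with 256 threads per block has global index 517, and covering 1000 elements with 256-thread blocks takes 4 blocks.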

Page 26: Parallel computing with Gpu

Typical CUDA/GPU Program

1. CPU allocates storage on GPU (cudaMalloc).

2. CPU copies input data from CPU to GPU (cudaMemcpy).

3. CPU launches a kernel on the GPU to process the data (kernel<<<number of blocks, threads per block>>>(parameters)).

4. CPU copies results back to CPU from GPU (cudaMemcpy)
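The four steps can be mimicked on the CPU to make the control flow concrete. The sketch below is an emulation in plain C++, not CUDA: it makes no CUDA runtime calls, the function names are hypothetical, and the nested loops only stand in for threads that a GPU would run in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a kernel body: one call corresponds to one GPU thread.
void squareThread(int blockIdx, int blockDim, int threadIdx,
                  float* out, const float* in, int n) {
    const int idx = blockIdx * blockDim + threadIdx;
    if (idx < n) out[idx] = in[idx] * in[idx];  // guard the last, partial block
}

std::vector<float> squareOnEmulatedDevice(const std::vector<float>& hostIn,
                                          int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    const int n = static_cast<int>(hostIn.size());

    // 1. Allocate "device" storage (cudaMalloc in real code).
    std::vector<float> devIn(hostIn.size());
    std::vector<float> devOut(hostIn.size());

    // 2. Copy the input from host to device (cudaMemcpy, host to device).
    devIn = hostIn;

    // 3. Launch: every (block, thread) pair runs the kernel body once.
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    for (int b = 0; b < blocks; ++b)
        for (int t = 0; t < threadsPerBlock; ++t)
            squareThread(b, threadsPerBlock, t, devOut.data(), devIn.data(), n);

    // 4. Copy the result back to the host (cudaMemcpy, device to host).
    return devOut;
}
```

Calling `squareOnEmulatedDevice({1, 2, 3, 4, 5}, 2)` uses three blocks of two threads; the sixth thread falls outside the array and is skipped by the bounds check.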

Page 27: Parallel computing with Gpu

Example: squaring the elements of an array

__global__ void square(float *d_out, float *d_in) {
    // each thread squares one element, selected by its thread index
    int idx = threadIdx.x;
    float f = d_in[idx];
    d_out[idx] = f * f;
}

threadIdx.x gives the index of the current thread within its block.

GPU/CUDA programming

Page 28: Parallel computing with Gpu

Main program

int main(int argc, char **argv){

……………………

…………………….

float h_out[ARRAY_SIZE];

// declare GPU pointers
float *d_in;
float *d_out;

// allocate GPU memory
cudaMalloc((void **) &d_in, ARRAY_BYTES);
cudaMalloc((void **) &d_out, ARRAY_BYTES);

Page 29: Parallel computing with Gpu

Main program (cont.)

// transfer the array to the GPU

cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice);

// launch the kernel

square<<<1, ARRAY_SIZE>>>(d_out, d_in);

// copy back the result array to the CPU

cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost);

// print out the resulting array

for (int i = 0; i < ARRAY_SIZE; i++) {
    printf("%f\n", h_out[i]);
}

// free GPU memory
cudaFree(d_in);
cudaFree(d_out);

return 0;
}

Page 30: Parallel computing with Gpu

Programming Model

Page 31: Parallel computing with Gpu

GPU vs CPU Code

Page 32: Parallel computing with Gpu

Conclusion

• GPU computing is a good choice for fine-grained data-parallel programs with limited communication

• GPU computing is not so good for coarse-grained programs with a lot of communication

• The GPU has become a co-processor to the CPU.
