
Page 1: GPU - An Introduction

Graphics Processing Unit

DHAN V SAGAR
CB.EN.P2CSE13007

Page 2: GPU - An Introduction

Introduction

It is a processor optimized for 2D/3D graphics, video, visual computing, and display.

It is a highly parallel, highly multithreaded multiprocessor optimized for visual computing.

It provides real-time visual interaction with computed objects via graphics images and video.

Page 3: GPU - An Introduction

History

● Up to late 90's
– No GPUs
– Much simpler VGA controller

● Consisted of
– A memory controller
– Display generator + DRAM

● DRAM was either shared with CPU or private

Page 4: GPU - An Introduction

History

● By 1997
– More complex VGA controllers

● Incorporated 3D acceleration functions in hardware
– Triangle setup and rasterization
– Texture mapping and shading

Rasterization: combining shapes (lines, polygons, letters, …) into an image consisting of individual pixels

Page 5: GPU - An Introduction

History

● By 2000
– Single-chip graphics processors incorporated nearly all functions of the graphics pipeline of high-end workstations

● Beginning of the end of the high-end workstation market

– The VGA controller was renamed the Graphics Processing Unit (GPU)

Page 6: GPU - An Introduction

Current Trends

Well defined APIs

OpenGL: an open standard for 3D graphics programming

WebGL: an OpenGL extension for the web

DirectX: a set of Microsoft multimedia programming interfaces (Direct3D for 3D graphics)

Can implement novel graphics algorithms

Use GPUs for non-conventional applications

Page 7: GPU - An Introduction

Current Trends

Combining the powers of the CPU and GPU: heterogeneous architectures

GPUs become scalable parallel processors

Moving from hardware-defined pipelining architectures to more flexible programmable architectures

Page 8: GPU - An Introduction

Architecture Evolution

(Diagram: CPU, graphics card, display, and memory)

Floating-point co-processors attached to microprocessors

Interest in providing hardware support for displays led to graphics processing units (GPUs)

Page 9: GPU - An Introduction

GPUs with dedicated pipelines

(Pipeline diagram: Input stage → Vertex shader stage → Geometry shader stage → Rasterizer stage → Pixel shading stage → Frame buffer, with graphics memory alongside)

Graphics chips generally had a pipeline structure, with individual stages performing specialized operations, finally leading to loading the frame buffer for display.

Individual stages may have access to graphics memory for storing intermediate computed data.

Page 10: GPU - An Introduction

PROGRAMMING GPUS

• Will focus on parallel computing applications

• Must decompose the problem into a set of parallel computations

• Ideally a two-level decomposition, to match the GPU organization (a grid of thread blocks, each made of threads)

Page 11: GPU - An Introduction

Example

Example: the data start in one big array; the big array is decomposed into several small arrays, and each small array is further decomposed into tiny pieces: the two levels of parallel work (see the sketch below).
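To make the two-level decomposition concrete, here is a minimal CUDA sketch (not part of the original slides): each thread block works on one "small array" slice of the big array, and each thread handles one "tiny" element. The kernel name square_all and the block size of 256 are assumptions chosen for illustration.

// Illustrative two-level decomposition: grid of blocks, block of threads.
__global__ void square_all(int n, float *data)
{
    // Two-level index: which block (small array) + which thread (tiny piece)
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * data[i];   // each thread touches one element
}

// Launch: one block per "small array", 256 threads (tiny pieces) per block
// int threads = 256;
// int blocks  = (n + threads - 1) / threads;
// square_all<<<blocks, threads>>>(n, d_data);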

Page 12: GPU - An Introduction

GPGPU and CUDA

GPGPU

● General-Purpose computing on GPU

● Uses traditional graphics API and graphics pipeline

CUDA

● Compute Unified Device Architecture

● Parallel computing platform and programming model

● Invented by NVIDIA

● Single Program Multiple Data approach

Page 13: GPU - An Introduction

CUDA

➢ CUDA programs are written in C

➢ Within C programs, call SIMT “kernel” routines that are executed on the GPU

➢ Provides three abstractions (see the sketch below)

➢ Hierarchy of thread groups
➢ Shared memory
➢ Barrier synchronization
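A minimal sketch (added here, not from the original slides) showing the three abstractions together: the thread-group hierarchy via blockIdx/threadIdx, per-block shared memory via __shared__, and barrier synchronization via __syncthreads(). The kernel name block_sum and the fixed block size of 256 threads are assumptions for illustration.

// Per-block sum reduction using all three CUDA abstractions.
__global__ void block_sum(const float *in, float *block_results, int n)
{
    __shared__ float buf[256];                        // shared memory, visible to one thread block
    int i   = blockIdx.x * blockDim.x + threadIdx.x;  // hierarchy: grid -> block -> thread
    int tid = threadIdx.x;

    buf[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                                  // barrier: wait until the whole block has loaded

    // Tree reduction within the block
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            buf[tid] += buf[tid + stride];
        __syncthreads();                              // barrier after every reduction step
    }

    if (tid == 0)
        block_results[blockIdx.x] = buf[0];           // one partial sum per block
}

// Launch with 256 threads per block, e.g. block_sum<<<(n + 255) / 256, 256>>>(d_in, d_out, n);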

Page 14: GPU - An Introduction

Cont..

Page 15: GPU - An Introduction

CUDA

● Lowest level of parallelism – the CUDA thread

● The compiler and hardware can gang thousands of CUDA threads together, giving rise to various levels of parallelism within the GPU

● MIMD, SIMD, and instruction-level parallelism

Single Instruction, Multiple Thread (SIMT)

Page 16: GPU - An Introduction

Conventional C Code

// Invoke DAXPY
daxpy(n, 2.0, x, y);

// DAXPY in C
void daxpy(int n, double a, double *x, double *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

Page 17: GPU - An Introduction

Corresponding CUDA Code

// Invoke DAXPY with 256 threads per Thread Block
__host__
int nblocks = (n + 255) / 256;
daxpy<<<nblocks, 256>>>(n, 2.0, x, y);

// DAXPY in CUDA
__global__
void daxpy(int n, double a, double *x, double *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
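The slide shows only the kernel and its launch; the launch assumes x and y already live in GPU memory. Below is a hedged sketch of the host-side setup using standard CUDA runtime calls (cudaMalloc, cudaMemcpy, cudaFree); the device pointer names d_x and d_y are assumptions.

// Host-side setup sketch for the DAXPY launch above (illustrative).
double *d_x, *d_y;
cudaMalloc((void **)&d_x, n * sizeof(double));            // allocate GPU memory
cudaMalloc((void **)&d_y, n * sizeof(double));
cudaMemcpy(d_x, x, n * sizeof(double), cudaMemcpyHostToDevice);
cudaMemcpy(d_y, y, n * sizeof(double), cudaMemcpyHostToDevice);

int nblocks = (n + 255) / 256;
daxpy<<<nblocks, 256>>>(n, 2.0, d_x, d_y);                // launch the kernel on the GPU

cudaMemcpy(y, d_y, n * sizeof(double), cudaMemcpyDeviceToHost);  // copy the result back
cudaFree(d_x);
cudaFree(d_y);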

Page 18: GPU - An Introduction

Cont...

● __device__ or __global__ : functions that run on the GPU (see the sketch after this list)

● __host__ : functions that run on the system (host) processor

● CUDA variables declared with __device__ are allocated in GPU memory, which is accessible by all the multithreaded SIMD processors

● The call syntax for a function that runs on the GPU is

name<<<dimGrid, dimBlock>>>(..parameter list..)

● The GPU hardware handles the threads
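A small sketch (added for illustration, not from the original slide) of how the qualifiers fit together: a __device__ helper callable only from GPU code, a __global__ kernel launched from the host with the <<<dimGrid, dimBlock>>> syntax, and a __host__ function running on the system processor. The names scale_element, scale_all, and run_scale are hypothetical.

__device__ double scale_element(double a, double x)    // GPU-only helper, called from GPU code
{
    return a * x;
}

__global__ void scale_all(int n, double a, double *x)  // GPU kernel, launchable from the host
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = scale_element(a, x[i]);
}

__host__ void run_scale(int n, double a, double *d_x)  // runs on the system (host) processor
{
    int dimBlock = 256;
    int dimGrid  = (n + dimBlock - 1) / dimBlock;
    scale_all<<<dimGrid, dimBlock>>>(n, a, d_x);        // name<<<dimGrid, dimBlock>>>(...)
}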

Page 19: GPU - An Introduction

● Threads are blocked together and executed in groups of 32 threads – a Thread Block

● The hardware that executes a whole block of threads is called a Multithreaded SIMD Processor

Page 20: GPU - An Introduction

References

http://en.wikipedia.org/wiki/Graphics_processing_unit

http://www.nvidia.com/object/cuda_home_new.html

http://computershopper.com/feature/200704_the_right_gpu_for_you

http://www.cs.virginia.edu/~gfx/papers/pdfs/59_HowThingsWork.pdf

http://en.wikipedia.org/wiki/Larrabee_(GPU)#cite_note-siggraph-9

http://www.nvidia.com/geforce

“Larrabee: A Many-Core x86 Architecture for Visual Computing”, Kruger and Westermann, International Conf. on Computer Graphics and Interactive Techniques, 2005

“An Analytical Model for a GPU Architecture with Memory-level and Thread-level Parallelism Awareness”, Sunpyo Hong and Hyesoon Kim

Page 21: GPU - An Introduction

Thank You..