gpgpu - utrecht university 1.pdf · gpu programming requires a very different way of expressing...

GPGPU IGAD – 2014/2015 Lecture 1 Jacco Bikker


Page 1: GPGPU - Utrecht University 1.pdf · GPU programming requires a very different way of expressing algorithms. Introduction This course Teacher background Your role Learning objectives

GPGPU

IGAD – 2014/2015

Lecture 1

Jacco Bikker


Today:

Course introduction

GPGPU background

Getting started

Assignment


Introduction

GPU History


History

3DO-FZ1

console

1991


NVidia NV-1

(Diamond Edge 3D)

1995

History


3Dfx –

Diamond Monster 3D

1996

History


Quake vs GLQuake

1997

History


Fixed function pipeline

vs

Programmable pipeline

2007

History


History


Source: Naffziger, AMD

History


History


GPU - conveyor belt:

input = vertices + connectivity

step 1: transform

step 2: rasterize

step 3: shade

step 4: z-test

output = pixels

History


Introduction

void main(void)
{
    float t = iGlobalTime;
    vec2 uv = gl_FragCoord.xy / iResolution.y;
    float r = length(uv), a = atan(uv.y,uv.x);
    float i = floor(r*10);
    a *= floor(pow(128,i/10));
    a += 20.*sin(0.5*t)+123.34*i-100.*(r*i/10)*cos(0.5*t);
    r += (0.5+0.5*cos(a)) / 10;
    r = floor(N*r)/10;
    gl_FragColor = (1-r)*vec4(0.5,1,1.5,1);
}

https://www.shadertoy.com/view/4sjSRt


Introduction

Historically, the GPU is a co-processor.

GPUs perform well because they have a constrained execution model, which is based on parallelism.

GPU programming requires a very different way of expressing algorithms.


Introduction

This course

Teacher background

Your role

Learning objectives

ECTS / lectures / homework / assessment


This course

AGT6:

7 lectures

We start at 10.00am

Demo time

Break half-way


Lecturer

Me: dr. Jacco Bikker - CUDA – Ray tracing – Rendering


Your role

You:

Maybe a GPGPU / shader expert

Use AGT6 to get further

Or just pass with a 6


Objectives

Objectives:

Get feet wet

Generic GPGPU concepts

*not*:

Detailed API knowledge


Details

AGT6:

3 ECTS = ~80 hours

Weekly homework, unverified

Final assignment: free form


Background

GPU architecture


GPU architecture

CPU: Designed to run one thread as fast as possible.

Use large caches to minimize memory latency

Maximize cache usage using pipeline & branch prediction

Multi-core processing: task parallelism

Interesting tricks:

SIMD

“Hyperthreading”


GPU architecture

GPU: Designed to combat latency using many threads.

Hide latency by computation

Maximize parallelism

Streaming processing: data parallelism

Interesting tricks:

Use typical GPU hardware (filtering etc.)

Cache anyway

SIMT


GPU architecture

CPU

Multiple tasks = multiple threads

Tasks run different instructions

10s of complex threads execute on a few cores

Thread execution managed explicitly

GPU

SIMD: same instructions on multiple data

10,000s of light-weight threads on 100s of cores

Threads are managed and scheduled by hardware


GPU architecture


GPU architecture

SIMT Thread execution:

Group 32 threads (vertices, pixels, primitives) into warps

Each warp executes the same instruction

In case of latency, switch to a different warp (thus: switch out 32 threads for 32 different threads)

Flow control: …
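The grouping of threads into warps can be sketched on the CPU. This is only an illustration of the index arithmetic, not how the hardware is implemented; the function names are made up for this example, and the warp width of 32 is taken from the slide (other hardware may use a different width, which is why it is a parameter).

```c
#include <assert.h>

/* Hypothetical helpers; the names are not part of any GPU API. */

/* Index of the warp that a given thread belongs to. */
static unsigned warp_of(unsigned thread_id, unsigned warp_size)
{
    assert(warp_size > 0);
    return thread_id / warp_size;
}

/* Position of the thread inside its warp (0 .. warp_size - 1). */
static unsigned lane_of(unsigned thread_id, unsigned warp_size)
{
    assert(warp_size > 0);
    return thread_id % warp_size;
}

/* Number of warps needed to cover thread_count threads, rounded up:
   a partially filled last warp still occupies a full warp slot. */
static unsigned warps_needed(unsigned thread_count, unsigned warp_size)
{
    assert(warp_size > 0);
    return (thread_count + warp_size - 1) / warp_size;
}
```

With a width of 32, a 512x512 image (262144 pixels, one thread each) maps to warps_needed(262144, 32) = 8192 warps, while 65 threads still need 3 warps.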


GPU architecture

void main(void) // for each pixel
{
    float t = iGlobalTime;
    vec2 uv = gl_FragCoord.xy / iResolution.y;
    float r = length(uv), a = atan(uv.y,uv.x);
    float i = floor(r*10);
    a *= floor(pow(128,i/10));
    a += 20.*sin(0.5*t)+123.34*i-100.*(r*i/10)*cos(0.5*t);
    r += (0.5+0.5*cos(a)) / 10;
    r = floor(N*r)/10;
    gl_FragColor = (1-r)*vec4(0.5,1,1.5,1);
}


GPU architecture

Easy to port to GPU:

Image postprocessing

Particle effects

Ray tracing

Actually, a lot of algorithms are not easy to port at all.

Decades of legacy, or a fundamental problem?


Background

Why GPGPU

OpenCL vs Shaders vs CUDA


Why GPGPU

Some tasks are more efficient on the GPU

GPU has high theoretical peak performance

Prevent wasting processing power


OpenCL vs shaders

No mapping to graphics context needed

Avoid thinking about various transformations of coordinates (world / screen / texture)

Access to memory levels that are implicit in OpenGL

OpenCL also runs on CPUs


OpenCL vs CUDA

(but if you must: “A Comprehensive Performance Comparison of CUDA and OpenCL”, Fang et al., 2011, http://www.researchgate.net/publication/221084751_A_Comprehensive_Performance_Comparison_of_CUDA_and_OpenCL/links/0c96051c2bd67d9896000000 )


Getting Started

Tools of the trade

Template


Tools

Get your development tools here:

NVidia: https://developer.nvidia.com/opencl

AMD: http://developer.amd.com/tools-and-sdks/opencl-zone/

Intel: https://software.intel.com/en-us/intel-opencl


Template

Template available from N@TSchool!


Template

__kernel void main( write_only image2d_t outimg )
{
    int column = get_global_id( 0 );
    int line = get_global_id( 1 );
    // calculate checkerboard pattern
    int tileX = column / 40;
    int tileY = line / 40;
    float color = (float)((tileX + tileY) & 1); // 0 or 1
    float4 white = (float4)( 1, 1, 1, 1 );
    write_imagef( outimg, (int2)(column, line), color * white );
}
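For comparison, the same checkerboard can be written as a plain C loop. This is a sketch under assumptions: the function names and the single-channel float buffer are made up for the example and are not part of the template. The point is that the two nested loops play the role of the two global IDs, so the kernel body is what remains once the loops are handed to the OpenCL runtime.

```c
#include <assert.h>

/* CPU reference for the checkerboard kernel (hypothetical helper names).
   Returns 0.0f or 1.0f for one pixel; tile_size is 40 in the kernel. */
static float checker_color(int column, int line, int tile_size)
{
    assert(tile_size > 0);
    int tile_x = column / tile_size;
    int tile_y = line / tile_size;
    return (float)((tile_x + tile_y) & 1);
}

/* Fill a width * height single-channel buffer. Each loop iteration
   corresponds to one work item (column, line) in the OpenCL version. */
static void fill_checkerboard(float *out, int width, int height, int tile_size)
{
    for (int line = 0; line < height; line++)
        for (int column = 0; column < width; column++)
            out[line * width + column] = checker_color(column, line, tile_size);
}
```

Calling fill_checkerboard(buffer, 80, 80, 40) produces a 2x2 checkerboard: the top-left and bottom-right tiles are 0, the other two are 1.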


Template

#version 330
uniform sampler2D color;
in vec2 P;
in vec2 uv;
out vec3 pixel;
void main()
{
    // retrieve input pixel
    pixel = texture( color, uv ).rgb;
    // darken towards edges
    float dx = P.x - 0.5, dy = P.y - 0.5;
    float distance = sqrt( dx * dx + dy * dy );
    float scale = 1 - max( 0, distance * 2.2 - 0.8 );
    pixel *= scale;
}


Template

bool Game::Init()
{
    // load shader and texture
    clOutput = new Texture( SCRWIDTH, SCRHEIGHT, Texture::FLOAT );
    shader = new Shader( "shaders/checker.vert", "shaders/checker.frag" );
    // load OpenCL code
    kernel = new Kernel( "programs/program.cl", "main" );
    // link cl output texture as an OpenCL buffer
    outputBuffer = clCreateFromGLTexture2D( kernel->GetContext(),
        CL_MEM_WRITE_ONLY, GL_TEXTURE_2D, 0, clOutput->GetID(), 0 );
    kernel->SetArgument( 0, &outputBuffer );
    // done
    return true;
}


Template

void Game::Tick()
{
    // run cl code to fill texture
    kernel->Run( &outputBuffer );
    // run shader on cl-generated texture
    shader->Bind();
    shader->SetInputTexture( GL_TEXTURE0, "color", clOutput );
    shader->SetInputMatrix( "view", mat4( 1 ) );
    DrawQuad();
}


Getting Started

MyFirst OpenCL app

OpenCL terminology


Terminology

A few words you need to know the meaning of:

1. Device

2. Host

3. Context

4. Kernel

5. Program

6. Compute unit (CUDA: streaming multiprocessor)

7. Work item (CUDA: thread)

8. Command queue (synchronous, asynchronous)


MyFirst

To execute an OpenCL program:

1. Query the host system for OpenCL devices

2. Create a context to associate the OpenCL devices

3. Create programs that will run on one or more associated devices

4. From the programs, select kernels to execute

5. Create memory objects on the host or on the device

6. Copy memory data to the device as needed

7. Provide arguments for the kernels

8. Submit the kernels to the command queue for execution

9. Copy the results from the device to the host.

clGetPlatformIDs(…)

clGetDeviceIDs(…)

clCreateContext(…)

clCreateCommandQueue(…)

clCreateProgramWithSource(…)

clBuildProgram(…)

clCreateKernel(…)

clCreateBuffer(…)

clEnqueueWriteBuffer(…)

clSetKernelArg(…)

clEnqueueNDRangeKernel(…)

clFinish(…)

clEnqueueReadBuffer(…)


MyFirst

#include <stdio.h>
#include "CL/cl.h"

#define ITEMS 10

const char *KernelSource =
    "__kernel void hello(__global float *input, __global float *output)\n"
    "{\n"
    " size_t id = get_global_id(0);\n"
    " output[id] = input[id] * input[id];\n"
    "}";

int main()
{
    cl_int err;
    cl_uint num_of_platforms = 0;
    cl_platform_id platform_id;
    cl_device_id device_id;
    cl_uint num_of_devices = 0;
    size_t global = ITEMS;
    float inputData[ITEMS] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }, results[ITEMS] = { 0 };

    // find a platform and a GPU device, then create a context for it
    clGetPlatformIDs( 1, &platform_id, &num_of_platforms );
    clGetDeviceIDs( platform_id, CL_DEVICE_TYPE_GPU, 1, &device_id, &num_of_devices );
    cl_context_properties props[3] = { CL_CONTEXT_PLATFORM, (cl_context_properties)platform_id, 0 };
    cl_context context = clCreateContext( props, 1, &device_id, 0, 0, &err );
    cl_command_queue queue = clCreateCommandQueue( context, device_id, 0, &err );

    // build the program and select the kernel
    cl_program program = clCreateProgramWithSource( context, 1, (const char**)&KernelSource, 0, &err );
    clBuildProgram( program, 0, NULL, NULL, NULL, NULL );
    cl_kernel kernel = clCreateKernel( program, "hello", &err );

    // create device buffers, upload the input, set the kernel arguments
    cl_mem input = clCreateBuffer( context, CL_MEM_READ_ONLY, sizeof( float ) * ITEMS, 0, 0 );
    cl_mem output = clCreateBuffer( context, CL_MEM_WRITE_ONLY, sizeof( float ) * ITEMS, 0, 0 );
    clEnqueueWriteBuffer( queue, input, CL_TRUE, 0, sizeof( float ) * ITEMS, inputData, 0, 0, 0 );
    clSetKernelArg( kernel, 0, sizeof( cl_mem ), &input );
    clSetKernelArg( kernel, 1, sizeof( cl_mem ), &output );

    // run one work item per array element and wait for completion
    clEnqueueNDRangeKernel( queue, kernel, 1, 0, &global, 0, 0, 0, 0 );
    clFinish( queue );

    // copy the results back to the host and print them
    clEnqueueReadBuffer( queue, output, CL_TRUE, 0, sizeof( float ) * ITEMS, results, 0, 0, 0 );
    for( int i = 0; i < ITEMS; i++ ) printf( "%f ", results[i] );

    // clean up
    clReleaseMemObject( input ); clReleaseMemObject( output );
    clReleaseProgram( program ); clReleaseKernel( kernel );
    clReleaseCommandQueue( queue ); clReleaseContext( context );
    return 0;
}
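A simple way to check a first OpenCL program is to compute the same result on the CPU and compare. The sketch below is plain C without any OpenCL calls; square_cpu and results_match are hypothetical helper names, and the tolerance parameter is an assumption (GPU and CPU floating-point results are not guaranteed to be bit-identical for every kernel).

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for the "hello" kernel: output[id] = input[id] * input[id].
   The loop variable plays the role of get_global_id(0). */
static void square_cpu(const float *input, float *output, size_t count)
{
    assert(input != NULL && output != NULL);
    for (size_t id = 0; id < count; id++)
        output[id] = input[id] * input[id];
}

/* Returns 1 when every element of got is within tolerance of expected,
   0 otherwise. */
static int results_match(const float *got, const float *expected,
                         size_t count, float tolerance)
{
    for (size_t i = 0; i < count; i++)
    {
        float diff = got[i] - expected[i];
        if (diff < 0.0f) diff = -diff;
        if (diff > tolerance) return 0;
    }
    return 1;
}
```

In the program above, one would fill a second array with square_cpu( inputData, reference, ITEMS ) and call results_match( results, reference, ITEMS, 1e-6f ) after clEnqueueReadBuffer.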


MyFirst

bool Kernel::InitCL()
{
    cl_platform_id platform;
    cl_device_id* devices;
    cl_uint devCount;
    cl_int error;
    ...
}

Like I said, I don’t care much for API details…

Just start with the template, and modify / replace it when the need arises.


Assignment

Create an OpenCL program that calculates Voronoi noise for a 512x512 buffer and make it available to the CPU.

Measure the performance gain compared to CPU-only.

Reference: https://www.shadertoy.com/view/4djGRh
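For the CPU-only side of the comparison, a baseline could look like the sketch below. It is not the reference shader's algorithm: the hash function, the one-feature-point-per-cell layout and all names are assumptions chosen to keep the example short, and it returns the squared distance to the nearest feature point so that it needs nothing from the math library. Timing uses clock() from the C standard library, which measures CPU time and is coarse; treat the numbers as indicative only.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical integer hash: maps a cell coordinate to a value in [0, 1). */
static float hash01(int32_t x, int32_t y, uint32_t seed)
{
    uint32_t h = (uint32_t)x * 374761393u + (uint32_t)y * 668265263u
               + seed * 2246822519u;
    h = (h ^ (h >> 13)) * 1274126177u;
    h ^= h >> 16;
    return (float)(h >> 8) * (1.0f / 16777216.0f); /* top 24 bits */
}

/* floor() for small magnitudes without <math.h>. */
static int floor_to_int(float v)
{
    int i = (int)v;              /* truncates toward zero */
    return ((float)i > v) ? i - 1 : i;
}

/* Squared distance from (px, py) to the nearest feature point,
   with one pseudo-random feature point per unit cell. */
static float voronoi_sq(float px, float py)
{
    int cx = floor_to_int(px), cy = floor_to_int(py);
    float best = 1e30f;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++)
        {
            int nx = cx + dx, ny = cy + dy;
            float fx = (float)nx + hash01(nx, ny, 1u) - px;
            float fy = (float)ny + hash01(nx, ny, 2u) - py;
            float d2 = fx * fx + fy * fy;
            if (d2 < best) best = d2;
        }
    return best;
}

/* Fill a size * size buffer covering cells x cells unit cells and
   return the elapsed CPU time in seconds. */
static double fill_voronoi_cpu(float *out, int size, float cells)
{
    assert(size > 0);
    clock_t start = clock();
    for (int y = 0; y < size; y++)
        for (int x = 0; x < size; x++)
            out[y * size + x] = voronoi_sq(cells * x / size, cells * y / size);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A fair comparison would run both versions several times, include the device-to-host copy in the GPU timing (the assignment asks for the result to be available to the CPU), and report the ratio of the two times.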


Words of Advice

WebGL != OpenCL

Can’t do ‘by reference’, use pointers instead

float3 parameter: (float3)(1, 1, 1)

fract requires second parameter

sinf doesn’t exist, use sin

Also, see this helpful chart:

https://www.khronos.org/files/opencl-1-1-quick-reference-card.pdf
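The points about pointers and fract can be illustrated together. In OpenCL C, fract takes a pointer as its second argument and writes floor(x) through it, which is also the general pattern for returning more than one value when references are not available. The sketch below is a plain-C stand-in so it can be compiled without an OpenCL SDK; fract_c is a made-up name, it assumes small magnitudes (the value must fit in a long), and it omits the clamp to just below 1 that the OpenCL built-in applies.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C stand-in for OpenCL's fract(x, &ipart) (hypothetical name).
   Stores floor(x) through the pointer and returns x - floor(x). */
static float fract_c(float x, float *ipart)
{
    assert(ipart != NULL);
    float whole = (float)(long)x;  /* truncates toward zero */
    if (whole > x) whole -= 1.0f;  /* turn truncation into floor for negatives */
    *ipart = whole;                /* "by reference" result via a pointer */
    return x - whole;
}
```

For example, with float whole; the call fract_c(2.75f, &whole) returns 0.75 and sets whole to 2, and fract_c(-0.25f, &whole) returns 0.75 and sets whole to -1. In a kernel, the equivalent call would be fract(x, &whole).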


“The End” (for now)