MS Thesis Defense “IMPROVING GPU PERFORMANCE BY REGROUPING CPU-MEMORY DATA” by Deepthi Gummadi CoE EECS Department April 21, 2014

MS Thesis Defense

“IMPROVING GPU PERFORMANCE BY REGROUPING CPU-MEMORY DATA”

by Deepthi Gummadi

CoE EECS Department

April 21, 2014

About Me

Deepthi Gummadi

MS in Computer Networking with Thesis

LaTeX programmer at CAPPLab since Fall 2013

Publications:

“New CPU-to-GPU Memory Mapping Technique,” in IEEE SouthEast Conference 2014.

“The Impact of Thread Synchronization and Data Parallelism on Multicore Game Programming,” accepted in IEEE ICIEV-2014.

“Feasibility Study of Spider-Web Multicore/Manycore Network Architectures,” in preparation.

“Investigating Impact of Data Parallelism on Computer Game Engine,” under review, IJCVSP Journal, 2014.

Committee Members

Dr. Abu Asaduzzaman, EECS Dept.

Dr. Ramazan Asmatulu, ME Dept.

Dr. Zheng Chen, EECS Dept.

Outline

Introduction
Motivation
Problem Statement
Proposal
Evaluation
Experimental Results
Conclusions
Future Work

Questions? Any time, please.

Introduction

Central Processing Unit (CPU) Technology

Interprets and executes program instructions.

What is new about CPUs?

Initially, processors evolved with a sequential structure.

Around the turn of the millennium, processors moved toward parallel execution.

Currently, we have multicore on-chip CPUs.

[Figure: CPU speed chart]

Cache Memory Organization

Why do we use cache memory?

Several memory layers:

Lower-level caches: faster, used for performing computations.

Higher-level caches: slower, used for storage.

[Figure: Intel 4-core processor]

NVIDIA Graphics Processing Unit

Parallel processing architecture

Components:
Streaming multiprocessors
Warp schedulers
Execution pipelines
Registers

Memory organization:
Shared memory
Global memory

[Figure: GPU memory organization]

CPU and GPU

CPU                   GPU
Low latency           High throughput, moderate latency
Cache memory          Shared memory
Optimized for MIMD    Optimized for SIMD

CPU and GPU work together to be more efficient.

CPU-GPU Computing Workflow

Step 1: CPU allocates the memory and copies the data, using cudaMalloc() and cudaMemcpy().

Step 2: CPU sends function parameters and instructions to GPU.

Step 3: GPU executes the instructions based on received commands.

Step 4: After execution, the results are retrieved from GPU DRAM to CPU memory.

Motivation

■ Data-level parallelism
Spatial data partitioning
Temporal data partitioning
Spatial instruction partitioning
Temporal instruction partitioning

[Figure: Two parallelization strategies]

Motivation

■ Parallelism and optimization techniques simplify programming for CUDA.

■ From the developer's view, the memory is unified.

Problem Statement

The traditional CPU-to-GPU global memory mapping technique is not well suited to GPU shared memory.

Proposal

A new CPU-to-GPU memory mapping is proposed to improve GPU shared memory performance.

Proposed Technique

Major Steps:

Step 1: Start

Step 2: Analyze problems; determine input parameters.

Step 3: Analyze GPU card parameters/characteristics.

Step 4: Analyze CPU and GPU memory organizations.

Step 5: Determine the number of computations and the number of threads.

Step 6: Identify/Partition the data-blocks for each thread.

Step 7: Copy/Regroup CPU data-blocks to GPU global memory.

Step 8: Stop
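
Steps 5 and 6 can be sketched in C as follows. This is a minimal illustration, not the thesis code: the `DataBlock` type, the function names, and the choice of square tiles are assumptions introduced here, since the slides do not show how data-blocks are sized or assigned.

```c
/* Hypothetical description of one data-block (tile) of an n x n matrix. */
typedef struct {
    int row0, col0;   /* top-left element of the tile */
    int rows, cols;   /* tile extent; tiles on the far edges may be smaller */
} DataBlock;

/* Step 5 (sketch): number of tiles along one dimension, rounding up. */
int tiles_per_dim(int n, int tile) {
    return (n + tile - 1) / tile;
}

/* Step 6 (sketch): locate tile (ti, tj) inside the n x n matrix. */
DataBlock data_block(int n, int tile, int ti, int tj) {
    DataBlock b;
    b.row0 = ti * tile;
    b.col0 = tj * tile;
    /* Clip the tile where it would run past the matrix edge. */
    b.rows = (b.row0 + tile <= n) ? tile : n - b.row0;
    b.cols = (b.col0 + tile <= n) ? tile : n - b.col0;
    return b;
}
```

For a 20x20 matrix with 8x8 tiles, `tiles_per_dim(20, 8)` gives 3 tiles per dimension, and the last tile in each direction is clipped to 4 rows or columns.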

Proposed Technique

Traditional Mapping

■ Data directly copied from CPU to GPU global memory.

■ Retrieved from different global memory blocks.

■ It is difficult to store the data into GPU shared memory.

Proposed Mapping

■ Data should be regrouped and then copied from CPU to GPU global memory.

■ Retrieved from consecutive global memory blocks.

■ It is easy to store the data into GPU shared memory.
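
One possible form of the regrouping step is sketched below in C. It reorders a row-major matrix on the CPU so that each tile occupies consecutive memory before the copy to GPU global memory. The tile-major layout, the function names, and the requirement that the matrix size is a multiple of the tile size are assumptions made for this sketch; the slides do not specify the exact layout used in the thesis, and neighbor (halo) elements needed at tile borders are not handled here.

```c
#include <stddef.h>

/*
 * Regroup a row-major n x n matrix into tile-major order: all elements of
 * tile (0,0) first, then tile (0,1), and so on. Elements inside a tile stay
 * in row-major order. Assumes n is a multiple of tile.
 */
void regroup_tiles(const float *src, float *dst, int n, int tile) {
    int tiles = n / tile;
    size_t k = 0;                          /* next write position in dst */
    for (int ti = 0; ti < tiles; ++ti)
        for (int tj = 0; tj < tiles; ++tj)
            for (int r = 0; r < tile; ++r)
                for (int c = 0; c < tile; ++c) {
                    int row = ti * tile + r;
                    int col = tj * tile + c;
                    dst[k++] = src[(size_t)row * n + col];
                }
}

/* Offset of matrix element (row, col) inside the regrouped buffer. */
size_t regrouped_index(int row, int col, int n, int tile) {
    int tiles = n / tile;
    int ti = row / tile, tj = col / tile;  /* which tile */
    int r = row % tile, c = col % tile;    /* position inside the tile */
    return ((size_t)(ti * tiles + tj) * tile + r) * tile + c;
}
```

With this layout, the elements a thread block would load into shared memory sit in one contiguous range of `dst`, which matches the "consecutive global memory blocks" property listed for the proposed mapping.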

Evaluation

System Parameters:

CPU: dual processor, speed 2.13 GHz

Fermi card: 14 SMs, 32 CUDA cores in each SM.

Kepler card: 13 SMs, 192 CUDA cores in each SM.
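
As a quick check on the scale of the two cards, the total CUDA core counts implied by the figures above are (simple arithmetic on the listed parameters, not a number quoted from the slides):

\[
14 \times 32 = 448 \ \text{CUDA cores (Fermi)}, \qquad 13 \times 192 = 2496 \ \text{CUDA cores (Kepler)}
\]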

Evaluation

Memory sizes of CPU and GPU cards.

Input parameters are the number of rows and the number of columns; the output parameter is the execution time.

Evaluation

Electric charge distribution by Laplace’s equation for a 2D problem (finite difference approximation):

\[
\epsilon_x(i,j)\,\frac{\Phi_{i+1,j} - \Phi_{i,j}}{dx}
+ \epsilon_y(i,j)\,\frac{\Phi_{i,j+1} - \Phi_{i,j}}{dy}
+ \epsilon_x(i-1,j)\,\frac{\Phi_{i,j} - \Phi_{i-1,j}}{dx}
+ \epsilon_y(i,j-1)\,\frac{\Phi_{i,j} - \Phi_{i,j-1}}{dy} = 0
\]

\(\Phi\) = electric potential

\(\epsilon\) = medium permittivity

\(dx\), \(dy\) = spatial grid size

\(\Phi_{i,j}\) = electric potential defined at lattice point \((i, j)\)

\(\epsilon_x(i,j)\), \(\epsilon_y(i,j)\) = effective x- and y-direction permittivity defined at the edges of the element cell \((i, j)\).

Evaluation

The permittivity can be considered the same everywhere for a uniform material, so the equation becomes

\[
\frac{\Phi_{i+1,j} - \Phi_{i,j}}{dx}
+ \frac{\Phi_{i,j+1} - \Phi_{i,j}}{dy}
+ \frac{\Phi_{i,j} - \Phi_{i-1,j}}{dx}
+ \frac{\Phi_{i,j} - \Phi_{i,j-1}}{dy} = 0
\]

Experimental Results

■ Conducted a study of high electric charge distribution using Laplace’s equation.

■ Implemented in three versions:
CPU only
GPU with shared memory
GPU without shared memory

■ Inputs / Outputs:
Input: problem size (N for an N×N matrix)
Output: execution time

Experimental Results

\[
N_{n,m} = \frac{1}{5}\left(N_{n,m-1} + N_{n,m+1} + N_{n,m} + N_{n-1,m} + N_{n+1,m}\right), \quad 1 \le n \le 8,\ 1 \le m \le 8
\]

Validation of our CUDA/C code:

Both CPU/C and CUDA/C programs produce the same values
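
A CPU-side sketch of one sweep of the averaging formula above, together with a helper for comparing two result grids, might look like this in C. The one-cell boundary layer around the 8x8 interior, the flat array layout, and all names are assumptions made for illustration; the slides do not show the actual CPU/C or CUDA/C code.

```c
/* 8 x 8 interior points (1 <= n, m <= 8) plus an assumed one-cell boundary layer. */
enum { GRID = 8, DIM = GRID + 2 };

/* Flat index of grid point (n, m). */
static int at(int n, int m) { return n * DIM + m; }

/* One sweep: each interior point becomes the mean of itself and its four neighbors. */
void sweep(const double *in, double *out) {
    for (int i = 0; i < DIM * DIM; ++i)
        out[i] = in[i];                    /* boundary values are carried over */
    for (int n = 1; n <= GRID; ++n)
        for (int m = 1; m <= GRID; ++m)
            out[at(n, m)] = (in[at(n, m - 1)] + in[at(n, m + 1)] + in[at(n, m)]
                           + in[at(n - 1, m)] + in[at(n + 1, m)]) / 5.0;
}

/* Largest absolute difference between two grids, e.g. CPU result vs. GPU result. */
double max_abs_diff(const double *a, const double *b) {
    double worst = 0.0;
    for (int i = 0; i < DIM * DIM; ++i) {
        double d = a[i] - b[i];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

A validation run in this style would apply the same number of sweeps in both implementations and check that `max_abs_diff` stays within a small tolerance chosen for the floating-point type in use.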

Experimental Results

Impact of GPU shared memory

As the number of threads increases, the processing time decreases (up to 8×8 threads).

Beyond 8×8 threads, the GPU with shared memory shows better performance.

Experimental Results

Impact of the Number of Threads

With the amount of shared memory held constant, the processing time of a GPU decreases as the number of threads increases (up to 16×16).

Beyond 16×16 threads, the Kepler card shows better performance.

Experimental Results

Impact of the amount of shared memory

As the size of GPU shared memory increases, the processing time decreases.

Experimental Results

Impact of the proposed data regrouping technique

In the case of data regrouping with shared memory, the processing time decreases as the number of threads increases.

Of the GPU versions with and without shared memory, the one with shared memory performs better at larger thread counts.

Conclusions

For fast, effective analysis of complex systems, high-performance computation is necessary.

NVIDIA CUDA CPU/GPU computing proves its potential for computation-intensive workloads.

Traditional memory mapping follows the locality principle, so the data does not fit in GPU shared memory.

It is more beneficial to keep data in GPU shared memory than in GPU global memory.

Conclusions

To overcome this problem, we proposed a new memory mapping between CPU and GPU to improve the performance.

Implemented in three different versions.

Results indicate that the proposed CPU-to-GPU memory mapping technique helps decrease the overall execution time by more than 75%.
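
The reduction figure can also be read as a speedup. Writing \(T_{\text{old}}\) and \(T_{\text{new}}\) for the execution times before and after applying the technique (symbols introduced here for illustration; the slides do not state which version serves as the baseline), a reduction of more than 75% corresponds to a speedup of more than 4×:

\[
\frac{T_{\text{old}} - T_{\text{new}}}{T_{\text{old}}} > 0.75
\;\Longrightarrow\;
T_{\text{new}} < 0.25\,T_{\text{old}}
\;\Longrightarrow\;
\frac{T_{\text{old}}}{T_{\text{new}}} > 4
\]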

Future Extensions

■ Modeling and simulation of nanocomposites: Nanocomposites require a large number of computations at high speed.

■ Aircraft applications: High-performance computations are required to study the mixture of composite materials.

Questions?

Thank you

Contact:

Deepthi Gummadi
