
Post on 20-Jun-2015


Page 1: Irregular Programs on GPU

A Quantitative Study of Irregular Programs on GPUs

By Prashant Momale

IIT Kanpur

Guided by Prof. S. K. Aggarwal

Page 2: Irregular Programs on GPU

Introduction

Regular vs. Irregular Algorithms

Regular Programs

(i) operate on large vectors or matrices

(ii) access them in statically predictable ways

- These codes often have high computational demands

- They exhibit extensive data parallelism

- They access memory in a streaming fashion and require little synchronization

e.g. matrix multiplication

Page 3: Irregular Programs on GPU

Introduction (Continued...)

Irregular Programs

- build, traverse, and update irregular data structures such as trees, graphs, and priority queues

e.g. in domains like n-body simulation, data mining, decision problems that use Boolean satisfiability, optimization theory, and social networks

- are more difficult to parallelize

- are more challenging to map to GPUs than regular programs

Page 4: Irregular Programs on GPU

Introduction (Continued...)

Many Open Questions

- Several GPU implementations of irregular programs have been published, but very little is known about their behavior

- Some questions do not have clear answers, such as

(i) Does irregularity really manifest itself as a binary property?

(ii) How is the irregularity behavior of an application influenced by its input, if at all?

(iii) Does an increase in irregularity necessarily degrade performance or might it help in certain cases?

- Answering these questions is important for understanding the behavior of irregular programs

Page 5: Irregular Programs on GPU

Irregularity

Regular Programs

- Control flow and memory accesses are not data dependent

Ex. In matrix multiplication, knowing only the source code, the starting addresses, and the input size, we can predict the behavior without knowing any matrix element

Irregular Programs

- Control flow and memory accesses are data dependent

- Input values determine the program's behavior

Ex. Binary search tree implementation

The values and the order in which they are processed affect the control flow and memory references
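The contrast between the two examples can be made concrete with a small CPU-side sketch in Python. It is an illustration only, not code from the study: the helper names `matmul_trace` and `bst_insert_trace` are hypothetical, the matrix multiplication is the naive triple loop, and the search tree is an unbalanced one that assumes distinct keys.

```python
def matmul_trace(a, b):
    """Naive n x n matrix multiplication that also records which
    (i, k, j) index triples are touched, in order."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    trace = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                trace.append((i, k, j))
                c[i][j] += a[i][k] * b[k][j]
    return c, trace


def bst_insert_trace(values):
    """Insert distinct keys into an unbalanced binary search tree and
    return, for each insert, the list of keys visited on the way down."""
    children = {}  # key -> [left child, right child]
    root = None
    visited = []
    for v in values:
        path = []
        if root is None:
            root = v
        else:
            node = root
            while True:
                path.append(node)
                side = 0 if v < node else 1  # data-dependent branch
                child = children[node][side]
                if child is None:
                    children[node][side] = v
                    break
                node = child
        children[v] = [None, None]
        visited.append(path)
    return visited
```

With this sketch, two matrix multiplications of the same size produce identical traces whatever the element values are, while inserting the same three keys in a different order visits different tree nodes.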

Page 6: Irregular Programs on GPU

Irregularity (Continued...)

Warp Concept

- A GPU contains processing elements (PEs); tightly coupled PEs form a streaming multiprocessor (SM).

- Each PE in an SM can run an independent thread of instructions.

- The PEs in each SM execute vector instructions that conditionally operate on 32 data items.

- A set of 32 threads that run together in this fashion is called a warp.
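As a rough illustration of how threads are grouped, the sketch below maps a linear thread index within a one-dimensional thread block to a warp number and a position (lane) inside that warp. The function names are hypothetical, and the code only models the grouping of consecutive threads into sets of 32; it does not run on a GPU.

```python
WARP_SIZE = 32  # threads per warp, as stated on the slide


def warp_and_lane(thread_idx):
    """Return (warp index, lane index) for a linear thread index
    within a one-dimensional thread block."""
    return thread_idx // WARP_SIZE, thread_idx % WARP_SIZE


def warps_per_block(block_size):
    """Number of warps needed to hold block_size threads; the last
    warp is only partially filled if block_size is not a multiple of 32."""
    return (block_size + WARP_SIZE - 1) // WARP_SIZE
```

For example, threads 0 to 31 fall into warp 0 and thread 32 is lane 0 of warp 1.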

Page 7: Irregular Programs on GPU

Irregularity (Continued...)

Control Flow Irregularity

- Sometimes the threads in a warp cannot all execute the same instruction.

- The threads are then automatically subdivided into sets.

- All threads within a set execute the same instruction.

- The sets are executed serially, one after another, until they re-converge.

A situation in which not all threads in a warp follow the same control flow is called thread divergence.

This is control flow irregularity.
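The subdivision into sets can be sketched for a single if/else branch as follows. This is a simplified model, not how the hardware is implemented: the hypothetical helper `split_warp` takes one data value per thread and a branch condition, and returns the sets of lanes that must be executed one after another.

```python
WARP_SIZE = 32


def split_warp(values, predicate):
    """Model one if/else branch in a warp. values[lane] is the data item
    of each thread. Returns the non-empty sets of lanes that follow the
    same path; each set needs its own serialized execution pass."""
    assert len(values) == WARP_SIZE
    taken = [lane for lane, v in enumerate(values) if predicate(v)]
    not_taken = [lane for lane, v in enumerate(values) if not predicate(v)]
    return [lanes for lanes in (taken, not_taken) if lanes]


def is_divergent(values, predicate):
    """A branch diverges when the warp splits into more than one set."""
    return len(split_warp(values, predicate)) > 1
```

If every thread evaluates the condition the same way, the warp stays together and needs one pass; if the data makes half the threads take each side, two passes are needed, each with half of the lanes idle.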

Page 8: Irregular Programs on GPU

Irregularity (Continued...)

Memory Access Irregularity

- Coalesced memory transaction: when the threads of a warp access nearby memory locations, the hardware can combine the accesses into a few transactions.

- When the accesses are not coalesced, the hardware has to perform many memory transactions, one after the other, instead of the few needed for coalesced access.

This is how memory access irregularity can lower performance.

- Bank conflict: a warp can simultaneously access 32 words in shared memory as long as they reside in different banks. If more than one word is touched within the same bank, a bank conflict occurs and the accesses are serialized.

Bank conflicts are another source of memory access irregularity.
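Both effects can be estimated from the addresses that the 32 threads of a warp request. The sketch below is a simplified model under stated assumptions: global memory is served in 128-byte segments, and shared memory has 32 banks of 4-byte words with consecutive words in consecutive banks. These sizes and the function names are assumptions for illustration; actual values depend on the GPU generation.

```python
SEGMENT_BYTES = 128  # assumed size of one global-memory transaction
NUM_BANKS = 32       # assumed number of shared-memory banks
WORD_BYTES = 4       # assumed bank word size


def memory_transactions(byte_addresses):
    """Number of distinct segments touched by a warp's global-memory
    accesses; one transaction per segment in this model."""
    return len({addr // SEGMENT_BYTES for addr in byte_addresses})


def bank_conflict_degree(byte_addresses):
    """Largest number of distinct words requested from a single bank.
    1 means conflict-free; k means the access is serialized k ways.
    Threads reading the same word do not conflict with each other."""
    words_per_bank = {}
    for addr in byte_addresses:
        word = addr // WORD_BYTES
        words_per_bank.setdefault(word % NUM_BANKS, set()).add(word)
    return max(len(words) for words in words_per_bank.values())
```

In this model, 32 threads reading consecutive 4-byte words need one transaction and cause no bank conflict, whereas a stride of 128 bytes needs 32 transactions and maps every word to the same bank.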

Page 9: Irregular Programs on GPU

Metrics of Irregularity

(i) Control Flow Irregularity

CFI = (divergent branches) / (executed instructions)

(ii) Memory-Access Irregularity

MAI = (replayed instructions) / (issued instructions)
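The two ratios can be computed directly from four counters, as in the sketch below. The function names and the way the counters are obtained are assumptions; on real hardware they would come from a profiler, and the numbers in the usage note are made up purely to show the arithmetic.

```python
def cfi(divergent_branches, executed_instructions):
    """Control-flow irregularity as a percentage:
    divergent branches / executed instructions."""
    if executed_instructions == 0:
        return 0.0
    return 100.0 * divergent_branches / executed_instructions


def mai(replayed_instructions, issued_instructions):
    """Memory-access irregularity as a percentage:
    replayed instructions / issued instructions."""
    if issued_instructions == 0:
        return 0.0
    return 100.0 * replayed_instructions / issued_instructions
```

With hypothetical counters of 41 divergent branches in 1000 executed instructions, `cfi(41, 1000)` gives 4.1%, and 250 replays among 1000 issued instructions give an MAI of 25%.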

Page 10: Irregular Programs on GPU

Metrics of Irregularity (Continued...)

- Both metrics range from 0% to 100%

- The higher the value, the higher the irregularity

- CFI is usually low

- They are independent of runtime

- Both metrics measure irregularity at the warp level

These metrics do not classify a program as regular or irregular. Rather, they measure the degree of irregularity.

Page 11: Irregular Programs on GPU

Results and Analysis

- An analysis of the irregularity exhibited by various CUDA kernels is presented

- Effect of different program inputs

- Effect of optimizations on programs

- Variability of the results between different runs

(i) on the same GPU

(ii) on different GPUs

Benchmarks used:

Irregular - BFS, Barnes Hut, Data Compression, Delaunay Mesh Refinement, Points-to Analysis, Survey Propagation, Single Source Shortest Path, TSP

Regular - Black Scholes, Histogram, Monte Carlo, Matrix Multiplication, N-Body

Page 12: Irregular Programs on GPU

Results and Analysis (Continued...)

Amount of Irregularity

- CFI is usually very low. For the benchmarks above it is less than 4.1%

- Most of the programs cannot be strictly classified as regular or irregular

- The two irregularities appear to be independent of each other

- Irregular control flow generally implies irregular memory access

Page 13: Irregular Programs on GPU

Results and Analysis (Continued...)

Input Sensitivity

- Input sensitivity is very difficult to predict

- It is difficult to characterize in an application-independent way

(i) Input oblivious - irregularity remains largely constant across different inputs

(ii) Input-type dependent - irregularity varies mainly across different types of inputs rather than within a single type

(iii) Input dependent - irregularity varies as the size of the input varies

Page 14: Irregular Programs on GPU

Results and Analysis (Continued...)

(iii) Arithmetic Precision -

Changing from single precision to double precision increases CFI and MAI for small inputs but decreases both for medium and large inputs.

However, the change is very small.

- This indicates that arithmetic precision has little effect on the irregularity of a program.

Page 15: Irregular Programs on GPU

Results and Analysis (Continued...)

Variability

- Observed for several kernels over multiple runs on the same GPU and on different GPUs

Irregularity is quite stable on the same GPU and varies somewhat between distinct GPUs

Page 16: Irregular Programs on GPU

Conclusion

- Programs cannot be strictly divided into regular and irregular types

- Irregularity is not necessarily bad for performance

- By definition, irregular programs are data dependent, but different inputs yield similar degrees of irregularity

- Irregularity does not vary much between distinct GPUs

These conclusions are expected to hold across a broad range of CUDA-capable GPUs, and the hope is that they will improve the understanding of the behavior of irregular GPU applications.

Page 17: Irregular Programs on GPU

References

Paper: A Quantitative Study of Irregular Programs on GPUs

By: Rupesh Nasre, Keshav Pingali, Martin Burtscher

Texas State University

Published in: IEEE International Symposium on Workload Characterization (IISWC '13)

Page 18: Irregular Programs on GPU

Results and Analysis (Continued...)

Effect of Optimizations and Arithmetic Precision

(i) The original (unoptimized) version of one program reads records from global memory, but the optimized version calculates the record values on the fly.

- This actually increases the control flow irregularity

- But performance improves because the computations are cheaper than reading the values from global memory.

(ii) In the optimized Single Source Shortest Path algorithm, nodes that are logically close to each other are kept close in memory.

- This increases the memory-access irregularity but also increases spatial locality