Utilization of GPUs for General Computing
Presenter: Charlene DiMeglio
Paper: Aspects of GPU for General Purpose High Performance Computing, Suda, Reiji, et al.



TRANSCRIPT

Page 1: Utilization of GPUs for General Computing

Utilization of GPUs for General Computing

Presenter: Charlene DiMeglio

Paper: Aspects of GPU for General Purpose High Performance Computing

Suda, Reiji, et al.

Page 2: Utilization of GPUs for General Computing

Overview

Problem: We want to use the GPU for things other than graphics, but the costs can be high

Solution: Improve the CUDA drivers

Results: Compared to a node of a supercomputer, the GPU is worth it

Conclusion: These improvements make using GPGPUs more feasible

Page 3: Utilization of GPUs for General Computing

Problem: The need for more computation power. Why GPUs?

GPUs are not being fully utilized as a resource, often sitting idle when not being used for graphics

Better performance for less power compared to CPUs

What's the issue? Cost:

Efficient scheduling – timing data loads with their uses

Memory management – using the small amount of available memory effectively

Loads and stores – waiting for memory transfers, which take hundreds of cycles

Page 4: Utilization of GPUs for General Computing

Solutions: Brook+ by AMD, Larrabee by Intel

CUDA by NVIDIA

Greatest technological maturity at the time

The paper investigates this existing technology and suggests improvements

Architecture: 30 multiprocessors, each with 8 streaming processors and 16 KB of shared memory
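The slide's architecture numbers can be checked against the 933 Gflops peak quoted on the next slide. A rough arithmetic sketch, assuming (as commonly cited for this GPU generation) that each streaming processor issues a multiply-add plus a multiply, i.e. 3 flops per cycle, at a shader clock of roughly 1.3 GHz:

```python
# Back-of-envelope check of the C1060's peak single-precision rate.
# FLOPS_PER_CYCLE = 3 is an assumption: one MAD (2 flops) + one MUL (1 flop) per SP.
MPS = 30             # multiprocessors
SPS_PER_MP = 8       # streaming processors per multiprocessor
CLOCK_GHZ = 1.296    # assumed shader clock (~1.3 GHz)
FLOPS_PER_CYCLE = 3  # assumed dual-issue MAD + MUL per SP per cycle

peak_gflops = MPS * SPS_PER_MP * CLOCK_GHZ * FLOPS_PER_CYCLE
print(round(peak_gflops, 1))  # ≈ 933.1, matching the 933 Gflops figure
```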

Page 5: Utilization of GPUs for General Computing

NVIDIA's Tesla C1060 GPU vs. the Hitachi HA8000-tc/RS425 (T2K) supercomputer
(T2K – the fastest supercomputer in Japan at the time)

                              T2K (node)    C1060
  Cores / MPs                 16            30
  Clock frequency             2.3 GHz       1.3 GHz
  Single SIMD vector length   4             32
  Single-precision peak       294 Gflops    933 Gflops
  Main memory                 32 GB         4 GB
  Memory per peak Gflop       0.109 GB      0.004 GB
  Cost                        ~$40,000      ~$2,500
  Power                       300 W         200 W
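The "memory per peak Gflop" row follows from the other two rows; a quick check of the arithmetic:

```python
# GB of main memory available per peak Gflop, derived from the table's own rows.
t2k_ratio = 32 / 294    # 32 GB main memory / 294 Gflops peak
c1060_ratio = 4 / 933   # 4 GB main memory / 933 Gflops peak

print(round(t2k_ratio, 3), round(c1060_ratio, 3))  # 0.109 0.004
```

The GPU's ratio is roughly 27x smaller, which is why the small-memory issue on the next slide matters.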

Page 6: Utilization of GPUs for General Computing

Issues to Overcome

High SIMD vector length

Small main memory size

High register spill cost

No L2 cache, only read-only texture caches

Page 7: Utilization of GPUs for General Computing

Methods to Hide Latency

A CUDA compiler option limits the number of registers used per thread

1 warp = 32 threads running in lockstep (SIMD)

Maximizes the number of warps that can run at a time

Could cause spills
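The register/occupancy trade-off can be sketched numerically. The constants below (a 16,384-register file per multiprocessor, a 32-warp cap) are the limits for this GPU generation, but treat this as an illustrative model, not the exact CUDA occupancy rules:

```python
# Illustrative occupancy model: fewer registers per thread -> more resident warps,
# until the hardware warp cap is hit. Constants assume a compute-capability-1.3 part.
REGS_PER_MP = 16384  # register file per multiprocessor
WARP_SIZE = 32       # threads per warp
MAX_WARPS = 32       # hardware cap on resident warps per multiprocessor

def resident_warps(regs_per_thread: int) -> int:
    """Warps that fit on one multiprocessor given a per-thread register budget."""
    return min(MAX_WARPS, REGS_PER_MP // (regs_per_thread * WARP_SIZE))

for regs in (8, 16, 32, 64):
    print(regs, resident_warps(regs))  # 8->32, 16->32, 32->16, 64->8
```

Capping registers (e.g. with nvcc's -maxrregcount option) raises the resident-warp count, but can force spills to slow local memory, which is exactly the tension the slide describes.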

Variable-sized multi-round data transfer scheduling with PCI Express

PCI Express allows data transfer, GPU computation, and CPU computation to occur in parallel

Allows for a constant flow of information

Achieves overhead of up to O(log x / x), compared to uniform scheduling's O()
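The idea behind variable-sized rounds can be illustrated with a toy schedule: start with a small chunk so GPU computation begins early, then grow chunk sizes so the fixed per-transfer latency is amortized over fewer, larger transfers. This is a sketch of the general technique, not the paper's exact algorithm:

```python
# Toy variable-sized multi-round schedule: geometrically growing chunks.
# Assumption: a fixed per-transfer latency is paid once per round, so larger later
# rounds amortize it, while a small first round lets computation start early.

def round_sizes(total: int, rounds: int, growth: float = 2.0) -> list:
    """Split `total` items into `rounds` chunks, each `growth` times the last."""
    weights = [growth ** i for i in range(rounds)]
    scale = total / sum(weights)
    sizes = [int(w * scale) for w in weights]
    sizes[-1] += total - sum(sizes)  # absorb rounding error so chunks sum to total
    return sizes

print(round_sizes(1000, 4))  # small first chunk, progressively larger later chunks
```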

Page 8: Utilization of GPUs for General Computing

Methods to Hide Latency

If the computation time between communications > the communication latency, it is worth sending the data over to the GPU

Increasing bandwidth and message size makes the constant term in the overhead latency seem smaller

Efficient use of registers to prevent spills

Deciding what work to do where (GPU vs. CPU): work sharing

Minimizing divergent warps using the atomic operations found in CUDA

Divergent warps occur when the threads of a warp must follow both branch paths
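The "is it worth sending?" test above reduces to comparing the CPU's compute time against GPU compute plus round-trip transfer cost. A minimal break-even model, with all parameter values hypothetical rather than measured:

```python
# Minimal offload break-even model. Latency and bandwidth defaults are
# illustrative placeholders, not measured PCI Express figures.
def offload_pays(n_bytes: float, cpu_time_s: float, gpu_time_s: float,
                 latency_s: float = 10e-6, bandwidth_bps: float = 5e9) -> bool:
    """True if GPU compute plus PCIe round-trip transfer beats staying on the CPU."""
    transfer = latency_s + n_bytes / bandwidth_bps   # one-way transfer cost
    return gpu_time_s + 2 * transfer < cpu_time_s    # data over and results back

# A tiny job loses to the fixed transfer latency; a large one wins.
print(offload_pays(1e3, cpu_time_s=30e-6, gpu_time_s=20e-6))  # -> False
print(offload_pays(1e8, cpu_time_s=0.5, gpu_time_s=0.05))     # -> True
```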

Page 9: Utilization of GPUs for General Computing

Results: Variable-sized multi-round data transfer scheduling

[Chart: performance vs. number of rounds]

Page 10: Utilization of GPUs for General Computing

Results: Use of atomic instructions in CUDA to minimize latency

Page 11: Utilization of GPUs for General Computing

Conclusion

CUDA gives programmers the ability to harness the power of the GPU for general uses.

The improvements presented make this option more feasible.

Strategic use of GPGPUs as a resource will improve speed and efficiency.

However, the material presented is mainly theoretical, without much strong data to back it up.

The paper offers more suggestions than implementations, promoting GPGPU use.