
Post on 20-Jan-2016


GPU Power Model

Nandhini Sudarsanan sudar003@umn.edu
Nathan Vanderby vande501@umn.edu
Neeraj Mishra mish0088@umn.edu
Usha Vinodh kuma0253@un.edu
Chi Xu xuchi@umn.edu

Outline

• Introduction and Motivation
• Analytical Model Description
• Experiment Setup
• Results
• Conclusion and Further Work

Introduction

Motivation

Outline

• Introduction and Motivation
• Analytical Model Description
  o Parser
  o Power Model
• Experiment Setup
• Results
• Conclusion and Further Work

Parser

Power Model

• PTX Level

Power Model

• Assembly Level

Experiment Setup - Hardware

• Measure Power Consumption and Temperature
  o Current Clamp for PCIE & GPU Power Cable
  o Data Acquisition Card @ 100Hz
  o GPU Performance Counter
  o Sample Temperature @ 10Hz, GPU sensor
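The measurement chain above (clamped currents sampled at 100Hz) can be turned into power and energy figures roughly as follows. This is a minimal sketch, not the authors' processing script: the 12 V rail voltage, the function names, and the sample values are all assumptions made for illustration, since the slides give only the sampling rate.

```python
# Sketch: convert two clamped current traces into power and energy.
# Assumptions (not from the slides): both measured paths are treated as
# nominal 12 V rails, and the sample values below are made up.

SAMPLE_RATE_HZ = 100.0   # data acquisition rate stated on the slide
RAIL_VOLTAGE_V = 12.0    # assumed nominal supply voltage


def power_trace(pcie_amps, cable_amps, volts=RAIL_VOLTAGE_V):
    """Instantaneous power in watts per sample: P = V * (I_pcie + I_cable)."""
    if len(pcie_amps) != len(cable_amps):
        raise ValueError("both current traces must have the same length")
    return [volts * (a + b) for a, b in zip(pcie_amps, cable_amps)]


def average_power(watts):
    """Mean power in watts over the whole trace."""
    if not watts:
        raise ValueError("empty power trace")
    return sum(watts) / len(watts)


def energy_joules(watts, sample_rate_hz=SAMPLE_RATE_HZ):
    """Rectangle-rule integration: each sample covers 1/sample_rate seconds."""
    return sum(watts) / sample_rate_hz


# Hypothetical four-sample trace (40 ms of data at 100 Hz).
pcie = [2.0, 2.5, 3.0, 2.5]
cable = [8.0, 9.5, 10.0, 9.5]
trace = power_trace(pcie, cable)
print("average power (W):", average_power(trace))
print("energy (J):", energy_joules(trace))
```

At 100 samples per second, a kernel that finishes in a few milliseconds contributes less than one sample, which is one reason a power experiment needs long-running workloads.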

Experiment Setup - Software

• Driver API
• Generate and Modify PTX code
  o Minimize control loops
• CUDA 4.0
  o Built in Binary -> Assembly Converter (cuobjdump)
• MATLAB to build model
• Remote login

CUDA- Fermi Architecture

• Third Generation Streaming Multiprocessor (SM)
  o 32 CUDA cores per SM, 4x over GT200
  o 1024 thread block size, 2x over GT200
  o Unified address space enables full C++ support
  o Improved Memory Subsystem

Benchmarks

• Small number of overhead operations (loop counters, initialization, etc.).

• Computationally intensive work, so that each experiment runs long enough for accurate current measurement.

• High utilization of the CUDA cores, with as few data hazards as possible.

• Grid and block sizes chosen so that all SMs are used, since idle SMs still leak power.

• Accordingly, 7 benchmarks were selected from the CUDA SDK.
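The grid-sizing criterion can be sketched as a small helper. This is an illustration under assumptions, not the authors' launch configuration: the SM count of 15 is hypothetical (the slides do not name the card), and rounding the grid up to a multiple of the SM count is one simple heuristic for giving every SM work, not a guarantee about how the hardware schedules blocks. Only the 1024-thread block limit is taken from the slides.

```python
# Sketch: pick a grid size that covers the data and gives every SM work.
# Assumption: NUM_SMS = 15 is a hypothetical value, not from the slides.

MAX_BLOCK_SIZE = 1024  # Fermi thread block limit quoted in the slides
NUM_SMS = 15           # hypothetical SM count


def grid_size(n_elements, block_size, num_sms=NUM_SMS):
    """Blocks needed to cover n_elements, rounded up to a multiple of num_sms."""
    if not 0 < block_size <= MAX_BLOCK_SIZE:
        raise ValueError("block_size must be in 1..1024")
    if n_elements <= 0 or num_sms <= 0:
        raise ValueError("n_elements and num_sms must be positive")
    # Integer ceiling division: enough blocks to cover every element.
    blocks = (n_elements + block_size - 1) // block_size
    # Round up to a whole multiple of the SM count so the last wave of
    # blocks does not leave some SMs idle.
    return ((blocks + num_sms - 1) // num_sms) * num_sms


print(grid_size(1_000_000, 256))
```

Because the rounded grid can contain more threads than elements, a kernel launched this way needs a bounds check on its global index.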

Benchmarks

For this project we tested the following benchmarks:
• 2D Convolution
• Matrix Multiplication
• Vector Addition
• Vector Reduction
• Scalar Product
• DCT 8x8
• 3DFD

Limitations of PTX

• Higher level than assembly
  o Divide & Sqrt: one PTX line each, but a library routine in assembly
• Compiler optimizations from PTX -> assembly
• Doesn’t reflect RAW dependencies
• Performance counters use assembly
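The first limitation can be seen with a simple opcode count over PTX text. The sketch below is an illustration only, not the authors' parser: the PTX fragment, register names, and function name are made up for the example. Counting at this level records a division or square root as a single instruction each, even though, as noted above, each one becomes a library routine in the final assembly.

```python
# Sketch: histogram of base opcodes in a PTX fragment.
# The fragment and all names here are hypothetical examples.
import re
from collections import Counter

PTX_SNIPPET = """
    ld.global.f32   %f1, [%rd1];
    ld.global.f32   %f2, [%rd2];
    div.rn.f32      %f3, %f1, %f2;   // one PTX line
    sqrt.rn.f32     %f4, %f3;        // one PTX line
    add.f32         %f5, %f4, %f1;
    st.global.f32   [%rd3], %f5;
"""


def count_opcodes(ptx_text):
    """Count base opcodes (text before the first '.') per instruction line."""
    counts = Counter()
    for raw in ptx_text.splitlines():
        line = raw.split("//")[0].strip()   # drop trailing comments
        # Skip blanks, directives (.reg, .entry, ...), labels, and braces.
        if (not line or line.startswith(".") or line.endswith(":")
                or line in ("{", "}")):
            continue
        # Drop a leading predicate guard such as "@%p1" or "@!%p1".
        line = re.sub(r"^@!?%\w+\s+", "", line)
        opcode = line.split()[0].rstrip(";")
        counts[opcode.split(".")[0]] += 1
    return counts


for op, n in sorted(count_opcodes(PTX_SNIPPET).items()):
    print(op, n)
```

A model built on such counts would weight `div` and `sqrt` like any other single instruction, so their real cost only becomes visible at the assembly level.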

Results

Conclusion and Further Work

• Conclusion

• Further Work
  o Take into account context switches
  o Consider multiple kernels running simultaneously

The End

Thanks

Q&A
