
GPU Power Model

Nandhini Sudarsanan sudar003@umn.edu

Nathan Vanderby vande501@umn.edu

Neeraj Mishra mish0088@umn.edu

Usha Vinodh kuma0253@un.edu

Chi Xu xuchi@umn.edu

Outline

• Introduction and Motivation
• Analytical Model Description
• Experiment Setup
• Results
• Conclusion and Further Work

Introduction

Motivation

Outline

• Introduction and Motivation
• Analytical Model Description
  o Parser
  o Power Model
• Experiment Setup
• Results
• Conclusion and Further Work

Parser
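The slides do not show how the parser works. As a rough illustration only, a PTX-level parser could tally how often each base opcode appears in a kernel. The function name `count_ptx_opcodes` and the sample kernel below are hypothetical, and the counts are static (one per source line), so loops are not expanded.

```python
import re
from collections import Counter

# Optional predicate guard in front of an instruction, e.g. "@%p1" or "@!%p1".
_PREDICATE = re.compile(r"^@!?%?\w+\s+")


def count_ptx_opcodes(ptx_text):
    """Return a Counter mapping base PTX opcode (e.g. 'add', 'ld') to its static count."""
    counts = Counter()
    for raw_line in ptx_text.splitlines():
        line = raw_line.split("//", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith((".", "{", "}", "(", ")")):
            continue  # directives (.entry, .reg, ...) and brackets are not instructions
        if line.endswith(":"):
            continue  # branch-target label
        line = _PREDICATE.sub("", line)  # strip the predicate guard, keep the opcode
        opcode = line.split()[0].rstrip(";")
        counts[opcode.split(".", 1)[0]] += 1  # 'add.f32' -> 'add'
    return counts


# Hypothetical PTX fragment, used only to exercise the function.
SAMPLE_PTX = """
.visible .entry square_plus(.param .u64 p)
{
    .reg .f32 %f<4>;
    ld.param.u64 %rd1, [p];   // load pointer argument
LOOP:
    ld.global.f32 %f1, [%rd1];
    mul.f32 %f2, %f1, %f1;
    add.f32 %f3, %f2, %f1;
    st.global.f32 [%rd1], %f3;
    @%p1 bra LOOP;
    ret;
}
"""

if __name__ == "__main__":
    for opcode, n in sorted(count_ptx_opcodes(SAMPLE_PTX).items()):
        print(opcode, n)
```

Grouping by base opcode is one possible choice; a model that distinguishes data types or memory spaces would keep more of the dotted suffix.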

Outline

Power Model

• PTX Level

Power Model

• Assembly Level

Outline

Experiment Setup - Hardware

• Measure Power Consumption and Temperature
  o Current Clamp for PCIE & GPU Power Cable
  o Data Acquisition Card @ 100Hz
  o GPU Performance Counter
  o Sample Temperature @ 10Hz, GPU sensor
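The measurements above can be turned into power and energy figures roughly as follows. This is a minimal sketch, not the authors' processing script: it assumes both clamps sit on 12 V supply lines (the slides do not state the voltage), that the two current traces are sampled at the same instants, and all function names are illustrative.

```python
def total_power_trace(slot_amps, cable_amps, rail_voltage=12.0):
    """Instantaneous power in watts: P = V * (I_slot + I_cable) for each sample.

    rail_voltage=12.0 is an assumption; the deck does not give the rail voltage.
    """
    if len(slot_amps) != len(cable_amps):
        raise ValueError("current traces must have the same number of samples")
    return [rail_voltage * (s + c) for s, c in zip(slot_amps, cable_amps)]


def average_power(power_watts):
    """Mean power over the trace, in watts."""
    if not power_watts:
        raise ValueError("empty power trace")
    return sum(power_watts) / len(power_watts)


def energy_joules(power_watts, sample_rate_hz=100.0):
    """Energy by the rectangle rule: each sample covers 1 / sample_rate_hz seconds."""
    return sum(power_watts) / sample_rate_hz


if __name__ == "__main__":
    # Hypothetical one-second capture at 100 Hz: 4 A through the slot, 6 A through the cable.
    slot = [4.0] * 100
    cable = [6.0] * 100
    trace = total_power_trace(slot, cable)
    print(average_power(trace), "W average")
    print(energy_joules(trace), "J over the capture")
```

At 100 samples per second, a kernel has to run for a noticeable fraction of a second before the average is based on more than a handful of samples.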

Experiment Setup - Software

• Driver API
• Generate and Modify PTX code
  o Minimize control loops
• CUDA 4.0
  o Built in Binary -> Assembly Converter (cuobjdump)
• MATLAB to build model
• Remote login

CUDA - Fermi Architecture

• Third Generation Streaming Multiprocessor (SM)
  o 32 CUDA cores per SM, 4x over GT200
  o 1024 thread block size, 2x over GT200
  o Unified address space enables full C++ support
  o Improved Memory Subsystem

Benchmarks

• Small number of overhead operations (loop counters, initialization, etc.).

• Computationally intensive work, so that each experiment runs long enough for an accurate current measurement.

• High utilization of the CUDA cores, with as few data hazards as possible.

• Grid and block sizes chosen so that all SMs are used, since idle SMs still leak power.

• Accordingly, 7 benchmarks were selected from the CUDA SDK.
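One way to satisfy the grid and block size criterion is to round the grid up to a multiple of the SM count. The helper below is an illustrative sketch: `launch_config` and the SM count of 16 are assumptions, only the 1024-thread block limit comes from the Fermi slide, and the actual placement of blocks on SMs is decided by the hardware scheduler.

```python
FERMI_MAX_THREADS_PER_BLOCK = 1024  # block-size limit quoted on the Fermi slide


def ceil_div(a, b):
    """Integer ceiling division without going through floats."""
    return -(-a // b)


def launch_config(num_elements, threads_per_block, num_sms):
    """Return (blocks, threads_per_block) with blocks a multiple of num_sms.

    num_sms is a parameter because the deck does not say how many SMs the
    test GPU has. Rounding the grid up launches surplus threads, so the
    kernel must bounds-check its global index.
    """
    if not 0 < threads_per_block <= FERMI_MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block must be in 1..1024 on Fermi")
    if num_elements <= 0 or num_sms <= 0:
        raise ValueError("num_elements and num_sms must be positive")
    blocks = ceil_div(num_elements, threads_per_block)  # enough threads for the data
    blocks = ceil_div(blocks, num_sms) * num_sms        # pad to a multiple of the SM count
    return blocks, threads_per_block


if __name__ == "__main__":
    # Hypothetical example: one million elements on a GPU assumed to have 16 SMs.
    print(launch_config(1_000_000, 256, 16))
```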

Benchmarks

For this project we tested the following benchmarks:
• 2D Convolution
• Matrix Multiplication
• Vector Addition
• Vector Reduction
• Scalar Product
• DCT 8x8
• 3DFD

Limitations of PTX

• Higher level than assembly
  o Divide & Sqrt: 1 PTX line, library in assembly
• Compiler optimizations from PTX -> assembly
• Doesn't reflect RAW dependencies
• Performance counters use assembly

Outline

Results

Outline

Conclusion and Further Work

• Conclusion

• Further Work
  o Take into account context switches
  o Consider multiple kernels running simultaneously

The End

Thanks
