GreenGPU: A Holistic Approach to Energy Efficiency in GPU-CPU Heterogeneous Architectures
Kai Ma, Xue Li, Wei Chen, Chi Zhang, and Xiaorui Wang
Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210
Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996
2012 41st International Conference on Parallel Processing (ICPP)
Presented by Po-Ting Liu, 2013/07/25
Outline
• Introduction
• Motivation
• System Design and Algorithms
• Experiment
• Conclusion
Introduction
• Popularity of GPU-CPU heterogeneous architectures
– High computational throughput
– More efficient on SIMD operations
– Better energy efficiency
• For instance

| System | Performance | Power consumption |
|---|---|---|
| Tianhe-1A | 2.5 PetaFlops | 4 MegaWatts |
| CPU-based system | 2.5 PetaFlops | 12 MegaWatts |
NVIDIA. NVIDIA Tesla GPUs Power World's Fastest Supercomputer. http://goo.gl/STi9E
Introduction (cont.)
• However, Tianhe-1A's electricity bill is about $2.7 million/year (about 81 million NTD/year)
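As a rough sanity check on these figures, the sketch below derives the electricity price and exchange rate they imply. It assumes the machine draws a constant 4 MW all year, which the slides do not state; the function and variable names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def implied_price_per_kwh(annual_bill_usd, power_mw):
    """Price per kWh implied by a yearly bill at a constant power draw (assumption)."""
    energy_kwh = power_mw * 1000 * HOURS_PER_YEAR
    return annual_bill_usd / energy_kwh


# Figures from the slides: $2.7 million/year at 4 MW, 81 million NTD/year.
price = implied_price_per_kwh(2.7e6, 4)
ntd_per_usd = 81e6 / 2.7e6

# At the same price, a 12 MW CPU-based system would cost three times as much.
cpu_based_bill = price * 12 * 1000 * HOURS_PER_YEAR

print(f"implied price: ${price:.3f}/kWh")
print(f"implied exchange rate: {ntd_per_usd:.0f} NTD/USD")
print(f"CPU-based system bill: ${cpu_based_bill / 1e6:.1f} million/year")
```

Under that assumption, the threefold power gap in the table translates directly into a threefold gap in the yearly bill.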
Introduction (cont.)
• GreenGPU
– A holistic approach that improves energy efficiency with negligible performance loss
• Two-tier design
– First tier
  • Dynamically divide the workload between CPU and GPU
– Second tier
  • Dynamically scale the frequencies of CPU and GPU
Motivation
• Case study on workload division between CPU and GPU
– Properly dividing the workload reduces idle time and thereby saves energy
*Benchmark: k-means
Motivation (cont.)
• Case study on frequency scaling for GPU memory
– Properly scaling down an under-utilized component can save energy with negligible performance impact
nbody: core-bound (computation-intensive)
streamcluster (SC): memory-bound (memory-intensive)
[Figure a] [Figure b]
Motivation (cont.)
• Case study on frequency scaling for GPU core
– For each component there may be a most suitable frequency level
nbody: core-bound (computation-intensive)
streamcluster (SC): memory-bound (memory-intensive)
[Figure a] [Figure b]
System Design and Algorithms
[Figure: two-tier architecture]
Software
– Frequency Scaling (CPU), second tier: CPU utilization in, CPU frequency out
– Workload Division, first tier: CPU execution time and GPU execution time in, workload out
– Frequency Scaling (GPU), second tier: GPU utilization in, GPU core & memory frequency out
Hardware
– CPU, GPU
System Design and Algorithms (cont.)
• First tier - Workload division - Overview
– Dynamically divides the workload between CPU and GPU
– Based on the execution times of CPU and GPU
– Performed every iteration, with a fixed amount of work per iteration
  • An iteration is delimited by a reduction point or common barrier point
System Design and Algorithms (cont.)
• First tier - Workload division - Example
Assume each step is 5%
[Table: rows CPU and GPU; columns Workload (%) and Execution time; values for the next iteration]
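A minimal sketch of the step-based division described above, assuming the rule is "move one 5% step of work away from whichever side took longer". The constant processing rates used to simulate execution times are hypothetical, and all names are illustrative rather than taken from GreenGPU.

```python
STEP = 5  # percent of the workload moved per iteration (from the slide)


def next_cpu_share(cpu_share, t_cpu, t_gpu, step=STEP):
    """Return the CPU share (%) for the next iteration."""
    if t_cpu > t_gpu:
        cpu_share -= step  # CPU was slower: give it less work
    elif t_cpu < t_gpu:
        cpu_share += step  # GPU was slower: give the CPU more work
    return max(0, min(100, cpu_share))


def simulate(cpu_share, cpu_rate, gpu_rate, iterations):
    """Simulate iterations with hypothetical constant rates (work units/second)."""
    history = [cpu_share]
    for _ in range(iterations):
        t_cpu = cpu_share / cpu_rate
        t_gpu = (100 - cpu_share) / gpu_rate
        cpu_share = next_cpu_share(cpu_share, t_cpu, t_gpu)
        history.append(cpu_share)
    return history


# Hypothetical case: the GPU is 4x faster than the CPU, starting from 50/50.
history = simulate(50, cpu_rate=1.0, gpu_rate=4.0, iterations=8)
print(history)  # [50, 45, 40, 35, 30, 25, 20, 20, 20]
```

In this made-up case the balanced point (20/80) lies exactly on the 5% grid, so the division settles there once both sides finish at the same time.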
System Design and Algorithms (cont.)
• First tier - Workload division - Avoiding oscillation
– Oscillation example
  • Optimal division point: (CPU/GPU)
  • Oscillating between (CPU/GPU) and (CPU/GPU)
– Solution
  • Linearly scale the execution time of the previous iteration by the candidate workload to predict the execution time of the next iteration
  • Example: at division (CPU/GPU), the basic rule would move 5% of the workload from GPU to CPU, giving (CPU/GPU) for the next iteration. If the condition on the predicted execution times holds, keep using the current division (CPU/GPU) for the next iteration
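The sketch below illustrates the oscillation and one way the linear prediction could damp it. The exact condition on the slide was lost in extraction, so the test used here (keep the current division if the predicted times would reverse which side is slower) is an assumption, as are the hypothetical processing rates and all function names.

```python
def next_share_basic(cpu_share, t_cpu, t_gpu, step=5):
    """Basic rule: move one step of work away from the slower side."""
    if t_cpu > t_gpu:
        return cpu_share - step
    if t_cpu < t_gpu:
        return cpu_share + step
    return cpu_share


def next_share_damped(cpu_share, t_cpu, t_gpu, step=5):
    """Basic rule plus a linear prediction check (assumed condition)."""
    if not 0 < cpu_share < 100:
        raise ValueError("linear scaling needs work on both devices")
    candidate = next_share_basic(cpu_share, t_cpu, t_gpu, step)
    if candidate == cpu_share or not 0 < candidate < 100:
        return cpu_share
    # Linearly scale the last execution times to the candidate division.
    pred_cpu = t_cpu * candidate / cpu_share
    pred_gpu = t_gpu * (100 - candidate) / (100 - cpu_share)
    if t_cpu > t_gpu:
        overshoots = pred_cpu < pred_gpu  # CPU would become the faster side
    else:
        overshoots = pred_cpu > pred_gpu  # CPU would become the slower side
    return cpu_share if overshoots else candidate


def run(rule, cpu_share, cpu_rate, gpu_rate, iterations):
    """Simulate iterations with hypothetical constant rates (work units/second)."""
    history = [cpu_share]
    for _ in range(iterations):
        t_cpu = cpu_share / cpu_rate
        t_gpu = (100 - cpu_share) / gpu_rate
        cpu_share = rule(cpu_share, t_cpu, t_gpu)
        history.append(cpu_share)
    return history


# Hypothetical rates whose balanced point (about 18/82) is off the 5% grid.
basic = run(next_share_basic, 25, 1.0, 4.5, 6)
damped = run(next_share_damped, 25, 1.0, 4.5, 6)
print(basic)   # [25, 20, 15, 20, 15, 20, 15]
print(damped)  # [25, 20, 20, 20, 20, 20, 20]
```

With the check in place the division stops at the first grid point from which another step would overshoot, instead of bouncing between the two neighbours of the optimum.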
System Design and Algorithms (cont.)
• Second tier - CPU frequency scaling - Strategy
– On-demand
  • Linux default power-saving strategy
– Initially
  • Run at the lowest frequency (25MHz)
– Utilization rises above the threshold (≥60%)
  • Set to the peak frequency (100MHz)
– Utilization falls below the threshold (<60%)
  • Scale down the frequency step by step
  • 75MHz → 50MHz → 25MHz
[Figure: utilization (0% to 100%, threshold at 60%) and the resulting frequency level (25MHz, 50MHz, 75MHz, 100MHz)]
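The strategy on this slide can be sketched as a small state machine. This is a simplification of the slide's description, not the Linux kernel's actual governor; the frequency levels and threshold are the illustrative values from the slide, and the function name is hypothetical.

```python
LEVELS_MHZ = [25, 50, 75, 100]  # frequency levels shown on the slide
THRESHOLD = 60                  # utilization threshold in percent


def next_frequency(current_mhz, utilization):
    """Jump to peak when busy, otherwise step down one level at a time."""
    if utilization >= THRESHOLD:
        return LEVELS_MHZ[-1]
    index = LEVELS_MHZ.index(current_mhz)
    return LEVELS_MHZ[max(index - 1, 0)]


# Start at the lowest level and feed in a made-up utilization trace.
frequency = LEVELS_MHZ[0]
trace = []
for utilization in [30, 80, 40, 40, 40, 40]:
    frequency = next_frequency(frequency, utilization)
    trace.append(frequency)

print(trace)  # [25, 100, 75, 50, 25, 25]
```

The asymmetry (jump up at once, step down gradually) favours performance: a burst of load is served at peak frequency immediately, while energy is reclaimed only as the load stays low.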
System Design and Algorithms (cont.)
• Second tier - GPU Frequency scaling - Pseudo code
System Design and Algorithms (cont.)
• Second tier - GPU frequency scaling - Loss factor
– Indexed by the interval and by the frequency level
– The number of available frequency levels
– Current utilization (%)
– Most suitable utilization for each frequency level
– Weight between energy and performance
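The formulas on this slide did not survive extraction, so the following is only a sketch of one plausible loss factor built from the listed quantities. It assumes that a frequency level wastes energy when utilization is below the level's most suitable utilization and loses performance when utilization is above it. The symbol `alpha`, the evenly spaced suitable utilizations, and the function name are all hypothetical.

```python
def loss_factor(utilization, suitable_utilization, alpha):
    """Loss in [0, 1] for one frequency level; inputs are percentages.

    alpha weights energy loss against performance loss (assumed form).
    """
    gap = (utilization - suitable_utilization) / 100.0
    if gap > 0:
        energy_loss, performance_loss = 0.0, gap   # level too slow for the load
    else:
        energy_loss, performance_loss = -gap, 0.0  # level faster than needed
    return alpha * energy_loss + (1 - alpha) * performance_loss


# Hypothetical: four levels whose suitable utilizations are evenly spaced.
suitable = [25, 50, 75, 100]
losses = [loss_factor(50, s, alpha=0.5) for s in suitable]
print(losses)  # [0.125, 0.0, 0.125, 0.25]
```

A larger `alpha` penalizes over-provisioned levels more heavily and so biases the choice toward lower frequencies; a smaller one protects performance.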
System Design and Algorithms (cont.)
• Second tier - GPU Frequency scaling - Equations
– Loss factor of core
– Loss factor of memory
– Total loss
– Weight
– Parameters: a weight between core and memory; a weight between total loss and history weight
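Since the equations themselves are missing from the transcript, the sketch below shows one way the four listed quantities could fit together, in the style of a weighted-majority update. It assumes one weight per (core level, memory level) pair, a total loss that blends core and memory loss, and a multiplicative weight decay. The symbols `phi` and `beta` and every function name are hypothetical.

```python
def total_loss(core_loss, memory_loss, phi):
    """Blend core and memory loss; phi weights core against memory (assumed)."""
    return phi * core_loss + (1 - phi) * memory_loss


def update_weight(weight, loss, beta):
    """Decay the history weight by the loss; beta in [0, 1] limits the decay (assumed)."""
    return weight * (1 - (1 - beta) * loss)


def update_table(weights, core_losses, memory_losses, phi, beta):
    """Update the weight of every (core level, memory level) pair."""
    return [
        [
            update_weight(weights[i][j], total_loss(core_losses[i], memory_losses[j], phi), beta)
            for j in range(len(memory_losses))
        ]
        for i in range(len(core_losses))
    ]


def best_pair(weights):
    """Pick the (core level, memory level) pair with the highest weight."""
    pairs = [(i, j) for i in range(len(weights)) for j in range(len(weights[0]))]
    return max(pairs, key=lambda pair: weights[pair[0]][pair[1]])


# Hypothetical 2x2 example: all weights start at 1.0.
weights = [[1.0, 1.0], [1.0, 1.0]]
weights = update_table(weights, core_losses=[0.4, 0.0], memory_losses=[0.0, 0.3], phi=0.5, beta=0.5)
choice = best_pair(weights)
print(weights)
print(choice)  # (1, 0)
```

Because weights only ever decay in proportion to loss, a pair that keeps matching the observed utilization retains the highest weight and is selected for the next interval.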
System Design and Algorithms (cont.)
• Problem: the two tiers affect each other
• Solution
– Decouple the first tier and the second tier
  • Configure the period of the first tier to be much longer than that of the second tier
  • The overhead of the first tier is much higher
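To make the decoupling concrete, the sketch below lays out a timeline in which the second tier runs many times between consecutive first-tier decisions. The 3-second scaling period matches the scaling interval reported in the experiments; the 30-second division period is a made-up value, and in GreenGPU the first tier is tied to iteration boundaries rather than a fixed timer.

```python
def schedule(total_s, scaling_period_s, division_period_s):
    """List (time, action) events for two controllers with different periods."""
    events = []
    for t in range(1, total_s + 1):
        if t % scaling_period_s == 0:
            events.append((t, "scale_frequency"))   # second tier, cheap and frequent
        if t % division_period_s == 0:
            events.append((t, "divide_workload"))   # first tier, costly and rare
    return events


events = schedule(total_s=60, scaling_period_s=3, division_period_s=30)
scaling_runs = sum(1 for _, action in events if action == "scale_frequency")
division_runs = sum(1 for _, action in events if action == "divide_workload")
print(scaling_runs, division_runs)  # 20 2
```

With a tenfold gap in periods, frequency scaling has time to settle under each workload division before the division changes again.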
Experiment
• Experimental environment
– CPU: AMD Phenom II X2
– GPU: NVIDIA 8800GTX
– 2 power supplies
– 2 power meters
  • one for CPU, disk, main memory...
  • one for GPU
– OS: Ubuntu 10.04
Experiment (cont.)
• Benchmarks
– From Rodinia and the NVIDIA SDK
Experiment (cont.)
• Frequency Scaling for GPU Cores and Memory
Benchmark: streamcluster (memory-bound)
Peak frequency of core: 576 MHz
Peak frequency of memory: 900 MHz
Scaling interval: 3 seconds
Experiment (cont.)
• Frequency Scaling for GPU Cores and Memory
avg. energy saving: 5.97%
without idle time, avg. energy saving: 29.2%
CPU+GPU, avg. energy saving: 12.48%
Experiment (cont.)
• Workload Division between CPU and GPU
randomly set the initial division point
Experiment (cont.)
• Using both workload division and frequency scaling
avg. energy saving: 21%
avg. performance loss: 1.7% (longer execution time)
Conclusion
• A holistic energy management framework for CPU-GPU heterogeneous architectures
• Dynamically divides the workload and scales the frequencies
• Improves energy efficiency with only a small performance loss
• Achieves about 21% average energy saving
Thanks