PRACTICAL DYNAMIC THERMAL MANAGEMENT ON INTEL DESKTOP COMPUTER
Guanglei Liu
Department of Electrical and Computer Engineering
Florida International University
July 12, 2012
Major Professor: Dr. Gang Quan
Thermal Design Challenges
Figure from Intel Microprocessor Technology Lab, 2011
Number of transistors keeps increasing
• Nearly 40 billion transistors are integrated into a single die [Mizunuma, ICCAD 2009]
More complicated architectures are built
• An 80-core single-chip processor has been demonstrated by Intel [Vangal, ISSCC 2007]
Environmental concerns
• In the U.S., 46% of electricity is generated from fossil fuels.
Electric Bill
• U.S. data centers: 120 billion kilowatt-hours in 2012
• 9 billion dollars, 15% of all energy in the U.S.
High transistor density increases power density
High power density brings up the on-chip temperatures and causes thermal issues
Source: Environmental Protection Agency (EPA) Report
Thermal Issues
Increase package/cooling costs
• 1-3 dollars per watt [Skadron, ISCA 2003]
• In data centers, each watt spent on computing requires ½ to 1 watt for cooling [Brill, 2007]
Affect reliability
• As much as a 50% reduction in a device's life span for every 10°C increase [Yeo, DAC 2008]
Degrade performance
• 10-15% more circuit delay for each 15°C increase [Santarini, EDN 2005]
Crash the computing system
• The processor's self-protection mechanism automatically shuts down the processor to avoid physical damage [Rohou, WFDO 1999]
Increase leakage power consumption
• Raising the temperature from 65°C to 110°C can increase leakage power by 38% in integrated circuits [Santarini, EDN 2005]
Computing system cooling solutions
Mechanical Cooling Solution
Air-cooling (e.g. fan + heat sink)
• Cooling cost takes 51% of overall server power budget [Lefurgy, COM 2003]
• Noise level increases by 10 dB as fan speed increases by 50% [Lyon, STMMS 2004]
Liquid-cooling
• High-density liquid absorbs 3500 times more heat than air [Chu, DMR 2004]
High cooling cost
Dynamic Thermal Management (DTM)
• Dynamic voltage and frequency scaling (DVFS) technique [Kim, HPCA 2008]
• Task migration [Lim, QED 2002]
• Clock gating [Gunther, ITJ 2001]
• Fetch toggling [Brooks, HPCA 2001]
Sacrifice system performance
Related Theoretical Work
Our Research Goal: To develop a practical hardware platform that enables us to investigate the limitations of the existing theoretical work, and to develop practical and effective DTM techniques that accommodate those limitations
These theoretical works are derived from simplified mathematical thermal models and idealized assumptions
Thermal-aware throughput maximization
[Chantem et al., ISLPED 2009][Zhang et al., ICCAD 2007][Chatha et al., DAC 2010]
Peak temperature minimization
[Chaturvedi et al., ASPDAC 2011][Liu et al., RTAS 2010]
[Qiu et al., ICESS 2010]
Overall energy reduction under peak temperature constraints
[Bao et al., DATE 2010][Andrei et al., DAC 2009][Huang et al., DATE 2011]
Real-time guarantee under peak temperature constraint
[Chaturvedi et al., CIT 2010][Wang et al., RTS 2006]
[Huang et al., RTSS 2009]
Major contributions
Practical hardware platform [SouthEast 2011]
• Intel i5 quad-core
• Linux operating system
Thermal management validation [SUSCOM 2012]
• DTM techniques vs. air-cooling
• DTM vs. DPM algorithm
• Fundamental DTM principles validation
Reactive DTM, single-core [GreenCom 2012]
• Limitations of theoretical works
• Non-constant sampling period
• Thermal profiling analysis
Proactive DTM algorithm, multi-core [DATE 2012] [ASP 2012]
• Neighbor-aware temperature prediction
• Algorithm for multicore with task migration
Practical Hardware Platform
CoreTemp driver
Read on-chip thermal sensor
Lm-sensors Tool
Monitor system information
Cpufreq module
12 different speed levels
Fancontrol shell script
Manually adjust fan speed
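As a rough illustration of how the temperature and speed-level information listed above can be read from user space, the sketch below parses two kinds of Linux sysfs files. It assumes the layout found on recent kernels, where coretemp readings appear as `temp*_input` files under `/sys/class/hwmon/` (in millidegrees Celsius) and the cpufreq speed levels appear in `scaling_available_frequencies` (in kHz, when the driver exposes that file). The paths on the 2.6.23 kernel used here may differ, and the helper names `read_celsius` and `read_available_khz` are hypothetical.

```python
from pathlib import Path


def read_celsius(sensor_file):
    """Read one hwmon temp*_input file (millidegrees Celsius) and return degrees."""
    return int(Path(sensor_file).read_text().strip()) / 1000.0


def read_available_khz(freq_file):
    """Parse a cpufreq scaling_available_frequencies file into a sorted list of kHz values."""
    return sorted(int(token) for token in Path(freq_file).read_text().split())
```

On a machine with these files present, the functions could be called with, for example, a path matching `/sys/class/hwmon/hwmon*/temp*_input` and `/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies`.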
[Figure: platform block diagram. The Intel i5 quad-core runs the SPEC benchmark; temperature capturing, the DVFS technique, fan speed control, and power measurement connect to it; the computing system hardware monitoring tool reports temperature, fan speed, and voltage values.]
Task migration
CPU_affinity module
Migrate process between cores
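Process migration through CPU affinity can be sketched in a few lines. The example below uses Python's `os.sched_setaffinity`, which wraps the Linux affinity system call; it is an illustration of the mechanism, not the platform's actual tooling, and the function name `migrate_to_core` is hypothetical.

```python
import os


def migrate_to_core(pid, core):
    """Pin process `pid` (0 means the calling process) to a single core.

    Returns the affinity set reported by the kernel after the change.
    Linux only: os.sched_setaffinity is not available on other systems.
    """
    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)
```

Restricting the affinity mask to one core makes the scheduler move the process there, which is the effect a migration-based DTM policy relies on.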
Dell Precision T1500 workstation
Linux kernel version 2.6.23
SPEC CPU2000 Benchmark
Integer and floating-point operations
Fluke current clamp, Multimeter
Cooling/ CPU power consumption
Our Approach
Enhanced reactive DTM (ERDTM)
Offline thermal profiling analysis
• Run benchmarks at different speed levels
• Collect the corresponding peak temperatures
• Build a temperature vs. speed lookup table
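One way to use such a lookup table at run time is sketched below: pick the fastest speed level whose profiled peak temperature stays at or below the threshold. This is a minimal sketch of the lookup idea, not the ERDTM algorithm itself; the function name `pick_speed` and the table values in the usage note are hypothetical.

```python
def pick_speed(peak_temp_by_speed, threshold_c):
    """Return the fastest profiled speed whose peak temperature is <= threshold_c.

    peak_temp_by_speed maps a speed level (e.g. MHz) to the peak temperature
    observed offline at that speed. If no speed qualifies, fall back to the
    slowest profiled speed.
    """
    safe_speeds = [s for s, t in peak_temp_by_speed.items() if t <= threshold_c]
    return max(safe_speeds) if safe_speeds else min(peak_temp_by_speed)
```

For example, with a made-up table `{1200: 45.0, 1800: 50.0, 2400: 54.0, 2667: 58.0}` and a threshold of 55.0, the function selects 2400.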
Buffer zone and safe region
Buffer zone:
Safe region:
[Figure: temperature vs. time, with the safe region, the buffer zone, and the levels T_safe and T_THRESHOLD marked]
The maximum possible temperature increment is 4°C
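The slide's definitions of the two regions did not survive extraction, so the sketch below rests on an assumption: that T_safe sits one maximum temperature increment (4°C) below the threshold, with the buffer zone spanning T_safe up to the threshold and the safe region lying below T_safe. The function name `classify` and the region labels are hypothetical.

```python
def classify(temp_c, threshold_c, max_increment_c=4.0):
    """Classify a temperature reading relative to the threshold.

    Assumption (not confirmed by the slides): T_safe = threshold - max_increment.
    """
    t_safe = threshold_c - max_increment_c
    if temp_c < t_safe:
        return "safe region"   # one more increment cannot cross the threshold
    if temp_c <= threshold_c:
        return "buffer zone"   # one more increment could cross the threshold
    return "violation"
```

Under this reading, a controller may run freely in the safe region and needs to act only once a reading enters the buffer zone.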
Experimental results
Experiment setup
• Four identical tasks are assigned to the four cores to simulate a single-core environment
• Temperature threshold is 55°C
• The lookup table is constructed offline
[Table: frequency lookup table]
Number of violations
• FSDTM algorithm: 87
• VS-DTM algorithm: 12
• ERDTM algorithm: 0
DTM algorithm Performance evaluation
[Figure: throughput (%) of FSDTM, VS-DTM, and ERDTM on the SPEC CPU2000 benchmarks galgel, ammp, lucas, equake, vpr, gcc, parser, and crafty]
ERDTM average throughput improvement is 8.1%
Neighbor-aware temperature prediction
Our neighbor-aware prediction
[Equation: neighbor-aware temperature prediction]
The weights in the prediction are obtained offline by collecting training data
• Individual increment factor: the processor's own temperature increment
• Neighbor increment factor: heat transfer from the neighbor processor
Training process
• Run the tasks and record temperature information
• Apply least-squares estimation
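The least-squares step can be illustrated with a two-weight linear model. The prediction equation itself is not preserved in this transcript, so the form below, `increment = a * individual + b * neighbor`, is an assumed stand-in, and the names `fit_weights`, `a`, and `b` are hypothetical. The weights are found by solving the 2x2 normal equations directly.

```python
def fit_weights(samples):
    """Least-squares fit of (a, b) in: increment = a * individual + b * neighbor.

    samples is an iterable of (individual, neighbor, observed_increment) tuples
    recorded during training runs.
    """
    sxx = sxy = syy = sxz = syz = 0.0
    for x, y, z in samples:
        sxx += x * x
        sxy += x * y
        syy += y * y
        sxz += x * z
        syz += y * z
    det = sxx * syy - sxy * sxy
    if det == 0.0:
        raise ValueError("training data cannot separate the two factors")
    a = (sxz * syy - sxy * syz) / det
    b = (sxx * syz - sxy * sxz) / det
    return a, b
```

Feeding in recorded tuples from the training runs yields the weights once, offline; the online predictor then needs only two multiplications per core.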
Neighbor-aware Task Migration
Conventional approach: always migrate the task from the hottest core to the coolest core.
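For comparison, the conventional hottest-to-coolest policy amounts to a max/min selection over the current core temperatures. The function name `hottest_to_coolest` and the sample readings in the usage note are hypothetical.

```python
def hottest_to_coolest(core_temps):
    """Return (source_core, destination_core) for the conventional policy.

    core_temps maps a core id to its current temperature in degrees Celsius.
    """
    hottest = max(core_temps, key=core_temps.get)
    coolest = min(core_temps, key=core_temps.get)
    return hottest, coolest
```

With readings `{0: 52.0, 1: 47.5, 2: 55.0, 3: 49.0}`, the policy moves the task from core 2 to core 1. It ignores where the destination core sits relative to hot neighbors, which is the gap a neighbor-aware strategy targets.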
NADTM Algorithm
Predict thermal emergency
Migrate task
DVFS technique
Heat factor: to evaluate the processor hotness
Increasing factor: to evaluate the temperature increment
Our migration strategy
choose the migration candidate with the minimum
Performance analysis
Single task Multiple task
NADTM algorithm can effectively control the temperature under the threshold
It has a small temperature oscillation of 1°C
An average of 3.6% overall throughput improvement
An average of 5.8% overall throughput improvement
Thank You for Your Attention !
Journals
1. Guanglei Liu, M. Fan, G. Quan, M. Qiu, "On-Line Predictive Thermal Management under Peak Temperature Constraints for Practical Multi-core Platforms", Journal of Low Power Electronics (ASP) (under review), 2012.
2. Guanglei Liu, G. Quan, M. Qiu, "Practical Dynamic Thermal Management on An Intel Desktop Computer", Embedded Software Design, Journal of Sustainable Computing (SUSCOM) (under review), 2012.
3. H. Huang, V. Chaturvedi, Guanglei Liu, G. Quan, "Leakage Aware Scheduling On Maximum Temperature Minimization For Periodic Hard Real-Time Systems", Journal of Low Power Electronics (ASP), 2012.
Peer Reviewed Conferences
1. Guanglei Liu, M. Fan, G. Quan, "Neighbor-Aware Dynamic Thermal Management for Multi-core Platform", The 15th Design, Automation, and Test in Europe (DATE 2012), Dresden, Germany, March 12-16, 2012.
2. Guanglei Liu, G. Quan, M. Qiu, "The Practical On-line Scheduling for Throughput Maximization on Intel Desktop Platform under the Maximum Temperature Constraint", The 2011 IEEE/ACM Green Computing and Communications (GreenCom 2011), Sichuan, China, August 4-5, 2011.
3. Guanglei Liu, G. Quan, "Thermal Aware Scheduling on an Intel Desktop Computer", IEEE SouthEast Conference (SouthEast 2011), Nashville, Tennessee, March 17-20, 2011.
4. Guanglei Liu, J. Fan, "Framework for Statistical Analysis of Homogeneous Multi-core Power Grid Networks", IEEE 8th International Conference on ASIC (ASICON 2009), Changsha, China, October 20-23, 2009.
5. C. Liu, J. Tan, R. Chen, Guanglei Liu, J. Fan, "Thermal Aware Clocktree Optimization in Nanometer VLSI Systems Considering Temperature Variations", IEEE 40th Southeastern Symposium on System Theory (SSST 2008), New Orleans, LA, March 17-18, 2008.