dynamic thermal management in modern processors
![Page 1: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/1.jpg)
Dynamic Thermal Management
in Modern Processors
Shervin Sharifi
PhD Candidate
CSE Department, UC San Diego
![Page 2: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/2.jpg)
Power Outlook
• Vdd scaling will slow down
• Power will increase constantly
• Feature sizes decrease
• Significant increase in power densities
[Figure: Vdd (Volts), 0.0 to 1.2, ideal vs. realistic, 2001 to 2017]
[Figure: Power (Watts), 0 to 1,400, 2001 to 2017]
S. Borkar, "Thousand Core Chips: A Technology Perspective," DAC07 [B. Charlot & K. Torki, TIMA]
![Page 3: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/3.jpg)
Temperature Induced Problems
• Thermal hot spots
– Accelerates failure mechanisms
• Exponentially dependent on temperature
(Electromigration, etc.)
– Performance loss
– Higher leakage power
• Spatial variations
– Performance mismatch,
Clock skew
– Mechanical stress
• Temporal variations
– Thermal cycling
Ajami, et al, ICCAD 01
Meterelliyoz, et al, ITC 2005
www.aitechnology.com
![Page 4: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/4.jpg)
Thermal Management
• Techniques to control the chip temperature
• Power management is not enough for thermal management
– TM techniques are concerned with thermal hotspots and temperature variations
– PM techniques are usually concerned with the overall power consumption
• Off-chip : e.g. Cooling techniques
• On-chip : e.g. Temperature aware task scheduling
• Static : e.g. Temperature-aware floorplanning
• Dynamic : e.g. Thread migration
M. Santarini, EDN, Sep 2005
![Page 5: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/5.jpg)
Dynamic Thermal Management
• Low-power design techniques are not sufficient
• Goals of DTM
– Address thermal hotspots
– Some recent techniques also address spatial and temporal temperature variations
• Design for typical temperature instead of worst case temperature
• Achieve the highest performance under thermal constraints
• DTM techniques respond to thermal emergencies by
– Reducing the heat generation
– Distributing the heat generation
• DTM incurs overhead
– Performance
– Hardware overhead
[Coskun, et al. ASP-DAC 08]
[Figure: stacked bars (0% to 100%) over the temperature bands >85 C, [80,85] C, [75,80) C, and <75 C for Load Balancing, Energy-Aware Optimization, and Thermally-Aware Optimization]
![Page 6: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/6.jpg)
Sequence of Events in DTM
[Figure: temperature vs. time. Events in order: Trigger Reached, Turn DTM On, Check Temp, Check Temp, Turn DTM Off. Intervals: Startup Delay, Policy Delay, Shutoff Delay. Marked levels: DTM Trigger Level, Maximum Temperature with DTM, Maximum Temperature without DTM, and the reduction in max. temp. between them. Performance Loss is marked where execution ends.]
D. Brooks, M. Martonosi, "Dynamic Thermal Management for High-Performance Microprocessors," HPCA01
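The sequence of events above can be sketched as a small state machine. This is a minimal illustration, not the mechanism from the cited paper: the class name, the state names, and the delay values (counted in sampling intervals) are assumptions made for the example.

```python
class ReactiveDTM:
    """Toy model of the trigger / startup / policy / shutoff sequence."""

    def __init__(self, trigger, startup_delay, policy_delay, shutoff_delay):
        self.trigger = trigger              # DTM trigger level (degrees C)
        self.startup_delay = startup_delay  # intervals between trigger and DTM on
        self.policy_delay = policy_delay    # intervals between temperature checks
        self.shutoff_delay = shutoff_delay  # intervals needed to turn DTM off
        self.state = "off"
        self.timer = 0

    def step(self, temp):
        """Advance one sampling interval; return True while the response is engaged."""
        if self.state == "off":
            if temp >= self.trigger:        # trigger reached
                self.state, self.timer = "starting", self.startup_delay
        elif self.state == "starting":
            self.timer -= 1
            if self.timer <= 0:             # startup delay elapsed: turn DTM on
                self.state, self.timer = "on", self.policy_delay
        elif self.state == "on":
            self.timer -= 1
            if self.timer <= 0:             # policy delay elapsed: check temperature
                if temp >= self.trigger:
                    self.timer = self.policy_delay
                else:
                    self.state, self.timer = "stopping", self.shutoff_delay
        elif self.state == "stopping":
            self.timer -= 1
            if self.timer <= 0:             # shutoff delay elapsed: DTM is off
                self.state = "off"
        return self.state in ("on", "stopping")


if __name__ == "__main__":
    dtm = ReactiveDTM(trigger=85.0, startup_delay=1, policy_delay=2, shutoff_delay=1)
    for t in [80, 86, 87, 86, 84, 83, 82, 80]:
        print(t, dtm.step(t))
```

In a real system the temperature trace depends on the response itself; here the trace is supplied from outside only to make the ordering of the delays visible.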
![Page 7: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/7.jpg)
Classification of DTM Techniques
• Software
– Implemented only in software
– Example: Temperature aware scheduling
• Hardware throttling
– Global
• Hardware support only for throttling the whole chip
– Local
• Hardware support for throttling parts of the chip
– Examples: Clock gating
• Hybrid
– Combinations of previous classes
![Page 8: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/8.jpg)
Classification of DTM Techniques
• Software
• Hardware throttling
– Global
– Local
• Hybrid
![Page 9: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/9.jpg)
Temperature Aware Scheduling for a Single-threaded Processor
• OS-level scheduler for a single processor
– Process-level control
– Access to hardware statistics
• Reacts to the thermal emergencies detected by the temperature sensors
• Hot processes are identified based on their CPU activity and slowed down
[Figure: kernel and temperature sensor]
E. Rohou, et al., "Dynamically Managing Processor Temperature and Power", Workshop on Feedback-Directed Optimization, 1999
![Page 10: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/10.jpg)
Temperature Aware Scheduling for a Single-threaded Processor
• Advantages
– No hardware overhead
– Only penalizes the hot processes, the rest run at full speed
– Fine granularity in adjusting the temperature
• Disadvantages
– Limited cooling capability
– Reduces the performance for the most demanding jobs
E. Rohou, et al., "Dynamically Managing Processor Temperature and Power", Workshop on Feedback-Directed Optimization, 1999
![Page 11: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/11.jpg)
Temperature Aware Scheduling for Simultaneous Multithreading Processors
• Assumption
– A program's hotspot behavior is characterized by the intensity of its accesses to the int and fp register files
• Approach
– Picks instructions from the thread that is likely to cool the register files or heat them less quickly
• Hardware performance counters
– To identify int (fp) intensive threads
– To detect thermal danger
• To keep processor temperature below 85°C
– 30% performance increase compared to fetch gating
[Figure: int_reg and fp_reg register files]
J. Donald, et al., "Leveraging Simultaneous Multithreading for Adaptive Thermal Control", Workshop on Temperature-Aware Computing Systems, 2005.
![Page 12: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/12.jpg)
Temperature Aware Scheduling for Simultaneous Multithreading Processors
• Advantages
– No hardware overhead
– Doesn't slow down the whole processor
• Disadvantages
– Limited cooling capability
– Works only for SMT processors
– Reduced thread fairness
J. Donald, et al., "Leveraging Simultaneous Multithreading for Adaptive Thermal Control", Workshop on Temperature-Aware Computing Systems, 2005.
![Page 13: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/13.jpg)
ThermalHerd
• DTM for on-chip networks
• Dynamically steers network traffic to avoid hotspots
– Distributed traffic throttling
– Thermal correlation based traffic routing
• Distributed traffic throttling
– Quota reduces exponentially as temperature rises
Shang et al, “Temperature Aware On-Chip Networks,” IEEE Micro 2006
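One way to picture a quota that shrinks exponentially with temperature is the sketch below. It only illustrates the shape of such a rule; the function name, the trigger temperature, and the "halve every 5 degrees" constant are assumptions for the example and are not taken from ThermalHerd.

```python
def traffic_quota(temp_c, trigger_c=85.0, max_quota=100.0, halving_degrees=5.0):
    """Traffic allowed per interval for one router (hypothetical units).

    Below the trigger the full quota is available. Above it, the quota is
    halved for every `halving_degrees` of additional temperature.
    """
    if temp_c <= trigger_c:
        return max_quota
    return max_quota * 0.5 ** ((temp_c - trigger_c) / halving_degrees)


if __name__ == "__main__":
    for t in (80.0, 85.0, 90.0, 95.0):
        print(t, traffic_quota(t))
```

Because each router evaluates the rule from its own temperature, the throttling stays distributed: no global coordinator is needed to decide who slows down.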
![Page 14: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/14.jpg)
ThermalHerd
• Thermal correlation based routing
• Thermal correlation
– The mutual thermal effect of two units
• Choose minimal path with thermal correlations to hotspot less than a threshold
• Reduced network peak temperature by 10°C
– Throughput degradation of less than 1%
– Latency overhead of less than 1.2%
Shang et al, “Temperature Aware On-Chip Networks,” IEEE Micro 2006
![Page 15: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/15.jpg)
ThermalHerd
• Advantages
– No hardware overhead
– Low performance degradation
• Disadvantages
– Specific to the chips with on-chip networks
– Computation and communication are treated independently, which is not optimal
Shang et al, “Temperature Aware On-Chip Networks,” IEEE Micro 2006
![Page 16: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/16.jpg)
Classification of DTM Techniques
• Response Mechanism
– Software
– Hardware throttling
• Global
• Local
– Hybrid
![Page 17: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/17.jpg)
Global Clock Gating
• The clock signal to the bulk of the processor logic is stopped for a short time period
• Dynamic thermal management for Pentium 4
– Limited to a few microseconds
Gunther, et al. "Managing the impact of increasing microprocessor power consumption." Intel Technology Journal. 2001
Jacobson et al. HPCA 2005
![Page 18: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/18.jpg)
Global Clock Gating
• Advantages
– High cooling capability
• Disables the whole processor
• Eliminates clock tree power
– Low invocation time
• Disadvantages
– Performance impact is high
• Slows down all processes
– Hardware overhead
Gunther, et al. "Managing the impact of increasing microprocessor power consumption." Intel Technology Journal. 2001
![Page 19: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/19.jpg)
Global Dynamic Voltage and Frequency Scaling
• DVFS for DTM
• For DTM, only two voltage steps are enough
– Low voltage → Low frequency but less time to reduce temperature
• To keep temperature below 85°C
– 20-30% slowdown
• Dynamic power: $a \, C_L \, V_{DD}^{2} \, f$
[Figure: frequency and voltage vs. time]
Skadron, et al., "Temperature-Aware Computer Systems: Opportunities and Challenges", IEEE Micro, 2003.
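As a rough illustration of why lowering voltage together with frequency is effective, the sketch below evaluates the dynamic power relation from this slide as a ratio between two operating points. The function name and the 0.8 scaling factors are illustrative assumptions, not values from the talk, and leakage power is ignored.

```python
def relative_dynamic_power(v_scale, f_scale):
    """Dynamic power after scaling, relative to before.

    Assumes the activity factor a and the load capacitance C_L are unchanged,
    so only the V_DD^2 * f part of a * C_L * V_DD^2 * f matters.
    """
    return (v_scale ** 2) * f_scale


if __name__ == "__main__":
    # Scaling frequency alone vs. scaling voltage and frequency together.
    print(relative_dynamic_power(1.0, 0.8))
    print(relative_dynamic_power(0.8, 0.8))
```

Scaling frequency alone reduces dynamic power linearly, while scaling voltage and frequency by the same factor reduces it cubically, which is why DVFS cools faster than frequency scaling alone.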
![Page 20: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/20.jpg)
Global DVFS
• Advantages
– Fast reduction of temperature
– The processor can continue running
• Disadvantages
– All processes are slowed down
– Hardware overhead
![Page 21: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/21.jpg)
Classification of DTM Techniques
• Response Mechanism
– Software
– Hardware throttling
• Global
• Local
– Hybrid
![Page 22: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/22.jpg)
Fetch Gating
• Utilizes ILP to hide the performance impact
• The important decision
– Duty cycle
• ILP helps only with mild duty cycles
• At more aggressive duty cycles, slowdown becomes proportional to the duty cycle
• To keep temperature below 85°C
– 10-20% slowdown
Skadron, et al., "Temperature-Aware Computer Systems: Opportunities and Challenges", IEEE Micro, 2003.
![Page 23: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/23.jpg)
Fetch Gating
• Advantages
– Processor is not disabled completely, available ILP compensates for the wasted cycles
– Low invocation time
– Hardware overhead is relatively low
• Disadvantages
– Moderate cooling capability
– Choice of optimal duty cycle is not easy
Skadron, et al., "Temperature-Aware Computer Systems: Opportunities and Challenges", IEEE Micro, 2003.
![Page 24: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/24.jpg)
Activity Migration
• Computations are migrated to spare units in cold areas of the chip
• Power density is reduced by distributing the heat generation
• Activity ping-ponging
– One unit is disabled when the other one is active
• Register file
– About 12°C Max. temperature reduction with about 2% IPC loss
[Figure: activity ping-ponging between the original unit and the duplicated unit on the die]
Heo, et al., "Reducing Power Density through Activity Migration", ISLPED 2003
![Page 25: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/25.jpg)
Activity Ping-ponging
[Figure: temperature vs. time, marking T1, T2, the migration period, and the reduced temperature; die with the original unit and the duplicated unit]
Heo, et al., "Reducing Power Density through Activity Migration", ISLPED 2003
![Page 26: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/26.jpg)
A Thermal-Aware Superscalar
Microprocessor
• A secondary pipeline
– Architecturally simple
– Ultra low power
• Response
– Clock gates the primary pipeline
– Secondary pipeline takes over
• To keep temperature below 85°C
– At least 11% improvement in energy-CPU time product compared to global DVFS
Lim, et al. "A thermal-aware superscalar microprocessor." ISQED 2002.
![Page 27: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/27.jpg)
Activity Migration
• Advantages
– Effective in reducing local hotspots
– Processor is not disabled completely
• Disadvantages
– Extra hardware for the redundant units
– Longer interconnects lead to higher delays and higher power
– Overhead of copying data to the new unit
![Page 28: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/28.jpg)
Classification of DTM Techniques
• Response Mechanism
– Software
– Hardware throttling
• Global
• Local
– Hybrid
![Page 29: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/29.jpg)
Hybrid Architectural DTM
• Mild Thermal Emergencies
– Fetch Gating: + Enough cooling capability, + Low performance impact
– Global DVFS: - High performance impact
• Severe Thermal Emergencies
– Fetch Gating: - High performance impact
– Global DVFS: + High cooling capability
K. Skadron, “Hybrid architectural dynamic thermal management”, DATE 2004.
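The trade-off on this slide suggests a policy that picks the response by severity. The sketch below is only an illustration of that idea: the function name, the 85°C trigger, and the 2°C margin that separates "mild" from "severe" are assumptions, not parameters from the cited paper.

```python
def choose_response(temp_c, trigger_c=85.0, severe_margin_c=2.0):
    """Pick a DTM response from the current temperature (hypothetical thresholds)."""
    if temp_c < trigger_c:
        return "none"            # no thermal emergency
    if temp_c < trigger_c + severe_margin_c:
        return "fetch_gating"    # mild emergency: enough cooling, low performance impact
    return "global_dvfs"         # severe emergency: high cooling capability


if __name__ == "__main__":
    for t in (80.0, 85.5, 88.0):
        print(t, choose_response(t))
```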
![Page 30: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/30.jpg)
Hybrid Architectural DTM
• Combines
– Local Hardware Throttling
• Fetch gating
– Global Hardware Throttling
• Global DVFS
• To keep temperature below 85°C
– 25% reduction in DTM overhead compared to Global DVFS
K. Skadron, “Hybrid architectural dynamic thermal management”, DATE 2004.
![Page 31: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/31.jpg)
Hybrid Architectural DTM
• Advantages
– Low performance impact
– Fast temperature reduction
– Ability to adjust the response to the severity of thermal emergency
• Disadvantages
– Design complexity
– Hardware support for both fetch gating and DVFS
K. Skadron, “Hybrid architectural dynamic thermal management”, DATE 2004.
![Page 32: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/32.jpg)
HybDTM
• Combines
– Temperature-aware Scheduling
• Mild thermal adjustments and preventing hotspots
– Global clock gating
• Severe thermal emergencies
• Execution time overhead to keep temperature below 65°C
– 10% compared to 20% in global clock gating
[Figure: HybDTM flow with increasing temperature. Labels: Thermal Model (per process temperature, overall temperature); proactive DTM trigger (T > Tsw); reactive DTM trigger (T > Thw); software directed DTM (scheduler, priority management, timeslice adjustment); hardware directed DTM; Level 1, Level 2, Level 3]
Kumar et al., “HybDTM: A Coordinated Hardware-Software Approach for Dynamic Thermal Management”, DAC 2006.
![Page 33: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/33.jpg)
HybDTM
• Advantages
– Low performance impact
– Fast temperature reduction
– Finely adjusts the response to the severity of thermal emergency
• Disadvantages
– Design complexity
Kumar et al., “HybDTM: A Coordinated Hardware-Software Approach for Dynamic Thermal Management”, DAC 2006.
![Page 34: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/34.jpg)
Dynamic Temperature Aware Scheduling
• For multiple processors, there is no optimal dynamic scheduling algorithm for dynamically changing tasks [Liu’73]
– Heuristics are required
• Temperature information is passed to scheduler at each interval
• Scheduler makes decisions based on temperature
– Fast and simple implementation
[Figure: Job A, Job B, and Job C at the OS-level scheduler, with temperature from sensors as an input]
![Page 35: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/35.jpg)
Adaptive-Random Policy
• Goal: To balance workload, minimize & balance temperature with low scheduling complexity
• Updates the probability of sending workload to a core based on its temperature history
• $P_n$: Probability of a core receiving workload
– Evaluated at each job arrival
– Hot (> 80°C): $P_n = 0$
– [75, 80]°C: $P_n = P_o$
– Cool (< 75°C): $P_n = P_o + W$
• $W$: Evaluated periodically
– Interval length: 1 sec
– $W = \beta / Av_{thr}$
– $Av_{thr}$: (Avg. T below $T_{thr}$) / $T_{thr}$
– $T_{thr}$: Threshold temperature
– $\beta$: Empirically set constant (system dependent)
Ayse K. Coskun
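The probability update on this slide can be sketched as follows. This is a minimal reading of the slide, not the authors' implementation: the function names are made up for the example, "Avg. T below Tthr" is assumed to mean the average of the history samples that lie below the threshold, and renormalizing the probabilities is left to the weighted draw.

```python
import random

T_HOT = 80.0    # degrees C, upper band boundary from the slide
T_COOL = 75.0   # degrees C, lower band boundary from the slide


def update_probability(p_old, temp_now, history, t_thr, beta):
    """Return the new probability weight Pn for one core."""
    if temp_now > T_HOT:
        return 0.0                       # hot core: receives no new workload
    if temp_now >= T_COOL:
        return p_old                     # middle band: Pn = Po
    below = [t for t in history if t < t_thr]
    if not below:
        return p_old                     # no usable history yet (assumption)
    av_thr = (sum(below) / len(below)) / t_thr
    return p_old + beta / av_thr         # cool core: Pn = Po + W


def pick_core(weights, rng=random):
    """Draw a core index with probability proportional to its weight."""
    if sum(weights) == 0:
        return rng.randrange(len(weights))   # every core is hot: fall back to uniform
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


if __name__ == "__main__":
    weights = [
        update_probability(0.25, 85.0, [84.0, 85.0], 80.0, 0.07),
        update_probability(0.25, 77.0, [76.0, 77.0], 80.0, 0.07),
        update_probability(0.25, 70.0, [70.0, 70.0], 80.0, 0.07),
    ]
    print(weights, pick_core(weights))
```

Cooler histories give a smaller Avthr and therefore a larger W, so cores that have stayed cool attract more of the incoming workload.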
![Page 36: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/36.jpg)
Adaptive Random
• OS level scheduler for MPSoCs
– Takes floorplan into account
– Addresses temperature variations
• Adaptive Random
• When scheduling is not enough, it is combined with thread migration or local DVFS
• Reduces the hotspots to ~1%
• >10X reduction in temporal variations
• Performance impact
– With thread migration: ~3%
– With DVFS: ~7%
A. Coskun, T. S. Rosing and K. Whisnant. “Temperature Aware Task Scheduling in MPSoCs,” DATE07
![Page 37: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/37.jpg)
Adaptive Random
• Advantages
– Low overhead
– Local control of temperature
– Ability to reduce temperature variations
• Disadvantages
– Design complexity
– Large hardware overhead if local per-core DVFS is used
A. Coskun, T. S. Rosing and K. Whisnant. “Temperature Aware Task Scheduling in MPSoCs,” DATE07
![Page 38: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/38.jpg)
Hybrid Control-theoretic DTM for MPSoCs
• Combines
– Thread migration
• Balancing the heat and optimizing the performance
– Local per-core DVFS
• Fine-grained local adjustments
• To keep temperature below 85°C
– 2.5X instruction throughput compared to distributed clock gating
[Figure: control loop. Labels: temperature setpoints, PI controller, core thermal system, chip thermal system, thermal sensors, thread-core thermal trend, performance requirements, migration controller, migration decisions]
Donald et al., “Techniques for Multicore Thermal Management: Classification and New Exploration,” ISCA 2006.
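To make the PI controller in the per-core DVFS loop concrete, here is a minimal discrete-time sketch. It is not the controller from the cited paper: the class name, the gains, the output range (a frequency scale between 0.2 and 1.0), and the simple anti-windup rule are all assumptions chosen for illustration.

```python
class PIController:
    """Maps a core temperature to a frequency scale factor in [out_min, out_max]."""

    def __init__(self, kp, ki, setpoint, out_min=0.2, out_max=1.0):
        self.kp = kp
        self.ki = ki
        self.setpoint = setpoint    # temperature setpoint (degrees C)
        self.out_min = out_min
        self.out_max = out_max
        self.integral = 0.0

    def step(self, temperature, dt):
        error = self.setpoint - temperature   # negative when the core is too hot
        self.integral += error * dt
        raw = self.out_max + self.kp * error + self.ki * self.integral
        out = min(self.out_max, max(self.out_min, raw))
        if out != raw:
            # Anti-windup: do not accumulate error while the output is saturated.
            self.integral -= error * dt
        return out


if __name__ == "__main__":
    ctrl = PIController(kp=0.05, ki=0.01, setpoint=85.0)
    for temp in (80.0, 90.0, 90.0, 86.0):
        print(temp, round(ctrl.step(temp, dt=1.0), 3))
```

The proportional term reacts to the current overshoot, while the integral term keeps lowering the frequency as long as the core stays above the setpoint.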
![Page 39: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/39.jpg)
Hybrid Control-theoretic DTM for MPSoCs
• Advantages
– Low performance impact
– Distributed local control of temperature
– Formal control theory allows accurate design and provable guarantees for temperature control
• Disadvantages
– Design complexity
– High hardware overhead
Donald et al., “Techniques for Multicore Thermal Management: Classification and New Exploration,” ISCA 2006.
![Page 40: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/40.jpg)
Summary
| Technique | Performance Impact | Cooling Capability | Design Complexity | Hardware Cost |
|---|---|---|---|---|
| Software | Low | Low | Low | None |
| Hardware: Global Throttling | High | High | Low | Low |
| Hardware: Local Throttling | Average | Average-High | Average-High | Low-High |
| Hybrid | Low-Average | Average-High | High | Average-High |
![Page 41: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/41.jpg)
Proactive Dynamic Thermal Management
![Page 42: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/42.jpg)
Reactive vs. Proactive
• Reactive
– Not activated until the thermal emergency
– Typically have high performance impact
• Proactive
– Prevent thermal emergencies
– Part of the normal system operation
![Page 43: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/43.jpg)
Reactive vs. Proactive Management
• Reactive
[Figure: temperature (C), 70 to 90, vs. time]
Ayse K. Coskun
![Page 44: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/44.jpg)
Reactive vs. Proactive Management
• Reactive
– e.g., DVFS, fetch-gating, workload migration, …
• Proactive
[Figure: two plots of temperature (C), 70 to 90, vs. time; the second includes a forecast]
Ayse K. Coskun
![Page 45: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/45.jpg)
Reactive vs. Proactive Management
• Reactive
– e.g., DVFS, fetch-gating, workload migration, …
• Proactive
– Reduce and balance temperature
– Adjust workload, V/f setting, etc.
[Figure: two plots of temperature (C), 70 to 90, vs. time; the second includes the forecast and T after proactive management]
Ayse K. Coskun
![Page 46: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/46.jpg)
Flow
• Temperature data from thermal sensors
• Predictor (ARMA)
– Periodic ARMA model validation & model update
• Temperature at time (tcurrent + tn) for all cores
• SCHEDULER
– Temperature-aware allocation on cores
[Coskun, et al. ISLPED 08]
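The flow above can be sketched with a much simpler predictor than ARMA. The example below swaps in a first-order autoregressive model, AR(1), fitted by least squares on each core's recent samples, and then allocates to the core with the lowest forecast. The function names and the choice of AR(1) are assumptions for illustration; the cited work uses ARMA with periodic validation and model update.

```python
def fit_ar1(samples):
    """Least-squares AR(1) fit on mean-removed samples; returns (mean, phi)."""
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]
    num = sum(a * b for a, b in zip(x[:-1], x[1:]))
    den = sum(a * a for a in x[:-1])
    phi = num / den if den else 0.0
    return mean, phi


def forecast(samples, steps):
    """Predict the temperature `steps` intervals after the last sample."""
    mean, phi = fit_ar1(samples)
    return mean + (phi ** steps) * (samples[-1] - mean)


def pick_core_by_forecast(histories, steps):
    """Index of the core whose forecast temperature is lowest."""
    return min(range(len(histories)), key=lambda c: forecast(histories[c], steps))


if __name__ == "__main__":
    histories = [[78.0, 79.0, 80.0, 81.0], [72.0, 72.0, 72.0, 72.0]]
    print([round(forecast(h, 2), 2) for h in histories])
    print(pick_core_by_forecast(histories, 2))
```

An AR(1) forecast decays toward the mean of the window, so it cannot follow a sustained ramp; that limitation is one reason a richer model with periodic revalidation is attractive.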
![Page 47: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/47.jpg)
Temperature Modeling and Sensing
![Page 48: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/48.jpg)
Thermal model
• Based on the duality between heat flow and electrical phenomena
• Heat flow could be considered as a current passing through a thermal resistance, leading to a temperature difference analogous to voltage
• Thermal Rs and Cs are calculated based on the system characteristics
Duality between Thermal and Electrical Quantities

| Thermal quantity | Electrical quantity |
|---|---|
| Power | Current |
| Temperature Difference | Voltage Difference |
| (Rth) Thermal Resistance | (R) Electrical Resistance |
| (Cth) Thermal Capacitance | (C) Electrical Capacitance |
| (Rth × Cth) Thermal RC Constant | (R × C) Electrical RC Constant |

K. Skadron, et al., Micro 03
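The duality can be illustrated with the smallest possible thermal network: one thermal resistance and one thermal capacitance between a block and the ambient. The sketch below gives the closed-form step response and a forward-Euler simulation of the same node. The function names and all numeric values (20 W, 2 K/W, 0.5 J/K, 45°C ambient) are made up for the example; a real model has many coupled nodes.

```python
import math


def step_response(t, power_w, r_th, c_th, t_amb):
    """Temperature at time t after a power step, for a single RC node."""
    tau = r_th * c_th                          # thermal RC constant (seconds)
    return t_amb + power_w * r_th * (1.0 - math.exp(-t / tau))


def simulate(power_w, r_th, c_th, t_amb, dt, steps):
    """Forward-Euler integration of C_th * dT/dt = P - (T - T_amb) / R_th."""
    temp = t_amb
    for _ in range(steps):
        temp += dt * (power_w - (temp - t_amb) / r_th) / c_th
    return temp


if __name__ == "__main__":
    # Steady state is T_amb + P * R_th, the thermal analogue of V = I * R.
    print(step_response(1.0, 20.0, 2.0, 0.5, 45.0))
    print(simulate(20.0, 2.0, 0.5, 45.0, dt=0.01, steps=2000))
```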
![Page 49: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/49.jpg)
Thermal Model
K. Skadron, et al., ACM TACO 04, Vol. 1, No. 1
![Page 50: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/50.jpg)
Challenges in using thermal sensors
• Limitations in sensor placement
– Sensors may not be placed at locations of interest
– Routing and I/O considerations
• Sensor overhead
– Sensing circuitry
– A/D converter
• Few sensors available
– Silicon real estate
– Low mean time to failure
• Sensor noise
– Inaccuracies due to power variations, sensor degradation, etc.
• Dynamic change of hot spot locations
– Static placement cannot cover all locations
![Page 51: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/51.jpg)
Maximum temperature differences
• Temperature differences were traditionally found by extensive simulations
• Long simulation times
• Workload dependent
• No guarantee on the maximum temperature differences found
• Under what situations does this maximum difference happen?
[Figure: contour map of temperature difference to a location of interest on die; width 0.5 mm, height 0.75 mm]
[Sharifi, et al., TCAD’10]
![Page 52: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/52.jpg)
Sensor placement
• Sensor overhead
• Sensor placement limitations
• Sensor placement error
– Temperature difference between the hotspot and its corresponding sensor
• Sensor placement technique
– Covering all locations of interest guaranteeing a maximum tolerable placement error
– Minimum number of sensors
[Figure: die map with sensors (o) and points of interest (X)]
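Choosing few sensors that still cover every point of interest within the tolerable error resembles a set-cover problem. The sketch below uses the standard greedy set-cover heuristic, which is an approximation and is not claimed to be the placement technique from the talk. The function name and the input format (each candidate location mapped to the set of points it can observe within the tolerable error) are assumptions for the example.

```python
def place_sensors(coverage, points_of_interest):
    """Greedy set cover: repeatedly pick the location covering most uncovered points.

    coverage: dict mapping a candidate sensor location to the set of points of
              interest it observes within the tolerable placement error.
    """
    uncovered = set(points_of_interest)
    chosen = []
    while uncovered:
        best = max(coverage, key=lambda loc: len(coverage[loc] & uncovered), default=None)
        gained = coverage[best] & uncovered if best is not None else set()
        if not gained:
            raise ValueError("some points of interest cannot be covered")
        chosen.append(best)
        uncovered -= gained
    return chosen


if __name__ == "__main__":
    coverage = {
        "s1": {1, 2},
        "s2": {2, 3},
        "s3": {3, 4},
        "s4": {1, 2, 3},
    }
    print(place_sensors(coverage, {1, 2, 3, 4}))
```

Tightening the tolerable error shrinks each coverage set, so more sensors are needed, which matches the intuition that accuracy and sensor count trade off.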
![Page 53: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/53.jpg)
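Sensor Placement Comparison

SOC1

| Accuracy (°C) | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Our technique | 7 | 7 | 6 | 5 | 5 | 5 | 4 |
| Circular range* | 7 | 7 | 7 | 7 | 6 | 6 | 6 |

SOC2

| Accuracy (°C) | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Our technique | 8 | 8 | 8 | 7 | 6 | 6 | 5 |
| Circular range* | 8 | 8 | 8 | 8 | 8 | 8 | 8 |

• Circular range [Lee, et al., ICCD 05]
– Based on exponential dependence of temperature difference on the distance from a location of interest
[Figure: legend entries: potential sensor location; point of interest (X); circular ranges of hotspots 1 and 2 in other techniques; observability areas of hotspots 1 and 2]
[Sharifi, et al., TCAD’10]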
![Page 54: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/54.jpg)
Indirect temperature sensing
• Runtime temperature information is needed
– Few sensors available
– Sensor noise
– Dynamic change of hot spot locations
• Accurate temperature estimates are required at the points other than sensor locations
• Applications
– Small number of deployed sensors due to overhead/placement limitations
– Operational systems with degraded/failed sensors
– Changes in the locations of interest
![Page 55: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/55.jpg)
Accurate Temperature Estimation
[Figure: block diagram. Labels: power consumption values, thermal network, system error sources, thermal sensors, measurement error sources, observed temperature, Kalman filter with Predict (time update) and Correct (measurement update), accurate temperature estimates]
Shervin Sharifi
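The predict/correct cycle in the diagram can be shown with a one-dimensional Kalman filter. This is a deliberately reduced sketch: a single temperature state, a scalar linear model x' = a·x + b·power, and made-up noise variances q (system error) and r (measurement error). Estimating temperatures away from the sensors needs the multi-node thermal network and an observation matrix, which this example leaves out.

```python
def kalman_step(x_est, p_est, power, z, a, b, q, r):
    """One predict/correct cycle of a scalar Kalman filter.

    x_est, p_est: previous temperature estimate and its error variance
    power:        power consumption input for this interval
    z:            observed (noisy) sensor temperature
    a, b:         assumed scalar thermal model, x' = a * x + b * power
    q, r:         system and measurement noise variances (assumed known)
    """
    # Predict (time update)
    x_pred = a * x_est + b * power
    p_pred = a * p_est * a + q
    # Correct (measurement update)
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


if __name__ == "__main__":
    x, p = 60.0, 4.0
    for power, z in [(10.0, 61.2), (12.0, 62.9), (12.0, 63.1)]:
        x, p = kalman_step(x, p, power, z, a=0.9, b=0.7, q=0.05, r=1.0)
        print(round(x, 3), round(p, 3))
```

The gain weighs the model prediction against the sensor reading: a noisy sensor (large r) pulls the estimate toward the model, while an uncertain model (large q) pulls it toward the measurement.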
![Page 56: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/56.jpg)
Accurate Temperature Estimation (Cont’d)
[Sharifi, et al., TCAD’10]
![Page 57: Dynamic Thermal Management in Modern Processors](https://reader031.vdocuments.net/reader031/viewer/2022030214/621e2fdba32f5d2fbd6f9a24/html5/thumbnails/57.jpg)
Questions?