This document is downloaded from DR‑NTU (https://dr.ntu.edu.sg) Nanyang Technological University, Singapore. A Variation‑Aware Adaptive Fuzzy Control System for Thermal Management of Microprocessors Cui, Yingnan; Zhang, Wei; He, Bingsheng 2017 Cui, Y., Zhang, W., & He, B. (2017). A Variation‑Aware Adaptive Fuzzy Control System for Thermal Management of Microprocessors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25(2), 683‑695. https://hdl.handle.net/10356/83221 https://doi.org/10.1109/TVLSI.2016.2596338 © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [http://dx.doi.org/10.1109/TVLSI.2016.2596338]. Downloaded on 20 May 2021 06:23:23 SGT



A Variation-Aware Adaptive Fuzzy Control System for Thermal Management of Microprocessors

Yingnan Cui, Wei Zhang, Bingsheng He

Abstract: Thermal failures pose severe threats to the reliability and performance of modern microprocessors. To deal with these threats, various thermal management techniques have been proposed. For systems with strict temperature constraints, closed-loop thermal controllers are preferred because they offer high control accuracy and fast response. However, most closed-loop thermal controllers proposed in previous studies are sensitive to variations of the system model, which are inevitable due to process variation, sensor noise and environmental influences. In this paper, we propose an adaptive fuzzy controller for the thermal management of microprocessors. The adaptive fuzzy controller has low sensitivity to noise and high adaptivity to system model variations. In the experiments, our adaptive fuzzy controller maintains the control quality when faced with severe variations of the system model. It achieves up to 18.5% better performance and up to 14.2% longer lifespan of the microprocessors when compared with other state-of-the-art thermal controllers.

I. INTRODUCTION

As semiconductor technology steps into the deep-sub-micron era, the power density of microprocessors has nearly reached a prohibitive level [1]. High power density causes many problems, and high processor temperature is one of the most severe. Due to the physical nature of semiconductor devices, high temperature can cause many thermal-related failures in processors [2], [3], [4]. As a result, efficient thermal management techniques are in high demand for modern microprocessors.

Among various thermal management techniques, dynamic voltage and frequency scaling (DVFS) is widely adopted. DVFS-based techniques scale down the voltage and frequency level of a microprocessor during thermal emergencies to prevent the processor from overheating. From the viewpoint of control theory, most DVFS-based thermal management techniques [5], [6], [7] can be categorized as open-loop control systems. However, open-loop control systems suffer from three major disadvantages. First, their response to thermal emergencies is slow. As a result, the processors work under high temperature for a longer time, which harms the reliability and lifespan of the microprocessors. Second, the temperature of processors with open-loop thermal control systems fluctuates severely, which can also cause processors to malfunction [4]. Finally, open-loop thermal control systems cannot guarantee an upper bound on the peak temperature of processors. To avoid thermal failures, open-loop thermal control systems usually set a large margin between the theoretical threshold temperature and the threshold temperature actually applied in the processors. This tighter thermal constraint degrades the performance of the processors.

To achieve higher control quality, closed-loop DVFS-based thermal control systems have been proposed. Backed by control theory, closed-loop thermal control systems provide higher temperature control accuracy, faster response and less temperature fluctuation than open-loop thermal control systems. The first closed-loop thermal controller for microprocessors was proposed in [8], which adopts the PID controller design. As one of the most classic designs in control theory, the PID controller has low complexity and is feasible for a wide range of systems. An optimal controller design for the thermal management of microprocessors is proposed in [9]. The optimal controller achieves the optimal performance of a processor under a temperature constraint.

However, the PID controller and the optimal controller are both sensitive to variations of the thermal model of the processor. When the thermal model of the real processor deviates from the nominal model used during the design phase, the performance of the thermal controller can be significantly affected. In reality, variations of the thermal model of a processor are inevitable and unpredictable. Three major factors lead to variations of the thermal model. Firstly, CMOS thermal sensors implemented within processors usually have a considerable amount of noise [10]. Secondly, in the deep-sub-micron era, process variation brings uncertainty to the dynamic and leakage power model of a processor. Finally, the ambient temperature of the surrounding environment also affects the heat dissipation speed of a processor.

Due to the variations of the thermal model, it is difficult to guarantee the performance of a thermal controller when a model-sensitive design is adopted. However, this challenge can be addressed by using a fuzzy controller. Firstly, the design of a fuzzy controller does not rely on the thermal model of a system. Secondly, according to control theory, the fuzzy controller is inherently more robust. In other words, the fuzzy controller is less sensitive to variations of the thermal model than the PID controller and the optimal controller. Thirdly, the complexity of the fuzzy controller is much lower than that of other advanced controllers with high robustness and adaptiveness, like neural-network-based or Kalman-filter-based controllers. Coskun et al. [11] were the first to apply a fuzzy controller in a 3D-integrated processor, controlling the temperature of the processor through DVFS and micro-fluid cooling. That study does not consider variations of the thermal model, and the fuzzy controller it proposes is not adaptive.

To further deal with the variations of the thermal model, we propose an adaptive fuzzy controller for the thermal management of microprocessors. In our design, the parameters of the controller are automatically adjusted according to the feedback of the temperature measurement to best fit the variations of the thermal model caused by sensor noise, process variation and environmental change. To the best of our knowledge, our work is the first adaptive fuzzy controller for thermal management of microprocessors. In the experiments, our adaptive fuzzy controller maintains the control quality when faced with severe variations of the system model. It achieves up to 18.5% better performance and up to 14.2% longer lifespan of the microprocessors when compared with other state-of-the-art thermal controllers.

Fig. 1: The thermodynamic model for general solid materials.

The rest of this paper is organized as follows: Section II discusses the necessity of adopting a fuzzy controller. Section III gives a panoramic view of the fuzzy control system. Section IV gives a detailed description of our adaptive fuzzy controller. Section V extends the use of the adaptive fuzzy controller to multi-core processors. Section VI evaluates the performance of our fuzzy controller and compares our design with other state-of-the-art thermal controllers. Section VII introduces the studies related to this work. Finally, Section VIII concludes this paper.

II. MOTIVATION

In this section, we briefly introduce the thermal model, power model and process variation model for microprocessors. Based on these models, we build a model for the whole thermal control system of the microprocessor. We then show how the various kinds of variations of the thermal model affect the performance of the controller. Finally, we explain the reasons for adopting the adaptive fuzzy controller in this work.

A. Thermal Model for Microprocessors

The basic equations that describe the thermodynamics of solid materials share the same mathematical form as the equations of electrical circuits. Based on this duality, the thermal model of solid materials can be visualized as an equivalent circuit. Fig. 1 shows the thermal model of a small cube of material in the form of a circuit. For ease of discussion, we assume that all the surfaces of the cube are thermally insulated, except for the surface labeled x. In other words, heat can only pass through surface x. Inside the cube, there is a source that produces heat. The heat source is modeled by the current source in Fig. 1. The heat generated by the source in unit time is denoted as P. Besides, the thermal resistance of the cube is denoted by the resistor R and the thermal capacitance is denoted by the capacitor C in the circuit. According to thermodynamics, the rate of heat flow through a surface depends on the temperature difference between the two sides of the surface. We assume the temperature of the cube is T, and the temperature of the ambient environment is Ta. With the above assumptions, the thermodynamic equation of the cube of material is given by Eq. (1).

Fig. 2: Side view of a microprocessor and its thermal model.

$$ P = \frac{T - T_a}{R} + C \cdot \frac{dT}{dt} \tag{1} $$

From Eq. (1), we can see that the heat generated inside a cube has two destinations. One part of the heat passes through surface x and dissipates into the ambient environment. The amount of heat dissipated in unit time is given by the first term in Eq. (1). The rest of the heat is absorbed by the material in the cube, which raises the temperature of the cube. The second term in Eq. (1) relates the amount of heat absorbed by the cube to the rate of temperature rise. Assume that in Fig. 1 the area of surface x is A, and the thickness of the cube is t. We can compute the thermal resistance and thermal capacitance of the cube according to Eq. (2) and Eq. (3) respectively, where k is the thermal conductivity of the material, and c is its thermal capacity per unit volume. Both k and c are determined by the physical nature of the material.

$$ R = \frac{t}{k \cdot A} \tag{2} $$

$$ C = c \cdot t \cdot A \tag{3} $$

Based on Eq. (1) to Eq. (3), we can build the thermal model for a microprocessor. Fig. 2 shows the side view of a microprocessor with its package. The die of the processor is surrounded by its package. Within the package, there is a metal heat spreader that is directly attached to the die to facilitate heat dissipation. Outside the chip, there is a heatsink to further accelerate heat dissipation. For ease of discussion, we assume the heat produced by the microprocessor can only be dissipated through the heatsink into the air. Based on this assumption, we build a thermal model for the chip, which is represented by the RC circuit in Fig. 2.
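To make the RC analogy concrete, the following minimal sketch evaluates Eq. (2) and Eq. (3) and integrates Eq. (1) with a forward Euler step. The function names, the lumped values of R and C, the power level and the time step are all illustrative assumptions; none of them comes from the paper.

```python
def thermal_resistance(t, k, area):
    """Eq. (2): R = t / (k * A), with k the material constant of Eq. (2)."""
    return t / (k * area)


def thermal_capacitance(c, t, area):
    """Eq. (3): C = c * t * A."""
    return c * t * area


def simulate_temperature(power, r_th, c_th, t_ambient, t_initial, dt, steps):
    """Integrate Eq. (1) with forward Euler and return the final temperature."""
    temp = t_initial
    for _ in range(steps):
        # Eq. (1) rearranged: dT/dt = (P - (T - Ta) / R) / C
        d_temp_dt = (power - (temp - t_ambient) / r_th) / c_th
        temp += d_temp_dt * dt
    return temp


# Hypothetical lumped values for a chip and its package (assumptions).
R_TH = 0.5        # thermal resistance in K/W
C_TH = 10.0       # thermal capacitance in J/K
T_AMBIENT = 45.0  # ambient temperature in degrees Celsius
POWER = 50.0      # constant heat generation in W

# Simulate 100 s, which is 20 time constants (R * C = 5 s).
final_temp = simulate_temperature(POWER, R_TH, C_TH, T_AMBIENT, T_AMBIENT, 0.01, 10_000)

# Setting dT/dt = 0 in Eq. (1) gives the steady state T = Ta + P * R.
steady_state = T_AMBIENT + POWER * R_TH
```

After many time constants the simulated temperature settles at the steady state, which shows why a constant power level maps to a fixed temperature offset above ambient in this model.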


B. Power Model

In a thermal control system, the controller adjusts the power consumption of the microprocessor by changing the voltage level. Therefore, it is necessary to understand the relationship between the voltage level and the power consumption of the processor. We note that in a processor, the functional circuits and memory cells have different power models. Again, for ease of discussion, we only introduce the power model for functional circuits. The power of an IC chip contains two parts: the dynamic power and the static power. The dynamic power is the power consumed by the switching of the gates. The static power is mainly the power consumed by the leakage of the circuit.

Eq. (4) shows the definition of the dynamic power of a chip (denoted by Pd) [12], where C is the capacitance of the load that the circuit drives, Vdd is the supply voltage of the circuit, f is the clock frequency of the circuit, and α is the activity factor that describes the average fraction of gates that switch in the whole circuit. When a microprocessor is at work, α can vary according to the types of instructions that the processor executes.

$$ P_d = \alpha \cdot C \cdot V_{dd}^2 \cdot f \tag{4} $$

The static power of a microprocessor contains three kinds of leakage power: the subthreshold leakage, the gate leakage and the junction leakage. The gate leakage and junction leakage are negligible when compared to the subthreshold leakage [12]. Therefore, we only discuss the subthreshold leakage in this study. According to the model proposed in [13], Eq. (5) defines the subthreshold leakage power of an integrated circuit, denoted by Psub. In the equation, N is the number of gates in the circuit and Isub is the average subthreshold leakage current per gate. The average subthreshold leakage current Isub is defined in Eq. (6) and Eq. (7), where Vgs, Vsb and Vds are the gate-source, source-bulk and drain-source voltages respectively, VT is the zero-biased threshold voltage, Vth is the thermal voltage kT/q, n is the subthreshold swing coefficient, γ′ is the linearized body-effect coefficient, η is the drain induced barrier lowering (DIBL) coefficient, µ0 is the carrier mobility, Cox is the gate capacitance per area, W is the width of the gate and Leff is the effective length of the gate. According to the equations, the average subthreshold leakage current per gate is affected by the supply voltage Vdd, the chip temperature T and the geometry characteristics of the CMOS gates.

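$$ P_{sub} = N \cdot I_{sub} \cdot V_{dd} \tag{5} $$

$$ I_{sub} = A \cdot e^{\frac{V_{gs} - V_T - \gamma' V_{sb} + \eta V_{ds}}{n V_{th}}} \cdot \left(1 - e^{-\frac{V_{ds}}{V_{th}}}\right) \tag{6} $$

$$ A = \mu_0 \cdot C_{ox} \cdot \frac{W}{L_{eff}} \cdot V_{th}^2 \cdot e^{1.8} \tag{7} $$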
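The power model of Eq. (4) to Eq. (7) can be sketched directly as functions. This is only an illustration: the parameter names and every numeric device value below are hypothetical, and the mobility is held constant with temperature, which is a simplification made here and not a claim of the paper.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def dynamic_power(alpha, c_load, vdd, freq):
    """Eq. (4): Pd = alpha * C * Vdd^2 * f."""
    return alpha * c_load * vdd ** 2 * freq


def thermal_voltage(temp_kelvin):
    """Vth = kT/q, the thermal voltage used in Eq. (6) and Eq. (7)."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def subthreshold_current(vgs, vsb, vds, temp_kelvin, *, vt0, gamma, eta,
                         n_swing, mu0, cox, width, leff):
    """Eq. (6) and Eq. (7): average subthreshold leakage current per gate."""
    vth = thermal_voltage(temp_kelvin)
    a = mu0 * cox * (width / leff) * vth ** 2 * math.exp(1.8)        # Eq. (7)
    exponent = (vgs - vt0 - gamma * vsb + eta * vds) / (n_swing * vth)
    return a * math.exp(exponent) * (1.0 - math.exp(-vds / vth))     # Eq. (6)


def subthreshold_power(n_gates, i_sub, vdd):
    """Eq. (5): Psub = N * Isub * Vdd."""
    return n_gates * i_sub * vdd


# Hypothetical device parameters, chosen only to exercise the formulas.
DEVICE = dict(vt0=0.3, gamma=0.1, eta=0.1, n_swing=1.5,
              mu0=0.04, cox=0.02, width=2e-7, leff=1e-7)

# An "off" gate (Vgs = 0) with the full supply across drain and source.
i_cool = subthreshold_current(0.0, 0.0, 1.0, 300.0, **DEVICE)
i_hot = subthreshold_current(0.0, 0.0, 1.0, 350.0, **DEVICE)
p_leak = subthreshold_power(1_000_000, i_hot, 1.0)
```

Under these assumptions the leakage current grows with temperature, which is the coupling between temperature and static power that the control system model has to account for.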

C. Process Variations

Fig. 3 shows the cross-section view of an nMOS transistor. The geometry parameters of a gate differ from the designed values due to process variation. Variations in factors like gate length (L), gate oxide thickness (tox) and channel doping (Nsub) affect the electrical characteristics of the transistor. As the feature size of CMOS circuits shrinks, the influence of process variation is becoming more and more important.

Fig. 3: Cross-section view of nMOS transistor.

Fig. 4: The block diagram for the thermal control system.

Due to short-channel effects, the leakage power of a transistor is most sensitive to process variation. In studies like [14], the distribution of the geometry parameters of the transistors is usually assumed to be Gaussian. The average change in the leakage power can be evaluated through Monte Carlo simulations using Hspice. Experimental results in [3] show that when 10% variation is assumed for the three parameters (L, tox and Nsub), the average subthreshold leakage current (Isub) changes by up to 31.1%.

The statistical model for process variation is too complicated for the analysis and experiments in this study. Therefore, we simply represent the influence of process variation on the subthreshold leakage power using a parameter δ. We assume the subthreshold leakage power of a microprocessor is δPsub, where Psub is the nominal subthreshold leakage power without process variation.

D. Control System Model

The block diagram of a closed-loop thermal control system for a microprocessor is shown in Fig. 4. The mechanism of the system is as follows. First we set a threshold temperature for the microprocessor, denoted as Tth. The controller then decides the supply voltage level Vdd according to the difference between the system temperature, denoted by T, and the threshold temperature. The voltage level determines the power consumption of the processor, including both the dynamic power and the static power. Finally, according to the thermal model shown in Section II-A, the power consumption determines the temperature of the processor. According to Eq. (6), the temperature also affects the leakage power of the processor. This relationship is shown by the inner loop between the static power and the thermal model in Fig. 4. Fig. 4 also identifies the variations of the model. Firstly, the temperature sensor of the processor contains a noise signal, denoted by N. Secondly, process variation has a significant yet unpredictable influence on the static power. Finally, the ambient temperature Ta may vary due to environmental change.

The system shown in Fig. 4 is a non-linear system, which makes it difficult to analyze within limited pages. For ease of discussion, we linearize the system model. The nonlinearity of the model comes from the power model. According to Eq. (4), the dynamic power model is linearized as follows. We use a new variable X to denote the output of the controller, where X = Vdd^2. In Fig. 4, the dynamic power model block then becomes a simple proportional module, whose proportional parameter is denoted by K1. For the static power, we linearize the model by applying a first-order Taylor expansion to Eq. (6) around the voltage level V0. The result of the Taylor expansion is shown in Eq. (8). We note that Psub depends on temperature. But in a closed-loop thermal control system, the temperature of the processor is stable around the threshold temperature. Therefore, we assume the temperature is a constant in the leakage power model. For modern processors, Vdd is usually smaller than 1 V [1], so it is reasonable to set V0 = 0 in the Taylor expansion. According to Eq. (6), when Vdd = 0, Isub = 0. Then, by substituting Eq. (8) for Isub in Eq. (5), we get Eq. (9). By using the variable X to denote the controller output, we transform the static power model into another proportional module, whose proportional parameter is denoted by K2.

$$ I_{sub} \approx I_{sub}(V_0) + I'_{sub}(V_0) \cdot (V_{dd} - V_0) \tag{8} $$

$$ P_{sub} = N \cdot I'_{sub}(0) \cdot V_{dd}^2 \tag{9} $$
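As a rough illustration of the linearization, the two proportional parameters can be read off Eq. (4) and Eq. (9) once the controller output is X = Vdd^2. The sketch below assumes that α, C and f are held constant, so that K1 = α · C · f, and it estimates the slope of Isub at Vdd = 0 with a central difference; the helper names and the toy leakage curve are hypothetical.

```python
def derivative_at(func, x0, step=1e-6):
    """Central-difference estimate of func'(x0)."""
    return (func(x0 + step) - func(x0 - step)) / (2.0 * step)


def linearized_gains(alpha, c_load, freq, n_gates, isub_of_vdd):
    """Return (K1, K2) such that Pd = K1 * X and Psub = K2 * X, with X = Vdd^2."""
    k1 = alpha * c_load * freq                       # from Eq. (4)
    k2 = n_gates * derivative_at(isub_of_vdd, 0.0)   # from Eq. (9)
    return k1, k2


def toy_isub(vdd):
    """Hypothetical leakage curve with Isub(0) = 0, used only for illustration."""
    return 2e-9 * vdd


k1, k2 = linearized_gains(alpha=0.1, c_load=1e-9, freq=1e9,
                          n_gates=1_000_000, isub_of_vdd=toy_isub)
```

With both power blocks reduced to gains, the total power is (K1 + K2) · X, which is the form used in the transfer function model.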

After the linearization, we apply the Laplace transform to the system model and obtain the new system model shown in Fig. 5. In control theory, applying the Laplace transform to the system model yields the so-called transfer function model. The convenience of the transfer function model is that we can build the system model by directly multiplying the transfer functions of the blocks along the signal path in the block diagram of the system. For instance, the closed-loop system model of Fig. 5 is as follows:

$$ (T_{th}(s) - T(s)) \cdot H(s) \cdot (K_1 + K_2) \cdot G(s) = T(s) \tag{10} $$

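Collecting the T(s) terms on one side and dividing by Tth(s), we can further transform this into the following closed-loop transfer function:

$$ \frac{T(s)}{T_{th}(s)} = \frac{(K_1 + K_2) \cdot H(s) \cdot G(s)}{1 + (K_1 + K_2) \cdot H(s) \cdot G(s)} \tag{11} $$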

The transfer function describes the characteristics of a system in the frequency domain. The variable s in the transfer function represents the (complex) frequency of a signal. A well-performing and stable closed-loop thermal control system should meet the following two criteria. Firstly, the system should be able to maintain the temperature of the system around the threshold temperature. Secondly, the temperature of the system should not be sensitive to the noise in the system. To achieve these goals, we want the transfer function of the system to have the following features. Firstly, when s is small, we should make (K1 + K2) · H(s) · G(s) ≫ 1, because when we apply the Laplace transform to the input signal of the system, which is the threshold temperature, we find that the input signal is mainly composed of low-frequency components. According to Eq. (11), when (K1 + K2) · H(s) · G(s) ≫ 1, T(s)/Tth(s) ≈ 1. That means the output temperature is approximately equal to the input signal (the threshold temperature). Secondly, when s is large, we should make (K1 + K2) · H(s) · G(s) ≪ 1. This is because noise usually contains more high-frequency components, and we do not want the temperature of the system to track these high-frequency signals.

Fig. 5: The transfer function model for the thermal control system.
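The two loop-gain requirements can be checked numerically by evaluating the closed-loop ratio (K1 + K2) · H(s) · G(s) / (1 + (K1 + K2) · H(s) · G(s)) at s = jω. In the sketch below, G(s) = R / (1 + sRC) is what the Laplace transform of Eq. (1) gives for the output T − Ta with zero initial conditions, and H(s) is a PI controller chosen purely as an illustrative stand-in. All numeric values are hypothetical.

```python
def plant(s, r_th, c_th):
    """G(s) = R / (1 + s*R*C), the Laplace transform of the RC model in Eq. (1)."""
    return r_th / (1.0 + s * r_th * c_th)


def pi_controller(s, kp, ki):
    """H(s) = kp + ki / s; an assumed controller, not the paper's design."""
    return kp + ki / s


def closed_loop(s, k1, k2, kp, ki, r_th, c_th):
    """Closed-loop ratio L / (1 + L) with L = (K1 + K2) * H(s) * G(s)."""
    loop_gain = (k1 + k2) * pi_controller(s, kp, ki) * plant(s, r_th, c_th)
    return loop_gain / (1.0 + loop_gain)


# Hypothetical parameters (assumptions made for this sketch).
PARAMS = dict(k1=1.0, k2=1.0, kp=2.0, ki=1.0, r_th=0.5, c_th=100.0)

low_freq_gain = abs(closed_loop(1j * 1e-4, **PARAMS))   # slow threshold changes
high_freq_gain = abs(closed_loop(1j * 1e3, **PARAMS))   # fast sensor noise
```

With these values the low-frequency magnitude is close to 1, so the temperature tracks the threshold, while the high-frequency magnitude is close to 0, so fast noise is largely ignored.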

Based on the above discussion, we can deduce how variation in the thermal model affects the performance of the system. From Fig. 5, we can see that among the three major sources of variation, process variation directly changes the transfer function of the system. Therefore, we first discuss the influence of process variation.

We assume that for the system without variations, a perfectly designed thermal controller has the transfer function denoted by H(s). Process variation changes the parameter K2 in Fig. 5 to a new value K2′. The influence of this change is as follows. According to the previous discussion, if K2′ > K2, the loop gain increases at all frequencies, so the system becomes more sensitive to noise than the original system. If K2′ < K2, the loop gain decreases, so the system becomes less accurate in tracking the input signal of the system.

According to previous studies, in 65 nm or smaller technologies, the static power of a chip is nearly equal to the dynamic power [15], which means that K1 ≈ K2 in the model shown in Fig. 5. According to previous studies [16], process variation can change K2 by more than 50%. In that case, a controller designed without considering variations in the system model can malfunction in real processors.
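The effect of a shifted K2 can be illustrated with the same closed-loop ratio. The sketch below assumes a purely proportional controller H(s) = kp and the first-order plant G(s) = R / (1 + sRC), and scales K2 by ±50% around K1 = K2. These choices and all numeric values are assumptions made for illustration only.

```python
def closed_loop_magnitude(omega, k1, k2, kp, r_th, c_th):
    """|L / (1 + L)| at s = j*omega, with L = (K1 + K2) * kp * R / (1 + s*R*C)."""
    s = 1j * omega
    loop_gain = (k1 + k2) * kp * r_th / (1.0 + s * r_th * c_th)
    return abs(loop_gain / (1.0 + loop_gain))


K1 = 1.0
K2_NOMINAL = 1.0   # K1 is roughly equal to K2, as stated in the text
tracking = {}      # low-frequency magnitude: how closely T follows Tth
noise = {}         # high-frequency magnitude: how strongly noise passes through
for scale in (0.5, 1.0, 1.5):
    k2 = scale * K2_NOMINAL
    tracking[scale] = closed_loop_magnitude(1e-4, K1, k2, 5.0, 0.5, 100.0)
    noise[scale] = closed_loop_magnitude(1e3, K1, k2, 5.0, 0.5, 100.0)
```

In this sketch a smaller K2 lowers the low-frequency magnitude (worse tracking), and a larger K2 raises the high-frequency magnitude (more noise reaches the temperature), which matches the qualitative argument above.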

The influence of the variation in ambient temperature and the sensor noise can be explained as follows. In the system model, we now assume the output of the system is T − Ta. This assumption does not change the transfer function of the system because in Eq. (1), the derivative of T − Ta is the same as that of T. With the new assumption, our system model changes to Fig. 6. In the new model, the input of the system is Tth − Ta, which means that the output of the system must track Tth − Ta. We denote the variation of Ta by ∆Ta. Adding ∆Ta to the sensor noise N, we get a combined noise signal ∆Ta + N. The influence of this combined noise signal on the system depends on the sensitivity of the system model to noise. When the system model changes due to process variation, the influence of this noise signal also changes.

Fig. 6: The system model with T − Ta as output.

The above discussions are all based on the linearized system model. In reality, the nonlinearity of the system model could worsen the case. Therefore, an adaptive controller is required to deal with the variations of the system model. In addition, the variations of the system model are nearly impossible to measure accurately at run time, which makes model-based controllers like the PID controller and the optimal controller very difficult to design. The design of the fuzzy controller does not require an accurate system model. As a result, an adaptive fuzzy controller could be an ideal solution for microprocessors with various sources of variation.

Fig. 7: The standard structure of a fuzzy controller.

III. PRELIMINARIES OF FUZZY CONTROLLERS

Fuzzy control theory is based on fuzzy set theory, which was proposed by L. A. Zadeh in 1965. As the word “fuzzy” indicates, fuzzy set theory is designed to describe problems with vagueness, where traditional mathematical theories are difficult to apply. Based on fuzzy set theory, the fuzzy controller imitates the reasoning process of a human.

Fig. 7 shows the standard structure of the fuzzy controller. The fuzzy controller is composed of three major modules: the fuzzification module, the inference engine and the defuzzification module. The fuzzification module converts the input variables into so-called linguistic variables, which are sent to the inference engine. The inference engine decides the output of the controller according to the fuzzy logic rules and the membership functions. The output of the inference engine is also a linguistic variable. The defuzzification module converts the output linguistic variable into a numeric variable, which is the final output value of the fuzzy controller.

In a fuzzy controller, all the operations are related to the linguistic variables. As indicated by the name, the values of a linguistic variable are a group of words that describe amount or size, like “High” and “Low”. Each possible value of a linguistic variable is assigned a membership function, which shows how accurately that value describes the related numeric variable. For example, assume there is a linguistic variable denoted by A, which is used to evaluate the value of a numerical variable α. Assume A has two linguistic values: “High” and “Low”. Fig. 8 shows the membership functions of the linguistic variable A. For each possible value of α, the membership function defines the membership value of α for the related value of the linguistic variable. When α < α1, the membership of α for “High” is 0 and the membership for “Low” is 1. In this case, we can say that α is 0% “High” and 100% “Low”. When α = α2, we can say that α is 50% “High” as well as 50% “Low”.

Fig. 8: The membership functions for linguistic variable A.

Fig. 9: Inference process of a fuzzy controller.
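The fuzzification of α into “High” and “Low” can be sketched with piecewise-linear membership functions. The exact shapes in Fig. 8 are not recoverable from the text, so this sketch assumes a linear ramp between α1 and α3 with α2 at the midpoint, and it assumes the two memberships sum to 1; the breakpoints reuse the 0 °C and 40 °C values from the temperature illustration purely as placeholders.

```python
def membership_high(alpha, alpha1, alpha3):
    """Membership of alpha in "High": 0 below alpha1, 1 above alpha3, linear between."""
    if alpha <= alpha1:
        return 0.0
    if alpha >= alpha3:
        return 1.0
    return (alpha - alpha1) / (alpha3 - alpha1)


def membership_low(alpha, alpha1, alpha3):
    """Membership of alpha in "Low", assumed complementary to "High"."""
    return 1.0 - membership_high(alpha, alpha1, alpha3)


def fuzzify(alpha, alpha1, alpha3):
    """Convert a numeric value into the memberships of the linguistic variable A."""
    return {
        "High": membership_high(alpha, alpha1, alpha3),
        "Low": membership_low(alpha, alpha1, alpha3),
    }


# Hypothetical breakpoints: alpha1 = 0, alpha3 = 40, so alpha2 = 20 is the midpoint.
ALPHA1, ALPHA3 = 0.0, 40.0
below = fuzzify(-5.0, ALPHA1, ALPHA3)     # alpha < alpha1
midpoint = fuzzify(20.0, ALPHA1, ALPHA3)  # alpha = alpha2
```

Under these assumptions a value below α1 is entirely “Low”, and the midpoint α2 belongs equally to both values.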

The nature of the linguistic variable makes it capable of describing things with vagueness. For instance, when we talk about temperature in our daily lives, we do not have a precise standard for defining high temperature and low temperature. In extreme cases, such as when the temperature is over 40°C or under 0°C, we have clear notions of high and low. But for temperatures within this range, the definition of high and low becomes vague. The linguistic variable quantifies this vagueness through the membership function and thus provides a tool for converting the human thinking process into mathematical operations.

After the fuzzification module converts the input of the fuzzy controller into a linguistic variable, the inference engine decides the output of the fuzzy controller. The inference process is based on the fuzzy logic rules, which are a group of conditional statements defining the output of the controller under different input values. The input and output are both represented by linguistic variables. For example, assume that in a fuzzy controller, the input linguistic variable is A and the output linguistic variable is B. The possible values of A are “High” and “Low”, and the possible values of B are “Big” and “Small”. Then a possible set of fuzzy logic rules for this controller could be as follows:
• If A is “High”, then B is “Big”.
• If A is “Low”, then B is “Small”.

For ease of discussion, we continue to use this example to introduce the inference engine of the fuzzy controller. Assume that at a given time, the input of the fuzzy controller α has the value α∗. The fuzzification module converts the input α∗ to a linguistic variable. The values of the membership functions of the linguistic variable are denoted by MHigh(α∗) (for “High”) and MLow(α∗) (for “Low”). Based on the fuzzy logic rules, the inference engine changes the membership functions of B as defined in Eq. (12) and Eq. (13). In Eq. (12), M∗Big(β) is the membership function of the value “Big” after inference and MBig(β) is the original membership function. Similarly, in Eq. (13), M∗Small(β) is the new membership function and MSmall(β) is the original one. Fig. 9 shows the inference process. The new membership functions of B are represented by the shaded areas.

Fig. 10: The structure of the self-adaptive fuzzy control system for thermal management of microprocessors.

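$$ M^*_{Big}(\beta) = \begin{cases} M_{Big}(\beta), & M_{Big}(\beta) < M_{High}(\alpha^*) \\ M_{High}(\alpha^*), & M_{Big}(\beta) \geq M_{High}(\alpha^*) \end{cases} \tag{12} $$

$$ M^*_{Small}(\beta) = \begin{cases} M_{Small}(\beta), & M_{Small}(\beta) < M_{Low}(\alpha^*) \\ M_{Low}(\alpha^*), & M_{Small}(\beta) \geq M_{Low}(\alpha^*) \end{cases} \tag{13} $$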
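Eq. (12) and Eq. (13) both clip an output membership function at the membership value of the matching input, which is equivalent to taking a pointwise minimum. A minimal sketch follows; the ramp shapes chosen for “Big” and “Small” and the input membership values are hypothetical.

```python
def clip_membership(membership, firing_strength):
    """Return M*(beta) = min(M(beta), firing_strength), as in Eq. (12) and Eq. (13)."""
    def clipped(beta):
        return min(membership(beta), firing_strength)
    return clipped


def m_big(beta):
    """Hypothetical membership for "Big": a ramp from 0 to 1 on [0, 1]."""
    return min(max(beta, 0.0), 1.0)


def m_small(beta):
    """Hypothetical membership for "Small": the complement of "Big"."""
    return 1.0 - m_big(beta)


# Assumed fuzzification result for the input alpha* (illustrative values).
M_HIGH_AT_INPUT = 0.7
M_LOW_AT_INPUT = 0.3

# Rule "If A is High, then B is Big" and rule "If A is Low, then B is Small".
m_big_star = clip_membership(m_big, M_HIGH_AT_INPUT)
m_small_star = clip_membership(m_small, M_LOW_AT_INPUT)
```

Each clipped function follows the original membership function where it is below the input membership value and is flat at that value elsewhere, which corresponds to the shaded areas described for Fig. 9.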

The output of the inference engine is the linguistic variable B with the new membership functions M∗Big(β) and M∗Small(β). The defuzzification module converts B into a numeric value, which is the final output of the fuzzy controller. A commonly used defuzzification method is the center of gravity (COG) method, which uses the horizontal coordinate of the center of gravity of the shaded area shown in Fig. 9 as the output of the fuzzy controller. Assume the outline of the shaded area is defined by a function U(β). Then the output of the fuzzy controller, β∗, given by the COG method is defined in Eq. (14).

$$ \beta^* = \frac{\int U(\beta) \cdot \beta \, d\beta}{\int U(\beta) \, d\beta} \tag{14} $$
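Eq. (14) can be approximated numerically. The sketch below uses the midpoint rule over a bounded range of β and takes the outline U(β) to be the pointwise maximum of the two clipped membership functions, which is a common aggregation choice and an assumption here. The ramp shapes, the clipping levels 0.7 and 0.3, and the range [0, 1] are hypothetical.

```python
def center_of_gravity(outline, beta_min, beta_max, steps=1000):
    """Approximate Eq. (14) with the midpoint rule on [beta_min, beta_max]."""
    width = (beta_max - beta_min) / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        beta = beta_min + (i + 0.5) * width
        u = outline(beta)
        numerator += u * beta * width
        denominator += u * width
    if denominator == 0.0:
        raise ValueError("the outline has zero area, so the COG is undefined")
    return numerator / denominator


def outline(beta):
    """U(beta): assumed union (maximum) of two clipped membership functions."""
    big_star = min(beta, 0.7)           # ramp "Big" clipped at 0.7
    small_star = min(1.0 - beta, 0.3)   # ramp "Small" clipped at 0.3
    return max(big_star, small_star)


beta_star = center_of_gravity(outline, 0.0, 1.0)
```

Because “Big” fires more strongly than “Small” in this example, the resulting β∗ lies above the middle of the output range.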

IV. SELF-ADAPTIVE FUZZY CONTROLLER DESIGN

As discussed in Section II, we have three major reasons to use an adaptive fuzzy controller in this paper. Firstly, factors like temperature sensor noise, process variation and the ambient temperature make it difficult to build an accurate thermal model for the microprocessor. Compared with model-dependent controllers like the PID controller and the optimal controller, the fuzzy controller does not rely on the model of the system. Secondly, the fuzzy controller is highly robust. It is less sensitive to noise in the system than many other kinds of controllers. Finally, the variations of the thermal model of microprocessors require the thermal controller to be adaptive.

Fig. 10 shows the structure of the closed-loop thermal control system with the proposed adaptive fuzzy controller. The closed-loop system works as follows. Assume the current processor temperature is Y. The temperature measured by the thermal sensor, denoted as Ym, is compared with the threshold temperature, denoted by Yth. According to the difference between Ym and Yth, denoted as T, the fuzzy controller decides the frequency level of the processor, denoted as F. The DVFS control module regulates the frequency and voltage level of the processor according to F. This changes the power profile of the microprocessor and therefore controls the temperature of the chip.

In the process, the adaptive module keeps track of T and Ym to detect the variations in the output responses of the microprocessors. By analyzing T and Ym, the self-adaptive module decides how to set the parameters in the fuzzy controller to adapt to the variations in thermal-related parameters of the microprocessor. In designing the self-adaptive control system, we have two objectives. First, the control quality should be guaranteed; second, the complexity of the controller should be minimized. The following two sections introduce our design of the fuzzy controller and its self-adaptive module, respectively.

A. Fuzzy Controller Design

The basic structure of the fuzzy controller is introduced in Section III. We design the fuzzy controller in the following order. Firstly, we decide the input and output of the controller. Secondly, we design the linguistic variables and the related membership functions for the input and the output of the controller. Thirdly, we design the fuzzy logic rules. Finally, we select the best parameters for the fuzzy controller.

As shown in Fig. 10, the difference between the measured processor temperature and the temperature threshold, T, should be an input of the fuzzy controller. In addition, we use the rate of temperature change, denoted by dT, as another input for the fuzzy controller. dT reflects the trend of the processor temperature. Using dT as an additional input, the controller can predict the temperature change and take proactive steps to avoid thermal emergencies. The output of the fuzzy controller is the frequency level of the processor, denoted by F. F is normalized to the maximum frequency level of the system.

The linguistic variables and the related membership functions of the fuzzy controller are shown in Fig. 11. Each linguistic variable is assigned three values. For the input linguistic variables, three is the minimum number of linguistic values that covers all the fundamental situations. Increasing the number of values for each linguistic variable could significantly increase the complexity of the system. For the membership functions of the linguistic variables, we adopt the symmetric triangular function, which is one of the most commonly used kinds of membership functions. T∗ in Fig. 11a and dT∗ in Fig. 11b can be adjusted to achieve better performance of the system. Fmin in Fig. 11c is the minimal frequency level of the processor. Table I shows the fuzzy logic rules for the controller, which are designed by “common sense”. For the defuzzification module, we adopt the COG defuzzification method mentioned in Section III.
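The controller described above (two inputs, three values per variable, the nine rules of Table I, COG defuzzification) can be sketched as follows. This is an illustrative sketch under several assumptions that the text does not fix: rule firing strength is taken as the minimum of the two input memberships, inputs beyond ±T∗ or ±dT∗ are saturated, the output sets are centered at Fmin, the midpoint of [Fmin, 1], and 1, and the name `fuzzy_frequency` is hypothetical.

```python
def tri(x, center, half_width):
    """Symmetric triangular membership function."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


def fuzzify(x, x_star):
    """Memberships of x in the (low, middle, high) sets; x is saturated at +/-x_star."""
    x = max(-x_star, min(x_star, x))
    return tri(x, -x_star, x_star), tri(x, 0.0, x_star), tri(x, x_star, x_star)


# Table I as RULES[T index][dT index] -> F index.
# T: 0 Low, 1 Normal, 2 High; dT: 0 Negative, 1 Zero, 2 Positive;
# F: 0 Low, 1 Medium, 2 High.
RULES = [
    [2, 2, 1],
    [2, 1, 0],
    [1, 0, 0],
]


def fuzzy_frequency(t, dt, t_star, dt_star, f_min, steps=400):
    """Return the normalized frequency level F for inputs T and dT."""
    m_t = fuzzify(t, t_star)
    m_dt = fuzzify(dt, dt_star)
    # Firing level of each output value: max over rules of min(antecedents).
    fire = [0.0, 0.0, 0.0]
    for i in range(3):
        for j in range(3):
            k = RULES[i][j]
            fire[k] = max(fire[k], min(m_t[i], m_dt[j]))
    # Clip the output sets and take the center of gravity over [f_min, 1].
    half = (1.0 - f_min) / 2.0
    centers = (f_min, f_min + half, 1.0)
    df = (1.0 - f_min) / steps
    num = den = 0.0
    for n in range(steps):
        f = f_min + (n + 0.5) * df
        u = max(min(tri(f, c, half), level) for c, level in zip(centers, fire))
        num += u * f
        den += u
    return num / den
```

For example, with Fmin = 0.6 (0.9 GHz normalized to 1.5 GHz), a hot and still-warming processor is pushed toward Fmin, while a cool and cooling one is pushed toward 1.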

Fig. 11: Membership functions of the linguistic variables. (a) T: values Low, Normal, High; axis marks −T∗, 0, T∗. (b) dT: values Negative, Zero, Positive; axis marks −dT∗, 0, dT∗. (c) F: values Low, Medium, High; axis marks 0, Fmin, 1. The vertical axis of each panel is the membership, with maximum 1.

TABLE I: The fuzzy logic rules for our fuzzy controller.

If                       Then
T is       dT is         F is
Low        Negative      High
Low        Zero          High
Low        Positive      Medium
Normal     Negative      High
Normal     Zero          Medium
Normal     Positive      Low
High       Negative      Medium
High       Zero          Low
High       Positive      Low

To search for the optimal parameters T∗ and dT∗ of the fuzzy controller, we adopt a random search method. This is because the non-linearity of fuzzy control systems makes it impossible to optimize the parameters of the fuzzy controller analytically. First we set the search ranges for T∗ and dT∗. The search range of T∗ is set to [0, Tmax], where Tmax is the highest temperature overflow above the threshold temperature allowed by the system. Tmax is usually a design objective of the thermal controller. The search range of dT∗ is set to [0, dTmax], where dTmax represents the highest possible rate of temperature rise in a processor. We use the Monte Carlo method to search for the optimal parameters within the search range. The objective of the random search is to minimize the overflow of the processor temperature over the threshold temperature.
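The random search over [0, Tmax] × [0, dTmax] can be sketched as below. This is only a sketch under stated assumptions: candidates are drawn uniformly, the function names are hypothetical, and `toy_overshoot` is a placeholder objective invented for illustration. In the actual flow, the objective would be the temperature overflow measured from a closed-loop thermal simulation.

```python
import random


def random_search(objective, t_max, dt_max, samples=2000, seed=0):
    """Monte Carlo search for (T*, dT*) minimizing objective(T*, dT*)."""
    rng = random.Random(seed)
    best_cost, best_params = None, None
    for _ in range(samples):
        candidate = (rng.uniform(0.0, t_max), rng.uniform(0.0, dt_max))
        cost = objective(*candidate)
        if best_cost is None or cost < best_cost:
            best_cost, best_params = cost, candidate
    return best_cost, best_params


def toy_overshoot(t_star, dt_star):
    """Placeholder objective (assumed); stands in for a simulated overflow."""
    return (t_star - 3.0) ** 2 + (dt_star - 0.5) ** 2
```

Calling `random_search(toy_overshoot, 10.0, 2.0)` returns the lowest cost found and the corresponding parameter pair; the sample count trades search time against how closely the optimum is approached.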

B. Self-Adaptive Module Design

As mentioned in Section I, the objective of the adaptive module of the fuzzy controller is to make the controller adaptive to the variations of the thermal model. To achieve this goal, adaptive control theory provides two different kinds of adaptive modules [17]. As shown in Fig. 12a, adaptive modules of the first kind analyze the variations in the system model and adjust the controller to eliminate the influence of the variations. Such modules contain model identifiers, which are too complicated to be implemented in microprocessors. As shown in Fig. 12b, adaptive modules of the second kind compare the output of the system with an ideal output produced by a reference model, and adjust the controller to make the system output track the ideal output.

Fig. 12: The structures of adaptive modules. (a) The model-identifier-based adaptive module. (b) The reference-model-based adaptive module.

Fig. 13: The block diagram of the adaptive fuzzy controller. Blocks: reference model, fuzzy inverse model, adaptor, fuzzy controller, processor system. Signals: Yth, T, F, Y, U, ∆Y, ∆F.

The reference model mentioned here is the model of an ideal system which makes the closed-loop control system stable and accurate. Adaptive modules of this kind are called model reference adaptive (MRA) modules.

Fig. 13 shows the structure of the MRA-based adaptive fuzzy controller. The mechanism of the MRA module is as follows. Based on the same input as the fuzzy thermal control system, the reference model produces the ideal output for the thermal control system to track. Then the difference between the real temperature of the system and the ideal output of the reference model is sent to a fuzzy inverse model, which uses fuzzy logic to decide the additional frequency adjustment that should be applied to the processor to eliminate the difference between the real output of the system and the ideal output of the reference model. Finally, the adapter applies this additional frequency adjustment to the output of the fuzzy controller by adjusting the parameters of the fuzzy controller.

In the design, a good reference model is critical for the performance of the adaptive fuzzy control system. As shown in Fig. 13, the input of the reference model is the difference between the system temperature and the threshold temperature, denoted as T, and the ideal output of the reference model is denoted by U. As discussed in Section II-D, the transfer function of an ideal system, denoted by G(s), should meet the following condition: when s ≪ 1, G(s) ≫ 1, and when s ≫ 1, G(s) ≪ 1. Eq. (15) defines the transfer function for the reference model. We use this reference model for two main reasons. Firstly, as long as α ≪ 1, the transfer function meets the requirement mentioned above. Secondly, the G(s) defined in Eq. (15) is similar to the thermal model mentioned in Section II, which makes it easy for the controller to track the ideal output of the reference model.

$$\frac{U(s)}{T(s)} = G(s) = \frac{1}{s + \alpha} \tag{15}$$

The transfer function shows the frequency response of the system. Implementing the reference model directly from the transfer function is not straightforward. We therefore convert the transfer function into a differential equation to facilitate the implementation of the reference model. Eq. (16) shows the differential equation.

$$T = U' + \alpha \cdot U \tag{16}$$
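One way to implement Eq. (16) in discrete time is to rewrite it as U′ = T − α·U and advance U by one control cycle with a classical fourth-order Runge-Kutta step. The sketch below assumes that the input T is held constant over each step; the function name, the step size and the value of α are illustrative assumptions, not values from the paper.

```python
import math


def reference_step(u, t_in, alpha, h):
    """Advance the reference model U' = T - alpha * U by one step of size h (RK4)."""
    def f(u_val):
        return t_in - alpha * u_val

    k1 = f(u)
    k2 = f(u + 0.5 * h * k1)
    k3 = f(u + 0.5 * h * k2)
    k4 = f(u + h * k3)
    return u + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def simulate(t_in, alpha, h, n_steps, u0=0.0):
    """Run the reference model for n_steps with a constant input T."""
    u = u0
    for _ in range(n_steps):
        u = reference_step(u, t_in, alpha, h)
    return u
```

For a constant input, the output converges to T/α, which matches the steady state of Eq. (16) when U′ = 0. Each step costs a fixed number of arithmetic operations.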

Fig. 14: The linguistic variables and the related membership functions for the fuzzy inverse model. (a) Input (∆Y): values Negative, Zero, Positive; axis marks −∆Y∗, 0, ∆Y∗. (b) Output (∆F): values Negative, Zero, Positive; axis marks −∆F∗, 0, ∆F∗. The vertical axis of each panel is the membership, with maximum 1.

TABLE II: Fuzzy logic rules of fuzzy inverse model.

If ∆Y is      Then ∆F is
Positive      Negative
Zero          Zero
Negative      Positive

The fuzzy inverse model decides, based on ∆Y, by how much the output of the fuzzy controller should be adjusted. To limit system complexity, we also set three values for each linguistic variable in the fuzzy inverse model. The linguistic variables and the related membership functions are defined in Fig. 14. As mentioned in Section IV-A, we also use the Monte Carlo method to find the best parameters for ∆Y∗ and ∆F∗. When ∆Y > 0, F should be reduced to lower the temperature of the system and thus reduce ∆Y, and vice versa. Therefore, the fuzzy logic rules for the fuzzy inverse model can be easily designed. The complete set of fuzzy logic rules for the fuzzy inverse model is shown in Table II. The defuzzification module of the fuzzy inverse model also adopts the COG method.

The adapter changes the parameters in the fuzzy controller to apply the ∆F decided by the fuzzy inverse model. For the sake of efficiency, the adapter should not directly add ∆F to the output of the fuzzy controller. If ∆F is directly applied to the output of the fuzzy controller, in the next cycle ∆Y becomes smaller, and so does ∆F. But since the output F of the fuzzy controller itself is unchanged, ∆Y rises again in the cycle that follows. In this way, the adaptive module must keep amending the output of the fuzzy controller. However, if ∆F is realized by changing the parameters in the fuzzy controller, the fuzzy controller does not require further adjustment after the system becomes stable. In such a case, the system can shut down the adaptive module and save a considerable amount of time and energy. For the adapter to apply the change, the easiest way is to horizontally shift the membership functions of the linguistic variables of the output F. Fig. 15 shows the process of shifting the membership functions of F by the adapter.
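The fuzzy inverse model of Table II and the shift applied by the adapter can be sketched together as follows. The sketch assumes symmetric triangular sets centered at −∆F∗, 0 and ∆F∗ for the output, saturation of ∆Y at ±∆Y∗, and COG defuzzification by a midpoint rule; the names `inverse_model` and `shift_output_sets` are hypothetical.

```python
def tri(x, center, half_width):
    """Symmetric triangular membership function."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


def inverse_model(dy, dy_star, df_star, steps=400):
    """Map the tracking error dY to a frequency correction dF using Table II."""
    x = max(-dy_star, min(dy_star, dy))
    neg = tri(x, -dy_star, dy_star)
    zero = tri(x, 0.0, dy_star)
    pos = tri(x, dy_star, dy_star)
    # Table II: dY Positive -> dF Negative, Zero -> Zero, Negative -> Positive.
    clipped = ((-df_star, pos), (0.0, zero), (df_star, neg))
    lo, hi = -2.0 * df_star, 2.0 * df_star
    step = (hi - lo) / steps
    num = den = 0.0
    for n in range(steps):
        d = lo + (n + 0.5) * step
        u = max(min(tri(d, c, df_star), level) for c, level in clipped)
        num += u * d
        den += u
    return num / den


def shift_output_sets(centers, df):
    """Adapter: shift the centers of the F membership functions by dF (Fig. 15)."""
    return tuple(c + df for c in centers)
```

Because the correction is stored in the membership functions rather than added to each output, it persists once ∆Y has settled, which is the reason the text gives for shifting the sets instead of adding ∆F directly.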

The complexity of the fuzzy controller is O(n), where n denotes the number of fuzzy logic rules. Since the number of fuzzy logic rules is usually fixed, the complexity of the fuzzy controller can be viewed as O(1). The complexity of the self-adaptive module is also O(1). Although the reference model contains a differential equation, with efficient discrete algorithms such as Runge-Kutta methods, the computational cost is negligible.

Fig. 15: Membership function adjusted by the adapter. The membership functions of F (Low, Medium, High, between Fmin and 1) are shifted horizontally to lie between Fmin + ∆F and 1 + ∆F.

Algorithm 1 Coolest first scheduling algorithm
Input: ready tasks, the cores in the processor
Output: the schedule
if a time slack terminates then
    sort the cores according to current temperature
    sort ready tasks according to power consumption
    repeat
        pick the task with highest power
        schedule the task to the coolest core
    until all the cores are occupied
end if

V. EXTENSION TO MULTI-CORE PROCESSORS

The previous section discusses the design of the adaptive fuzzy controller in a unicore processor. When used in multi-core processors, the advantages of the adaptive fuzzy controller become more significant. Firstly, in a multi-core processor, the thermal models of the cores are coupled with each other due to the inter-chip heat transfer. This makes the thermal model of a multi-core processor more difficult to build than in the single-core case. In addition, in multi-core processors with an asymmetrical floorplan, the thermal model of each core depends on its location in the die. Only controllers with adaptive abilities can deal with this situation. Secondly, the in-die process variation makes it even harder to build an accurate thermal model for the multi-core processor. It further enlarges the difference between the thermal models of the individual cores.

The increase in the number of cores in the system brings higher flexibility in thermal control. For multi-core processors, thermal-aware scheduling is an efficient technique to reduce the peak temperature of the system [18]. Since our fuzzy controller is a core-throttling thermal management technique, it should be combined with an efficient thermal-aware scheduling algorithm to achieve higher performance. When it is combined with a thermal-aware scheduling algorithm, the likelihood of thermal emergencies is significantly reduced, which leads to fewer switches of the voltage and frequency level of the system.

An efficient thermal-aware scheduling algorithm should balance the power consumption on the processor both spatially and temporally. In this study, the thermal-aware scheduling algorithm we selected to work cooperatively with the adaptive fuzzy controller is the coolest first algorithm [19]. The pseudocode of the algorithm is shown in Alg. 1. The algorithm aims at scheduling the task with the highest power consumption to the core with the lowest temperature. The effectiveness of the algorithm in reducing the number of thermal emergencies was demonstrated in a previous study [19]. We note that the power consumption of the tasks can either be acquired off-line through profiling or be measured on-line using performance counters.
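The matching step of the coolest first algorithm can be sketched in a few lines. The sketch assumes that core temperatures and per-task power estimates are available as dictionaries and that one task is placed per core; the function name and the data layout are hypothetical and not taken from [19].

```python
def coolest_first(core_temps, task_power):
    """Assign the most power-hungry ready tasks to the coolest cores.

    core_temps: dict mapping core id -> current temperature
    task_power: dict mapping task id -> estimated power consumption
    Returns a dict mapping core id -> task id.
    """
    cores = sorted(core_temps, key=core_temps.get)               # coolest first
    tasks = sorted(task_power, key=task_power.get, reverse=True)  # hungriest first
    # zip stops as soon as either the cores or the ready tasks run out.
    return dict(zip(cores, tasks))


schedule = coolest_first(
    {0: 70.0, 1: 62.0, 2: 66.0, 3: 75.0},
    {"a": 1.2, "b": 0.4, "c": 2.0},
)
```

In this example the hottest core (core 3) is left idle because there are fewer ready tasks than cores.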


VI. EVALUATION

A. Experiment Setup

In the experiments, we use HotSpot 5.02 as our thermal simulation platform for the microprocessor [20]. The experiments are carried out on both single-core and multi-core processors. The single-core processor is a 45nm ARM11 processor. The multi-core processor is a 45nm ARM Cortex-A7 processor with four cores. The DVFS module is assumed to be able to adjust the frequency from 0.9GHz to 1.5GHz. We use two benchmarks in the experiments. The first benchmark is an endless loop of floating point computation, which is used to test the step response of the closed-loop control system because its power consumption is stable. The second benchmark is the SPEC CPU2006 suite, which is used to test the performance of the system under real-world applications. We use the GEM5 + MCPAT simulation platform to collect the power traces of the benchmarks [21], [22].

In the experiments, three different kinds of thermal controllers are adopted for comparison. First, we use a simple threshold-triggered DVFS controller as the baseline of the experiments. Second, we adopt the PID controller proposed in [23]. The PID controller is the most commonly adopted controller in control systems. Finally, we adopt a state-of-the-art thermal controller design from [9]. This controller is designed using optimal control theory and is therefore referred to as the optimal controller. For the three types of controllers, the duration of one control cycle is set to 10 ms. The temperature traces of the processors without any thermal control method are also shown in the experiments.

In the experiments, we introduce variation into the system from three different sources. The first source of variation is the noise of the temperature sensors. The noise signal contained in the output of the temperature sensor is set to be white noise with an amplitude of 0.5°C. This setting is based on the sensor noise model proposed in [10]. The second source of variation is the ambient temperature. We set different ambient temperature levels in different experiments. The third source of variation is the process variation of the processors. As discussed in Section II-C, with process variation, the leakage power can be computed as δPsub, where Psub is the subthreshold leakage power of the processor without process variation and δ is a factor defined by the process variation. For the single-core processor, we set δ = 1.5, which is within the reasonable range according to [16]. For the multi-core processor, we set δ for the four cores using a normal distribution with a mean of 1 and a standard deviation of 0.3. This is also based on the discussion in [16].
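The variation settings above can be sketched as a small setup routine. This is an illustration only: the function names are hypothetical, and the sensor noise is assumed to be uniformly distributed within ±0.5°C, which is one possible reading of “white noise with an amplitude of 0.5°C”.

```python
import random


def sample_leakage_factors(n_cores, rng, mean=1.0, std=0.3):
    """Draw one process-variation factor delta per core from N(mean, std)."""
    return [rng.gauss(mean, std) for _ in range(n_cores)]


def leakage_power(delta, p_sub):
    """Leakage power under process variation: delta * P_sub."""
    return delta * p_sub


def sensor_reading(true_temp, rng, amplitude=0.5):
    """Add bounded noise to a temperature; the uniform shape is an assumption."""
    return true_temp + rng.uniform(-amplitude, amplitude)


rng = random.Random(1)
deltas = sample_leakage_factors(4, rng)  # one factor per Cortex-A7 core
```

Seeding the generator keeps a given set of δ values reproducible across runs of the different controllers.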

B. Step Response

In control theory, the step response is a commonly used metric to examine the control quality of control systems. The step response of a control system is defined as the output of the system under the input of a step function. Since the input of the system is very simple, the step response clearly shows the native characteristics of the control system. In the experiments, the single-core processor is tested. The benchmark is the first benchmark mentioned in Section VI-A, which produces a stable power consumption. The initial temperature of the processor is set to 60°C. The threshold temperature of the processor is set to 75°C. The ambient temperature of the system is set to 25°C.

Fig. 16: Step responses in ideal situation. Axes: time (ms), 0 to 400; temperature (°C), 60 to 85. Traces: None, Baseline, PID, Optimal, Fuzzy.

In the first experiment, we test the performance of the control systems without any variation of the thermal model. Fig. 16 shows the step responses of the different control systems under the ideal situation. We can see that without thermal control, the processor temperature reaches a stable level of around 82°C. Among the four controllers, the performance of the baseline controller is the worst. The temperature of the processor fluctuates periodically around the threshold temperature. This is because the baseline controller considers too little information to decide the control output. It only reacts when the temperature is above the threshold. The PID controller, which is designed according to the system model, achieves a much better result. However, when compared to the optimal controller and the adaptive fuzzy controller, the output of the PID controller shows higher overflow above the threshold and more severe fluctuation. This is because the structure of the PID controller is too simple and limits its ability to adjust the system characteristics. With more advanced structures, the optimal controller and our adaptive fuzzy controller both achieve good performance in temperature control. But when the temperature is under the threshold, the temperature controlled by the optimal controller rises faster than with our fuzzy controller. This means that with the optimal controller, the processor runs faster than with the fuzzy controller, which in turn results in better performance. According to [9], the optimal controller is able to optimize the performance of the processor under a temperature constraint. The experimental results also support this point.

In the second experiment, we test the tolerance of the control systems to the sensor error. Fig. 17 shows the temperature traces of the processors with different controllers after the temperature becomes relatively stable. It is obvious that the baseline method is seriously affected by the sensor error. This is because the control decision of the baseline controller relies solely on the output of the temperature sensor and there is no way to separate the noise from the output. Among the other three controllers, the PID controller shows the highest sensitivity to the noise. This is also due to the simplicity of the structure of the PID controller. The optimal controller and the fuzzy controller both show high tolerance to the sensor noise.


Fig. 17: Step responses with temperature sensor error. Axes: time (ms), 100 to 350; temperature (°C), 73 to 77. Traces: Threshold, Baseline, PID, Optimal, Fuzzy.

Fig. 18: Step responses in parameter variation. Axes: time (ms), 0 to 400; temperature (°C), 60 to 90. Traces: None, Baseline, PID, Optimal, Fuzzy.

Finally, we test the performance of the controllers under process variation and ambient temperature change. First, we set the ambient temperature to 35°C, which represents an outdoor environment in summer time. Second, we set the δ of the subthreshold leakage power model to 1.5. Fig. 18 shows the related temperature traces. Due to the rise of the ambient temperature and the leakage power, the processor temperature reaches a higher level than in the previous experiments. First, we can see that the temperature trace of the baseline controller is not significantly affected. This is because the change of system characteristics does not affect the control policy of the baseline controller. The only difference is that with increased leakage power, the temperature rises faster under the high voltage level and decreases more slowly under the low voltage level. Overall, the baseline controller achieves a slightly higher average temperature than in the previous experiment. Second, the result of the PID controller shows a steady-state error when compared to the threshold. According to Section II-D, the rise in leakage power makes the control system more sensitive to noise. Referring to Fig. 6, the ambient temperature acts as a noise on the input signal of the system. When the two factors are combined, the output of the PID controller deviates from the pre-defined threshold by around 2.5°C. Third, the result of the optimal controller fluctuates severely. This is because the optimal controller is structured based on a more delicate system model [9] and the variation of the system model has a significant effect on the performance of the controller. Finally, the adaptive fuzzy controller achieves a nearly unchanged power trace compared to the previous experiment. The result shows that our design has good adaptivity to the variation of the system model.

C. Real Applications

We test the performance of the single-core ARM processor with different kinds of thermal controllers under real-world workloads using the SPEC CPU2006 benchmark suite. In the experiments, the initial temperature of the processor is set to 60°C. The threshold temperature of the processor is set to 75°C. The ambient temperature of the system is set to 25°C.

First we test the performance of the processor without the variations. Table III shows the execution time of the benchmarks on the processors using different kinds of thermal controllers. The performance of the processor without any thermal control method is always the best since the benchmarks are executed at the highest voltage level. For the different kinds of thermal controllers, the average increases in the execution time of the benchmarks are 27.5% (baseline), 19.5% (PID), 8.1% (optimal) and 11.7% (fuzzy), respectively. The baseline controller results in the worst performance due to its continuous switching of the voltage levels. The optimal controller achieves the best performance among the thermal controllers due to its performance-aware optimization. The fuzzy controller outperforms the PID controller because the PID controller leads to a more fluctuating temperature trace, which results in more switches of the voltage level.

TABLE III: Execution times of the benchmarks (without variations).

         Execution time (s)
Bench.   None        Baseline    PID         Optimal     Fuzzy
400      1040.76     1307.41     1248.25     1085.14     1148.16
401      1353.56     1726.88     1552.42     1416.57     1540.00
403      905.22      1144.55     1059.74     959.1       1006.15
429      3386.09     4203.11     4030.73     3826.21     3765.57
445      260.29      310.87      309.79      274.03      293.09
456      997.97      1283.68     1157.42     1113.99     1153.32
458      9406.80     12948.49    11615.81    9992.4      10350.70
462      111.64      133.74      132.60      120.45      127.10
464      67426.00    84066.01    81638.28    78398.56    72916.57
471      1153.36     1591.83     1390.07     1262.79     1331.50
473      16433.83    21378.5     20314.06    17878.81    18050.46
483      276.77      356.62      334.71      287.86      301.07
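The average increase in execution time can be computed from Table III as sketched below. The sketch assumes that the average is the mean of the per-benchmark relative increases over the “None” column, which the text does not state explicitly; only the first three rows of the table are used to keep the example short, so the result differs from the twelve-benchmark averages quoted in the text.

```python
def mean_increase_percent(reference, measured):
    """Mean of per-benchmark relative increases in execution time, in percent."""
    ratios = [(m - r) / r for r, m in zip(reference, measured)]
    return 100.0 * sum(ratios) / len(ratios)


# First three rows of Table III (benchmarks 400, 401, 403), in seconds.
none_times = [1040.76, 1353.56, 905.22]
baseline_times = [1307.41, 1726.88, 1144.55]
fuzzy_times = [1148.16, 1540.00, 1006.15]

baseline_increase = mean_increase_percent(none_times, baseline_times)
fuzzy_increase = mean_increase_percent(none_times, fuzzy_times)
```

Extending the three lists to all twelve rows gives the per-controller averages under this assumed definition.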

Table IV shows the execution time of the benchmarks with variations. In the experiments, the three sources of variation are combined. The execution time of the processor without thermal controllers stays the same. We note that process variation also affects the execution time of the processor, but to focus our discussion on thermal control, we do not consider this influence in this study. For the different kinds of thermal controllers, the average increases in the execution time of the benchmarks are 29.9% (baseline), 36.1% (PID), 28.5% (optimal) and 17.6% (fuzzy), respectively. When compared with the results in Table III, the baseline controller is the least affected by the variations. As discussed in Section VI-B, this is because the model change does not affect the decision-making procedure of the baseline controller. The increase in the execution time mostly comes from the rise in the leakage power of the processor. The PID controller and the optimal controller are both severely affected by the variations of the thermal model. Due to their lack of adaptivity, these two controllers cannot perform as well as designed. Our fuzzy controller, on the other hand, is capable of adapting to the variations of the system model and achieves the best performance among the four kinds of controllers.

TABLE IV: Execution times of the benchmarks (with variations).

         Execution time (s)
Bench.   None        Baseline    PID         Optimal     Fuzzy
400      1040.76     1357.87     1485.72     1480.46     1227.49
401      1353.56     1737.82     2036.16     1766.68     1596.54
403      905.22      1146.14     1137.28     1165.38     1077.21
429      3386.09     4219.54     5077.18     4134.8      3837.48
445      260.29      315.8       328.639     336.823     295.12
456      997.97      1399.32     1257.29     1174.41     1188.32
458      9406.80     13165.7     14871.2     13749.8     11231.41
462      111.64      137.08      141.672     136.312     118.60
464      67426.00    84852.01    86681.9     81374.5     86110.65
471      1153.36     1608.89     1371.62     1539.46     1344.37
473      16433.83    21461.41    26156.7     22614.3     21465.20
483      276.77      357.5       335.663     355.728     305.24

Fig. 19: Normalized lifespan of the processors. Bars: None, Baseline, PID, Optimal, Fuzzy; vertical axis from 0% to 100%.

As mentioned in Section I, the control quality of the thermal controllers affects not only the performance of the system but also the reliability of the processors. High temperature and severe temperature fluctuation both have negative effects on the lifespan of the processors. Using the reliability model proposed in [24], we estimate the lifespan of the processors with different thermal control solutions. In the estimation, the temperature traces collected from the experiments with process variations are used. Fig. 19 shows the lifespan of the processors with different thermal controllers. The lifespan is normalized to the ideal lifespan of the processor, which results from working at a stable temperature level under the threshold. From the figure we can see that our adaptive fuzzy controller achieves the longest lifespan when compared to the other kinds of controllers. This is because the temperature traces achieved by our adaptive fuzzy controller are more stable than those of the other controllers. With process variation, the control quality of the optimal controller is more severely damaged, and it therefore results in the shortest lifespan among the three kinds of controllers.

D. Multi-core Processors

Finally, we evaluate the performance of our adaptive fuzzy controller on multi-core processors. The processor model used in the simulation is the ARM Cortex-A7 processor with four individual cores in one chip. We assume each core has an individual thermal control unit. In the simulation, we run all the benchmarks on the processor. The thermal-aware scheduling algorithm mentioned in Section V is used to schedule the tasks on the processor. The three sources of variation are all applied to the processor.

Fig. 20 shows the execution time of the benchmarks on the processors with different types of thermal controllers. Similar to the experimental results on the single-core processor, our adaptive fuzzy controller outperforms the PID controller and the optimal controller in execution time, by up to 6.18%. Compared to the result on the single-core processor, the reduction in execution time achieved by our fuzzy controller has decreased. There are two reasons for this. Firstly, the area of the processor has increased due to the integration of multiple cores, which makes the heat easier to dissipate. Secondly, the use of the thermal-aware scheduling algorithm efficiently improves the thermal condition of the cores. The two reasons combined result in a more balanced power consumption on the processor and therefore reduce the room for temperature management by the thermal controllers. Therefore, the advantages of our adaptive fuzzy controller become less obvious on the multi-core processor.

Fig. 20: Execution time of the benchmarks (ARM Cortex 7). Bars: None, Baseline, PID, Optimal, Fuzzy; vertical axis: execution time (10^3 s), 26 to 32.

VII. RELATED WORK

Previous works mainly adopted DVFS-based techniques and/or task scheduling to manage the processor temperature of chip multiprocessors (CMPs). DVFS techniques control the temperature of microprocessors by dynamically adjusting the voltage and frequency levels [8], [6], [5], [25]. Most DVFS-based DTM systems [6], [5], [25] can be categorized as open-loop control systems, because in such works the decision-making processes are usually triggered by thermal emergencies. Open-loop control systems have major disadvantages. First, the response to thermal emergencies is slow. As a result, the processors work under high temperature for a longer time, which harms the reliability and lifespan of the microprocessors. Second, the temperatures of processors with open-loop thermal control systems fluctuate severely, which could also cause malfunction of processors [4].

To achieve higher control quality, closed-loop DVFS-based thermal control systems have been proposed. Based on control theory, closed-loop thermal control systems provide higher temperature accuracy, fast response time and a significant reduction in temperature fluctuations. In [8], a PID controller was adopted to control the DVFS module. As one of the most classic designs in control theory, PID controllers have low complexity and are feasible for most kinds of systems. Fu et al. [23] proposed an improved design of the PID controller. Wang et al. adopted optimal control theory and proposed a closed-loop optimal controller targeting thermal and power control [9]. The optimal controller provides theoretically optimal control quality in a multi-objective optimization problem, but with high complexity. In [11], Sabry et al. proposed a fuzzy controller for thermal control with microfluidic cooling. Fuzzy controllers have the advantages of good control quality, low complexity, and high tolerance to errors and system parameter variations.

Thermal-aware task scheduling is another widely used approach to address the thermal challenges in microprocessors [26], [18], [27]. The fundamental idea is to assign tasks to the cores in such a manner that overheating of cores can be avoided. For instance, Hung et al. [26] proposed several fast heuristics to minimize the run-time peak temperature in multi-core systems. Coskun et al. [18] proposed an integer linear programming (ILP) based algorithm to reduce the number of thermal hotspots on a chip through workload distribution. However, both thermal-aware DVFS and task scheduling based approaches have their own limitations. Pure DVFS-based thermal management may suffer from performance degradation and significant time and energy overheads under high workload. On the other hand, the temperature control quality of pure scheduling algorithms is lower than that of the DVFS-based solutions.

Thus, hybrid solutions have been proposed in order to take advantage of both techniques [28], [29], [6], [30], [31], [32]. For instance, Liu et al. [28] used a clustering algorithm to solve the scheduling and voltage assignment problem for task graphs. Bao et al. [29] used a heuristic to distribute idle slack among periodic tasks to reduce the peak temperature. Hanumaiah et al. [6] formulated the thermal management problem of multi-core processors as a non-linear optimization problem and proposed optimal task scheduling and voltage scaling policies. Kumar et al. [30] were the first to adopt the hybrid methodology in dynamic thermal management; their proactive task and voltage assignment is based on a fast regression-based thermal model. Rao et al. [31] proposed a new processor speed model and optimized the processor's performance based on the model in a thermal-aware system with DVFS. Ma et al. [32] proposed an application assignment algorithm to optimize the performance of multi-core processors with the optimal thermal controller.

However, due to aggressive technology scaling, increasing process variations pose serious challenges to traditional thermal management solutions. Process variations, coupled with environmental variations such as ambient temperature fluctuations and increasing noise in temperature sensor readings, result in a highly unpredictable system and force pessimistic design decisions with larger power and thermal margins, which compromise performance and energy efficiency [33], [34], [35]. As a result, the thermal-aware scheduling and DVFS techniques discussed above often fail to control rising temperature because, despite the existing variations, they treat all processing cores as identical with nominal parameter values [34]. Hence, the impact of process variations on the power, performance, and temperature of multi-core processors needs to be carefully investigated. Although studies of process variations date back to the 1970s [34], serious consideration of the challenges began only when device scaling reached 90 nm and beyond, as manufacturing processes lose controllability with shrinking device sizes [33], [34]. Moreover, with more transistors integrated on a single chip, the chip is susceptible to larger variations due to added environmental effects such as ambient temperature and signal noise caused by proximity effects [16]. Process variations turn a deterministic system into one that is statistical or probabilistic in nature, which requires statistical models to describe how transistor parameters vary within a die.
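To make the statistical view concrete, the sketch below draws a per-core threshold-voltage shift from a Gaussian distribution and maps it to leakage power through the usual exponential subthreshold dependence. It is only an assumed toy model: the standard deviation `sigma_vth_mv`, the slope factor `slope_mv`, and the function name are illustrative and are not taken from this paper or from [14], [15].

```python
import math
import random


def sample_core_leakage(n_cores, nominal_leakage_w=1.0,
                        sigma_vth_mv=15.0, slope_mv=40.0, seed=0):
    """Sample per-core leakage power under threshold-voltage variation.

    Each core gets a threshold-voltage shift dVth ~ N(0, sigma_vth_mv^2), and
    leakage scales as exp(-dVth / slope_mv), so a core with a lower threshold
    voltage leaks more. All numeric defaults are hypothetical.
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    leakage = []
    for _ in range(n_cores):
        dvth_mv = rng.gauss(0.0, sigma_vth_mv)
        leakage.append(nominal_leakage_w * math.exp(-dvth_mv / slope_mv))
    return leakage


if __name__ == "__main__":
    for core, watts in enumerate(sample_core_leakage(4)):
        print(f"core {core}: {watts:.3f} W leakage")
```

Because leakage depends exponentially on the threshold voltage, even a modest Gaussian spread yields noticeably different per-core power, which is why treating all cores as nominal can mislead a thermal controller.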

On the thermal side, Kursun et al. [36] first studied the influence of process variations on the temperature profile of a multi-core processor and proposed a sensor network that identifies the differences in power consumption among the cores. They then proposed a variation-aware thermal scheduling algorithm that takes the difference in power consumption of each core into consideration. Most recently, Tavana et al. combined DVFS and task mapping to achieve energy efficiency, using a simulated annealing solution to reduce the energy-delay product. Paterna et al. [37] analyzed the influence of ambient temperature variation on thermal management solutions for embedded processors in smartphones. These approaches all require the measurement or modeling of process variation at design time, which significantly limits flexibility and incurs extra complexity and cost.

VIII. CONCLUSION

In this paper, we propose an adaptive fuzzy controller for DVFS-based dynamic thermal management of microprocessors. Based on fuzzy logic and adaptive control theory, the controller offers high control quality and low sensitivity to system noise and to variations in the system model. Our adaptive fuzzy controller achieves up to 18.5% better performance and up to 14.2% longer microprocessor lifespan compared with other state-of-the-art thermal controllers.

REFERENCES

[1] A. B. Kahng, “The ITRS design technology and system drivers roadmap: Process and status,” in Proceedings of the 50th Annual Design Automation Conference, ser. DAC ’13, 2013, pp. 34:1–34:6.

[2] B. Li, L.-S. Peh, and P. Patra, “Impact of process and temperature variations on network-on-chip design exploration,” in NOCS. IEEE Computer Society, 2008, pp. 117–126.

[3] J. Srinivasan, S. V. Adve, P. Bose, and J. A. Rivers, “The impact of technology scaling on lifetime reliability,” in Dependable Systems and Networks, 2004 International Conference on. IEEE, 2004, pp. 177–186.

[4] JEDEC Solid State Technology Association et al., “Failure mechanisms and models for semiconductor devices,” JEDEC Publication JEP122-B, 2003.

[5] V. Hanumaiah and S. B. K. Vrudhula, “Temperature-aware DVFS for hard real-time applications on multicore processors,” IEEE Trans. Computers, vol. 61, no. 10, pp. 1484–1494, 2012.

[6] V. Hanumaiah, S. B. K. Vrudhula, and K. S. Chatha, “Performance optimal online DVFS and task migration techniques for thermally constrained multi-core processors,” IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 30, no. 11, pp. 1677–1690, 2011.

[7] J. Lee and N. S. Kim, “Analyzing potential throughput improvement of power- and thermal-constrained multicore processors by exploiting DVFS and PCPG,” IEEE Trans. VLSI Syst., vol. 20, no. 2, pp. 225–235, 2012.

[8] K. Skadron, T. F. Abdelzaher, and M. R. Stan, “Control-theoretic techniques and thermal-RC modeling for accurate and localized dynamic thermal management,” in Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA’02), Boston, Massachusetts, USA, February 2–6, 2002, pp. 17–28.

[9] Y. Wang, K. Ma, and X. Wang, “Temperature-constrained power control for chip multiprocessors with online model estimation,” in ISCA. ACM, 2009, pp. 314–324.

[10] A. Bakker and J. H. Huijsing, High-accuracy CMOS smart temperature sensors. Springer, 2000, vol. 595.

[11] M. M. Sabry, A. K. Coskun, and D. Atienza, “Fuzzy control for enforcing energy efficiency in high-performance 3D systems,” in ICCAD. IEEE, 2010, pp. 642–648.

[12] N. H. Weste and D. M. Harris, Integrated circuit design. Pearson, 2011.

[13] A. P. Chandrakasan, W. J. Bowhill, and F. Fox, Design of high-performance microprocessor circuits. Wiley-IEEE Press, 2000.

[14] A. Srivastava, R. Bai, D. Blaauw, and D. Sylvester, “Modeling and analysis of leakage power considering within-die process variations,” in Proceedings of the 2002 international symposium on Low power electronics and design. ACM, 2002, pp. 64–67.

[15] H. Su, F. Liu, A. Devgan, E. Acar, and S. Nassif, “Full chip leakage estimation considering power supply and temperature variations,” in Proceedings of the 2003 international symposium on Low power electronics and design. ACM, 2003, pp. 78–83.

[16] K. Agarwal and S. Nassif, “Characterizing process variation in nanometer CMOS,” in Design Automation Conference, 2007. DAC’07. 44th ACM/IEEE. IEEE, 2007, pp. 396–399.

[17] I. D. Landau, R. Lozano, M. M’Saad, and A. Karimi, Adaptive Control. Springer-Verlag, 2011.

[18] A. K. Coskun, T. S. Rosing, K. A. Whisnant, and K. C. Gross, “Temperature-aware MPSoC scheduling for reducing hot spots and gradients,” in Proceedings of the 2008 Asia and South Pacific Design Automation Conference. IEEE Computer Society Press, 2008, pp. 49–54.

[19] K. Stavrou and P. Trancoso, “Thermal-aware scheduling: a solution for future chip multiprocessors thermal problems,” in Digital System Design: Architectures, Methods and Tools, 2006. DSD 2006. 9th EUROMICRO Conference on. IEEE, 2006, pp. 123–126.

[20] K. Skadron et al., “Temperature-aware microarchitecture,” in ISCA, 2003, pp. 2–13.

[21] N. Binkert et al., “The gem5 simulator,” SIGARCH Comput. Archit. News, vol. 39, no. 2, pp. 1–7, Aug. 2011.

[22] S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and N. P. Jouppi, “McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures,” in MICRO. ACM, 2009, pp. 469–480.

[23] Y. Fu, N. Kottenstette, C. Lu, and X. D. Koutsoukos, “Feedback thermal control of real-time systems on multicore processors,” in EMSOFT. ACM, 2012, pp. 113–122.

[24] O. Semenov, A. Vassighi, and M. Sachdev, “Impact of self-heating effect on long-term reliability and performance degradation in CMOS circuits,” Device and Materials Reliability, IEEE Transactions on, vol. 6, no. 1, pp. 17–27, 2006.

[25] H. Jung and M. Pedram, “Stochastic dynamic thermal management: A Markovian decision-based approach,” in Computer Design, 2006. ICCD 2006. International Conference on. IEEE, 2007, pp. 452–457.

[26] W.-L. Hung, Y. Xie, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin, “Thermal-aware task allocation and scheduling for embedded systems,” in Proceedings of the conference on Design, Automation and Test in Europe-Volume 2. IEEE Computer Society, 2005, pp. 898–899.

[27] A. K. Coskun, T. S. Rosing, and K. Whisnant, “Temperature aware task scheduling in MPSoCs,” in Proceedings of the conference on Design, automation and test in Europe. EDA Consortium, 2007, pp. 1659–1664.

[28] Y. Liu, Y. Yang, and J. Hu, “Clustering-based simultaneous task and voltage scheduling for NoC systems,” in Proceedings of the International Conference on Computer-Aided Design. IEEE Press, 2010, pp. 277–283.

[29] M. Bao, A. Andrei, P. Eles, and Z. Peng, “Temperature-aware idle time distribution for energy optimization with dynamic voltage scaling,” in Proceedings of the Conference on Design, Automation and Test in Europe. European Design and Automation Association, 2010, pp. 21–26.

[30] A. Kumar, L. Shang, L.-S. Peh, and N. K. Jha, “HybDTM: a coordinated hardware-software approach for dynamic thermal management,” in Proceedings of the 43rd annual Design Automation Conference. ACM, 2006, pp. 548–553.

[31] R. Rao and S. Vrudhula, “Efficient online computation of core speeds to maximize the throughput of thermally constrained multi-core processors,” in Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design. IEEE Press, 2008, pp. 537–542.

[32] K. Ma, X. Li, M. Chen, and X. Wang, “Scalable power control for many-core architectures running multi-threaded applications,” in ACM SIGARCH Computer Architecture News, vol. 39, no. 3. ACM, 2011, pp. 449–460.

[33] S. Herbert and D. Marculescu, “Variation-aware dynamic voltage/frequency scaling,” in High Performance Computer Architecture, 2009. HPCA 2009. IEEE 15th International Symposium on. IEEE, 2009, pp. 301–312.

[34] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, “Parameter variations and impact on circuits and microarchitecture,” in Proceedings of the 40th annual Design Automation Conference. ACM, 2003, pp. 338–342.

[35] W. Schemmert and G. Zimmer, “Threshold-voltage sensitivity of ion-implanted MOS transistors due to process variations,” Electronics Letters, vol. 10, no. 9, pp. 151–152, 1974.

[36] E. Kursun and C.-Y. Cher, “Temperature variation characterization and thermal management of multicore architectures,” IEEE Micro, vol. 29, no. 1, pp. 116–126, 2009.

[37] F. Paterna, J. Zanotelli, and T. S. Rosing, “Ambient variation-tolerant and inter components aware thermal management for mobile system on chips,” in Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014. IEEE, 2014, pp. 1–6.

Yingnan Cui received his Bachelor's degree in Automation from Harbin Institute of Technology (2006-2010). He is now a Ph.D. student in the School of Computer Engineering, Nanyang Technological University, Singapore (2010-). His research focuses on run-time thermal management of multi-core systems.

Dr. Wei Zhang received her Ph.D. degree in Electrical Engineering from Princeton University. She joined Hong Kong University of Science and Technology in 2013 and established the Reconfigurable System Lab. She was an assistant professor in the School of Computer Engineering at Nanyang Technological University, Singapore (2010-2013). She is a co-investigator of the Singapore-MIT Alliance for Research and Technology, where she works on low-power electronics. She is a collaborator of the ASTAR-UIUC Advanced Digital Sciences Center, where she works on FPGA acceleration for multimedia applications.

Dr. Bingsheng He received the bachelor's degree in computer science from Shanghai Jiao Tong University (1999-2003), and the Ph.D. degree in computer science from Hong Kong University of Science and Technology (2003-2008). Dr. He is an assistant professor in the School of Computer Engineering of Nanyang Technological University, Singapore. His research interests are high performance computing, cloud computing, and database systems.