POWER MANAGEMENT AND ENERGY EFFICIENCY
2016 Operating Systems Design
Euiseong Seo
* Adapted from “Power Management for Embedded Systems” by Minsoo Ryu
Need for Power Management
- Power consumption matters
- PCs
  - Energy cost
  - Thermal dissipation
- Mobile devices
  - Battery lifetime
  - Thermal dissipation
- Server systems
  - Energy cost
  - Electrical infrastructure
  - Power usage effectiveness (PUE)
Power and Performance
- Dynamic power ∝ voltage² × clock frequency
- Clock frequency ∝ voltage (approximately)
- Therefore, power ∝ clock³
- Server processors have already reached a 150 W TDP (thermal design power)
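A quick numeric check of the cubic rule follows. It assumes the dynamic-power model P = C·V²·f and scales voltage in proportion to frequency. C is an illustrative effective capacitance, not a measured value. Halving the clock cuts power to 1/8. The same work then takes twice as long, so the energy for a fixed amount of work drops to 1/4.

```python
def dynamic_power(c_eff: float, volts: float, ghz: float) -> float:
    """Dynamic power P = C * V^2 * f (illustrative units)."""
    return c_eff * volts ** 2 * ghz


def scale_both(ratio: float, c_eff: float = 1.0, v0: float = 1.0, f0: float = 1.0):
    """Scale V and f by the same ratio (assumes clock ∝ voltage).

    Returns (power ratio, energy ratio for a fixed number of cycles).
    """
    p0 = dynamic_power(c_eff, v0, f0)
    p1 = dynamic_power(c_eff, v0 * ratio, f0 * ratio)
    power_ratio = p1 / p0
    # A fixed workload takes 1/ratio times as long at the lower clock.
    energy_ratio = power_ratio / ratio
    return power_ratio, energy_ratio


if __name__ == "__main__":
    print(scale_both(0.5))  # (0.125, 0.25)
```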
Power Consumers in a System
- Processors
  - Dominate power consumption
  - Typically consume about 100 W out of a 300 W system
- Memory
  - Significant contributor
Real-Time Computing and Communications Lab., Hanyang University (http://rtcc.hanyang.ac.kr)
Power Consumption of Memory
- In a server system
  - Memory consumes 19% of system power on average
  - Some studies report memory consuming up to 40% of total system power
Power Consumers in a System
- Storage
  - A server HDD consumes 5 to 10 W
  - A laptop HDD consumes 1 to 5 W
  - A SATA SSD consumes 1 to 5 W
  - An NVMe SSD can consume 30 W or more
- NIC
  - A 10 Gbps NIC consumes 5 to 20 W
- Peripherals
  - Insignificant
Idle Power Consumption
- Only 30% of servers in data centers are fully utilized; the other 70% are kept idle
  - Idle servers still consume 60% to 66% of their peak-load power
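This back-of-the-envelope sketch shows why idle power matters. It uses the figures above: 30 of 100 servers are busy, and the rest sit idle at 60% of peak. The 300 W peak per server is an illustrative assumption. The idle share of total power does not depend on it.

```python
def idle_share(busy: int, idle: int, idle_ratio: float, peak_w: float = 300.0):
    """Return (total fleet power in W, fraction of it drawn by idle servers).

    busy/idle are server counts; idle_ratio is idle power as a fraction of
    peak power (0.60 to 0.66 according to the slide).
    """
    busy_w = busy * peak_w
    idle_w = idle * idle_ratio * peak_w
    total_w = busy_w + idle_w
    return total_w, idle_w / total_w


if __name__ == "__main__":
    total, share = idle_share(busy=30, idle=70, idle_ratio=0.60)
    print(f"{total:.0f} W total, {share:.1%} drawn by idle servers")
```

In this example the idle machines account for more than half of the fleet's power while doing no useful work.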
Two Dimensions on Power Management
- Power management when the system is idle
  - Select the most efficient idle state
- Power management when the system is active
  - Dynamically change the operating frequency and/or voltage
APM and ACPI
- APM (Advanced Power Management)
  - Activated when the system becomes idle
    - Screen saver → sleep → suspend
  - Controlled by firmware (BIOS)
    - Reconfiguration requires a reboot
  - The OS has no knowledge of power management actions
- ACPI (Advanced Configuration and Power Interface)
  - Controlled by the OS
  - First released in 1996; originally developed by Intel, Microsoft, and Toshiba, later joined by Compaq, Phoenix, and HP
ACPI
- Standard interface specification
  - Brings power management under the control of the operating system
  - The specification is central to Operating System-directed configuration and Power Management (OSPM)
(Figure: OS power management stack with applications, OS power management, software drivers and ACPI, and hardware such as the CPU and BIOS)
ACPI Functions
- System power management
  - Manages the power state of the entire computer
- Processor power management
  - When the OS is idle but not sleeping, it puts processors into low-power states
- Device power management
  - ACPI tables describe motherboard devices, their power states, and the power planes the devices are connected to
Firmware-Level ACPI Architecture
- Three components
  - ACPI tables
    - Contain definition blocks that describe all the hardware that can be managed through ACPI
    - Include both data and machine-independent bytecode (AML, ACPI Machine Language)
    - The OS must have an interpreter for the AML bytecode
  - ACPI BIOS
    - Performs basic management operations on the hardware
    - Includes code to help boot the system and to put the system to sleep or wake it up
  - ACPI registers
    - A set of hardware management registers defined by the ACPI specification
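Every ACPI table starts with the standard 36-byte System Description Table header. The bytes of the whole table must sum to zero modulo 256. The sketch below parses that header and checks the checksum. On Linux the raw tables are typically exposed under /sys/firmware/acpi/tables/, for example DSDT, and reading them usually requires root. The test table built here is synthetic, not real firmware data.

```python
import struct
from typing import NamedTuple

# Common ACPI table header: 36 bytes, little-endian, no padding.
_HEADER = struct.Struct("<4sIBB6s8sI4sI")


class AcpiHeader(NamedTuple):
    signature: str
    length: int
    revision: int
    checksum: int
    oem_id: str
    oem_table_id: str
    oem_revision: int
    creator_id: str
    creator_revision: int


def _text(raw: bytes) -> str:
    """Decode a fixed-width ASCII field, dropping space/NUL padding."""
    return raw.decode("ascii", "replace").rstrip(" \x00")


def checksum_ok(table: bytes) -> bool:
    """All bytes of a table, checksum byte included, must sum to 0 mod 256."""
    return sum(table) % 256 == 0


def parse_acpi_table(blob: bytes) -> AcpiHeader:
    """Parse the common header of an ACPI table and validate its checksum."""
    if len(blob) < _HEADER.size:
        raise ValueError("buffer shorter than the 36-byte ACPI header")
    (sig, length, rev, csum, oem, oem_tbl,
     oem_rev, creator, creator_rev) = _HEADER.unpack_from(blob)
    if length > len(blob):
        raise ValueError("table is truncated")
    if not checksum_ok(blob[:length]):
        raise ValueError("bad ACPI checksum")
    return AcpiHeader(_text(sig), length, rev, csum, _text(oem),
                      _text(oem_tbl), oem_rev, _text(creator), creator_rev)


def build_test_table(signature: bytes = b"TEST", payload: bytes = b"\x01\x02") -> bytes:
    """Build a synthetic table with a valid checksum (illustration only)."""
    length = _HEADER.size + len(payload)
    raw = bytearray(_HEADER.pack(signature, length, 2, 0, b"OEMID ",
                                 b"TABLEID ", 1, b"TEST", 1) + payload)
    raw[9] = -sum(raw) % 256  # the checksum byte is at offset 9
    return bytes(raw)


if __name__ == "__main__":
    hdr = parse_acpi_table(build_test_table())
    print(hdr.signature, hdr.length, hdr.oem_id)
```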
ACPI States
Global States
- G0: Working (S0)
  - Processor power states (C-states): C0, C1, C2, C3
- G1: Sleeping (e.g., suspend, hibernate)
  - Sleep states (S-states): S1, S2, S3, S4
- G2: Soft off (S5)
  - Almost the same as G3 (mechanical off), except that the power supply unit (PSU) still supplies a minimal amount of power
  - Other components may remain powered so the computer can "wake" on input from the keyboard, clock, modem, LAN, or USB device
- G3: Mechanical off
Processor States (C-State)
- Global state is G0 (working)
- Four processor states
  - C0: Operating
    - Performance states (P-states)
    - P0: highest performance, highest power
    - P1 to Pn: lower performance, lower power
  - C1: Halt
    - The processor is not executing instructions, but can return to an executing state essentially instantaneously
  - C2: Stop-Clock (optional)
    - The processor maintains all software-visible state, but may take longer to wake up
  - C3: Sleep (optional)
    - The processor does not need to keep its caches coherent, but maintains other state
Performance States (P-State)
- Intel Pentium M at 1.6 GHz
Device States (D-State)
- The meanings of device states D0–D3 are device-dependent
  - D0: Fully On
    - The operating state
  - D1 and D2
    - Intermediate power states whose definition varies by device
  - D3: Off
    - D3hot: main power is still present, so the device remains visible to software
    - D3cold: main power is removed and the device is unresponsive to its bus (auxiliary power may remain for wake-up)
Sleeping States (S-State)
- Four sleeping states
  - S1: Power on Suspend (POS)
    - All processor caches are flushed
    - Power to the CPU(s) and RAM is maintained
    - Wakeup takes about 1 to 2 seconds on desktops
  - S2: CPU powered off
    - Dirty cache is flushed to RAM (often not used)
  - S3: Suspend to RAM (STR), also called Standby or Sleep
    - RAM remains powered
    - Wakeup takes about 3 to 5 seconds on desktops
  - S4: Suspend to Disk (STD), or hibernation
    - All content of main memory is saved to non-volatile storage such as a hard drive, and the system is powered down
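On Linux, /sys/power/state lists the sleep states the kernel offers, and /sys/power/mem_sleep selects what "mem" means. The sketch below maps those strings onto the ACPI S-states above, following the kernel's documented mapping:

- standby and shallow correspond to S1.
- deep corresponds to S3.
- disk corresponds to S4.
- freeze/s2idle is suspend-to-idle, which has no ACPI S-state of its own.

The function takes the file contents as strings so it can be tried without root. The sample strings are illustrative.

```python
import re

# Kernel sysfs label -> ACPI sleep state (None: suspend-to-idle, no S-state).
_MEM_SLEEP = {"s2idle": None, "shallow": "S1", "deep": "S3"}
_STATE = {"freeze": None, "standby": "S1", "disk": "S4"}


def active_mem_sleep(mem_sleep_text: str) -> str:
    """Return the bracketed (currently selected) entry of /sys/power/mem_sleep."""
    match = re.search(r"\[(\w+)\]", mem_sleep_text)
    if match is None:
        raise ValueError("no selected mem_sleep mode found")
    return match.group(1)


def map_sleep_states(state_text: str, mem_sleep_text: str) -> dict:
    """Map each entry of /sys/power/state to an ACPI S-state (or None)."""
    result = {}
    for label in state_text.split():
        if label == "mem":
            result[label] = _MEM_SLEEP.get(active_mem_sleep(mem_sleep_text))
        else:
            result[label] = _STATE.get(label)
    return result


if __name__ == "__main__":
    print(map_sleep_states("freeze mem disk", "s2idle [deep]"))
```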
Dynamic Voltage and Frequency Scaling
- Adjusting the clock speed and operating voltage dynamically
- Most modern processors support DVFS with low switching overhead
  - Typically microseconds to tens of microseconds per transition
(Figure: two plots of performance versus time, each marked with a deadline)
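The figure contrasts two schedules. One finishes a job early and then idles. The other runs slower so the job ends just at the deadline.

The sketch below compares their energy. It assumes the cubic model from the Power and Performance slide (P = k·f³ while busy) plus a constant idle power. The values of k, the idle power and the job size are illustrative, not measurements.

```python
def energy_race_to_idle(cycles_g: float, f_max_ghz: float, deadline_s: float,
                        k: float, p_idle_w: float) -> float:
    """Run at f_max, then idle until the deadline. Returns joules."""
    busy_s = cycles_g / f_max_ghz
    if busy_s > deadline_s:
        raise ValueError("deadline cannot be met even at f_max")
    return k * f_max_ghz ** 3 * busy_s + p_idle_w * (deadline_s - busy_s)


def energy_stretch(cycles_g: float, f_max_ghz: float, deadline_s: float,
                   k: float) -> float:
    """Run at the lowest frequency that finishes exactly at the deadline."""
    f_ghz = cycles_g / deadline_s
    if f_ghz > f_max_ghz:
        raise ValueError("deadline cannot be met even at f_max")
    return k * f_ghz ** 3 * deadline_s


if __name__ == "__main__":
    # 1 Gcycle job, 1 s deadline, f_max = 2 GHz, k = 1 W/GHz^3, idle 0.5 W
    print(energy_race_to_idle(1.0, 2.0, 1.0, k=1.0, p_idle_w=0.5))  # 4.25
    print(energy_stretch(1.0, 2.0, 1.0, k=1.0))                     # 1.0
```

Under this idealized model the slower schedule wins. Real processors also draw static power that does not scale with f, which can make finishing early and idling more competitive.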
Four Considerations for DVFS
- Workload amount
  - Adjust the processor frequency depending on the load
- Workload characteristics
  - Compute-intensive vs. memory-intensive
- Deadline constraints
  - Lowest possible frequency that still meets deadlines
- Load balancing
  - Migrate or scale?
Workload Amount and DVFS
- Static approaches
  - Performance policy
    - The CPU runs at the maximum frequency regardless of load
  - Power save policy
    - The CPU runs at the minimum frequency regardless of load
- Dynamic approaches
  - On demand policy
    - Increase the clock speed to the maximum frequency when the system load goes above a predefined threshold
    - Decrease the clock speed gradually when the system load falls below the threshold
  - Conservative policy
    - Increase the CPU speed gradually rather than jumping to the maximum speed
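The on-demand policy above can be sketched as a single decision per sampling period. This is a simplified illustration of the slide's description, not the Linux ondemand governor's exact algorithm. The 95% threshold matches the ondemand governor figure later on. The down step of 10% of the frequency range is a hypothetical choice.

```python
def ondemand_next(cur_khz: int, load_pct: float, min_khz: int, max_khz: int,
                  up_threshold: float = 95.0, down_step_pct: int = 10) -> int:
    """One sampling period of a simplified on-demand policy."""
    if load_pct > up_threshold:
        return max_khz  # load above threshold: jump straight to the maximum
    # Otherwise decrease gradually by a fixed fraction of the range (hypothetical).
    step = (max_khz - min_khz) * down_step_pct // 100
    return max(min_khz, cur_khz - step)


if __name__ == "__main__":
    f = 800_000
    for load in (99, 40, 40, 40):
        f = ondemand_next(f, load, 800_000, 2_000_000)
        print(load, f)
```

Called once per sampling period with the measured load, this reproduces the fast-up, slow-down shape described above.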
Workload Characteristics and DVFS
- Two types of workload
  - Compute-intensive
    - Program execution is bound to the processor
  - Memory-intensive
    - The program accesses memory heavily
    - The processor spends a significant fraction of its time waiting for memory
- A simple solution
  - High processor frequency and low memory frequency for compute-intensive loads
  - Low processor frequency and high memory frequency for memory-intensive loads
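A two-term execution-time model shows why the simple solution works. It assumes CPU work scales with 1/f while memory stall time does not depend on the CPU clock. This is a common first-order approximation, not a measured model. The 733 MHz and 333 MHz endpoints match the experiment on the next slide; the workload sizes are made up.

```python
def exec_time_s(cpu_cycles: float, mem_stall_s: float, f_hz: float) -> float:
    """First-order model: compute part scales with 1/f, memory stalls do not."""
    return cpu_cycles / f_hz + mem_stall_s


def slowdown(cpu_cycles: float, mem_stall_s: float,
             f_high_hz: float = 733e6, f_low_hz: float = 333e6) -> float:
    """Execution-time ratio when lowering the CPU clock from f_high to f_low."""
    return (exec_time_s(cpu_cycles, mem_stall_s, f_low_hz)
            / exec_time_s(cpu_cycles, mem_stall_s, f_high_hz))


if __name__ == "__main__":
    # Both hypothetical jobs take 1 s at 733 MHz.
    compute_bound = slowdown(cpu_cycles=733e6, mem_stall_s=0.0)
    memory_bound = slowdown(cpu_cycles=73.3e6, mem_stall_s=0.9)
    print(f"compute-bound: {compute_bound:.2f}x, memory-bound: {memory_bound:.2f}x")
```

The compute-bound job slows by the full clock ratio (about 2.2x). The memory-bound job slows only slightly, so it can run at a low CPU frequency at little cost.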
CPU- vs. Memory-Intensive
- Execution time variation
  - CPU frequency ranging from 733 MHz to 333 MHz
GPU and Memory-Intensive
- Compute-intensive applications
  - Dense matrix multiplication
  - Run on an NVIDIA GeForce GTX 280 GPU
GPU and Memory-Intensive
- Memory-intensive applications
  - Dense matrix transpose
  - Run on an NVIDIA GeForce GTX 280 GPU
Load Balancing and DVFS
- DVFS can be applied independently to each processor on multicore hardware
  - But this may not lead to optimal power savings from a global point of view
- A simple scenario
  - We need to decide whether to transfer a thread from processor A to an idle processor B, or to increase the frequency of A
  - Compute P_migrate_from_A_to_B and P_increase_A_freq
  - Transfer if P_migrate_from_A_to_B < P_increase_A_freq
  - Otherwise, increase the frequency of A
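The decision rule can be sketched as follows. The sketch assumes a hypothetical per-core power model, P(f) = P_static + k·f³ for a busy core, and a fixed idle power for an unused core. All coefficients are illustrative. On real hardware they would come from measurements such as the Core2 Duo model in the next case study.

```python
def core_power_w(f_ghz: float, k: float = 5.0, p_static_w: float = 2.0) -> float:
    """Hypothetical busy-core power model: static part plus cubic dynamic part."""
    return p_static_w + k * f_ghz ** 3


def migrate_or_scale(f_a_if_moved: float, f_a_if_kept: float, f_b: float,
                     p_idle_b_w: float = 1.0, **model) -> str:
    """Compare P_migrate_from_A_to_B with P_increase_A_freq.

    f_a_if_moved: frequency A needs after the thread moves to B
    f_a_if_kept:  frequency A needs if it keeps the thread
    f_b:          frequency B needs to run the migrated thread
    """
    p_increase = core_power_w(f_a_if_kept, **model) + p_idle_b_w  # B stays idle
    p_migrate = core_power_w(f_a_if_moved, **model) + core_power_w(f_b, **model)
    return "migrate" if p_migrate < p_increase else "increase"


if __name__ == "__main__":
    # A runs two threads that each need 1 GHz worth of cycles.
    print(migrate_or_scale(1.0, 2.0, 1.0))                   # migrate
    print(migrate_or_scale(1.0, 2.0, 1.0, p_static_w=40.0))  # increase
```

The sketch ignores the one-time cost of migration, such as cache warm-up. When static power dominates, keeping B idle and scaling A can come out ahead.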
Case Study: Intel Core2 Duo E6850
TABLE II
VOLTAGE (Vcpu, V) AND CLOCK FREQUENCY (fcpu, GHz) LEVELS FOR INTEL CORE2 DUO E6850 PROCESSOR.

DVFS level   Vcpu (V)   fcpu (GHz)
Level 1      1.30       3.074
Level 2      1.25       2.852
Level 3      1.20       2.588
Level 4      1.15       2.281
Level 5      1.10       1.932
Level 6      1.05       1.540
TABLE III
MEASURED AND ANALYTICAL MODELS OF INTEL CORE2 DUO E6850 POWER CONSUMPTION.

Vcpu (V)   fcpu (GHz)   Measurement (W)   Analytical model (W)
1.056      1.776        21.520            21.212
1.080      1.888        24.000            23.956
1.104      2.004        26.320            26.856
1.160      2.338        33.760            34.838
1.224      2.672        43.200            44.409
1.280      3.006        55.440            54.236
… reflects the supply voltage and frequency changes. We choose a high-end DVFS-enabled microprocessor, i.e., the Intel Core2 Duo E6850 processor, along with the LTC3733 3-phase synchronous step-down DC–DC converter that supports discontinuous mode, which is a representative setup of a modern high-performance DVFS-enabled microprocessor.
The microprocessor power consumption model is described in (18). The parameters Ce, a1, and a2 are obtained from actual measurements. We insert a shunt monitor circuit right in front of the DC–DC converter of the Intel Core2 Duo E6850 processor, and measure the power supply current with an Agilent A34401 digital multimeter. We compensate for the DC–DC converter efficiency in the measured current values, and characterize IO. We run the PrimeZ benchmark and change Vcpu and fcpu through direct access to the BIOS (basic input/output system), as described in Table II, because Intel SpeedStep supports only two voltage levels. We finally derive the following power consumption model:
$$P_{cpu} = 8.4503\,V_{cpu}^{2} f_{cpu} + (36.3851\,V_{cpu} - 33.9503) \qquad (30)$$
where the units of Pcpu, Vcpu, and fcpu are W, V, and GHz, respectively. The difference between the analytical model and the measurement results is less than 4.6%, as shown in Table III. The DC–DC converter parameters are given in Table IV. The values are chosen according to the guidelines in the datasheet and the reference designs offered by the vendor.
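Equation (30) is easy to check against Table III. The sketch below evaluates it at each measured operating point and reports the relative error. The data are copied from Table III; only the evaluation code is new.

```python
# Operating points from Table III: (Vcpu [V], fcpu [GHz], measured power [W])
TABLE_III = [
    (1.056, 1.776, 21.520),
    (1.080, 1.888, 24.000),
    (1.104, 2.004, 26.320),
    (1.160, 2.338, 33.760),
    (1.224, 2.672, 43.200),
    (1.280, 3.006, 55.440),
]


def p_cpu_w(v_cpu: float, f_cpu_ghz: float) -> float:
    """Equation (30): fitted Core2 Duo E6850 power model, in watts."""
    return 8.4503 * v_cpu ** 2 * f_cpu_ghz + (36.3851 * v_cpu - 33.9503)


def relative_errors() -> list:
    """Relative error of the model against each measurement in Table III."""
    return [abs(p_cpu_w(v, f) - meas) / meas for v, f, meas in TABLE_III]


if __name__ == "__main__":
    for (v, f, meas), err in zip(TABLE_III, relative_errors()):
        print(f"V={v:.3f} f={f:.3f}  model={p_cpu_w(v, f):6.2f} W  "
              f"measured={meas:6.2f} W  error={err:.1%}")
```

Over these points the largest deviation comes out a little above 3%, within the 4.6% bound quoted in the text.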
The delay overhead of DVFS transition is given in Table V.
TABLE IV
DC–DC CONVERTER PARAMETERS OF LTC3733 3-PHASE CONVERTER FOR INTEL CORE2 DUO E6850.

Parameter   Value
VIN         12 V
VOUT        VO in Table II
C           8840 µF
L           1 µH per phase
RL          2.3 mΩ
fDC         530 kHz per phase
max(IL)     75 A
TABLE V
DVFS TRANSITION DELAY OVERHEAD FOR INTEL CORE2 DUO E6850 PROCESSOR WITH LTC3733 CONVERTER (Tpll = 5 µs; THE NUMBERS OF CYCLES ARE CALCULATED AT f = 3.074 GHz).

            Actual value                   Proposed model
Level       Tuc (µs)  Total (µs)  Cycles   Tuc (µs)  Total (µs)  Cycles
2→1         4.77      9.77        30018    4.11      9.11        28011
3→1         12.29     17.29       53141    12.21     17.21       52890
3→2         5.95      10.95       33672    5.72      10.72       32950
4→1         22.29     27.29       83894    24.24     29.24       89894
4→2         14.69     19.69       60531    16.21     21.21       65201
4→3         7.33      12.33       37921    8.06      13.06       40150
5→1         34.81     39.81       122389   40.37     45.37       139457
5→2         26.47     31.47       96733    31.44     36.44       112025
5→3         27.33     32.33       99383    21.87     26.87       82606
5→4         9.43      14.43       44361    11.49     16.49       50694
6→1         57.68     62.68       192684   60.90     65.90       202590
6→2         49.50     54.50       167525   51.65     56.65       174157
6→3         31.89     36.89       113409   41.52     46.52       142994
6→4         28.35     33.35       102531   30.14     35.14       108006
6→5         12.14     17.14       52688    16.84     21.84       67141
Downscale   0         5           15370    0         5           15370
The value of Tpll, 5 µs, is specified in the Intel Core2 Duo E6850 datasheet. The actual values are obtained from SPICE simulation results. We obtain TX by observing the settling time of VO(t) from the SPICE results, and substitute it into the equations in Section IV to calculate the delay overhead. The overhead estimated by the proposed macro model closely follows the trend of the actual values. For upscaling, the delay overhead is the sum of the underclocking-related overhead Tuc and the PLL lock time loss Tpll. Unlike the assumption of previous works, the underclocking-related overhead is the dominant factor in most cases, as we have discussed in Section III. For downscaling, the PLL lock time is the only delay overhead, and thus the overhead values are the same for all cases.
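The Total and Cycles columns of Table V follow directly from this description:

- Total = Tuc + Tpll for upscaling.
- Total = Tpll for downscaling.
- Cycles = Total × 3.074 GHz.

A small sketch of that arithmetic follows. Tuc itself comes from SPICE or the paper's macro model and is simply taken as an input here.

```python
F_GHZ = 3.074   # frequency used for the cycle counts in Table V
T_PLL_US = 5.0  # PLL lock time from the E6850 datasheet


def transition_overhead(t_uc_us: float, upscale: bool) -> tuple:
    """Return (total delay in µs, lost cycles at 3.074 GHz) for one DVFS switch."""
    total_us = (t_uc_us + T_PLL_US) if upscale else T_PLL_US
    cycles = round(total_us * F_GHZ * 1000)  # µs × GHz = 10^3 cycles
    return total_us, cycles


if __name__ == "__main__":
    print(transition_overhead(4.77, upscale=True))  # level 2 -> 1, actual Tuc
    print(transition_overhead(0.0, upscale=False))  # any downscale
```

The downscale row reproduces exactly (5 µs, 15370 cycles). For 2→1 the recomputed count is about 30033 cycles, slightly off from the tabulated 30018, likely because the table was computed from unrounded delays.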
The energy overhead values of a DVFS transition for continuous- and discontinuous-mode operation are given in Tables VI and VII, respectively. For the actual values, we obtain IL(t), IO(t), and VO(t) from SPICE simulation and substitute them into (14), (16), and (19). There is no Ecap in the tables, as it is already accounted for in Eir. The value of Eir for the case 1→6 in Table VI is large because it drains a significant amount of charge from the bulk capacitor to ground. On the other hand, Eir for the same case in Table VII is much smaller because it uses most of the stored charge to supply the load. This result is very different from previous models such as [6], which simply calculate the overhead based on the charge transfer to and from the bulk capacitor.
B. Case 2: ARM Cortex-A8 Processor
The second target DVFS system is the ARM Cortex-A8 processor with the LTC3446 converter. The ARM Cortex-A8 processor is an application processor targeting high-end mobile products such as smartphones, tablets, and netbooks. It exhibits a power consumption of 600 mW at full speed. We perform a procedure …
Case Study: Exynos 4210
DVFS Overhead (Exynos 4210)
Linux Power Management Architecture
(Figure by Amit Kucheria, 2011 Embedded Linux Conference: a policy management layer (governors) and a device driver layer)
CPUidle Architecture
CPUidle Governors
- Ladder governor
  - Takes a simple, step-wise approach to selecting an idle state
  - Enters the lightest state first, and moves on to the next deeper state only if the previous sleep was long enough
- Menu governor
  - Picks the deepest possible idle state straight away
  - Considers the expected sleep time, latency requirements, previous C-state residency, etc.
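Both policies can be sketched against a table of idle states. Each state has an exit latency and a target residency, the minimum sleep that makes entering the state worthwhile. The state table and the promote/demote rule below are hypothetical simplifications for illustration. The real Linux ladder and menu governors use additional heuristics.

```python
from typing import NamedTuple


class IdleState(NamedTuple):
    name: str
    exit_latency_us: int      # time needed to wake up from this state
    target_residency_us: int  # minimum sleep for the state to pay off


# Hypothetical C-state table, ordered from shallowest to deepest.
STATES = [
    IdleState("C1", 2, 2),
    IdleState("C2", 50, 150),
    IdleState("C3", 200, 800),
]


def ladder_next(current: int, last_sleep_us: int) -> int:
    """Ladder: go one step deeper if the last sleep was long enough for it,
    one step shallower if it was too short for the current state."""
    deeper = current + 1
    if deeper < len(STATES) and last_sleep_us >= STATES[deeper].target_residency_us:
        return deeper
    if current > 0 and last_sleep_us < STATES[current].target_residency_us:
        return current - 1
    return current


def menu_select(predicted_sleep_us: int, latency_limit_us: int) -> int:
    """Menu: jump straight to the deepest state that fits both the predicted
    sleep time and the latency requirement."""
    choice = 0
    for i, state in enumerate(STATES):
        if (state.target_residency_us <= predicted_sleep_us
                and state.exit_latency_us <= latency_limit_us):
            choice = i
    return choice


if __name__ == "__main__":
    print(STATES[menu_select(1000, latency_limit_us=100)].name)  # C2
    level = 0
    for sleep in (300, 300, 1000):
        level = ladder_next(level, sleep)
    print(STATES[level].name)  # C3, reached one step at a time
```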
Idle Task
- When there are no runnable processes, the scheduler runs the idle task (PID 0)
Tickless Idle
- Traditional systems use a periodic timer interrupt (the "tick")
  - Updates the system clock
  - Each tick requires a wakeup from the idle state
- Tickless idle eliminates the periodic timer tick while the CPU is idle
  - The CPU can remain in power-saving states for longer periods, reducing overall system power consumption
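A rough count shows the difference. The sketch assumes HZ = 1000; CONFIG_HZ is configurable, and 100, 250, 300, and 1000 are common values. It also assumes the CPU has nothing to do for one second except one timer due at the end.

With a periodic tick, the CPU wakes a thousand times, and every sleep is capped at 1 ms. That rules out idle states whose target residency exceeds 1 ms. A tickless kernel can sleep until the next real timer event. This is an idealized sketch that ignores other interrupts.

```python
def idle_wakeups(idle_s: float, hz: int, tickless: bool) -> tuple:
    """Return (number of wakeups, longest uninterrupted sleep in seconds)
    for a CPU that is idle for idle_s with one timer due at the end."""
    if tickless:
        return 1, idle_s  # only the real timer event wakes the CPU
    period_s = 1.0 / hz
    wakeups = max(1, round(idle_s * hz))  # one wakeup per tick
    return wakeups, min(period_s, idle_s)


if __name__ == "__main__":
    print(idle_wakeups(1.0, hz=1000, tickless=False))  # (1000, 0.001)
    print(idle_wakeups(1.0, hz=1000, tickless=True))   # (1, 1.0)
```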
CPUFreq Architecture
CPUFreq Governor
Governor       Operations
Performance    Always sets the CPU to the highest frequency between scaling_min_freq and scaling_max_freq
Powersave      Always sets the CPU to the lowest frequency between scaling_min_freq and scaling_max_freq
Ondemand       Sets the frequency depending on the current usage;
               rapidly increases the frequency and gradually decreases it
Conservative   Operates basically like ondemand, but
               gradually increases and decreases the frequency
Userspace      Sets the CPU to the frequency the user writes to scaling_setspeed
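With the userspace governor, a privileged program picks the frequency itself by writing to the cpufreq sysfs files named above. The sketch below only plans the writes, so it can be tried without root, and can then optionally apply them. The paths follow the usual /sys/devices/system/cpu/cpuN/cpufreq/ layout, and values are in kHz. The example frequencies are hypothetical, and the cpufreq driver must offer the userspace governor.

```python
from pathlib import Path

CPUFREQ = "/sys/devices/system/cpu/cpu{cpu}/cpufreq"


def userspace_plan(cpu: int, target_khz: int, min_khz: int, max_khz: int) -> list:
    """Return the (path, value) writes that pin `cpu` to `target_khz`."""
    if not min_khz <= target_khz <= max_khz:
        raise ValueError("target outside [scaling_min_freq, scaling_max_freq]")
    base = CPUFREQ.format(cpu=cpu)
    return [
        (f"{base}/scaling_governor", "userspace"),
        (f"{base}/scaling_setspeed", str(target_khz)),
    ]


def apply_plan(plan: list) -> None:
    """Perform the writes (requires root)."""
    for path, value in plan:
        Path(path).write_text(value)


if __name__ == "__main__":
    for path, value in userspace_plan(0, 1_200_000, 800_000, 2_400_000):
        print(f"echo {value} > {path}")
```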
Ondemand Governor
(Figure: load and frequency over time; the frequency moves between min and max, with up_threshold = 95%)
Conservative Governor
(Figure: load and frequency over time; the frequency moves between min and max, with up_threshold = 80% and down_threshold = 20%)
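The conservative behavior can be sketched with the thresholds of this slide: raise the frequency above 80% load and lower it below 20%. The fixed step of 5% of the maximum frequency is an assumption modeled on the governor's freq_step tunable. This is a simplification, not the kernel's code.

```python
def conservative_next(cur_khz: int, load_pct: float, min_khz: int, max_khz: int,
                      up_threshold: float = 80.0, down_threshold: float = 20.0,
                      step_pct: int = 5) -> int:
    """Move one fixed step up or down instead of jumping to max or min."""
    step = max_khz * step_pct // 100
    if load_pct > up_threshold:
        return min(cur_khz + step, max_khz)
    if load_pct < down_threshold:
        return max(cur_khz - step, min_khz)
    return cur_khz  # load between the thresholds: keep the current frequency


if __name__ == "__main__":
    f = 800_000
    for load in (90, 90, 90, 50, 10):
        f = conservative_next(f, load, 800_000, 2_000_000)
        print(load, f)
```

Compared with the on-demand sketch, a sustained high load here climbs in 100 MHz steps rather than jumping straight to the maximum.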
Basic Operations of CPUfreq
- Sample the processor utilization periodically
- Adjust the frequency based on the utilization
- Adjust the voltage based on the frequency
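The three steps can be combined in one sketch:

1. Compute utilization from two samples of busy/total time counters, for example the jiffies in /proc/stat.
2. Choose the lowest frequency that would keep utilization under a target.
3. Look up the voltage paired with that frequency.

The sketch reuses the Core2 Duo E6850 levels from Table II as the frequency/voltage table and uses an 80% utilization target. Both choices are illustrative assumptions.

```python
# (frequency in GHz, voltage in V) pairs from Table II, slowest first.
OPP_TABLE = [(1.540, 1.05), (1.932, 1.10), (2.281, 1.15),
             (2.588, 1.20), (2.852, 1.25), (3.074, 1.30)]


def utilization(prev: tuple, curr: tuple) -> float:
    """Step 1: busy fraction between two (busy, total) counter samples."""
    d_busy, d_total = curr[0] - prev[0], curr[1] - prev[1]
    return d_busy / d_total if d_total > 0 else 0.0


def next_operating_point(util: float, cur_ghz: float, target: float = 0.8) -> tuple:
    """Steps 2 and 3: pick the slowest frequency that keeps the load under
    `target`, then the voltage that goes with it."""
    needed_ghz = cur_ghz * util / target
    for f_ghz, v in OPP_TABLE:
        if f_ghz >= needed_ghz:
            return f_ghz, v
    return OPP_TABLE[-1]  # saturate at the fastest level


if __name__ == "__main__":
    u = utilization((100, 1000), (150, 1100))  # 50 busy ticks out of 100
    print(u, next_operating_point(u, cur_ghz=3.074))
```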