virtual melting temperature: managing server load to minimize cooling … · 2018-06-20 · virtual...

Virtual Melting Temperature: Managing Server Load to Minimize Cooling Overhead with Phase Change Materials

Matt Skach (1), Manish Arora (2,3), Dean Tullsen (3), Lingjia Tang (1), Jason Mars (1)

(1) University of Michigan, (2) Advanced Micro Devices, Inc., (3) UC San Diego

ISCA '18

Datacenters

Facebook Ireland Datacenter

Facebook datacenter

Huge warehouses full of servers that host the internet and the cloud

Datacenters Cooling

● Heat must be removed to prevent:
  ○ Overheating
  ○ Thermal downclocking
  ○ Component failure

http://www.asetek.com/media/1031/rackcdu_d2c_datacenter.jpg

Global Energy Consumption (CIA World Factbook)

Rank  Region            Electricity Consumption (TWh/year)
1     China             6,100
2     United States     4,100
3     European Union    3,100
4     India             1,300
5     Russia            1,000
6     Japan             980
7     Canada            640

Datacenter Energy Consumption (Avgerinou, 2017)

Rank  Region                               Electricity Consumption (TWh/year)
1     China                                6,100
2     United States                        4,100
3     European Union                       3,100
-     Datacenters (global, est.)           1,600
4     India                                1,300
5     Russia                               1,000
6     Japan                                980
-     Datacenter Cooling (global, est.)    650
7     Canada                               640
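To put the estimates in proportion, the short sketch below computes the share of estimated datacenter electricity that goes to cooling. It uses only the figures listed on the slide; the variable names are illustrative.

```python
# Figures (TWh/year) taken from the slide; names are illustrative only.
DATACENTERS_TWH = 1600.0          # Datacenters (global, est.)
DATACENTER_COOLING_TWH = 650.0    # Datacenter Cooling (global, est.)
CANADA_TWH = 640.0                # Canada, rank 7

# Fraction of estimated datacenter electricity spent on cooling.
cooling_share = DATACENTER_COOLING_TWH / DATACENTERS_TWH

# The cooling estimate alone is larger than Canada's listed consumption.
cooling_exceeds_canada = DATACENTER_COOLING_TWH > CANADA_TWH

print(f"Cooling share of datacenter electricity: {cooling_share:.1%}")
print(f"Cooling exceeds Canada's consumption: {cooling_exceeds_canada}")
```

By these estimates, cooling accounts for roughly two fifths of datacenter electricity.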

Datacenter Cooling

● Datacenter cooling is very expensive
  ○ Infrastructure can cost tens of millions of dollars for large DCs (Kontorinis, 2014)
  ○ Generally, more power-efficient systems are more expensive up front

Open Compute cooling system

Datacenter Workloads

● Diurnal load is problematic
  ○ Work is uneven
  ○ Work is distributed
  ○ Heat is produced when work is done

Google Search: US Load

Datacenter Cooling

● Build a big cooling system for peak load
  ○ Underutilized most of the time

Expensive: 100% coverage, low utilization
Best: 50% coverage, maximum utilization
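A minimal sketch of why peak-sized cooling sits underutilized: it computes the average utilization of a cooling system sized to the peak of a diurnal load. The load curve (`diurnal_load`, with its `base` and `swing` parameters) is a hypothetical sinusoid chosen for illustration, not the Google Search trace shown in the slides.

```python
import math

def diurnal_load(hour, base=0.4, swing=0.6):
    """Hypothetical normalized heat load: trough at 3am, peak at 3pm."""
    return base + swing * 0.5 * (1 - math.cos(2 * math.pi * (hour - 3) / 24))

def utilization(loads, capacity):
    """Average fraction of the cooling capacity that is actually used."""
    used = [min(load, capacity) for load in loads]
    return sum(used) / (len(used) * capacity)

loads = [diurnal_load(h) for h in range(24)]
peak = max(loads)

# Cooling sized for the peak hour is only partly used over the whole day.
peak_sized_utilization = utilization(loads, peak)
print(f"peak load = {peak:.2f}, utilization = {peak_sized_utilization:.0%}")
```

With this toy curve the peak-sized system averages about 70% utilization; a flatter load would raise that figure without adding capacity.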

Thermal Time Shifting (TTS) [ISCA '15]

[Figure: cooling load vs. time of day (3am, 7am, 7pm, 12am), coupled vs. decoupled]

Store heat to flatten peak
Release heat during off hours

Cooling Load

● Metric of heat that must be removed
● Datacenter is primarily concerned with IT & support equipment

http://www.slideshare.net/spsu/12-cooling-load-calculations

A Phase Change Material (PCM)

● Store energy in a Solid->Liquid phase change
● Commercial paraffin wax offers the best properties of currently available PCMs (Skach, 2015)
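The energy a PCM absorbs while melting follows the standard latent-heat relation Q = m * L_f. The sketch below applies it with assumed numbers: the latent heat of about 200 kJ/kg is a typical order of magnitude for paraffin rather than a value from the slides, and the wax mass and heat rate are hypothetical.

```python
def latent_energy_joules(mass_kg, latent_heat_j_per_kg):
    """Energy absorbed by fully melting the material: Q = m * L_f."""
    return mass_kg * latent_heat_j_per_kg

def seconds_to_melt(mass_kg, latent_heat_j_per_kg, heat_rate_w):
    """Time to melt all the material at a constant absorbed heat rate."""
    return latent_energy_joules(mass_kg, latent_heat_j_per_kg) / heat_rate_w

# ASSUMPTIONS (not from the slides): ~200 kJ/kg latent heat of fusion,
# 1 kg of wax per server, and 50 W of heat flowing into the wax.
ASSUMED_LATENT_HEAT_J_PER_KG = 200_000.0
energy_j = latent_energy_joules(1.0, ASSUMED_LATENT_HEAT_J_PER_KG)
melt_time_s = seconds_to_melt(1.0, ASSUMED_LATENT_HEAT_J_PER_KG, 50.0)

print(f"stored energy: {energy_j / 1000:.0f} kJ")
print(f"time to melt: {melt_time_s / 60:.1f} minutes")
```

The relation ignores sensible heating before and after the phase change, so it gives only the latent part of the stored energy.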

The problem with passive TTS

Thermal Time Shifting:

● Paraffin has a limited range of melting temperatures
● Melting temperature cannot be changed
● Power and temperature profiles vary over lifetime of servers

Image credit: Wikimedia Commons

Virtual Melting Temperature

● Datacenters need more flexibility
● Create a “virtual” melting temperature separate from the actual melting temperature

Image credit: Microsoft, Wikimedia Commons

Test Infrastructure

● 2U High Throughput Server
● 2-day Google Workload trace divided between 5 datacenter workloads

Test Methodology

● 5 common datacenter workloads
  1. Web Search
  2. Data Caching
  3. Video Encoding
  4. Virus Scan
  5. Clustering
● Consider datacenter where all are colocated
  ○ Contention mitigation techniques applied (e.g. Bubble-Up (Mars, 2011) and Protean Code (Laurenzano, 2014))

Baseline: Load Balancing Schedulers

● Round Robin and Coolest First

● Problem: Average cluster temperature is too low to melt wax
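The two baseline policies can be sketched as follows. This is a toy model, not the authors' implementation: the function names, the `heat_per_job` parameter, and the assumption that each placed job raises a server's temperature by a fixed amount are all hypothetical.

```python
from itertools import cycle

def round_robin(jobs, n_servers):
    """Assign jobs to servers in a fixed rotating order."""
    placement = {s: [] for s in range(n_servers)}
    for job, server in zip(jobs, cycle(range(n_servers))):
        placement[server].append(job)
    return placement

def coolest_first(jobs, temps, heat_per_job=1.0):
    """Assign each job to the currently coolest server.

    ASSUMPTION: every placed job raises that server's temperature by
    heat_per_job degrees (a toy thermal model for illustration only).
    """
    temps = list(temps)  # work on a copy so the caller's list is untouched
    placement = {s: [] for s in range(len(temps))}
    for job in jobs:
        server = min(range(len(temps)), key=temps.__getitem__)
        placement[server].append(job)
        temps[server] += heat_per_job
    return placement

rr = round_robin(["a", "b", "c", "d", "e"], 2)
cf = coolest_first(["a", "b", "c"], [30.0, 25.0, 28.0], heat_per_job=2.0)
print(rr)
print(cf)
```

Both policies spread heat evenly across servers, which is consistent with the problem noted above: an evenly loaded cluster may never get hot enough to melt the wax.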

Thermal Aware VMT

● Categorize jobs based upon thermal characteristics
  ○ Binary classification: Would they melt significant wax in isolation?

Thermal Aware VMT

● Grouping Value (GV): Controllable ratio of group size
  ○ Proportional to hot group size
● Locate ‘hot jobs’ together in ‘hot group’ to melt wax
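A rough sketch of thermal-aware grouping under stated assumptions. The slides say only that GV is proportional to hot group size; reading GV as the percentage of servers in the hot group is a hypothetical mapping chosen here for illustration. The `is_hot` flag stands in for the binary classification, and the job names are placeholders.

```python
def hot_group_size(gv, n_servers):
    """ASSUMPTION: GV is read as the percentage of servers in the hot group.

    The result is clamped so both groups keep at least one server.
    """
    return max(1, min(n_servers - 1, (gv * n_servers) // 100))

def place_jobs(jobs, gv, n_servers):
    """Place (name, is_hot) jobs: hot jobs in the hot group, others elsewhere."""
    k = hot_group_size(gv, n_servers)
    hot_servers = list(range(k))
    cold_servers = list(range(k, n_servers))
    placement = {s: [] for s in range(n_servers)}
    hot_i = cold_i = 0
    for name, is_hot in jobs:
        if is_hot:
            server = hot_servers[hot_i % len(hot_servers)]
            hot_i += 1
        else:
            server = cold_servers[cold_i % len(cold_servers)]
            cold_i += 1
        placement[server].append(name)
    return placement

# Placeholder jobs; which real workloads count as hot is not assumed here.
demo = place_jobs([("job_a", True), ("job_b", False), ("job_c", True)], 50, 4)
print(hot_group_size(22, 100), demo)
```

Concentrating hot jobs on fewer servers raises those servers' temperature above the cluster average, which is what allows their wax to melt.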

Thermal Aware VMT Results

● Hot Group sized to melt wax during peak hours

Thermal Aware VMT Results

● Balance between melting wax too soon and not melting enough wax

GV=24: Hot group is too big

GV=22: Hot group is just right

GV=20: Hot Group is too small

Wax Aware VMT

● Begin with same setup as VMT-TA
● When wax in hot group is fully melted, expand hot group
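The expansion rule might look like the sketch below. It assumes the scheduler can estimate a melted-wax fraction per server and that the hot group occupies the first `hot_size` server indices. The `step` and `threshold` parameters are hypothetical knobs, not values from the slides.

```python
def expand_hot_group(hot_size, melted_fraction, step=1, threshold=1.0):
    """Return the new hot group size.

    melted_fraction: per-server estimate of melted wax in [0, 1];
    servers [0, hot_size) are assumed to form the hot group.
    The group grows by `step` only when every hot server's wax is melted.
    """
    n_servers = len(melted_fraction)
    hot = melted_fraction[:hot_size]
    if hot and all(f >= threshold for f in hot):
        return min(n_servers, hot_size + step)
    return hot_size

grown = expand_hot_group(2, [1.0, 1.0, 0.2, 0.0])      # all hot wax melted
unchanged = expand_hot_group(2, [1.0, 0.7, 0.2, 0.0])  # still melting
capped = expand_hot_group(4, [1.0, 1.0, 1.0, 1.0])     # no servers left to add
print(grown, unchanged, capped)
```

Growing the group only after the current wax is exhausted brings fresh, unmelted wax into use while heat is still being produced.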

Wax Aware VMT Results

● Hot Group slightly too small: automatically expands during peak load

Wax Aware VMT Results

● Wax expansion preserves significant cooling load reduction

GV=24: Hot group is too big

GV=22: Hot group is just right

GV=20: Hot Group is too small

VMT-TA vs. VMT-WA

● Both work well at ideal GV
● VMT-WA offers much more flexibility for unpredictable load

[Figure labels: Smaller Hot Group, Bigger Hot Group]

Summary

● VMT stores thermal energy when passive TTS alone cannot
  ○ Reduces maximum cooling load of a diurnal workload
  ○ Configurable for varying datacenter power and load levels
● VMT-enabled thermal energy storage can:
  ○ Reduce cooling system size by 12%
  ○ Or allow up to 14% more servers under the same cooling budget
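One way to see how the two summary figures could relate is sketched below. It assumes that peak cooling load scales linearly with server count and that the 12% figure is a per-server peak reduction. The slides do not state how the 14% figure was derived, so this is only a consistency check.

```python
def extra_servers_fraction(peak_reduction):
    """Extra servers that fit under a fixed cooling budget.

    ASSUMPTION: peak cooling load is proportional to server count, so
    cutting per-server peak load by `peak_reduction` allows
    1 / (1 - peak_reduction) times as many servers.
    """
    return 1.0 / (1.0 - peak_reduction) - 1.0

gain = extra_servers_fraction(0.12)
print(f"extra servers under the same cooling budget: {gain:.1%}")
```

Under these assumptions, a 12% reduction in peak cooling load corresponds to roughly 13.6% more servers, in line with the "up to 14%" figure.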

Thank you!

Questions?
