virtual melting temperature: managing server load to minimize cooling … · 2018-06-20 · virtual...



Virtual Melting Temperature: Managing Server Load to Minimize Cooling Overhead with Phase Change Materials

Matt Skach1, Manish Arora2,3, Dean Tullsen3, Lingjia Tang1, Jason Mars1

University of Michigan1 -- Advanced Micro Devices, Inc.2 -- UC San Diego3

ISCA ’18

Datacenters

Facebook Ireland Datacenter

Facebook datacenter

Huge warehouses full of servers that host the internet and the cloud

Datacenter Cooling

● Heat must be removed to prevent:
  ○ Overheating
  ○ Thermal downclocking
  ○ Component failure

http://www.asetek.com/media/1031/rackcdu_d2c_datacenter.jpg

Global Energy Consumption (CIA World Factbook)

| Rank | Consumer | Electricity consumption (TWh/year) |
|------|----------|------------------------------------|
| 1 | China | 6,100 |
| 2 | United States | 4,100 |
| 3 | European Union | 3,100 |
| 4 | India | 1,300 |
| 5 | Russia | 1,000 |
| 6 | Japan | 980 |
| 7 | Canada | 640 |

Datacenter Energy Consumption (Avgerinou, 2017)

| Rank | Consumer | Electricity consumption (TWh/year) |
|------|----------|------------------------------------|
| 1 | China | 6,100 |
| 2 | United States | 4,100 |
| 3 | European Union | 3,100 |
| | Datacenters (global, est.) | 1,600 |
| 4 | India | 1,300 |
| 5 | Russia | 1,000 |
| 6 | Japan | 980 |
| | Datacenter Cooling (global, est.) | 650 |
| 7 | Canada | 640 |
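The two datacenter rows can be related with a line of arithmetic. A minimal Python check, assuming (as the slide's layering suggests) that the cooling estimate is a portion of the overall datacenter estimate rather than an addition to it:

```python
# Estimates from the table above, in TWh/year.
DATACENTERS_TWH = 1_600
DATACENTER_COOLING_TWH = 650
CANADA_TWH = 640


def share_pct(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


cooling_share = share_pct(DATACENTER_COOLING_TWH, DATACENTERS_TWH)
print(f"Cooling share of datacenter electricity: {cooling_share:.1f}%")
print("Cooling alone exceeds Canada:", DATACENTER_COOLING_TWH > CANADA_TWH)
```

Under that assumption, roughly 40% of estimated datacenter electricity goes to cooling, which on its own is more than Canada's total consumption.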

Datacenter Cooling

● Datacenter cooling is very expensive
  ○ Infrastructure can cost tens of millions of dollars for large DCs (Kontorinis, 2014)
  ○ Generally, more power-efficient systems cost more up front

Open Compute cooling system

Datacenter Workloads

● Diurnal load is problematic:
  ○ Work is uneven
  ○ Work is distributed
  ○ Heat is produced when work is done

Google Search: US Load

Datacenter Cooling

● Build a big cooling system for peak load
  ○ Underutilized most of the time

Expensive: 100% coverage, low utilization

Best: 50% coverage, maximum utilization
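The utilization trade-off can be made concrete with a toy calculation. The sketch below uses a hypothetical 24-hour load profile (not the Google trace used in the talk) and reports how busy a cooling plant of a given capacity is, and how much heat exceeds that capacity and would have to be stored or shed.

```python
from typing import Sequence, Tuple


def cooling_stats(load: Sequence[float], capacity: float) -> Tuple[float, float]:
    """Return (utilization, excess) for a cooling plant of size `capacity`.

    utilization: mean fraction of capacity in use over the trace.
    excess: total load above capacity (load units x hours) that must be
            stored, for example in thermal energy storage, or shed.
    """
    served = [min(x, capacity) for x in load]
    utilization = sum(served) / (capacity * len(load))
    excess = sum(max(x - capacity, 0.0) for x in load)
    return utilization, excess


# Hypothetical hourly diurnal profile, normalized so that peak = 1.0.
profile = [0.4] * 7 + [0.7] * 5 + [1.0] * 6 + [0.7] * 3 + [0.4] * 3

for cap in (1.0, 0.7):
    util, extra = cooling_stats(profile, cap)
    print(f"capacity={cap:.1f}  utilization={util:.2f}  excess={extra:.2f}")
```

Sized for the peak, this plant averages 65% utilization with no excess; sized at 70% of peak it averages about 82% utilization but leaves 1.8 load-hours of peak heat that something else, such as thermal storage, must absorb.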

Thermal Time Shifting (TTS) [ISCA ’15]

[Figure: cooling load vs. time of day (3am, 7am, 7pm, 12am), coupled vs. decoupled; annotations: “Store heat to flatten peak” and “Release heat during off hours”]
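The store-and-release idea on this slide can be sketched as an idealized buffer: above a threshold (a stand-in for the wax's melting point) heat goes into the buffer until its latent capacity is used up, and below the threshold the stored heat is released. This is a simplified model with hypothetical parameters, not the thermal model used by TTS or VMT; it ignores heat-transfer rates and sensible heat.

```python
from typing import List, Sequence


def time_shift(heat: Sequence[float], threshold: float, capacity: float) -> List[float]:
    """Return the cooling load seen after an idealized thermal buffer.

    heat: heat generated in each interval.
    threshold: load level above which the buffer absorbs heat.
    capacity: maximum heat the buffer can hold (its latent capacity).
    """
    stored = 0.0
    cooling: List[float] = []
    for h in heat:
        if h > threshold:
            # Peak: melt wax, absorbing as much excess as capacity allows.
            absorb = min(h - threshold, capacity - stored)
            stored += absorb
            cooling.append(h - absorb)
        else:
            # Off-peak: re-solidify wax, releasing stored heat up to the threshold.
            release = min(threshold - h, stored)
            stored -= release
            cooling.append(h + release)
    return cooling


# Hypothetical hourly profile (peak = 1.0), threshold, and buffer capacity.
profile = [0.4] * 7 + [0.7] * 5 + [1.0] * 6 + [0.7] * 3 + [0.4] * 3
shifted = time_shift(profile, threshold=0.8, capacity=2.0)
print(f"peak before: {max(profile):.2f}, after: {max(shifted):.2f}")
print(f"total heat before: {sum(profile):.2f}, after: {sum(shifted):.2f}")
```

The peak cooling load drops from 1.0 to 0.8 while the total heat removed is unchanged: the same heat is simply removed later, so cooling is no longer tied to the instantaneous load.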

Cooling Load

● A measure of the heat that must be removed
● In a datacenter, this heat comes primarily from IT and support equipment

http://www.slideshare.net/spsu/12-cooling-load-calculations

A Phase Change Material (PCM)

● Stores energy in a solid-to-liquid phase change
● Commercial paraffin wax offers the best properties of currently available PCMs (Skach, 2015)
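How much heat a wax reservoir can hold follows from its mass and latent heat of fusion, E = m·L. The sketch assumes about 200 kJ/kg, a typical order of magnitude for commercial paraffin, a hypothetical 4 kg of wax per server, and a hypothetical 50 W of excess heat; none of these numbers comes from the talk.

```python
def latent_energy_kj(mass_kg: float, latent_heat_kj_per_kg: float) -> float:
    """Heat absorbed by fully melting the wax: E = m * L (sensible heat ignored)."""
    return mass_kg * latent_heat_kj_per_kg


def hours_to_melt(energy_kj: float, excess_power_w: float) -> float:
    """Hours the wax can absorb a constant excess heat flow before fully melting."""
    return energy_kj * 1_000.0 / excess_power_w / 3_600.0


# Hypothetical: 4 kg of paraffin per server, L ~ 200 kJ/kg, and 50 W of heat
# above what the cooling system removes during the peak.
energy = latent_energy_kj(4.0, 200.0)
print(f"stored: {energy:.0f} kJ, lasts {hours_to_melt(energy, 50.0):.1f} h")
```

With these assumed values, the wax stores 800 kJ and can absorb the excess for roughly 4.4 hours, on the scale of a daily peak.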

The problem with passive TTS

Thermal Time Shifting:

● Paraffin has a limited range of melting temperatures
● Melting temperature cannot be changed
● Power and temperature profiles vary over the lifetime of servers

Wikimedia Commons

Virtual Melting Temperature

● Datacenters need more flexibility
● Create a “virtual” melting temperature separate from the actual melting temperature

Microsoft, Wikimedia Commons

Test Infrastructure

● 2U high-throughput server
● 2-day Google workload trace divided among 5 datacenter workloads

Test Methodology

● 5 common datacenter workloads:
  1. Web Search
  2. Data Caching
  3. Video Encoding
  4. Virus Scan
  5. Clustering
● Consider a datacenter where all five are colocated
  ○ Contention mitigation techniques applied (e.g., Bubble-Up (Mars, 2011) and Protean Code (Laurenzano, 2014))

Baseline: Load Balancing Schedulers

● Round Robin and Coolest First

● Problem: Average cluster temperature is too low to melt wax
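The two baselines can be sketched as follows. Temperatures use a toy model, assumed for illustration only, in which each job raises its server's temperature by a fixed amount; the 45 °C melting point is likewise hypothetical. Because both policies spread work evenly, every server ends up at the same moderate temperature, below the melting point.

```python
from itertools import cycle
from typing import Dict, List, Tuple


def round_robin(jobs: List[str], servers: List[str]) -> Dict[str, List[str]]:
    """Assign jobs to servers in strict rotation."""
    placement: Dict[str, List[str]] = {s: [] for s in servers}
    for job, server in zip(jobs, cycle(servers)):
        placement[server].append(job)
    return placement


def coolest_first(
    jobs: List[str], temps: Dict[str, float], heat_per_job: float
) -> Tuple[Dict[str, List[str]], Dict[str, float]]:
    """Send each job to the currently coolest server (toy thermal model)."""
    temps = dict(temps)  # do not mutate the caller's readings
    placement: Dict[str, List[str]] = {s: [] for s in temps}
    for job in jobs:
        server = min(temps, key=lambda s: temps[s])
        placement[server].append(job)
        temps[server] += heat_per_job  # each job warms its server
    return placement, temps


MELT_C = 45.0  # hypothetical wax melting point
servers = ["s0", "s1", "s2", "s3"]
jobs = [f"job{i}" for i in range(8)]

rr = round_robin(jobs, servers)
cf, final = coolest_first(jobs, {s: 30.0 for s in servers}, heat_per_job=5.0)
print({s: len(j) for s, j in rr.items()})
print(final, "any melting:", any(t >= MELT_C for t in final.values()))
```

Both schedulers give each server two jobs and leave all four at 40 °C, so no wax melts even though the cluster as a whole is producing plenty of heat.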

Thermal Aware VMT

● Categorize jobs by their thermal characteristics
  ○ Binary classification: would the job melt significant wax if run in isolation?

Thermal Aware VMT

● Grouping Value (GV): a controllable group-size ratio
  ○ Proportional to the hot group size
● Place ‘hot jobs’ together in a ‘hot group’ to melt wax
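A minimal sketch of thermal-aware grouping, assuming each job already carries the binary hot/cold label and that the Grouping Value has been translated into a hot group size. The exact GV-to-size mapping is not given here, so `hot_group_size` is a hypothetical stand-in, as is the per-server `slots` limit. Hot jobs go to the least-loaded server in the hot group, cold jobs to the rest, and either kind spills into the other group only when its own group is full.

```python
from typing import Dict, List, Sequence, Tuple


def vmt_ta_place(
    jobs: Sequence[Tuple[str, bool]],
    servers: List[str],
    hot_group_size: int,
    slots: int,
) -> Dict[str, List[str]]:
    """Place (name, is_hot) jobs so that hot jobs share a small hot group."""
    hot, cold = servers[:hot_group_size], servers[hot_group_size:]
    placement: Dict[str, List[str]] = {s: [] for s in servers}

    def pick(preferred: List[str], fallback: List[str]) -> str:
        # Least-loaded server with a free slot, trying the preferred group first.
        for group in (preferred, fallback):
            free = [s for s in group if len(placement[s]) < slots]
            if free:
                return min(free, key=lambda s: len(placement[s]))
        raise RuntimeError("no free slots in the cluster")

    for name, is_hot in jobs:
        server = pick(hot, cold) if is_hot else pick(cold, hot)
        placement[server].append(name)
    return placement


jobs = [("h1", True), ("c1", False), ("h2", True),
        ("c2", False), ("c3", False), ("c4", False)]
print(vmt_ta_place(jobs, ["s0", "s1", "s2", "s3"], hot_group_size=1, slots=4))
```

Here both hot jobs land on `s0` while the cold jobs spread over `s1` to `s3`. A larger hot group spreads hot jobs thinner so less wax melts; a smaller one concentrates them so wax melts sooner, which is the trade-off the GV results explore.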

Thermal Aware VMT Results

● Hot group sized to melt wax during peak hours

Thermal Aware VMT Results

● Balance between melting wax too soon and not melting enough wax

GV=24: Hot group is too big

GV=22: Hot group is just right

GV=20: Hot group is too small

Wax Aware VMT

● Begin with the same setup as VMT-TA
● When the wax in the hot group is fully melted, expand the hot group
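The expansion rule can be sketched as a function the scheduler calls periodically. It assumes a per-server estimate of how much wax has melted (whether that comes from a sensor or a model is an assumption here); the one-server `step` and the choice of the least-melted spare servers as new members are hypothetical policy choices, not details from the talk.

```python
from typing import Dict, List


def expand_hot_group(
    hot_group: List[str],
    melt_fraction: Dict[str, float],
    servers: List[str],
    step: int = 1,
) -> List[str]:
    """Grow the hot group once every server in it has fully melted its wax.

    melt_fraction maps server -> fraction of wax melted (0.0 to 1.0).
    New members are the spare servers with the most unmelted wax.
    """
    if not all(melt_fraction.get(s, 0.0) >= 1.0 for s in hot_group):
        return list(hot_group)
    spare = [s for s in servers if s not in hot_group]
    spare.sort(key=lambda s: melt_fraction.get(s, 0.0))  # least melted first
    return list(hot_group) + spare[:step]


servers = ["s0", "s1", "s2", "s3"]
melt = {"s0": 1.0, "s1": 0.3, "s2": 0.0, "s3": 0.1}
print(expand_hot_group(["s0"], melt, servers))
melt["s0"] = 0.6
print(expand_hot_group(["s0"], melt, servers))
```

With `s0` fully melted the group grows to `['s0', 's2']`; once `s0` is only partly melted, the group is left as `['s0']`. Growing only on exhaustion is what lets an undersized hot group adapt during an unexpectedly long peak.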

Wax Aware VMT Results

● Hot group slightly too small: it expands automatically during peak load

Wax Aware VMT Results

● Wax expansion preserves significant cooling load reduction

GV=24: Hot group is too big

GV=22: Hot group is just right

GV=20: Hot group is too small

VMT-TA vs. VMT-WA

● Both work well at the ideal GV
● VMT-WA offers much more flexibility for unpredictable load

[Figure axis: Smaller Hot Group to Bigger Hot Group]

Summary

● VMT stores thermal energy when passive TTS alone cannot
  ○ Reduces the maximum cooling load of a diurnal workload
  ○ Configurable for varying datacenter power and load levels
● VMT-enabled thermal energy storage can:
  ○ Reduce cooling system size by 12%
  ○ Or allow up to 14% more servers under the same cooling budget
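The two headline numbers can be read as two views of the same saving. Under the simplifying assumption (not stated in the talk) that peak cooling demand scales linearly with server count, cutting the cooling requirement by 12% lets 1/0.88 times as many servers share the original budget:

```python
def extra_server_capacity_pct(cooling_reduction: float) -> float:
    """Extra servers (in %) that fit in the original cooling budget when
    peak cooling per server drops by `cooling_reduction` (0.12 = 12%).

    Assumes peak cooling demand scales linearly with server count.
    """
    return 100.0 * (1.0 / (1.0 - cooling_reduction) - 1.0)


print(f"{extra_server_capacity_pct(0.12):.1f}% more servers")
```

This gives about 13.6%, consistent with the "up to 14%" figure; the talk's number may come from its own experiments rather than from this identity.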

Thank you!

Questions?