Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Universität Stuttgart, Institute of Parallel and Distributed Systems (IPVS), Universitaetsstr. 38, 70569 Stuttgart, Germany. Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines. Sixth IEEE International Conference on Cloud Computing, Santa Clara, CA, USA, June 29th, 2013. Frank Dürr

DESCRIPTION

Slides for our full paper presented at IEEE Cloud 2013. Abstract: In this paper, we propose a concept for improving the energy efficiency and resource utilization of cloud infrastructures by combining the benefits of heterogeneous machine instances. The basic idea is to integrate low-power system on a chip (SoC) machines and high-power virtual machine instances into so-called Elastic Tandem Machine Instances (ETMI). The low-power machine serves low load and is always running to ensure the availability of the ETMI. When load rises, the ETMI scales up automatically by starting the high-power instance and handing over traffic to it. For the non-disruptive transition from low-power to high-power machines and vice versa, we present a handover mechanism based on software-defined networking technologies. Our evaluations show the applicability of low-power SoC machines to serve low load efficiently as well as the desired scalability properties of ETMIs.

TRANSCRIPT

Page 1: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Universität Stuttgart

Institute of Parallel and Distributed Systems (IPVS)

Universitaetsstr. 38, 70569 Stuttgart, Germany

Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines

Sixth IEEE International Conference on Cloud Computing

Santa Clara, CA, USA

June 29th, 2013

Frank Dürr

Page 2: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Universität Stuttgart

IPVS

Research Group

“Distributed Systems”

Overview

• Motivation

• System Model

• Elastic Tandem Machines

• Evaluation

• Summary

Page 3: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Motivation

• Data centers contain up to tens of thousands of hosts

• Energy efficiency is one of the major challenges

• The ideal host is energy proportional [Barroso, Hölzle]

◦ Energy consumption should be proportional to utilization/load

[Figure: power consumption vs. utilization (0% idle to 100%, up to max power) for an ideal system and a real system, with the efficiency (0% to 100%) and the efficient area of operation marked]

Page 4: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Goal

Building the ideal energy-proportional machine

• (Almost) no power consumption while being idle

• Elasticity: Scaling up to nominal (maximum) requested resources

[Figure: power consumption vs. utilization (idle to 100%); annotation: Fill this area of inefficient operation!]

Page 5: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Contribution: Elastic Tandem Machines

System on a Chip (SoC) Machine

• Low performance

• Low power consumption: ~ 2 Watt

• 700 MHz ARM, 512 MB RAM, 100 Mbps NIC, 16 GB SD Card

• ~ 35$

+

Classic high power VM on commodity PC Hardware

• High performance

• High power consumption

[source: www.dell.com]

Elastic Tandem Machine: Best of both worlds

• Low power consumption in idle/weak load

• Scale up to maximum nominal resources

• Transparency: Clients see only one ideal machine

Transparent integration of heterogeneous hardware

Page 6: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Contributions in Detail

Show that SoCs can serve low load in realistic settings

• Web server in 3-tier system architecture

Concept for implementing Elastic Tandem Machines

• Handover concept to switch between SoC and VM

◦ Adaptive: based on dynamic load

◦ Transparent, seamless, non-disruptive

▪ Client just sees one “ideal” machine

▪ Existing (TCP) connections don't break during handover

◦ “In network” based on Software-defined Networking (SDN)

• Proof of concept implementation and evaluation

Page 7: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Overview

• Motivation

• System Model

• Elastic Tandem Machines

• Evaluation

• Summary

Page 8: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

System Model (1)

Target environment: Data center of IaaS provider

• SoC machines (Low-power Micro Instances; LPMI)

• Classic VMs on PC hosts (High-power Instances; HPI)

• One LPMI + one HPI = one Elastic Tandem Machine (ETMI)

• Network:

◦ Core switches SDN-enabled

◦ Programmable forwarding tables

[Figure: a client reaches the data center via the Internet; inside the data center: SDN controller, core switches, top-of-rack switches, the ETMI (HPI and LPMI), and the DB]

Page 9: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

System Model (2)

3-Tier web service

• ETMI runs web server (middle tier)

◦ One public IP address for the ETMI (transparency)

◦ One web server instance on LPMI and HPI

• File/DB servers in backend

◦ Store all persistent data and state

◦ Not part of optimization!

[Figure: a client reaches the public service IP via the Internet; inside the data center: SDN controller, core switches, top-of-rack switches, the ETMI (HPI and LPMI), and the DB]

Page 10: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Overview

• Motivation

• System Model

• Elastic Tandem Machines

◦ Overview

◦ System Components

◦ Seamless handover concept

• Evaluation

• Summary

Page 11: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Basic Concept: Overview

• HTTP requests either forwarded to …

◦ … LPMI during low load

◦ … HPI during high load

SDN-based programming of network (forwarding tables)

• LPMI always running

◦ Service always available

• HPI booted on demand on LPMI overload

• HPI shut down if current load would not overload LPMI

[Figure: Internet, core switches, SDN controller, and ETMI; the low load path to the LPMI is highlighted]

Page 12: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

System Components

Handover Controller:

• Switches between LPMI and HPI based on their load

• Programs core switches using OpenFlow

• Boots or shuts down HPI via Virtual Machine Manager

• Hysteresis and “ignore period” to prevent oscillation

MAC Address re-writing:

• If destination IP matches public IP, write MAC of LPMI (or HPI) in frame

IP aliasing:

• NICs configured with (same) public IP address of service

• Private IPs used for communication with controller

[Figure: handover controller connected to the core switches via OpenFlow; top-of-rack switches below]
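The "ignore period" bullet above can be illustrated with a small sketch. This is not the authors' controller: the class name, the event names, and the 60 s ignore period are assumptions made for illustration, and the actual boot, shutdown, and OpenFlow steps are reduced to comments.

```python
from dataclasses import dataclass


@dataclass
class HandoverController:
    """Toy decision logic: which instance currently receives new traffic."""
    ignore_period_s: float = 60.0          # assumed value, not from the slides
    active: str = "LPMI"                   # the LPMI is always running
    last_switch_s: float = float("-inf")   # no handover has happened yet

    def on_notification(self, event: str, now_s: float) -> str:
        """Handle 'lpmi_overload' or 'hpi_underload'; return the active instance."""
        if now_s - self.last_switch_s < self.ignore_period_s:
            # Inside the ignore period: suppress the switch to avoid oscillation.
            return self.active
        if event == "lpmi_overload" and self.active == "LPMI":
            self.active = "HPI"    # real system: boot HPI, reprogram core switches
            self.last_switch_s = now_s
        elif event == "hpi_underload" and self.active == "HPI":
            self.active = "LPMI"   # real system: redirect flows, shut down HPI
            self.last_switch_s = now_s
        return self.active
```

A notification that arrives shortly after a handover is ignored; the same notification after the ignore period triggers the switch back.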

Page 13: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

System Components

Load Monitors:

• Notify controller of LPMI overload and HPI under-load

• Load metric: Incoming data rate

• Threshold scheme (overload, under-load thresholds)

• Offline benchmarking to define LPMI overload threshold

[Figure: load monitors on both instances behind the core and top-of-rack switches; one monitor reports “Overload!” while traffic arrives from the Internet]
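A minimal sketch of the threshold scheme, assuming the monitor samples a byte counter at fixed intervals. The threshold values are the ones given in the evaluation configuration (80 KB/s and 53 KB/s); the function names, the event names, and the choice of 1 KB = 1000 bytes are assumptions.

```python
from typing import Optional

T_OVERLOAD_KBPS = 80.0    # LPMI overload threshold (evaluation configuration)
T_UNDERLOAD_KBPS = 53.0   # HPI under-load threshold (evaluation configuration)


def data_rate_kbps(bytes_received: int, interval_s: float) -> float:
    """Incoming data rate over one sampling interval (assumes 1 KB = 1000 bytes)."""
    return bytes_received / 1000.0 / interval_s


def lpmi_event(rate_kbps: float) -> Optional[str]:
    """LPMI monitor: report overload once the rate exceeds the overload threshold."""
    return "lpmi_overload" if rate_kbps > T_OVERLOAD_KBPS else None


def hpi_event(rate_kbps: float) -> Optional[str]:
    """HPI monitor: report under-load once the rate falls below the lower threshold."""
    return "hpi_underload" if rate_kbps < T_UNDERLOAD_KBPS else None
```

Because the two thresholds differ, rates between 53 KB/s and 80 KB/s trigger neither event, which is the hysteresis gap that keeps the ETMI from flapping.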

Page 14: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Seamless Handover

• Problem: Simple switching breaks existing TCP connections

◦ HTTP 1.1: Multiple requests sent over same TCP connection!

• Solution: “Pinning” of existing connections to old instance

◦ Controller queries instances for accepted or established connections

▪ Connection monitor (ss or netstat)

◦ Inserts high priority entry into core switch forwarding table:

▪ (client IP, client port, public IP, public port) → MAC_rewriting(instance MAC)

[Figure: the controller asks the connection monitors on both instances “Connections?” and installs connection pinning entries; traffic arrives from the Internet]
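Connection pinning can be sketched with a toy forwarding table in which the entry with the highest priority wins. This models the idea only and does not use a real OpenFlow library; `FlowEntry`, `lookup`, `pin_connections`, and the priority value 200 are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FlowEntry:
    priority: int
    dst_ip: str                      # public service IP
    dst_port: Optional[int] = None   # None acts as a wildcard
    src_ip: Optional[str] = None
    src_port: Optional[int] = None
    rewrite_dst_mac: str = ""        # action: rewrite destination MAC

    def matches(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bool:
        return (self.dst_ip == dst_ip
                and self.dst_port in (None, dst_port)
                and self.src_ip in (None, src_ip)
                and self.src_port in (None, src_port))


def lookup(table: List[FlowEntry], src_ip: str, src_port: int,
           dst_ip: str, dst_port: int) -> Optional[str]:
    """Return the MAC written by the highest-priority matching entry."""
    hits = [e for e in table if e.matches(src_ip, src_port, dst_ip, dst_port)]
    if not hits:
        return None
    return max(hits, key=lambda e: e.priority).rewrite_dst_mac


def pin_connections(table: List[FlowEntry], connections: List[Tuple[str, int]],
                    public_ip: str, public_port: int, old_mac: str,
                    priority: int = 200) -> None:
    """Add one high-priority entry per open connection, pointing at the old instance."""
    for client_ip, client_port in connections:
        table.append(FlowEntry(priority=priority, dst_ip=public_ip,
                               dst_port=public_port, src_ip=client_ip,
                               src_port=client_port, rewrite_dst_mac=old_mac))
```

After pinning, the low-priority default entry can be pointed at the new instance: packets of pinned connections still reach the old instance, while new connections follow the default entry.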

Page 15: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Seamless Handover

• It's not that simple!

◦ There's a race condition

• Solution: Block connection requests before querying

◦ Controller programs firewalls on LPMI/HPI

◦ Unblock after flow re-direction

[Figure: sequence diagram with Client, Controller, Connection Monitor (LPMI), and Web Server (LPMI); messages: SYN, query open connections (t1), ACK / SYN, pin open connections (t2); annotation: Connection accepted after query (t1)!]
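The ordering of the handover steps can be sketched as a small simulation. It assumes that connection requests are blocked on the instance being left and that a blocked SYN is simply dropped; `Instance`, `handover`, and the `late_syn` parameter are hypothetical and exist only to show why blocking before the query closes the race.

```python
from typing import List, Optional, Set, Tuple

Conn = Tuple[str, int]  # (client IP, client port)


class Instance:
    """Stand-in for an LPMI or HPI with a firewall and a connection monitor."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocked = False
        self.connections: Set[Conn] = set()

    def try_connect(self, conn: Conn) -> bool:
        """A SYN arrives: accepted unless the firewall currently blocks it."""
        if self.blocked:
            return False
        self.connections.add(conn)
        return True


def handover(old: Instance, new: Instance, log: List[str],
             late_syn: Optional[Conn] = None) -> Set[Conn]:
    """Block, query, pin, redirect, unblock; returns the pinned connections."""
    old.blocked = True
    log.append(f"block connection requests on {old.name}")
    pinned = set(old.connections)
    log.append("query open connections")
    if late_syn is not None:
        # A SYN racing with the query: without the block it would be accepted
        # after the query and never pinned.
        log.append(f"late SYN accepted: {old.try_connect(late_syn)}")
    log.append(f"pin {len(pinned)} connections to {old.name}")
    log.append(f"redirect new flows to {new.name}")
    old.blocked = False
    log.append(f"unblock {old.name}")
    return pinned
```

With the block in place, the set of pinned connections equals the set of connections the old instance actually holds, so no connection is left unpinned when the flows are redirected.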

Page 16: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Overview

• Motivation

• System Model

• Elastic Tandem Machines

• Evaluation

• Summary

Page 17: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Evaluation Setup

• Elastic Tandem Machine with:

◦ Low-power Instance (SoC): Raspberry Pi

▪ 700 MHz ARM CPU, 512 MB RAM, 100 Mbps Ethernet NIC

◦ High-power Instance (PC):

▪ AMD Athlon 64 X2 Dual Core 4.2 GHz, 2 GB RAM, 1 Gbps Ethernet NIC

◦ Running Apache Web server, PHP, Tomcat servlet engine

• Backend: NFS file server, MySQL

• Core switch: PC with Open vSwitch and multiport NIC

◦ Line rate forwarding (no bottleneck)

• SDN handover controller based on Floodlight

[Figure: HTTP client, handover controller (OpenFlow), ETMI consisting of LPMI and HPI (each running Apache and Tomcat), and backend with NFS and MySQL]

Page 18: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

LPMI (SoC) Performance for Static Web Pages

Scenario:

• Real static web pages from: http://www.netsys2013.de/

• Increase avg. request rate by 1 request/s every 50 s (Poisson distr.)

Performance:

• Throughput:

◦ Max. 26 pages/s

• Response time

◦ Significant increase at 20 requests/s (> 150 ms)

◦ Performance limit

Low-power SoC can serve realistic low load (too slow for processing-intensive jobs; see paper)

[Figure: LPMI performance plot with 20 requests/s marked]
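The load pattern of this scenario can be sketched as a generator of request arrival times. This is an illustration, not the authors' load generator: it assumes the ramp starts at 1 request/s, that "Poisson distr." refers to Poisson arrivals (exponential inter-arrival times), and the function name and seed are hypothetical.

```python
import random
from typing import List


def ramp_arrivals(max_rate: int, step_s: float = 50.0, seed: int = 0) -> List[float]:
    """Arrival times (s) for Poisson arrivals whose mean rate is 1, 2, ..., max_rate
    requests/s, each rate held for step_s seconds."""
    rng = random.Random(seed)
    arrivals: List[float] = []
    for rate in range(1, max_rate + 1):
        start = (rate - 1) * step_s
        t = start + rng.expovariate(rate)   # exponential gap with mean 1/rate
        while t < start + step_s:
            arrivals.append(t)
            t += rng.expovariate(rate)
    return arrivals
```

For a ramp up to 20 requests/s this yields about 50 x (1 + 2 + ... + 20) = 10,500 requests over 1000 s, with the exact count depending on the seed.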

Page 19: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

ETMI Performance for Static Web Pages

Increase request rate by 1 request/s every 50 s until 2500 s, then decrease rate by 1 request/s every 50 s

Configuration:

• Switch between LPMI and HPI at data rate

◦ T_overload = 80 KB/s

◦ T_underload = 53 KB/s

Performance:

• Scales to maximum HPI performance

• Seamless handover

◦ No broken HTTP connections

ETMI scales up transparently

[Figure: ETMI performance over time, with the handover LPMI → HPI and the handover HPI → LPMI marked]

Page 20: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Energy Efficiency (Static Web Page Scenario)

Idle mode power consumption:

• SoC: 1.85 W

• PC host: 141.22 W

[Figure: energy efficiency plots for PC host and SoC; annotation: The SoC area (left figure)]

Page 21: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Energy Efficiency – Comparison with Virtualization (Static Web Page Scenario)

Idle mode power consumption:

• SoC: 1.85 W

• PC host: 141.22 W

76 (idle) VMs per host for same energy efficiency

Fair comparison: PC host must serve same load as 76 SoCs

• At 76 x 4 request/s = 304 request/s:

◦ 76 SoCs: 76 x 1.89 W = 143.65 W

◦ PC host: 1 x 184.46 W

◦ 22% energy savings

• At 76 x 8 request/s = 608 request/s:

◦ 76 SoCs: 76 x 1.92 W = 145.92 W

◦ PC hosts: 2 x 184.46 W = 368.92 W

◦ 60% energy savings

Our PC host could only serve max. 300 request/s!!!
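The figures on this slide can be re-derived with a few lines of arithmetic. The sketch uses only numbers from the slide; the variable and function names are assumptions. Note that multiplying the rounded per-SoC value gives 76 x 1.89 W = 143.64 W, slightly below the 143.65 W on the slide, presumably because the slide was computed from unrounded measurements.

```python
IDLE_SOC_W = 1.85       # idle power of one SoC (slide)
IDLE_PC_W = 141.22      # idle power of the PC host (slide)
LOADED_PC_W = 184.46    # power of one loaded PC host (slide)


def savings(soc_total_w: float, pc_total_w: float) -> float:
    """Fraction of energy saved by the SoCs relative to the PC host(s)."""
    return 1.0 - soc_total_w / pc_total_w


# Number of idle SoCs that draw as much power as one idle PC host.
socs_per_host = int(IDLE_PC_W // IDLE_SOC_W)

# 304 request/s: 76 SoCs vs. one PC host (SoC total as stated on the slide).
savings_304 = savings(143.65, 1 * LOADED_PC_W)

# 608 request/s: 76 SoCs vs. two PC hosts.
savings_608 = savings(76 * 1.92, 2 * LOADED_PC_W)

print(socs_per_host, round(savings_304 * 100), round(savings_608 * 100))
```

The break-even point of 76 idle VMs per host follows from 141.22 W / 1.85 W ≈ 76.3, and the two savings values round to the 22% and 60% shown on the slide.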

Page 22: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

(Closest) Related Work (for more, see paper)

SoC & host integration

• B.-G. Chun, G. Iannaccone, R. Katz, G. Lee, and L. Niccolini, ACM SIGOPS Operating Systems Review, 44(1), 2010

◦ Integration of discrete server systems as one design option

◦ Our handover mechanism is one (network centric) technical solution for a transparent integration

Load balancing mechanisms

• R. Wang, D. Butnariu, and J. Rexford, Hot-ICE 2011

◦ SDN-based approach for keeping TCP connections alive

▪ Approach 1: Re-directs packets to controller (possibly high load on controller)

▪ Approach 2: Timeout heuristic (problem of setting timeout)

◦ We utilize readily available end-system information about connections

◦ We handle dynamic state consistently through firewall “locks”

Page 23: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Summary and Future Work

Elastic Tandem Machine

• Concept for transparent integration of SoCs and classic VMs

• Low power consumption at weak load

• Elasticity: Scale up to nominal resources

• SDN-based seamless handover concept

Future work

• Integrating more than two machine types

◦ Micro instance, small instance, large instance, …

• Predictive load/performance models to plan handover in advance

Page 24: Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Discussion

Full paper: http://goo.gl/Vkdmfc

Contact: Frank Dürr

email: [email protected]

WWW: http://goo.gl/o6u2A