cif16: building the superfluid cloud with unikernels (simon kuenzer, nec europe)

Building the Superfluid Cloud with Unikernels SCALE 14X, January 2016 Simon Kuenzer, NEC Europe Ltd.

Upload: the-linux-foundation

Post on 21-Mar-2017

Category:

Technology


TRANSCRIPT

Building the Superfluid Cloud with Unikernels

SCALE 14X, January 2016

Simon Kuenzer, NEC Europe Ltd.

Building the Superfluid Cloud with Unikernels

• The Superfluid Cloud

• Implementation and Results

• Future Work

• Open Source

The Superfluid Cloud

The Vision

5 © NEC Corporation 2016

The Superfluid Cloud

[Figure: the Superfluid Cloud spans the access network (low delay, low compute/storage capacity), the aggregation network, and the core network (higher delay, high compute/storage capacity). Site labels: LTE, 5G base station site, multi-cell aggregation site, DSLAM, Point-of-Presence (PoP) sites, data center, Internet. Platform labels: microserver platform (x4), micro-DC platform (x3), DC platform. Four "deploy" labels mark service deployment onto the platforms.]

New Use Cases

▌Personalized edge services, e.g., parental control, firewalls

▌Virtual CDNs, e.g., temporary, on-demand scaling, and (live-)event-driven CDNs: baseball match, OS update roll-out

▌Hierarchical data processing and aggregation, e.g., on-the-fly video surveillance

▌Virtualized access to Smart City sensors and actuators, e.g., traffic management, public building safety

▌and many others...

Technology Enabler: Unikernels

▌Light-weight service deployment with Unikernels based on Mini-OS, OSv, MirageOS, HaLVM, rumprun, ...

[Figure: Standard OS vs. Unikernel. Standard OS: app 1, app 2, ..., app N in user space on top of a general-purpose operating system with driver1, driver2, ..., driverN in kernel space. Unikernel: a single app on a minimalistic operating system with vdriver1 and vdriver2, all in a single address space.]

Unikernels we work on...

▌In numbers (Xen)...

High throughput/performance: 10 Gbit/s throughput

Fast instantiation, migration: <20 ms instantiation time

Low memory footprint: 5 MB or less when running

Isolation: provided by virtualization

▌On Xen: app on MiniOS

▌On KVM: app on OSv

Technology Enabler: Microservers

▌New powerful single board computers

Small physical footprint

Low power consumption

Can operate in locations where maintenance is difficult to carry out

Can be operated at the Network Edge

Initial support by hypervisors

[Figure: example boards by architecture. ARM: CubieBoard 2, Raspberry Pi 2. x86: Minnowboard Max, Gizmo 2. MIPS: Edge Router Lite.]

Implementation and Results

Numbers, numbers, numbers!

Our Superfluid Platform based on XEN

1. HIGH PERFORMANCE I/O

2. FAST INSTANTIATION AND MASSIVE CONSOLIDATION

3. SMALL MEMORY FOOTPRINT, SPECIALIZATION

High Performance I/O

Fast Unikernel I/O with ClickOS

▌Fast network I/O

Support for many VMs on a single host

10 Gbit/s network throughput or higher

Low delay for processing packets: ~45µs

Mostly introduced with the ClickOS work [1]

[1] MARTINS, J., AHMED, M., RAICIU, C., OLTEANU, V., HONDA, M., BIFULCO, R., AND HUICI, F. ClickOS and the art of network function virtualization. In 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14) (Seattle, WA, Apr 2014), USENIX Association, pp. 459–473.

[Figure: ClickOS = Click running on MiniOS]

ClickOS: Network Middlebox performance: Scaling out

Intel Xeon E1650 6-core 3.2GHz, 16GB RAM, dual-port Intel x520 10Gb/s NIC.

3 cores assigned to VMs, 3 cores for dom0

[Figure: test setup with Host 1, the ClickOS host, and Host 2; the links are labeled "6x 10Gb/s direct cable" on each side.]

ClickOS: Network Middlebox Performance (single VM)

ClickOS: Network Middlebox performance: Delays

[Figure: delay comparison; groups: unikernel, Linux guests, baseline.]

Massive Consolidation and Fast Instantiation

What We Optimized

▌The following numbers are achieved through various optimizations of the platform [1]

LiXS (LIghtweight XenStore)

• 2500 lines of C++ code, based on std::map

Toolstack XCL (XenCtrl Light)

• 600 lines of C code, simplified

XenConsoled

• Faster domain creation through more efficient handling of domain polling (1-per domain)

XenDevd

• Faster virtual device creation
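To illustrate why an ordered map is a natural fit for a XenStore-like service, here is a minimal sketch of a hierarchical key-value store kept in sorted key order. It is not LiXS code, and the slides do not describe LiXS internals beyond "based on std::map". The class name `OrderedStore`, its methods, and the example values are hypothetical; a sorted list plus `bisect` stands in for `std::map`, since Python's standard library has no ordered map.

```python
import bisect


class OrderedStore:
    """Toy hierarchical key-value store with keys kept in sorted order."""

    def __init__(self):
        self._keys = []    # sorted list of full paths
        self._values = {}  # path -> value

    def write(self, path, value):
        # Insert new paths at their sorted position; overwrite existing ones.
        if path not in self._values:
            bisect.insort(self._keys, path)
        self._values[path] = value

    def read(self, path):
        return self._values[path]

    def directory(self, path):
        """List immediate child names below path via a prefix range scan."""
        prefix = path.rstrip("/") + "/"
        start = bisect.bisect_left(self._keys, prefix)
        children, seen = [], set()
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys sharing a prefix are contiguous in sorted order
            child = key[len(prefix):].split("/", 1)[0]
            if child not in seen:
                seen.add(child)
                children.append(child)
        return children


# Hypothetical entries using XenStore-style paths.
store = OrderedStore()
store.write("/local/domain/1/name", "minicache-1")
store.write("/local/domain/1/memory/target", "8192")
store.write("/local/domain/2/name", "minicache-2")
print(store.directory("/local/domain"))
print(store.directory("/local/domain/1"))
```

Because all keys below a directory are adjacent in sorted order, a directory listing only touches the entries under that prefix rather than the whole store.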

[1] MANCO, F., MARTINS, J., YASUKATA, K., MENDES, J., KUENZER, S., AND HUICI, F. The Case for the Superfluid Cloud. In 7th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 15) (Santa Clara, CA, Jul 2015), USENIX Association

Massive Unikernel Consolidation

▌Mini-OS guests on Xen

[Figure: Mini-OS guests on Xen; annotated values: 135ms, 20ms, 12ms, 30ms.]

4x AMD Opteron 6376 16-core 2.3 GHz, 128GB RAM. CPU assignment in round-robin fashion
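The round-robin CPU assignment mentioned above can be sketched as follows. This is a minimal illustration, not the toolstack's actual code: the function name `assign_round_robin` is hypothetical, and the example assumes that all 4 x 16 = 64 cores are available to guests, which the slide does not state.

```python
def assign_round_robin(num_guests, num_cores):
    """Return a list where entry i is the core index assigned to guest i."""
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    # Guest i goes to core i mod num_cores, so cores fill up evenly.
    return [guest % num_cores for guest in range(num_guests)]


# Assumption: 4 sockets x 16 cores, all usable by guests.
NUM_CORES = 4 * 16
placement = assign_round_robin(1000, NUM_CORES)
print(placement[:3], placement[64], placement.count(0))
```

With this scheme the guest counts on any two cores differ by at most one, regardless of how many guests are created.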

Massive Container Consolidation

▌Massive consolidation with LXC containers (as comparison)

[Figure: LXC containers; annotated values: 3500ms, 270ms, 210ms, 70ms.]

Unikernel Boot-up

▌The following unikernel boot-up measurement was done with MiniCache, our HTTP server unikernel on Mini-OS:

MiniCache on Xen: HTTP-Server, lwIP, and SHFS on top of Mini-OS

▌We are currently also porting it to KVM with OSv:

MiniCache on KVM: HTTP-Server, lwIP, and SHFS on top of OSv

Unikernel Boot-up Breakdown

▌Content Cache example with HTTP-Server with file system mounted

[Figure: boot-up breakdown for Debian + lighttpd, stripped-down Linux + lighttpd, and the unikernels: MiniCache on Mini-OS (Xen), MiniCache on Mini-OS (Xen, ARM), MiniCache on OSv (KVM).]

Intel Xeon E5-1630v3 4-core 3.7 GHz, 32 GB RAM

Unikernel Memory Footprint

Unikernel Memory Footprint

▌Comparison of different Content Cache VMs

VM                                Image size (MiB)   Min. memory
MiniCache on Mini-OS (Xen)        0.3* / 0.7         8
MiniCache on OSv (KVM)            5.9* / 8.9         31
OSv + lighttpd                    6.1* / 9.4         34
Stripped-down Linux + lighttpd    1.8* / 5.9         23
Debian + lighttpd                 627                82

* compressed image

Unikernels: the Mini-OS- and OSv-based rows
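As a rough worked example of what these minimum memory figures mean for consolidation, the sketch below computes how many guests of each type would fit into a fixed RAM budget. It assumes the "Min. memory" column is in MiB (the slide gives no unit), uses a hypothetical budget of 1 GiB, and ignores hypervisor and host overhead, so the results are upper bounds only.

```python
# Minimum memory per content-cache VM, taken from the table above.
# Assumption: the values are in MiB (the slide gives no unit).
MIN_MEMORY_MIB = {
    "MiniCache on Mini-OS (Xen)": 8,
    "MiniCache on OSv (KVM)": 31,
    "OSv + lighttpd": 34,
    "Stripped-down Linux + lighttpd": 23,
    "Debian + lighttpd": 82,
}


def max_guests(ram_budget_mib, min_memory_mib):
    """Upper bound on guests that fit in the budget, ignoring host overhead."""
    return ram_budget_mib // min_memory_mib


# Hypothetical budget: 1 GiB of RAM left for guests.
BUDGET_MIB = 1024
capacity = {
    name: max_guests(BUDGET_MIB, mem) for name, mem in MIN_MEMORY_MIB.items()
}
for name, count in capacity.items():
    print(f"{name}: up to {count} guests")
```

Under these assumptions the Mini-OS based MiniCache fits roughly ten times as many instances into the same budget as the Debian guest.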

Microserver Platforms Survey

Device              Arch    SoC/CPU              Cores x GHz     RAM (GB)  Price (EUR)  Others
CubieBoard 2        ARMv7   Allwinner A20        2x 1.0          1         70           SATA
CubieTruck          ARMv7   Allwinner A20        2x 1.0          2         100          SATA; WiFi; BT
Wandboard Quad      ARMv7   Freescale i.MX 6     4x 1.0          2         120          SATA; WiFi
ODroid XU3          ARMv7   Samsung Exynos-5422  4x 2.1, 4x 1.5  2         180          ARM big.LITTLE; USB 3.0
Raspberry Pi 2      ARMv7   Broadcom BCM2709     4x 0.9          1         40
Intel NUC           x86     Intel Core i5        2x 1.3          8         350          mSATA; SATA; USB 3.0; GbE
Gizmo 2             x86     AMD GX-210HA         2x 2.0          1         180          USB3; Fan
Intel Edison        x86     Intel Quark          2x 0.4          1         100          Wearable; WiFi; BT
Minnowboard Max     x86     Intel Atom E3825     2x 1.3          2         170          SATA; USB 3.0; GbE
Edge Router Lite    MIPS64  Cavium Octeon+       2x 0.5          0.5       100          Embedded 3 Port Switch
Data center server  x86     Intel Xeon E5        4x 3.7          16        3000         SATA; GbE; Fan; USB 3.0

Wide Range of Devices

Tested parameters: (1) Basic hardware performance, (2) Power consumption, (3) Network throughput, (4) Virtualized network throughput

Test Results

                    Power consumption (W)   Bare metal performance (ns)                               TCP throughput (Mb/s)
Device              Idle    100% CPU        Integer mult.  Double mult.  Mem. L1  Mem. L2  Mem. main  Bare metal  KVM
Raspberry Pi 2 B    2.6     3.2             5.17           11.80         5.06     15.50    55.40      94          48
Cubietruck          2.7     4.0             3.22           7.31          3.16     10.20    58.70      940         160
Intel NUC           9.9     13.7            1.20           1.94          1.54     4.76     16.50      941         940
Datacenter Server   66.0    135.0           0.84           1.35          1.08     4.38     22.60      942         942

Device              Arch    SoC/CPU            Cores x GHz  RAM (GB)  Price (EUR)
Raspberry Pi 2      ARMv7   Broadcom BCM2709   4x 0.9       1         40
CubieTruck          ARMv7   Allwinner A20      2x 1.0       2         100
Intel NUC           x86     Intel Core i5      2x 1.3       8         350
Datacenter Server   x86     Intel Xeon E5      4x 3.7       16        3000
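One way to read the test results is to derive two simple ratios per device: the share of bare-metal TCP throughput that is retained under KVM, and the virtualized throughput per watt. The sketch below does this arithmetic with the numbers from the table. Pairing throughput with the power draw at 100% CPU is an assumption made here for illustration; the slides do not state the power drawn during the network tests.

```python
# (device, power at 100% CPU in W, bare-metal TCP Mb/s, KVM TCP Mb/s),
# copied from the test results table.
RESULTS = [
    ("Raspberry Pi 2 B", 3.2, 94, 48),
    ("Cubietruck", 4.0, 940, 160),
    ("Intel NUC", 13.7, 941, 940),
    ("Datacenter Server", 135.0, 942, 942),
]


def kvm_share(bare_metal_mbps, kvm_mbps):
    """Fraction of bare-metal TCP throughput retained under KVM."""
    return kvm_mbps / bare_metal_mbps


def mbps_per_watt(mbps, watts):
    """Throughput per watt; using full-load power is an assumption."""
    return mbps / watts


summary = {
    name: (kvm_share(bare, kvm), mbps_per_watt(kvm, watts))
    for name, watts, bare, kvm in RESULTS
}
for name, (share, efficiency) in summary.items():
    print(f"{name}: KVM keeps {share:.0%} of bare metal, "
          f"{efficiency:.1f} Mb/s per W")
```

The ratios make the gap between the ARM boards and the x86 devices under KVM visible at a glance, which matches the future-work item on optimizing for embedded devices.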

Future Work

Future Work

▌Management Framework

…has to deal with thousands to millions of guests spread across multiple locations

…needs to:

• Be extremely scalable but also extremely lean

• Preserve the properties of the underlying framework

• Understand the properties of each network location

▌Performance evaluation and optimization on embedded devices, mostly ARM

▌Efficient scheduling of massive numbers of guests, potentially hundreds of unikernels per CPU core

▌Back-end software switch performance when dealing with a massive number of guests

Join us!

Try it out, participate, contribute, …

Open Source

▌Join our projects: http://cnp.neclab.eu

▌Register for our mailing list

Acknowledgement

▌This work has been partially funded under the EU Horizon 2020 Superfluidity project.