
Page 1: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

vSnoop: Improving TCP Throughput in Virtualized Environments via Acknowledgement Offload

Ardalan Kangarlou, Sahan Gamage, Ramana Kompella, Dongyan Xu

Department of Computer Science, Purdue University

Page 2: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Cloud Computing and HPC

Page 3: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Background and Motivation

- Virtualization: a key enabler of cloud computing
  - Amazon EC2, Eucalyptus
- Increasingly adopted in other real systems:
  - High performance computing: NERSC’s Magellan system
  - Grid/cyberinfrastructure computing: In-VIGO, Nimbus, Virtuoso

Page 4: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

VM Consolidation: A Common Practice

- Multiple VMs hosted by one physical host
- Multiple VMs sharing the same core
- Flexibility, scalability, and economy

[Figure: VM 1 through VM 4 running on a virtualization layer over shared hardware]

Key Observation: VM consolidation negatively impacts network performance!

Page 5: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Investigating the Problem

[Figure: experimental setup with a client (sender) connected to a server whose hardware and virtualization layer host VM 1, VM 2, and VM 3]

Page 6: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Q1: How does CPU sharing affect RTT?

[Figure: RTT (ms) vs. number of VMs sharing a core, measured over three wide-area paths: US East – West, US East – Europe, and US West – Australia]

RTT increases in proportion to the VM scheduling slice (30ms).

Page 7: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Q2: What is the cause of the RTT increase?

[Figure: packets from the sender traverse the device driver and the driver domain (dom0), then wait in a per-VM buffer until the target VM is scheduled; with three VMs sharing the core, up to two 30ms slices can pass before VM 1 runs]

[Figure: CDF of the RTT increase, split into dom0 processing time vs. wait time in the buffer]

VM scheduling latency dominates virtualization overhead! With 30ms scheduling slices, a packet arriving just after its VM is descheduled can wait roughly 60ms (the slices of the two other VMs) before it is delivered.

Page 8: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Q3: What is the impact on TCP throughput?

[Figure: distribution of TCP throughput for connections terminating in dom0 vs. in the VM]

A connection to the VM is much slower than one to dom0!

Page 9: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Our Solution: vSnoop

- Alleviates the negative effect of VM scheduling on TCP throughput
- Implemented within the driver domain to accelerate TCP connections
  - Does not require any modifications to the VM
  - Does not violate end-to-end TCP semantics
- Applicable across a wide range of VMMs: Xen, VMware, KVM, etc.

Page 10: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

TCP Connection to a VM

[Figure: timeline of a sender establishing a TCP connection to VM1. The SYN waits in the VM1 buffer in the driver domain while VM2 and VM3 use the CPU; VM1 can only process it and return the SYN,ACK once it is scheduled, so each RTT seen by the sender is inflated by the VM scheduling latency]

Page 11: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Key Idea: Acknowledgement Offload

[Figure: the same timeline with vSnoop. The driver domain acknowledges packets on VM1's behalf as soon as they reach the shared buffer, instead of waiting for VM1 to be scheduled, so the sender sees a short RTT]

Faster progress during TCP slow start.
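To make the mechanism concrete, here is a minimal C sketch of acknowledgement offload as this slide depicts it: when an in-order segment arrives and the shared buffer toward the VM has room, the driver domain queues it and acknowledges it on the VM's behalf. The routine names loosely echo vSnoop_ingress() and vSnoop_build_ack() from the overhead table later in the deck; the types, fields, and logic are illustrative assumptions, not the actual Xen implementation.

```c
/* Illustrative sketch of acknowledgement offload in the driver domain.
 * Not the real vSnoop code; types and fields are assumptions.            */
#include <stdint.h>
#include <stdio.h>

struct flow_state {
    uint32_t expected_seq;   /* next in-order sequence number for the flow */
    uint32_t buf_free;       /* free bytes in the VM's shared buffer       */
};

struct tcp_seg {
    uint32_t seq;            /* sequence number of the arriving segment */
    uint32_t len;            /* payload length in bytes                  */
};

/* Emit an ACK on the VM's behalf (stand-in for vSnoop_build_ack()). */
static void build_early_ack(const struct flow_state *f, uint32_t ack_no)
{
    /* A real implementation would craft a TCP/IP packet and send it back
     * toward the sender; here we only show the fields that matter.        */
    printf("early ACK: ack=%u win=%u\n", (unsigned)ack_no, (unsigned)f->buf_free);
}

/* Ingress path: acknowledge early only for in-order data that fits. */
static int vsnoop_ingress(struct flow_state *f, const struct tcp_seg *s)
{
    if (s->seq != f->expected_seq || s->len > f->buf_free)
        return 0;                /* out of order or no room: let the VM handle it */

    f->buf_free     -= s->len;   /* segment is queued for the VM */
    f->expected_seq += s->len;
    build_early_ack(f, f->expected_seq);
    return 1;                    /* acknowledged on the VM's behalf */
}

int main(void)
{
    struct flow_state f = { .expected_seq = 1, .buf_free = 64 * 1024 };
    struct tcp_seg seg   = { .seq = 1, .len = 1448 };
    vsnoop_ingress(&f, &seg);    /* in-order and room available: ACKed early */
    return 0;
}
```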

Page 12: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

vSnoop’s Impact on TCP Flows

- TCP slow start
  - Early acknowledgements help connections progress faster (a rough back-of-envelope follows below)
  - Most significant benefit for short transfers, which are prevalent in data centers [Kandula IMC’09], [Benson WREN’09]
- TCP congestion avoidance and fast retransmit
  - Large flows in the steady state can also benefit from vSnoop
  - Benefit is not as large as for slow start
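The back-of-envelope sketch referenced above: a 100KB transfer takes only a few slow-start rounds, but if the ACKs that drive each round wait for the receiving VM to be scheduled, scheduling latency rather than the network dominates the transfer time. All constants are assumptions chosen to match the earlier slides (1448-byte MSS, initial window of 3 segments, 0.5ms base RTT, one 30ms scheduling delay per round).

```c
/* Back-of-envelope: slow-start rounds for a short transfer, with and without
 * an extra per-round delay from VM scheduling. All constants are assumptions. */
#include <stdio.h>

int main(void)
{
    const double mss      = 1448.0;       /* bytes per segment                  */
    const double transfer = 100 * 1024;   /* 100KB file, as in the evaluation   */
    const double base_rtt = 0.5;          /* ms, LAN round-trip time            */
    const double sched    = 30.0;         /* ms, one scheduling slice per round */

    double cwnd = 3 * mss;                /* assumed initial congestion window  */
    double sent = 0;
    int rounds  = 0;

    while (sent < transfer) {             /* one slow-start round per iteration */
        sent += cwnd;
        cwnd *= 2;                        /* window doubles every RTT           */
        rounds++;
    }

    printf("slow-start rounds:  %d\n", rounds);
    printf("ideal time:         %.1f ms\n", rounds * base_rtt);
    printf("with VM scheduling: %.1f ms\n", rounds * (base_rtt + sched));
    return 0;
}
```

With these assumptions the transfer needs about five rounds: roughly 2.5ms if ACKs return promptly, versus over 150ms if each round pays a 30ms scheduling delay. Early acknowledgements close exactly that gap.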

Page 13: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Challenges

- Challenge 1: Out-of-order and special packets (SYN, FIN)
  - Solution: let the VM handle these packets
- Challenge 2: Packet loss after vSnoop
  - Solution: vSnoop acknowledges a packet only if there is room for it in the buffer
- Challenge 3: ACKs generated by the VM
  - Solution: suppress/rewrite ACKs for packets already acknowledged by vSnoop
- Challenge 4: The receive window must be throttled to keep vSnoop online
  - Solution: advertise a window adjusted to the buffer size (see the sketch after this list)
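The sketch referenced under Challenge 4, which also covers Challenge 2: vSnoop acknowledges a segment only if it can buffer the whole thing, and the window it advertises in early ACKs is clamped to the free buffer space, so the sender can never put more data in flight than vSnoop has promised to hold. Field and function names are hypothetical.

```c
/* Sketch of Challenges 2 and 4: ACK only what fits in the shared buffer, and
 * clamp the advertised receive window to the buffer's free space.
 * Illustrative names and fields; not the actual vSnoop data structures.      */
#include <stdint.h>
#include <stdio.h>

struct vsnoop_flow {
    uint32_t vm_window;   /* receive window last advertised by the guest   */
    uint32_t buf_free;    /* free bytes in the shared buffer toward the VM */
};

/* Challenge 2: acknowledge early only if the whole segment can be buffered. */
static int may_ack_early(const struct vsnoop_flow *f, uint32_t seg_len)
{
    return seg_len <= f->buf_free;
}

/* Challenge 4: the window placed in an early ACK is throttled to buffer space. */
static uint32_t throttled_window(const struct vsnoop_flow *f)
{
    return f->buf_free < f->vm_window ? f->buf_free : f->vm_window;
}

int main(void)
{
    struct vsnoop_flow f = { .vm_window = 64 * 1024, .buf_free = 8 * 1024 };
    printf("ack 1448B segment early? %s\n", may_ack_early(&f, 1448) ? "yes" : "no");
    printf("advertised window: %u\n", (unsigned)throttled_window(&f));
    return 0;
}
```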

Page 14: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

State Machine Maintained Per-Flow

[Figure: per-flow state machine with states Start, Active (online), No buffer (offline), and Unexpected sequence]

- Start: initial state; the first packet received moves the flow into one of the states below
- Active (online): entered on an in-order packet when buffer space is available; vSnoop issues early acknowledgements for in-order packets
- No buffer (offline): entered on an in-order packet when there is no buffer space; vSnoop does not acknowledge; the flow returns to Active when buffer space becomes available
- Unexpected sequence: entered on an out-of-order packet; packets are passed to the VM unacknowledged; the flow returns to Active on an in-order packet with buffer space available
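The same transitions rendered as a small C sketch. State and event names follow the slide; the simplified transition function is an illustration, not the actual per-flow logic.

```c
/* Per-flow state machine from the slide: Start, Active (online), No buffer
 * (offline), Unexpected sequence. Simplified sketch for illustration.        */
#include <stdio.h>

enum vsnoop_state { START, ACTIVE, NO_BUFFER, UNEXPECTED_SEQ };

/* One packet arrival, described by whether it is in order and whether the
 * shared buffer currently has room for it.                                   */
static enum vsnoop_state transition(enum vsnoop_state s, int in_order, int has_room)
{
    (void)s;                      /* in this sketch the next state depends only on the event */
    if (!in_order)
        return UNEXPECTED_SEQ;    /* pass out-of-order packets to the VM          */
    if (!has_room)
        return NO_BUFFER;         /* offline: stop issuing early acknowledgements */
    return ACTIVE;                /* online: acknowledge in-order packets early   */
}

int main(void)
{
    enum vsnoop_state s = START;
    s = transition(s, 1, 1);      /* in-order, room available -> ACTIVE    */
    s = transition(s, 1, 0);      /* in-order, buffer full    -> NO_BUFFER */
    s = transition(s, 1, 1);      /* space available again    -> ACTIVE    */
    printf("final state: %d\n", s);
    return 0;
}
```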

Page 15: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

vSnoop Implementation in Xen

[Figure: in the driver domain (dom0), traffic flows from the bridge to each VM's netback and into a per-VM buffer; vSnoop is placed on this per-VM path in dom0, and each guest's netfront is tuned as well ("Tuning Netfront")]
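For a sense of the per-packet work on this path, here is a sketch of the per-flow lookup an ingress hook would perform: flows kept in a hash table keyed by the TCP 4-tuple, in the spirit of the vSnoop_lookup_hash() routine profiled on a later slide. The table layout, hash function, and field names are assumptions for illustration only.

```c
/* Sketch of per-flow lookup keyed by the TCP 4-tuple (illustrative only;
 * the real routine in dom0 may be organized quite differently).             */
#include <stdint.h>
#include <stdio.h>

#define TABLE_SIZE 1024

struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

struct flow_entry {
    struct flow_key key;
    uint32_t expected_seq;      /* next in-order sequence number        */
    uint32_t buf_free;          /* free space in the VM's shared buffer */
    struct flow_entry *next;    /* chaining for hash collisions         */
};

static struct flow_entry *table[TABLE_SIZE];

static unsigned hash_key(const struct flow_key *k)
{
    /* Simple 4-tuple mix; an arbitrary choice for this sketch. */
    return (k->src_ip ^ k->dst_ip ^ ((uint32_t)k->src_port << 16) ^ k->dst_port)
           % TABLE_SIZE;
}

static struct flow_entry *lookup(const struct flow_key *k)
{
    for (struct flow_entry *e = table[hash_key(k)]; e != NULL; e = e->next)
        if (e->key.src_ip == k->src_ip && e->key.dst_ip == k->dst_ip &&
            e->key.src_port == k->src_port && e->key.dst_port == k->dst_port)
            return e;
    return NULL;   /* unknown flow: the packet is simply passed on to the VM */
}

static void insert(struct flow_entry *e)
{
    unsigned h = hash_key(&e->key);
    e->next = table[h];
    table[h] = e;
}

int main(void)
{
    struct flow_entry e = { .key = { 0x0a000001, 0x0a000002, 40000, 80 },
                            .expected_seq = 1, .buf_free = 64 * 1024 };
    insert(&e);
    printf("flow found: %s\n", lookup(&e.key) ? "yes" : "no");
    return 0;
}
```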

Page 16: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Evaluation

- Overheads of vSnoop
- TCP throughput speedup
- Application speedup
  - Multi-tier web service (RUBiS)
  - MPI benchmarks (Intel MPI Benchmark, High-Performance Linpack)

Page 17: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Evaluation – Setup

- VM hosts
  - 3.06GHz Intel Xeon CPUs, 4GB RAM; only one core/CPU enabled
  - Xen 3.3 with Linux 2.6.18 for the driver domain (dom0) and the guest VMs
- Client machine
  - 2.4GHz Intel Core 2 Quad CPU, 2GB RAM; Linux 2.6.19
- Gigabit Ethernet switch

Page 18: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

vSnoop Overhead

- Profiling per-packet vSnoop overhead using Xenoprof [Menon VEE’05]

Per-packet CPU overhead for vSnoop routines in dom0:

                          Single Stream        Multiple Streams
  vSnoop Routine          Cycles   CPU %       Cycles   CPU %
  vSnoop_ingress()           509    3.03          516    3.05
  vSnoop_lookup_hash()        74    0.44           91    0.51
  vSnoop_build_ack()          52    0.32           52    0.32
  vSnoop_egress()            104    0.61          104    0.61

Minimal aggregate CPU overhead.

Page 19: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

TCP Throughput Improvement

- 3 VMs consolidated, 1000 transfers of a 100KB file
- Configurations: vanilla Xen, Xen+tuning, Xen+tuning+vSnoop

[Figure: distribution of per-transfer TCP throughput for the three configurations]

Median throughput: 0.192 MB/s with vanilla Xen, 0.778 MB/s with Xen+tuning, and 6.003 MB/s with Xen+tuning+vSnoop, a 30x improvement.

Page 20: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

TCP Throughput: 1 VM/Core

[Figure: normalized TCP throughput vs. transfer size (100MB down to 50KB) for Xen, Xen+tuning, and Xen+tuning+vSnoop]

Page 21: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

TCP Throughput: 2 VMs/Core

[Figure: normalized TCP throughput vs. transfer size (100MB down to 50KB) for Xen, Xen+tuning, and Xen+tuning+vSnoop]

Page 22: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

TCP Throughput: 3 VMs/Core

[Figure: normalized TCP throughput vs. transfer size (100MB down to 50KB) for Xen, Xen+tuning, and Xen+tuning+vSnoop]

Page 23: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

TCP Throughput: 5 VMs/Core

[Figure: normalized TCP throughput vs. transfer size (100MB down to 50KB) for Xen, Xen+tuning, and Xen+tuning+vSnoop]

vSnoop's benefit rises with higher VM consolidation.

Page 24: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

TCP Throughput: Other Setup Parameters

- CPU load for VMs
- Number of TCP connections to the VM
- Driver domain on a separate core
- Sender being a VM

vSnoop consistently achieves significant TCP throughput improvement.

Page 25: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Application-Level Performance: RUBiS

[Figure: RUBiS setup with client threads on a client machine driving Apache and MySQL, each hosted in a guest VM on a separate server (Server1, Server2); each server runs vSnoop in dom0 and hosts two guests (dom1, dom2)]

Page 26: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

RUBiS Results

  RUBiS Operation          Count w/o vSnoop   Count w/ vSnoop   % Gain
  Browse                         421                505          19.9%
  BrowseCategories               288                357          23.9%
  SearchItemsInCategory         3498               4747          35.7%
  BrowseRegions                  128                141          10.1%
  ViewItem                      2892               3776          30.5%
  ViewUserInfo                   732                846          15.6%
  ViewBidHistory                 339                398          17.4%
  Others                        3939               4815          22.2%
  Total                        12237              15585          27.4%
  Average Throughput        29 req/s           37 req/s          27.5%

Page 27: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Application-level Performance – MPI Benchmarks

- Intel MPI Benchmark: network intensive
- High-Performance Linpack: CPU intensive

[Figure: four servers (Server1 to Server4), each running vSnoop in dom0 and hosting two guests (dom1, dom2); the dom1 guests serve as the MPI nodes]

Page 28: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Intel MPI Benchmark Results: Broadcast

[Figure: normalized execution time vs. message size (8MB down to 64KB) for Xen, Xen+tuning, and Xen+tuning+vSnoop]

40% Improvement

Page 29: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Intel MPI Benchmark Results: All-to-All

[Figure: normalized execution time vs. message size (8MB down to 64KB) for Xen, Xen+tuning, and Xen+tuning+vSnoop]

Page 30: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload


HPL Benchmark Results

[Figure: Gflops achieved for each problem size and block size combination (N, NB), from (4K, 2) to (8K, 16), for Xen+tuning+vSnoop and Xen]

Page 31: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Related Work

- Optimizing the virtualized I/O path
  - Menon et al. [USENIX ATC’06, ’08; ASPLOS’09]
- Improving intra-host VM communications
  - XenSocket [Middleware’07], XenLoop [HPDC’08], Fido [USENIX ATC’09], XWAY [VEE’08], IVC [SC’07]
- I/O-aware VM scheduling
  - Govindan et al. [VEE’07], DVT [SoCC’10]

Page 32: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Conclusions

- Problem: VM consolidation degrades TCP throughput
- Solution: vSnoop
  - Leverages acknowledgement offloading
  - Does not violate end-to-end TCP semantics
  - Is transparent to applications and the OS in VMs
  - Is generically applicable to many VMMs
- Results:
  - 30x improvement in median TCP throughput
  - About 30% improvement in the RUBiS benchmark
  - 40-50% reduction in execution time for the Intel MPI benchmark

Page 33: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

Thank you.

For more information: http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vsnoop
Or Google “vSnoop Purdue”

Page 34: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

TCP Benchmarks cont. Testing different scenarios:

a) 10 concurrent connections
b) Sender also subject to VM scheduling
c) Driver domain on a separate core

[Figures: TCP throughput results for scenarios a), b), and c)]

Page 35: vSnoop: Improving TCP Throughput  in Virtualized Environments  via Acknowledgement Offload

TCP Benchmarks cont. Varying CPU load for 3 consolidated VMs:

[Figures: TCP throughput at 40%, 60%, and 80% CPU load]