identifying performance bottlenecks in cdns through tcp-level monitoring

15
Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring Peng Sun Minlan Yu, Michael J. Freedman, Jennifer Rexford Princeton University August 19, 2011

Upload: vaughn

Post on 23-Feb-2016

42 views

Category:

Documents


0 download

DESCRIPTION

Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring. Peng Sun Minlan Yu, Michael J. Freedman, Jennifer Rexford Princeton University August 19, 2011. Performance Bottlenecks. CDN Servers. Internet. Clients. Client Insufficient receive buffer. Internet - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

Identifying Performance Bottlenecks in CDNs through

TCP-Level Monitoring

Peng SunMinlan Yu, Michael J. Freedman, Jennifer

RexfordPrinceton University

August 19, 2011

Page 2: Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

2

Performance Bottlenecks

ServerAPP

Server OS

CDN Servers

Internet

Clients

APPWrite too slowly

Server OSInsufficient send buffer or Small initial congestion window

InternetNetwork congestion

ClientInsufficient receive buffer

Page 3: Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

3

Reaction to Each Bottleneck

ServerAPP

Server OS

CDN Servers

Internet Clients

APP is bottleneck: Debug application

Server OS is bottleneck:Tune buffer size, or upgrade server

Internet is bottleneck: Circumvent the congested part of network

Client is bottleneck: Notify client to change

Page 4: Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

4

ServerAPP

Packet SnifferServer

OS

Previous Techniques Not EnoughApplication logs:No details of network activities

Packet sniffing:Expensive to capture

Active probing:Extra load on network

Transport-layer stats:Directly reveal perf. bottlenecks

Page 5: Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

5

How TCP Stats Reveal Bottlenecks

CDN Servers Internet

Clients

CDN Server Applications

Server Network

StackNetwork Path Clients

Insufficient data in send buffer

Send buffer full or Initial

congestion window too

small

Packet lossReceive

window too small

Page 6: Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

6

Measurement Framework

• Collect TCP statistics• Web100 kernel patch• Extract useful TCP stats for analyzing perf.

• Analysis tool• Bottleneck classifier for individual

connections• Cross-connection correlation at AS level• Map conn. to AS based on RouteView• Correlate bottlenecks to drive CDN decisions

Page 7: Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

7

How Bottleneck Classifier Works

BytesInSndBuf = Rwin

Rwin limits sending

Client is bottleneck

Cwin drops greatlyand Packet loss

Network path is bottleneck

Small initial Cwin

Slow start limits perf.

Network Stack is bottleneck

Page 8: Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

8

CoralCDN Experiment

• CoralCDN serves 1 million clients per day• Experiment Environment• Deployment: A Clemson PlanetLab node• Polling interval: 50 ms• Traces to Show: Feb 19th – 25th 2011• Total # of Conn.: 209K• After removing

Cache-Miss Conn.: 137K (Total 2008 ASes)

• Log Space overhead• < 200MB per Coral server per day

Page 9: Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

9

What are Major Bottleneck for Individual Clients?

• We calculate the fraction of time that the connection is under each bottleneck in lifetime

Bottlenecks% of Conn. With Bottleneck for

>40% of LifetimeServer Application 10.75%Server Network Stack 18.72%Network Path 3.94%Clients 1.27%

Reasons:Slow CPU or scarce disk resources of the PlanetLab node

Reasons:Congestion window rises too slowly for short conn. (>80% of the connections last <1 second) Reasons:Spotty network (discussed in next slide)Reasons:Receive buffer too small (Most of them are <30KB)Our suggestion:Use more powerful PlanetLab machinesOur suggestion:Use larger initial congestion windowOur suggestion:Filter them out of decision making

Page 10: Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

10

AS-Level Correlation

• CDNs make decision at the AS level• e.g., change server selection for 1.1.1.0/24

• Explore at the AS level:• Filter out non-network bottlenecks• Whether network problems exist• Whether the problem is consistent

Page 11: Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

11

Filtering Out Non-Network Bottlenecks

• CDNs change server selection if clients have low throughput

• Non-network factors can limit throughput• 236 out of 505 low-throughput ASes limited by

non-network bottlenecks• Filtering is helpful: • Don’t worry about things CDNs cannot control• Produce more accurate estimates of perf.

Page 12: Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

12

Network Problem at AS Level

• CDN make decision at AS level• Whether conn. in the same AS have

common network problem• For 7.1% of the ASes, half of conn. have

>10% packet loss rate• Network problems are significant at the AS

level

Page 13: Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

13

Consistent Packet Loss of AS

• CDNs care about predictive value of measurement• Analyze the variance of average packet loss rates • Each epoch (1 min) has nonzero average loss rate• Loss rate is consistent across epochs

(standard deviation < mean)Analysis Length # of ASes with

Consistent Packet LossOne Week 377 / 2008One Day (Feb 21st) 122 / 739One Hour(Feb 21st 18:00~19:00)

19 / 121

Page 14: Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

14

Conclusion & Future Work• Use TCP-level stats to detect performance

bottlenecks• Identify major bottlenecks for a production CDN• Discuss how to improve CDN’s operation with

our tool• Future Works• Automatic and real-time analysis combined into

CDN operation• Detect the problematic AS on the path• Combine TCP-level stats with application logs to

debug online services

Page 15: Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring

15

Thanks!

Questions?