TRANSCRIPT
A Scalable, Adaptive, Network-aware Infrastructure for Efficient
Content Delivery
Yan Chen, Ph.D. Status Talk
EECS Department, UC Berkeley
Motivation
• The Internet has evolved to become a commercial infrastructure for service delivery
– Web delivery, VoIP, streaming media …
• Challenges for Internet-scale services
– Scalability: 600M users, 35M Web sites, 28Tb/s
– Efficiency: bandwidth, storage, management
– Agility: dynamic clients/network/servers
– Security, etc.
• Focus on content delivery - Content Distribution Network (CDN)
– 4 billion Web pages in total, daily growth of 7M pages
– Annual growth of 200% for the next 4 years
How CDN Works
New Challenges for CDN
• Large multimedia files ― Efficient replication
• Dynamic content ― Coherence support
• Network congestion/failures ― Scalable network monitoring
Existing CDNs Fail to Address these Challenges
• Non-cooperative replication is inefficient
• No coherence support for dynamic content
• Unscalable network monitoring: O(M×N)
[Diagram: CDN components (provisioning/replica placement, network monitoring, coherence support), contrasting ad hoc pair-wise monitoring, O(M×N), with tomography-based monitoring, O(M+N).]
[Design-space diagram: replication granularity (per object, per Website, per cluster) vs. access/deployment mechanism (IP multicast, app-level multicast, unicast). Existing CDNs use non-cooperative pull; SCAN uses cooperative push.]
SCAN: Scalable Content Access Network
[Diagram: SCAN combines cooperative clustering-based replication, coherence support for dynamic content, and scalable network monitoring, O(M+N).]
Outline
• Introduction
• Research Methodology
• SCAN Mechanisms and Status
– Cooperative clustering-based replication
– Coherence support
– Scalable network monitoring
• Research Plan
• Conclusions
Design and Evaluation of Internet-scale Systems
[Flow diagram: algorithm design → analytical evaluation → realistic simulation, iterated, followed by real evaluation(?). Simulation inputs: network topology, Web workload, network end-to-end latency measurement.]
Network Topology and Web Workload
• Network Topology
– Pure-random, Waxman & transit-stub synthetic topologies
– An AS-level topology from 7 widely-dispersed BGP peers
• Web Workload

Web Site   | Period       | Duration | # Requests (avg–min–max) | # Clients (avg–min–max) | # Client groups (avg–min–max)
MSNBC      | Aug–Oct 1999 | 10–11am  | 1.5M–642K–1.7M           | 129K–69K–150K           | 15.6K–10K–17K
NASA       | Jul–Aug 1995 | All day  | 79K–61K–101K             | 5940–4781–7671          | 2378–1784–3011
World Cup  | May–Jul 1998 | All day  | 29M–1M–73M               | 103K–13K–218K           | N/A

– Aggregate MSNBC Web clients with BGP prefix
» BGP tables from a BBNPlanet router
– Aggregate NASA Web clients with domain names
– Map the client groups onto the topology
Network E2E Latency Measurement
• NLANR Active Measurement Project data set
– 111 sites in America, Asia, Australia and Europe
– Round-trip time (RTT) between every pair of hosts every minute
– 17M measurements daily
– Raw data: Jun. – Dec. 2001, Nov. 2002
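The 17M-per-day figure is consistent with all-pairs probing; a quick sanity check, assuming each ordered pair of the 111 sites is measured once per minute:

```python
# Sanity check on the NLANR data volume (assumption: every ordered
# pair of the 111 sites exchanges one RTT probe per minute).
sites = 111
ordered_pairs = sites * (sites - 1)   # 12,210 host pairs
per_day = ordered_pairs * 24 * 60     # one probe per pair per minute
print(per_day)                        # 17,582,400 -> roughly 17M/day
```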
• Keynote measurement data
– Measures TCP performance from about 100 worldwide agents
– Heterogeneous core network: various ISPs
– Heterogeneous access network:
» Dial-up 56K, DSL and high-bandwidth business connections
– Targets:
» 40 most popular Web servers + 27 Internet Data Centers
– Raw data: Nov. – Dec. 2001, Mar. – May 2002
Outline
• Introduction
• Research Methodology
• SCAN Mechanisms and Status
– Cooperative clustering-based replication
– Coherence support
– Scalable network monitoring
• Research Plan
• Conclusions
Cooperative Clustering-based Replication
• Cooperative push: only 4–5% of the replication/update cost of existing CDNs
• Clustering reduces the management/computational overhead by two orders of magnitude
– Spatial clustering and popularity-based clustering recommended
• Incremental clustering to adapt to emerging objects
– Hyperlink-based online incremental clustering for high availability and performance improvement
– Offline incremental clustering performs close to optimal
• Publications
– ICNP 2002
– IEEE J-SAC 2003 (extended version)
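A minimal sketch of the popularity-based flavor of clustering (the function and cut rule are hypothetical illustrations, not the ICNP 2002 algorithm): rank URLs by access count and slice the ranking into groups that are replicated as a unit.

```python
# Hypothetical sketch: cluster URLs of similar popularity so each
# cluster can be replicated/managed as a single unit.
def popularity_clusters(access_counts, k):
    """access_counts: {url: request_count}; returns k url clusters."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    size = -(-len(ranked) // k)                 # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

counts = {"/a": 900, "/b": 850, "/c": 40, "/d": 35, "/e": 5, "/f": 3}
print(popularity_clusters(counts, 3))
# [['/a', '/b'], ['/c', '/d'], ['/e', '/f']]
```

Managing three clusters instead of six URLs is the source of the overhead reduction; real workloads with O(10^4) URLs per server see a much larger factor.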
Coherence Support
• Leverages DOLR (Tapestry)
• Dynamic replica placement
• Replicas self-organize into an app-level multicast tree
– Small delay and bandwidth consumption for update multicast
– Each node only maintains state for its parent & direct children
• Evaluated based on simulation of
– Synthetic traces with various sensitivity analyses
– Real traces from NASA and MSNBC
• Publications
– IPTPS 2002
– Pervasive Computing 2002
Network Distance Estimation
• Proposed Internet Iso-bar: a scalable overlay distance monitoring system
• Procedure
1. Cluster hosts that perceive similar performance to a small set of sites (landmarks)
2. For each cluster, select a monitor for active and continuous probing
3. Estimate the distance between any pair of hosts using inter- and intra-cluster distances
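The three steps above can be sketched as follows (all names, data and the distance threshold are hypothetical illustrations, not the published Internet Iso-bar algorithm):

```python
import math

def cluster_by_landmarks(vectors, threshold):
    """Step 1: group hosts whose landmark-distance vectors are close.
    vectors: {host: (dist_to_landmark1, dist_to_landmark2, ...)}"""
    clusters = []
    for host, vec in vectors.items():
        for c in clusters:
            if math.dist(vec, vectors[c[0]]) <= threshold:
                c.append(host)
                break
        else:
            clusters.append([host])
    return clusters

def pick_monitor(cluster, vectors):
    """Step 2: the monitor is the host nearest the cluster centroid."""
    dims = range(len(vectors[cluster[0]]))
    centroid = [sum(vectors[h][i] for h in cluster) / len(cluster)
                for i in dims]
    return min(cluster, key=lambda h: math.dist(vectors[h], centroid))

# Step 3 then approximates distance(a, b) by the probed
# monitor(a) -> monitor(b) distance plus intra-cluster offsets.
hosts = {"h1": (10, 50), "h2": (12, 48), "h3": (80, 5), "h4": (82, 7)}
clusters = cluster_by_landmarks(hosts, threshold=10)
print(clusters)                                # [['h1', 'h2'], ['h3', 'h4']]
print([pick_monitor(c, hosts) for c in clusters])
```

Only the monitors probe actively, which is what turns O(M×N) pairwise probing into roughly one probe set per cluster.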
[Diagram of Internet Iso-bar: end hosts and landmarks, with hosts grouped into clusters A, B and C.]
[Diagram of Internet Iso-bar: each cluster has a monitor; distance probes run from each monitor to its hosts and among monitors.]
Internet Iso-bar
• Evaluated with NLANR AMP and Keynote data
– 90% of relative errors less than 0.5
» e.g., for a 60ms latency, 45ms < prediction < 90ms
– Good stability for distance estimation
• Publications
– ACM SIGMETRICS Performance Evaluation Review (PER), September issue, 2002
– Journal of Computer Resource Management, Computer Measurement Group, Spring Edition, 2002
Outline
• Introduction
• Research Methodology
• SCAN Mechanisms and Status
– Cooperative clustering-based replication
– Coherence support
– Scalable network monitoring
• Research Plan
• Conclusions
Research Plan
• Focus on congestion/failure estimation (4 months)
– Apply topology information, e.g. lossy link detection with network tomography
– Cluster and choose monitors based on the lossy links
– Dynamic node join/leave for P2P systems
– More comprehensive evaluation
» Simulate with large networks
» Deploy on PlanetLab, and operate at a finer level
• Write up thesis (4 months)
Tomography-based Network Monitoring
• Observations
– The # of lossy links is small, but they dominate E2E loss
– Loss rates are stable (on the order of hours ~ days)
– Routing is stable (on the order of days)
• Identify the lossy links and only monitor a few paths to examine them
• Make inferences for the other paths
[Diagram: end hosts and routers connected by normal links and a few lossy links.]
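A toy sketch of that reduction (helper names are hypothetical, and a greedy set cover stands in for the real path-selection step): probe a path set whose links cover every link, then infer the loss of unprobed paths from per-link estimates, with success probabilities multiplying along a path.

```python
# Toy sketch of tomography-style monitoring reduction.
def choose_probe_paths(paths):
    """paths: {name: set_of_links}; greedily pick paths until all
    links are covered by at least one probed path."""
    uncovered = set().union(*paths.values())
    chosen = []
    while uncovered:
        best = max(paths, key=lambda p: len(paths[p] & uncovered))
        chosen.append(best)
        uncovered -= paths[best]
    return chosen

def infer_path_loss(links, link_loss):
    """Loss of an unprobed path from per-link loss estimates."""
    success = 1.0
    for link in links:
        success *= 1.0 - link_loss.get(link, 0.0)
    return 1.0 - success

paths = {"p1": {"a", "b"}, "p2": {"b", "c"}, "p3": {"a", "c"}}
print(choose_probe_paths(paths))   # two probed paths cover links a, b, c
print(infer_path_loss(paths["p3"], {"a": 0.1, "c": 0.05}))  # ~0.145
```

Here only two of the three paths are probed, and p3's loss is inferred; at scale this is the O(M+N)-versus-O(M×N) saving.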
Conclusions
• Cooperative, clustering-based replication
– Cooperative push: only 4–5% of the replication/update cost of existing CDNs
– Clustering reduces the management/computational overhead by two orders of magnitude
» Spatial clustering and popularity-based clustering recommended
– Incremental clustering to adapt to emerging objects
» Hyperlink-based online incremental clustering for high availability and performance improvement
• Replicas self-organize into an app-level multicast tree for update dissemination
• Scalable overlay network monitoring
– O(M+N) instead of O(M×N), given M client groups and N servers
Backup Materials
[Diagram, repeated from the main deck: SCAN combines cooperative clustering-based replication, coherence support for dynamic content, and scalable network monitoring, O(M+N).]
Problem Formulation
• Subject to a certain total replication cost (e.g., # of URL replicas)
• Find a scalable, adaptive replication strategy to reduce avg access cost
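One hedged illustration of the formulation (a plain greedy heuristic, not the dissertation's replication strategy): under a budget of R replica sites, repeatedly place the replica that most reduces the summed client access cost.

```python
# Greedy sketch (hypothetical, for intuition only):
# cost[c][s] = access cost from client group c to candidate site s.
def greedy_placement(cost, budget):
    sites = range(len(cost[0]))
    placed = set()

    def total(chosen):
        # each client group fetches from its cheapest chosen replica
        return sum(min(row[s] for s in chosen) for row in cost)

    for _ in range(budget):
        candidates = [s for s in sites if s not in placed]
        placed.add(min(candidates, key=lambda s: total(placed | {s})))
    return placed

cost = [[1, 9, 4],   # client group 0
        [9, 1, 4],   # client group 1
        [8, 2, 4]]   # client group 2
print(greedy_placement(cost, budget=2))
```

Greedy placement like this is a common baseline; the cooperative, clustering-based strategy in the talk operates on object clusters rather than individual URLs.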
[Architecture diagram: CDN applications (e.g. streaming media) sit on top of SCAN (Scalable Content Access Network), which comprises provisioning (cooperative clustering-based replication), coherence (update multicast tree construction), user behavior/workload monitoring, network performance monitoring, and network distance/congestion/failure estimation. Red: my work, black: out of scope.]
Evaluation of Internet-scale Systems
• Analytical evaluation
• Realistic simulation
– Network topology
– Web workload
– Network end-to-end latency measurement
• Network topology
– Pure-random, Waxman & transit-stub synthetic topologies
– A real AS-level topology from 7 widely-dispersed BGP peers
Web Workload
Web Site   | Period       | Duration | # Requests (avg–min–max) | # Clients (avg–min–max) | # Client groups (avg–min–max)
MSNBC      | Aug–Oct 1999 | 10–11am  | 1.5M–642K–1.7M           | 129K–69K–150K           | 15.6K–10K–17K
NASA       | Jul–Aug 1995 | All day  | 79K–61K–101K             | 5940–4781–7671          | 2378–1784–3011
World Cup  | May–Jul 1998 | All day  | 29M–1M–73M               | 103K–13K–218K           | N/A

• Aggregate MSNBC Web clients with BGP prefix
– BGP tables from a BBNPlanet router
• Aggregate NASA Web clients with domain names
• Map the client groups onto the topology
Simulation Methodology
• Network Topology
– Pure-random, Waxman & transit-stub synthetic topologies
– An AS-level topology from 7 widely-dispersed BGP peers
• Web Workload

Web Site | Period       | Duration | # Requests (avg–min–max) | # Clients (avg–min–max) | # Client groups (avg–min–max)
MSNBC    | Aug–Oct 1999 | 10–11am  | 1.5M–642K–1.7M           | 129K–69K–150K           | 15.6K–10K–17K
NASA     | Jul–Aug 1995 | All day  | 79K–61K–101K             | 5940–4781–7671          | 2378–1784–3011

– Aggregate MSNBC Web clients with BGP prefix
» BGP tables from a BBNPlanet router
– Aggregate NASA Web clients with domain names
– Map the client groups onto the topology
Online Incremental Clustering
• Predict access patterns based on semantics
• Simplify to popularity prediction
• Groups of URLs with similar popularity? Use hyperlink structures!
– Groups of siblings
– Groups at the same hyperlink depth: smallest # of links from the root
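The hyperlink-depth grouping can be sketched with a breadth-first search (the site graph and URLs below are hypothetical): a URL's depth is the fewest links from the root page, and URLs at the same depth form one candidate cluster.

```python
from collections import deque, defaultdict

def depth_clusters(links, root):
    """links: {url: [linked urls]}; returns {depth: [urls]} where
    depth is the smallest # of links from the root (BFS distance)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        url = queue.popleft()
        for nxt in links.get(url, []):
            if nxt not in depth:
                depth[nxt] = depth[url] + 1
                queue.append(nxt)
    groups = defaultdict(list)
    for url, d in depth.items():
        groups[d].append(url)
    return dict(groups)

site = {"/": ["/news", "/sports"],
        "/news": ["/news/a"],
        "/sports": ["/news/a"]}
print(depth_clusters(site, "/"))
# {0: ['/'], 1: ['/news', '/sports'], 2: ['/news/a']}
```

Because depth only needs the hyperlink graph, a new object can be assigned to a cluster online, before any access statistics exist.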
Challenges for CDN
• Over-provisioning for replication
– Provide good QoS to clients (e.g., latency bound, coherence)
– Small # of replicas with small delay and bandwidth consumption for updates
• Replica management
– Scalability: billions of replicas if replicating at URL granularity
» O(10^4) URLs/server, O(10^5) CDN edge servers in O(10^3) networks
– Adaptation to dynamics of content providers and customers
• Monitoring
– User workload monitoring
– End-to-end network distance/congestion/failure monitoring
» Measurement scalability
» Inference accuracy and stability
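The "billions of replicas" claim follows directly from those orders of magnitude:

```python
# Back-of-envelope check of the replica-management scale: per-URL
# replication across all edge servers yields O(10^9) replica records.
urls_per_server = 10**4
edge_servers = 10**5
print(urls_per_server * edge_servers)   # 1,000,000,000 -> billions
```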
SCAN Architecture
• Leverages Decentralized Object Location and Routing (DOLR) - Tapestry for
– Distributed, scalable location with guaranteed success
– Search with locality
• Soft-state maintenance of the dissemination tree (for each object)
[Diagram: a data plane over a network plane. A data source feeds a Web server; SCAN servers in a Tapestry mesh hold replicas (always updated) and caches (adaptive coherence) serving clients; request location and dynamic replication/update and content management operate across the mesh.]
[Diagram: clients in clusters A, B and C; SCAN edge servers act as monitors; distances are measured from each host to its monitor and among monitors.]
Wide-area Network Measurement and Monitoring System (WNMMS)
• Select a subset of SCAN servers to be monitors
• E2E estimation on the network plane of
– Distance
– Congestion
– Failures
Dynamic Provisioning
• Dynamic replica placement
– Meeting clients' latency and servers' capacity constraints
– Close-to-minimal # of replicas
• Replicas self-organize into an app-level multicast tree
– Small delay and bandwidth consumption for update multicast
– Each node only maintains state for its parent & direct children
• Evaluated based on simulation of
– Synthetic traces with various sensitivity analyses
– Real traces from NASA and MSNBC
• Publications
– IPTPS 2002
– Pervasive Computing 2002
Effects of the Non-Uniform Size of URLs
• Replication cost constraint: bytes
• Similar trends exist
– Per-URL replication outperforms per-Website replication dramatically
– Spatial clustering with Euclidean distance and popularity-based clustering are very cost-effective
Real Internet Measurement Data
• NLANR Active Measurement Project data set
– 119 sites in the US (106 after filtering out mostly-offline sites)
– Round-trip time (RTT) between every pair of hosts every minute
– Raw data: 6/24/00 – 12/3/01
• Keynote measurement data
– Measures TCP performance from about 100 agents
– Heterogeneous core network: various ISPs
– Heterogeneous access network:
» Dial-up 56K, DSL and high-bandwidth business connections
– Targets
» Web site perspective: 40 most popular Web servers
» 27 Internet Data Centers (IDCs)
Related Work
• Internet content delivery systems
– Web caching
» Client-initiated
» Server-initiated
– Pull-based Content Delivery Networks (CDNs)
– Push-based CDNs
• Update dissemination
– IP multicast
– Application-level multicast
• Network E2E distance monitoring systems
Web Proxy Caching
[Diagram: client and local DNS server at ISP 1, a proxy cache server, and the Web content server at ISP 2.
1. GET request from client to proxy cache server
2. GET request to Web content server if cache miss
3. Response to proxy
4. Response to client]
Pull-based CDN
[Diagram: client and local DNS server at ISP 1, a local CDN server, the CDN name server, and the Web content server at ISP 2.
1. GET request from client
2. Request for hostname resolution
3. Reply: local CDN server IP address
4. Local CDN server IP address returned to client
5. GET request to local CDN server
6. GET request to Web content server if cache miss
7. Response to CDN server
8. Response to client]
Push-based CDN
[Diagram: client and local DNS server at ISP 1, a local CDN server, the CDN name server, and the Web content server at ISP 2.
0. Replicas pushed to CDN servers in advance
1. GET request from client
2. Request for hostname resolution
3. Reply: nearby replica server or Web server IP address
4. Redirected server IP address returned to client
5. GET request (to the Web server if no replica yet)
6. Response]
Internet Content Delivery Systems
Comparison of properties across Web caching (client initiated), Web caching (server initiated), pull-based CDNs (Akamai), push-based CDNs, and SCAN:
• Scalability for request redirection: pre-configured in browser / use Bloom filters to exchange replica locations / centralized CDN name server / centralized CDN name server / decentralized P2P location
• Efficiency (# of caches or replicas): no cache sharing among proxies / cache sharing / no replica sharing among edge servers / replica sharing / replica sharing
• Network-awareness: no / no / yes, unscalable monitoring system / no / yes, scalable monitoring system
• Coherence support: no / no / yes / no / yes
Previous Work: Update Dissemination
• No inter-domain IP multicast
• Application-level multicast (ALM) is unscalable
– Root maintains state for all children (Narada, Overcast, ALMI, RMX)
– Root handles all "join" requests (Bayeux)
– Root splitting is a common solution, but suffers consistency overhead
Design Principles
• Scalability
– No centralized point of control: P2P location services, Tapestry
– Reduce management state: minimize # of replicas, object clustering
– Distributed load balancing: capacity constraints
• Adaptation to client dynamics
– Dynamic distribution/deletion of replicas with regard to clients' QoS constraints
– Incremental clustering
• Network-awareness and fault-tolerance (WNMMS)
– Distance estimation: Internet Iso-bar
– Anomaly detection and diagnostics
Comparison of Content Delivery Systems (cont’d)
Across the same five systems, Web caching (client initiated), Web caching (server initiated), pull-based CDNs (Akamai), push-based CDNs, and SCAN:
• Distributed load balancing: no / yes / yes / no / yes
• Dynamic replica placement: yes / yes / yes / no / yes
• Network-awareness: no / no / yes, unscalable monitoring system / no / yes, scalable monitoring system
• No global network topology assumption: yes / yes / yes / no / yes
Network-awareness (cont’d)
• Loss/congestion prediction
– Maximize true positives and minimize false positives
• Orthogonal loss/congestion path discovery
– Without the underlying topology
– How stable is such orthogonality?
» Degradation of orthogonality over time
• Reactive and proactive adaptation for SCAN
orthogonality(AC, BC) = 1 − (E(AC·BC) − E(AC)E(BC)) / sqrt((E(AC²) − E(AC)²)(E(BC²) − E(BC)²))
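The second term is the Pearson correlation coefficient of the two paths' samples, so orthogonality is 1 minus their correlation; a direct transcription (the sample sequences below are hypothetical):

```python
# Direct transcription of the orthogonality formula: 1 minus the
# Pearson correlation of the loss samples observed on paths AC and BC.
def orthogonality(ac, bc):
    n = len(ac)
    mean = lambda xs: sum(xs) / n
    cov = mean([a * b for a, b in zip(ac, bc)]) - mean(ac) * mean(bc)
    var_a = mean([a * a for a in ac]) - mean(ac) ** 2
    var_b = mean([b * b for b in bc]) - mean(bc) ** 2
    return 1.0 - cov / (var_a * var_b) ** 0.5

# Perfectly correlated paths share congestion, so orthogonality ~ 0;
# anti-correlated samples give ~ 2, independent ones ~ 1.
print(orthogonality([0.1, 0.2, 0.3], [0.2, 0.4, 0.6]))
```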