TRANSCRIPT
A Scalable, Adaptive, Network-aware Infrastructure for Efficient
Content Delivery
Yan Chen, Ph.D. Status Talk
EECS Department, UC Berkeley
Motivation
• The Internet has evolved to become a commercial infrastructure for service delivery
– Web delivery, VoIP, streaming media …
• Challenges for Internet-scale services
– Scalability: 600M users, 35M Web sites, 28Tb/s
– Efficiency: bandwidth, storage, management
– Agility: dynamic clients/network/servers
– Security, etc.
• Focus on content delivery - Content Distribution Network (CDN)
– 4 billion Web pages in total, daily growth of 7M pages
– Annual growth of 200% for the next 4 years
How CDN Works
New Challenges for CDN
• Large multimedia files ― Efficient replication
• Dynamic content ― Coherence support
• Network congestion/failures ― Scalable network monitoring
Existing CDNs Fail to Address these Challenges
• Non-cooperative replication is inefficient
• No coherence support for dynamic content
• Unscalable network monitoring: O(M×N)
[Diagram: CDN components (provisioning/replica placement, network monitoring, coherence support), contrasting ad hoc pair-wise monitoring, O(M×N), with tomography-based monitoring, O(M+N).]
[Design-space diagram: replication granularity (per object, per Website, per cluster) vs. access/deployment mechanism (IP multicast, app-level multicast, unicast). Existing CDNs use non-cooperative pull; SCAN uses cooperative push.]
SCAN: Scalable Content Access Network
[Diagram: SCAN combines cooperative clustering-based replication, coherence support for dynamic content, and scalable network monitoring, O(M+N).]
Outline
• Introduction
• Research Methodology
• SCAN Mechanisms and Status
– Cooperative clustering-based replication
– Coherence support
– Scalable network monitoring
• Research Plan
• Conclusions
Design and Evaluation of Internet-scale Systems
[Flow diagram: algorithm design → analytical evaluation → realistic simulation, iterated, followed by real evaluation(?). Simulation inputs: network topology, Web workload, network end-to-end latency measurement.]
Network Topology and Web Workload
• Network Topology
– Pure-random, Waxman & transit-stub synthetic topologies
– An AS-level topology from 7 widely-dispersed BGP peers
• Web Workload

Web Site   | Period       | Duration | # Requests (avg–min–max) | # Clients (avg–min–max) | # Client groups (avg–min–max)
MSNBC      | Aug–Oct 1999 | 10–11am  | 1.5M–642K–1.7M           | 129K–69K–150K           | 15.6K–10K–17K
NASA       | Jul–Aug 1995 | All day  | 79K–61K–101K             | 5940–4781–7671          | 2378–1784–3011
World Cup  | May–Jul 1998 | All day  | 29M–1M–73M               | 103K–13K–218K           | N/A

– Aggregate MSNBC Web clients with BGP prefix
» BGP tables from a BBNPlanet router
– Aggregate NASA Web clients with domain names
– Map the client groups onto the topology
Network E2E Latency Measurement
• NLANR Active Measurement Project data set
– 111 sites in America, Asia, Australia and Europe
– Round-trip time (RTT) between every pair of hosts every minute
– 17M measurements daily
– Raw data: Jun. – Dec. 2001, Nov. 2002
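The 17M-per-day figure is consistent with all-pairs probing; a quick sanity check, assuming each ordered pair of the 111 sites is measured once per minute:

```python
# Sanity check on the NLANR data volume (assumption: every ordered
# pair of the 111 sites exchanges one RTT probe per minute).
sites = 111
ordered_pairs = sites * (sites - 1)   # 12,210 host pairs
per_day = ordered_pairs * 24 * 60     # one probe per pair per minute
print(per_day)                        # 17,582,400 -> roughly 17M/day
```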
• Keynote measurement data
– Measures TCP performance from about 100 worldwide agents
– Heterogeneous core network: various ISPs
– Heterogeneous access network:
» Dial-up 56K, DSL and high-bandwidth business connections
– Targets:
» 40 most popular Web servers + 27 Internet Data Centers
– Raw data: Nov. – Dec. 2001, Mar. – May 2002
Outline
• Introduction
• Research Methodology
• SCAN Mechanisms and Status
– Cooperative clustering-based replication
– Coherence support
– Scalable network monitoring
• Research Plan
• Conclusions
Cooperative Clustering-based Replication
• Cooperative push: only 4–5% of the replication/update cost of existing CDNs
• Clustering reduces the management/computational overhead by two orders of magnitude
– Spatial clustering and popularity-based clustering recommended
• Incremental clustering to adapt to emerging objects
– Hyperlink-based online incremental clustering for high availability and performance improvement
– Offline incremental clustering performs close to optimal
• Publications
– ICNP 2002
– IEEE J-SAC 2003 (extended version)
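A minimal sketch of the popularity-based flavor of clustering (the function and cut rule are hypothetical illustrations, not the ICNP 2002 algorithm): rank URLs by access count and slice the ranking into groups that are replicated as a unit.

```python
# Hypothetical sketch: cluster URLs of similar popularity so each
# cluster can be replicated/managed as a single unit.
def popularity_clusters(access_counts, k):
    """access_counts: {url: request_count}; returns k url clusters."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    size = -(-len(ranked) // k)                 # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

counts = {"/a": 900, "/b": 850, "/c": 40, "/d": 35, "/e": 5, "/f": 3}
print(popularity_clusters(counts, 3))
# [['/a', '/b'], ['/c', '/d'], ['/e', '/f']]
```

Managing three clusters instead of six URLs is the source of the overhead reduction; real workloads with O(10^4) URLs per server see a much larger factor.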
Coherence Support
• Leverages DOLR (Tapestry)
• Dynamic replica placement
• Replicas self-organize into an app-level multicast tree
– Small delay and bandwidth consumption for update multicast
– Each node only maintains state for its parent & direct children
• Evaluated based on simulation of
– Synthetic traces with various sensitivity analyses
– Real traces from NASA and MSNBC
• Publications
– IPTPS 2002
– Pervasive Computing 2002
Network Distance Estimation
• Proposed Internet Iso-bar: a scalable overlay distance monitoring system
• Procedure
1. Cluster hosts that perceive similar performance to a small set of sites (landmarks)
2. For each cluster, select a monitor for active and continuous probing
3. Estimate the distance between any pair of hosts using inter- and intra-cluster distances
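The three steps above can be sketched as follows (all names, data and the distance threshold are hypothetical illustrations, not the published Internet Iso-bar algorithm):

```python
import math

def cluster_by_landmarks(vectors, threshold):
    """Step 1: group hosts whose landmark-distance vectors are close.
    vectors: {host: (dist_to_landmark1, dist_to_landmark2, ...)}"""
    clusters = []
    for host, vec in vectors.items():
        for c in clusters:
            if math.dist(vec, vectors[c[0]]) <= threshold:
                c.append(host)
                break
        else:
            clusters.append([host])
    return clusters

def pick_monitor(cluster, vectors):
    """Step 2: the monitor is the host nearest the cluster centroid."""
    dims = range(len(vectors[cluster[0]]))
    centroid = [sum(vectors[h][i] for h in cluster) / len(cluster)
                for i in dims]
    return min(cluster, key=lambda h: math.dist(vectors[h], centroid))

# Step 3 then approximates distance(a, b) by the probed
# monitor(a) -> monitor(b) distance plus intra-cluster offsets.
hosts = {"h1": (10, 50), "h2": (12, 48), "h3": (80, 5), "h4": (82, 7)}
clusters = cluster_by_landmarks(hosts, threshold=10)
print(clusters)                                # [['h1', 'h2'], ['h3', 'h4']]
print([pick_monitor(c, hosts) for c in clusters])
```

Only the monitors probe actively, which is what turns O(M×N) pairwise probing into roughly one probe set per cluster.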
[Diagram of Internet Iso-bar: end hosts and landmarks, with hosts grouped into clusters A, B and C.]
[Diagram of Internet Iso-bar: each cluster has a monitor; distance probes run from each monitor to its hosts and among monitors.]
Internet Iso-bar
• Evaluated with NLANR AMP and Keynote data
– 90% of relative errors less than 0.5
» e.g., for a 60ms latency, 45ms < prediction < 90ms
– Good stability for distance estimation
• Publications
– ACM SIGMETRICS Performance Evaluation Review (PER), September issue, 2002
– Journal of Computer Resource Management, Computer Measurement Group, Spring Edition, 2002
Outline
• Introduction
• Research Methodology
• SCAN Mechanisms and Status
– Cooperative clustering-based replication
– Coherence support
– Scalable network monitoring
• Research Plan
• Conclusions
Research Plan
• Focus on congestion/failure estimation (4 months)
– Apply topology information, e.g. lossy link detection with network tomography
– Cluster and choose monitors based on the lossy links
– Dynamic node join/leave for P2P systems
– More comprehensive evaluation
» Simulate with large networks
» Deploy on PlanetLab, and operate at a finer level
• Write up thesis (4 months)
Tomography-based Network Monitoring
• Observations
– The # of lossy links is small, but they dominate E2E loss
– Loss rates are stable (on the order of hours ~ days)
– Routing is stable (on the order of days)
• Identify the lossy links and only monitor a few paths to examine them
• Make inferences for the other paths
[Diagram: end hosts and routers connected by normal links and a few lossy links.]
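A toy sketch of that reduction (helper names are hypothetical, and a greedy set cover stands in for the real path-selection step): probe a path set whose links cover every link, then infer the loss of unprobed paths from per-link estimates, with success probabilities multiplying along a path.

```python
# Toy sketch of tomography-style monitoring reduction.
def choose_probe_paths(paths):
    """paths: {name: set_of_links}; greedily pick paths until all
    links are covered by at least one probed path."""
    uncovered = set().union(*paths.values())
    chosen = []
    while uncovered:
        best = max(paths, key=lambda p: len(paths[p] & uncovered))
        chosen.append(best)
        uncovered -= paths[best]
    return chosen

def infer_path_loss(links, link_loss):
    """Loss of an unprobed path from per-link loss estimates."""
    success = 1.0
    for link in links:
        success *= 1.0 - link_loss.get(link, 0.0)
    return 1.0 - success

paths = {"p1": {"a", "b"}, "p2": {"b", "c"}, "p3": {"a", "c"}}
print(choose_probe_paths(paths))   # two probed paths cover links a, b, c
print(infer_path_loss(paths["p3"], {"a": 0.1, "c": 0.05}))  # ~0.145
```

Here only two of the three paths are probed, and p3's loss is inferred; at scale this is the O(M+N)-versus-O(M×N) saving.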
Conclusions
• Cooperative, clustering-based replication
– Cooperative push: only 4–5% of the replication/update cost of existing CDNs
– Clustering reduces the management/computational overhead by two orders of magnitude
» Spatial clustering and popularity-based clustering recommended
– Incremental clustering to adapt to emerging objects
» Hyperlink-based online incremental clustering for high availability and performance improvement
• Replicas self-organize into an app-level multicast tree for update dissemination
• Scalable overlay network monitoring
– O(M+N) instead of O(M×N), given M client groups and N servers
Backup Materials
[Diagram, repeated from the main deck: SCAN combines cooperative clustering-based replication, coherence support for dynamic content, and scalable network monitoring, O(M+N).]
Problem Formulation
• Subject to a certain total replication cost (e.g., # of URL replicas)
• Find a scalable, adaptive replication strategy to reduce avg access cost
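One hedged illustration of the formulation (a plain greedy heuristic, not the dissertation's replication strategy): under a budget of R replica sites, repeatedly place the replica that most reduces the summed client access cost.

```python
# Greedy sketch (hypothetical, for intuition only):
# cost[c][s] = access cost from client group c to candidate site s.
def greedy_placement(cost, budget):
    sites = range(len(cost[0]))
    placed = set()

    def total(chosen):
        # each client group fetches from its cheapest chosen replica
        return sum(min(row[s] for s in chosen) for row in cost)

    for _ in range(budget):
        candidates = [s for s in sites if s not in placed]
        placed.add(min(candidates, key=lambda s: total(placed | {s})))
    return placed

cost = [[1, 9, 4],   # client group 0
        [9, 1, 4],   # client group 1
        [8, 2, 4]]   # client group 2
print(greedy_placement(cost, budget=2))
```

Greedy placement like this is a common baseline; the cooperative, clustering-based strategy in the talk operates on object clusters rather than individual URLs.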
[Architecture diagram: CDN applications (e.g. streaming media) sit on top of SCAN (Scalable Content Access Network), which comprises provisioning (cooperative clustering-based replication), coherence (update multicast tree construction), user behavior/workload monitoring, network performance monitoring, and network distance/congestion/failure estimation. Red: my work, black: out of scope.]
Evaluation of Internet-scale Systems
• Analytical evaluation
• Realistic simulation
– Network topology
– Web workload
– Network end-to-end latency measurement
• Network topology
– Pure-random, Waxman & transit-stub synthetic topologies
– A real AS-level topology from 7 widely-dispersed BGP peers
Web Workload
Web Site   | Period       | Duration | # Requests (avg–min–max) | # Clients (avg–min–max) | # Client groups (avg–min–max)
MSNBC      | Aug–Oct 1999 | 10–11am  | 1.5M–642K–1.7M           | 129K–69K–150K           | 15.6K–10K–17K
NASA       | Jul–Aug 1995 | All day  | 79K–61K–101K             | 5940–4781–7671          | 2378–1784–3011
World Cup  | May–Jul 1998 | All day  | 29M–1M–73M               | 103K–13K–218K           | N/A

• Aggregate MSNBC Web clients with BGP prefix
– BGP tables from a BBNPlanet router
• Aggregate NASA Web clients with domain names
• Map the client groups onto the topology
Simulation Methodology
• Network Topology
– Pure-random, Waxman & transit-stub synthetic topologies
– An AS-level topology from 7 widely-dispersed BGP peers
• Web Workload

Web Site | Period       | Duration | # Requests (avg–min–max) | # Clients (avg–min–max) | # Client groups (avg–min–max)
MSNBC    | Aug–Oct 1999 | 10–11am  | 1.5M–642K–1.7M           | 129K–69K–150K           | 15.6K–10K–17K
NASA     | Jul–Aug 1995 | All day  | 79K–61K–101K             | 5940–4781–7671          | 2378–1784–3011

– Aggregate MSNBC Web clients with BGP prefix
» BGP tables from a BBNPlanet router
– Aggregate NASA Web clients with domain names
– Map the client groups onto the topology
Online Incremental Clustering
• Predict access patterns based on semantics
• Simplify to popularity prediction
• Groups of URLs with similar popularity? Use hyperlink structures!
– Groups of siblings
– Groups at the same hyperlink depth: smallest # of links from the root
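The hyperlink-depth grouping can be sketched with a breadth-first search (the site graph and URLs below are hypothetical): a URL's depth is the fewest links from the root page, and URLs at the same depth form one candidate cluster.

```python
from collections import deque, defaultdict

def depth_clusters(links, root):
    """links: {url: [linked urls]}; returns {depth: [urls]} where
    depth is the smallest # of links from the root (BFS distance)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        url = queue.popleft()
        for nxt in links.get(url, []):
            if nxt not in depth:
                depth[nxt] = depth[url] + 1
                queue.append(nxt)
    groups = defaultdict(list)
    for url, d in depth.items():
        groups[d].append(url)
    return dict(groups)

site = {"/": ["/news", "/sports"],
        "/news": ["/news/a"],
        "/sports": ["/news/a"]}
print(depth_clusters(site, "/"))
# {0: ['/'], 1: ['/news', '/sports'], 2: ['/news/a']}
```

Because depth only needs the hyperlink graph, a new object can be assigned to a cluster online, before any access statistics exist.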
Challenges for CDN
• Over-provisioning for replication
– Provide good QoS to clients (e.g., latency bound, coherence)
– Small # of replicas with small delay and bandwidth consumption for updates
• Replica management
– Scalability: billions of replicas if replicating at URL granularity
» O(10^4) URLs/server, O(10^5) CDN edge servers in O(10^3) networks
– Adaptation to dynamics of content providers and customers
• Monitoring
– User workload monitoring
– End-to-end network distance/congestion/failure monitoring
» Measurement scalability
» Inference accuracy and stability
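The "billions of replicas" claim follows directly from those orders of magnitude:

```python
# Back-of-envelope check of the replica-management scale: per-URL
# replication across all edge servers yields O(10^9) replica records.
urls_per_server = 10**4
edge_servers = 10**5
print(urls_per_server * edge_servers)   # 1,000,000,000 -> billions
```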
SCAN Architecture
• Leverages Decentralized Object Location and Routing (DOLR) - Tapestry for
– Distributed, scalable location with guaranteed success
– Search with locality
• Soft-state maintenance of the dissemination tree (for each object)
[Diagram: a data plane over a network plane. A data source feeds a Web server; SCAN servers in a Tapestry mesh hold replicas (always updated) and caches (adaptive coherence) serving clients; request location and dynamic replication/update and content management operate across the mesh.]
[Diagram: clients in clusters A, B and C; SCAN edge servers act as monitors; distances are measured from each host to its monitor and among monitors.]
Wide-area Network Measurement and Monitoring System (WNMMS)
• Select a subset of SCAN servers to be monitors
• E2E estimation on the network plane of
– Distance
– Congestion
– Failures
Dynamic Provisioning
• Dynamic replica placement
– Meeting clients' latency and servers' capacity constraints
– Close-to-minimal # of replicas
• Replicas self-organize into an app-level multicast tree
– Small delay and bandwidth consumption for update multicast
– Each node only maintains state for its parent & direct children
• Evaluated based on simulation of
– Synthetic traces with various sensitivity analyses
– Real traces from NASA and MSNBC
• Publications
– IPTPS 2002
– Pervasive Computing 2002
Effects of the Non-Uniform Size of URLs
• Replication cost constraint: bytes
• Similar trends exist
– Per-URL replication outperforms per-Website replication dramatically
– Spatial clustering with Euclidean distance and popularity-based clustering are very cost-effective
Real Internet Measurement Data
• NLANR Active Measurement Project data set
– 119 sites in the US (106 after filtering out mostly-offline sites)
– Round-trip time (RTT) between every pair of hosts every minute
– Raw data: 6/24/00 – 12/3/01
• Keynote measurement data
– Measures TCP performance from about 100 agents
– Heterogeneous core network: various ISPs
– Heterogeneous access network:
» Dial-up 56K, DSL and high-bandwidth business connections
– Targets
» Web site perspective: 40 most popular Web servers
» 27 Internet Data Centers (IDCs)
Related Work
• Internet content delivery systems
– Web caching
» Client-initiated
» Server-initiated
– Pull-based Content Delivery Networks (CDNs)
– Push-based CDNs
• Update dissemination
– IP multicast
– Application-level multicast
• Network E2E distance monitoring systems
Web Proxy Caching
[Diagram: client and local DNS server at ISP 1, a proxy cache server, and the Web content server at ISP 2.
1. GET request from client to proxy cache server
2. GET request to Web content server if cache miss
3. Response to proxy
4. Response to client]
Pull-based CDN
[Diagram: client and local DNS server at ISP 1, a local CDN server, the CDN name server, and the Web content server at ISP 2.
1. GET request from client
2. Request for hostname resolution
3. Reply: local CDN server IP address
4. Local CDN server IP address returned to client
5. GET request to local CDN server
6. GET request to Web content server if cache miss
7. Response to CDN server
8. Response to client]
Push-based CDN
[Diagram: client and local DNS server at ISP 1, a local CDN server, the CDN name server, and the Web content server at ISP 2.
0. Replicas pushed to CDN servers in advance
1. GET request from client
2. Request for hostname resolution
3. Reply: nearby replica server or Web server IP address
4. Redirected server IP address returned to client
5. GET request (to the Web server if no replica yet)
6. Response]
Internet Content Delivery Systems
Comparison of properties across Web caching (client initiated), Web caching (server initiated), pull-based CDNs (Akamai), push-based CDNs, and SCAN:
• Scalability for request redirection: pre-configured in browser / use Bloom filters to exchange replica locations / centralized CDN name server / centralized CDN name server / decentralized P2P location
• Efficiency (# of caches or replicas): no cache sharing among proxies / cache sharing / no replica sharing among edge servers / replica sharing / replica sharing
• Network-awareness: no / no / yes, unscalable monitoring system / no / yes, scalable monitoring system
• Coherence support: no / no / yes / no / yes
Previous Work: Update Dissemination
• No inter-domain IP multicast
• Application-level multicast (ALM) is unscalable
– Root maintains state for all children (Narada, Overcast, ALMI, RMX)
– Root handles all "join" requests (Bayeux)
– Root splitting is a common solution, but suffers consistency overhead
Design Principles
• Scalability
– No centralized point of control: P2P location services, Tapestry
– Reduce management state: minimize # of replicas, object clustering
– Distributed load balancing: capacity constraints
• Adaptation to client dynamics
– Dynamic distribution/deletion of replicas with regard to clients' QoS constraints
– Incremental clustering
• Network-awareness and fault-tolerance (WNMMS)
– Distance estimation: Internet Iso-bar
– Anomaly detection and diagnostics
Comparison of Content Delivery Systems (cont’d)
Across the same five systems, Web caching (client initiated), Web caching (server initiated), pull-based CDNs (Akamai), push-based CDNs, and SCAN:
• Distributed load balancing: no / yes / yes / no / yes
• Dynamic replica placement: yes / yes / yes / no / yes
• Network-awareness: no / no / yes, unscalable monitoring system / no / yes, scalable monitoring system
• No global network topology assumption: yes / yes / yes / no / yes
Network-awareness (cont’d)
• Loss/congestion prediction
– Maximize true positives and minimize false positives
• Orthogonal loss/congestion path discovery
– Without the underlying topology
– How stable is such orthogonality?
» Degradation of orthogonality over time
• Reactive and proactive adaptation for SCAN
orthogonality(AC, BC) = 1 − (E(AC·BC) − E(AC)E(BC)) / sqrt((E(AC²) − E(AC)²)(E(BC²) − E(BC)²))
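The second term is the Pearson correlation coefficient of the two paths' samples, so orthogonality is 1 minus their correlation; a direct transcription (the sample sequences below are hypothetical):

```python
# Direct transcription of the orthogonality formula: 1 minus the
# Pearson correlation of the loss samples observed on paths AC and BC.
def orthogonality(ac, bc):
    n = len(ac)
    mean = lambda xs: sum(xs) / n
    cov = mean([a * b for a, b in zip(ac, bc)]) - mean(ac) * mean(bc)
    var_a = mean([a * a for a in ac]) - mean(ac) ** 2
    var_b = mean([b * b for b in bc]) - mean(bc) ** 2
    return 1.0 - cov / (var_a * var_b) ** 0.5

# Perfectly correlated paths share congestion, so orthogonality ~ 0;
# anti-correlated samples give ~ 2, independent ones ~ 1.
print(orthogonality([0.1, 0.2, 0.3], [0.2, 0.4, 0.6]))
```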