1
A Hybrid Architecture for Cost-Effective On-Demand Media Streaming
Mohamed Hefeeda & Bharat Bhargava
CS Dept, Purdue University
Support: NSF, CERIAS
FTDCS 2003
2
Motivations
Imagine a media server streaming movies to clients
Target environments
- Distance learning, corporate streaming, …
Current approaches
- Unicast
  • Centralized
  • Proxy caches
  • CDN (third-party)
- Multicast
  • Network layer
  • Application layer
Proposed approach (Peer-to-Peer)
3
Unicast Streaming: Centralized
Pros
- Easy to deploy, administer
Cons
- Limited scalability
- Reliability concerns
- Load on the backbone
- High deployment cost $$$…..$
Note
- A server with a T3 link (~45 Mb/s) supports only up to 45 concurrent users at 1 Mb/s CBR!
4
Unicast Streaming: Proxy
Pros
- Better performance
- Less load on backbone and server
- Prefix caching, selective caching, staging, …
Cons
- Proxy placement and management
- High deployment cost $$$…..$
5
Unicast Streaming: CDN
Pros
- Good performance
- Suitable for web pages with moderate-size objects
Cons
- Co$t: CDN charges for every megabyte served!
- Not suitable for VoD service; movies are quite large (~GBytes)
Note [Raczkowski '02]
- Cost ranges from 0.25 to 2 cents/MByte
- For a one-hour movie streamed to 1,000 clients, the content provider pays $264 to the CDN (at 0.5 cents/MByte)!
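The $264 figure on this slide can be checked with a quick back-of-the-envelope calculation. The per-client volume (52.8 MB per one-hour stream) is inferred from the slide's own numbers, not stated there; this is an illustrative sketch, not code from the paper:

```python
# Back-of-the-envelope check of the CDN cost figure on this slide.
clients = 1_000
price_per_mb = 0.005   # 0.5 cents per MByte, in dollars
mb_per_client = 52.8   # inferred: $264 / (1,000 clients x $0.005/MB)

total_cost = clients * mb_per_client * price_per_mb
print(f"${total_cost:.2f}")  # -> $264.00
```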
6
Multicast Streaming: Network Layer
Pros
- Efficient!
- Supports asynchronous clients
  • Patching, skyscraper, …
  • [e.g., Mahanti et al., ToN '03]
Cons
- Scalability of the routers
- Asynchronous clients must tune to multiple channels
- Requires high inbound bandwidth (2R)
- Not widely deployed
[Figure: patch stream]
7
Multicast Streaming: Application Layer
Pros
- Deployable
Cons
- Assumes end systems can support outbound bandwidth several times the streaming rate
8
P2P Approach: Key Ideas
Make use of underutilized peers’ resources
Make use of heterogeneity
Multiple peers serve a requesting peer
Network-aware peers organization
9
P2P Approach (cont’d)
Pros
- Cost-effective
- Deployable (no new hardware or network support needed)
- No extra inbound bandwidth (R)
- Small load on supplying peers (< R)
- Less stress on the backbone
Challenges
- Achieve good quality
  • Unreliable, limited-capacity peers
  • Highly dynamic environment
10
P2P Approach: Entities
Peer
- Level of cooperation (rate, bandwidth, connections)
Super peer
- Helps in searching and dispersion
Seed peer
- Initiates the streaming
Stream
- Time-ordered sequence of packets
Media file
- Divided into equal-size segments
[Figure: super peers play special roles in the hybrid system]
11
Hybrid System: Issues
Peers Organization
- Two-level peers clustering
- Join, leave, failure, overhead, super-peer selection
Operation
- Client protocol: manage multiple suppliers
  • Switch suppliers
  • Effect of switching on quality
Cluster-Based Searching
- Find nearby suppliers
Cluster-Based Dispersion
- Disseminate new media files
Details are in the extended version of the paper
12
Peers Organization: Two-Level Clustering
Based on BGP tables
- [Oregon RouteViews]
Network-cluster [KW 00]
- Peers sharing the same network prefix
  • 128.10.0.0/16 (purdue), 128.2.0.0/16 (cmu)
  • 128.10.3.60, 128.10.7.22, 128.2.10.1, 128.2.11.43
AS-cluster
- All network clusters within an Autonomous System
[Figure: snapshot of a BGP routing table]
13
Peers Organization: Join
[Figure: join operations against the bootstrap data structure]
14
Client Protocol
Building blocks of the protocol run by a requesting peer
Three phases
- Availability check
  • Search for all segments with the full rate
- Streaming
  • Stream (and play back) segment by segment
- Caching
  • Store enough copies of the media file in each cluster to serve all expected requests from that cluster
  • Which segments to cache?
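Phase I (availability check) can be sketched as follows. This is an illustrative Python sketch, not code from the paper; the `candidates` structure is a hypothetical representation of what the search returns, mapping each segment to the rates its candidate suppliers offer:

```python
# Phase I sketch: a file is streamable only if every segment can be
# served at the full streaming rate R by some set of candidate peers.
R = 1.0  # full streaming rate (e.g., 1 Mb/s), an assumed unit

def available(candidates: dict[int, list[float]], num_segments: int) -> bool:
    """candidates maps segment index -> rates offered by candidate peers."""
    return all(sum(candidates.get(j, [])) >= R for j in range(num_segments))

# Segment 1 can only muster 0.75 of the rate, so the check fails.
peers = {0: [0.5, 0.5], 1: [0.25, 0.5], 2: [1.0]}
print(available(peers, 3))  # -> False
```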
15
Client Protocol (cont'd)
Phase II: Streaming
For j = 1 to N do
  tj = t(j-1) + δ    /* δ: time to stream one segment */
  At time tj, get segment sj as follows:
  • Connect to every peer Px in Pj (in parallel), and
  • Download a distinct piece of sj, of size bx bytes, from each Px
Note: bx = |sj| · Rx / R, i.e., each peer's piece is proportional to its rate
Example: P1, P2, and P3 serve different pieces of the same segment to P4 at different rates
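The proportional split in the example above can be sketched in Python. The share rule bx = |sj| · Rx / R is from the slide; the contiguous byte-range layout and the remainder-to-last-peer rounding policy are assumptions of this illustrative sketch, not details from the paper:

```python
# Split one segment among supplying peers: peer x gets a contiguous
# byte range whose size is proportional to its rate Rx relative to
# the aggregate rate R of the selected peers.
def byte_ranges(segment_size: int, rates: list[float]) -> list[tuple[int, int]]:
    R = sum(rates)  # peers are chosen so their rates add up to the full rate
    ranges, b = [], 0
    for i, rx in enumerate(rates):
        # last peer takes the remainder so rounding never loses bytes
        nb = segment_size if i == len(rates) - 1 else b + int(segment_size * rx / R)
        ranges.append((b, nb))
        b = nb
    return ranges

# P1, P2, P3 serve one 1000-byte segment at rates 0.5R, 0.25R, 0.25R
print(byte_ranges(1000, [0.5, 0.25, 0.25]))
# -> [(0, 500), (500, 750), (750, 1000)]
```

Since each peer sends at rate Rx and contributes a |sj| · Rx / R share, all pieces finish in the same δ seconds, which is what keeps playback on schedule.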
16
Dispersion (cont'd)
Dispersion Algorithm (basic idea):
/* Upon getting a request from Py to cache Ny segments */
- C ← getCluster(Py)
- Compute available (A) and required (D) capacities in cluster C
- If A < D
  • Py caches Ny segments in a cluster-wide round-robin fashion (CWRR)
- All values are smoothed averages
- Average available capacity in C:
  [Equation: A sums, over peers Px in C, each peer's contribution weighted by its rate Rx relative to R and its cached segments Nx relative to N]
- CWRR example (10-segment file):
  • P1 caches 4 segments: 1, 2, 3, 4
  • P2 then caches 7 segments: 5, 6, 7, 8, 9, 10, 1
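The CWRR example above can be reproduced with a short sketch. This is illustrative Python, not code from the paper; the cluster-wide pointer is an assumed mechanism for "continue where the previous cacher stopped":

```python
# Cluster-wide round-robin (CWRR) caching: each newly admitted cacher
# takes the next run of segments where the previous one stopped,
# wrapping around the N-segment file. Segments are numbered from 1.
def cwrr_assign(num_segments: int, requests: list[int]) -> list[list[int]]:
    assignments, pointer = [], 0
    for ny in requests:                      # ny: segments this peer caches
        segs = [(pointer + k) % num_segments + 1 for k in range(ny)]
        assignments.append(segs)
        pointer = (pointer + ny) % num_segments
    return assignments

# 10-segment file: P1 caches 4 segments, then P2 caches 7
print(cwrr_assign(10, [4, 7]))
# -> [[1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 1]]
```

The wrap-around keeps copies of every segment spread evenly across the cluster instead of piling up at the start of the file.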
17
Evaluation Through Simulation
Client parameters
- Effect of switching on quality
System parameters
- Overall system capacity
- Average waiting time
- Load/role of the seeding peer
- Scenarios: different client arrival patterns (constant rate, Poisson, flash crowd) and different levels of peer cooperation
Performance of the dispersion algorithm
- Compared against a random dispersion algorithm
18
Simulation: Topology
- Large, hierarchical topology (more than 13,000 nodes)
  • Ex.: 20 transit domains, 200 stub domains, 2,100 routers, and a total of 11,052 end hosts
- Used GT-ITM and ns-2
19
Client: Effect of Switching
Initial buffering is needed to hide supplier switching
Tradeoff: a small segment size means less buffering but more overhead (encoding, decoding, processing)
20
System: Capacity and Load on Seed Peer
Load on seeding peer
- The role of the seeding peer diminishes over time
  • With 50% caching: after 5 hours, there are 100 concurrent clients (6.7 times the original capacity) and none of them is served by the seeding peer
System capacity
21
System: Under Flash Crowd Arrivals
Flash crowd: a sudden increase in client arrivals
22
System: Under Flash Crowd (cont'd)
- The role of the seeding peer is still just seeding
  • During the peak, there are 400 concurrent clients (26.7 times the original capacity) and none of them is served by the seeding peer (50% caching)
[Figures: load on the seeding peer; system capacity]
23
Dispersion Algorithm
- Avg. number of hops:
  • 8.05 hops (random) vs. 6.82 hops (ours): 15.3% savings
- For a domain with a 6-hop diameter:
  • Random: 23% of the traffic was kept inside the domain
  • Cluster-based: 44% of the traffic was kept inside the domain
(5% caching)
24
Conclusions
A hybrid model for on-demand media streaming
- P2P streaming; powerful peers do more work
- Peers with special roles (super peers)
- Cost-effective, deployable
- Supports a large number of clients, including flash crowds
Two-level peers clustering
- Network-conscious peers clustering
- Helps in keeping the traffic local
Cluster-based dispersion
- Pushes contents closer to clients (within the same domain)
- Reduces the number of hops traversed by the stream and the load on the backbone
25
Thank You!
26
P2P Systems: Basic Definitions
Peers cooperate to achieve desired functions
- Cooperate: share resources (CPU, storage, bandwidth), participate in the protocols (routing, replication, …)
- Functions: file-sharing, distributed computing, communications, …
Examples
- Gnutella, Napster, Freenet, OceanStore, CFS, CoopNet, SpreadIt, SETI@home, …
Well, aren't they just distributed systems?
- P2P == distributed systems?
Background
27
P2P vs. Distributed Systems
P2P = distributed systems++
- Ad-hoc nature
- Peers are not servers [Saroiu et al., MMCN '02]
  • Limited capacity and reliability
- Much more dynamism
- Scalability is a more serious issue (millions of nodes)
- Peers are self-interested (selfish!) entities
  • 70% of Gnutella users share nothing [Adar and Huberman '00]
- All kinds of security concerns
  • Privacy, anonymity, malicious peers, … you name it!
Background
28
P2P Systems: Rough Classification [Lv et al., ICS’02], [Yang et al., ICDCS’02]
Structured (or tightly controlled, DHT)
+ Files are rigidly assigned to specific nodes
+ Efficient search and guarantee of finding
– Lack of partial-name and keyword queries
• Ex.: Chord [Stoica et al., SIGCOMM'01], CAN [Ratnasamy et al., SIGCOMM'01], Pastry [Rowstron and Druschel, Middleware'01]
Unstructured (or loosely controlled)
+ Files can be anywhere
+ Support for partial-name and keyword queries
– Inefficient search (some heuristics exist) and no guarantee of finding
• Ex.: Gnutella
Hybrid (P2P + centralized; super-peers notion)
- Napster, KaZaA
Background
29
File-sharing vs. Streaming
File-sharing
- Download the entire file first, then use it
- Small files (a few MBytes) → short download time
- A file is stored by one peer → one connection
- No timing constraints
Streaming
- Consume (play back) as you download
- Large files (a few GBytes) → long download time
- A file is stored by multiple peers → several connections
- Timing is crucial
Background
30
Related Work: Streaming Approaches
Distributed caches [e.g., Chen and Tobagi, ToN'01]
- Deploy caches all over the place
- Yes, this increases scalability
  • But it shifts the bottleneck from the server to the caches!
- It also multiplies cost
- What to cache? And where to put the caches?
Multicast
- Mainly for live media broadcast
- Application level [Narada, NICE, Scattercast, Zigzag, …]
- IP level [e.g., Dutta and Schulzrinne, ICC'01]
  • Widely deployed?
Background
31
Related Work: Streaming Approaches
P2P approaches
- SpreadIt [Deshpande et al., Stanford TR'01]
  • Live media: builds an application-level multicast distribution tree over peers
- CoopNet [Padmanabhan et al., NOSSDAV'02 and IPTPS'02]
  • Live media: builds an application-level multicast distribution tree over peers
  • On-demand: the server redirects clients to other peers
  • Assumes a peer can (or is willing to) support the full rate
  • CoopNet does not address the issue of quickly disseminating the media file
Background