A Measurement Study of a Peer-to-Peer Video-on-Demand System

Bin Cheng (1), Xuezheng Liu (2), Zheng Zhang (2) and Hai Jin (1)

(1) Huazhong University of Science and Technology; (2) Microsoft Research Asia

IPTPS 2007, Feb. 28 2007

Motivation

VoD is every couch potato’s dream
• Select anything, start at any time, jump to anywhere

Centralized VoD is costly
• Servers, bandwidth, contents

P2P VoD is attractive, but challenging:
• Harder than streaming: no single stream; unpredictable, multiple “swarms”
• Harder than file downloading: globally optimal (e.g. “rarest first”) policy inapplicable
• VoD is a superset of file downloading and streaming

Main Contribution

Detailed measurement of a real, deployed P2P VoD system
• What do we measure? E.g., what does it mean for a system to deliver good UX?
• How far off are we from an ideal system?
• How do users behave?
• Etc.

Problems spotted
• There is great tension between scalability and UX
• Network heterogeneity is an issue: is P2P VoD a luxury that poor peers cannot afford?

Outline

Motivation
System background: GridCast
Measurement methodology
Evaluation
• Overall performance
• User behavior and user experience
Conclusions

GridCast Overview

Tracker server
• Indexes all joined peers

Source server
• Stores a copy of every video file

Web portal
• Provides the channel list

Peer
• Feeds data to the player
• Caches all fetched data of the current file
• Exchanges data with others

[Figure: GridCast architecture with the web portal (channel list), the tracker (initial neighbor list), and the source server]

One Overlay per Channel

Finding the partners
• Get the initial content-closer set from the tracker when joining
• Periodically gossip with some near- and far-neighbors (30s)
• Look up new near-neighbors from the current neighbors when seeking
• Refresh the tracker every 5 minutes

Scheduling (every 10s)
• Fetch the next 200 seconds from partners (if they have them)
• Feed to the player
• Fetch the next 10 seconds from the source server if no partners have them
• If the bandwidth budget allows, fetch the rarest anchor from the source server or partners

[Figure: timeline from the current position, marking the next 10 seconds and the next 200 seconds]
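One scheduling round can be sketched roughly as follows. This is an illustration only, not GridCast's code: the 10-second segment granularity, the `Partner` type, the `plan_round` name, and the "first partner that holds the segment" choice are all assumptions, the position is assumed to be a multiple of the segment length, and the anchor-prefetching step is left out.

```python
from dataclasses import dataclass, field

SEGMENT_SECONDS = 10   # assumed granularity: one request unit = 10 seconds
PARTNER_WINDOW = 200   # seconds requested from partners (from the slide)
SOURCE_WINDOW = 10     # seconds requested from the source server (from the slide)


@dataclass
class Partner:
    """Hypothetical partner record: the segment start times (seconds) it holds."""
    name: str
    segments: set = field(default_factory=set)


def plan_round(position, cached, partners):
    """Plan one scheduling round starting at `position` (seconds).

    Returns (partner_requests, source_requests):
    partner_requests maps segment start time -> partner name,
    source_requests lists segment start times to fetch from the source server.
    """
    partner_requests = {}
    source_requests = []
    for start in range(position, position + PARTNER_WINDOW, SEGMENT_SECONDS):
        if start in cached:
            continue  # already buffered locally
        holder = next((p for p in partners if start in p.segments), None)
        if holder is not None:
            partner_requests[start] = holder.name
        elif start < position + SOURCE_WINDOW:
            # Only the most urgent data falls back to the source server.
            source_requests.append(start)
    return partner_requests, source_requests


partners = [Partner("p1", {10, 20}), Partner("p2", {20, 30})]
print(plan_round(0, set(), partners))
# ({10: 'p1', 20: 'p1', 30: 'p2'}, [0])
```

Keeping the source-server window much shorter than the partner window is what limits server load: data further ahead is simply retried from partners in a later round.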

Anchor Prefetching

Anchors are used to improve seek latency
• Each anchor is a segment of 10 seconds
• Anchors are 5 minutes apart
• Playhead adjusted to the nearest anchor (if present)

[Figure: video timeline with 10s anchors placed 5 minutes apart]
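The playhead adjustment can be sketched as below. The slide gives only the anchor length and spacing, so the placement of anchors at 0, 300, 600, ... seconds, the tie-breaking toward the later anchor, and the function names are assumptions made for illustration.

```python
ANCHOR_INTERVAL = 300  # seconds between anchors (5 minutes, from the slide)
ANCHOR_LENGTH = 10     # seconds per anchor (from the slide)


def nearest_anchor(target, duration):
    """Start time (seconds) of the anchor closest to an integer seek target.

    Assumes anchors start at 0, 300, 600, ... and must fit inside the video.
    """
    last = (max(duration - ANCHOR_LENGTH, 0) // ANCHOR_INTERVAL) * ANCHOR_INTERVAL
    # Integer rounding to the nearest multiple; exact ties go to the later anchor.
    candidate = ((target + ANCHOR_INTERVAL // 2) // ANCHOR_INTERVAL) * ANCHOR_INTERVAL
    return min(max(candidate, 0), last)


def adjust_seek(target, duration, prefetched):
    """Snap the playhead to the nearest anchor if it was prefetched, else keep the target."""
    anchor = nearest_anchor(target, duration)
    return anchor if anchor in prefetched else target


print(adjust_seek(1000, 3600, {900, 1200}))  # 900: the nearest anchor is cached
print(adjust_seek(1000, 3600, {1200}))       # 1000: nearest anchor missing, no snap
```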

DataSet Summary

Log duration: Sept. & Oct. 2006
Number of users who visited: about 20,000
Percent of CERNET users: 98%
Percent of non-CERNET users: Netcom 1%, Unicom 0.6%, Unicom 0.4%
Percent of NAT users: 22.8%
Maximum online users: more than 360
Number of sessions: about 250,000
Number of videos: about 1,200 channels
Average bit rate: 500~600 kbps
Movie length: mostly about one hour
Total bytes from the source server: 11,420 GB
Total bytes played by peers: 15,083 GB

System Setup

GridCast has been deployed since May 2006
• The tracker server and the Web server share one machine
• One source server with 100Mb, 2GB memory and 1TB disk

Popularity keeps climbing; in Dec. 2006:
• Users: 91K; sessions: 290K; total bytes from server: 22TB

Peer logs collected at the tracker (30s)
• Latency, jitter, buffer map and anchor usage
• Sep-log and Oct-log w/o and w/ log, respectively: just a matter of switching the code path as the peer joins

The source server keeps other statistics (e.g. total bytes served)

Strong Diurnal Pattern

Hot time vs. cold time
• Hot time (10:00 ~ 24:00)
• Cold time (0:00 ~ 10:00)

Two peaks
• After lunch time & before midnight
• Higher at weekends or holidays

[Figure: number of online peers over one week (Mon to Sun), Oct. 2006]

Scalability

Ideal model: only the lead peer fetches from the source server
Client-server (cs) model: all data comes from the source server

[Figure: normalized load of the source server in a typical day (6:00 to 6:00, with 13:00 and 22:00 marked); curves: cs, GridCast, ideal]

Significantly decreases the source server load (against cs), especially in hot time.

Follows quite closely the ideal curve.

The number of active channels increases 3x from cold to hot time: the long tail effect!
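A minimal sketch of how the two reference models load the source server, assuming all viewers of a channel form one session and counting load in concurrent streams. The function name and the example concurrency numbers are made up for illustration; they are not measurements from the trace.

```python
def source_load(concurrency_per_channel):
    """Concurrent streams served by the source server under each model.

    Returns (cs_load, ideal_load) for a list of per-channel viewer counts.
    """
    active = [n for n in concurrency_per_channel if n > 0]
    cs_load = sum(active)      # cs model: every viewer streams from the server
    ideal_load = len(active)   # ideal model: one lead peer per active channel
    return cs_load, ideal_load


# Hypothetical snapshots: hot time adds many channels with very few viewers.
print(source_load([40, 10]))               # (50, 2)
print(source_load([60, 20, 5, 3, 1, 1]))   # (90, 6)
```

In this toy example the ideal load triples from the first snapshot to the second even though the viewer count does not even double, because each extra active channel costs a full stream no matter how few peers watch it.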

Understand the Ceiling

[Figure: utilization (%) vs. popularity; curves: ideal, GridCast]

Utilization = data from peers / total fetched data
• Calculated from the snapshots

For the ideal model, utilization = (n-1)/n
• n is the number of users in a session, i.e. the concurrency

GridCast achieves the ideal when n is large
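The two definitions above translate directly into code. The function names and the byte-count inputs are assumptions for illustration; how the counts are extracted from the snapshots is not shown here.

```python
def utilization(bytes_from_peers, bytes_from_source):
    """Measured utilization: the share of fetched data that came from peers."""
    total = bytes_from_peers + bytes_from_source
    return bytes_from_peers / total if total else 0.0


def ideal_utilization(n):
    """Ideal utilization for a session of n users: only the lead peer hits the server."""
    if n < 1:
        raise ValueError("a session needs at least one user")
    return (n - 1) / n


for n in (1, 2, 5, 20):
    print(n, ideal_utilization(n))
# 1 0.0
# 2 0.5
# 5 0.8
# 20 0.95
```

The ideal curve rises steeply for small n and flattens quickly, so even moderate concurrency leaves only a small share of the traffic for the source server.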

Why?

Why do we fall short (when n is small)?

The peer cannot get the content if:
• It’s only available from the server (missing content); caused by random seeks
• It exists in disconnected peers; caused by NAT
• Its partners do not have enough bandwidth

[Figure: utilization (%) vs. popularity (2 to 8); curves: missing content, NAT, limited bandwidth, GridCast]

Missing content dominates for unpopular files

UX: latency

Startup latency: 70% < 5s, 90% < 10s
Seek latency: 70% < 3.5s, 90% < 8s
Seek latency is smaller:
• Startup incurs a 2-second delay to create TCP connections with initial partners
• Short seeks hit cached data

[Figure: CDF (%) of latency (sec.); curves: seek, startup]

UX: jitter

For sessions of up to 5 minutes, 72.3% have no jitter
For sessions longer than 40 minutes, 40.6% have no jitter
Avg. delayed data: 3~4%

[Figure: jitter by session duration; bar labels from the figure]

Duration (minutes)   No jitter (%)   Avg. delayed data (%)   Avg. delayed chunks
0-5                  72.3            7.4                     9
5-10                 54.7            6.2                     26
10-15                49.7            4.3                     32
15-20                47.8            5.0                     52
20-25                43.2            3.8                     50
25-30                41.7            3.5                     57
30-35                42.0            3.4                     66
35-40                44.9            3.0                     67
>40                  40.6            3.2                     124

Reasons for Bad UX

Network capacity
• CERNET to CERNET: >100KB/s
• Non-CERNET to Non-CERNET: 20~50KB/s
• CERNET to Non-CERNET: 4~5KB/s
• Bad UX in the Non-CERNET region might have prevented swarms from forming

[Figure: average startup and seek latency (sec.) by network type (Non-CERNET, CERNET, Campus); bar labels in extraction order: 52.6, 5.2, 4.6, 60.2, 3.6, 3.4]

Reasons for Bad UX (cont.)

Server stress and UX are inversely correlated
• Hot time -> lots of active channels -> long tail -> high server stress -> bad UX
• Most pronounced for movies at the tail (next slide)

[Figure: normalized value vs. time (hour, from 7 to 5 the next day); curves: server stress (bandwidth), unacceptable jitter, unacceptable seeking]

UX Correlation with Concurrency

Higher concurrency:
• Reduces both startup and seek latencies
• Reduces the amount of jitter

Getting close to that of cold time

[Figure: average latency (sec.) vs. initial partner number; curves: startup, seek]

[Figure: unacceptable percentage (%) vs. popularity; curves: unacceptable jitter and unacceptable seeking, each in hot time and in cold time]

User Seek Behavior

Seek behavior (Without anchor)

[Figure: CDF (%) of seek distance (minutes), from -60 (BACKWARD) to 60 (FORWARD)]

Short seeks dominate (80% within 500 seconds)

BACKWARD:FORWARD ≈ 3:7

Seek Behavior vs. Popularity

Fewer seeks in more popular channels
More popular channels usually have longer sessions
So: stop making bad movies

[Figure: average number of seeks and duration of sessions (seconds) vs. popularity (for file sessions)]

Benefit of Anchor Prefetching

Significant reduction of seek latency
• FORWARD seeks get more benefit (seeks < 1s jump from 33% to 63%)

“next-anchor first” is statistically optimal from any one peer’s point of view
• “rarest-first” is globally optimal in reducing the load of the source server (sees 30% prefetched but unused)

[Figure: CDF (%) of seek latency (seconds), with anchor vs. without anchor; a) forward seeks, b) backward seeks]

Conclusions

A few things are not new:
• Diurnal pattern; the looooooooong tail of content

A few things are new:
• Seeking behaviors (e.g. 7:3 split of forward/backward seeks; 80% of seeks are short, etc.)
• The correlation of UX to source server stress and concurrency

A few things are good to know:
• Even moderate concurrency improves system utilization and UX
• Simple prefetching helps to improve seeking performance

A few things remain problematic:
• The looooooong tail
• Network heterogeneity

A lot remains to be done (and is being done):
• Multi-file caching and proactive replication

http://grid.hust.edu.cn/gridcast
http://www.gridcast.cn

Thank you! Q&A
