
Challenges, Design and Analysis of a Large-scale P2P-VoD System

Dr. Yingwu Zhu

P2P Overview

• Advantages
  – Reduce server load by having peers contribute their resources
  – Scalability, availability, robustness, …

• P2P services
  – P2P file downloading: BitTorrent and eMule
  – P2P live streaming: CoolStreaming and PPLive
  – P2P video-on-demand (P2P-VoD): PPLive

• Unlike P2P live streaming systems, P2P-VoD lacks synchrony: peers can watch different parts of a video at the same time.

• Each peer contributes a small amount of disk storage (typically 1 GB), rather than only the in-memory playback buffer used in P2P live streaming systems.

Challenges in P2P-VoD

• Lack of synchrony in user behavior

• Content replication to provide movie data for other peers
  – Data availability by caching at peers
  – ATD: availability-to-demand ratio
  – Cache replacement

• Transmission strategy to deliver data in real time
  – Satisfaction level
  – Playback rate

PPLive-VoD

• Architecture
• Design considerations
• Measurement study

Architecture

• Major components
  – Peers
  – Content servers: the source of content
  – Trackers: help peers connect to other peers sharing the same content
  – A bootstrap server: helps peers find a suitable tracker and perform other bootstrapping functions
  – Other servers
    • Log servers: log significant events for data measurement
    • Transit servers: help peers behind NAT boxes

Design Decisions

• Segment sizes
• Content replication strategy
• Content discovery
• Piece selection
• Transmission strategy
• Other issues

Segment sizes

• How to divide a video into multiple pieces
  – A small segment size gives more flexibility in scheduling which piece should be uploaded from which neighboring peer.
  – The larger the segment size, the smaller the overhead:
    • Header overhead (checksum)
    • Bitmap overhead (advertisement)
    • Protocol overhead (TCP/IP headers)
  – The video player expects a certain minimum size for a piece of content to be viewable (playback rate, frame rate).

• Segmentation of a movie in PPLive’s VoD system
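The overhead side of this trade-off is easy to see numerically: every piece carries a fixed amount of header, bitmap, and protocol overhead, so smaller pieces waste a larger fraction of bandwidth. The following sketch uses assumed sizes (a 100 MB movie, 60 bytes of fixed overhead per piece), not PPLive's actual parameters.

```python
MOVIE_SIZE = 100 * 2**20          # assumed: 100 MB movie
FIXED_OVERHEAD_PER_PIECE = 60     # assumed: checksum + TCP/IP headers, bytes

def overhead_fraction(piece_size: int) -> float:
    """Fraction of transmitted bytes that is overhead rather than content."""
    pieces = MOVIE_SIZE // piece_size
    total_overhead = pieces * FIXED_OVERHEAD_PER_PIECE
    return total_overhead / (MOVIE_SIZE + total_overhead)

# Overhead shrinks as pieces grow: 1 KB, 16 KB, 2 MB pieces.
for size in (1 * 2**10, 16 * 2**10, 2 * 2**20):
    print(f"{size:>8} B pieces -> {overhead_fraction(size):.4%} overhead")
```

With these assumed numbers, 1 KB pieces put over 5% of the traffic into overhead, while 2 MB pieces make it negligible, which is why a system picks the largest size the player and scheduler can tolerate.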

Replication Strategy

• Goal
  – Make chunks as available to the user population as possible, meeting users’ viewing demand without incurring excessive additional overhead

• Considerations
  – Whether to allow multiple movies to be cached
    • Multiple-movie cache (MVC) vs. single-movie cache (SVC)
  – Whether to pre-fetch or not (No, due to short viewing durations)
  – Cache replacement: which chunk/movie to remove when the disk cache is full (movie-based replacement)
    • Least recently used (LRU) vs. least frequently used (LFU)
    • Weight-based replacement (how complete the local copy of the movie is, and ATD = c/n from the tracker)
    • Reduces server load from 19% to 11-7%
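A minimal sketch of the weight-based replacement idea, under stated assumptions: the weight formula below is hypothetical (the slides only say the weight combines local completeness with the tracker-reported ATD = c/n), and a movie that is over-replicated in the system but barely cached locally is treated as the safest eviction victim.

```python
def eviction_weight(local_fraction: float, atd: float) -> float:
    """Hypothetical weight (not PPLive's actual formula): movies that are
    well replicated system-wide (high ATD) and only slightly cached here
    (low local_fraction) score highest, i.e. are evicted first."""
    return atd / max(local_fraction, 0.01)

def pick_victim(cache: dict) -> str:
    """cache maps movie_id -> (fraction cached locally, ATD from tracker)."""
    return max(cache, key=lambda m: eviction_weight(*cache[m]))

cache = {
    "movieA": (0.9, 1.2),   # mostly cached locally, adequately available
    "movieB": (0.1, 3.0),   # barely cached, over-replicated -> evict
    "movieC": (0.5, 0.8),   # under-replicated system-wide, worth keeping
}
print(pick_victim(cache))   # movieB
```

An LRU policy would ignore the ATD signal entirely; using it lets eviction favor movies the swarm can already serve, which is the intuition behind the server-load reduction quoted above.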

Content Discovery

• Content advertising and look-up methods
  – Trackers
    • Keep track of which peers replicate a given movie
    • As soon as a user starts watching a movie, the peer informs its tracker that it is replicating that movie; it also informs the tracker when content is removed
    • When a peer wants to start watching a movie, it asks the tracker which other peers have that movie
  – Gossip method
    • Peers discover where chunks are by gossiping chunk bitmaps
    • This cuts down on the reliance on the tracker and makes the system more robust
  – DHT
    • Used to automatically assign movies to trackers to achieve some level of load balancing
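The gossip step can be sketched as follows: each peer learns its neighbors' chunk bitmaps (one 0/1 flag per chunk) and can then locate a wanted chunk without consulting the tracker. The data layout and names here are illustrative assumptions, not PPLive's wire format.

```python
def peers_with_chunk(known_bitmaps: dict, chunk: int) -> list:
    """known_bitmaps maps peer_id -> list of 0/1 flags, one per chunk.
    Returns the peers that advertise holding the given chunk."""
    return [peer for peer, bits in known_bitmaps.items() if bits[chunk]]

# Bitmaps learned via gossip from three neighbors (assumed example data).
bitmaps = {
    "peerA": [1, 1, 0, 0],
    "peerB": [0, 1, 1, 0],
    "peerC": [0, 0, 1, 1],
}
print(peers_with_chunk(bitmaps, 2))   # neighbors able to serve chunk 2
```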

Piece Selection

• Which piece to download first
  – Sequential: select the piece closest to what is needed for video playback
    • First priority
  – Rarest first: selecting the rarest piece helps speed up the spread of pieces, and hence indirectly helps streaming quality
    • Second priority
  – Anchor-based: when a user jumps to a particular location in the movie and the piece for that location is missing, the closest anchor point is used instead
    • Currently not used
    • Users do not jump much (1.8 times per movie)
    • With an optimized transmission algorithm, the buffering time after a jump is satisfactory
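The two-level priority above (sequential first, rarest-first otherwise) can be sketched like this. The urgent-window size and the input format are assumptions for illustration, not PPLive's actual parameters.

```python
def next_piece(have: set, playhead: int, rarity: dict, window: int = 4):
    """Pick the next piece to request. rarity maps piece index -> number of
    copies in the swarm. Pieces inside the urgent window ahead of the
    playback point are fetched sequentially; beyond it, rarest-first."""
    missing = [p for p in sorted(rarity) if p not in have and p >= playhead]
    if not missing:
        return None
    # First priority: sequential -- the piece needed soonest for playback.
    if missing[0] - playhead < window:
        return missing[0]
    # Second priority: rarest-first among the remaining missing pieces.
    return min(missing, key=lambda p: rarity[p])

rarity = {0: 9, 1: 9, 2: 3, 3: 1, 4: 7}   # assumed swarm copy counts
print(next_piece(have={0, 1}, playhead=0, rarity=rarity))   # urgent piece 2
```

When the playback buffer is already full for the near future, the same call falls through to rarest-first, which spreads scarce pieces through the swarm.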

Transmission Strategy

• Goals
  – Maximize the downloading rate
  – Minimize the overhead due to duplicate transmissions and requests

• Strategies (by level of aggressiveness)
  – A peer can send requests for the same content to multiple neighbors simultaneously
  – A peer can request different content from multiple neighbors simultaneously (PPLive’s choice)
    • For a playback rate of 500 Kbps, 8-20 neighbors is best. More neighbors can still improve the achieved rate, but at the expense of a heavy duplication rate.
  – A peer can work with one neighbor at a time
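The middle strategy, requesting different content from multiple neighbors at once, can be sketched as a simple round-robin assignment: each wanted piece goes to exactly one neighbor, so downloads proceed in parallel with no duplicate transmissions. This scheduling is an assumed simplification, not PPLive's actual scheduler.

```python
def assign_requests(wanted: list, neighbors: list) -> dict:
    """Distribute distinct pieces round-robin over neighbors; each piece is
    requested from exactly one neighbor, avoiding duplication by design."""
    schedule = {n: [] for n in neighbors}
    for i, piece in enumerate(wanted):
        schedule[neighbors[i % len(neighbors)]].append(piece)
    return schedule

print(assign_requests(wanted=[10, 11, 12, 13, 14], neighbors=["A", "B"]))
# A serves pieces 10, 12, 14 while B serves 11, 13 in parallel
```

The most aggressive strategy would instead send the same request to several neighbors, trading duplicate traffic for lower latency; the least aggressive one uses a single neighbor and gives up parallelism.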

Other Design Issues

• NAT and firewalls
  – Discovering different types of NAT boxes (similar to the STUN protocol)
  – Pacing the upload and request rates (to avoid being labeled as an attack and blocked)

• Content authentication
  – Chunk-level authentication
    • Some pieces may be polluted and cause a poor viewing experience locally at a peer
    • If a peer detects that a chunk is bad, it discards it
  – Piece-level authentication

What to measure

• User behavior
  – Includes the user arrival patterns and how long users stay watching a movie
  – Used to improve the design of the replication strategy

• External performance metrics
  – Include user satisfaction and server load
  – Used to measure the system performance as perceived externally

• Health of replication
  – Measures how well a P2P-VoD system is replicating content
  – Used to infer how well this important component of the system is doing

User Behavior

• MVR (movie viewing record)

User Satisfaction

• Simple fluency
  – Measures the fraction of time a user spends watching a movie out of the total time spent waiting for and watching that movie

  R(m, i) : the set of all MVRs for a given movie m and user i
  n(m, i) : the number of MVRs in R(m, i)
  r : one of the MVRs in R(m, i)
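The fluency F(m, i) described above can be sketched as a small computation over a user's MVRs. The per-record field layout (watch seconds, buffering seconds) is an assumption for illustration; the slides do not give the exact record format.

```python
def fluency(mvrs: list) -> float:
    """mvrs: one (watch_seconds, buffer_seconds) pair per MVR in R(m, i).
    Returns F(m, i): time spent watching divided by the total time spent
    waiting for and watching the movie, in [0, 1]."""
    watch = sum(w for w, _ in mvrs)
    wait = sum(b for _, b in mvrs)
    return watch / (watch + wait) if watch + wait else 0.0

# One user's records for one movie: 300 s watched after 20 s of buffering,
# then 600 s watched after 30 s of buffering (assumed example values).
print(fluency([(300, 20), (600, 30)]))   # 900 / 950
```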

User Satisfaction (cont’)

• User satisfaction index
  – Also considers the quality of the delivery of the content

  r(Q) : a grade for the average viewing quality of an MVR r, inferred/estimated by the client

Health of Replication

• Three levels
  – Movie level
    • The number of active peers who have advertised storing chunks of that movie
    • The information that the tracker collects about movies
  – Weighted movie level
    • Also considers the fraction of chunks a peer has in computing the index
  – Chunk bitmap level (chunk vector)
    • The number of copies of each chunk of a movie stored by peers
    • Various other statistics can be computed: the average number of copies of a chunk in a movie, the minimum number of copies, the variance of the number of copies
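The chunk-level statistics above follow directly from the advertised bitmaps: summing each bitmap column gives the number of copies per chunk, from which the average and minimum follow. The input format is an assumed simplification.

```python
def chunk_copies(bitmaps: list) -> list:
    """bitmaps: one 0/1 list per peer, one flag per chunk of the movie.
    Returns the number of replicated copies of each chunk."""
    return [sum(column) for column in zip(*bitmaps)]

# Three peers' advertised bitmaps for a four-chunk movie (assumed data).
bitmaps = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]
copies = chunk_copies(bitmaps)
print(copies, sum(copies) / len(copies), min(copies))
```

The minimum is the interesting health signal: a single chunk with few copies can stall playback for the whole movie even when the average looks healthy.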

Statistics on video objects

• Overall statistics of the three typical movies

Statistics on user behavior (1)

• Interarrival time distribution of viewers

Statistics on user behavior (2)

• View duration distribution
  – Many MVRs are shorter than 10 minutes; is prefetching worthwhile?

Statistics on user behavior (3)

• How long does a user stay in the system? Important for data replication
• At least 70% of users stay longer than 15 minutes and are thus able to provide upload service

Statistics on user behavior (4)

• Start position distribution

Observations from the last slide

• A large fraction of users start watching movies from the beginning

• Users jump to different positions roughly uniformly, so anchor points for jump operations could be uniformly spaced; this guides the chunk selection strategy

Health index of Movies (1)

• Number of peers that cache the movie

Health index of Movies (2)

• Average owning ratios for different chunks (averaged over 24 hours)

What do the observations imply?

• Many users watch movies from the beginning

• Many users do not finish the whole movie. Why? The movie epilog!

Health index of Movies (3)

• Chunk availability and chunk demand

Health index of Movies (4)

• The availability-to-demand ratios (good: >= 1)

User Satisfaction Index (1)

• Generating the fluency index
  – The computation of F(m, i) is carried out by the client software.
  – The client software reports all MVRs and the fluency F(m, i) to the log server whenever a “stop-watching” event occurs:
    • The STOP button is pressed
    • Another movie/program is selected
    • The user turns off the P2P-VoD software

User Satisfaction Index (2)

• The number of fluency records– A good indicator of the number of viewers of the movie

User Satisfaction Index (3)

• The distribution of the fluency index (<= 0.2 is bad; >= 0.8 is good)
  – Good overall, but buffering time needs to improve!

Future work

• Further research in P2P-VoD systems
  – How to design a highly scalable P2P-VoD system to support millions of simultaneous users
  – How to perform dynamic movie replication, replacement, and scheduling so as to reduce the workload at the content servers
  – How to quantify various replication strategies so as to guarantee a high health index
  – How to select proper chunk and piece transmission strategies so as to improve the viewing quality
  – How to accurately measure and quantify the user satisfaction level

Thinking

• What distinguishes PPLive-VoD from BitTorrent?
  – From design perspectives
  – Similarities
  – Differences