a measurement study of peer-to-peer file sharing systems

32
A Measurement Study of Peer- to-Peer File Sharing Systems Presented by Cristina Abad

Upload: samantha-koch

Post on 30-Dec-2015

40 views

Category:

Documents


0 download

DESCRIPTION

A Measurement Study of Peer-to-Peer File Sharing Systems. Presented by Cristina Abad. Motivation. In a P2P file sharing system, peers are usually in the “edge” of the network Does this affect/limit the quality of the infrastructure? - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: A Measurement Study of Peer-to-Peer File Sharing Systems

A Measurement Study of Peer-to-Peer File Sharing Systems

Presented by

Cristina Abad

Page 2: A Measurement Study of Peer-to-Peer File Sharing Systems

Motivation

In a P2P file sharing system, peers are usually in the “edge” of the network

Does this affect/limit the quality of the infrastructure?

What are the characteristics of hosts that choose to participate?

Solution: Measure Gnutella and Napster traffic to help understand these issues

Page 3: A Measurement Study of Peer-to-Peer File Sharing Systems

Napster

Page 4: A Measurement Study of Peer-to-Peer File Sharing Systems

Gnutella

Page 5: A Measurement Study of Peer-to-Peer File Sharing Systems

Methodology Crawler periodically takes “snapshot” of

Napster/Gnutella– capture basic info (peers, files shared, …)

For peers discovered– measure bottleneck bandwidth– measure latency– track content and degree of sharing

Measure lifetime– track availability of peers (at P2P and IP level)

Page 6: A Measurement Study of Peer-to-Peer File Sharing Systems

Crawling Napster Peers can only be discovered by querying

index Crawler issues queries with names of

popular song artists Query responses contain

– IP, reported bandwidth, files shared (number, names and sizes)

Results:– Captured 40-60% of Napster hosts

(contributing to 80-95% of total files)– Could not capture peers that do not share files

Page 7: A Measurement Study of Peer-to-Peer File Sharing Systems

Crawling Gnutella Crawler uses ping/pong to discover peers Each crawl captured aprox. 10000 peers

Page 8: A Measurement Study of Peer-to-Peer File Sharing Systems

Measuring bandwidth Reported bandwidth may not be accurate

(ignorance or lies) Use bottleneck bandwidth as approximation

to available bandwidth– capacity of slowest host along path between

two hosts

Used SProbe to actively measure both upstream and downstream bottleneck bandwidth– Similar to “packet pair” technique

Page 9: A Measurement Study of Peer-to-Peer File Sharing Systems

Packet Pair Technique Two packets queued next to each other at

bottleneck link exit the link t seconds apart:

Then,

Kevin Lai and Mary Baker. “Measuring bandwidth”. In Proceedings of IEEE INFOCOM '99. 1999.

bnlb

st 2 s2: size of second packet

bbnl: bottleneck bandwidth

t

sbbnl

2

Page 10: A Measurement Study of Peer-to-Peer File Sharing Systems

How many peers are server-like? High-bandwidth, low latency, high

availability 8% have upstreambb 10Mbps

Page 11: A Measurement Study of Peer-to-Peer File Sharing Systems

Availability – Host uptimes

Page 12: A Measurement Study of Peer-to-Peer File Sharing Systems

Availability – Session duration

Page 13: A Measurement Study of Peer-to-Peer File Sharing Systems

Free-riders

Page 14: A Measurement Study of Peer-to-Peer File Sharing Systems

Is Gnutella robust?

Page 15: A Measurement Study of Peer-to-Peer File Sharing Systems

Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

Presented by

Cristina Abad

Page 16: A Measurement Study of Peer-to-Peer File Sharing Systems

Three-tiered approach

1. Analyze 200-day trace of Kazaa traffic Considered only traffic going from U.

Washington to the outside

2. Develop a model of multimedia workloads

Analyze and confirm hypothesis

3. Explore potential impact of locality -awareness in Kazaa

Page 17: A Measurement Study of Peer-to-Peer File Sharing Systems

Contributions Obtained some useful characterizations of

Kazaa’s traffic Showed that Kazaa’s workload is not Zipf

– Showed that other workloads (multimedia) may not be Zipf either

Presented a model of P2P file-sharing workloads based on their trace results– Validated the model through simulations that

yielded results very similar to those from traces Proved the usefulness of exploiting locality-

aware request routing

Page 18: A Measurement Study of Peer-to-Peer File Sharing Systems

Measurement results

Users are patient Users slow down as they age Kazaa is not one workload Kazaa clients fetch objects at-most-once Popularity of objects is often short-lived Kazaa is not Zipf

Page 19: A Measurement Study of Peer-to-Peer File Sharing Systems

User characteristics (1)

Users are patient

Page 20: A Measurement Study of Peer-to-Peer File Sharing Systems

User characteristics (2) Users slow down as they age

– clients “die”– older clients ask for less each time they use

system

Page 21: A Measurement Study of Peer-to-Peer File Sharing Systems

User characteristics (3) Client activity

– Tracing used could only detect users when their clients transfer data

– Thus, they only report statistics on client activity, which is a lower bound on availability

– Avg session lengths are typically small (median: 2.4 mins)

• Many transactions fail

• Periods of inactivity may occur during a request if client cannot find an available server with the object

Page 22: A Measurement Study of Peer-to-Peer File Sharing Systems

Object characteristics (1) Kazaa is not

one workload

Page 23: A Measurement Study of Peer-to-Peer File Sharing Systems

Object characteristics (2)

Kazaa object dynamics– Kazaa clients fetch objects at most once– Popularity of objects is often short-lived– Most popular objects tend to be recently born

objects– Most requests are for old objects

Page 24: A Measurement Study of Peer-to-Peer File Sharing Systems

Object characteristics (3) Kazaa is not Zipf Web access patterns are Zipf: small number of

objects are extremely popular, but there is a long tail of unpopular requests.

Zipf’s law: popularity of ith-most popular object is proportional to i-α, (α: Zipf coefficient)

(Zipf) looks linear on log-log

scale

Page 25: A Measurement Study of Peer-to-Peer File Sharing Systems

Model of P2P file-sharing workloads On average, a client requests 2 objects/day P(x): probability that a user requests an

object of popularity rank x Zipf(1)– Adjusted so that objects are requested at most

once A(x): probability that a newly arrived object

is inserted at popularity rank x Zipf(1) All objects are assumed to have same size Use caching to observe performance

changes (effectiveness hit rate)

Page 26: A Measurement Study of Peer-to-Peer File Sharing Systems

Model – Simulation results File-sharing effectiveness diminishes with

client age– System evolves towards one with no locality

and objects chosen at random from large space New object arrivals improve performance

– Arrivals replenish supply of popular objects New clients cannot stabilize performance

– Can’t compensate for increasing number of old clients

– Overall bandwidth increases in proportion to population size

Page 27: A Measurement Study of Peer-to-Peer File Sharing Systems

Model validation By tweaking the arrival rate of of new

objects, were able to match trace results (with 5475 new arrivals per year)

Page 28: A Measurement Study of Peer-to-Peer File Sharing Systems

Exploring locality-awareness

Currently organizations shape or filter P2P traffic

Alternative strategy: exploit locality in file-sharing workload– Caching; or,– Use content available within organization to

substantially decrease external bandwidth usage– Result: 86% of externally downloaded bytes

could be avoided by using an organizational proxy

Page 29: A Measurement Study of Peer-to-Peer File Sharing Systems

Questions?

Page 30: A Measurement Study of Peer-to-Peer File Sharing Systems

Analysis

How can results obtained be used when evaluating P2P schemes?

Are any of the measurements obtained biased?

Peers are heterogeneous– Incentives– Enforcement (e.g. super-peers in Kazaa)

Page 31: A Measurement Study of Peer-to-Peer File Sharing Systems

SProbe

Works in uncooperative environments Works on asymmetric network paths Exploit properties of TCP protocol

– Send SYN packet with large payload; then, measure time dispersion of received RST packet

Page 32: A Measurement Study of Peer-to-Peer File Sharing Systems

Zipf Linguist George Kingsley Zipf observed that for

many frequency distributions, the n-th largest frequency is proportional to a negative power of the rank order n

"Zipf's law" is also sometimes used to refer to the corresponding probability distribution

Is an instance of a power law Zipf's law is often demonstrated by plotting the

data, with the axes being log(rank order) and log(frequency). If the points are close to a single straight line, the distribution follows Zipf's law.