a measurement study of peer-to-peer file sharing systems

Post on 30-Dec-2015

40 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

A Measurement Study of Peer-to-Peer File Sharing Systems. Presented by Cristina Abad. Motivation. In a P2P file sharing system, peers are usually in the “edge” of the network Does this affect/limit the quality of the infrastructure? - PowerPoint PPT Presentation

TRANSCRIPT

A Measurement Study of Peer-to-Peer File Sharing Systems

Presented by

Cristina Abad

Motivation

In a P2P file sharing system, peers are usually in the “edge” of the network

Does this affect/limit the quality of the infrastructure?

What are the characteristics of hosts that choose to participate?

Solution: Measure Gnutella and Napster traffic to help understand these issues

Napster

Gnutella

Methodology Crawler periodically takes “snapshot” of

Napster/Gnutella– capture basic info (peers, files shared, …)

For peers discovered– measure bottleneck bandwidth– measure latency– track content and degree of sharing

Measure lifetime– track availability of peers (at P2P and IP level)

Crawling Napster Peers can only be discovered by querying

index Crawler issues queries with names of

popular song artists Query responses contain

– IP, reported bandwidth, files shared (number, names and sizes)

Results:– Captured 40-60% of Napster hosts

(contributing to 80-95% of total files)– Could not capture peers that do not share files

Crawling Gnutella Crawler uses ping/pong to discover peers Each crawl captured aprox. 10000 peers

Measuring bandwidth Reported bandwidth may not be accurate

(ignorance or lies) Use bottleneck bandwidth as approximation

to available bandwidth– capacity of slowest host along path between

two hosts

Used SProbe to actively measure both upstream and downstream bottleneck bandwidth– Similar to “packet pair” technique

Packet Pair Technique Two packets queued next to each other at

bottleneck link exit the link t seconds apart:

Then,

Kevin Lai and Mary Baker. “Measuring bandwidth”. In Proceedings of IEEE INFOCOM '99. 1999.

bnlb

st 2 s2: size of second packet

bbnl: bottleneck bandwidth

t

sbbnl

2

How many peers are server-like? High-bandwidth, low latency, high

availability 8% have upstreambb 10Mbps

Availability – Host uptimes

Availability – Session duration

Free-riders

Is Gnutella robust?

Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

Presented by

Cristina Abad

Three-tiered approach

1. Analyze 200-day trace of Kazaa traffic Considered only traffic going from U.

Washington to the outside

2. Develop a model of multimedia workloads

Analyze and confirm hypothesis

3. Explore potential impact of locality -awareness in Kazaa

Contributions Obtained some useful characterizations of

Kazaa’s traffic Showed that Kazaa’s workload is not Zipf

– Showed that other workloads (multimedia) may not be Zipf either

Presented a model of P2P file-sharing workloads based on their trace results– Validated the model through simulations that

yielded results very similar to those from traces Proved the usefulness of exploiting locality-

aware request routing

Measurement results

Users are patient Users slow down as they age Kazaa is not one workload Kazaa clients fetch objects at-most-once Popularity of objects is often short-lived Kazaa is not Zipf

User characteristics (1)

Users are patient

User characteristics (2) Users slow down as they age

– clients “die”– older clients ask for less each time they use

system

User characteristics (3) Client activity

– Tracing used could only detect users when their clients transfer data

– Thus, they only report statistics on client activity, which is a lower bound on availability

– Avg session lengths are typically small (median: 2.4 mins)

• Many transactions fail

• Periods of inactivity may occur during a request if client cannot find an available server with the object

Object characteristics (1) Kazaa is not

one workload

Object characteristics (2)

Kazaa object dynamics– Kazaa clients fetch objects at most once– Popularity of objects is often short-lived– Most popular objects tend to be recently born

objects– Most requests are for old objects

Object characteristics (3) Kazaa is not Zipf Web access patterns are Zipf: small number of

objects are extremely popular, but there is a long tail of unpopular requests.

Zipf’s law: popularity of ith-most popular object is proportional to i-α, (α: Zipf coefficient)

(Zipf) looks linear on log-log

scale

Model of P2P file-sharing workloads On average, a client requests 2 objects/day P(x): probability that a user requests an

object of popularity rank x Zipf(1)– Adjusted so that objects are requested at most

once A(x): probability that a newly arrived object

is inserted at popularity rank x Zipf(1) All objects are assumed to have same size Use caching to observe performance

changes (effectiveness hit rate)

Model – Simulation results File-sharing effectiveness diminishes with

client age– System evolves towards one with no locality

and objects chosen at random from large space New object arrivals improve performance

– Arrivals replenish supply of popular objects New clients cannot stabilize performance

– Can’t compensate for increasing number of old clients

– Overall bandwidth increases in proportion to population size

Model validation By tweaking the arrival rate of of new

objects, were able to match trace results (with 5475 new arrivals per year)

Exploring locality-awareness

Currently organizations shape or filter P2P traffic

Alternative strategy: exploit locality in file-sharing workload– Caching; or,– Use content available within organization to

substantially decrease external bandwidth usage– Result: 86% of externally downloaded bytes

could be avoided by using an organizational proxy

Questions?

Analysis

How can results obtained be used when evaluating P2P schemes?

Are any of the measurements obtained biased?

Peers are heterogeneous– Incentives– Enforcement (e.g. super-peers in Kazaa)

SProbe

Works in uncooperative environments Works on asymmetric network paths Exploit properties of TCP protocol

– Send SYN packet with large payload; then, measure time dispersion of received RST packet

Zipf Linguist George Kingsley Zipf observed that for

many frequency distributions, the n-th largest frequency is proportional to a negative power of the rank order n

"Zipf's law" is also sometimes used to refer to the corresponding probability distribution

Is an instance of a power law Zipf's law is often demonstrated by plotting the

data, with the axes being log(rank order) and log(frequency). If the points are close to a single straight line, the distribution follows Zipf's law.

top related