Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload. Presented for CS294-4, Fall 2003, by Jon Hess

Page 1: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload

Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload

Presented for CS294-4, Fall 2003

By Jon Hess

Page 2: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload

• Goal - Overview
  – Determine whether the KaZaA search space is queried in such a way that a group of 25,000 clients can satisfy most of their own requests.


Page 3: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload

• Goals - Details
  – Capture an extensive trace
  – Utilize that trace to understand file-sharing traffic flows
  – Model user and object activity
  – Determine inefficiencies in the distribution model
  – Propose solutions to inefficiencies


Page 4: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload

• Motivations
  – Beginning in 1999-2000, file-sharing traffic began to exceed HTTP traffic in terms of aggregate bandwidth consumed
  – File-sharing traffic is much less understood than HTTP traffic, even though it represents such a large segment of bandwidth usage
  – Bandwidth is expensive


[Figure: HTTP traffic vs. P2P traffic, 2000 to 2002]

Page 5: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload

• The Trace
  – 2 machines
  – 203 days, 5 hours, and 6 minutes
  – 22.7 TB of KaZaA file transfer traffic
  – Captured seasonal variations
    • End of spring
    • Summer
    • Fall semester


Page 6: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload

• Trace Conclusions
  – Users are patient
    • 30 minutes to retrieve a small object
    • Up to 1 week to retrieve a large object
  – Users consume less as they age


Page 7: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload

• Trace Conclusions
  – Users' machines are not very active
    • A session is an unbroken length of time during which a client has one or more file transfers in progress.
    • Average sessions are only 2 minutes
      – 90th percentile: 28 minutes
    • Over the life of a client, it is active only 5.54% of the time, or 0.20% of the trace period
      – 90th percentile clients are active most of their life, and 4.15% of the trace
      – Without control traffic analysis, is this meaningful?
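The session definition above can be sketched in code. This is a minimal illustration, not the authors' tooling: it assumes each transfer is given as a (start, end) pair in minutes, and the names merge_sessions and active_fraction are hypothetical. Transfers that overlap or touch are merged into one session, and the active fraction is total session time divided by the client's lifetime.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of one transfer, in minutes (assumed format)


def merge_sessions(transfers: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching transfer intervals into sessions."""
    sessions: List[Interval] = []
    for start, end in sorted(transfers):
        if sessions and start <= sessions[-1][1]:
            # This transfer starts before the current session ends: extend the session.
            last_start, last_end = sessions[-1]
            sessions[-1] = (last_start, max(last_end, end))
        else:
            # A gap with no transfer in progress: a new session begins.
            sessions.append((start, end))
    return sessions


def active_fraction(transfers: List[Interval], lifetime: float) -> float:
    """Fraction of a client's lifetime spent inside some session."""
    total = sum(end - start for start, end in merge_sessions(transfers))
    return total / lifetime
```

With the hypothetical timings [(0, 2), (1, 3), (10, 11), (11, 12)], the first two transfers overlap and the last two touch, so the result is two sessions, (0, 3) and (10, 12).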

Measurement, Modeling, and Analysis

of a Peer-2-Peer File-Sharing Workload

Page 8: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload


[Figure: Session Lengths. A timeline of Transfers A, B, D, and E, with sessions of 3 minutes and 2 minutes]

Page 9: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload

• Trace Conclusions – Objects
  – Most requests are for small objects (91%)
  – Most bytes transferred are part of large objects (65%)
  – There are many small objects
  – There are few large objects
  – Small objects' popularity is subject to heavy churn
    • No small object was in the top 10 for all 6 months
    • Only 1 large object lived in the top 10 for 6 months
    • 44 large files remained in the top 100 for 6 months
    • The most popular small objects are new objects
  – Most requests are for old objects


Page 10: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload

• Fetch-at-most-once
  – Once a KaZaA user obtains an object, they will not need to retrieve another copy
    • 94% of objects are fetched at most once per user
    • 99% are fetched at most twice per user
  – Stems from the fact that media files are immutable and never 'stale'
    • You may refresh 'slashdot.org' three times a day, but there is no point in downloading 'thriller.mpeg' seventeen times.
  – This keeps the KaZaA workload from following a Zipf curve even though object popularity does.


Page 11: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload

• Workload Modeling
  – Create a set of objects and give them popularity based on a Zipf distribution
  – Create a set of clients that request objects in proportion to their popularity
  – Have each client 'fetch-at-most-once'
  – Measure the distribution of transfers
    • Does it follow a Zipf curve?
    • How many big-object requests can a population of size N satisfy?
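The modeling steps above can be sketched as a small simulation. This is an illustration under stated assumptions, not the authors' simulator: the function names, the exponent alpha, and the redraw-until-new rule for enforcing fetch-at-most-once are all hypothetical choices.

```python
import random
from collections import Counter


def zipf_weights(num_objects, alpha=1.0):
    """Unnormalized Zipf popularity: the object at rank r gets weight 1 / r**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]


def simulate_transfers(num_clients, num_objects, requests_per_client,
                       fetch_at_most_once, alpha=1.0, seed=0):
    """Count transfers per object (object 0 is the most popular)."""
    if fetch_at_most_once and requests_per_client > num_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    objects = range(num_objects)
    weights = zipf_weights(num_objects, alpha)
    transfers = Counter()
    for _ in range(num_clients):
        owned = set()
        for _ in range(requests_per_client):
            obj = rng.choices(objects, weights=weights, k=1)[0]
            if fetch_at_most_once:
                # Assumed rule: redraw until the client picks an object it lacks.
                while obj in owned:
                    obj = rng.choices(objects, weights=weights, k=1)[0]
                owned.add(obj)
            transfers[obj] += 1
    return transfers
```

Running it once with fetch_at_most_once=False and once with True, then sorting each result by count, gives two curves to compare. Under fetch-at-most-once no object can be transferred more times than there are clients, so the head of the curve is capped.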

Measurement, Modeling, and Analysis

of a Peer-2-Peer File-Sharing Workload

Page 12: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload


Popular objects are not requested as often as the Zipf curve would predict

Page 13: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload


• Would a proxy cache help?
  – At first the proxy will cache the popular objects and succeed.
  – But 'fetch-at-most-once' draws clients away from the Zipf curve, and the proxy begins to fail.
  – What happens if we increase the density of popularity?
    • The curve starts higher and falls faster
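One way to explore the proxy question is to track a shared cache's hit rate round by round. The sketch below is an assumption-laden illustration, not the model from the talk: it uses an LRU cache, has every client make one request per round, and treats the Zipf exponent alpha as a stand-in for "density of popularity". The name proxy_hit_rates is hypothetical.

```python
import random
from collections import OrderedDict


def proxy_hit_rates(num_clients, num_objects, rounds, cache_size,
                    alpha=1.0, seed=0):
    """Per-round hit rate of a shared LRU proxy under fetch-at-most-once clients."""
    if rounds > num_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    objects = range(num_objects)
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]
    owned = [set() for _ in range(num_clients)]
    cache = OrderedDict()  # object id -> None, least recently used first
    rates = []
    for _ in range(rounds):
        hits = 0
        for client in range(num_clients):
            # Fetch-at-most-once: redraw until the client picks a new object.
            obj = rng.choices(objects, weights=weights, k=1)[0]
            while obj in owned[client]:
                obj = rng.choices(objects, weights=weights, k=1)[0]
            owned[client].add(obj)
            if obj in cache:
                hits += 1
                cache.move_to_end(obj)  # mark as most recently used
            else:
                cache[obj] = None
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict the least recently used
        rates.append(hits / num_clients)
    return rates
```

Comparing the returned list for several values of alpha and cache_size shows how the hit rate evolves as clients exhaust the popular objects. The exact shape depends on the parameters chosen.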

Page 14: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload


• The previous model did not insert new objects.
  – New popular objects tend to 'correct' the workload
  – Through providing locality
• New clients, however, do not help; they contribute to keeping old objects popular and destroy locality

Page 15: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload


• Validating the Model
  – Capture parameters that are inputs to the model from the trace
    • Number of clients
    • Number of objects
    • User request rate
    • Probability user requests given file (guess)
    • Probability of popularity of new objects (guess)
    • Object arrival rates (guess)
  – Run simulation with harvested parameters
  – See if simulation predicts what actually happened

Page 16: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload


Simulation seems to successfully predict reality. But with three free variables used to tune results, is this fair?

Page 17: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload


• What inefficiencies can we eliminate?
  – Analysis against the trace shows
    • 86% of object transfers were from external sources when an internal source possessed the object.
    • A traditional proxy, given the resources, could cut bandwidth utilization by 86%
      – Would have to host pirated data
    • Could use a proxy redirector instead. Must know the availability of the objects
      – Control traffic is obfuscated
    • Build locality into the protocol
      – Does this sacrifice anonymity?
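The kind of trace analysis behind the 86% figure can be sketched as follows. This is a hypothetical reconstruction, not the authors' method: the Transfer record layout is assumed, the count is per transfer rather than per byte, and an internal client is assumed to keep every object it has downloaded and to be reachable at all times.

```python
from typing import Iterable, NamedTuple


class Transfer(NamedTuple):
    time: float      # when the download happened
    client: str      # internal client that downloaded the object
    obj: str         # object identifier
    external: bool   # True if the source was outside the organization


def avoidable_external_fraction(trace: Iterable[Transfer]) -> float:
    """Fraction of external transfers for which another internal client
    already held a copy of the object (assumed to be kept forever)."""
    holders = {}  # obj -> set of internal clients holding a copy
    external = avoidable = 0
    for t in sorted(trace, key=lambda t: t.time):
        if t.external:
            external += 1
            if holders.get(t.obj, set()) - {t.client}:
                avoidable += 1
        holders.setdefault(t.obj, set()).add(t.client)
    return avoidable / external if external else 0.0
```

A stricter variant would count a holder only while it is known to be online, which lowers the estimate.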

Page 18: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload


• How successful would a locality-aware protocol be?
  – Assume that a client is available for the periods the trace shows it as active
    • During a file transfer (extremely conservative)

Page 19: Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload

• Questions?
  – Will increasing efficiency decrease load as the authors would like? Or simply increase work achieved per dollar? Do clients have insatiable appetites?
  – Are you worried that a large number of queries might have already been locally satisfied?
