presentation by nanda kishore lella lella.2@wright

64
1 A Measurement Study of Peer-to-Peer File Sharing Systems by Stefan Saroiu P. Krishna Gummadi Steven D. Gribble Presentation by Nanda Kishore Lella [email protected]

Upload: walden

Post on 21-Jan-2016

124 views

Category:

Documents


0 download

DESCRIPTION

A Measurement Study of Peer-to-Peer File Sharing Systems by Stefan Saroiu P. Krishna Gummadi Steven D. Gribble. Presentation by Nanda Kishore Lella [email protected]. Outline. P2P Overview What is a peer? Example applications Benefits of P2P P2P Content Sharing Challenges - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Presentation by Nanda Kishore Lella Lella.2@wright

1

A Measurement Study of Peer-to-Peer File Sharing Systems

by

Stefan SaroiuP. Krishna Gummadi

Steven D. Gribble

Presentationby

Nanda Kishore [email protected]

Page 2: Presentation by Nanda Kishore Lella Lella.2@wright

2

Outline• P2P Overview

– What is a peer?– Example applications– Benefits of P2P

• P2P Content Sharing– Challenges– Group management/data placement approaches– Measurement studies

• Conclusion

Page 3: Presentation by Nanda Kishore Lella Lella.2@wright

3

What is Peer-to-Peer (P2P)?

• Most people think of P2P as music sharing

Examples:

• Napster

• Gnutella

Page 4: Presentation by Nanda Kishore Lella Lella.2@wright

4

What is a peer?

• Contrasted with Client-Server model

• Servers are centrally maintained and administered

• Client has fewer resources than a server

Page 5: Presentation by Nanda Kishore Lella Lella.2@wright

5

What is a peer?

• A peer’s resources are similar to the resources of the other participants

• P2P – peers communicating directly with other peers and sharing resources

Page 6: Presentation by Nanda Kishore Lella Lella.2@wright

6

P2P Application Taxonomy

P2P Systems

Distributed Computing File Sharing Collaboration PlatformsJXTA

Page 7: Presentation by Nanda Kishore Lella Lella.2@wright

7

P2P Goals/Benefits

• Cost sharing

• Resource aggregation

• Improved scalability/reliability

• Increased autonomy

• Anonymity/privacy

• Dynamism

• Ad-hoc communication

Page 8: Presentation by Nanda Kishore Lella Lella.2@wright

8

P2P File Sharing

• Content exchange– Gnutella

• File systems– Oceanstore

• Filtering/mining– Opencola

Page 9: Presentation by Nanda Kishore Lella Lella.2@wright

9

Research Areas

• Peer discovery and group management

• Data location and placement

• Reliable and efficient file exchange

• Security/privacy/anonymity/trust

Page 10: Presentation by Nanda Kishore Lella Lella.2@wright

10

Current Research

• Group management and data placement– Chord, CAN, Tapestry, Pastry

• Anonymity– Publius

• Performance studies– Gnutella measurement study etc.

Page 11: Presentation by Nanda Kishore Lella Lella.2@wright

11

Management/Placement Challenges

• Per-node state

• Bandwidth usage

• Search time

• Fault tolerance/resiliency

Page 12: Presentation by Nanda Kishore Lella Lella.2@wright

12

Approaches

• Centralized

• Flooding

• Document Routing

Page 13: Presentation by Nanda Kishore Lella Lella.2@wright

13

Centralized

• Napster model• Benefits:

– Efficient search

– Limited bandwidth usage

– Efficient network handling

• Drawbacks:– Central point of failure

– Limited scale

Bob Alice

JaneJudy

Page 14: Presentation by Nanda Kishore Lella Lella.2@wright

14

Flooding

• Gnutella model• Benefits:

– No central point of failure

– Limited per-node state

• Drawbacks:– Slow searches

– Bandwidth intensive

Bob

Alice

Jane

Judy

Carl

Page 15: Presentation by Nanda Kishore Lella Lella.2@wright

15

Document Routing

• FreeNet, Chord, CAN, Tapestry, Pastry model

• Benefits:– More efficient searching

– Limited per-node state

• Drawbacks:– Limited fault-tolerance vs

redundancy

001 012

212

305

332

212 ?

212 ?

Page 16: Presentation by Nanda Kishore Lella Lella.2@wright

16

Document Routing – CAN

• Associate to each node and item a unique id in an d-dimensional space

• Goals– Scales to hundreds of thousands of nodes

– Handles rapid arrival and failure of nodes

• Properties – Routing table size O(d)

– Guarantees that a file is found in at most d*n1/d steps, where n is the total number of nodes

Slide modified from another presentation

Page 17: Presentation by Nanda Kishore Lella Lella.2@wright

17

CAN Example: Two Dimensional Space

• Space divided between nodes• All nodes cover the entire space• Each node covers either a square or a

rectangular area • Example:

– Node n1:(1, 2) first node that joins cover the entire space

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1

Slide modified from another presentation

Page 18: Presentation by Nanda Kishore Lella Lella.2@wright

18

CAN Example: Two Dimensional Space

• Node n2:(4, 2) joins space is divided between n1 and n2

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

Slide modified from another presentation

Page 19: Presentation by Nanda Kishore Lella Lella.2@wright

19

CAN Example: Two Dimensional Space

• Node n2:(4, 2) joins space is divided between n1 and n2

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3

Slide modified from another presentation

Page 20: Presentation by Nanda Kishore Lella Lella.2@wright

20

CAN Example: Two Dimensional Space

• Nodes n4:(5, 5) and n5:(6,6) join

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

Slide modified from another presentation

Page 21: Presentation by Nanda Kishore Lella Lella.2@wright

21

CAN Example: Two Dimensional Space

• Nodes: n1:(1, 2); n2:(4,2); n3:(3, 5); n4:(5,5);n5:(6,6)

• Items: f1:(2,3); f2:(5,1); f3:(2,1); f4:(7,5);

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

f1

f2

f3

f4

Slide modified from another presentation

Page 22: Presentation by Nanda Kishore Lella Lella.2@wright

22

CAN Example: Two Dimensional Space

• Each item is stored by the node who owns its mapping in the space

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

f1

f2

f3

f4

Slide modified from another presentation

Page 23: Presentation by Nanda Kishore Lella Lella.2@wright

23

CAN: Query Example• Each node knows its

neighbors in the d-space• Forward query to the

neighbor that is closest to the query id

• Can route around some failures– some failures require local

flooding• Example: assume n1 queries

f41 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

f1

f2

f3

f4

Slide modified from another presentation

Page 24: Presentation by Nanda Kishore Lella Lella.2@wright

24

CAN: Query Example

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

f1

f2

f3

f4

Slide modified from another presentation

Page 25: Presentation by Nanda Kishore Lella Lella.2@wright

25

CAN: Query Example

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

f1

f2

f3

f4

Slide modified from another presentation

Page 26: Presentation by Nanda Kishore Lella Lella.2@wright

26

CAN: Query Example

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

f1

f2

f3

f4

Slide modified from another presentation

Page 27: Presentation by Nanda Kishore Lella Lella.2@wright

27

Node Failure Recovery

• Simple failures– know your neighbor’s neighbors– when a node fails, one of its neighbors takes

over its zone

• More complex failure modes– simultaneous failure of multiple adjacent nodes – scoped flooding to discover neighbors– hopefully, a rare event

Slide modified from another presentation

Page 28: Presentation by Nanda Kishore Lella Lella.2@wright

28

Document Routing – Chord

• MIT project• Uni-dimensional ID

space• Keep track of log N

nodes• Search through log N

nodes to find desired key

N32

N10

N5

N20

N110

N99

N80

N60

K19

Page 29: Presentation by Nanda Kishore Lella Lella.2@wright

29

Document Routing – Chord(2)

N32

N10

N5

N20

N110

N99

N80

N60

K19• Each node and key is

assigned an id.• If a node needs a key, searches in its table of n nodes for the key.• If fails, goes to the last

node of its table and repeats until it finds the key.

• Search through log N nodes to find desired key

Page 30: Presentation by Nanda Kishore Lella Lella.2@wright

30

Doc Routing – Tapestry/Pastry

• Global mesh of meshes• Suffix-based routing• Uses underlying network

distance in constructing mesh

13FE

ABFE

1290239E

73FE

9990

F990

993E

04FE

43FE

Page 31: Presentation by Nanda Kishore Lella Lella.2@wright

31

Naming in Tapestry

13FE

ABFE

1290239E

73FE

9990

F990

993E

04FE

43FE

• Every node has a 4 bit

name similar to IP address

• Each bit in the name

can hold 16 types• Keys present at the node

are in accordance with

the node name.

Page 32: Presentation by Nanda Kishore Lella Lella.2@wright

32

Tapestry Routing

6789

B4F8

9098

7598

4598

Msg to 4598

B437

Page 33: Presentation by Nanda Kishore Lella Lella.2@wright

33

Remaining Problems?

• Hard to handle highly dynamic environments

• Methods don’t consider peer characteristics

Page 34: Presentation by Nanda Kishore Lella Lella.2@wright

34

Measurement Studies

Gnutella vs. Napster

Page 35: Presentation by Nanda Kishore Lella Lella.2@wright

35

S S

S S

napster.com

P

P

P

P

P

P

Q

R

D

PP

PPP

PP

Q

QQ

QD

Q

R

P

S

peer

server

Q

RD

response

queryfile download

Napster Gnutella

R

Page 36: Presentation by Nanda Kishore Lella Lella.2@wright

36

Methodology

2 stages:1. periodically crawl Gnutella/Napster

• discover peers and their metadata

2. feed output from crawl into measurement tools:• bottleneck bandwidth – SProbe• latency – SProbe• peer availability – LF• degree of content sharing – Napster crawler

Page 37: Presentation by Nanda Kishore Lella Lella.2@wright

37

Crawling

• May 2001

• Napster crawl– query index server and keep track of results– query about returned peers– don’t capture users sharing unpopular content

• Gnutella crawl– send out ping messages with large TTL

Page 38: Presentation by Nanda Kishore Lella Lella.2@wright

38

Page 39: Presentation by Nanda Kishore Lella Lella.2@wright

39

Measurement Study

• How many peers are server-like…client-like?

• Bandwidth, latency

• Connectivity

• Who is sharing what?

Page 40: Presentation by Nanda Kishore Lella Lella.2@wright

40

Page 41: Presentation by Nanda Kishore Lella Lella.2@wright

41

Graph results

• CDF: cumulative distribution function• From this graph, we see that while 78% of the

participating peers have downstream bottleneck

bandwidths of at least 1000Kbps• Only 8% of the peers have upstream bottleneck

bandwidths of at least 10Mbps.• 22% of the participating peers have upstream

bottleneck bandwidths of 100Kbps or less.

Page 42: Presentation by Nanda Kishore Lella Lella.2@wright

42

Page 43: Presentation by Nanda Kishore Lella Lella.2@wright

43

Reported Bandwidth

Page 44: Presentation by Nanda Kishore Lella Lella.2@wright

44

Graph results

• The percentage of Napster users connected with modems (of 64Kbps or less) is

• About 25%, while the percentage of Gnutella users with similar connectivity is as low as 8%.

• 50% of the users in Napster and 60% of the users in Gnutella use broadband connections

• only about 20% of the users in Napster and 30% of the users in Gnutella have very high bandwidth connections

• Overall, Gnutella users on average tend to have higher downstream bottleneck bandwidths than Napster

users.

Page 45: Presentation by Nanda Kishore Lella Lella.2@wright

45

Page 46: Presentation by Nanda Kishore Lella Lella.2@wright

46

Graph results

• Approximately 20% of the peers have latencies of at least 280ms,

• Another 20% have latencies of at most 70ms

Page 47: Presentation by Nanda Kishore Lella Lella.2@wright

47

Page 48: Presentation by Nanda Kishore Lella Lella.2@wright

48

Graph results

• This graph illustrates the presence of two clusters; a smaller one situated at (20-60Kbps, 100-1,000ms) and a larger one at over (1,000Kbps, 60-300ms).

• horizontal lines in the graph they predicate that the latency also depends on the location of the peer for measuring system.

Page 49: Presentation by Nanda Kishore Lella Lella.2@wright

49

Measured Uptime

Page 50: Presentation by Nanda Kishore Lella Lella.2@wright

50

Page 51: Presentation by Nanda Kishore Lella Lella.2@wright

51

Number of Shared Files

Page 52: Presentation by Nanda Kishore Lella Lella.2@wright

52

Page 53: Presentation by Nanda Kishore Lella Lella.2@wright

53

Correlation of Free-Riding with B/WCDF of Number of Downloads Per Reported Bandwidths (Napster)

0

20

40

60

80

100

0 1 10 100Number of Downloads

Per

cen

tag

e o

f H

ost

s

Unknown

Modem + ISDN

Dual ISDN + Cable + DSL

T1 + T3

CDF of Number of Uploads Per Reported Bandwidths (Napster)

0

20

40

60

80

100

0 1 10 100Number of Uploads

Per

cen

tag

e o

f H

ost

sUnknown

T1 + T3

Dual ISDN + Cable + DSL

Modem + ISDN

Page 54: Presentation by Nanda Kishore Lella Lella.2@wright

54

Page 55: Presentation by Nanda Kishore Lella Lella.2@wright

55

Page 56: Presentation by Nanda Kishore Lella Lella.2@wright

56

CDFs of Downstream Bottlenck Bandwidths for All Napster Users and For All Users Who Reported Unknown Bandwidhts

0

20

40

60

80

100

1 10 100 1,000 10,000 100,000

Measured (SProbe) Downstream Bottleneck Bandwidth (Kbps)

Per

cent

age

of H

osts

All Napster Users

Napster Users whoReported Unknown Bandwidhts

Page 57: Presentation by Nanda Kishore Lella Lella.2@wright

57

Power law

• A connected cluster of peers that spans the entire network survives even in the presence of a large percentage p of random peer breakdowns, where p can be as large as:

where m is the minimum node degree and K is the maximum node degree.α <3.

Page 58: Presentation by Nanda Kishore Lella Lella.2@wright

58

Page 59: Presentation by Nanda Kishore Lella Lella.2@wright

59

Gnutella

Fri Feb 16 05:21:52-05:23:22 PST1771 hosts

Popular sites:

• 212.239.171.174

• adams-00-305a.Stanford.EDU

• 0.0.0.0

Page 60: Presentation by Nanda Kishore Lella Lella.2@wright

60

30% random failures

1771 – 471 – 294 hosts Fri Feb 16 05:21:52-05:23:22 PST

Page 61: Presentation by Nanda Kishore Lella Lella.2@wright

61

4% orchestrated failures

Fri Feb 16 05:21:52-05:23:22 PST1771 - 63 hosts

Page 62: Presentation by Nanda Kishore Lella Lella.2@wright

62

Results Overview

• Lots of heterogeneity between peers– Systems should consider peer capabilities

• Peers lie– Systems must be able to verify reported peer

capabilities or measure true capabilities

Page 63: Presentation by Nanda Kishore Lella Lella.2@wright

63

Points of Discussion

• Is it all hype?

• Should P2P be a research area?

• Do P2P applications/systems have common research questions?

• What are the “killer apps” for P2P systems?

Page 64: Presentation by Nanda Kishore Lella Lella.2@wright

64

Conclusion

• P2P is an interesting and useful model

• There are lots of technical challenges to be solved