Page 1: Recent Problems in  Peer-to-peer Content Retrieval

NeXtworking’03, June 23-25, 2003, Chania, Crete, Greece
The First COST-IST(EU)-NSF(USA) Workshop on EXCHANGES & TRENDS IN NETWORKING

B.N. Levine

Recent Problems in Peer-to-peer Content Retrieval

Brian Neil Levine

Dept. of Computer Science

UMass Amherst

The work by BNL and his students presented here was supported in part by National Science Foundation awards ANI-033055 and EIA-0080199.


Page 2

Motivation

• Peer-to-peer content sharing accounts for one of the largest shares of traffic on the Internet.

• Whether illegal (Gnutella, Kazaa) or not (Apple iTunes), such traffic must be understood if the Internet is to perform well.

• This talk:
– What’s being done in p2p content search & retrieval.
– Overview of research in p2p traffic measurement.
– How such measurements can affect p2p design.

Page 3

What is a p2p architecture?

[Figure: architectures plotted by “peers required to make it work” against “resources out of your pocket to make it work (=money)”, with a further label “chance you’ll be held accountable”. Plotted designs: Centralized, Distributed, P2P, Robust P2P. Annotations: successful, unsuccessful, robust/fault-tolerant, over-budgeted. Scale labels: Little, Many, Lots.]

Page 4

Overview of P2P research problems

• Content search
– P2P designs are not one-size-fits-all.
– Different applications require different solutions.

• Peer selection
– Finding the best peer of many serving a file…

• Incentives for peers to participate

• Security and privacy

• Evaluation against measurement traces
– What does real p2p traffic look like?
– What’s the real performance of these protocols?

Page 5

Circular pegs, square holes…

• DHTs work great when:

– each node is associated with a unique keyword (e.g., SOS).

– The keywords stored are well-known

• e.g., DNS lookup using a DHT

– Hashes of keywords ensure work is evenly distributed

• Libraries of content?

• Real measurements show:

– Nodes store more than one file, and each file brings at least one keyword.

• h(“The Red Hot Chili Peppers”, “Breaking the girl”)

– Content search is difficult: index each term? Or index the whole title? Or part of it?

• h(“red”), h(“hot”), h(“chili”),…

• h(“let”), h(“there”), h(“be”), h(“light”)…

– Some stored keywords are more popular than others.

– Some queried keywords are more popular than others.
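The indexing trade-off above can be made concrete with a small Python sketch. It assumes SHA-1 as the hash h(·) (Chord uses SHA-1; other DHTs may differ), case-folded keywords, and a hypothetical two-file library in which artist and song title are joined into one string; `h` and `keys_for_title` are illustrative names, not part of any DHT's API.

```python
import hashlib


def h(text: str) -> int:
    """Hash a keyword to a 160-bit DHT identifier (SHA-1, case-folded)."""
    return int.from_bytes(hashlib.sha1(text.lower().encode("utf-8")).digest(), "big")


def keys_for_title(title: str, per_term: bool) -> set[int]:
    """Return the set of DHT keys under which one shared file is published."""
    if per_term:
        # One key per word: partial-title queries work, but keys multiply.
        return {h(word) for word in title.split()}
    # One key for the whole title: cheap, but only exact-title queries match.
    return {h(title)}


# Hypothetical library (artist and song title joined into one string).
library = ["The Red Hot Chili Peppers Breaking the girl", "Let there be light"]

whole = set().union(*(keys_for_title(t, per_term=False) for t in library))
terms = set().union(*(keys_for_title(t, per_term=True) for t in library))
print(len(whole), len(terms))  # 2 11
```

Per-term publishing lets a query for “chili” find the first file, but the same two files now occupy 11 keys instead of 2, and a common word such as “the” lands on the same node for every file that contains it.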

Page 6

How many keys per new user in your app?

[Figure (left): number of unique keys vs. number of files in user library.]

• DNS: 1-2 keys per authoritative domain.

• [Left]: Unique terms in real collections of shared files (based on file names only, not ID3 tags).
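Counting “keys per new user” depends on how file names are split into terms. The Python sketch below uses one hypothetical tokenization rule (drop the extension, split on anything other than ASCII letters and digits, ignore ID3 tags); the measurement's actual rule is not given on the slide, and the file names are made up.

```python
import re


def filename_terms(filename: str) -> set[str]:
    """Split a shared file name into lowercase index terms.

    Assumptions: the extension is dropped, any run of characters other than
    ASCII letters and digits separates terms, and ID3 tags are ignored.
    """
    stem = filename.rsplit(".", 1)[0]
    return {t for t in re.split(r"[^0-9a-z]+", stem.lower()) if t}


def unique_keys(library: list[str]) -> int:
    """Count the distinct terms a peer would publish for its whole library."""
    keys: set[str] = set()
    for name in library:
        keys |= filename_terms(name)
    return len(keys)


library = [  # hypothetical file names
    "Red Hot Chili Peppers - Breaking the Girl.mp3",
    "Red Hot Chili Peppers - Under the Bridge.mp3",
    "Beatles - Let It Be.mp3",
]
print(unique_keys(library))  # 13
```

Three files already yield 13 keys, against the 1-2 keys a DNS domain needs.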

Page 7

Cost of indexing files in DHTs

[Figure: percentage of peers contacted to index files (0-100%) vs. cumulative percentage of peers (ranked).]

e.g., in a 100-node network, 40% of the nodes must contact 100% of the peers to index filenames for each join and leave.
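Why so many peers get contacted can be estimated with a back-of-the-envelope model: if a joining peer publishes k unique terms and each term hashes independently and uniformly to one of N nodes, a given node receives none of them with probability (1 - 1/N)^k. The Python sketch below computes the expected fraction of nodes contacted and checks it by simulation; the uniform-hashing assumption and the function names are ours, not the trace-driven method behind the figure.

```python
import random


def expected_fraction_contacted(num_terms: int, num_nodes: int) -> float:
    """Expected share of nodes that receive at least one of a peer's terms.

    Assumes each unique term hashes independently and uniformly to one node,
    so a given node is missed by all terms with probability (1 - 1/N)**k.
    """
    return 1.0 - (1.0 - 1.0 / num_nodes) ** num_terms


def simulated_fraction_contacted(num_terms: int, num_nodes: int,
                                 trials: int = 2000, seed: int = 1) -> float:
    """Monte Carlo check of the same quantity."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        owners = {rng.randrange(num_nodes) for _ in range(num_terms)}
        total += len(owners)
    return total / (trials * num_nodes)


for k in (10, 100, 500):
    print(k, round(expected_fraction_contacted(k, 100), 3))
```

In a 100-node network this gives roughly 10% of nodes for 10 terms, 63% for 100 terms, and over 99% for 500 terms, so a peer with a large library touches nearly every node on each join and leave.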

Page 8

Methods of p2p search

• Distributed Hash Tables
– CAN, Chord, Pastry, etc.

• Distribute the index.

• Cost: updating pointers to content.

• Flooded search over
– Random graphs
– Small-world networks
– Power-law degree networks

• Return results only about the content you have stored.

• Make it easy for searches to traverse the graph.

• Cost: updating the graph; grouping similar nodes together.

• Links represent
– Nothing
– Relational autocorrelation

• “Heat-seeking search” over an organized network.

[Slide annotations: “Much focus”; “Not enough focus”.]
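Flooded search over a random graph can be sketched as a TTL-limited breadth-first search. The Python below is a simplified model, not any client's protocol: the overlay generator, the degree, and the TTL are hypothetical, and every forwarded copy is counted as a message, including copies sent back toward a node that already saw the query (real Gnutella peers skip the link a query arrived on).

```python
import random
from collections import deque


def random_graph(n: int, degree: int, seed: int = 0) -> dict[int, set[int]]:
    """Hypothetical unstructured overlay: each node links to `degree` random peers."""
    rng = random.Random(seed)
    adj: dict[int, set[int]] = {v: set() for v in range(n)}
    for v in range(n):
        for u in rng.sample([x for x in range(n) if x != v], degree):
            adj[v].add(u)
            adj[u].add(v)  # links are undirected
    return adj


def flood(adj: dict[int, set[int]], source: int, ttl: int) -> tuple[set[int], int]:
    """TTL-limited flood (a BFS cut off after `ttl` hops).

    Returns the set of nodes reached and the number of query messages sent,
    counting duplicates delivered to nodes that already saw the query.
    """
    seen = {source}
    frontier = deque([(source, ttl)])
    messages = 0
    while frontier:
        node, hops_left = frontier.popleft()
        if hops_left == 0:
            continue
        for nbr in adj[node]:
            messages += 1
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, hops_left - 1))
    return seen, messages


overlay = random_graph(1000, 3)
reached, msgs = flood(overlay, source=0, ttl=4)
print(len(reached), msgs)
```

Comparing `len(reached)` with `msgs` for growing TTLs shows the cost side of flooding: the message count grows faster than the number of new nodes reached once duplicates dominate.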

Page 9

Searching for Topics not files…

• Information retrieval searches:
– Show me all documents that are related to “salsa dancing” (as Google does).

• You can’t index every word of every document.
– It’s hard enough to handle file names.

• One approach: place nodes with similar content together.
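One way to read “place nodes with similar content together” is to give each node a term-count profile of what it shares and link it to the peers whose profiles are most similar. The Python sketch below uses cosine similarity for this; the similarity measure, the profiles, and the function names are illustrative assumptions, not the arrangement method evaluated in the talk.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_neighbors(profiles: dict[str, Counter], k: int) -> dict[str, list[str]]:
    """Link each node to the k other nodes whose content is most similar."""
    links = {}
    for node, prof in profiles.items():
        others = [o for o in profiles if o != node]
        others.sort(key=lambda o: cosine(prof, profiles[o]), reverse=True)
        links[node] = others[:k]
    return links


profiles = {  # hypothetical term profiles built from each node's shared content
    "A": Counter("salsa dancing music".split()),
    "B": Counter("salsa dancing lessons".split()),
    "C": Counter("linux kernel".split()),
    "D": Counter("linux kernel drivers".split()),
}
print(similar_neighbors(profiles, k=1))
```

Here the two salsa nodes link to each other and the two kernel nodes link to each other, so a short search that starts in one topic stays among relevant peers.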

Page 10

Arranging topology to match content

[Figure: recall (0 to 1) vs. nodes contacted by BFS of the graph (0 to 100), for four topologies: Optimal, Per-query arrangement, Arrangement, Random (gnutella).]

• Arrange topology so that we increase the amount of relevant information returned to peers for limited BFS of the graph.

• Tough problem!

• Can you find answers without flooding? Can you route queries towards content?
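The figure's y-axis is recall: the relevant documents returned so far divided by all relevant documents in the network. The Python sketch below computes that curve for a BFS order; the order and the two placements of relevant documents are hypothetical, chosen only to show why clustering related content raises recall for a limited search.

```python
def recall_curve(bfs_order: list[int], relevant_at: dict[int, int]) -> list[float]:
    """Recall after contacting the first 1, 2, ... nodes in BFS order.

    recall = relevant documents returned so far / relevant documents anywhere.
    """
    total = sum(relevant_at.values())
    found = 0
    curve = []
    for node in bfs_order:
        found += relevant_at.get(node, 0)
        curve.append(found / total if total else 0.0)
    return curve


order = [0, 1, 2, 3, 4]  # hypothetical BFS order from the querying node
scattered = {1: 2, 4: 2}  # relevant documents far apart
clustered = {1: 2, 2: 2}  # relevant documents next to each other
print(recall_curve(order, scattered))  # [0.0, 0.5, 0.5, 0.5, 1.0]
print(recall_curve(order, clustered))  # [0.0, 0.5, 1.0, 1.0, 1.0]
```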

Page 11

Retrieval (briefly)

• Content is likely to be available from several peers.

• From which peer do you download?
– Random (current approach)
– Heuristics (ping, hop count, download time)
• (but most peers you’ve never seen before)
– Learned/adaptive methods (e.g., MDPs)
• See [BZLS; IPTPS’03]
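To illustrate the gap between random choice and a learned method, the Python sketch below uses epsilon-greedy selection, a simple bandit-style rule swapped in for illustration; it is not the MDP approach of [BZLS]. The hypothetical `PeerSelector` keeps a running mean of observed download rates and falls back to a random choice when none of the candidates has been seen before, which is the common case noted above.

```python
import random


class PeerSelector:
    """Epsilon-greedy choice among peers offering the same file.

    Hypothetical helper: it learns an average download rate per peer and
    falls back to a random choice (today's default) when it knows none.
    """

    def __init__(self, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.mean_rate = {}  # peer -> running mean of observed rates
        self.count = {}      # peer -> number of completed downloads

    def choose(self, candidates):
        known = [p for p in candidates if p in self.mean_rate]
        # Explore: no history for any candidate, or an epsilon-probability coin flip.
        if not known or self.rng.random() < self.epsilon:
            return self.rng.choice(candidates)
        # Exploit: the candidate with the best observed rate.
        return max(known, key=self.mean_rate.__getitem__)

    def record(self, peer, rate):
        """Update the running mean after a download from `peer` finishes."""
        n = self.count.get(peer, 0) + 1
        self.count[peer] = n
        old = self.mean_rate.get(peer, 0.0)
        self.mean_rate[peer] = old + (rate - old) / n


selector = PeerSelector(epsilon=0.1, seed=42)
peer = selector.choose(["p1", "p2", "p3"])  # random: nothing learned yet
selector.record(peer, rate=120.0)  # e.g., kB/s measured for that download
```

The epsilon parameter trades off trying unfamiliar peers against reusing ones that were fast before; its value here is arbitrary.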

Page 12

Selecting for both accuracy and speed

• Of a set of 100 servers, IR techniques choose the servers they believe are most accurate (red).

• Selecting nodes for best transfer times picks a different set (green).

• Trivial composition doesn’t work.

[Figure: a client and the candidate servers it can select from.]
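The mismatch between the red and green sets can be shown with made-up numbers. In the Python sketch below, the accuracy and throughput scores are hypothetical; picking the top two servers by each criterion yields disjoint sets, so simply intersecting the two selections returns nothing.

```python
def top_k(scores: dict[str, float], k: int) -> set[str]:
    """The k servers with the highest score."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


# Hypothetical per-server estimates for one query.
accuracy = {"s1": 0.9, "s2": 0.8, "s3": 0.7, "s4": 0.2, "s5": 0.1}  # IR relevance
throughput = {"s1": 10, "s2": 20, "s3": 15, "s4": 300, "s5": 250}   # kB/s

best_accuracy = top_k(accuracy, 2)
best_speed = top_k(throughput, 2)
print(best_accuracy & best_speed)  # set(): the two selections share no server
```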

Page 13

Some other lessons learned from measurement (openNap)

Ratio of audio:video    Shared    Transferred
# of files              20:1      1:1
# of bytes              1:1       0.06:1

• What happened to content delivery on the Internet?

• What happened to serving video on the Internet?
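The table's ratios imply how large video files are relative to audio files, and how much of the transferred volume is video. With F for file counts and B for bytes, mean_a / mean_v = (B_a / F_a) / (B_v / F_v) = (B_a / B_v) / (F_a / F_v). The Python sketch below does this arithmetic exactly with fractions, taking the table's ratios at face value.

```python
from fractions import Fraction

# audio:video ratios from the openNap table, written as audio/video.
shared = {"files": Fraction(20), "bytes": Fraction(1)}
transferred = {"files": Fraction(1), "bytes": Fraction(6, 100)}


def video_to_audio_mean_size(ratios):
    """Mean video file size divided by mean audio file size.

    mean_a / mean_v = bytes_ratio / files_ratio, so the
    video-to-audio size ratio is files_ratio / bytes_ratio.
    """
    return ratios["files"] / ratios["bytes"]


def video_byte_share(ratios):
    """Fraction of all bytes that are video: B_v / (B_a + B_v)."""
    return 1 / (1 + ratios["bytes"])


print(video_to_audio_mean_size(shared))                # 20
print(video_to_audio_mean_size(transferred))           # 50/3
print(round(float(video_byte_share(transferred)), 3))  # 0.943
```

So among shared files the average video is 20 times the size of the average audio file, and although equal numbers of audio and video files are transferred, roughly 94% of the transferred bytes are video.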

Page 14

Who’s transferring/serving files? (openNap)

[Figure: percentage of all down/uploads vs. percentage of users down/uploading.]
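A curve like this one ranks users from busiest to least busy and plots the cumulative share of transfers they account for. The Python sketch below shows one way to compute it from per-user counts; the upload counts are hypothetical, not openNap data.

```python
def top_share(counts: list[int], fraction: float) -> float:
    """Share of all transfers handled by the busiest `fraction` of users."""
    ranked = sorted(counts, reverse=True)
    k = max(1, round(fraction * len(ranked)))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


def concentration_curve(counts: list[int]) -> list[tuple[float, float]]:
    """(percent of users, percent of transfers) points, users ranked busiest first."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked) or 1
    points, running = [], 0
    for i, c in enumerate(ranked, start=1):
        running += c
        points.append((100 * i / len(ranked), 100 * running / total))
    return points


uploads = [50, 30, 10, 5, 5, 0, 0, 0, 0, 0]  # hypothetical uploads per user
print(top_share(uploads, 0.2))  # 0.8
```

A curve that rises steeply and then flattens means a small group of users serves most files while many users upload nothing.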

Page 15

Session Lengths (gnutella)

[Figure: percentage of all sessions > x vs. length of node availability (10 min. increments).]
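The plotted quantity is a complementary CDF: for each x, the share of sessions lasting longer than x. The Python sketch below computes it in 10-minute steps to match the figure's x-axis; the session lengths are hypothetical.

```python
def session_ccdf(lengths_min: list[int], step: int = 10) -> list[tuple[int, float]]:
    """Fraction of sessions longer than x minutes, for x = 0, step, 2*step, ...

    Mirrors the figure's axes: x in 10-minute increments, y = share of sessions > x.
    """
    n = len(lengths_min)
    xs = range(0, max(lengths_min) + step, step)
    return [(x, sum(1 for s in lengths_min if s > x) / n) for x in xs]


sessions = [5, 12, 25, 40, 95]  # hypothetical session lengths in minutes
for x, frac in session_ccdf(sessions):
    print(f"> {x:3d} min: {frac:.0%}")
```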

Page 16

Balance of work in Chord (simulation based on real traces)

[Figure: cumulative percentage of work doing “x” performed vs. percentage of all nodes (ranked). Curves: Equal work, Keys indexed, Queries resolved, Msgs rcvd, Msgs sent.]
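In Chord, a key is stored at its successor: the first node whose identifier is equal to or follows the key on the identifier circle. The Python sketch below applies that rule to count keyword entries per node; the 16-bit ring, node names, and keyword list are hypothetical, and it is not the trace-driven simulator behind the figure. Repeating a keyword models a popular stored keyword, which always lands on the same node.

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier bits; a small hypothetical ring (Chord itself uses SHA-1's 160 bits)


def chord_id(name: str) -> int:
    """Map a node name or keyword onto the 2**M identifier circle."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big") % (2 ** M)


def successor(ring: list[int], key: int) -> int:
    """First node identifier >= key, wrapping around the circle."""
    i = bisect_left(ring, key)
    return ring[i % len(ring)]


def keys_per_node(node_names: list[str], keywords: list[str]) -> dict[int, int]:
    """Count keyword entries stored at each node; repeats model popular keywords."""
    ring = sorted({chord_id(n) for n in node_names})
    load = {node: 0 for node in ring}
    for kw in keywords:
        load[successor(ring, chord_id(kw))] += 1
    return load


nodes = [f"peer{i}" for i in range(20)]
stored = ["the"] * 50 + ["red", "hot", "chili", "peppers", "girl"]  # hypothetical
load = keys_per_node(nodes, stored)
print(max(load.values()), sum(load.values()))
```

Uniform hashing spreads distinct keys, but it cannot spread a single popular key: all 50 entries for “the” end up on one node.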

Page 17

Does caching queries balance load? (simulation based on real traces)

• Normal: 20% of nodes answer 84% of the queries.

• Cached (infinite buffer): 20% of nodes answer 55% of the queries.

• Answer: yes, but still a problem.
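Query caching moves work away from the node responsible for a popular keyword: a node that forwards queries answers repeats from its cache, so only misses travel on. The Python sketch below is a minimal LRU query cache in that spirit; the class, its `resolve` callback, and the query stream are hypothetical, and it is neither Markatos's scheme nor the simulator behind these numbers. Leaving `capacity` at infinity models the slide's infinite buffer.

```python
from collections import OrderedDict


class QueryCache:
    """LRU cache of query results kept by a node that forwards queries.

    Hypothetical sketch: a hit is answered locally, so only misses reach the
    node responsible for the keyword. capacity=float("inf") models an
    infinite buffer.
    """

    def __init__(self, capacity=float("inf")):
        self.capacity = capacity
        self.entries = OrderedDict()  # query -> cached results, oldest first
        self.hits = 0
        self.misses = 0

    def lookup(self, query, resolve):
        if query in self.entries:
            self.entries.move_to_end(query)  # mark as most recently used
            self.hits += 1
            return self.entries[query]
        self.misses += 1
        result = resolve(query)  # forwarded to the responsible node
        self.entries[query] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return result


forwarded = []


def resolve(query):
    forwarded.append(query)
    return [query + ".mp3"]


cache = QueryCache(capacity=2)
for q in ["a", "b", "a", "c", "b"]:
    cache.lookup(q, resolve)
print(cache.hits, cache.misses, forwarded)  # 1 4 ['a', 'b', 'c', 'b']
```

With the default infinite capacity, the same query stream produces 2 hits and 3 misses, since "b" is no longer evicted before its second request.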

Page 18

Some Measurements of P2P

• Ripeanu et al. – Gnutella topology does not match the underlying network topology. MMCN’02.

• Markatos – A simple query caching scheme can reduce query traffic by a factor of two. CCGrid 2002.

• Saroiu et al. – Gnutella bandwidth, latency, and node availability over a 60-hour period. Multimedia Systems Journal, v8n6.

• Adar and Huberman – A free-rider study, using Gnutella’s QueryHit messages to infer peer downloads.

• Chu, Labonte, Levine – Measurements of Napster and Gnutella file popularity and session lengths. Proc. ITCom 2002.

• Bhagwan et al. – Effects of DHCP on the availability of nodes in p2p; TOD; joins and leaves. IPTPS 2003.

• Chu, Labonte, Levine – Measurements of all transfers and most libraries in a large p2p system (openNap); evaluation of Chord.

Page 19

Summary and Open Issues

• Applications of p2p are broad.

• Methods other than DHTs are possible.

• Measurement studies have revealed the skewed distributions of p2p systems.
– Can these be modeled?

• DHTs are limited in their application to content sharing.
– They work well for single-key systems.

• Stronger efforts are needed to match research designs to real characteristics of systems.

• Thanks to Jacky Chu and Kevin Labonte for doing the balance of the work.

