
Page 1: Peer-to-Peer Networking

Peer-to-Peer Networking

Credit slides from J. Pang, B. Richardson, I. Stoica

Page 2: Peer-to-Peer Networking

Why Study P2P

• Huge fraction of traffic on networks today (>= 50%!)
• Exciting new applications
• Next level of resource sharing vs. timesharing, client-server
• E.g., access 10s-100s of TB at low cost

Page 3: Peer-to-Peer Networking


Share of Internet Traffic

Page 4: Peer-to-Peer Networking

Number of Users

Others include BitTorrent, eDonkey, iMesh, Overnet, Gnutella.

BitTorrent (and others) gaining share from FastTrack (Kazaa).

Page 5: Peer-to-Peer Networking

What is P2P used for?

Use resources of end-hosts to accomplish a shared task:
• Typically share files
• Play games
• Search for patterns in data (SETI@home)

Page 6: Peer-to-Peer Networking

What’s new?

Taking advantage of resources at the edge of the network:
• Fundamental shift in computing capability
• Increase in absolute bandwidth over the WAN

Page 7: Peer-to-Peer Networking

Peer to Peer

Systems:
• Napster
• Gnutella
• KaZaA
• BitTorrent
• Chord

Page 8: Peer-to-Peer Networking

Key issues for P2P systems

• Join/leave: How do nodes join/leave? Who is allowed?
• Search and retrieval: How to find content? How are metadata indexes built, stored, and distributed?
• Content distribution: Where is content stored? How is it downloaded and retrieved?

Page 9: Peer-to-Peer Networking

4 Key Primitives

• Join: how to enter/leave the P2P system?
• Publish: how to advertise a file?
• Search: how to find a file?
• Fetch: how to download a file?
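The rest of the deck compares systems by how they implement these four primitives. As a reading aid, here is a minimal sketch of the primitives as an interface (Python; all names are illustrative, not from any real system):

```python
from abc import ABC, abstractmethod

class P2PNode(ABC):
    """Hypothetical interface for the four primitives."""

    @abstractmethod
    def join(self, bootstrap_addr: str) -> None:
        """Enter the P2P system via a known bootstrap address."""

    @abstractmethod
    def publish(self, filename: str) -> None:
        """Advertise a file so other peers can find it."""

    @abstractmethod
    def search(self, filename: str) -> list[str]:
        """Return addresses of peers that store the file."""

    @abstractmethod
    def fetch(self, filename: str, peer_addr: str) -> bytes:
        """Download the file from one specific peer."""
```

Each system below fills these in differently: Napster centralizes publish/search, Gnutella floods search, KaZaA floods among supernodes, and BitTorrent outsources search entirely and focuses on fetch.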

Page 10: Peer-to-Peer Networking

Publish and Search

Basic strategies:
• Centralized (Napster)
• Flood the query (Gnutella)
• Route the query (Chord)

Different tradeoffs depending on application: robustness, scalability, legal issues.

Page 11: Peer-to-Peer Networking

Napster: History

• In 1999, S. Fanning launches Napster
• Peaked at 1.5 million simultaneous users
• July 2001, Napster shuts down

Page 12: Peer-to-Peer Networking

Napster: Overview

Centralized database:
• Join: on startup, client contacts central server
• Publish: client reports its list of files to central server
• Search: query the server => server returns someone that stores the requested file
• Fetch: get the file directly from the peer
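A minimal sketch of this centralized database (a toy in-memory model with illustrative names, not Napster's actual wire protocol):

```python
class CentralIndex:
    """Napster-style central server state: filename -> peer addresses."""

    def __init__(self):
        self.index: dict[str, set[str]] = {}

    def publish(self, peer_addr: str, files: list[str]) -> None:
        # On join, a peer reports the list of files it stores.
        for f in files:
            self.index.setdefault(f, set()).add(peer_addr)

    def search(self, filename: str) -> set[str]:
        # Search is a single O(1) lookup on the server.
        return self.index.get(filename, set())

server = CentralIndex()
server.publish("123.2.21.23", ["X", "Y", "Z"])
print(server.search("X"))   # {'123.2.21.23'}; the fetch itself is peer-to-peer
```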

Page 13: Peer-to-Peer Networking

Napster: Publish

[Diagram: the peer at 123.2.21.23 tells the central server "I have X, Y, and Z!" by sending insert(X, 123.2.21.23), ...]

Page 14: Peer-to-Peer Networking

Napster: Search

[Diagram: a client asks the server "Where is file A?"; the query search(A) returns 123.2.0.18, and the client fetches A directly from that peer]

Page 15: Peer-to-Peer Networking

Napster: Discussion

Pros:
• Simple
• Search scope is O(1)
• Controllable (pro or con?)

Cons:
• Server maintains O(N) state
• Server does all processing
• Single point of failure

Page 16: Peer-to-Peer Networking


Gnutella: History

In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella

Soon many other clients: Bearshare, Morpheus, LimeWire, etc.

In 2001, many protocol enhancements including “ultrapeers”

Page 17: Peer-to-Peer Networking

Gnutella: Overview

Query flooding:
• Join: on startup, client contacts a few other nodes; these become its “neighbors”
• Publish: no need
• Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to sender
• Fetch: get the file directly from the peer
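A toy model of query flooding (illustrative classes; real Gnutella deduplicates queries by message ID and forwards them over sockets, which the recursive calls and shared `seen` set stand in for here):

```python
class Peer:
    """Toy Gnutella peer in an in-memory overlay."""

    def __init__(self, addr: str, files: set[str]):
        self.addr, self.files = addr, files
        self.neighbors: list["Peer"] = []

    def search(self, filename: str, ttl: int = 7,
               seen: set[str] | None = None) -> list[str]:
        # A TTL plus duplicate suppression keeps the flood finite
        # even when the overlay contains cycles.
        seen = set() if seen is None else seen
        if ttl == 0 or self.addr in seen:
            return []
        seen.add(self.addr)
        hits = [self.addr] if filename in self.files else []
        for n in self.neighbors:
            hits += n.search(filename, ttl - 1, seen)
        return hits

a, b, c = Peer("a", set()), Peer("b", set()), Peer("c", {"A"})
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
print(a.search("A"))   # ['c']: the file is found two hops away
```

Note how the worst case visits every reachable node: that is the O(N) search scope discussed two slides below.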

Page 18: Peer-to-Peer Networking

Gnutella: Search

[Diagram: a peer asks "Where is file A?"; the query floods through neighbors, and peers that have file A send replies back]

Page 19: Peer-to-Peer Networking

Gnutella: Discussion

Pros:
• Fully decentralized
• Search cost distributed

Cons:
• Search scope is O(N)
• Search time is O(???)
• Nodes leave often, network unstable

Page 20: Peer-to-Peer Networking


Aside: Search Time?

Page 21: Peer-to-Peer Networking

Aside: All Peers Equal?

[Diagram: an overlay mixing 56 kbps modem, 1.5 Mbps DSL, and 10 Mbps LAN peers; peer capabilities vary by orders of magnitude]

Page 22: Peer-to-Peer Networking

Aside: Network Resilience

[Figure: partial topology; 30% of nodes dying at random vs. a targeted 4% dying]

from Saroiu et al., MMCN 2002

Page 23: Peer-to-Peer Networking

KaZaA: History

• In 2001, KaZaA created by Dutch company Kazaa BV
• Single network called FastTrack used by other clients as well: Morpheus, giFT, etc.
• Eventually protocol changed so other clients could no longer talk to it
• Most popular file-sharing network today, with >10 million users (number varies)

Page 24: Peer-to-Peer Networking

KaZaA: Overview

“Smart” query flooding:
• Join: on startup, client contacts a “supernode”; it may at some point become one itself
• Publish: send list of files to supernode
• Search: send query to supernode; supernodes flood the query amongst themselves
• Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers
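A sketch of the two-tier idea (illustrative names only; FastTrack itself was a proprietary protocol):

```python
class Supernode:
    """Toy FastTrack-style supernode: indexes the files of its attached
    peers and floods queries only among supernodes."""

    def __init__(self):
        self.index: dict[str, set[str]] = {}       # filename -> peer addrs
        self.other_supernodes: list["Supernode"] = []

    def publish(self, peer_addr: str, files: list[str]) -> None:
        # Ordinary peers publish to their supernode, as in Napster.
        for f in files:
            self.index.setdefault(f, set()).add(peer_addr)

    def search(self, filename: str, seen: set[int] | None = None) -> set[str]:
        # Flooding is confined to the (much smaller) supernode tier.
        seen = set() if seen is None else seen
        if id(self) in seen:
            return set()
        seen.add(id(self))
        hits = set(self.index.get(filename, set()))
        for sn in self.other_supernodes:
            hits |= sn.search(filename, seen)
        return hits
```

The structure is roughly Napster's index at the edge plus Gnutella's flooding in the core, which is why, as the discussion slide notes, it still offers no real guarantees.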

Page 25: Peer-to-Peer Networking

KaZaA: Network Design

[Diagram: a two-tier overlay in which ordinary peers attach to “super nodes”, which form the flooding core]

Page 26: Peer-to-Peer Networking

KaZaA: File Insert

[Diagram: the peer at 123.2.21.23 tells its supernode "I have X!" by sending insert(X, 123.2.21.23), ...]

Page 27: Peer-to-Peer Networking

KaZaA: File Search

[Diagram: a peer asks its supernode "Where is file A?"; the query floods among supernodes, and replies search(A) --> 123.2.0.18 and search(A) --> 123.2.22.50 come back]

Page 28: Peer-to-Peer Networking

KaZaA: Discussion

Pros:
• Tries to take into account node heterogeneity: bandwidth, host computational resources, host availability (?)
• Rumored to take into account network locality

Cons:
• Mechanisms easy to circumvent
• Still no real guarantees on search scope or search time

Page 29: Peer-to-Peer Networking

P2P systems

• Napster: launched P2P; centralized index
• Gnutella: focus is simple sharing, using simple flooding
• KaZaA: more intelligent query routing
• BitTorrent: focus on download speed and fairness in sharing

Page 30: Peer-to-Peer Networking

BitTorrent: History

• In 2002, B. Cohen debuted BitTorrent
• Key motivation: popularity exhibits temporal locality (flash crowds); e.g., the Slashdot effect, CNN on 9/11, a new movie/game release
• Focused on efficient fetching, not searching: distribute the same file to all peers; single publisher, multiple downloaders
• Has some “real” publishers: Blizzard Entertainment using it to distribute the beta of their new game

Page 31: Peer-to-Peer Networking

BitTorrent: Overview

Swarming:
• Join: contact centralized “tracker” server, get a list of peers
• Publish: run a tracker server
• Search: out-of-band; e.g., use Google to find a tracker for the file you want
• Fetch: download chunks of the file from your peers; upload chunks you have to them
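A sketch of the tracker's role (a toy in-memory model; real trackers speak an HTTP announce protocol, and the names and reply size here are illustrative):

```python
import random

class Tracker:
    """Toy BitTorrent-style tracker: remembers swarm membership per
    torrent and hands each joining peer a sample of other peers."""

    def __init__(self, peers_per_reply: int = 50):
        self.swarms: dict[str, set[str]] = {}   # torrent id -> peer addrs
        self.k = peers_per_reply

    def announce(self, torrent_id: str, peer_addr: str) -> list[str]:
        swarm = self.swarms.setdefault(torrent_id, set())
        others = list(swarm - {peer_addr})
        swarm.add(peer_addr)
        # The tracker only introduces peers to each other; all chunk
        # exchange afterwards is strictly peer-to-peer.
        return random.sample(others, min(self.k, len(others)))
```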


Page 33: Peer-to-Peer Networking

BitTorrent: Sharing Strategy

Employ a “tit-for-tat” sharing strategy:
• “I’ll share with you if you share with me”
• Be optimistic: occasionally let freeloaders download
  • Otherwise no one would ever start!
  • Also allows you to discover better peers to download from when they reciprocate
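A sketch of the unchoking decision this implies (slot count and rates are made-up parameters; real clients re-run this choice periodically):

```python
import random

def choose_unchoked(upload_rates: dict[str, float], slots: int = 3) -> set[str]:
    """Tit-for-tat sketch: reciprocate with the peers currently uploading
    to us fastest, plus one random 'optimistic' slot so that newcomers
    (and the occasional freeloader) still receive data."""
    by_rate = sorted(upload_rates, key=upload_rates.get, reverse=True)
    unchoked = set(by_rate[:slots])
    rest = [p for p in upload_rates if p not in unchoked]
    if rest:
        unchoked.add(random.choice(rest))   # optimistic unchoke
    return unchoked

rates = {"p1": 90.0, "p2": 40.0, "p3": 10.0, "p4": 0.0, "p5": 0.0}
print(choose_unchoked(rates))   # top 3 uploaders plus one random other peer
```

The optimistic slot is what lets a brand-new peer with nothing to offer bootstrap into the swarm, and it doubles as a probe for better trading partners.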

Page 34: Peer-to-Peer Networking

BitTorrent: Summary

Pros:
• Works reasonably well in practice
• Gives peers incentive to share resources; avoids freeloaders

Cons:
• Central tracker server needed to bootstrap swarm (is this really necessary?)

Page 35: Peer-to-Peer Networking

DHT: History

• In 2000-2001, academic researchers said “we want to play too!”
• Motivation: frustrated by the popularity of all these “half-baked” P2P apps :) We can do better! (so we said)
• Guaranteed lookup success for files in the system
• Provable bounds on search time
• Provable scalability to millions of nodes
• Hot topic in networking ever since

Page 36: Peer-to-Peer Networking

DHT: Overview

Abstraction: a distributed “hash table” (DHT) data structure:
• put(id, item); item = get(id);

Implementation: nodes in the system form a distributed data structure
• Can be a ring, tree, hypercube, skip list, butterfly network, ...
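A minimal local sketch of the put/get abstraction over a ring (consistent hashing in a single process; a real DHT distributes this structure across nodes):

```python
import hashlib
from bisect import bisect_left

def h(key: str, bits: int = 32) -> int:
    """Hash any string into the DHT's id space."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % (1 << bits)

class ToyDHT:
    """Each id is owned by the first node clockwise from it on the ring."""

    def __init__(self, node_names: list[str]):
        self.ring = sorted(h(n) for n in node_names)
        self.store: dict[int, dict[int, object]] = {n: {} for n in self.ring}

    def _owner(self, key_id: int) -> int:
        i = bisect_left(self.ring, key_id)
        return self.ring[i % len(self.ring)]    # wrap around the ring

    def put(self, key: str, item: object) -> None:
        self.store[self._owner(h(key))][h(key)] = item

    def get(self, key: str) -> object:
        return self.store[self._owner(h(key))].get(h(key))

dht = ToyDHT([f"node{i}" for i in range(8)])
dht.put("song.mp3", "123.2.0.18 has it")
print(dht.get("song.mp3"))   # found on whichever node owns h('song.mp3')
```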

Page 37: Peer-to-Peer Networking

DHT: Overview (2)

Structured overlay routing:
• Join: on startup, contact a “bootstrap” node and integrate yourself into the distributed data structure; get a node id
• Publish: route the publication for the file id toward a close node id along the data structure
• Search: route a query for the file id toward a close node id; the data structure guarantees that the query will meet the publication
• Fetch: P2P fetching

Page 38: Peer-to-Peer Networking

Tapestry: a DHT-based P2P system focusing on fault-tolerance

Danfeng (Daphne) Yao

Page 39: Peer-to-Peer Networking

A very big picture: P2P networking

Observe:
• Dynamic nature of the computing environment
• Network expansion

Require:
• Robustness, or fault tolerance
• Scalability
• Self-administration, or self-organization

Page 40: Peer-to-Peer Networking

Goals to achieve in Tapestry

• Decentralization: routing uses local data
• Efficiency, scalability
• Robust routing mechanism: resilient to node failures
• An easy way to locate objects

Page 41: Peer-to-Peer Networking

IDs in Tapestry

• NodeID, ObjectID
• IDs are computed using a hash function; the hash function is a system-wide parameter
• Use IDs for routing: locating an object, finding a node
• A node stores its neighbors’ IDs
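A sketch of deriving such IDs by hashing (SHA-1 and the 8-digit octal format are assumptions chosen to match the slides' examples; the slide only says the hash function is a system-wide parameter):

```python
import hashlib

def tapestry_id(name: str, digits: int = 8, base: int = 8) -> str:
    """Map a node address or object name to a fixed-length digit string."""
    v = int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")
    out = []
    for _ in range(digits):
        v, d = divmod(v, base)
        out.append(str(d))
    return "".join(reversed(out))

# Nodes and objects live in the same id space:
print(tapestry_id("158.130.6.254:4598"), tapestry_id("object-O"))
```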

Page 42: Peer-to-Peer Networking

Tapestry routing example: 0325 --> 4598

Suffix-based routing
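A sketch of the per-hop rule (a toy over a flat list of known nodes; the real scheme consults the per-level neighbor map shown on a later slide):

```python
def suffix_match(a: str, b: str) -> int:
    """Length of the common suffix of two equal-length digit strings."""
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def next_hop(current: str, target: str, known: list[str]) -> str:
    """Each hop must match at least one more trailing digit of the
    target, so a route resolves one digit per hop: ...8, ..98, .598, 4598."""
    need = suffix_match(current, target) + 1
    for node in known:
        if suffix_match(node, target) >= need:
            return node
    return current   # no strictly closer node is known
```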

Page 43: Peer-to-Peer Networking

Each object is assigned a root node

• Root node: a unique node for object O with the longest matching suffix
• The root knows where O is
• Publish: node S routes a msg <objectID(O), nodeID(S)> to the root of O; if multiple copies exist, the closest location is stored
• Location query: a msg for O is routed towards O’s root
• How to choose the root node for an object?

[Diagram: node S, holding a copy of O, publishes toward the node whose rootId matches O]

Page 44: Peer-to-Peer Networking

Surrogate routing: unique mapping without global knowledge

Problem: how to choose a unique root node for an object deterministically?

Surrogate routing:
• Choose as the object’s root (surrogate) the node whose nodeId is closest to the objectId
• Small overhead, but no need for global knowledge
• The number of additional hops is small

Page 45: Peer-to-Peer Networking

Surrogate example: find the root node for objectId = 6534

[Diagram: the message is routed through nodes 2234, 1114, and 2534 toward 6534; since no node 6534 exists (the routing slot for 6534 is empty at each step), the message is deflected to 7534, which becomes the root]
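A sketch of the deflection rule (the exact tie-breaking order is an assumption; the essential property is that every node applies the same deterministic rule, so all routes converge on the same surrogate without global knowledge):

```python
def surrogate_next(wanted_digit: int, level_entries: dict[int, str],
                   base: int = 8) -> str:
    """If the routing slot for the wanted digit is empty, try the next
    higher digit (mod base) until a filled slot is found."""
    for k in range(base):
        d = (wanted_digit + k) % base
        if d in level_entries:
            return level_entries[d]
    raise LookupError("routing level has no entries at all")

# e.g. a level holding only {7: "7534"}, with wanted digit 6 (objectId 6534):
print(surrogate_next(6, {7: "7534"}))   # '7534' becomes the surrogate root
```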

Page 46: Peer-to-Peer Networking

Tapestry: adaptable, fault-resilient, self-managing

Basic routing and location:
• Backpointer list in the neighbor map
• Multiple <objectId(O), nodeId(S)> mappings are stored
• More semantic flexibility: a selection operator for choosing among returned objects

Page 47: Peer-to-Peer Networking

A single Tapestry node

Neighbor map for node “1732” (octal), one routing level per matched suffix length:

Level 1 (suffix “2”):   xxx0  xxx1  1732  xxx3  xxx4  xxx5  xxx6  xxx7
Level 2 (suffix “32”):  xx02  xx12  xx22  1732  xx42  xx52  xx62  xx72
Level 3 (suffix “732”): x032  x132  x232  x332  x432  x532  x632  1732
Level 4 (full id):      0732  1732  2732  3732  4732  5732  6732  7732

(At each level, the slot matching the node’s own next digit holds the node itself.)

Per-node state also includes:
• Object location pointers: <objectId, nodeId>
• Hotspot monitor: <objId, nodeId, freq>
• Object store
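A sketch of this per-node state (field names are illustrative; the level/digit layout follows the figure above):

```python
class TapestryNode:
    """One node's state: a neighbor map with one routing level per
    matched suffix length, object-location pointers, and a local store."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        # neighbor_map[level][digit] = a node sharing `level` trailing
        # digits with us and having `digit` in the next position.
        self.neighbor_map: list[dict[int, str]] = [{} for _ in node_id]
        self.location_ptrs: dict[str, str] = {}    # objectId -> nodeId
        self.object_store: dict[str, bytes] = {}

    def add_neighbor(self, other_id: str) -> None:
        # File the neighbor under the level at which its suffix diverges.
        level = 0
        while (level < len(other_id)
               and other_id[-1 - level] == self.node_id[-1 - level]):
            level += 1
        if level < len(other_id):
            digit = int(other_id[-1 - level])
            self.neighbor_map[level].setdefault(digit, other_id)

node = TapestryNode("1732")
node.add_neighbor("4598")    # differs in the last digit: level 0, digit 8
node.add_neighbor("4632")    # shares suffix "32": level 2, digit 6
```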

Page 48: Peer-to-Peer Networking

Fault-tolerant routing: detect, operate, and recover

Expected faults:
• Server outages (high load, hw/sw faults)
• Link failures (router hw/sw faults)
• Neighbor table corruption at the server

Detect: TCP timeout; heartbeats to nodes on the backpointer list
Operate under faults: backup neighbors
Second chance for a failed server: probing msgs

Page 49: Peer-to-Peer Networking

Fault-tolerant location: avoid a single point of failure

Multiple roots for each object:
• rootId = f(objectId + salt)
• Redundant, but reliable

• Storage servers periodically republish location info
• Cached location info on routers times out
• New and recovered objects periodically advertise their location info
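A sketch of rootId = f(objectId + salt) (the salt scheme and digit rendering are assumptions):

```python
import hashlib

def root_ids(object_id: str, n_roots: int = 3, digits: int = 8) -> list[str]:
    """Hashing the object id with several fixed salts yields several
    independent root ids, so losing one root node does not make the
    object unfindable."""
    roots = []
    for salt in range(n_roots):
        v = int.from_bytes(
            hashlib.sha1(f"{object_id}:{salt}".encode()).digest(), "big")
        roots.append("".join(str(v // 8**i % 8) for i in range(digits)))
    return roots

print(root_ids("6534"))   # three (almost surely distinct) octal root ids
```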

Page 50: Peer-to-Peer Networking

Dynamic node insertion

Populate the new node’s neighbor maps:
• Route messages to its own nodeId
• Copy and optimize neighbor maps

Inform relevant nodes of its entries to update their neighbor maps

Page 51: Peer-to-Peer Networking

Dynamic deletion

• Inform relevant nodes using backpointers
• Or rely on soft state (simply stop sending heartbeats)
• In general, Tapestry expects a small number of dynamic insertions/deletions

Page 52: Peer-to-Peer Networking

Fault handling outline

Design choice: soft state for graceful fault recovery
• Soft-state caches: updated by periodic refreshment msgs, or purged if no such msgs are received
• Faults are expected in Tapestry

• Fault-tolerant routing
• Fault-tolerant location
• Surrogate routing
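A sketch of such a soft-state cache (the TTL value is illustrative):

```python
import time

class SoftStateCache:
    """Entries survive only while periodic refresh messages keep arriving;
    a missed refresh lets the entry expire, so stale state disappears
    without any explicit invalidation protocol."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self.entries: dict[str, tuple[str, float]] = {}  # objId -> (nodeId, expiry)

    def refresh(self, obj_id: str, node_id: str) -> None:
        # Called by the periodic republish messages from storage servers.
        self.entries[obj_id] = (node_id, time.monotonic() + self.ttl)

    def lookup(self, obj_id: str) -> str | None:
        hit = self.entries.get(obj_id)
        if hit is None or hit[1] < time.monotonic():
            self.entries.pop(obj_id, None)   # purge expired state
            return None
        return hit[0]
```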

Page 53: Peer-to-Peer Networking

Tapestry Conclusions

• Decentralized location and routing
• Distributed algorithms for object-root mapping, node insertion/deletion
• Fault handling with redundancy
• Per-node routing table size: b * log_b(N), where N = size of the namespace
• Find an object in log_b(n) overlay hops, where n = # of physical nodes
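To make the bounds concrete, a worked example (the sizes b = 8, N = 8^8, and n = 10^6 are assumptions for illustration, matching the 8-digit octal IDs used earlier):

```latex
\text{routing table} = b \log_b N = 8 \cdot \log_8 8^{8} = 8 \cdot 8 = 64 \text{ entries per node}
\qquad
\text{lookup} = O(\log_b n) = \log_8 10^{6} \approx 6.6 \text{ overlay hops}
```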