Brief Overview of Academic Research on P2P
Pei Cao

Post on 23-Feb-2016

Page 1: Brief Overview of Academic Research on P2P

Brief Overview of Academic Research on P2P
Pei Cao

Page 2: Brief Overview of Academic Research on P2P

Relevant Conferences
• IPTPS (International Workshop on Peer-to-Peer Systems)
• ICDCS (IEEE International Conference on Distributed Computing Systems)
• NSDI (USENIX Symposium on Networked Systems Design and Implementation)
• PODC (ACM Symposium on Principles of Distributed Computing)
• SIGCOMM

Page 3: Brief Overview of Academic Research on P2P

Areas of Research Focus
• Gnutella-Inspired
  • The “Directory Service” Problem
• BitTorrent-Inspired
  • The “File Distribution” Problem
• P2P Live Streaming
• P2P and Net Neutrality

Page 4: Brief Overview of Academic Research on P2P

Gnutella-Inspired Research Studies

Page 5: Brief Overview of Academic Research on P2P

The Applications and The Problems
• Napster, Gnutella, KaZaa/FastTrack, Skype
• Look for a particular content/object, and find which peer has it: the “directory service” problem
  • Challenge: how to offer a scalable directory service in a fully decentralized fashion
• Arrange direct transfer from the peer: the “punch a hole in the firewall” problem

Page 6: Brief Overview of Academic Research on P2P

Decentralized Directory Services
• Structured Networks
  • DHT (Distributed Hash Tables)
    • Very active research area from 2001 to 2004
    • Limitation: lookup by keys only
  • Multi-Attribute DHT
    • Limited support for query-based lookup
• Unstructured Networks
  • Various improvements to basic flooding-based schemes

Page 7: Brief Overview of Academic Research on P2P

What Is a DHT?
• Single-node hash table:
    key = Hash(name)
    put(key, value)
    get(key) -> value
• How do I do this across millions of hosts on the Internet?
  • Distributed Hash Table

Page 8: Brief Overview of Academic Research on P2P

Distributed Hash Tables
• Chord
• CAN
• Pastry
• Tapestry
• Symphony
• Koorde
• etc.

Page 9: Brief Overview of Academic Research on P2P

The Problem
[Figure: nodes N1 to N6 connected through the Internet. A publisher issues Put(key=“title”, value=file data…); a client issues Get(key=“title”).]
• Key placement
• Routing to find key

Page 10: Brief Overview of Academic Research on P2P

Key Placement
• Traditional hashing
  • Nodes numbered from 1 to N
  • Key is placed at node (hash(key) % N)
• Why traditional hashing has problems
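One well-known problem with modulo placement is that changing N moves almost every key to a different node. The sketch below measures this, assuming SHA-1 as the hash and a hypothetical set of 10,000 key names; nodes are numbered from 0 here because `% N` yields values 0 to N-1.

```python
import hashlib

def stable_hash(key: str) -> int:
    """Hash a key to a large integer (SHA-1 is an assumption for this sketch)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")

def node_for(key: str, n_nodes: int) -> int:
    """Traditional placement: hash(key) % N, with nodes numbered 0..N-1."""
    return stable_hash(key) % n_nodes

def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose node changes when the node count changes."""
    moved = sum(1 for k in keys if node_for(k, old_n) != node_for(k, new_n))
    return moved / len(keys)

keys = [f"file-{i}" for i in range(10_000)]   # hypothetical key names
frac = fraction_moved(keys, 10, 11)           # one node joins a 10-node system
print(f"{frac:.1%} of keys change node when N goes from 10 to 11")
```

A key keeps its node only when its hash gives the same remainder modulo 10 and modulo 11, which happens for about 1 in 11 hash values, so roughly 90% of keys move.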

Page 11: Brief Overview of Academic Research on P2P

Consistent Hashing: IDs
• Key identifier = SHA-1(key)
• Node identifier = SHA-1(IP address)
• SHA-1 distributes both uniformly
• How to map key IDs to node IDs?

Page 12: Brief Overview of Academic Research on P2P

Consistent Hashing: Placement
• A key is stored at its successor: the node with the next higher ID
[Figure: circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80. Legend: K5 = key 5, N105 = node 105.]
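The successor rule can be sketched in a few lines. This example uses the node and key IDs from the figure (7-bit ID space, nodes 32, 90, 105); the extra key 110 is an assumption added to show wrap-around, and the function names are illustrative.

```python
from bisect import bisect_left

ID_BITS = 7                  # 7-bit circular ID space, as in the figure
RING_SIZE = 2 ** ID_BITS

def successor(node_ids, key_id):
    """Return the first node whose ID is >= key_id, wrapping around the ring."""
    ring = sorted(node_ids)
    idx = bisect_left(ring, key_id % RING_SIZE)
    return ring[idx % len(ring)]   # idx == len(ring) wraps to the lowest node

nodes = [32, 90, 105]
placement = {k: successor(nodes, k) for k in (5, 20, 80, 110)}
for key, node in placement.items():
    print(f"K{key} -> N{node}")
```

Under this rule K5 and K20 land on N32, K80 lands on N90, and K110 wraps past the top of the ID space to N32. When a node joins or leaves, only the keys between it and its predecessor change owner.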

Page 13: Brief Overview of Academic Research on P2P

Basic Lookup
[Figure: ring with nodes N10, N32, N60, N90, N105, N120 and key K80. A query “Where is key 80?” is passed around the ring until the answer “N90 has K80” comes back.]

Page 14: Brief Overview of Academic Research on P2P

“Finger Table” Allows log(N)-time Lookups
[Figure: node N80 with fingers spanning 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring.]

Page 15: Brief Overview of Academic Research on P2P

Finger i Points to Successor of $n + 2^i$
[Figure: node N80 with fingers spanning 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring; the finger aimed at ID 112 points to N120.]

Page 16: Brief Overview of Academic Research on P2P

Lookups Take O(log(N)) Hops
[Figure: ring with nodes N5, N10, N20, N32, N60, N80, N99, N110; a Lookup(K19) is routed across the ring toward key K19.]
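The finger rule and the hop-by-hop lookup can be sketched as a small simulation. This is a simplified, global-view model (every function sees the full node list, and nothing is sent over a network), not the Chord implementation itself. The node IDs come from the figure; starting the lookup at N80 and the function names are assumptions.

```python
ID_BITS = 7                      # 7-bit ID space, as in the placement slide
RING_SIZE = 2 ** ID_BITS
RING_NODES = [5, 10, 20, 32, 60, 80, 99, 110]   # node IDs from the figure

def successor(nodes, ident):
    """First node ID >= ident on the ring, wrapping past the top."""
    ident %= RING_SIZE
    for n in sorted(nodes):
        if n >= ident:
            return n
    return min(nodes)

def between(x, a, b, inclusive_right=False):
    """True if x lies in the ring interval (a, b), or (a, b] if requested."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b        # interval wraps through 0

def fingers(nodes, n):
    """Finger i of node n is successor(n + 2^i)."""
    return [successor(nodes, n + 2 ** i) for i in range(ID_BITS)]

def lookup(nodes, start, key):
    """Route a lookup hop by hop; return (owner, nodes that handled it)."""
    node, path = start, [start]
    while True:
        succ = successor(nodes, node + 1)
        if between(key, node, succ, inclusive_right=True):
            return succ, path
        # Forward to the farthest finger that still precedes the key.
        node = next(f for f in reversed(fingers(nodes, node))
                    if between(f, node, key))
        path.append(node)

owner, path = lookup(RING_NODES, 80, 19)
print("fingers of N80:", fingers(RING_NODES, 80))
print("path:", path, "owner:", owner)
```

Each hop at least halves the remaining ID distance to the key, which is where the O(log N) hop bound comes from.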

Page 17: Brief Overview of Academic Research on P2P

Chord Lookup Algorithm Properties
• Interface: lookup(key) → IP address
• Efficient: O(log N) messages per lookup
  • N is the total number of servers
• Scalable: O(log N) state per node
• Robust: survives massive failures
• Simple to analyze

Page 18: Brief Overview of Academic Research on P2P

Related Studies on DHTs
• Many variations of DHTs
  • Different ways to choose the fingers
  • Ways to make it more robust
  • Ways to make it more network efficient
• Studies of different DHTs
  • What happens when peers leave (a.k.a. churn)?
• Applications built using DHTs
  • Tracker-less BitTorrent
  • Beehive: a P2P-based DNS system

Page 19: Brief Overview of Academic Research on P2P

Directory Lookups: Unstructured Networks
• Example: Gnutella
• Support more flexible queries
  • Typically, precise “name” search is a small portion of all queries
• Simplicity
• High resilience against node failures
• Problems: Scalability
  • Flooding: # of messages ~ O(N*E)

Page 20: Brief Overview of Academic Research on P2P

Flooding-Based Searches
[Figure: a query flooding outward from node 1 to nodes 2, 3, 4, and from there to nodes 5, 6, 7, 8, and so on.]
• Duplication increases as TTL increases in flooding
• Worst case: a node A is interrupted by N * q * degree(A) messages

Page 21: Brief Overview of Academic Research on P2P

Problems with Simple TTL-Based Flooding
• Hard to choose TTL:
  • For objects that are widely present in the network, small TTLs suffice
  • For objects that are rare in the network, large TTLs are necessary
• Number of query messages grows exponentially as TTL grows
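The growth in messages and duplicates can be observed with a small simulation. This is a sketch under stated assumptions: the topology is a hypothetical 200-node circulant graph (not a measured Gnutella topology), flooding proceeds in synchronous rounds, and a node forwards a query only the first time it sees it.

```python
def circulant_graph(n, offsets):
    """Hypothetical test topology: node i links to i +/- d for each offset d."""
    return {i: sorted({(i + d) % n for d in offsets} |
                      {(i - d) % n for d in offsets})
            for i in range(n)}

def flood(graph, source, ttl):
    """Flood a query for `ttl` hops.

    Returns (messages_sent, duplicate_deliveries, nodes_reached).
    """
    seen = {source}
    frontier = [(source, None)]      # (node, neighbor the query came from)
    messages = duplicates = 0
    for _ in range(ttl):
        next_frontier = []
        for node, came_from in frontier:
            for nbr in graph[node]:
                if nbr == came_from:
                    continue         # do not send the query straight back
                messages += 1
                if nbr in seen:
                    duplicates += 1  # neighbor already has this query
                else:
                    seen.add(nbr)
                    next_frontier.append((nbr, node))
        frontier = next_frontier
    return messages, duplicates, len(seen)

graph = circulant_graph(200, (1, 5, 17))
for ttl in range(1, 7):
    msgs, dups, reached = flood(graph, 0, ttl)
    print(f"TTL={ttl}: messages={msgs} duplicates={dups} reached={reached}")
```

On a finite graph the growth eventually levels off once every node has been reached, at which point nearly all additional messages are duplicates.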

Page 22: Brief Overview of Academic Research on P2P

Idea #1: Adaptively Adjust TTL
• “Expanding Ring”
  • Multiple floods: start with TTL=1; increment TTL by 2 each time until search succeeds
• Success varies by network topology
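A minimal sketch of expanding ring, assuming each flood reaches exactly the nodes within `ttl` hops and that cost is counted as nodes contacted per flood. The path-graph topology, the `max_ttl` cutoff, and the function names are assumptions for illustration.

```python
from collections import deque

def nodes_within(graph, source, ttl):
    """Set of nodes reachable from source in at most ttl hops (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == ttl:
            continue                 # TTL exhausted on this branch
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)

def expanding_ring(graph, source, holders, max_ttl=15, step=2):
    """Flood with TTL=1, then TTL+step, ... until a holder is reached.

    Returns (ttl_that_succeeded or None, total nodes contacted over all floods).
    """
    ttl, cost = 1, 0
    while ttl <= max_ttl:
        reached = nodes_within(graph, source, ttl)
        cost += len(reached) - 1     # every flood re-contacts the inner ring
        if reached & holders:
            return ttl, cost
        ttl += step
    return None, cost

# Hypothetical topology: a 10-node path 0-1-2-...-9, object held by node 4.
path_graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
result = expanding_ring(path_graph, source=0, holders={4})
print(result)
```

Each retry repeats the work of the previous floods, so the scheme pays off mainly when most objects are found at small TTLs.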

Page 23: Brief Overview of Academic Research on P2P

Idea #2: Random Walk
• Simple random walk
  • takes too long to find anything!
• Multiple-walker random walk
  • N agents, each walking T steps, visit as many nodes as 1 agent walking N*T steps
  • When to terminate the search: check back with the query originator once every C steps
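The multiple-walker scheme with periodic check-back can be sketched as follows. This is a simplified lockstep model with assumed details: a walker that hits a holder reports and stops, every other walker asks the originator once every `check_every` steps whether to continue, and each move or check-back counts as one message. The parameter names and the ring topology are illustrative.

```python
import random

def k_walker_search(graph, source, holders, walkers=8, check_every=4,
                    max_steps=1000, rng=None):
    """Multiple-walker random walk; returns (holder found or None, messages)."""
    rng = rng or random.Random()
    positions = [source] * walkers       # walkers still searching
    found = None
    messages = 0
    for step in range(1, max_steps + 1):
        still_walking = []
        for pos in positions:
            pos = rng.choice(graph[pos])  # move to a random neighbor
            messages += 1
            if pos in holders:
                found = pos               # this walker reports the hit and stops
            else:
                still_walking.append(pos)
        positions = still_walking
        if not positions:
            break
        if step % check_every == 0:
            messages += len(positions)    # each walker checks back once
            if found is not None:
                break                     # originator tells walkers to stop
    return found, messages

# Hypothetical topology: a 50-node ring with the object at node 25.
ring = {i: [(i - 1) % 50, (i + 1) % 50] for i in range(50)}
found, messages = k_walker_search(ring, 0, {25}, rng=random.Random(1))
print(found, messages)
```

A larger `check_every` saves check-back messages but lets walkers keep running longer after another walker has already succeeded.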

Page 24: Brief Overview of Academic Research on P2P

Flexible Replication
• In unstructured systems, search success is essentially about coverage: visiting enough nodes to probabilistically find the object => replication density matters
• Limited node storage => what’s the optimal replication density distribution?
  • In Gnutella, only nodes who query an object store it => $r_i \propto p_i$ (where $r_i$ is the replication density of object $i$ and $p_i$ is how often it is queried)
  • What if we have different replication strategies?

Page 25: Brief Overview of Academic Research on P2P

Optimal $r_i$ Distribution
• Goal: minimize $\sum_i (p_i / r_i)$, where $\sum_i r_i = R$
• Calculation:
  • introduce Lagrange multiplier $\lambda$, find $r_i$ and $\lambda$ that minimize:
      $\sum_i (p_i / r_i) + \lambda \, (\sum_i r_i - R)$
      => $\lambda - p_i / r_i^2 = 0$ for all $i$
      => $r_i \propto \sqrt{p_i}$

Page 26: Brief Overview of Academic Research on P2P

Square-Root Distribution
• General principle: to minimize $\sum_i (p_i / r_i)$ under constraint $\sum_i r_i = R$, make $r_i$ proportional to the square root of $p_i$
• Other application examples:
  • Bandwidth allocation to minimize expected download times
  • Server load balancing to minimize expected request latency
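The principle can be checked numerically by comparing three ways of dividing the same replica budget R. The popularity vector and the budget below are hypothetical values chosen for illustration; fractional replica counts are allowed to keep the arithmetic simple.

```python
import math

def expected_cost(p, r):
    """The objective from the slides: sum over i of p_i / r_i."""
    return sum(pi / ri for pi, ri in zip(p, r))

def allocate(weights, total):
    """Split `total` replicas in proportion to `weights`."""
    s = sum(weights)
    return [total * w / s for w in weights]

p = [0.5, 0.25, 0.125, 0.0625, 0.0625]   # hypothetical query popularity (sums to 1)
R = 100.0                                 # hypothetical total replica budget

costs = {
    "uniform": expected_cost(p, allocate([1.0] * len(p), R)),
    "proportional": expected_cost(p, allocate(p, R)),
    "square_root": expected_cost(p, allocate([math.sqrt(x) for x in p], R)),
}
for name, cost in costs.items():
    print(f"{name:>12}: {cost:.5f}")
```

Uniform and proportional allocation give the same cost, m/R for m objects, while the square-root allocation gives $(\sum_i \sqrt{p_i})^2 / R$, which is never larger.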

Page 27: Brief Overview of Academic Research on P2P

Achieving Square-Root Distribution
• Suggestions from some heuristics
  • Store an object at a number of nodes that is proportional to the number of nodes visited in order to find the object
  • Each node uses random replacement
• Two implementations:
  • Path replication: store the object along the path of a successful “walk”
  • Random replication: store the object randomly among nodes visited by the agents

Page 28: Brief Overview of Academic Research on P2P

KaZaa
• Use Supernodes
• Regular Nodes : Supernodes = 100 : 1
• Simple way to scale the system by a factor of 100

Page 29: Brief Overview of Academic Research on P2P

BitTorrent-Inspired Research Studies

Page 30: Brief Overview of Academic Research on P2P

Modeling and Understanding BitTorrent
• Analysis based on modeling
  • View it as a type of Gossip Algorithm
    • Usually do not model the Tit-for-Tat aspects
    • Assume perfectly connected networks
  • Statistical modeling techniques
  • Mostly published in PODC or SIGMETRICS
• Simulation Studies
  • Different assumptions of bottlenecks
  • Varying details of the modeling of the data transfer
  • Published in ICDCS and SIGCOMM

Page 31: Brief Overview of Academic Research on P2P

Studies on Effect of BitTorrent on ISPs
• Observation: P2P contributes to cross-ISP traffic
  • SIGCOMM 2006 publication on studies in Japan backbone traffic
• Attempts to improve network locality of BitTorrent-like applications
  • ICDCS 2006 publication
• Academic P2P file sharing systems
  • Bullet, Julia, etc.

Page 32: Brief Overview of Academic Research on P2P

Techniques to Alleviate the “Last Missing Piece” Problem
• Apply Network Coding to pieces exchanged between peers
  • Pablo Rodriguez Rodriguez, Microsoft Research (recently moved to Telefonica Research)
• Use a different piece-replication strategy
  • Dahlia Malkhi, Microsoft Research
  • “On Collaborative Content Distribution Using Multi-Message Gossip”
  • Associate “age” with file segments

Page 33: Brief Overview of Academic Research on P2P

Network Coding
• Main Feature
  • Allowing intermediate nodes to encode packets
  • Making optimal use of the available network resources
• Similar Technique: Erasure Codes
  • Reconstructing the original content of size n from roughly a subset of any n symbols from a large universe of encoded symbols

Page 34: Brief Overview of Academic Research on P2P

Network Coding in P2P: The Model
• Server
  • Dividing the file into k blocks
  • Uploading blocks at random to different clients
• Clients (Users)
  • Collaborating with each other to assemble the blocks and reconstruct the original file
  • Exchanging information and data with only a small subset of others (neighbors)
  • Symmetric neighborhood and links

Page 35: Brief Overview of Academic Research on P2P

Network Coding in P2P
• Assume a node with blocks B1, B2, …, Bk
• Pick random numbers C1, C2, …, Ck
• Construct new block E = C1 * B1 + C2 * B2 + … + Ck * Bk
• Send E and (C1, C2, …, Ck) to neighbor
• Decoding: collect enough linearly independent E’s, solve the linear system
• If all nodes pick vector C randomly, chances are high that after receiving ~k blocks, a node can recover B1 through Bk
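The encode and decode steps above can be sketched with arithmetic modulo a small prime. The slides do not say which number system is used; the prime field GF(257) here is an assumption chosen so that byte values fit and every nonzero coefficient has an inverse. Function names and the sample blocks are illustrative.

```python
import random

P = 257   # assumed prime modulus; all arithmetic is done mod P

def encode(blocks, rng):
    """Return (C, E) where E = C1*B1 + ... + Ck*Bk, symbol by symbol, mod P."""
    coeffs = [rng.randrange(P) for _ in blocks]
    coded = [sum(c * b[j] for c, b in zip(coeffs, blocks)) % P
             for j in range(len(blocks[0]))]
    return coeffs, coded

def decode(packets, k):
    """Gauss-Jordan elimination mod P on rows [C | E].

    Returns the k original blocks, or None if the packets received so far
    do not contain k linearly independent coefficient vectors.
    """
    rows = [list(c) + list(e) for c, e in packets]
    for col in range(k):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)          # modular inverse (Python 3.8+)
        rows[col] = [(x * inv) % P for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % P for x, y in zip(rows[r], rows[col])]
    return [rows[i][k:] for i in range(k)]

rng = random.Random(7)
blocks = [list(b"peer"), list(b"to-p"), list(b"eer!")]   # k = 3 sample blocks
packets, recovered = [], None
while recovered is None:            # keep receiving until decoding succeeds
    packets.append(encode(blocks, rng))
    recovered = decode(packets, len(blocks))
print(len(packets), "coded packets needed;", bytes(sum(recovered, [])))
```

With random coefficients, k packets are almost always enough, and any extra packet that happens to be linearly dependent simply reduces to a zero row during elimination.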

Page 36: Brief Overview of Academic Research on P2P

P2P Live Streaming

Page 37: Brief Overview of Academic Research on P2P

Motivations
• Internet Applications:
  • PPLive, PPStream, etc.
• Challenge: QoS Issues
  • Raw bandwidth constraints
    • Example: PPLive utilizes the significant bandwidth disparity between “University nodes” and “Residential nodes”
  • Satisfying demand of content publishers

Page 38: Brief Overview of Academic Research on P2P

P2P Live Streaming Can’t Stand on Its Own
• P2P as a complement to IP-Multicast
  • Used where IP-Multicast isn’t enabled
• P2P as a way to reduce server load
  • By sourcing parts of streams from peers, server load might be reduced by 10%
• P2P as a way to reduce backbone bandwidth requirements
  • When core network bandwidth isn’t sufficient

Page 39: Brief Overview of Academic Research on P2P

P2P and Net-Neutrality

Page 40: Brief Overview of Academic Research on P2P

It’s All TCP’s Fault
• TCP: per-flow fairness
• Browsers
  • 2-4 TCP flows per web server
  • Contact a few web servers at a time
  • Short flows
• P2P applications:
  • Much higher number of TCP connections
  • Many more endpoints
  • Long flows
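The consequence of per-flow fairness is simple arithmetic: an application's share of a bottleneck scales with how many flows it opens. The sketch below is an idealized model that assumes all flows are long-lived, share one bottleneck, and converge to equal rates; the link capacity and flow counts are hypothetical.

```python
def bandwidth_shares(link_capacity, flows_per_app):
    """Idealized per-flow fairness: every flow gets an equal slice of the link."""
    total_flows = sum(flows_per_app.values())
    per_flow = link_capacity / total_flows
    return {app: n * per_flow for app, n in flows_per_app.items()}

# Hypothetical example: a 100 Mbit/s bottleneck shared by a browser with
# 4 flows and a P2P client with 36 flows.
shares = bandwidth_shares(100.0, {"browser": 4, "p2p": 36})
for app, mbps in shares.items():
    print(f"{app}: {mbps:.1f} Mbit/s")
```

In this example the P2P client receives nine times the browser's bandwidth even though each individual flow is treated equally. Real browser flows are also short, so in practice they may not even reach their fair share.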

Page 41: Brief Overview of Academic Research on P2P

When and How to Apply Traffic Shaping
• Current practice: application recognition
• Needs:
  • An application-agnostic way to trigger the traffic shaping
  • A clear statement to users on what happens