searching in unstructured networks

21
Searching in Unstructured Networks Joining Theory with P-P2P

Upload: macy

Post on 19-Jan-2016

32 views

Category:

Documents


0 download

DESCRIPTION

Searching in Unstructured Networks. Joining Theory with P-P2P. What is P2P?. Distributed system where all nodes play an equal role. In the “Internet world” homogeneity cannot be guaranteed (ie. Bandwidth, storage, processing power). - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Searching in Unstructured Networks

Searching in Unstructured Networks

Joining Theory with P-P2P

Page 2: Searching in Unstructured Networks

What is P2P?

• Distributed system where all nodes play an equal role.

• In the “Internet world” homogeneity cannot be guaranteed (ie. Bandwidth, storage, processing power).

• Any such system requires an indexing, or searching scheme in order to function.

Page 3: Searching in Unstructured Networks

P2P Architectures

• Centralized (Napster)

• Decentralized with Structure (all DHT architectures such as CAN, Chord, etc)

• Decentralized and Unstructured (Gnutella, Freenet). I call these...

Page 4: Searching in Unstructured Networks

Popular Peer to Peer (P-P2P)

• No centralized directory.

• No centralized control over object placement.

• No centralized control over network topology.

• Only most loose of guarantees and weakest assumptions can be made.

Page 5: Searching in Unstructured Networks

Current Search Strategies

Flooding (BFS)• Reaches the most

nodes (w/in depth D)• Returns complete

results• Fastest brute-force

search• Does not scale

Crawl (DFS)• Successful searches

terminate• Minimal resources

• Poor response time: time increases exponential in D

Page 6: Searching in Unstructured Networks

Outline

• Current Solutions

• A better approach by example

• A good first step

• Final thoughts

Page 7: Searching in Unstructured Networks

Current Solutions

An Heuristic Approach to Heuristics

Page 8: Searching in Unstructured Networks

Directed BFS (DBFS)

• Keep state on search results returned from neighbours.

• Send query only to those neighbours with most successful “history.”

• Subset of neighbours revert to standard BFS (flooding).

• “Intelligent” selection of 1 neighbour produces similar success as BFS.

Page 9: Searching in Unstructured Networks

Iterative Deepening

• Global policy, P={a, b, c,…,W}– AI technique called search over state space.– Multiple BFSs with successively larger D.– Requires globally unique identifier

Reduces processing?Increases bandwidth?

No analysis or intuition: where’s the beef?

Page 10: Searching in Unstructured Networks

Hierarchical Approach

• Nodes with higher bandwidth and processing are designated “super-nodes”– In a graph, these nodes would have higher

weight (or size)

• Connected via 2 types of edges:– large weight edges to other super-nodes– smaller weight edges to smaller nodes.

• Founded on experience (eg. DNS system)

Page 11: Searching in Unstructured Networks

Lead by Example:

Replication in unstructured networks

Page 12: Searching in Unstructured Networks

Current Replication Strategies

• Owner Replication: objects replicated by nodes requesting object

• Path Replication: object replicated at every node along path from object origin to object request.

Page 13: Searching in Unstructured Networks

The Problem

In an unstructured network, how many copies of each object should there be in order to

minimize the cost of a search

(assuming fixed storage)?

Page 14: Searching in Unstructured Networks

First start with the simplest case

• m objects, n nodes

• Each object, i, replicated at ri random sites.

• Object i requested with rate qi ,

• Probability of successful search in k probes:

i irR

1

1]Pr[

k

ii

n

r

n

rk

1i iq

Page 15: Searching in Unstructured Networks

The simple case (con’t)

• Average search size of i:

• The average messaging overhead during a query can be represented by, A, the average search size over all objects:

ii r

nA

i

i

iii i r

qnAqA

Page 16: Searching in Unstructured Networks

Replication Strategies

• Assume average replicas per site

• Uniform:

mn

R

m

Rri ii Rqr •Proportional:

i

alproportioni

i

iiuniform A

Rq

qn

mmqA

Page 17: Searching in Unstructured Networks

Square-root Replication

• Put as a minimization problem, what is the optimal allocation of replicas so that A is minimized [Kleinrock, 1976]?

• This occurs when

• So,

i i

iiq

Rqr where,

21 i ioptimal qA

Page 18: Searching in Unstructured Networks

Searching in Unstructured Networks:

Revisited

Page 19: Searching in Unstructured Networks

Random Walks

• A query, at every node, is forwarded to a randomly chosen neighbour until the query succeeds.

• A strong body of knowledge exists:– understand and quantify resource trade-offs?

(ie. How many walkers? How many hops?)– gain insight into the merits of “tweaks”– helps us avoid guessing, and trial & error

Page 20: Searching in Unstructured Networks

“Gossip”

• “She tells 2 friends, and they tell 2 friends, and…” [VO5]– randomly select k neighbours to whom queries

are forwarded

• A young and relatively unexplored solution that (as yet) lacks significant analysis and understanding– perhaps this is the perfect opportunity

Page 21: Searching in Unstructured Networks

Final Thoughts

• DHT cannot succeed P-P2P applications without large breakthroughs in hashing (or ad-hoc solutions to search problems).

• P-P2P inherently suffers from unscalable or unfinished searches.

• An unstructured P2P network is still a graph; capitalise on prior work & knowledge.