CS728 Lecture 17 Web Indexes III

Post on 20-Dec-2015


Page 1: CS728 Lecture 17 Web Indexes III. Last Time Showed how build indexes for graph connectivity Based on 2-hop covers Today Look at more general problem of

CS728

Lecture 17

Web Indexes III

Page 2

• Last time
- Showed how to build indexes for graph connectivity
- Based on 2-hop covers

• Today
- Look at the more general problem of compact encodings for graphs and network problems

• Applications
- Fast queries for path information
- Routing and routing table construction
- Topology control
- Spanning trees
- Dominating sets and clustering
- Hierarchical clustering

Page 3

Main Problem Considered

• Arbitrary topology
• Goal: small routing tables to find a path to the destination
• Related problem: finding the closest item of a certain type

Routing: how do I get there from here?

[Figure: a network with a source node and a destination node]

Page 4

Definitions:

Spanner: a subgraph in which the distance between any two nodes is close to their distance in the original graph

We will see that radio networks need energy-spanners, i.e., subgraphs that contain energy-efficient paths

Page 5

Spanning Trees:

• minimal connected subgraph that spans all nodes
• useful for routing
• single point of failure
• non-minimal routes
• many variants

K-Dominating Sets:

• set of nodes such that every node is within K hops of some node in the set
• used to define a partition of the network into zones

[Figure: a 1-dominating set]

Page 6

Graph Clustering:

• K-center problem: find k nodes that minimize the maximum distance from any node to its nearest chosen node (flat clustering)

Hierarchical Clustering

• Tree clustering with internal and border nodes and edges
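The k-center objective can be made concrete with a small sketch. The farthest-first heuristic below is a standard 2-approximation for metric k-center; it is not taken from the slides. The function names and the example path graph are assumptions, and distances are hop counts on an unweighted, connected graph.

```python
from collections import deque

def bfs_dist(adj, src):
    """Hop distances from src in an unweighted graph {node: [neighbors]}."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def greedy_k_center(adj, k):
    """Farthest-first traversal; returns (centers, radius) for a connected graph."""
    nodes = sorted(adj)
    centers = [nodes[0]]                 # arbitrary first center
    nearest = bfs_dist(adj, centers[0])  # distance to the nearest chosen center
    while len(centers) < k:
        far = max(nodes, key=lambda v: nearest[v])  # worst-served node so far
        centers.append(far)
        d = bfs_dist(adj, far)
        nearest = {v: min(nearest[v], d[v]) for v in nodes}
    return centers, max(nearest.values())

# Hypothetical example: the path 0-1-2-3-4-5-6 with k = 2
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
centers, radius = greedy_k_center(path, 2)
```

On this path the heuristic picks the two endpoints and leaves the middle node 3 hops from a center, whereas centers 1 and 5 would achieve radius 2; this is consistent with the factor-2 guarantee.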

Page 7

Hierarchical Clustering

• The hierarchy imposes a natural addressing scheme

• Each node labeled with the path in the hierarchy tree

• Problem: give a compact labeling for a tree
– Clearly need log n bits to identify some nodes
– Need to add information about tree structure
– Complete binary tree
– Other n-node trees

Page 8

• Interval labeling scheme
– Label the leaves of the tree uniquely: log n bits
– Label each internal node with the range of its descendants: 2 log n bits
– Given two nodes x, y and their labels:
  • Can you test if x is an ancestor of y?
  • Can you describe the path from x to y?
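One possible answer to the ancestor question is sketched below. It numbers the leaves left to right and gives each internal node the range of leaf numbers below it, so the ancestor test becomes interval containment. The function names and the example tree are assumptions. The sketch also assumes every internal node has at least two children, since a node with a single child would receive the same range as that child.

```python
def interval_labels(children, root):
    """Map each node to (lo, hi), the range of leaf numbers in its subtree."""
    labels = {}
    next_leaf = 0

    def visit(v):
        nonlocal next_leaf
        kids = children.get(v, [])
        if not kids:                      # leaf: unique number, stored as a 1-point range
            labels[v] = (next_leaf, next_leaf)
            next_leaf += 1
        else:                             # internal node: span of its children's ranges
            for c in kids:
                visit(c)
            labels[v] = (labels[kids[0]][0], labels[kids[-1]][1])

    visit(root)
    return labels

def is_ancestor(label_x, label_y):
    """True if x is a proper ancestor of y (interval of x strictly contains that of y)."""
    return label_x != label_y and label_x[0] <= label_y[0] and label_y[1] <= label_x[1]

# Hypothetical tree: r has children a and b; a has children c and d
tree = {"r": ["a", "b"], "a": ["c", "d"]}
labels = interval_labels(tree, "r")
```

Here `labels["a"]` is `(0, 1)` and `labels["b"]` is `(2, 2)`, so `a` is recognised as an ancestor of `c` but not of `b` using only the two labels.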

Page 9

• Greedy Dewey Labeling scheme

• Label each edge with small unique string

• Nodes are concatenation of edge labels

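[Figure: root v0 with four children v1, v2, v3, v4; edge labels shown: 00, 01, 10]

Out-degree 4 requires edge labels of maximum length 2.

[Figure: root v0 with children v1 through v600; edge labels shown: 0 and 1101101110]

Out-degree 600 requires edge labels of maximum length 10.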
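The idea of concatenating edge labels can be sketched as follows. This is a plain Dewey-style labeling with fixed-width binary edge codes per parent, not the greedy variant named on the slide (whose edge-code assignment is not given here). The function names and the example tree are assumptions.

```python
def dewey_labels(children, root):
    """Label each node with the concatenated edge codes on its root path."""
    labels = {root: ""}
    stack = [root]
    while stack:
        v = stack.pop()
        kids = children.get(v, [])
        if not kids:
            continue
        # Fixed code width per parent: enough bits to distinguish all children
        width = max(1, (len(kids) - 1).bit_length())
        for i, c in enumerate(kids):
            labels[c] = labels[v] + format(i, "0{}b".format(width))
            stack.append(c)
    return labels

def is_ancestor(label_x, label_y):
    """x is a proper ancestor of y exactly when label_x is a proper prefix of label_y."""
    return label_x != label_y and label_y.startswith(label_x)

# Hypothetical tree: v0 has four children, and v3 has one child v5
tree = {"v0": ["v1", "v2", "v3", "v4"], "v3": ["v5"]}
labels = dewey_labels(tree, "v0")

# A node with 600 children needs 10-bit edge codes
wide = dewey_labels({"root": list(range(600))}, "root")
```

With out-degree 4 the edge codes are 00, 01, 10, 11, and with out-degree 600 every child label has length 10, matching the two figures.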

Page 10

Theorem: An upper bound on GDL label length with unary delimiters is 2(δ_v + log n) bits, where
- δ_v is the depth of v in T
- n is the number of nodes in T

• Alternative: use binary (fixed-length) codes for delimiting each edge
– Seems to do worse in practice

• Can remove the dependence on depth by converting encodings of long interior paths using count labels

Page 11

Spanners and Stretch

• Stretch of a subgraph H is the maximum ratio of the distance between two nodes in H to that between them in G
– Extensively studied in the graph algorithms and graph theory literature [Eppstein 96]
• Distance stretch and topological stretch
• A spanner is a subgraph that has constant stretch
– The Delaunay triangulation yields a planar Euclidean distance-spanner
– The Yao-graph [Yao 82] is also a simple distance-spanner
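The definition of stretch can be checked directly on small graphs. The sketch below computes hop (topological) stretch by running BFS in both G and H from every node; the function names and the example graphs are assumptions, and both graphs are assumed connected on the same node set.

```python
from collections import deque

def bfs_dist(adj, src):
    """Hop distances from src in an unweighted graph {node: [neighbors]}."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def hop_stretch(g, h):
    """Maximum over node pairs of dist_H(u, v) / dist_G(u, v)."""
    worst = 1.0
    for u in g:
        dg = bfs_dist(g, u)
        dh = bfs_dist(h, u)
        for v in g:
            if v != u:
                worst = max(worst, dh[v] / dg[v])
    return worst

# Hypothetical example: G is the 4-cycle 0-1-2-3-0, H drops the edge (3, 0)
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
stretch = hop_stretch(cycle, path)
```

Removing one edge of the 4-cycle forces nodes 0 and 3 to use a 3-hop detour instead of a direct link, so the stretch of the path is 3.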

Page 12

Energy Stretch and Energy Spanners

• Commonly adopted power attenuation model: received power is proportional to transmit power / distance^α
– α is between 2 and 4

• Assuming uniform thresholds for reception power and interference/noise levels, the energy consumed for transmitting from u to v needs to be proportional to d(u,v)^α

• Power control: Radios have the capability to adjust their power levels so as to reach the destination with the desired fidelity

• Energy consumed along a path is simply the sum of the transmission energies along the path links

• Define energy-stretch analogously to distance-stretch
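A small numeric sketch of the path-energy rule: each link costs its length raised to the attenuation exponent, and a path costs the sum over its links. The constant of proportionality is taken as 1, and the function name, coordinates, and exponent value are assumptions.

```python
from math import dist

def path_energy(points, alpha=2.0):
    """Sum of d(u, v)**alpha over consecutive hops of a path of 2-D points."""
    return sum(dist(p, q) ** alpha for p, q in zip(points, points[1:]))

# Hypothetical example with alpha = 2: one 4-unit hop versus two 2-unit hops
direct = path_energy([(0, 0), (4, 0)])
relayed = path_energy([(0, 0), (2, 0), (4, 0)])
```

With an exponent of 2 the single long hop costs 16 units and the relayed path costs 4 + 4 = 8, so a subgraph that offered only the direct link would have energy-stretch 2 for this pair.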

Page 13

Energy-Aware Routing

• A path with many short hops consumes less energy than a path with a few large hops
– Which edges to use? (Considered in topology control)
– Can maintain “energy cost” information to find minimum-energy paths [Rodoplu-Meng 98]

• Routing to maximize network lifetime [Chang-Tassiulas 99]
– Formulate the selection of paths and power levels as an optimization problem
– Suggests the use of multiple routes between a given source-destination pair to balance energy consumption

• Energy consumption also depends on transmission rate
– Schedule transmissions lazily [Prabhakar et al 2001]
– Can split traffic among multiple routes at reduced rate [Shah-Rabaey 02]

Page 14

Topology Control

• Given:
– A collection of nodes in the plane
– Transmission range of the nodes (assumed equal)

• Goal: To determine a subgraph of the transmission graph G that is
– Connected
– Low-degree
– Small stretch, hop-stretch, and power-stretch

Page 15

The Yao Graph

• Divide the space around each node into sectors (cones) of angle θ

• Each node has an edge to the nearest node in each sector

• Number of edges is O(n)

• For any edge (u,v) in the transmission graph
– There exists an edge (u,w) in the same sector such that w is closer to v than u is

• Theorem: The Yao Graph has stretch 1/(1 - 2 sin(θ/2))

[Figure: node u with nodes w and v in the same sector]
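The construction can be sketched in a few lines. The version below uses k equal cones per node (so the cone angle is 2π/k) and returns directed edges; the function name, the optional `max_range` parameter, and the example points are assumptions. Note that the stated stretch bound is only meaningful when the cone angle is below π/3, that is, for k of at least 7.

```python
from math import atan2, dist, pi

def yao_graph(points, k, max_range=float("inf")):
    """Directed Yao graph: each node keeps its nearest neighbor in each of k cones."""
    theta = 2 * pi / k
    edges = set()
    for i, p in enumerate(points):
        best = {}  # sector index -> (distance, neighbor index)
        for j, q in enumerate(points):
            if i == j:
                continue
            d = dist(p, q)
            if d > max_range:             # outside the transmission range
                continue
            angle = atan2(q[1] - p[1], q[0] - p[0]) % (2 * pi)
            sector = min(int(angle / theta), k - 1)  # clamp rounding at 2*pi
            if sector not in best or d < best[sector][0]:
                best[sector] = (d, j)
        for _, j in best.values():
            edges.add((i, j))
    return edges

# Hypothetical example: three collinear points and one point above the first
pts = [(0, 0), (1, 0), (2, 0), (0, 1)]
edges = yao_graph(pts, k=8)
```

Node 0 sees nodes 1 and 2 in the same cone and keeps only the nearer one, so the edge (0, 2) is dropped while (0, 1) survives; each node contributes at most k edges, which gives the linear edge count.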

Page 16

Dominating Set

• Applications: facility location
– A set of k-dominating centers can be selected to locate servers or copies of a distributed directory
– Dominating sets can serve as a location database for storing routing information in ad hoc networks [Liang Haas 00]

• NP-hard for general graphs
• Reduces to the minimum set cover problem
• Recall last time: Greedy gives a log n approximation
• Admits a PTAS for planar graphs [Baker 94]

Page 17

Greedy Algorithm

• An Example
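A minimal sketch of the greedy rule for 1-dominating sets, following the set-cover view: repeatedly pick the node whose closed neighborhood covers the most still-uncovered nodes. The function name, the tie-breaking by node order, and the example graph are assumptions.

```python
def greedy_dominating_set(adj):
    """Greedy set-cover style heuristic for a dominating set of {node: [neighbors]}."""
    uncovered = set(adj)
    chosen = []
    while uncovered:
        # Gain of v = number of uncovered nodes in its closed neighborhood
        best = max(sorted(adj),
                   key=lambda v: len(({v} | set(adj[v])) & uncovered))
        chosen.append(best)
        uncovered -= {best} | set(adj[best])
    return chosen

# Hypothetical example: a star centered at 0 with an extra node 5 hanging off leaf 4
graph = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0, 5], 5: [4]}
dominators = greedy_dominating_set(graph)
```

The first pick is the star center, which covers five nodes at once; the second pick covers the remaining pendant node, giving a dominating set of size 2.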

Page 18

Hierarchical Network Decomposition

• Sparse neighborhood covers [Awerbuch-Peleg 89, Linial-Saks 92]
– Applications in location management, replicated data management, routing
– Provable guarantees, though difficult to adapt to a dynamic environment

• Routing scheme using hierarchical partitioning [Dolev et al 95]
– Adaptive to topology changes
– Weak guarantees in terms of stretch and memory per node

Page 19

Sparse Neighborhood Covers

• An r-neighborhood cover is a set of overlapping clusters such that the r-zone of any node is in one of the clusters

• Aim: Have covers that are low diameter and have small overlap

• Overlap is measured by the max number of clusters a node is in

• Tradeoff between diameter and overlap
– Set of all r-zones: diameter 2r but overlap n
– The entire network as a single cluster: overlap 1 but diameter could be n

• Sparse r-neighborhood covers with O(r log n)-diameter clusters and O(log n) overlap [Peleg 89, Awerbuch-Peleg 90]
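The two extreme covers in the tradeoff can be compared with a small checker. The sketch below computes r-zones by truncated BFS, tests whether a set of clusters is an r-neighborhood cover, and measures overlap; all function names and the example path graph are assumptions, and it does not construct a sparse cover.

```python
from collections import deque

def r_zone(adj, u, r):
    """Set of nodes within r hops of u."""
    depth = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if depth[x] == r:
            continue
        for y in adj[x]:
            if y not in depth:
                depth[y] = depth[x] + 1
                queue.append(y)
    return set(depth)

def is_r_cover(adj, clusters, r):
    """True if the r-zone of every node lies inside at least one cluster."""
    return all(any(r_zone(adj, u, r) <= c for c in clusters) for u in adj)

def overlap(adj, clusters):
    """Maximum number of clusters that any single node belongs to."""
    return max(sum(u in c for c in clusters) for u in adj)

# Hypothetical example: the path 0-1-2-3-4 with r = 1
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 4] for i in range(5)}
all_zones = [r_zone(path, u, 1) for u in path]
whole = [set(path)]
```

Both `all_zones` and `whole` are valid 1-neighborhood covers of the path. The first keeps clusters small but puts the middle nodes in three clusters each; the second has overlap 1 at the price of a cluster spanning the whole network.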

Page 20

Sparse Neighborhood Covers

• Set of sparse neighborhood covers
– { 2^i-neighborhood cover: 0 ≤ i ≤ log n }

• For each node:
– For any r, the r-zone is contained within a cluster of diameter O(r log n)
– The node is in O(log^2 n) clusters

• Applications:
– Tracking mobile users
– Distributed directories for replicated objects

Page 21

Online Tracking of Mobile Users

• Given a fixed network with mobile users
• Need to support location query operations
• Home location register (HLR) approach:
– Whenever a user moves, the corresponding HLR is updated
– Inefficient if the user is near the seeker, yet the HLR is far

• Performance issues:
– Cost of query: ratio with “distance” between source and destination
– Cost of updating the data structure when a user moves

Page 22

Mobile User Tracking: Initial Setup

• The sparse 2^i-neighborhood cover forms a regional directory at level i

• At level i, each node u selects a home cluster that contains the 2^i-zone of u

• Each cluster has a leader node.

• Initially, each user registers its location with the home cluster leader at each of the O(log n) levels

Page 23

The Location Update Operation

• When a user X moves, X leaves a forwarding pointer at the previous host.

• User X updates its location at only a subset of home cluster leaders
– For every sequence of moves that add up to a distance of at least 2^i, X updates its location with the leader at level i

• Amortized cost of an update is O(d log n) for a sequence of moves totaling distance d
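The lazy update rule can be illustrated with per-level distance counters. The sketch below only counts which levels get refreshed after each move; it ignores forwarding pointers and message costs, and the class and attribute names are assumptions.

```python
class LazyTracker:
    """Counts level-i directory updates under the 'distance >= 2**i' rule."""

    def __init__(self, num_levels):
        self.since_update = [0] * num_levels  # distance moved since last update, per level
        self.updates = [0] * num_levels       # number of updates sent, per level

    def move(self, distance):
        """Record a move and return the list of levels that are refreshed."""
        refreshed = []
        for i in range(len(self.since_update)):
            self.since_update[i] += distance
            if self.since_update[i] >= 2 ** i:
                self.since_update[i] = 0
                self.updates[i] += 1
                refreshed.append(i)
        return refreshed

# Hypothetical example: 4 levels, eight moves of distance 1 each
tracker = LazyTracker(4)
history = [tracker.move(1) for _ in range(8)]
```

After eight unit moves the counts per level are 8, 4, 2, and 1: level i is refreshed only once per 2^i units of movement, which is the source of the amortized saving.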

Page 24

The Location Query Operation

• To locate user X, go through the levels starting from 0 until the user is located

• At level i, query each of the O(log n) clusters u belongs to in the 2^i-neighborhood cover

• Follow the forwarding pointers, if necessary
• Cost of query: O(d log n), if d is the distance between the querying node and the current location of the user

Page 25

Comments on the Tracking Scheme

• Distributed construction of sparse covers in O(m log n + n log^2 n) time [Awerbuch et al 93]

• The storage load for leader nodes may be excessive; use hashing to distribute the leadership role (per user) over the cluster nodes

• Distributed directories for accessing replicated objects [Awerbuch-Bartal-Fiat 96]
– Allows reads and writes on replicated objects
– An O(log n)-competitive algorithm assuming each node has O(log n) times more memory than the optimal

• Unclear how to maintain sparse neighborhood covers in a dynamic network

Page 26

Bubbles Routing and Partitioning Scheme

• Adaptive scheme by [Dolev et al 95]

• Hierarchical Partitioning of a spanning tree structure

• Provable bounds on efficiency for updates

[Figure: 2-level partitioning of a spanning tree, with the root marked]

Page 27

Bubbles (cont.)

• Size of clusters at each level is bounded

• Cluster size grows exponentially

• # of levels equal to # of routing hops

• Tradeoff between number of routing hops and update costs

• Each cluster has a leader who has routing information

• General idea:

- route up the tree until in the same cluster as destination,

- then route down

- maintain by rebuilding/fixing things locally inside subtrees

Page 28

Bubbles Algorithm

• A partition is an [x,y]-partition if all its clusters are of size between x and y

• A partition P is a refinement of another partition P’ if each cluster in P is contained in some cluster of P’.

• An (x_1, x_2, …, x_k)-hierarchical partitioning is a sequence of partitions P_1, P_2, .., P_k such that

- P_i is an [x_i, d x_i] partitioning (d is the degree)

- P_i is a refinement of P_(i-1)

• Choose x_(k+1) = 1 and x_i = x_(i+1) n^(1/k)
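Unrolling this recurrence gives x_i = n^((k+1-i)/k), so cluster size bounds grow by a factor of n^(1/k) per level, from 1 up to n. A short numeric sketch, with the function name and the values of n and k chosen only for illustration:

```python
def cluster_size_bounds(n, k):
    """Return [x_1, ..., x_(k+1)] with x_(k+1) = 1 and x_i = x_(i+1) * n**(1/k)."""
    step = n ** (1.0 / k)
    xs = [1.0]                 # start from x_(k+1) = 1
    for _ in range(k):
        xs.append(xs[-1] * step)
    xs.reverse()               # put x_1 first
    return xs

# Hypothetical example: n = 4096 nodes and k = 3 levels
bounds = cluster_size_bounds(4096, 3)
```

For these values the bounds are approximately 4096, 256, 16, and 1 (up to floating-point rounding), so x_1 covers the entire tree and each refinement shrinks the minimum cluster size by a factor of 16.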

Page 29

Clustering Construction

• Build a spanning tree, say, using BFS

• Let P_1 be the cluster consisting of the entire tree

• Partition P_1 into clusters, resulting in P_2

• Recursively partition each cluster

• Maintenance rules:

- when a new node is added, try to include in existing cluster, else split cluster

- when a node is removed, if necessary combine clusters

Page 30

Performance Bounds

• memory requirement

• adaptability

• k hops during routing

• matching lower bound for bounded degree graphs

• Note: Bubbles does not provide a non-trivial upper bound on stretch in the non-hop model

kk nd /123

nkdn k log/11