client assignment in content dissemination networks for dynamic data shetal shah krithi ramamritham...
Client Assignment in Content Dissemination Networks for Dynamic Data
Shetal Shah, Krithi Ramamritham
Indian Institute of Technology Bombay
Chinya Ravishankar
University of California Riverside
Dynamic Data
Traffic data: packets through switches / vehicles on highways
Stock prices, sports scores
• rapid and unpredictable changes
• time critical, value critical
• used in on-line monitoring, decision making
More and more of the data gathered from the web/internet is dynamic
Coherency of Dynamic Data
Strong coherency
• The client and source are always in sync (U(t) = S(t))
• Strong coherency is expensive!
Relax strong coherency:
• Time domain: t-coherency
• Value domain: v-coherency
  - The difference in the data values at the client and the source is bounded by v at all times
  - E.g.: temperature changes greater than 1 degree
Source S(t) → Repository R(t) → Client U(t)
$\forall t,\; |U(t) - S(t)| \le v$
Broad Focus of Work
To create a scalable content dissemination network (CDN) for streaming/dynamic data.
Metric: fidelity, the percentage of time the coherency requirement is met
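The fidelity metric can be made concrete with a small sketch. It assumes the source and client values are sampled at shared timestamps and held constant until the next sample; the function name `fidelity` and the sampling model are illustrative assumptions, not taken from the slides.

```python
def fidelity(times, source_values, client_values, v):
    """Percentage of observed time during which |U(t) - S(t)| <= v.

    Assumption: values are piecewise constant between consecutive timestamps.
    """
    total = times[-1] - times[0]
    in_bound = 0.0
    for i in range(len(times) - 1):
        # The sample at times[i] holds over the interval [times[i], times[i+1])
        if abs(client_values[i] - source_values[i]) <= v:
            in_bound += times[i + 1] - times[i]
    return 100.0 * in_bound / total


# Hypothetical trace: the client lags behind the source during one interval
times = [0, 1, 2, 3, 4]
source = [10.0, 10.5, 11.5, 11.0, 11.0]
client = [10.0, 10.0, 10.0, 11.0, 11.0]
print(fidelity(times, source, client, v=1.0))  # 75.0
```

Loss in fidelity, as used in the later performance slides, would then be 100 minus this value.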
Basic Framework: Sources, Repositories, Clients
• Clients request different data items, specifying a coherence requirement for each item
• Repositories derive their requirements from the client requirements
• Source pushes the changes of interest to repositories
• Repositories cooperate with each other and the source to serve clients
Example Dissemination Network
Data set: p, q, r. Max clients: 2
[Figure: source feeding repositories R1, R2, R3, R4; node labels include "p: 0.2, q: 0.2, r: 0.2" and "p: 0.4, r: 0.3, q: 0.3"]
Challenges – I
• Given the data and coherency needs of repositories, how should repositories cooperate to satisfy these needs?
• How should repositories refresh the data such that the coherency requirements of dependents are satisfied?
• How to make the repository network resilient to failures? [VLDB02, VLDB03, IEEE TKDE]
Challenges – II: Service to Clients
• Given the data and coherency needs of clients, what data at what coherency should reside in each repository?
• Given the data and the coherency available at repositories, how to assign clients to the repositories?
Service to clients: assign data to repositories; assign clients to repositories
Assigning clients to repositories
• Client request is satisfied
• Overheads are low
  - Communication delay
  - Computational delay
[Figure: source and repositories R1, R2, R3, R4 with node labels "p:0.2, q:0.2, r:0.2" and "p:0.4, r:0.3, q:0.3"; client C1 requests q:0.3?]
Assign <client, data-item, coherence> to repository
Overview
• Client assignment problem is NP-hard
• Solve using preferences
  - Clients and repositories order each other by preferences
  - Use stable marriages
• Assign costs and do many-to-one client-repository pairing
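The slides do not say how the stable-marriage step is set up, so the following is only a generic sketch of many-to-one deferred acceptance (Gale-Shapley with capacities). The function name, the preference lists, and the capacities are all hypothetical; it further assumes every repository ranks every request that lists it.

```python
def stable_assignment(request_prefs, repo_prefs, capacity):
    """Requests propose in preference order; each repository keeps its
    best-ranked proposers up to its capacity and rejects the rest."""
    rank = {p: {r: i for i, r in enumerate(order)}
            for p, order in repo_prefs.items()}
    next_choice = {r: 0 for r in request_prefs}
    held = {p: [] for p in repo_prefs}
    free = list(request_prefs)
    while free:
        r = free.pop()
        if next_choice[r] >= len(request_prefs[r]):
            continue  # preference list exhausted: request stays unassigned
        p = request_prefs[r][next_choice[r]]
        next_choice[r] += 1
        held[p].append(r)
        if len(held[p]) > capacity[p]:
            # Over capacity: reject the proposer this repository likes least
            worst = max(held[p], key=lambda x: rank[p][x])
            held[p].remove(worst)
            free.append(worst)
    return {r: p for p, rs in held.items() for r in rs}


# Hypothetical preferences for three requests and two repositories
request_prefs = {"a": ["R1", "R2"], "b": ["R1", "R2"], "c": ["R2", "R1"]}
repo_prefs = {"R1": ["b", "a", "c"], "R2": ["a", "b", "c"]}
print(stable_assignment(request_prefs, repo_prefs, {"R1": 1, "R2": 2}))
```

In this example R1 can hold only one request and prefers b, so a falls back to its second choice R2.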
Cost based Client Assignment
• Assign cost to each potential <client request, repository> pair
• Minimum Cost Assignment = {1,3,7}
[Figure: bipartite graph between <client, data item, coherence> requests and repositories, with edge costs 1, 3, 5, 6, 7, 8, 9]
Client Assignment
• An assignment may contribute to delay for other assignments at the same node
• Assignment = {1,3,8}
[Figure: minimum weight matching between <client, data item, coherence> requests and repositories, with edge costs 1, 3, 5, 6, 7, 8, 9]
Many-to-one Matching: Min Cost Network Flows
• Directed graph, G = {V, E}
• Start vertex
• End vertex or sink
• Edge
  - Capacity: maximum flow the edge can have
  - Cost: per unit flow
• Intermediate vertex
  - Inflow = outflow
[Figure: example flow network from Start to End with numeric edge labels]
Maximum Flow
• Value of the flow: flow leaving the source
• Maximum flow: value of flow is maximum
• Cost of flow = $\sum_{\text{edges}} (\text{flow} \times \text{cost per unit flow})$
• Min cost flow: maximum flow of minimum cost
[Figure: the example flow network from Start to End carrying a maximum flow]
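Client Assignment Using Network Flows
• X, Y, Z: number of clients the repository is willing to serve
• Capacity of <source, client request> edge = 1
• Sum of capacities on <repository, sink> edges ≥ number of client requests
[Figure: flow network Start → client requests → repositories → End, with capacity 1 on <source, client request> edges and capacities X, Y, Z on <repository, sink> edges]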
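Network Flows: Costs and Capacities
• <client request, repository> edge
  - Capacity: 1
  - Cost: function of communication delays and coherence requirement
• Cost of all other edges: 0
[Figure: the same flow network, with capacity 1 on edges incident to client requests and capacities X, Y, Z on <repository, sink> edges]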
Max Flows
• Flow out of start node = number of client requests
• Each unit of flow makes one assignment
• Cost of unit flow = cost of assignment
• Maximum flow of minimum cost => required solution
• But this could overload the repositories!
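The construction above can be sketched end to end in a few lines. The talk used the RelaxIV solver; the sketch below swaps in a plain successive-shortest-paths solver (Bellman-Ford style) purely for illustration, and the class name `MinCostFlow`, the function `assign`, and the example costs and capacities are assumptions, not data from the slides.

```python
from collections import deque


class MinCostFlow:
    def __init__(self, n):
        self.n = n
        # Edge layout: [to, residual capacity, cost, index of reverse edge]
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        """Return (max flow value, total cost) by augmenting along cheapest paths."""
        flow = cost = 0
        while True:
            dist = [float("inf")] * self.n
            prev = [None] * self.n
            queued = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                queued[u] = False
                for i, (v, cap, w, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + w < dist[v]:
                        dist[v], prev[v] = dist[u] + w, (u, i)
                        if not queued[v]:
                            queued[v] = True
                            queue.append(v)
            if prev[t] is None:
                return flow, cost
            push, v = float("inf"), t
            while v != s:  # bottleneck capacity along the path
                u, i = prev[v]
                push, v = min(push, self.graph[u][i][1]), u
            v = t
            while v != s:  # update residual capacities
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            cost += push * dist[t]


def assign(requests, capacities, costs):
    """Build start -> request -> repository -> end and read off the assignment."""
    repos = list(capacities)
    req_node = {r: 2 + i for i, r in enumerate(requests)}
    repo_node = {p: 2 + len(requests) + j for j, p in enumerate(repos)}
    node_repo = {n: p for p, n in repo_node.items()}
    net = MinCostFlow(2 + len(requests) + len(repos))
    for r in requests:
        net.add_edge(0, req_node[r], 1, 0)               # capacity 1, cost 0
    for p in repos:
        net.add_edge(repo_node[p], 1, capacities[p], 0)  # X, Y, Z capacities
    for (r, p), c in costs.items():
        net.add_edge(req_node[r], repo_node[p], 1, c)    # capacity 1, assignment cost
    flow, total = net.solve(0, 1)
    assignment = {}
    for r in requests:
        for v, cap, _, _ in net.graph[req_node[r]]:
            if v in node_repo and cap == 0:              # saturated edge carries the unit of flow
                assignment[r] = node_repo[v]
    return flow, total, assignment


# Hypothetical costs: R1 is cheapest for everyone but serves at most two requests
costs = {("c1", "R1"): 1, ("c1", "R2"): 4, ("c2", "R1"): 2,
         ("c2", "R2"): 3, ("c3", "R1"): 2, ("c3", "R2"): 6}
print(assign(["c1", "c2", "c3"], {"R1": 2, "R2": 1}, costs))
```

With these numbers all three requests are assigned at total cost 6, with c2 diverted to R2 because that is the cheapest way to respect R1's capacity of two.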
Considering Load: Iterative Min Cost Flows
• Load depends on the coherence requirement of the assignments
• Assignments depend on this load!
• Limit the number of requests assigned to a repository using <repository, sink> capacity
  - But this number does not translate into load
  - It translates to load if coherences are close to each other
Iterative Min Cost Flows
Split the requests into ranges. For each range:
Calculate the approximate load at each repository due to the previous assignments
Calculate the approximate load of the assignments to be made in this range
Determine the capacity of each repository
Find min-cost max flow
For Each Range
• Number of updates for coherence $c_i$ is $c_i^{-2}$
• Approximate load at a repository: $A_i$. Average load: $A$.
• For $n$ client requests, expected load = $n \cdot c_i^{-2}$
• Number of repositories: $k$
• Let $t_i$ be the number of assignments in the current range to repository $R_i$
• Total load at $R_i$ will be $A_i + t_i \cdot c_i^{-2}$
• Average load at R after assignment = $A + \frac{n}{k}\, c_i^{-2}$
• Capacity for $R_i$ = $\frac{n}{k} + c_i^{2}\,(A - A_i)$
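The capacity expression follows from setting the total load at $R_i$ equal to the average load after assignment and solving for $t_i$. A minimal sketch of that calculation, assuming a single representative coherence `c` for the range (the function name and the sample loads are hypothetical):

```python
def range_capacities(prior_loads, n, c):
    """Capacity t_i = n/k + c^2 * (A - A_i) for each repository.

    prior_loads: approximate load A_i at each repository from earlier ranges
    n: number of client requests in the current range
    c: representative coherence of the current range (assumption)
    """
    k = len(prior_loads)
    avg = sum(prior_loads) / k  # average load A
    return [n / k + c ** 2 * (avg - a) for a in prior_loads]


# Hypothetical loads: the lightly loaded repository gets the larger capacity
caps = range_capacities([100.0, 200.0, 300.0], n=30, c=0.1)
print([round(t, 6) for t in caps])  # [11.0, 10.0, 9.0]
```

The capacities sum to n, so the max flow can still place every request. The raw formula can yield fractional or negative values; how the authors rounded or clamped them is not stated in the slides.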
Best Effort Service
[Figure: source and repositories R1, R2, R3, R4 with node labels "p:0.2, q:0.2, r:0.2" and "p:0.4, r:0.3, q:0.3"; client C1 requests q:0.1]
Client will be served q at coherence 0.2
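A minimal sketch of the best-effort rule, assuming a smaller coherence value means a more stringent requirement and that a repository can serve an item no more tightly than it holds it. The function name and dictionary layout are illustrative.

```python
def best_effort(holdings, item, requested):
    """Return (served coherence, requirement met?) or None if the item is absent.

    holdings: data item -> coherence at which the repository holds it
    """
    if item not in holdings:
        return None
    available = holdings[item]
    # The repository cannot serve more tightly than it receives the item
    served = max(requested, available)
    return served, served <= requested


# The case on the slide: C1 asks for q at 0.1, the repository holds q at 0.2
print(best_effort({"p": 0.2, "q": 0.2, "r": 0.2}, "q", 0.1))  # (0.2, False)
```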
Augmentation
[Figure: source and repositories R1, R2, R3, R4 with node labels "p:0.2, q:0.1, r:0.2" and "p:0.4, r:0.3, q:0.3"; client C1 requests q:0.1]
Coherence of A for q is changed to 0.1.
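Augmentation can be sketched as tightening the repository's own requirement to match the client's. The sketch below only updates one repository's holdings; propagating the tighter requirement to whichever node feeds that repository is not modeled here, and the function name is an assumption.

```python
def augment(holdings, item, requested):
    """Return new holdings in which `item` is held at least as tightly as requested."""
    updated = dict(holdings)
    current = updated.get(item)
    if current is None or requested < current:
        updated[item] = requested  # tighten (or add) the item's coherence
    return updated


# The case on the slide: q goes from 0.2 to 0.1 after C1's request
print(augment({"p": 0.2, "q": 0.2, "r": 0.2}, "q", 0.1))
```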
Experimental Methodology
• Network: 1 source, 10-20 repositories, 10,000-80,000 client requests
• Real stock traces: 100-1000
• Time duration of observations: 10,000 s
• Ranges for min cost flow: {0.01-0.03, 0.03-0.07, 0.07-0.2, 0.2-1.0}
• Network flow solver: RelaxIV from www.di.unipi.it/di/groups/optimize/ORGroup.html
For comparison…
• Prior online Global Heuristic
  - Selector node for each data item
  - Selector keeps information on: coherence requirements at repositories; delays between the nodes in the network; number of clients assigned to each repository
  - Client is assigned to a repository where the sum of the delays is minimized
• Two flavours: GHIS, GHES
Agarwal et al. Construction of a Temporal Coherency Preserving Dynamic Data Dissemination Network. RTSS’04
Performance of the algorithms
• GHIS does better than MCF and GHES initially, but degrades rapidly
  - unsatisfied requests, source overloading!
• Augmentation performs very well
• GHES and MCF are comparable for a small number of repositories
[Figure: performance plot including GHIS. Workload: 50% of client requests have coherence between 0.01 and 0.09, the remainder between 0.1 and 0.99]
MCF vs GHES (best effort)
• MCF does better as the number of repositories increases
• In fact, for some simple inputs, MCF did better than GHES by a factor of 9!
Topology: 1 source, 10 repositories, 50 data items
[Figure legend: GHIS, GHES, MCF]
Augmentation helps, but…
• As the load increases, augmentation increases loss in fidelity
• As load increases, serving clients at less stringent coherence requirements might actually reduce the loss in fidelity!
Need to adapt to load – Fair vs. biased approaches
[Figure: two panels, "Fair Approach" and "Biased Approach"; legend: MCF_aug, MCF]
It is better to be biased than to be fair!
Adaptive Algorithm
• For each data item, the source maintains a list of unique coherences and the number of clients for each coherence
• If the queuing delay at any source/repository crosses a threshold th1:
  - For each data item, the source reduces the coherence of service for some clients
• If the queuing delay at any source/repository goes below a threshold th2:
  - Resume service at the desired coherency to some of the clients
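The two thresholds act as a hysteresis band. A minimal sketch of one control step, assuming th2 < th1, a single observed queuing delay per step, and a fixed number of clients moved per step (the slides only say "some clients", so `step` and the function name are assumptions):

```python
def adapt(queuing_delay, degraded, total_clients, th1, th2, step=1):
    """Return the new number of clients served at reduced coherence.

    degraded: clients currently served at a less stringent coherence
    """
    if queuing_delay > th1:
        # Overloaded: relax service for a few more clients
        return min(total_clients, degraded + step)
    if queuing_delay < th2:
        # Load has eased: restore desired coherency for a few clients
        return max(0, degraded - step)
    return degraded  # inside the band: leave service levels unchanged


print(adapt(5.0, degraded=0, total_clients=10, th1=4.0, th2=1.0))  # 1
print(adapt(2.0, degraded=3, total_clients=10, th1=4.0, th2=1.0))  # 3
print(adapt(0.5, degraded=1, total_clients=10, th1=4.0, th2=1.0))  # 0
```

Keeping th2 below th1 avoids flipping clients back and forth when the delay hovers near a single threshold.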
Performance of the adaptive algorithm
Augmented adaptation performs the best!
Conclusions and Current Work
Conclusions
• We prove that the client assignment problem is NP-hard
• Develop two new heuristics for the client assignment problem
• Develop an adaptive algorithm for client assignment
Current Work
• Investigation of the algorithms in real network settings – PlanetLab
Thank You!