Matrix Factorizations for Computer Network Tomography
David Bindel
Department of Computer Science, Cornell University
17 Jun 2011
D. Bindel (Cornell University) Householder 2011 1 / 32
How this started
“I typed in the SVD from Numerical Recipes; it ran for a few days, then told me it wouldn’t converge. What can I do?”
A fuzzy picture
WAN
A problem path
WAN
Overlays to the rescue?
WAN
Overlays and measurement
WAN
Measure a few paths to infer:
- Path properties (Chen, B., Song, Chavez, Katz: 2003, 2004, 2007)
- Link properties (Zhao, Chen, B.: 2006, 2009)
- Routing topology? (underway)
Discrete Radon transform
Radon transform:
$$(Ru)(L) = \int_L u(x)\,|dx|$$
WAN
Discrete version:
$$(Gu)_i = \sum_{j \in \mathrm{links}(i)} u_j$$
Additive metrics and path matrices
For additive metrics (log(P(transmission)), latency, jitter, ...):
$$Gu = b$$
where
- $b_i$ = property of the $i$th end-to-end path
- $u_j$ = property of link $j$
$$G_{ij} = \begin{cases} 1 & \text{if path } i \text{ uses link } j \\ 0 & \text{otherwise} \end{cases}$$
Today’s goal: A structured rank-revealing decomposition of G
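As a minimal sketch of the relation Gu = b, the following builds a small path matrix and sums link properties along each path. The topology, the helper name `path_matrix`, and the latency values are all hypothetical and chosen only for illustration.

```python
import numpy as np

def path_matrix(paths, num_links):
    """Return the 0/1 matrix G with G[i, j] = 1 iff path i uses link j."""
    G = np.zeros((len(paths), num_links), dtype=int)
    for i, links in enumerate(paths):
        G[i, list(links)] = 1
    return G

# Hypothetical topology: 4 links, 3 end-to-end paths given as sets of link indices.
paths = [{0, 2}, {1, 2}, {0, 3}]
G = path_matrix(paths, num_links=4)

# Hypothetical per-link latencies; any additive metric works the same way.
u = np.array([5.0, 7.0, 2.0, 4.0])

# b[i] is the sum of u[j] over the links j on path i (the discrete Radon transform).
b = G @ u
```

In the inference problem the roles are reversed: b is measured for a few paths and u (or the remaining entries of b) is what one tries to recover.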
Properties of G
Network                  Matrix G
Routing path             Row of G
Short paths              Sparse G
Routing table updates    Low rank updates to G

k = rank(G) < # links ≪ # paths (for n sufficiently large).
Ex: 500 nodes (of 100K), G is 249500 × 33237, k = 9643.
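A toy illustration of rank deficiency, assuming a hypothetical topology in which two sources and two destinations share one middle link. It is far too small to show # links ≪ # paths, but it does show k < # links.

```python
import numpy as np

# Hypothetical topology: sources A1, A2 and destinations B1, B2
# joined through one shared middle link m.
# Link order: [a1, a2, m, b1, b2]
G = np.array([
    [1, 0, 1, 1, 0],  # path A1 -> B1
    [1, 0, 1, 0, 1],  # path A1 -> B2
    [0, 1, 1, 1, 0],  # path A2 -> B1
    [0, 1, 1, 0, 1],  # path A2 -> B2
])

num_paths, num_links = G.shape
k = np.linalg.matrix_rank(G)

# Dimension of the null space: directions in link space that no
# end-to-end measurement can distinguish.
unidentifiable_dims = num_links - k
```

Here the four rows satisfy one linear relation, so k = 3 even though there are 5 links and 4 paths.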
Identifiability
If both paths are flaky, what link is to blame?
Network virtualization and matrix factorization
Factor out a zero-one “virtualization matrix”:
$$G(:, \mathrm{fan}) = \begin{bmatrix} c_1 & c_2 & c_1 + c_2 & c_1 + c_2 \end{bmatrix} = \begin{bmatrix} c_1 & c_2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \end{bmatrix}$$
Even virtual links may be “unidentifiable.”
Network virtualization and matrix factorization
Write network virtualization as
$$G = G^v S$$
where
$$G_{ij} = \begin{cases} 1 & \text{if path } i \text{ uses link } j \\ 0 & \text{otherwise} \end{cases}$$
$$G^v_{ik} = \begin{cases} 1 & \text{if path } i \text{ uses virtual link } k \\ 0 & \text{otherwise} \end{cases}$$
$$S_{kj} = \begin{cases} 1 & \text{if virtual link } k \text{ includes link } j \\ 0 & \text{otherwise} \end{cases}$$
This handles (some) column dependencies. What about rows?
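The fan factorization can be checked numerically. This is a sketch under assumed data: the columns `c1` and `c2` below are hypothetical indicator vectors over three paths, and only the four columns of G belonging to the fan are shown.

```python
import numpy as np

# Hypothetical indicator columns over 3 paths:
# paths 0 and 1 enter the fan via link 0, path 2 enters via link 1.
c1 = np.array([1, 1, 0])
c2 = np.array([0, 0, 1])

# Columns of G restricted to the fan: two entry links, then two
# links that every path through the fan traverses.
G_fan = np.column_stack([c1, c2, c1 + c2, c1 + c2])

# Virtual-link path matrix and zero-one virtualization matrix.
Gv = np.column_stack([c1, c2])
S = np.array([[1, 0, 1, 1],
              [0, 1, 1, 1]])

factorization_holds = bool((Gv @ S == G_fan).all())
rank_fan = np.linalg.matrix_rank(G_fan)
```

Four physical links collapse to two virtual links, which matches the rank of the fan's columns in this example.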
The junction pattern
[Figure: junction with path segments A1, A2, B1, B2]
A1B1 − A2B1 + A2B2 = A1B2
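The junction identity follows because each path vector is the sum of its two segment vectors, so the A2 and B1 terms cancel. A small check, with hypothetical segment vectors over six links:

```python
import numpy as np

# Hypothetical segments as 0/1 link-indicator vectors over 6 links.
A1 = np.array([1, 0, 0, 0, 0, 0])
A2 = np.array([0, 1, 1, 0, 0, 0])
B1 = np.array([0, 0, 0, 1, 0, 0])
B2 = np.array([0, 0, 0, 0, 1, 1])

def path(a, b):
    """Path vector for the path formed by segment a followed by segment b."""
    return a + b

# (A1 + B1) - (A2 + B1) + (A2 + B2) = A1 + B2
lhs = path(A1, B1) - path(A2, B1) + path(A2, B2)
rhs = path(A1, B2)
```

So if the three paths on the left are measured, the additive property of the fourth path can be inferred without measuring it.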
More complicated junctions?
Junctions and bipartite graphs
Define a bipartite router graph at r:
- Path segments from sources to r are nodes on the left
- Path segments from r to destinations are nodes on the right
- Edges indicate complete paths traversing r
Junctions and bipartite graphs
Spanning trees in the router graph ⟹ spanning sets among path vectors
Junctions and bipartite graphs example
Junction elimination and factorization
$G_r$ = matrix of path vectors for paths crossing r:
$$G_r = \begin{bmatrix} E_S & E_D \end{bmatrix} \begin{bmatrix} P_S \\ P_D \end{bmatrix} = \begin{bmatrix} I \\ T_r \end{bmatrix} \begin{bmatrix} \bar{E}_S & \bar{E}_D \end{bmatrix} \begin{bmatrix} P_S \\ P_D \end{bmatrix} = \begin{bmatrix} I \\ T_r \end{bmatrix} \bar{G}_r$$
where
- Rows of $P$ are path segments to/from the router
- Rows of $E$ indicate how path segments sum to form paths
- $\bar{E}$ corresponds to spanning tree edges
- $\bar{G}_r$ is the corresponding subset of paths
- Rows of $T_r$ consist of ±1 entries (and zeros) corresponding to paths through the spanning tree
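One way to picture this factorization is to take a hypothetical router with two source segments and two destination segments, pick a spanning tree of its bipartite router graph, and express the remaining path as an alternating ±1 walk through the tree. The segment vectors, the tree choice, and the helper `tree_row` are assumptions made for this sketch, not the author's implementation.

```python
import numpy as np
from collections import deque

# Hypothetical segment vectors over 5 links (rows of P_S and P_D).
src = {"A1": [1, 0, 0, 0, 0], "A2": [0, 1, 0, 0, 0]}
dst = {"B1": [0, 0, 1, 0, 0], "B2": [0, 0, 0, 1, 1]}

tree = [("A1", "B1"), ("A2", "B1"), ("A2", "B2")]  # spanning-tree edges
rest = [("A1", "B2")]                              # remaining paths

def vec(a, b):
    """Path vector: source segment plus destination segment."""
    return np.array(src[a]) + np.array(dst[b])

def tree_row(a, b):
    """Signed combination of tree edges reproducing the path (a, b)."""
    adj = {}
    for k, (s, d) in enumerate(tree):
        adj.setdefault(s, []).append((d, k))
        adj.setdefault(d, []).append((s, k))
    prev = {a: None}                 # BFS from a, remembering edge indices
    queue = deque([a])
    while queue:
        x = queue.popleft()
        for y, k in adj[x]:
            if y not in prev:
                prev[y] = (x, k)
                queue.append(y)
    edges, node = [], b              # walk back from b to a
    while prev[node] is not None:
        node, k = prev[node]
        edges.append(k)
    row = np.zeros(len(tree), dtype=int)
    for step, k in enumerate(reversed(edges)):
        row[k] = 1 if step % 2 == 0 else -1   # alternate +1, -1, +1, ...
    return row

G_bar = np.array([vec(a, b) for a, b in tree])
G_r = np.vstack([G_bar] + [vec(a, b) for a, b in rest])
T_r = np.array([tree_row(a, b) for a, b in rest])
factor = np.vstack([np.eye(len(tree), dtype=int), T_r])
```

Here `factor @ G_bar` reproduces `G_r`, with the single row of `T_r` equal to the junction pattern (+1, −1, +1).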
Combining router results
[Figure with nodes labeled A and B]
Process each path in turn, build router forests incrementally.
Processing a path from A to B through r, have either
1. A−r and r−B in same component ⟹ could infer path at r from existing paths
2. A−r and r−B in different components ⟹ might make new inferences via this path
Elimination algorithm
To process path from A to B:

  for each router r on path
    update hash h of route up to r
    if (A,h)-(r,B) in router graph
      mark that path can be inferred at r
    else
      add (A,h)-(r,B) to router graph
      for each edge e from source in [(A,h)]
        if e goes to component for (A,h) and
           edge is not already marked then
          put edge on top of list to be processed
  if path was not inferred, mark as measured
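A simplified sketch of the incremental idea, using one union-find structure per router. It assumes each path is given as a list of node names, uses the route prefix itself in place of the hash h, and omits the step that re-queues earlier edges for reprocessing; the names `UnionFind` and `classify_paths` are hypothetical.

```python
class UnionFind:
    """Minimal union-find over hashable items, created lazily."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        """Merge components; return False if x and y were already connected."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        self.parent[rx] = ry
        return True

def classify_paths(routes):
    """Label each route [A, r1, ..., rk, B] as measured or inferred at a router."""
    forests = {}   # router -> UnionFind over its segment nodes
    labels = []
    for route in routes:
        A, B = route[0], route[-1]
        inferred_at = None
        for idx in range(1, len(route) - 1):
            r = route[idx]
            left = ("src", A, tuple(route[:idx]))  # plays the role of (A, h)
            right = ("dst", r, B)                  # plays the role of (r, B)
            uf = forests.setdefault(r, UnionFind())
            # Same component: the path is a combination of earlier paths at r.
            if not uf.union(left, right) and inferred_at is None:
                inferred_at = r
        if inferred_at is None:
            labels.append(("measured", None))
        else:
            labels.append(("inferred", inferred_at))
    return labels

# Hypothetical junction: two sources, two destinations, one router.
routes = [["A1", "r", "B1"], ["A2", "r", "B1"],
          ["A2", "r", "B2"], ["A1", "r", "B2"]]
labels = classify_paths(routes)
```

On this input the first three paths are marked as measured and the fourth is inferred at r, which reproduces the junction pattern.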
Matrix factorization perspective
Junction elimination yields
$$PG = \begin{bmatrix} I \\ T \end{bmatrix} \bar{G},$$
where $\bar{G}$ is the retained subset of paths and the first factor is a product of matrices of the form
$$\begin{bmatrix} I \\ T_k \end{bmatrix}$$
with rows of $T_k$ representing paths through router graphs.
Matrix factorization perspective
Can combine with topology virtualization:
$$PG = \begin{bmatrix} I \\ T \end{bmatrix} \bar{G}^v S$$
where S is a zero-one virtualization matrix.
Storage
Sufficient to store:
- Each router graph (≈ nnz(G) edges)
- Union-find structures for tracking components
- Markers for which paths are measured
- Router used for inference for each inferred path
Choice of representative paths
Spanning trees are not unique! Want representative paths such that
- There are few redundant measurements
- No host (or router) is overloaded with measurement traffic
- Most inferences involve few paths

Don’t know how to do this yet...
Summary
Path matrix G useful for network inference:
- Measure a few paths, infer rest
- Measure a few paths, estimate link behaviors

Cost of factoring G limited scalability.

New factorization for the path matrix G is
- Compact (proportional to G in storage)
- Easy to compute
- Nearly rank-revealing
- Faithful to the underlying problem structure
Questions?
I have lots more questions:
- How should we process paths for load balancing, etc?
- Can these methods be distributed?
- Are there other places where this factorization applies?
Questions from you?
References
- Chen, B., Song, Chavez, Katz. Algebra-based scalable overlay network monitoring: Algorithms, evaluation, and applications. ACM ToN, 17(6), 2009.
- Zhao, Chen, B. Towards unbiased end-to-end network diagnosis. ACM ToN, 15(5), 2007.