The Theory of Zeta Graphs with an Application to Random Networks Christopher Ré Stanford


Post on 11-Dec-2015




Social Network Data

Social network data is ubiquitous and high value.

Since 2000, there have been many studies of the dynamics of these graphs: Watts-Strogatz, preferential attachment, etc.

Design new random graph models to capture some new aspect of an observed network.

The above is not the goal of this work…


What’s the matter with Erdős-Rényi?

G(N,p) does not match real-world graphs (degree distribution, diameter)

But we have a beautiful theory of G(N,p) (zero-one laws, the “movie”, threshold phenomena, …)

Much of this work is enabled by the simple, declarative definition of G(N,p).

Can we find an ER-like model to replace generative models when proving DB-theory-style theorems?

This may lead to rigorous hypothesis testing for these models (a key question for motifs).


Which model should we study?

Many models. For this study: simple & popular.

Callway, Hopcroft, Kleinberg, Newman, Strogatz (CHKNS):

“At each time step, a new vertex is added. Then, with probability δ, two vertices are chosen uniformly at random and joined by an undirected edge.” – CHKNS

CHKNS captures one salient aspect of many models: the arrival order of a node affects its properties.

NB: Does not capture all phenomena of interest.
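The quoted CHKNS rule is short enough to simulate directly. The following is a minimal sketch, not the authors' code: the function name `simulate_chkns`, the seeding, and the choice to pick two distinct endpoints (no self-loops) are assumptions made for illustration.

```python
import random

def simulate_chkns(steps, delta, seed=0):
    """Grow a CHKNS-style graph for `steps` time steps.

    Returns (num_vertices, edges); vertices are numbered 1..steps in
    arrival order and each edge is stored as (smaller, larger).
    """
    rng = random.Random(seed)
    edges = []
    for t in range(1, steps + 1):
        # Vertex t arrives at time t, so vertices 1..t exist now.
        if t >= 2 and rng.random() < delta:
            # Assumption: the two endpoints are distinct (no self-loops).
            u, v = rng.sample(range(1, t + 1), 2)
            edges.append((min(u, v), max(u, v)))
    return steps, edges

n, edges = simulate_chkns(steps=1000, delta=0.5, seed=1)
```

Early vertices are eligible endpoints at every later step, while late vertices are eligible only briefly, which is the arrival-order effect the slide points to.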


Zeta Graphs

Simple model to capture “arrival order”

NB: We’ll use a directed variant, and all queries are binary, since it’s easier to describe.


Zeta graphs

Bare-bones model to break symmetry:

• Node 1 connects to many nodes (~ log N).

• Node N connects to 1 node (in expectation).

ER-like: Edges are present independently.

Zeta graphs are a family of sets of graphs indexed by N

Fixed node set: [N] = {1,…,N} (Index ≈ arrival order)

Stochastic edge set (independent edges)
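The edge probability itself did not survive in this transcript. One form consistent with the two facts stated on the slide (node 1 has about log N neighbors, node N has about one) is Pr[{i, j} is an edge] = 1/max(i, j). The sketch below assumes that form, and uses unordered pairs rather than the directed variant mentioned in the talk; `edge_probability`, `expected_degree`, and `sample_zeta_graph` are hypothetical names.

```python
import random

def edge_probability(i, j):
    # Assumed form, NOT stated in the transcript: 1 / max(i, j).
    return 1.0 / max(i, j)

def expected_degree(v, n):
    """Expected number of neighbors of node v among nodes 1..n."""
    return sum(edge_probability(v, u) for u in range(1, n + 1) if u != v)

def sample_zeta_graph(n, seed=0):
    """Sample every pair {i, j} with i < j independently."""
    rng = random.Random(seed)
    return [(i, j)
            for j in range(2, n + 1)
            for i in range(1, j)
            if rng.random() < edge_probability(i, j)]

first = expected_degree(1, 1000)    # harmonic sum H_1000 - 1, grows like log N
last = expected_degree(1000, 1000)  # (N - 1) / N, close to 1
```

Under this assumption the symmetry between nodes is broken purely by the index, while edges stay independent as in G(N,p).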


Informal Main Result

Conjunctive Graph Queries cannot distinguish between Zeta graphs and CHKNS as N → ∞.

1. Determine the Theory of Zeta Graphs

2. Show the Theory of CHKNS is sandwiched between two “slices” of Zeta Graphs.

Here, the Theory is the set of CQs whose probability tends to 1.


1st Technical Challenge: Graph Patterns


Our goal for this section

Given (1) a language of Boolean queries L, and (2) a family of probability models M(1), M(2), …, M(N), check whether $\lim_{N \to \infty} \Pr_{M(N)}[q] = 1$ for q in L.

For the talk:

(1) L will be “graph patterns”: positive conjunctive queries over binary relations.

(2) The family of probability models M(N)=

“Theory”: $Th(L, M) = \{\, q \in L : \lim_{N \to \infty} \Pr_{M(N)}[q] = 1 \,\}$


Boolean Query Answering on ER Graphs

(2) Compute expected number of tuples.

(1) Form “full query” corresponding to q.

(3) Use Janson’s Inequality to relate E[Q] to Pr[q]
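Steps (1) and (2) can be illustrated for G(N,p) with a few lines of arithmetic. This is a sketch under stated assumptions: only injective assignments of variables to nodes are counted, distinct subgoals are assumed to map to distinct edges, and `expected_tuples` is a hypothetical helper name.

```python
from math import prod

def expected_tuples(n, p, num_vars, num_edges):
    """E[Q] for a full query over injective assignments in G(n, p).

    There are n * (n - 1) * ... * (n - num_vars + 1) injective
    assignments, and each one satisfies all subgoals with
    probability p ** num_edges because edges are independent.
    """
    injective_assignments = prod(range(n - num_vars + 1, n + 1))
    return injective_assignments * p ** num_edges

# Triangle pattern: 3 variables, 3 edge subgoals.
small = expected_tuples(4, 0.5, 3, 3)             # 24 assignments * 1/8
near_threshold = expected_tuples(10**6, 2 / 10**6, 3, 3)
```

With p = c/N the triangle expectation stays close to c^3, which is the familiar threshold behavior; step (3) is what turns a growing expectation into a statement about Pr[q].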


Recall: Classical Janson’s Inequality

A classical sufficient condition for Pr[q] → 1.

Q(c) and Q(d) properly overlap if they are not identical but share at least one identical subgoal.

see Alon & Spencer, Random Graphs

A corollary of Janson’s inequality is:
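The formulas on this slide are missing from the transcript. For reference, the textbook form of Janson's inequality, rewritten in the notation of this talk, is sketched below. The symbols $\mu$ and $\Delta$ and the relation $c \sim d$ (meaning Q(c) and Q(d) properly overlap) are notation chosen here and may differ from the slide; the statement relies on edges being independent.

```latex
\mu = E[Q] = \sum_{c} \Pr[Q(c)],
\qquad
\Delta = \sum_{c \sim d} \Pr[Q(c) \wedge Q(d)]
\quad \text{(sum over ordered pairs)}

\Pr[\neg q] \le \exp\!\left(-\mu + \tfrac{\Delta}{2}\right),
\qquad
\Pr[\neg q] \le \exp\!\left(-\frac{\mu^{2}}{2\Delta}\right)
\ \text{ if } \Delta \ge \mu

\text{Corollary: if } \mu \to \infty \text{ and } \Delta = o(\mu^{2}),
\text{ then } \Pr[q] \to 1 .
```

The corollary follows by cases: when $\Delta < \mu$ the first bound is at most $e^{-\mu/2}$, and when $\Delta \ge \mu$ the second bound tends to 0 because $\mu^{2}/\Delta \to \infty$.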


Boolean Query Answering on ER Graphs

(1) Form “full query” corresponding to q.

(2) Compute expected number of tuples.

(3) Use Janson’s Inequality to relate E[Q] to Pr[q]

What changes for Zeta graphs?


Computing Expectation


Multiple Valued Zeta (MVZ) Functions

Only integer s_i are used in this talk.
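The defining formula is not in the transcript. A common truncated form is the sum over 1 ≤ n_1 < … < n_k ≤ N of the product of n_i^(-s_i); the sketch below assumes that form, with increasing indices and non-negative integer exponents (some references order the indices the other way). `truncated_mvz` is a hypothetical name.

```python
from fractions import Fraction

def truncated_mvz(n, exponents):
    """Sum over 1 <= n_1 < ... < n_k <= n of prod(n_i ** -s_i).

    Assumes each s_i is a non-negative integer; uses exact fractions.
    """
    # prefix[m] = total weight of chains built so far whose last index is <= m.
    prefix = [Fraction(1)] * (n + 1)      # the empty chain has weight 1
    for s in exponents:
        new_prefix = [Fraction(0)] * (n + 1)
        for m in range(1, n + 1):
            # Chains ending exactly at m extend chains that end below m.
            term = prefix[m - 1] * Fraction(1, m ** s)
            new_prefix[m] = new_prefix[m - 1] + term
        prefix = new_prefix
    return prefix[n]

harmonic_4 = truncated_mvz(4, [1])        # 1 + 1/2 + 1/3 + 1/4
pairs_3 = truncated_mvz(3, [0, 1])        # 1/2 + 1/3 + 1/3
```

The dynamic program runs in O(kN) arithmetic steps, so small cases are easy to check by hand against the definition.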

MVZs show up in some strange places…


Order Matters: Paths of Length 2

If x < y < z

If x < z < y

So in our “atoms”, variables will be totally ordered.

[Figure: the 2-path under each ordering, labeled 0 1 1 for x < y < z and 0 0 2 for x < z < y.]
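To see why the ordering matters, the two cases can be brute-forced for small N. The sketch assumes an edge probability of 1/max(i, j), which the transcript does not state, and a 2-path with subgoals on (x, y) and (y, z); `expected_two_paths` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import combinations

def edge_probability(i, j):
    # Assumed form, NOT stated in the transcript.
    return Fraction(1, max(i, j))

def expected_two_paths(n, order):
    """Expected number of 2-paths x-y-z whose labels respect `order`.

    `order` lists the variables from smallest to largest label,
    e.g. ("x", "y", "z") means x < y < z.
    """
    total = Fraction(0)
    for labels in combinations(range(1, n + 1), 3):   # increasing triples
        value = dict(zip(order, labels))
        x, y, z = value["x"], value["y"], value["z"]
        total += edge_probability(x, y) * edge_probability(y, z)
    return total

middle_in_middle = expected_two_paths(3, ("x", "y", "z"))   # 1/(2*3)
middle_on_top = expected_two_paths(3, ("x", "z", "y"))      # 1/(3*3)
```

Under this assumption the summand is 1/(y·z) when x < y < z and 1/y² when x < z < y. Read as exponents on the sorted variables these are (0, 1, 1) and (0, 0, 2), which would match the labels 0 1 1 and 0 0 2 on the slide if those labels are exponent vectors.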


Why Multiple-Valued Zeta (MVZ)?

Well-studied special function. We get for free:

1. Asymptotics [Costermans et al. 2005]

2. Algebraic Identities [Zudilin & Zudilin 2003]

3. Fancy sounding function (not helpful)


Asymptotic Estimates for MVZs

This is a small variation of the Costermans et al. result.

(expected # of edges)

(expected # of triangles)

(expected # of K4)
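The three estimates themselves are missing from the transcript. As a rough guide to what they might look like, here is a worked sketch under the assumption (not stated in the source) that a pair {i, j} is an edge with probability 1/max(i, j), counting each vertex set once with its labels in increasing order. Constants would differ for the directed variant used in the talk.

```latex
E[\#\text{edges}]
  = \sum_{1 \le x < y \le N} \frac{1}{y}
  = \sum_{y=2}^{N} \frac{y-1}{y}
  = N - H_N

E[\#\text{triangles}]
  = \sum_{1 \le x < y < z \le N} \frac{1}{y\,z^{2}}
  = \Theta(\log N)

E[\#K_4]
  = \sum_{1 \le w < x < y < z \le N} \frac{1}{x\,y^{2}\,z^{3}}
  = O(1)
```

Here $H_N$ is the N-th harmonic number. In this reading the exponent vectors are (0, 1), (0, 1, 2), and (0, 1, 2, 3): the expected count grows linearly for edges, logarithmically for triangles, and stays bounded for $K_4$.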


Pr[2 Paths]

Consider pairs of properly overlapping 2-paths.

[Figure: pairs of properly overlapping 2-paths; the marked edge indicates a shared identical subgoal.]

All such pairs contribute o(E[Q]^2), and since E[Q] = ω(1), Pr[Q] = 1 - o(1).


Two cycles, you’re out!

1st result:

(1) For all r, s ≥ 2, $\Pr_{M(N)}[B(r,s)] < 1 - \varepsilon$ for some fixed $\varepsilon > 0$ as $N \to \infty$, i.e., no bicycles.

(2) Any connected graph q with at most one cycle appears with probability 1.

[Figure: the bicycle B(r,s), built from an r-cycle and an s-cycle.]

Two parts: (A) for any individual pattern, check E; and (B) different “orderings” are non-negatively correlated.
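The condition in (2) is easy to test mechanically: a connected pattern has at most one cycle exactly when it has no more edges than vertices. The sketch below checks only that condition; it is not a decision procedure for the full theory, and the function name and the 0-based vertex numbering are assumptions made here.

```python
def has_at_most_one_cycle(num_vertices, edges):
    """True if the pattern is connected and contains at most one cycle.

    Vertices are 0..num_vertices-1; `edges` is a list of (u, v) pairs.
    """
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    components = num_vertices
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    connected = components == 1
    # A connected graph has |E| - |V| + 1 independent cycles.
    return connected and len(edges) - num_vertices + 1 <= 1

triangle = [(0, 1), (1, 2), (2, 0)]
bicycle = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]  # two triangles sharing a vertex
```

For example, `has_at_most_one_cycle(3, triangle)` holds, while the two-triangle pattern fails the check because it has six edges on five vertices.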


Back to CHKNS


Central Message

How different is CHKNS from the family of Zeta graphs?

Up to CQs, the answer is not at all.


Key Technical Issues

1. CHKNS edge probabilities have a painful form.

• But can be sandwiched by “Zeta slices”

2. CHKNS edges are correlated!

• Develop bounds on correlations

3. Show that CHKNS can be essentially embedded in a part of Zeta graphs.

Goal: Establish that Th(“Graph Patterns”, CHKNS) = Th(“Graph Patterns”, Zeta Graphs)


Other Related Work

Graph Models. Huge amounts. Volumes!

[Lynch 05]: Conditions on a skewed degree distribution, but symmetrizes labels.

• Proves a 0-1 law for all of FO!

• Zeta graphs and CHKNS have no 0-1 law.

• Inspired by this paper!


Future Work & Conclusion

“Conjunctive” theory of simple random graph models with order.

• Does a simpler model capture CHKNS?

• Could one capture Albert & Barabási’s preferential attachment model?

• Richer Languages?


Expectations for Ordered Graphs

Since the model is sensitive to order, consider graph patterns with an order among the variables.

Then expectation has a semi-closed form.

This function is an MVZ.


Computing Expectations of General CQs

If variables in Q are totally ordered, then we can compute E[Q] using MVZs.

Obvious algorithm: given a query, add equalities and inequalities among its variables in all possible ways.

This takes time exponential in the size of Q (the problem is #P-hard).
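The enumeration step of the obvious algorithm can be sketched as generating every ordering of the variables with ties allowed, where a tie stands for an equality. This only illustrates the blow-up; the MVZ evaluation of each case is not shown, and `weak_orderings` is a hypothetical name.

```python
def weak_orderings(variables):
    """Yield every way to order `variables` with ties allowed.

    Each result is a list of blocks: variables in one block are equal,
    and blocks are listed from smallest to largest.
    """
    variables = list(variables)
    if not variables:
        yield []
        return
    first, rest = variables[0], variables[1:]
    for ordering in weak_orderings(rest):
        # Put `first` into an existing block (an equality) ...
        for i in range(len(ordering)):
            yield ordering[:i] + [ordering[i] + [first]] + ordering[i + 1:]
        # ... or into a new block at any position (strict inequalities).
        for i in range(len(ordering) + 1):
            yield ordering[:i] + [[first]] + ordering[i:]

cases_for_three = list(weak_orderings("xyz"))
```

Three variables already give 13 cases and four give 75, so the number of totally ordered queries to evaluate grows faster than exponentially in the number of variables.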