Mathematical Foundations for Search and Surf Engines

Upload: wayne

Post on 05-Feb-2016


DESCRIPTION

Mathematical Foundations for Search and Surf Engines. Jean-Louis Lassez, Ryan Rossi. Lecture 1 & 2: Introduction, Ergodic Theorem, Perron-Frobenius Theorem, Power Method, Foundations of PageRank.

TRANSCRIPT

Page 1: Mathematical Foundations for Search and Surf Engines

Mathematical Foundations for Search and Surf Engines

Page 2: Mathematical Foundations for Search and Surf Engines

Jean-Louis Lassez

Page 3: Mathematical Foundations for Search and Surf Engines

Ryan Rossi

Page 4: Mathematical Foundations for Search and Surf Engines

Lecture 1 & 2

• Introduction
• Ergodic Theorem
• Perron-Frobenius Theorem
• Power Method
• Foundations of PageRank

Page 5: Mathematical Foundations for Search and Surf Engines

Search Engines
• Search engine: deterministic
• Ranking of sites
• Mathematical foundations: Markov and Perron-Frobenius

Surf Engines
• Non-deterministic: window shopping
• Ranking of links
• Mathematical foundations: Singular Value Decomposition

Page 6: Mathematical Foundations for Search and Surf Engines

Initial approach to ranking sites: the in-degree heuristic

[Figure: directed graph on eight sites, numbered 1 to 8]
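A minimal sketch of the in-degree heuristic: each site is scored by the number of links pointing to it. The edges of the slide's graph are not recoverable from the transcript, so the `links` dictionary below is a hypothetical example, and `rank_by_in_degree` is an illustrative name.

```python
from collections import Counter

# Hypothetical link graph: site -> sites it links to (not the slide's graph).
links = {
    1: [2, 3],
    2: [3],
    3: [1],
    4: [3, 1],
}

def rank_by_in_degree(links):
    """Return (site, in-degree) pairs, highest in-degree first."""
    in_degree = Counter({site: 0 for site in links})
    for source, targets in links.items():
        for target in targets:
            in_degree[target] += 1
    # Sort by descending in-degree, breaking ties by site number.
    return sorted(in_degree.items(), key=lambda kv: (-kv[1], kv[0]))

ranking = rank_by_in_degree(links)
print(ranking)  # [(3, 3), (1, 2), (2, 1), (4, 0)]
```

The heuristic looks only at how many links arrive at a site, not at where they come from, which is the limitation the later approaches address.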

Page 7: Mathematical Foundations for Search and Surf Engines

Google’s PageRank approach

Problem: rank the sites in order of most visited

Page 8: Mathematical Foundations for Search and Surf Engines

Another Approach: Hubs and Authorities

Authorities: sites that contain the most important information

Hubs: sites that provide directions to the authorities

[Figure: hubs (H) linking to authorities (A)]
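The two definitions are mutually recursive: a good hub points to good authorities, and a good authority is pointed to by good hubs. A sketch of the usual iterative scheme for this idea (the HITS iteration, which the slide does not spell out) on a hypothetical graph; the function name `hits`, the graph, and the choice to normalize scores to sum to 1 are assumptions made for illustration.

```python
# Hypothetical link graph: page -> pages it links to.
links = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
    "a1": [],
    "a2": [],
}

def hits(links, iterations=50):
    """Alternate authority and hub updates, normalizing each round."""
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links[p]) for p in pages}
        # Normalize so that each score vector sums to 1.
        a_total = sum(auth.values())
        h_total = sum(hub.values())
        auth = {p: v / a_total for p, v in auth.items()}
        hub = {p: v / h_total for p, v in hub.items()}
    return hub, auth

hub, auth = hits(links)
```

On this graph `a1` ends up as the strongest authority, and `h1` and `h2` tie as the strongest hubs, since they point to both authorities.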

Page 9: Mathematical Foundations for Search and Surf Engines

Ranking Hyperlinks

• Local
• Global
• Update

Page 10: Mathematical Foundations for Search and Surf Engines

Local

[Figure: local ranking of hyperlinks]

Page 11: Mathematical Foundations for Search and Surf Engines

Global

[Figure: global ranking of hyperlinks]

Page 12: Mathematical Foundations for Search and Surf Engines

[Figure: link graph from 2005 with nodes A to K, and the updated 2006 graph with a new node L]

Page 13: Mathematical Foundations for Search and Surf Engines

Part I: Ranking Sites

• The Web as a Markov Chain
• The Ergodic Theorem
• Perron-Frobenius Theorem
• Algorithms: PageRank, HITS, SALSA

Page 14: Mathematical Foundations for Search and Surf Engines

Markov Chains

• Text Analysis
• Speech Recognition
• Statistical Mechanics
• Decision Science: Medicine, Commerce

... more recently ...

Page 15: Mathematical Foundations for Search and Surf Engines

Bioinformatics

[Figure: Markov model with states M1, M2, M3, d1, d2, I1, I2]

Page 16: Mathematical Foundations for Search and Surf Engines

Internet

Page 17: Mathematical Foundations for Search and Surf Engines

Markov’s Ergodic Theorem (1906)

Any irreducible, finite, aperiodic Markov Chain has all states Ergodic (reachable at any time in the future) and has a unique stationary distribution, which is a probability vector.

Page 18: Mathematical Foundations for Search and Surf Engines

[Figure: four-state chain s1, s2, s3, s4; transition probabilities shown: .45, .55, 1, .25, .75, .6, .4]

Probabilities after n steps:

Steps   X1    X2    X3    X4
15      .33   .26   .26   .13
30      .33   .26   .23   .16
100     .37   .29   .21   .13
500     .38   .28   .21   .13
1000    .38   .28   .21   .13

Page 19: Mathematical Foundations for Search and Surf Engines

[Figure: four-state chain s1, s2, s3, s4; transition probabilities shown: .45, .55, 1, .25, .75, .6, .4]

Probabilities after n steps:

Steps   X1    X2    X3    X4
15      .46   .20   .26   .06
30      .36   .26   .23   .13
100     .38   .28   .21   .13
500     .38   .28   .21   .13
1000    .38   .28   .21   .13

Page 20: Mathematical Foundations for Search and Surf Engines

Steady State Constraints

p11x1 + p21x2 + p31x3 = x1
p12x1 + p22x2 + p32x3 = x2
p13x1 + p23x2 + p33x3 = x3

∑xi = 1, xi ≥ 0

x1, x2, x3 are probabilities.
p21 is the probability of going to S1 from S2.
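Since the steady-state constraints form a linear system, they can be solved directly. A minimal sketch using exact rational arithmetic and Gaussian elimination; the 3-state matrix `P` is a hypothetical example (not taken from the slides), with `P[i][j]` the probability of going from state i+1 to state j+1, and `stationary` is an illustrative name.

```python
from fractions import Fraction as F

def stationary(P):
    """Solve x P = x with sum(x) = 1 exactly by Gaussian elimination."""
    n = len(P)
    # Row j encodes sum_i p_ij x_i - x_j = 0, with the right-hand side appended.
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] + [F(0)]
         for j in range(n)]
    # The n balance equations are linearly dependent, so the last one is
    # replaced by the normalization sum_i x_i = 1.
    A[n - 1] = [F(1)] * n + [F(1)]
    for col in range(n):
        # Find a row with a non-zero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

# Hypothetical transition matrix; every row sums to 1.
P = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 2), F(0), F(1, 2)],
]
x = stationary(P)
print(x)  # [Fraction(8, 17), Fraction(3, 17), Fraction(6, 17)]
```

Note that the constraints xi ≥ 0 are never used by the elimination; for this example the solution comes out positive on its own.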

Page 21: Mathematical Foundations for Search and Surf Engines

Solved by Power Iteration Method, Based on the Perron-Frobenius Theorem

$\lim_{n \to \infty} e M^n$

where e is an initial vector and M is the stochastic matrix associated with the system.

Page 22: Mathematical Foundations for Search and Surf Engines

Power Method Example

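$M = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 \\ 1/2 & 0 & 0 & 1/2 \\ 1 & 0 & 0 & 0 \end{pmatrix}$

[Figure: directed graph on nodes 1 to 4 corresponding to M]

$e = (1, 1, 1, 1)$

$\lim_{n \to \infty} e M^n$, normalized to sum to 1: $(0.3636, 0.3636, 0.1818, 0.0909)$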

Page 23: Mathematical Foundations for Search and Surf Engines

Power Method Example

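$M = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 \\ 1/2 & 0 & 0 & 1/2 \\ 1 & 0 & 0 & 0 \end{pmatrix}$

[Figure: directed graph on nodes 1 to 4 corresponding to M]

e = (entries run together in the transcript: 210073)

$\lim_{n \to \infty} e M^n$, normalized to sum to 1: $(0.3636, 0.3636, 0.1818, 0.0909)$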
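The two examples make the point that the limit does not depend on the starting vector. A sketch of the iteration for the matrix M given above, treating e as a row vector and normalizing it to sum to 1 first (an assumption, since M preserves the sum of the entries). The second starting vector `[5, 0, 2, 1]` is a hypothetical choice, not the one on the slide.

```python
# Row-stochastic matrix M from the Power Method Example.
M = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
]

def power_method(e, M, steps=200):
    """Approximate lim e M^n for a start vector e, normalized to sum to 1."""
    total = sum(e)
    x = [v / total for v in e]
    n = len(M)
    for _ in range(steps):
        # One step of x -> x M (row vector times matrix).
        x = [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]
    return x

x = power_method([1, 1, 1, 1], M)
y = power_method([5, 0, 2, 1], M)  # hypothetical second start vector
print([round(v, 4) for v in x])  # [0.3636, 0.3636, 0.1818, 0.0909]
```

Both runs approach the same vector, which equals (4/11, 4/11, 2/11, 1/11) exactly.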

Page 24: Mathematical Foundations for Search and Surf Engines

Difficulties
• Proof, Design and Implementation
• Notion of convergence
• Deal with complex numbers
• Unique solution ...

Furthermore, it does not always work

Page 25: Mathematical Foundations for Search and Surf Engines

The process does not converge, even though the solution is obvious.

[Figure: directed cycle through six nodes, numbered 1 to 6]
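A small sketch of why the iteration fails here, assuming the figure is a directed cycle through the six nodes (the traversal order 1 → 2 → ... → 6 → 1 is an assumption). On a cycle each step just shifts the probability mass by one node, so the iterates repeat with period 6, although the uniform vector is plainly stationary.

```python
def step(x):
    """One step of x -> x M for the cycle 1 -> 2 -> ... -> 6 -> 1."""
    n = len(x)
    # All the mass at a node moves to the next node, wrapping around.
    return [x[(j - 1) % n] for j in range(n)]

start = [1, 0, 0, 0, 0, 0]  # all the mass on node 1
x = start
history = []
for _ in range(12):
    x = step(x)
    history.append(x)

print(history[0])           # [0, 1, 0, 0, 0, 0]
print(history[5] == start)  # True: back to the start after 6 steps

uniform = [1 / 6] * 6
print(step(uniform) == uniform)  # True: the obvious solution
```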

Page 26: Mathematical Foundations for Search and Surf Engines

Perron-Frobenius Theorem: Is it really necessary?

Page 27: Mathematical Foundations for Search and Surf Engines

Secret Weapon:

• Fourier
• Gauss
• Tarski
• Robinson

The Power of Elimination

Page 28: Mathematical Foundations for Search and Surf Engines

The elimination of the proof is an ideal seldom reached

Elimination gives us the heart of the proof

Page 29: Mathematical Foundations for Search and Surf Engines

Symbolic Gaussian Elimination

Three variables:

p11x1 + p21x2 + p31x3 = x1
p12x1 + p22x2 + p32x3 = x2
p13x1 + p23x2 + p33x3 = x3
∑xi = 1

With Maple we get:

x1 = (p31p21 + p31p23 + p32p21) / Σ
x2 = (p13p32 + p12p31 + p12p32) / Σ
x3 = (p13p21 + p12p23 + p13p23) / Σ

Σ = p31p21 + p31p23 + p32p21 + p13p32 + p12p31 + p12p32 + p13p21 + p12p23 + p13p23
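The closed form can be checked numerically by plugging in concrete probabilities. A sketch with exact fractions; the values in `p` are hypothetical (any row-stochastic choice would do), with `p[i][j]` standing for pij.

```python
from fractions import Fraction as F

# Hypothetical transition probabilities p[i][j] = P(s_i -> s_j); rows sum to 1.
p = {
    1: {1: F(1, 2), 2: F(1, 4), 3: F(1, 4)},
    2: {1: F(1, 3), 2: F(1, 3), 3: F(1, 3)},
    3: {1: F(1, 2), 2: F(0), 3: F(1, 2)},
}

# Numerators exactly as in the closed form for three variables.
n1 = p[3][1] * p[2][1] + p[3][1] * p[2][3] + p[3][2] * p[2][1]
n2 = p[1][3] * p[3][2] + p[1][2] * p[3][1] + p[1][2] * p[3][2]
n3 = p[1][3] * p[2][1] + p[1][2] * p[2][3] + p[1][3] * p[2][3]
sigma = n1 + n2 + n3
x = {1: n1 / sigma, 2: n2 / sigma, 3: n3 / sigma}

# Check the steady-state constraints: sum_i p_ij x_i = x_j for each j.
balanced = all(sum(p[i][j] * x[i] for i in p) == x[j] for j in p)
print(x[1], x[2], x[3], balanced)  # 8/17 3/17 6/17 True
```

The diagonal entries p11, p22, p33 do not appear in the numerators, yet the constraints (which do involve them) are satisfied.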

Page 30: Mathematical Foundations for Search and Surf Engines

[Figure: three-state chain s1, s2, s3 with transition probabilities p11, p12, p13, p21, p22, p23, p31, p32, p33]

x1 = (p31p21 + p23p31 + p32p21) / Σ
x2 = (p13p32 + p31p12 + p12p32) / Σ
x3 = (p21p13 + p12p23 + p13p23) / Σ


Page 36: Mathematical Foundations for Search and Surf Engines

Symbolic Gaussian Elimination

System with four variables:

p21x2 + p31x3 + p41x4 = x1

p12x1 + p42x4 = x2

p13x1 = x3

p34x3 = x4

∑xi = 1

Page 37: Mathematical Foundations for Search and Surf Engines

[Figure: four-state chain s1, s2, s3, s4 with transition probabilities p12, p13, p21, p31, p34, p41, p42]

With Maple we get:

x1 = (p21p34p41 + p34p42p21 + p21p31p41 + p31p42p21) / Σ
x2 = (p31p41p12 + p31p42p12 + p34p41p12 + p34p42p12 + p13p34p42) / Σ
x3 = (p41p21p13 + p42p21p13) / Σ
x4 = p21p13p34 / Σ

Σ = p21p34p41 + p34p42p21 + p21p31p41 + p31p42p21 + p31p41p12 + p31p42p12 + p34p41p12 + p34p42p12 + p13p34p42 + p41p21p13 + p42p21p13 + p21p13p34


Page 42: Mathematical Foundations for Search and Surf Engines

Do you see the LIGHT?

James Brown (The Blues Brothers)


Page 44: Mathematical Foundations for Search and Surf Engines

Ergodic Theorem Revisited

If there exists a reverse spanning tree in a graph of the Markov chain associated to a stochastic system, then:

(a) the stochastic system admits the following probability vector as a solution:

(b) the solution is unique.

(c) the conditions {xi ≥ 0}, i = 1, ..., n, are redundant and the solution can be computed by Gaussian elimination.
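The vector referred to in (a) did not survive in the transcript. Read against the three- and four-state formulas earlier in the lecture, it is presumably

$x_i = \frac{1}{\Sigma} \sum_{T \in \mathcal{T}_i} \prod_{(j \to k) \in T} p_{jk}$

where $\mathcal{T}_i$ (notation assumed here) is the set of reverse spanning trees rooted at $s_i$, that is, trees in which every other state has exactly one outgoing edge and all paths lead to $s_i$, and Σ is the sum of all the numerators. A brute-force sketch of that reading, applied to the matrix M of the Power Method Example; the function names are illustrative, and the enumeration is exponential, so it only suits tiny chains.

```python
from fractions import Fraction as F
from itertools import product

def reaches(s, root, succ):
    """Follow successor edges from s; True if root is hit before a repeat."""
    seen = set()
    while s != root:
        if s in seen:
            return False
        seen.add(s)
        s = succ[s]
    return True

def tree_weights(p):
    """For each root, sum the weights of all spanning trees directed toward it."""
    states = list(p)
    weights = {}
    for root in states:
        others = [s for s in states if s != root]
        # Each non-root state picks one outgoing edge of non-zero probability.
        choices = [[t for t in p[s] if t != s and p[s][t] != 0] for s in others]
        total = F(0)
        for picks in product(*choices):
            succ = dict(zip(others, picks))
            # Keep the choice only if it contains no cycle.
            if all(reaches(s, root, succ) for s in others):
                w = F(1)
                for s, t in succ.items():
                    w *= p[s][t]
                total += w
        weights[root] = total
    return weights

# Non-zero entries of M from the Power Method Example: p[i][j] = P(i -> j).
p = {
    1: {2: F(1)},
    2: {1: F(1, 2), 3: F(1, 2)},
    3: {1: F(1, 2), 4: F(1, 2)},
    4: {1: F(1)},
}
w = tree_weights(p)
sigma = sum(w.values())
x = {s: w[s] / sigma for s in w}
print(x[1], x[2], x[3], x[4])  # 4/11 4/11 2/11 1/11
```

These fractions agree with the limit 0.3636, 0.3636, 0.1818, 0.0909 reported for the power method on the same matrix.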

Page 45: Mathematical Foundations for Search and Surf Engines

Cycle

This system is now covered by the proof

[Figure: directed cycle through six nodes, numbered 1 to 6]

Page 46: Mathematical Foundations for Search and Surf Engines

Markov Chain as a Conservation System

[Figure: three-state chain s1, s2, s3 with transition probabilities p11, p12, p13, p21, p22, p23, p31, p32, p33]

p11x1 + p21x2 + p31x3 = x1(p11 + p12 + p13)
p12x1 + p22x2 + p32x3 = x2(p21 + p22 + p23)
p13x1 + p23x2 + p33x3 = x3(p31 + p32 + p33)

Page 47: Mathematical Foundations for Search and Surf Engines

Kirchhoff’s Current Law (1847)
• The sum of currents flowing towards a node is equal to the sum of currents flowing away from the node.

i31 + i21 = i12 + i13
i32 + i12 = i21
i13 = i31 + i32

[Figure: circuit on nodes 1, 2, 3 with currents i12, i13, i21, i31, i32]

i3 + i2 = i1 + i4

Page 48: Mathematical Foundations for Search and Surf Engines

Kirchhoff’s Matrix Tree Theorem (1847)

The theorem allows us to calculate the number of spanning trees of a connected graph
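In its usual statement, the number of spanning trees equals the determinant of the graph Laplacian with one row and the matching column removed. A minimal sketch for undirected graphs using exact arithmetic; the function name and the example graphs are illustrative choices, not taken from the slides.

```python
from fractions import Fraction as F

def count_spanning_trees(n, edges):
    """Number of spanning trees of an undirected graph on nodes 0..n-1."""
    # Graph Laplacian: degrees on the diagonal, -1 for each edge.
    L = [[F(0)] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    # Delete the last row and column, then take the determinant
    # by Gaussian elimination.
    A = [row[:-1] for row in L[:-1]]
    m = n - 1
    det = F(1)
    for col in range(m):
        pivot = next((r for r in range(col, m) if A[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is not connected
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = -det  # a row swap flips the sign
        det *= A[col][col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return int(det)

triangle = [(0, 1), (1, 2), (2, 0)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_spanning_trees(3, triangle))  # 3
print(count_spanning_trees(4, k4))        # 16
```

The determinant gives only the count; listing the trees themselves takes an enumeration.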

Page 49: Mathematical Foundations for Search and Surf Engines

Two theorems for the price of one!!

Differences

Ergodic Theorem: The symbolic proof is simpler and more appropriate than using Perron-Frobenius.

Kirchhoff Theorem: Our version calculates the spanning trees and not their total sum.

Page 50: Mathematical Foundations for Search and Surf Engines

Internet Sites
• Kleinberg
• Google
• SALSA
• Indegree Heuristic

Page 51: Mathematical Foundations for Search and Surf Engines

Google PageRank Patent

• “The rank of a page can be interpreted as the probability that a surfer will be at the page after following a large number of forward links.”

The Ergodic Theorem

Page 52: Mathematical Foundations for Search and Surf Engines

Google PageRank Patent

• “The iteration circulates the probability through the linked nodes like energy flows through a circuit and accumulates in important places.”

Kirchhoff (1847)

Page 53: Mathematical Foundations for Search and Surf Engines

Rank Sinks

[Figure: directed graph on seven nodes, numbered 1 to 7, containing rank sinks]

No Spanning Tree
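A reverse spanning tree rooted at a node exists exactly when every other node can reach that node by following links. A sketch of that test, on a hypothetical seven-node graph with two separate sinks (the edges of the slide's figure are not recoverable, and the function name is illustrative):

```python
def roots_of_reverse_spanning_trees(links):
    """Return the nodes that every node can reach by following links."""
    # Reverse the graph: incoming[t] lists the nodes that link to t.
    incoming = {node: [] for node in links}
    for source, targets in links.items():
        for target in targets:
            incoming[target].append(source)
    roots = []
    for root in links:
        # Walk the links backwards from the candidate root.
        seen = {root}
        stack = [root]
        while stack:
            node = stack.pop()
            for prev in incoming[node]:
                if prev not in seen:
                    seen.add(prev)
                    stack.append(prev)
        if len(seen) == len(links):
            roots.append(root)
    return roots

# Hypothetical graph: {3, 4} and {5, 6, 7} are both sinks.
links = {1: [2, 5], 2: [1, 3], 3: [4], 4: [3], 5: [6], 6: [7], 7: [5]}
print(roots_of_reverse_spanning_trees(links))    # []

# Adding a link 4 -> 1 lets the first sink drain into the second.
patched = dict(links)
patched[4] = [3, 1]
print(roots_of_reverse_spanning_trees(patched))  # [5, 6, 7]
```

With two sinks that cannot reach each other, no node qualifies as a root, so the hypothesis of the revisited Ergodic Theorem fails.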

Page 54: Mathematical Foundations for Search and Surf Engines

References
• Kirchhoff, G. "Über die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Verteilung galvanischer Ströme geführt wird." Ann. Phys. Chem. 72, 497-508, 1847.
• А. А. Марков. "Распространение закона больших чисел на величины, зависящие друг от друга". "Известия Физико-математического общества при Казанском университете", 2-я серия, том 15, ст. 135-156, 1906.
• H. Minkowski, Geometrie der Zahlen (Leipzig, 1896).
• J. Farkas, "Theorie der einfachen Ungleichungen," J. F. Reine u. Ang. Mat., 124, 1-27, 1902.
• H. Weyl, "Elementare Theorie der konvexen Polyeder," Comm. Helvet., 7, 290-306, 1935.
• A. Charnes, W. W. Cooper, "The Strong Minkowski Farkas-Weyl Theorem for Vector Spaces Over Ordered Fields," Proceedings of the National Academy of Sciences, pp. 914-916, 1958.

Page 55: Mathematical Foundations for Search and Surf Engines

References
• E. Steinitz. Bedingt Konvergente Reihen und Konvexe Systeme. J. reine angew. Math., 146:1-52, 1916.
• K. Jeev, J-L. Lassez: Symbolic Stochastic Systems. MSV/AMCS 2004: 321-328
• V. Chandru, J-L. Lassez: Qualitative Theorem Proving in Linear Constraints. Verification: Theory and Practice 2003: 395-406
• Jean-Louis Lassez: From LP to LP: Programming with Constraints. DBPL 1991: 257-283