Probabilistic Reasoning and Learning with Permutations Thesis Defense, 7/29/2011 Jonathan Huang Collaborators: Carlos Guestrin CMU Leonidas Guibas Stanford Xiaoye Jiang Stanford Ashish Kapoor Microsoft


Page 1:

Probabilistic Reasoning and Learning with Permutations

Thesis Defense, 7/29/2011

Jonathan Huang

Collaborators: Carlos Guestrin (CMU), Leonidas Guibas (Stanford), Xiaoye Jiang (Stanford), Ashish Kapoor (Microsoft)

Page 2:

Political Elections in Ireland


“But Ireland's complicated [election] system of proportional representation, … could upset the front-runner and help… the Fianna Fail candidate running second in the polls, to snatch victory.”

“Recent polling … indicates Doherty [Sinn Fein Party] is leading the race.”

Page 3:

Proportional Representation

Pros:
- Encourages coalition governments
- Discourages negative campaigning
- No wasted votes – empowers voters

Used by: Irish Parliament, Maltese Parliament, Australian Senate, Iceland Constitutional Assembly, Academy Awards, University of Cambridge, Scotland local governments, Cambridge (Mass.) local government, …

Con: Far more complex than plurality voting…

Page 4:

2002 Irish Election Data

64,081 votes, 14 candidates

Major parties: Fianna Fail (FF), Fine Gael (FG)
Minor parties: Independents (I), Green Party (GP), Christian Solidarity (CS), Labour (L), Sinn Fein (SF)

[Gormley, Murphy, 2006]

Statistical analysis of voting data can:
- Predict winners
- Identify “voting-blocs”
- Formulate campaign strategies
- Engender an informed, effective democracy

Page 5:

Distributions over Permutations

Rankings of candidates A, B, C, D and their probabilities:

A B C D | Probability
1 2 3 4 | 0
2 1 3 4 | 0
1 3 2 4 | 1/10
3 1 2 4 | 0
2 3 1 4 | 1/20
3 2 1 4 | 1/5
1 2 4 3 | 0

“With probability 1/10: Candidate A ranked first, Candidate B ranked third, Candidate C ranked second, Candidate D ranked last”

Page 6:

Permutations are Ubiquitous!

[Figure: permutations arising in three domains: politics (candidate rankings), preferences (e.g. movies ranked by “>”), and multiobject tracking (identities assigned to numbered tracks)]

Page 7:

Problem #1: Representation

n | n! | Storage requirements
9 | 362,880 | 3 megabytes
12 | 4.8x10^8 | 9.5 terabytes
15 | 1.31x10^12 | 1729 petabytes (!!)

How can we tractably represent distributions over n! permutations in storage?
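The factorial blow-up is easy to reproduce directly. A quick sketch, assuming one 8-byte double per permutation (the slide's storage figures may assume a different per-entry cost):

```python
import math

def explicit_storage_bytes(n, bytes_per_entry=8):
    """Bytes needed to store one probability per permutation of n items."""
    return math.factorial(n) * bytes_per_entry

for n in (9, 12, 15):
    b = explicit_storage_bytes(n)
    print(f"n={n:2d}  n!={math.factorial(n):>16,}  ~{b / 1e9:,.1f} GB")
```

Even at n=15 there are about 1.31x10^12 permutations, so any explicit table is hopeless.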

Page 8:

First-order summary [Shin et al, ‘03]

[Figure: first-order matrix for the Irish data: 14 candidates (FF FF FF FG FG FG I I I I GP CS SF L) × 14 ranks, probabilities roughly 0.05–0.25]

25% voters rank Sinn Fein last

10% voters rank Sinn Fein first

Con: Really coarse representation – can’t compute P(Sinn Fein candidate is first and Fianna Fail candidate is second)

Pro: n^2 versus n! storage

For each (j,i) pair, store P(candidate j is in rank i)
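The first-order summary is just a normalized count matrix over (candidate, rank) pairs. A minimal sketch (hypothetical helper, not code from the thesis):

```python
def first_order_summary(rankings):
    """rankings: list of tuples where sigma[j] is the (0-indexed) rank
    assigned to candidate j. Returns M with M[j][i] = P(candidate j in rank i)."""
    n = len(rankings[0])
    M = [[0.0] * n for _ in range(n)]
    for sigma in rankings:
        for cand, rank in enumerate(sigma):
            M[cand][rank] += 1.0
    total = len(rankings)
    return [[c / total for c in row] for row in M]

# O(n^2) storage, versus n! for the full distribution.
votes = [(0, 1, 2), (0, 2, 1), (1, 0, 2), (0, 1, 2)]
M = first_order_summary(votes)
```

Each row of M is a distribution over ranks for one candidate, which is exactly the matrix plotted on this slide.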

Page 9:

Decomposable Distributions

Additive Decomposition: decompose functions on permutations into sums of simpler functions.

Multiplicative Decomposition: decompose functions on permutations into products of simpler functions.

Page 10:

Additive (Fourier) Decompositions

f(x) = .6 × (basis fn) + .2 × (basis fn) + .1 × (basis fn) + .01 × (basis fn) + …

Fourier coefficients multiply Fourier basis functions, ordered from low frequency to high frequency.

Approximate distributions over permutations with low frequency basis functions

([Kondor2007,Huang2007,Huang2009])

Storing low frequency coefficients to approximate f

[Figure: coefficient magnitudes of f decay with frequency, e.g. (.6, .2, .1, .1, .05, .01, .01, 0, 0, 0)]

Page 11:

Fourier coefficients for permutations

[Figure: matrix-valued Fourier coefficients of f, ordered from low frequency to high frequency]

Fourier coefficients for distributions on permutations are matrix-valued

Can exactly reconstruct all n! original probabilities

Can exactly reconstruct all first-order probabilities with first two matrices

Can exactly reconstruct all second-order probabilities with first three matrices

[Diaconis, ‘88]

Page 12:

Second order summary (submatrix)

[Figure: second-order summary submatrix: rank pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) × candidate pairs (FF,FG), (FF,FF), (FF,SF), (FG,FF), (FG,SF), (FF,SF); probabilities 0–0.07]

7% of voters placed two Fianna Fail candidates consecutively in ranks 1 and 2.

Capture higher order dependencies with O(n4) storage

Page 13:

Accuracy/Storage Trade-off

Order | Fourier interpretation
0th order | Lowest frequency Fourier coefficient
1st order | Reconstructible from O(n^2) lowest frequency coefficients
2nd order | Reconstructible from O(n^4) lowest frequency coefficients
3rd order | Reconstructible from O(n^6) lowest frequency coefficients
… | …
nth order | Requires all n! Fourier coefficients

Problem #1 (Representation): Storing a low frequency Fourier approximation is equivalent to storing low-order probabilities (and can be done in polynomial space).

Low-frequency Fourier approximations generalize the first-order summary!

Page 14:

Contributions so far (Representation, Additive/Fourier Decomposition):
- Polynomial storage for approximate distributions
- Low frequency = maintaining probabilities over small sets
[NIPS07, JMLR09]

(The Inference row and the Multiplicative Decomposition column are filled in on later slides.)

Page 15:

Problem #2: Probabilistic Inference in Ranking

What are the odds that someone will rank Sinn Fein first if he ranks Fianna Fail second?

If a voter ranks Labour first, is he more likely to prefer Fine Gael over Fianna Fail?

If I prefer Titanic to Star Wars, am I likely to also prefer The English Patient to Jurassic Park?

Page 16:

Problem #2: Inference

Bayes Rule: posterior = likelihood × prior

How can we efficiently compute a posterior based on a new observation?

P(candidate ranking σ | z = “Fianna Fail ranked second”)

Complexity: O(n!)
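The O(n!) cost is visible in a brute-force implementation of this conditioning step. An illustrative sketch, feasible only for small n (the observation used here is hypothetical):

```python
from itertools import permutations

def condition(prior, likelihood):
    """Bayes rule over an explicit table: posterior proportional to
    likelihood x prior. Touches every one of the n! permutations."""
    unnorm = {sigma: p * likelihood(sigma) for sigma, p in prior.items()}
    z = sum(unnorm.values())
    return {sigma: p / z for sigma, p in unnorm.items()}

n = 4
uniform = {sigma: 1.0 / 24 for sigma in permutations(range(n))}
# Observation: candidate 1 is ranked second (sigma lists candidates in rank order).
posterior = condition(uniform, lambda sigma: 1.0 if sigma[1] == 1 else 0.0)
```

The posterior is uniform over the 3! = 6 rankings consistent with the observation; the point is that the loop scales as n!.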

Page 17:

Inference with Fourier coefficients

Given: prior P(ranking) and likelihood P(“Sinn Fein is first” | ranking)
Compute: posterior P(ranking | “Sinn Fein is first”)

From Signal Processing: pointwise products correspond to convolutions of Fourier coefficients.
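The classical abelian version of this fact is easy to check numerically. The sketch below verifies, with a hand-rolled DFT over the cyclic group Z_N, that the transform of a pointwise product equals a normalized circular convolution of transforms. (The thesis needs the non-abelian analogue on S_n, where coefficients are matrices, so this is only an analogy.)

```python
import cmath

def dft(x):
    """Discrete Fourier transform over the cyclic group Z_N."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def circular_convolution(a, b):
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]

prior = [0.1, 0.2, 0.3, 0.4]
likelihood = [0.4, 0.3, 0.2, 0.1]
pointwise = [p * l for p, l in zip(prior, likelihood)]

# DFT(pointwise product) == (1/N) * circular convolution of the DFTs
lhs = dft(pointwise)
rhs = [c / len(prior) for c in circular_convolution(dft(prior), dft(likelihood))]
assert all(abs(x - y) < 1e-9 for x, y in zip(lhs, rhs))
```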

Page 18:

Inference with Fourier coefficients [Huang et al, NIPS 2007]

Prior P(ranking) and likelihood P(“Sinn Fein is first” | ranking) yield posterior P(ranking | “Sinn Fein is first”).

Our algorithm applies to arbitrary distributions defined over arbitrary finite groups: pointwise products correspond to (generalized) convolution in the Fourier domain.

Page 19:

Bandlimiting

• Discard “high-frequency” coefficients after conditioning
  – Equivalently, maintain low-order probabilities

Theorem. Given the rth order terms of the prior and an sth order likelihood, the (r−s)th order terms of the posterior can be computed exactly.

(Fourier methods work best on low-order observations)

[Huang et al, NIPS 2009]

Page 20:

Dealing with the Impossible

Infeasible approximations (e.g. negative probabilities) can arise due to bandlimiting

[Figure: projecting an infeasible approximation (outside the set of feasible Fourier coefficients) onto the nearest feasible coefficients]

Efficient projection (to a relaxed polytope) is possible using a quadratic program.

Solution [Huang, 2007]: Project to space of coefficients corresponding to feasible probabilities
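For first-order marginals, the relaxed feasible set is the doubly stochastic matrices. A minimal sketch of repairing an infeasible approximation with alternating projections is below; note this finds a feasible point, whereas the quadratic program mentioned above returns the nearest one (Dykstra-style corrections would be needed for that):

```python
def fix_rows(M):
    """L2-project onto matrices whose rows each sum to 1."""
    n = len(M)
    return [[x + (1.0 - sum(row)) / n for x in row] for row in M]

def fix_cols(M):
    """L2-project onto matrices whose columns each sum to 1."""
    n = len(M)
    sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
    return [[M[i][j] + (1.0 - sums[j]) / n for j in range(n)] for i in range(n)]

def clip(M):
    """Project onto the nonnegative orthant (kills negative 'probabilities')."""
    return [[max(x, 0.0) for x in row] for row in M]

def make_feasible(M, iters=2000):
    """Alternating projections toward a doubly stochastic matrix."""
    for _ in range(iters):
        M = clip(fix_cols(fix_rows(M)))
    return M

bad = [[0.7, 0.4, -0.1], [0.2, 0.3, 0.5], [0.1, 0.3, 0.6]]  # negative entry
good = make_feasible(bad)
```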

Page 21:

Permutations in Tracking

[Figure: identity management across Tracks 1–4]

Applications to:
- Monitoring for assisted living
- Video analysis for sports
- Video surveillance for crowds

Page 22:

Probabilistic Inference in Tracking

[Figure: four tracks with mixing events at Tracks (1,2), then (1,3), then (1,4)]

Inference problem: Where is Alice?

Page 23:

Simulated tracking data: projection to the marginal polytope versus no projection (n=6)

[Figure: error (0–0.12, lower is better) for 1st, 2nd, and 3rd order representations, with and without projection; a baseline marks approximation by a uniform distribution]

Page 24:

Tracking with a camera network

Camera network data:
- 8 cameras, multi-view, occlusion effects
- 11 individuals in lab
- Identity observations obtained from color histograms
- Mixing events declared when people walk close to each other

[Figure: % of tracks correctly identified (0–60, higher is better) for an omniscient tracker, time-independent classification, and 2nd order inference with and without projection]

Problem #2 (Inference): can be formulated in the Fourier domain as (generalized) convolution, and approximated via bandlimiting/projections; low-order observations = polytime, accurate inference.

Page 25:

Contributions so far (Additive/Fourier Decomposition):
- Representation: polynomial storage for approximate distributions; low frequency = maintaining probabilities over small sets [NIPS07, JMLR09]
- Inference: polytime Fourier domain conditioning algorithm for finite groups; approximation guarantee for low order observations [NIPS07, JMLR09]

(The Multiplicative Decomposition column is filled in on later slides.)

Page 26:

Even polynomial is too slow…

Representation depth | # Fourier coefficients
1st order | O(n^2)
2nd order | O(n^4)
3rd order | O(n^6)
4th order | O(n^8)

[Figure: running time in seconds (0–4, lower is better) of exact inference and 1st/2nd/3rd order approximate inference, for n = 4 to 8]

Can we achieve more compact representations?

Page 27:

Riffled Independence

Idea: Assume a ranking is created by “shuffling” smaller, independent rankings.

Rank veggies: Artichoke > Broccoli
Rank fruits: Cherry > Dates

Interleave (riffle shuffle) the veggie/fruit rankings to form a complete ranking, e.g. Artichoke > Broccoli > Cherry > Dates. [Huang, Guestrin, 2009]

Riffle independent distributions can be represented with a reduced set of parameters!
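A generative sketch of the riffle shuffle (illustrative code with a uniform interleaving for simplicity; in the thesis the interleaving has its own learned distribution):

```python
import random

def interleave(ranking_a, ranking_b, a_positions):
    """Merge two relative rankings, placing items of ranking_a at the given
    positions; relative order within each set is preserved."""
    n = len(ranking_a) + len(ranking_b)
    out, ai, bi = [], 0, 0
    for pos in range(n):
        if pos in a_positions:
            out.append(ranking_a[ai]); ai += 1
        else:
            out.append(ranking_b[bi]); bi += 1
    return out

def riffle_sample(ranking_a, ranking_b, rng=random):
    """Draw a full ranking using a uniformly random interleaving."""
    n = len(ranking_a) + len(ranking_b)
    positions = set(rng.sample(range(n), len(ranking_a)))
    return interleave(ranking_a, ranking_b, positions)

veggies = ["Artichoke", "Broccoli"]  # relative ranking of vegetables
fruits = ["Cherry", "Dates"]         # relative ranking of fruits
full = riffle_sample(veggies, fruits)
```

Whatever interleaving is drawn, Artichoke stays ahead of Broccoli and Cherry ahead of Dates, which is exactly the riffled independence constraint.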

Page 28:

[Figure: probability (0–0.1) of each of the 5! = 120 permutations; the blue line shows candidate {2} riffle independent of candidates {1,3,4,5}]

American Psych. Assoc. (APA) Election (1980)

Empirically, we can find approximate riffled independence in real datasets

William Bevan

Ira Iscoe

Charles Kiesler

Max Siegle

Logan Wright

5738 full ballots, 5 candidates

dataset from [Diaconis, ‘89]

Page 29:

Parameter Counting

#(rankings): 5! = 120. Can we do better?

Item set decomposition {1,2,3,4,5} into {1,3,4,5} and {2}:
- Relative ranking of candidates {1,3,4,5}: 4! = 24
- Relative ranking of candidate {2}: 1! = 1
- Interleaving candidate {2} with the remaining candidates: 5
Total # of model parameters < 30

Decomposing further, {1,3,4,5} into {4,3} and {1,5}:
- Relative ranking of candidates {4,3}: 2! = 2
- Relative ranking of candidates {1,5}: 2! = 2
- Interleaving candidates {4,3} with candidates {1,5}: 6
Total # of model parameters < 16

Problem #1 (Representation): Distributions which decompose into riffle independent factors can be represented using exponentially fewer parameters.
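This parameter counting is mechanical given a hierarchy. A sketch using a hypothetical representation (a leaf is a tuple of items, an internal node a two-element list): a leaf over m items costs at most m! parameters, and each internal node costs one interleaving distribution over C(p+q, p) interleavings. The counts reproduce the slide's upper bounds of 30 and 16; the slide's "<" reflects sum-to-one constraints, which remove one parameter per distribution.

```python
from math import comb, factorial

def size(node):
    if isinstance(node, list):   # internal node: [left, right]
        return size(node[0]) + size(node[1])
    return len(node)             # leaf: tuple of items

def param_bound(node):
    """Upper bound on parameters of a hierarchical riffle independent model."""
    if isinstance(node, list):
        p, q = size(node[0]), size(node[1])
        # one interleaving distribution, plus the two independent factors
        return comb(p + q, p) + param_bound(node[0]) + param_bound(node[1])
    return factorial(len(node))  # distribution over relative rankings

flat_split = [(1, 3, 4, 5), (2,)]   # {1,3,4,5} vs {2}
deeper = [[(4, 3), (1, 5)], (2,)]   # {4,3} vs {1,5}, then vs {2}
```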

Page 30:

Hierarchical Decompositions: Drawing a Ranking

Food preferences example: rank fruits, rank vegetables, and rank junk food at the leaves; interleave fruits with vegetables; then interleave the healthy foods with the junk food.

Problem: For APA data, don’t know the hierarchy!

Page 31:

Reverse Engineering the Hierarchy

Machine learning approach: use the structure that best explains the data.

Data (sample rankings): ABCD, BACD, BADC, BDCA, CBDA, CABD, CBAD, CDBA, DCBA, DCAB, ADBC

Structure Learning Algorithm: Data to Hierarchy, e.g. {A,B,C,D} splits into {C,D,A} and {B}; {C,D,A} splits into {A} and {C,D}.

Core problem: Given ranked data, determine whether subsets are riffle independent.

Page 32:

Measuring riffled independence [Huang, Guestrin, 2010]

Riffled independence: absolute rankings of fruits are not informative about relative rankings within vegetables.

Idea: measure independence between singleton rankings (preference over fruit i) and pairwise rankings (relative preference over vegetables j, k). If i and (j,k) lie on opposite sides of the split, mutual information = 0.

Page 33:

Tripletwise objective function

Measuring departure from riffled independence for a candidate split (A, B): minimize a sum of tripletwise mutual informations across the split.

There is an exponential number of possible splits, but an efficient minimization algorithm works with high probability.
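The mutual-information quantities in this objective can be estimated directly from samples. Below, a plug-in estimate of the mutual information between the rank of one item and the relative order of two others (illustrative code, not the thesis's exact estimator):

```python
from collections import Counter
from math import log

def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in nats from joint samples (x, y)."""
    n = len(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    pxy = Counter(pairs)
    return sum((c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def rank_vs_relative_order(rankings, i, j, k):
    """I(rank of item i ; whether j is preferred to k). Near zero when
    i and (j, k) lie on opposite sides of a riffle independent split."""
    return mutual_information([(sigma.index(i), sigma.index(j) < sigma.index(k))
                               for sigma in rankings])
```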

Page 34:

Learning Structure from APA Data

Learned hierarchy: {1,2,3,4,5} splits into {1,3,4,5} and {2}; {1,3,4,5} splits into {1,3} and {4,5}. # model parameters: 11

Candidates: 1. William Bevan, 2. Ira Iscoe, 3. Charles Kiesler, 4. Max Siegle, 5. Logan Wright

The leaves align with the APA’s coalitions of research, clinical, and community psychologists: the hierarchy respects the political coalition structure of the APA!

[Figure: “true” first-order matrix versus the hierarchical model’s first-order matrix (candidates × ranks, probabilities 0–0.25)]

Page 35:

Structure learning with synthetic data (16 items, 4 items in each leaf)

[Figure: log-likelihood (higher is better) versus log10(# samples), for the true structure known, the learned structure, and a random 1-chain ([Doignon et al, 2004])]

Theorem: Our algorithm recovers the riffle independent split with high probability given polynomially many samples (under mild assumptions on connectivity). [Huang, Guestrin, 2011]

Page 36:

Irish Election (No Structure Learning)

Major parties riffle independent of minor parties?

[Figure: “true” first-order probabilities versus the riffle independent approximation; candidates FF FF FF FG FG FG I I I I GP CS SF L × ranks, probabilities 0–0.25]

The Sinn Fein and Christian Solidarity columns are not well captured by a single split!

Page 37:

Structure Learning the Irish Election

Learned hierarchy:
- {1,2,3,4,5,6,7,8,9,10,11,12,13,14} splits into {1,2,3,4,5,6,7,8,9,10,11,13,14} and {12} (Sinn Fein)
- then into {11} (Christian Solidarity) and {1,2,3,4,5,6,7,8,9,10,13,14}
- then into {2,3,5,6,7,8,9,10,14} and {1,4,13} (Fianna Fail)
- {2,3,5,6,7,8,9,10,14} splits into {2,5,6} (Fine Gael) and {3,7,8,9,10,14} (Independents, Labour, Green)

# Parameters: full model ~87 billion; hierarchical model ~1000
Running time: brute force optimization 70.2s; our method 2.3s

[Figure: “true” first-order matrix versus the learned first-order matrix (candidates × ranks, probabilities 0–0.25)]

Page 38:

Preference Analysis (for Sushi)

5000 preference rankings of 10 types of sushi.

Contenders:
1. Ebi (shrimp)
2. Anago (sea eel)
3. Maguro (tuna)
4. Ika (squid)
5. Uni (sea urchin)
6. Sake (salmon roe)
7. Tamago (egg)
8. Toro (fatty tuna)
9. Tekka-maki (tuna roll)
10. Kappa-maki (cucumber roll)

[Figure: first-order matrix, sushi × ranks 1 (first) to 10 (last)]

Fatty tuna (toro) is a favorite! No one likes cucumber roll!

Page 39:

Sushi Hierarchy

- {1,2,3,4,5,6,7,8,9,10} splits into {2} (sea eel) and {1,3,4,5,6,7,8,9,10}
- then into {4} (squid) and {1,3,5,6,7,8,9,10}
- then into {5,6} (sea urchin, salmon roe) and {1,3,7,8,9,10}
- then into {1} (shrimp) and {3,7,8,9,10}
- {3,7,8,9,10} splits into {3,8,9} (tuna, fatty tuna, tuna roll) and {7,10} (egg, cucumber roll)

Page 40:

Contributions

Additive (Fourier) Decomposition:
- Representation: polynomial storage for approximate distributions; low frequency = maintaining probabilities over small sets [NIPS07, JMLR09]
- Inference: polytime Fourier domain conditioning algorithm for finite groups; approximation guarantee for low order observations [NIPS07, JMLR09]

Multiplicative (Riffle Independent) Decomposition:
- Representation: introduction of Hierarchical Riffled Independence models; structure learning algorithm with polynomial time/samples guarantee [NIPS09, ICML10, EJS11]

Page 41:

Top-k Inference Problem

[Figure: number of votes (0 to 20,000) versus number of candidates specified, k = 2 to 14; most voters rank just the top-3 or top-4 candidates]

Inference problem: Given an observation of a voter’s top-k rankings, infer his preferences over the remaining candidates.

Page 42:

Inference in Riffled Independent Models

Bayes Rule (naively an O(n!) operation): posterior ∝ likelihood × prior

Decomposition | Can efficiently perform inference with
Fourier (Additive) | Low order likelihoods (observations depend on few items)
Riffle Independent (Multiplicative) | ????

Answer: Efficient inference is possible if and only if observations take the form of partial rankings! (including top-k observations)

Page 43:

The Top-1 Inference Problem

Bayes rule complexity: factorial in the number of items?

Sometimes we can decompose the observation into smaller observations. Under the hierarchy splitting {all candidates} into {1,2,3} (Fianna Fail) and {4,5,6,7,8} (other candidates), the observation “Candidate 3 (FF party) ranked in first place” decomposes as:

- Interleaving observation: a Fianna Fail candidate ranked in first place overall
- Fianna Fail observation: Candidate 3 ranked first among FF candidates

Bayes rule complexity: linear in # of parameters.

Top-1 inference always decomposes into inference for each node in the hierarchical model.

Page 44:

Efficient inference for partial rankings

[Figure: an Irish ballot listing Fine Gael, Fianna Fail, Sinn Fein, Independent, Green, Labour, Socialist, with a subset marked, as in approval voting]

In general, there are many forms of partial rankings, allowing items to be tied:

- First place observations: G|ABCDEFH (“G in first place”)
- Top-k observations: G|F|A|BCDEH (“G in first place, F in second, A in third”)
- Approval voting observations: ACFG|BDEH (“Approve of candidates in {A,C,F,G}”)
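All three observation types above are ordered partitions into blocks, so a single consistency check serves as their indicator likelihood. A minimal sketch with hypothetical helper names:

```python
def consistent(full_ranking, blocks):
    """True iff every item in an earlier block outranks every item in a
    later block; items within a block are tied (unordered).
    full_ranking: sequence of items, best first.
    blocks: partial ranking as an ordered list of item collections."""
    position = {item: r for r, item in enumerate(full_ranking)}
    prev_max = -1
    for block in blocks:
        ranks = [position[item] for item in block]
        if min(ranks) <= prev_max:
            return False
        prev_max = max(ranks)
    return True

# Top-3 observation G|F|A|BCDEH and an approval observation ACFG|BDEH
# (strings iterate as their item characters):
top3 = ["G", "F", "A", "BCDEH"]
approval = ["ACFG", "BDEH"]
```

Conditioning on a partial ranking then means zeroing out full rankings for which `consistent` is False, which the hierarchy lets us do node by node.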

Page 45:

Main Theorem [Huang, Kapoor, 2011]

Theorem: Any partial ranking observation is decomposable with respect to any hierarchy. That is, inference for partial rankings is efficient, with running time linear in #(parameters).

[Figure: Venn diagram of observations decomposable under hierarchies H1, H2, H3; partial rankings lie in the intersection. But what lies outside?]

Converse to Main Theorem: Every observation that decomposes with respect to all hierarchies takes the form of some partial ranking.

Page 46:

Learning with Top-k Votes (Irish Data)

[Figure: negative log-likelihood (lower is better) for the riffle independent model versus the nonparametric Mallows model [Lebanon, 2008], each trained on full rankings only versus full + partial rankings]

Using inference, we can efficiently build accurate, interpretable models of partial rankings.

Page 47:

Contributions

Additive Decomposition:
- Representation: polynomial storage for approximate distributions; low frequency = maintaining probabilities over small sets [NIPS07, JMLR09]
- Inference: polytime Fourier domain conditioning algorithm for finite groups; approximation guarantee for low order observations [NIPS07, JMLR09]

Multiplicative Decomposition:
- Representation: introduction of Hierarchical Riffled Independence models; structure learning algorithm with polynomial time/samples guarantee [NIPS09, ICML10, EJS11]
- Inference: decomposability theorem for partial rankings; learning distributions with partial rankings [NIPS-CSS10]

Algorithms for exploiting both decompositions for scalable inference [AISTATS08, NIPS09, EJS11]

Page 48:

Main Technical Contributions

• Fourier theoretic conditioning algorithm with projection to the marginal polytope [NIPS07, JMLR09]
• Fourier theoretic characterization of probabilistic independence [AISTATS07]
• Definition of riffled independence [NIPS09]
• Polynomial sample/time complexity structure learning algorithms [ICML10]
• Theoretical connection between efficient inference in riffle independent models and partial ranking [UAI11]
• Tractable model estimation algorithm with partial rankings [UAI11]

Page 49:

Thank You Carlos Guestrin

Leo Guibas, John Lafferty, Drew Bagnell, Alex Smola

Ashish Kapoor, Eric Horvitz, Ali Rahimi

Risi Kondor, Marina Meila, Guy Lebanon, Tiberio Caetano, Xiaoye Jiang

SELECT Lab, Michelle Martin

Friends

Lucia Castellanos

Billy, Farn-lin, and Jonah Huang