

Lifted First-Order Probabilistic Inference
Rodrigo de Salvo Braz, Eyal Amir and Dan Roth

This research is supported by ARDA’s AQUAINT Program, by NSF grant ITR-IIS-0085980 and DAF grant FA8750-04-2-0222 (DARPA REAL program). AAAI’06, IJCAI’05

Goal
A general system to reason with symbolic data in a probabilistic fashion:

• Being able to describe objects of arbitrary structure: lists, trees, objects with various fields and referring to other objects;
• Being able to describe probabilistic facts.

Previous approaches - Logic
• Powerful data representation and knowledge declaration, but no probabilities;
• Prevents us from saying things as simple as: “most birds fly”, “most capitalized words are proper nouns” or “rectangular objects are common”.

Lifted First-Order Probabilistic Inference
• Allows the declaration of probabilistic knowledge (a KB) that applies to many objects at once:

Prob(properNoun(Word) | noun(Word), capitalized(Word), similarTo(Word, Word2), knownName(Word2)) = 0.7

holds for all words involved.

• Does not build a Bayesian network in advance of inference; instead, performs inference using KB items as necessary.
• Does not consider individual objects separately unless this is really necessary: similar to theorem proving, but with probabilities.
• Example: given capitalized(Beautiful) and adjective(Beautiful), the algorithm does not have to consider the whole list of known names in order to decide that Beautiful is not a proper noun, because it is not a noun in the first place. An algorithm constructing a model before doing inference would build a node per known name!
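To make the “one rule, many groundings” idea concrete, here is a minimal sketch of how such a parameterized rule might be represented as data. This is not the authors’ implementation: the class names ParRV and Parfactor are hypothetical, and the potential table is truncated to two rows purely for illustration.

from dataclasses import dataclass

@dataclass(frozen=True)
class ParRV:
    # A parameterized random variable such as noun(Word): a predicate
    # name plus logical variables ranging over a domain of individuals.
    predicate: str
    params: tuple

@dataclass
class Parfactor:
    # A factor template: one potential table shared by every grounding
    # of its atoms, instead of one table per individual object.
    atoms: tuple
    potential: dict  # maps tuples of truth values to weights

# Prob(properNoun(W) | noun(W), capitalized(W), ...) = 0.7 becomes a
# single parfactor standing for a factor over every word at once
# (only two rows of the potential are shown; the rest are omitted):
rule = Parfactor(
    atoms=(ParRV("properNoun", ("Word",)),
           ParRV("noun", ("Word",)),
           ParRV("capitalized", ("Word",))),
    potential={(True, True, True): 0.7,
               (False, True, True): 0.3},
)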

Example
A group of people, each with data such as name, age and profession, as well as relationships between them:

Prob(friends(X,Z) | friends(X,Y), friends(Y,Z)) = 0.6
Prob(age(Person) > 20 | degree(Person)) = 0.9.

Key ideas
Each atom in a rule is a parameterized random variable: noun(Word) actually stands for all of its instance random variables noun(w1), noun(w2), ..., noun(wn).

We use a First-Order Variable Elimination algorithm in which all instances of a parameterized random variable can often be eliminated at once, regardless of the domain size, since they all have the same structure. This idea was introduced by David Poole in 2003; we formalized it and call it Inversion Elimination (IE). However, IE only works when these instances are independent of each other. For more general cases, we introduce Counting Elimination, which does, however, depend on the domain size.
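Schematically, and simplifying to a unary potential \phi over a single parameterized atom p(X) with a domain D of n individuals, the two elimination steps rest on the following identities (a sketch of the underlying algebra, not the paper's full formulation):

% Inversion elimination: when the groundings p(x) are independent, the
% nested sum over all joint assignments factorizes into n identical sums:
\sum_{p(x_1)} \cdots \sum_{p(x_n)} \prod_{x \in D} \phi\bigl(p(x)\bigr)
  \;=\; \prod_{x \in D} \sum_{v} \phi(v)
  \;=\; \Bigl( \sum_{v} \phi(v) \Bigr)^{n}

% Counting elimination: when the groundings interact, the result depends
% only on how many of them take each value, not on which ones do
% (binary case, summing over the number k of instances that are true):
\sum_{k=0}^{n} \binom{n}{k} \, \phi(\mathrm{true})^{k} \, \phi(\mathrm{false})^{\,n-k}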

Contributions
A First-Order Probabilistic Inference algorithm which:
• provides a language for both structured data and probabilistic information;
• takes advantage of first-order level information by eliminating many objects in a single step, as opposed to previous work.

Comparing Lifted and Propositional Inferences

We compare the lifted methods (inversion and counting elimination) to their propositional counterparts (which produce identical numerical results) on the queries P(sick(Person) | epidemics) and P(death | sick(Person)) in (I), and P(p(X), p(Y), r) in (II).

[Figure: two plots of average run time (ms) against domain size, comparing Lifted and Ground inference: (I) Inversion elimination, domain sizes 1–15; (II) Counting elimination, domain sizes 1–10.]
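The shape of these curves can be reproduced with a toy computation. The following self-contained sketch (not the paper's code, and using a made-up unary potential phi) contrasts ground variable elimination, which enumerates all 2^n joint assignments of n independent, identically distributed binary instances, with the single power computed by inversion elimination:

import itertools

def ground_eliminate(phi, n):
    # Propositional analogue: enumerate all 2**n assignments to the
    # ground instances p(x1), ..., p(xn) and sum the factor products.
    total = 0.0
    for assignment in itertools.product((False, True), repeat=n):
        product = 1.0
        for value in assignment:
            product *= phi[value]
        total += product
    return total

def lifted_eliminate(phi, n):
    # Inversion elimination analogue: the same sum, computed as a
    # single power because all n instances share one structure.
    return (phi[False] + phi[True]) ** n

phi = {False: 0.4, True: 0.6}
for n in (5, 10, 15):
    # Identical results, but ground_eliminate does O(2**n) work while
    # lifted_eliminate is independent of the domain size.
    assert abs(ground_eliminate(phi, n) - lifted_eliminate(phi, n)) < 1e-9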

Example
Web pages, with their words, subjects and links, and how they relate to each other:

Prob(subject(Page,coffee) | wordIn(Page,java)) = 0.2
Prob(subject(Page,programming) | wordIn(Page,java)) = 0.8
Prob(subject(Page,Subj) | link(Page,Page2), subject(Page2,Subj)) = 0.7.

Example
Smoking facts in a large population:

Prob(friends(X,Y)) = 0.1
Prob(smoker(X)) = 0.3
Prob(smoker(Y) | friends(X,Y), smoker(X)) = 0.6
Prob(cancer(X) | smoker(X)) = 0.6, X ≠ john
Prob(cancer(john)) = 0.01

A possible query: in a population of 40,000, what is the probability that a male has cancer?
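Continuing the hypothetical ParRV/Parfactor sketch from earlier (and assuming those definitions are in scope), the first smoking rules could be encoded as two templates. The point is that a lifted engine answers the 40,000-person query from the templates alone, without grounding a friends(X,Y) variable for every pair of people:

# Hypothetical encoding, reusing the ParRV/Parfactor sketch above.
smokes = Parfactor(
    atoms=(ParRV("smoker", ("X",)),),
    potential={(True,): 0.3, (False,): 0.7},
)
influence = Parfactor(
    atoms=(ParRV("smoker", ("Y",)),
           ParRV("friends", ("X", "Y")),
           ParRV("smoker", ("X",))),
    potential={(True, True, True): 0.6,
               (False, True, True): 0.4},
)
# Two templates stand in for 40,000 smoker(x) variables and
# 40,000 * 40,000 friends(x, y) variables.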

Previous approaches - Probabilistic models
• Powerful probabilistic capabilities in models such as Bayesian networks and Markov models, but they lack structured data such as lists, trees, etc.;
• Cannot automatically apply knowledge to more than one object at the same time (for example, probabilistic properties of nouns to all nouns in a piece of text).

Previous approach - Knowledge-driven Construction
• Constructs a propositional probabilistic model based on logic-like probabilistic knowledge;
• Builds the model before starting inference, often building unnecessary parts.

Where we are
We have:
• predicates (relational structure);
• no individuals split unnecessarily.
But we do not yet have:
• function symbols, so no data structures such as lists and trees;
• full advantage taken of inference during construction; this will require approximation.
