Practical Probabilistic Relational Learning

Sriraam Natarajan

Take-Away Message

Learn from rich, highly structured data!

Traditional Learning

Data is i.i.d.

Data (columns are attributes, i.e. features):

B E A M J
1 0 1 1 0
0 0 0 0 1
. . .
0 1 1 0 1

[Figure: the data plus a network over Burglary, Earthquake, Alarm, MaryCalls, and JohnCalls go into Learning, which outputs the same network annotated with conditional probability tables. Table entries shown: 0.08 0.92, 0.01 0.99, 0.1 0.9, 0.55 0.45, 0.6 0.4, 0.95 0.05, 0.3 0.7, 0.8 0.2, 0.1 0.9, 0.9 0.1.]
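In the traditional i.i.d. setting, learning the conditional probability tables reduces to counting rows. The following is a minimal sketch, not the method used on the slide. It assumes the textbook alarm-network structure (Burglary and Earthquake are parents of Alarm; Alarm is the parent of MaryCalls and JohnCalls) and that the columns B, E, A, M, J abbreviate those five variables. The function name `estimate_cpt` and the smoothing parameter `alpha` are hypothetical. Only the three rows visible on the slide are used, so the estimates will not match the slide's tables.

```python
from collections import Counter

COLUMNS = ["B", "E", "A", "M", "J"]
# The three data rows visible on the slide.
ROWS = [
    (1, 0, 1, 1, 0),
    (0, 0, 0, 0, 1),
    (0, 1, 1, 0, 1),
]
# ASSUMPTION: textbook alarm-network structure (child -> list of parents).
PARENTS = {"B": [], "E": [], "A": ["B", "E"], "M": ["A"], "J": ["A"]}


def estimate_cpt(rows, child, parents, alpha=1.0):
    """Estimate P(child=1 | parents) by counting, with Laplace smoothing alpha.

    Returns a dict mapping each observed tuple of parent values to a probability.
    """
    index = {name: i for i, name in enumerate(COLUMNS)}
    ones, totals = Counter(), Counter()
    for row in rows:
        key = tuple(row[index[p]] for p in parents)
        totals[key] += 1
        ones[key] += row[index[child]]
    return {key: (ones[key] + alpha) / (totals[key] + 2 * alpha) for key in totals}


cpts = {var: estimate_cpt(ROWS, var, parents) for var, parents in PARENTS.items()}
for var, table in cpts.items():
    print(var, table)
```

Each row contributes one count per table independently of every other row, which is exactly what the i.i.d. assumption buys.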

Real-World Problem: Predicting Adverse Drug Reactions

Patient Table

PatientID  Gender  Birthdate
P1         M       3/22/63

Visit Table

PatientID  Date    Physician  Symptoms      Diagnosis
P1         1/1/01  Smith      palpitations  hypoglycemic
P1         2/1/03  Jones      fever, aches  influenza

Lab Tests

PatientID  Date    Lab Test       Result
P1         1/1/01  blood glucose  42
P1         1/9/01  blood glucose  45

SNP Table

PatientID  SNP1  SNP2  …  SNP500K
P1         AA    AB    …  BB
P2         AB    BB    …  AA

Prescriptions

PatientID  Date Prescribed  Date Filled  Physician  Medication  Dose  Duration
P1         5/17/98          5/18/98      Jones      prilosec    10mg  3 months
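The tables in this example are linked by PatientID, and a patient can have any number of visits, lab tests, and prescriptions, so the data does not fit a single fixed-length i.i.d. feature vector. One common way to keep the structure is to store each row as a ground fact. The sketch below illustrates that idea under stated assumptions: the predicate names (`patient`, `visit`, `lab`, `prescription`) and the helper names are hypothetical, and the SNP table is left out for brevity.

```python
from collections import defaultdict

# Rows copied from the slide's tables.
patients = [("P1", "M", "3/22/63")]
visits = [
    ("P1", "1/1/01", "Smith", "palpitations", "hypoglycemic"),
    ("P1", "2/1/03", "Jones", "fever, aches", "influenza"),
]
labs = [
    ("P1", "1/1/01", "blood glucose", 42),
    ("P1", "1/9/01", "blood glucose", 45),
]
prescriptions = [
    ("P1", "5/17/98", "5/18/98", "Jones", "prilosec", "10mg", "3 months"),
]


def to_facts(predicate, rows):
    """Turn table rows into ground facts of the form (predicate, arg1, arg2, ...)."""
    return [(predicate,) + tuple(row) for row in rows]


facts = (
    to_facts("patient", patients)
    + to_facts("visit", visits)
    + to_facts("lab", labs)
    + to_facts("prescription", prescriptions)
)


def facts_about(facts, patient_id):
    """Group all facts whose first argument is patient_id by predicate name."""
    grouped = defaultdict(list)
    for fact in facts:
        if fact[1] == patient_id:
            grouped[fact[0]].append(fact[2:])
    return dict(grouped)


p1 = facts_about(facts, "P1")
print(p1["visit"])
```

Note that `p1["visit"]` and `p1["lab"]` are lists of varying length, which is the property a flat attribute-value table cannot represent directly.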

Logic + Probability = Probabilistic Logic aka Statistical Relational Learning Models

[Figure: adding probabilities to Logic, or adding relations to Probabilities, yields Statistical Relational Learning (SRL).]

• Several previous SRL workshops in the past decade
• This year: StaRAI @ AAAI 2013

[Figure: a cube of fields along three axes: Deterministic vs. Stochastic, No Learning vs. Learning, and Prop vs. FO. The corners are Propositional Logic, First Order Logic, Probability Theory, Probabilistic Logic, Prop Rule Learning, Inductive Logic Programming, Classical Machine Learning, and Statistical Relational Learning.]

Costs and Benefits of the SRL soup

Benefits

• Rich pool of different languages
• Very likely that there is a language that fits your task at hand well
• A lot of research remains to be done ;-)

Costs

• “Learning” SRL is much harder
• Not all frameworks support all kinds of inference and learning settings

How do we actually learn relational models from data?

Why is this problem hard?

• Non-convex problem
• Repeated search of parameters for every step in induction of the model
• First-order logic allows for different levels of generalization
• Repeated inference for every step of parameter learning
• Inference is #P-complete

How can we scale this?

Relational Probability Trees

• Each conditional probability distribution can be learned as a tree
• Leaves are probabilities
• The final model is the set of the RRTs

[Figure: a relational probability tree to predict heartAttack(X) [Blockeel & De Raedt ’98]. Inner nodes, each with yes/no branches: male(X); chol(X,Y,L), Y>40, L>200; diag(X,Hypertension,Z), Z>55; bmi(X,W,55), W>30. Leaf probabilities shown: 0.8, 0.77, 0.05, 0.3, …]
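To make the tree for heartAttack(X) concrete, here is a minimal sketch of how such a tree could be evaluated against a set of ground facts. It rests on several assumptions. The extracted slide does not preserve which test sits under which branch or which leaf belongs to which path, so the layout below is invented for illustration. The meaning of the numeric arguments is not stated, so they are treated as opaque numbers. The elided part of the tree ("…") is kept as `None`. Each test is also simplified to an independent existential query, whereas a first-order tree would share variable bindings along a path. The helper names `holds`, `make_tree`, and `predict` are hypothetical.

```python
def holds(facts, predicate, condition):
    """Existential test: True if some fact of `predicate` satisfies `condition`."""
    return any(f[0] == predicate and condition(*f[1:]) for f in facts)


def make_tree():
    """Build the tree as nested (test, yes_subtree, no_subtree) tuples; leaves are probabilities."""
    def male(facts, x):
        return holds(facts, "male", lambda p: p == x)

    def chol(facts, x):  # chol(X,Y,L), Y>40, L>200
        return holds(facts, "chol", lambda p, y, l: p == x and y > 40 and l > 200)

    def diag(facts, x):  # diag(X,Hypertension,Z), Z>55
        return holds(facts, "diag", lambda p, d, z: p == x and d == "Hypertension" and z > 55)

    def bmi(facts, x):  # bmi(X,W,55), W>30
        return holds(facts, "bmi", lambda p, w, v: p == x and v == 55 and w > 30)

    # ASSUMED layout: the slide's branch structure is not recoverable.
    # None marks the part of the tree elided ("…") on the slide.
    return (male, (chol, 0.8, (diag, 0.77, 0.05)), (bmi, 0.3, None))


def predict(tree, facts, x):
    """Follow yes/no branches until a leaf is reached; return the leaf value."""
    while isinstance(tree, tuple):
        test, yes, no = tree
        tree = yes if test(facts, x) else no
    return tree


# Hypothetical facts for three patients.
facts = [
    ("male", "p1"), ("chol", "p1", 45, 230),
    ("male", "p2"), ("chol", "p2", 38, 180), ("diag", "p2", "Hypertension", 60),
    ("bmi", "p3", 35, 55),
]
tree = make_tree()
for patient in ("p1", "p2", "p3"):
    print(patient, predict(tree, facts, patient))
```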

Gradient (Tree) Boosting [Friedman Annals of Statistics 29(5):1189-1232, 2001]

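• Models = weighted combination of a large number of small trees (models)
• Intuition: Generate an additive model by sequentially fitting small trees to pseudo-residuals from a regression at each iteration…

[Figure: the boosting loop. Start from an initial model; compute predictions on the data; residuals = data - predictions under the loss function; induce a small tree on the residuals and add it to the model; iterate. Final model = initial model + all induced trees.]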
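The boosting loop can be sketched in a few lines for the propositional case. This is an illustration under stated assumptions, not the relational algorithm from the talk: it uses one numeric feature, regression stumps in place of relational regression trees, and a sigmoid link so that the pseudo-residual of each example is its label minus the currently predicted probability. The names `fit_stump`, `boost`, `n_trees`, and `step` are hypothetical, and the data is a toy set made up for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by minimizing squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:  # the largest value cannot split the data
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = sum((r - left_mean) ** 2 for r in left)
        err += sum((r - right_mean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def boost(xs, ys, n_trees=20, step=1.0):
    """Return a function x -> P(y=1 | x) built from n_trees additive stumps."""
    trees = []

    def psi(x):
        # Additive model: the initial model is psi = 0, i.e. P = 0.5 everywhere.
        return sum(step * tree(x) for tree in trees)

    for _ in range(n_trees):
        # Pseudo-residuals: label minus current predicted probability.
        residuals = [y - sigmoid(psi(x)) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return lambda x: sigmoid(psi(x))


# Toy data (made up): small x is negative, large x is positive.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model = boost(xs, ys)
print(model(2), model(5))
```

Each stump is a weak model on its own; the sum of many of them moves the predicted probabilities steadily toward the labels.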

Boosting Results – MLJ 11

Algo      Likelihood  AUC-ROC  AUC-PR  Time
Boosting  0.810       0.961    0.930   9s
MLN       0.730       0.535    0.621   93 hrs

Tasks shown on the slide:

• Predicting the advisor for a student
• Movie Recommendation
• Citation Analysis
• Machine Reading

Other Applications

• Similar results in several other problems
• Imitation Learning: learning how to act from demonstrations (Natarajan et al., IJCAI ’11). Domains: Robocup, a grid world domain, traffic signal domain, and blocksworld
• Prediction of CAC levels: predicting cardiovascular risks in young adults (Natarajan et al., IAAI 13)
• Prediction of heart attacks (Weiss et al., IAAI 12, AI Magazine 12)
• Prediction of onset of Alzheimer’s (Natarajan et al., ICMLA ’12; Natarajan et al., IJMLC 2013)

Parallel Lifted Learning

• Stochastic ML: scales well, stochastic gradients, online learning, …
• Statistical Relational: symmetries, compact models, lifted inference, …
• Parallel

Symmetry based inference

[Figure: several graphs over nodes numbered 1 to 5, illustrating symmetry-based inference.]

Ground atoms: P(Anna), HI(Bob), P(Bob), HI(Anna)

Root clause

• P(Anna) !P(Bob)

Neighboring clauses

• P(Anna) => !HI(Bob)
• P(Anna) => HI(Anna)
• P(Bob) => HI(Bob)
• P(Bob) => !HI(Anna)

Tree (set of clauses)

• P(Anna) !P(Bob)
• P(Bob) => HI(Bob)
• P(Bob) => !HI(Anna)

Variabilized tree

• P(X) !P(Y)
• P(Y) => HI(Y)
• P(Y) => !HI(X)
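The step from the ground tree to the variabilized tree replaces each constant with a variable, consistently across all clauses in the tree. A minimal sketch of that substitution follows. It handles only unary atoms written as `Name(Constant)`, as in the example; the function name `variabilize` and the pool of variable names are assumptions, and the clause strings are copied as they appear on the slide.

```python
import re

ATOM_ARG = re.compile(r"\((\w+)\)")  # matches the single argument of a unary atom


def variabilize(clauses, variables="XYZUVW"):
    """Replace each distinct constant with a variable, in order of first appearance.

    Returns the rewritten clauses and the constant -> variable mapping.
    Raises IndexError if there are more constants than variable names.
    """
    mapping = {}

    def replace(match):
        constant = match.group(1)
        if constant not in mapping:
            mapping[constant] = variables[len(mapping)]
        return "(" + mapping[constant] + ")"

    return [ATOM_ARG.sub(replace, clause) for clause in clauses], mapping


ground_tree = [
    "P(Anna) !P(Bob)",
    "P(Bob) => HI(Bob)",
    "P(Bob) => !HI(Anna)",
]
lifted_tree, mapping = variabilize(ground_tree)
print(lifted_tree)
print(mapping)
```

Sharing one mapping across the whole tree is what keeps `X` and `Y` consistent between clauses, so any tree with the same pattern over other constants maps to the same variabilized tree.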

Lifted Training

[Flowchart:]

• Generate initial tree pieces and variabilize their arguments.
• Randomly draw mini-batches.
• Generate tree pieces from corresponding patterns.
• Compute gradient using lifted BP.
• Update covariance matrix C or some low rank variant.
• Update parameter vector and the corresponding equations.

Challenges

• Message schedules
• Iterative Map-reduce?
• How do we take this idea to learning the models?
• How can we more efficiently parallelize symmetry identification?
• What are the compelling problems? Vision, NLP, …

Conclusion

• The world is inherently relational and uncertain
• SRL has developed into an exciting field in the past decade
  • Several previous SRL workshops
• Boosting Relational models has promising initial results
  • Applied to several different problems
  • First scalable relational learning algorithm
• How can we parallelize/scale this algorithm?
• Can this benefit from an inference algorithm like Belief Propagation that can be parallelized easily?
