Section 1 Introduction Combined
TRANSCRIPT
-
7/28/2019 Section 1 Introduction Combined
1/40
Probabilistic Graphical Models: Introduction
The PGM Course
-
Probabilistic Models
-
Course Structure
10 weeks + final
Videos + quizzes
Problem sets: 25% of score
Multiple submissions
-
Course Structure
9 programming assignments (e.g., genetically inherited diseases, optical character recognition)
9 × 7% = 63% of score
Final exam: 12% of score
-
Background
Required: basic probability theory, some programming
Recommended: machine learning, simple optimization, Matlab or Octave
-
Other Issues
Honor code
Time management (10-15 hrs / week)
Discussion forum, study groups
-
What you'll learn
Fundamental methods
Real-world applications
How to use these methods in your own work
-
Daphne Koller
Motivation and Overview
Probabilistic Graphical Models: Introduction
-
Predisposing factors, symptoms, test results, diseases, treatment outcomes
Millions of pixels or thousands of superpixels; each needs to be labeled {grass, sky, water, cow, horse, ...}
-
Probabilistic Graphical Models
-
Models
[Diagram: data (via learning) and a domain expert (via elicitation) produce a declarative Model representation, which multiple algorithms can then operate on]
-
Uncertainty
Partial knowledge of state of the world
Noisy observations
Phenomena not covered by our model
Inherent stochasticity
-
Probability Theory
Declarative representation with clear semantics
Powerful reasoning patterns
Established learning methods
-
Complex Systems
Predisposing factors, symptoms, test results, diseases, treatment outcomes
Class labels for thousands of superpixels
Random variables X1, ..., Xn
Joint distribution P(X1, ..., Xn)
-
Graphical Models
[Bayesian network example: Difficulty and Intelligence are parents of Grade; Intelligence is a parent of SAT; Grade is a parent of Letter]
[Markov network example: nodes A, B, C, D]
Bayesian networks, Markov networks
-
Graphical Models
M. Pradhan, G. Provan, B. Middleton, M. Henrion, UAI 94
-
Graphical Representation
Intuitive & compact data structure
Efficient reasoning using general-purpose algorithms
Sparse parameterization: feasible elicitation, learning from data
-
Many Applications
Medical diagnosis
Fault diagnosis
Natural language processing
Traffic analysis
Social network models
Message decoding
Computer vision: image segmentation, 3D reconstruction, holistic scene analysis
Speech recognition
Robot localization & mapping
-
Image Segmentation
-
Medical Diagnosis
Thanks to: Eric Horvitz, Microsoft Research
-
Textual Information Extraction
Mrs. Green spoke today in New York. Green chairs the finance committee.
-
Multi-Sensor Integration: Traffic
Learned model, trained on historical data
Inputs: multiple views on traffic, incident reports, weather
Learns to predict current & future road speed, including on unmeasured roads
Dynamic route optimization
I-95 corridor experiment: accurate to 5 MPH in 85% of cases
Fielded in 72 cities
Thanks to: Eric Horvitz, Microsoft Research
-
Biological Network Reconstruction
[Figure: learned causal signaling network over phospho-proteins and phospho-lipids (PKC, PKA, Raf, Mek, Erk, Akt, Plc, Jnk, P38, PIP2, PIP3), with some variables perturbed in the data]
Edges vs. known network: Known 15/17, Supported 2/17, Reversed 1, Missed 3
Subsequently validated in wet lab
Causal protein-signaling networks derived from multiparameter single-cell data, Sachs et al., Science 2005
This figure may be used for non-commercial and classroom purposes only. Any other uses require prior written permission from AAAS.
-
Overview
Representation: directed and undirected; temporal and plate models
Inference: exact and approximate; decision making
Learning: parameters and structure; with and without complete data
-
Preliminaries: Distributions
Probabilistic Graphical Models: Introduction
-
Joint Distribution
Intelligence (I): i0 (low), i1 (high)
Difficulty (D): d0 (easy), d1 (hard)
Grade (G): g1 (A), g2 (B), g3 (C)

I D G Prob.
i0 d0 g1 0.126
i0 d0 g2 0.168
i0 d0 g3 0.126
i0 d1 g1 0.009
i0 d1 g2 0.045
i0 d1 g3 0.126
i1 d0 g1 0.252
i1 d0 g2 0.0224
i1 d0 g3 0.0056
i1 d1 g1 0.06
i1 d1 g2 0.036
i1 d1 g3 0.024
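As an aside, the table above can be checked mechanically. Below is a minimal sketch in Python (the course itself uses Matlab/Octave; this representation is illustrative, not course code) storing the joint distribution P(I, D, G) as a dict and verifying it sums to 1:

```python
# Joint distribution P(I, D, G) from the slide, as a dict from
# assignments (I, D, G) to probabilities. Not course code; a sketch.
P = {
    ("i0", "d0", "g1"): 0.126, ("i0", "d0", "g2"): 0.168, ("i0", "d0", "g3"): 0.126,
    ("i0", "d1", "g1"): 0.009, ("i0", "d1", "g2"): 0.045, ("i0", "d1", "g3"): 0.126,
    ("i1", "d0", "g1"): 0.252, ("i1", "d0", "g2"): 0.0224, ("i1", "d0", "g3"): 0.0056,
    ("i1", "d1", "g1"): 0.06,  ("i1", "d1", "g2"): 0.036,  ("i1", "d1", "g3"): 0.024,
}

# A joint distribution assigns a probability to every one of the
# 2 * 2 * 3 = 12 assignments, and those probabilities sum to 1.
total = sum(P.values())
print(round(total, 10))  # 1.0
```

Note the table grows exponentially in the number of variables, which is exactly the problem graphical models address.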
-
Conditioning
I D G Prob.
i0 d0 g1 0.126
i0 d0 g2 0.168
i0 d0 g3 0.126
i0 d1 g1 0.009
i0 d1 g2 0.045
i0 d1 g3 0.126
i1 d0 g1 0.252
i1 d0 g2 0.0224
i1 d0 g3 0.0056
i1 d1 g1 0.06
i1 d1 g2 0.036
i1 d1 g3 0.024
Condition on g1
-
Conditioning: Reduction
I D G Prob.
i0 d0 g1 0.126
i0 d1 g1 0.009
i1 d0 g1 0.252
i1 d1 g1 0.06
-
Conditioning: Renormalization
P(I, D, g1):
I D G Prob.
i0 d0 g1 0.126
i0 d1 g1 0.009
i1 d0 g1 0.252
i1 d1 g1 0.06

P(I, D | g1):
I D Prob.
i0 d0 0.282
i0 d1 0.02
i1 d0 0.564
i1 d1 0.134
(normalizing constant: 0.126 + 0.009 + 0.252 + 0.06 = 0.447)
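The two steps above, reduction followed by renormalization, can be sketched in Python (illustrative only, not course code):

```python
# Conditioning on G = g1: drop entries inconsistent with g1 (reduction),
# then divide by what remains (renormalization). Sketch, not course code.
P = {
    ("i0", "d0", "g1"): 0.126, ("i0", "d0", "g2"): 0.168, ("i0", "d0", "g3"): 0.126,
    ("i0", "d1", "g1"): 0.009, ("i0", "d1", "g2"): 0.045, ("i0", "d1", "g3"): 0.126,
    ("i1", "d0", "g1"): 0.252, ("i1", "d0", "g2"): 0.0224, ("i1", "d0", "g3"): 0.0056,
    ("i1", "d1", "g1"): 0.06,  ("i1", "d1", "g2"): 0.036,  ("i1", "d1", "g3"): 0.024,
}

# Reduction: the unnormalized measure P(I, D, g1)
reduced = {(i, d): p for (i, d, g), p in P.items() if g == "g1"}

# Renormalization: divide by the sum of the retained entries
Z = sum(reduced.values())
conditional = {assignment: p / Z for assignment, p in reduced.items()}

print(round(Z, 3))                          # 0.447
print(round(conditional[("i1", "d0")], 3))  # 0.564
```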
-
Marginalization
P(I, D | g1):
I D Prob.
i0 d0 0.282
i0 d1 0.02
i1 d0 0.564
i1 d1 0.134

Marginalize out I:
D Prob.
d0 0.846
d1 0.154
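Marginalizing I out of the table means summing, for each value of D, over both values of I. A minimal Python sketch (illustrative, not course code):

```python
# Marginalize I out of P(I, D | g1): for each d, sum over all values of I.
# Sketch, not course code; values taken from the slide's table.
P_ID = {("i0", "d0"): 0.282, ("i0", "d1"): 0.02,
        ("i1", "d0"): 0.564, ("i1", "d1"): 0.134}

P_D = {}
for (i, d), p in P_ID.items():
    P_D[d] = P_D.get(d, 0.0) + p

print(round(P_D["d0"], 3), round(P_D["d1"], 3))  # 0.846 0.154
```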
-
Preliminaries: Factors
Probabilistic Graphical Models: Introduction
-
Factors
A factor φ(X1, ..., Xk) is a function φ: Val(X1, ..., Xk) → R
Scope = {X1, ..., Xk}
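One simple way to represent a factor in code (an assumed representation, not the course's) is a scope plus a table from joint assignments to real values, here using the general-factor values over A and B that appear later in this section:

```python
# A factor as a scope plus a table mapping each assignment in
# Val(X1, ..., Xk) to a real number. Representation is illustrative.
phi = {
    "scope": ("A", "B"),
    "table": {("a0", "b0"): 30.0, ("a0", "b1"): 5.0,
              ("a1", "b0"): 1.0,  ("a1", "b1"): 10.0},
}

# Unlike a probability distribution, a general factor's values need
# not sum to 1 or even lie in [0, 1].
print(sum(phi["table"].values()))  # 46.0
```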
-
Joint Distribution P(I, D, G)
I D G Prob.
i0 d0 g1 0.126
i0 d0 g2 0.168
i0 d0 g3 0.126
i0 d1 g1 0.009
i0 d1 g2 0.045
i0 d1 g3 0.126
i1 d0 g1 0.252
i1 d0 g2 0.0224
i1 d0 g3 0.0056
i1 d1 g1 0.06
i1 d1 g2 0.036
i1 d1 g3 0.024
-
Unnormalized measure P(I, D, g1)
I D G Prob.
i0 d0 g1 0.126
i0 d1 g1 0.009
i1 d0 g1 0.252
i1 d1 g1 0.06
-
Conditional Probability Distribution (CPD)
P(G | I, D):
        g1    g2    g3
i0,d0   0.3   0.4   0.3
i0,d1   0.05  0.25  0.7
i1,d0   0.9   0.08  0.02
i1,d1   0.5   0.3   0.2
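The defining property of a CPD is that each row sums to 1: for every assignment to the parents (I, D), the entries form a distribution over G. A Python sketch checking this for the table above (values as reconstructed from the slide; not course code):

```python
# A CPD P(G | I, D) as a dict from parent assignments to distributions
# over G. Each row must itself sum to 1. Sketch, not course code.
cpd = {
    ("i0", "d0"): {"g1": 0.3,  "g2": 0.4,  "g3": 0.3},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
    ("i1", "d0"): {"g1": 0.9,  "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.5,  "g2": 0.3,  "g3": 0.2},
}

# Verify normalization for every parent assignment.
for parents, row in cpd.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9
```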
-
General factors
A B Value
a0 b0 30
a0 b1 5
a1 b0 1
a1 b1 10
-
Factor Product
φ1(A, B):
A B Value
a1 b1 0.5
a1 b2 0.8
a2 b1 0.1
a2 b2 0
a3 b1 0.3
a3 b2 0.9

φ2(B, C):
B C Value
b1 c1 0.5
b1 c2 0.7
b2 c1 0.1
b2 c2 0.2

φ1 · φ2 = φ(A, B, C):
A B C Value
a1 b1 c1 0.5 · 0.5 = 0.25
a1 b1 c2 0.5 · 0.7 = 0.35
a1 b2 c1 0.8 · 0.1 = 0.08
a1 b2 c2 0.8 · 0.2 = 0.16
a2 b1 c1 0.1 · 0.5 = 0.05
a2 b1 c2 0.1 · 0.7 = 0.07
a2 b2 c1 0 · 0.1 = 0
a2 b2 c2 0 · 0.2 = 0
a3 b1 c1 0.3 · 0.5 = 0.15
a3 b1 c2 0.3 · 0.7 = 0.21
a3 b2 c1 0.9 · 0.1 = 0.09
a3 b2 c2 0.9 · 0.2 = 0.18
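The factor product multiplies entries that agree on the shared variable B. A Python sketch of the operation on these tables (illustrative, not course code):

```python
# Factor product phi(A,B,C) = phi1(A,B) * phi2(B,C): multiply every pair
# of rows that agree on the shared variable B. Sketch, not course code.
phi1 = {("a1", "b1"): 0.5, ("a1", "b2"): 0.8, ("a2", "b1"): 0.1,
        ("a2", "b2"): 0.0, ("a3", "b1"): 0.3, ("a3", "b2"): 0.9}
phi2 = {("b1", "c1"): 0.5, ("b1", "c2"): 0.7,
        ("b2", "c1"): 0.1, ("b2", "c2"): 0.2}

product = {}
for (a, b), v1 in phi1.items():
    for (b2, c), v2 in phi2.items():
        if b == b2:  # rows must agree on the shared variable B
            product[(a, b, c)] = v1 * v2

print(round(product[("a1", "b1", "c2")], 2))  # 0.35
print(len(product))                           # 12
```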
-
Factor Marginalization
φ(A, B, C):
A B C Value
a1 b1 c1 0.25
a1 b1 c2 0.35
a1 b2 c1 0.08
a1 b2 c2 0.16
a2 b1 c1 0.05
a2 b1 c2 0.07
a2 b2 c1 0
a2 b2 c2 0
a3 b1 c1 0.15
a3 b1 c2 0.21
a3 b2 c1 0.09
a3 b2 c2 0.18

Marginalize out B → φ(A, C):
A C Value
a1 c1 0.33
a1 c2 0.51
a2 c1 0.05
a2 c2 0.07
a3 c1 0.24
a3 c2 0.39
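Factor marginalization works just like marginalizing a distribution: for each (a, c), sum over both values of B. A Python sketch on the table above (illustrative, not course code):

```python
# Marginalize B out of phi(A, B, C): for each (a, c), sum the entries
# over all values of B. Sketch, not course code; values from the slide.
phi = {
    ("a1", "b1", "c1"): 0.25, ("a1", "b1", "c2"): 0.35,
    ("a1", "b2", "c1"): 0.08, ("a1", "b2", "c2"): 0.16,
    ("a2", "b1", "c1"): 0.05, ("a2", "b1", "c2"): 0.07,
    ("a2", "b2", "c1"): 0.0,  ("a2", "b2", "c2"): 0.0,
    ("a3", "b1", "c1"): 0.15, ("a3", "b1", "c2"): 0.21,
    ("a3", "b2", "c1"): 0.09, ("a3", "b2", "c2"): 0.18,
}

marginal = {}
for (a, b, c), v in phi.items():
    marginal[(a, c)] = marginal.get((a, c), 0.0) + v

print(round(marginal[("a1", "c1")], 2))  # 0.33
print(round(marginal[("a3", "c2")], 2))  # 0.39
```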
-
Factor Reduction
φ(A, B, C):
A B C Value
a1 b1 c1 0.25
a1 b1 c2 0.35
a1 b2 c1 0.08
a1 b2 c2 0.16
a2 b1 c1 0.05
a2 b1 c2 0.07
a2 b2 c1 0
a2 b2 c2 0
a3 b1 c1 0.15
a3 b1 c2 0.21
a3 b2 c1 0.09
a3 b2 c2 0.18

Reduce to C = c1 → φ(A, B, c1):
A B C Value
a1 b1 c1 0.25
a1 b2 c1 0.08
a2 b1 c1 0.05
a2 b2 c1 0
a3 b1 c1 0.15
a3 b2 c1 0.09
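Reduction on a general factor simply drops rows inconsistent with the context; unlike conditioning a distribution, no renormalization step follows. A Python sketch (illustrative, not course code):

```python
# Reduce phi(A, B, C) to the context C = c1: keep only consistent rows.
# No renormalization is performed for a plain factor. Sketch only.
phi = {
    ("a1", "b1", "c1"): 0.25, ("a1", "b1", "c2"): 0.35,
    ("a1", "b2", "c1"): 0.08, ("a1", "b2", "c2"): 0.16,
    ("a2", "b1", "c1"): 0.05, ("a2", "b1", "c2"): 0.07,
    ("a2", "b2", "c1"): 0.0,  ("a2", "b2", "c2"): 0.0,
    ("a3", "b1", "c1"): 0.15, ("a3", "b1", "c2"): 0.21,
    ("a3", "b2", "c1"): 0.09, ("a3", "b2", "c2"): 0.18,
}

reduced = {(a, b, c): v for (a, b, c), v in phi.items() if c == "c1"}

print(len(reduced))                 # 6
print(reduced[("a3", "b1", "c1")])  # 0.15
```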
-
Why factors?
Fundamental building block for defining distributions in high-dimensional spaces
Set of basic operations for manipulating these probability distributions