
Slide 1/29

CS7015 (Deep Learning): Lecture 18, Markov Networks

Mitesh M. Khapra

Department of Computer Science and Engineering, Indian Institute of Technology Madras

Slide 2/29: Acknowledgments

Probabilistic Graphical Models: Principles and Techniques, Daphne Koller and Nir Friedman

Slide 3/29

Module 18.1: Markov Networks: Motivation

Slide 4/29

[Figure: an undirected graph over the four nodes A, B, C, D, with edges A-B, B-C, C-D, and D-A]

To motivate undirected graphical models, let us consider a new example.

A, B, C, D are four students.

A and B study together sometimes
B and C study together sometimes
C and D study together sometimes
A and D study together sometimes
A and C never study together
B and D never study together


Slide 5/29

Now suppose there was some misconception in the lecture due to some error made by the teacher.

Each one of A, B, C, D could have independently cleared this misconception by thinking about it after the lecture.

In subsequent study pairs, each student could then pass on this information to their partner.


Slide 6/29

We are now interested in knowing whether a student still has the misconception or not.

In other words, we are interested in P(A, B, C, D), where A, B, C, D can take the values 0 (no misconception) or 1 (misconception).

How do we model this using a Bayesian Network?


Slide 7/29

First let us examine the conditional independencies in this problem:

A ⊥ C | {B, D} (because A and C never interact)

B ⊥ D | {A, C} (because B and D never interact)

There are no other conditional independencies in the problem.

Now let us try to represent this using a Bayesian Network.
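Statements like these can be verified mechanically on any concrete joint probability table. Below is a small illustrative sketch (not from the lecture): a generic conditional-independence checker for binary variables, demonstrated on a toy chain A → B → C, where A ⊥ C | B holds by construction. All distributions and numbers here are made up for demonstration.

```python
from itertools import product

def cond_indep(joint, i, j, given, tol=1e-7):
    """Check X_i ⊥ X_j | X_given in a joint table over binary variables.

    joint maps full assignment tuples (0/1, ...) to probabilities.
    Tests P(xi, xj | g) = P(xi | g) * P(xj | g) for every assignment."""
    def marg(fix):
        # Probability mass of all assignments consistent with `fix`
        return sum(p for a, p in joint.items()
                   if all(a[k] == v for k, v in fix.items()))
    for g in product((0, 1), repeat=len(given)):
        base = dict(zip(given, g))
        pg = marg(base)
        if pg == 0:
            continue  # conditioning event has zero probability
        for xi, xj in product((0, 1), repeat=2):
            lhs = marg({**base, i: xi, j: xj}) / pg
            rhs = (marg({**base, i: xi}) / pg) * (marg({**base, j: xj}) / pg)
            if abs(lhs - rhs) > tol:
                return False
    return True

# Toy chain A -> B -> C, so A ⊥ C | B holds by construction.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}
joint = {(a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
         for a, b, c in product((0, 1), repeat=3)}

print(cond_indep(joint, 0, 2, given=(1,)))  # A ⊥ C | B  -> True
print(cond_indep(joint, 0, 2, given=()))    # A ⊥ C      -> False
```

The same checker applied to a four-variable joint over A, B, C, D would let you confirm which of the two statements above any candidate model actually satisfies.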


Slide 8/29

[Figure: a candidate Bayesian Network over A, B, C, D]

How about this one? Indeed, it captures the following independence relation:

A ⊥ C | {B, D}

But it also implies that

B ⊥̸ D | {A, C} (i.e., B is not independent of D given {A, C})


Slide 9/29

[Figure: another candidate Bayesian Network over A, B, C, D]

Perfect Map: A graph G is a Perfect Map for a distribution P if the independence relations implied by the graph are exactly the same as those implied by the distribution.

Let us try a different network. Again,

A ⊥ C | {B, D}

But now

B ⊥ D (unconditionally)

You can try other networks. It turns out there is no Bayesian Network which can exactly capture the independence relations that we are interested in.

There is no Perfect Map for this distribution.


Slide 10/29

The problem is that a directed graphical model is not suitable for this example.

A directed edge between two nodes implies some kind of direction in the interaction. For example, A → B could indicate that A influences B but not the other way round.

But in our example A and B are equal partners (they both contribute to the study discussion). We want to capture the strength of this interaction (and there is no direction here).


Slide 11/29

[Figure: undirected graph over A, B, C, D with edges A-B, B-C, C-D, and D-A]

We move on from Directed Graphical Models to Undirected Graphical Models, also known as Markov Networks.

The Markov Network shown here exactly captures the interactions inherent in the problem. But how do we parameterize this graph?


Slide 12/29

Module 18.2: Factors in Markov Networks

Slide 13/29

[Figure: the student Bayesian Network, with Difficulty and Intelligence as parents of Grade, Intelligence as parent of SAT, and Grade as parent of Letter]

P(G, S, I, L, D) = P(I) P(D) P(G|I, D) P(S|I) P(L|G)

Recall that in the directed case the factors were Conditional Probability Distributions (CPDs). Each such factor captured the interaction (dependence) between the connected nodes.

Can we use CPDs in the undirected case also? No: CPDs don't make sense in the undirected case because there is no direction and hence no natural conditioning (is it A|B or B|A?).
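The directed factorization above can be evaluated directly. Here is a minimal sketch with made-up binary CPTs (the lecture does not give concrete tables, so every number below is illustrative): multiplying the five CPD factors and checking that the result is a valid joint distribution.

```python
from itertools import product

# Made-up binary CPTs; each row sums to 1 by construction.
P_I = {0: 0.7, 1: 0.3}                                  # Intelligence
P_D = {0: 0.6, 1: 0.4}                                  # Difficulty
P_G_ID = {(i, d): {0: 0.5 + 0.3 * i - 0.2 * d,          # Grade | I, D
                   1: 0.5 - 0.3 * i + 0.2 * d}
          for i in (0, 1) for d in (0, 1)}
P_S_I = {i: {0: 0.8 - 0.6 * i, 1: 0.2 + 0.6 * i} for i in (0, 1)}  # SAT | I
P_L_G = {g: {0: 0.9 - 0.8 * g, 1: 0.1 + 0.8 * g} for g in (0, 1)}  # Letter | G

def joint(g, s, i, l, d):
    # P(G, S, I, L, D) = P(I) P(D) P(G|I, D) P(S|I) P(L|G)
    return P_I[i] * P_D[d] * P_G_ID[(i, d)][g] * P_S_I[i][s] * P_L_G[g][l]

total = sum(joint(g, s, i, l, d)
            for g, s, i, l, d in product((0, 1), repeat=5))
print(round(total, 10))  # 1.0 -- the product of CPDs defines a distribution
```

Because each CPD is locally normalized, the product automatically sums to 1 over all assignments; this is exactly the property that has no direct analogue once edges lose their direction.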


Page 51: CS7015 (Deep Learning) : Lecture 18 - Markov Networksmiteshk/CS7015/Slides/Teaching/pdf/Lec… · Band Cstudy together sometimes Cand Dstudy together sometimes Aand Dstudy together

14/29

[Figure: undirected graph with nodes A, B, C, D and edges A–B, B–C, C–D, D–A]

So what should the factors or parameters be in this case?

Question: What do we want these factors to capture?

Answer: The affinity between connected random variables

Just as in the directed case the factors captured the conditional dependence between a set of random variables, here we want them to capture the affinity between them

15/29

[Figure: undirected graph with nodes A, B, C, D and edges A–B, B–C, C–D, D–A]

However, we can borrow the intuition from the directed case.

Even in the undirected case, we want each such factor to capture interactions (affinity) between connected nodes

We could have factors φ1(A,B), φ2(B,C), φ3(C,D), φ4(D,A) which capture the affinity between the corresponding nodes.

16/29

[Figure: undirected graph with nodes A, B, C, D and edges A–B, B–C, C–D, D–A]

φ1(A,B)          φ2(B,C)          φ3(C,D)          φ4(D,A)
a0 b0   30       b0 c0  100       c0 d0    1       d0 a0  100
a0 b1    5       b0 c1    1       c0 d1  100       d0 a1    1
a1 b0    1       b1 c0    1       c1 d0  100       d1 a0    1
a1 b1   10       b1 c1  100       c1 d1    1       d1 a1  100

But who will give us these values?

Well, you need to learn them from data (same as in the directed case)

If you have access to a lot of past interactions between A & B then you could learn these values (more on this later)

Intuitively, it makes sense to have these factors associated with each pair of connected random variables.

We could now assign some values to these factors

Roughly speaking, φ1(A,B) asserts that it is more likely for A and B to agree [∵ weights for a0b0, a1b1 > a0b1, a1b0]

φ1(A,B) also assigns more weight to the case when both do not have a misconception as compared to the case when both have the misconception (a0b0 > a1b1)

We could have similar assignments for the other factors
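As a sketch, the four factor tables can be written down as plain weight dictionaries (variable values encoded as 0/1; the numbers are the slide's weights as transcribed above):

```python
# The four pairwise factors from the slide, stored as weight tables.
# Keys are (value of first variable, value of second variable);
# the weights need not sum to 1.
phi1_AB = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}
phi2_BC = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}
phi3_CD = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}
phi4_DA = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}

# phi1 prefers agreement: the weights for (0,0) and (1,1)
# exceed those for (0,1) and (1,0)
agree = phi1_AB[(0, 0)] + phi1_AB[(1, 1)]      # 40
disagree = phi1_AB[(0, 1)] + phi1_AB[(1, 0)]   # 6
print(agree > disagree)  # True

# ...and it weights "neither has the misconception" (0,0) more
# heavily than "both have it" (1,1): 30 > 10
print(phi1_AB[(0, 0)] > phi1_AB[(1, 1)])  # True
```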

17/29

[Figure: undirected graph with nodes A, B, C, D and edges A–B, B–C, C–D, D–A]

φ1(A,B)          φ2(B,C)          φ3(C,D)          φ4(D,A)
a0 b0   30       b0 c0  100       c0 d0    1       d0 a0  100
a0 b1    5       b0 c1    1       c0 d1  100       d0 a1    1
a1 b0    1       b1 c0    1       c1 d0  100       d1 a0    1
a1 b1   10       b1 c1  100       c1 d1    1       d1 a1  100

Notice a few things:

These tables do not represent probability distributions

They are just weights which can be interpreted as the relative likelihood of an event

For example, a = 0, b = 0 is more likely than a = 1, b = 1
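A small check, assuming the φ1(A,B) weights above: the entries of a factor need not sum to 1, but ratios of entries still behave as relative likelihoods:

```python
# phi1(A,B) from the slide: affinity weights, not probabilities
phi1_AB = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}

# The entries do not sum to 1, so the table is not a distribution...
print(sum(phi1_AB.values()))  # 46

# ...but ratios of weights are still meaningful as relative likelihoods:
# (a=0, b=0) is 3 times as likely as (a=1, b=1) under this factor alone
print(phi1_AB[(0, 0)] / phi1_AB[(1, 1)])  # 3.0
```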

18/29

[Figure: undirected graph with nodes A, B, C, D and edges A–B, B–C, C–D, D–A]

φ1(A,B)          φ2(B,C)          φ3(C,D)          φ4(D,A)
a0 b0   30       b0 c0  100       c0 d0    1       d0 a0  100
a0 b1    5       b0 c1    1       c0 d1  100       d0 a1    1
a1 b0    1       b1 c0    1       c1 d0  100       d1 a0    1
a1 b1   10       b1 c1  100       c1 d1    1       d1 a1  100

But eventually we are interested in probability distributions

In the directed case, going from factors to a joint probability distribution was easy as the factors were themselves conditional probability distributions

We could just write the joint probability distribution as the product of the factors (without violating the axioms of probability)

What do we do in this case, when the factors are not probability distributions?

19/29

Assignment       Unnormalized    Normalized
a0 b0 c0 d0         300,000      4.17E-02
a0 b0 c0 d1         300,000      4.17E-02
a0 b0 c1 d0         300,000      4.17E-02
a0 b0 c1 d1              30      4.17E-06
a0 b1 c0 d0             500      6.94E-05
a0 b1 c0 d1             500      6.94E-05
a0 b1 c1 d0       5,000,000      6.94E-01
a0 b1 c1 d1             500      6.94E-05
a1 b0 c0 d0             100      1.39E-05
a1 b0 c0 d1       1,000,000      1.39E-01
a1 b0 c1 d0             100      1.39E-05
a1 b0 c1 d1             100      1.39E-05
a1 b1 c0 d0              10      1.39E-06
a1 b1 c0 d1         100,000      1.39E-02
a1 b1 c1 d0         100,000      1.39E-02
a1 b1 c1 d1         100,000      1.39E-02

Well, we could still write it as a product of these factors and normalize it appropriately:

P(a, b, c, d) = (1/Z) φ1(a, b) φ2(b, c) φ3(c, d) φ4(d, a)

where

Z = Σ_{a,b,c,d} φ1(a, b) φ2(b, c) φ3(c, d) φ4(d, a)

Based on the values that we had assigned to the factors, we can now compute the full joint probability distribution

Z is called the partition function.
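This computation is small enough to carry out exactly. The sketch below, using the factor weights from the earlier slides, recovers the partition function and the normalized distribution shown in the table:

```python
from itertools import product

# Pairwise factors over the loop A - B - C - D - A (weights from the slides)
phi1_AB = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}
phi2_BC = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}
phi3_CD = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}
phi4_DA = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}

def unnorm(a, b, c, d):
    # Unnormalized score of one assignment: the product of the four factors
    return phi1_AB[(a, b)] * phi2_BC[(b, c)] * phi3_CD[(c, d)] * phi4_DA[(d, a)]

# Partition function: sum of the unnormalized scores over all 16 assignments
Z = sum(unnorm(*x) for x in product([0, 1], repeat=4))
print(Z)  # 7201840

# Normalized joint distribution and its most probable assignment
P = {x: unnorm(*x) / Z for x in product([0, 1], repeat=4)}
mode = max(P, key=P.get)
print(mode, round(P[mode], 4))  # (0, 1, 1, 0) 0.6943
```

The mode a0 b1 c1 d0 with probability ≈ 0.694 matches the 6.94E-01 row of the table above.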

20/29

[Figure: the graph extended with nodes E and F; edges A–E, D–E, A–F, B–F in addition to A–B, B–C, C–D, D–A]

Let us build on the original example by adding some more students

Once again there is an edge between two students if they study together

One way of interpreting these new connections is that {A,D,E} form a study group or a clique

Similarly, {A,F,B} form a study group, {C,D} form a study group, and {B,C} form a study group

21/29

[Figure: the graph extended with nodes E and F; edges A–E, D–E, A–F, B–F in addition to A–B, B–C, C–D, D–A]

Pairwise: φ1(A,E) φ2(A,F) φ3(B,F) φ4(A,B) φ5(A,D) φ6(D,E) φ7(B,C) φ8(C,D)

Per maximal clique: φ1(A,E,D) φ2(A,F,B) φ3(B,C) φ4(C,D)

Now, what should the factors be?

We could still have factors which capture pairwise interactions

But could we do something smarter (and more efficient)?

Instead of having a factor for each pair of nodes, why not have one for each maximal clique?
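A brute-force sketch of this idea: enumerate the maximal cliques of the study-group graph (edges as listed in the pairwise factorization above) and confirm they match the clique factors φ1(A,E,D), φ2(A,F,B), φ3(B,C), φ4(C,D):

```python
from itertools import combinations

# Study-group graph from the slide: an edge means the two students study together
edges = {frozenset(e) for e in
         [("A", "E"), ("A", "F"), ("B", "F"), ("A", "B"),
          ("A", "D"), ("D", "E"), ("B", "C"), ("C", "D")]}
nodes = {v for e in edges for v in e}

def is_clique(S):
    # S is a clique iff every pair of its nodes is connected
    return all(frozenset(p) in edges for p in combinations(S, 2))

# Brute force over all subsets (fine for 6 nodes): a maximal clique is a
# clique not contained in any strictly larger clique
cliques = [set(S) for r in range(2, len(nodes) + 1)
           for S in combinations(sorted(nodes), r) if is_clique(S)]
maximal = [C for C in cliques if not any(C < other for other in cliques)]
print(sorted(map(sorted, maximal)))
# [['A', 'B', 'F'], ['A', 'D', 'E'], ['B', 'C'], ['C', 'D']]
```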

22/29

[Figure: the graph with an additional node G connected to A, E, and D]

What if we add one more student?

What will the factors be in this case?

Remember, we are interested in maximal cliques

So instead of having factors φ(E,A,G), φ(G,A,D), φ(E,G,D) we will have a single factor φ(A,E,G,D) corresponding to the maximal clique
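Rerunning the same brute-force clique search after adding G (assumed connected to each of A, E, and D, as the factors φ(E,A,G), φ(G,A,D), φ(E,G,D) suggest) shows the three triangles merging into a single maximal 4-clique:

```python
from itertools import combinations

# Graph after adding G, assuming G is connected to A, E and D
edges = {frozenset(e) for e in
         [("A", "E"), ("A", "F"), ("B", "F"), ("A", "B"), ("A", "D"),
          ("D", "E"), ("B", "C"), ("C", "D"),
          ("G", "A"), ("G", "E"), ("G", "D")]}
nodes = {v for e in edges for v in e}

def is_clique(S):
    return all(frozenset(p) in edges for p in combinations(S, 2))

cliques = [set(S) for r in range(2, len(nodes) + 1)
           for S in combinations(sorted(nodes), r) if is_clique(S)]
maximal = [C for C in cliques if not any(C < other for other in cliques)]
print(sorted(map(sorted, maximal)))
# [['A', 'B', 'F'], ['A', 'D', 'E', 'G'], ['B', 'C'], ['C', 'D']]
```

Note that a single factor over the binary 4-clique {A,E,G,D} has 2^4 = 16 entries.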

23/29

[Figure: Bayesian network — Difficulty → Grade, Intelligence → Grade, Intelligence → SAT, Grade → Letter]

A distribution P factorizes over a Bayesian network G if P can be expressed as

P(X1, ..., Xn) = ∏_{i=1}^{n} P(Xi | Pa_{Xi})

[Figure: a Markov network H over nodes A, B, C, D, E, F]

A distribution P factorizes over a Markov network H if P can be expressed as

P(X1, ..., Xn) = (1/Z) ∏_{i=1}^{m} φi(Di)

where each Di is a complete subgraph (maximal clique) in H

A distribution is a Gibbs distribution parametrized by a set of factors Φ = {φ1(D1), ..., φm(Dm)} if it is defined as

P(X1, ..., Xn) = (1/Z) ∏_{i=1}^{m} φi(Di)
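As an illustration, a Gibbs distribution over four binary variables on the cycle A–B–C–D can be computed exhaustively. The potential values below are made up for this sketch (neighbours that agree get a higher potential); the point is only the mechanics of multiplying factors and normalizing by the partition function Z:

```python
from itertools import product

# toy pairwise potential: agreeing neighbours get weight 10, others 1
def phi(x, y):
    return 10.0 if x == y else 1.0

# one factor per edge of the cycle A-B-C-D-A
factors = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

def unnorm(assign):
    # product of all factor values for one joint assignment
    p = 1.0
    for u, v in factors:
        p *= phi(assign[u], assign[v])
    return p

assignments = [dict(zip("ABCD", vals)) for vals in product([0, 1], repeat=4)]
Z = sum(unnorm(a) for a in assignments)          # partition function
P = {tuple(a[v] for v in "ABCD"): unnorm(a) / Z for a in assignments}

print(f"Z = {Z}, P(0,0,0,0) = {P[(0, 0, 0, 0)]:.4f}")
assert abs(sum(P.values()) - 1.0) < 1e-9         # P is normalized
```

Note that Z couples all the factors together: changing any single potential value changes every probability in the table, which is why computing Z exactly is expensive for large networks.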


24/29

Module 18.3: Local Independencies in a Markov Network


25/29

Let U be the set of all random variables in our joint distribution

Let X, Y, Z be some distinct subsets of U

A distribution P over these RVs would imply X⊥Y|Z if and only if we can write

P(X, Y, Z) = φ1(X, Z) φ2(Y, Z)

Let us see this in the context of our original example


26/29

[Graph: four students on the cycle A–B–C–D–A]

In this example

P(A,B,C,D) = (1/Z) [φ1(A,B) φ2(B,C) φ3(C,D) φ4(D,A)]

We can rewrite this as

P(A,B,C,D) = (1/Z) [φ1(A,B) φ2(B,C)] [φ3(C,D) φ4(D,A)]
           = (1/Z) φ5(B, {A,C}) φ6(D, {A,C})

We can say that B⊥D|{A,C}, which is indeed true
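This factorization argument can be checked numerically. The sketch below fills the four pairwise factors with arbitrary made-up values and verifies that P(B, D | A, C) = P(B | A, C) P(D | A, C) for every conditioning assignment, exactly as the grouping above predicts:

```python
from itertools import product

# arbitrary positive potentials for the cycle A-B-C-D-A (illustrative only)
phi1 = {(a, b): [[1, 2], [3, 4]][a][b] for a in (0, 1) for b in (0, 1)}
phi2 = {(b, c): [[5, 1], [1, 5]][b][c] for b in (0, 1) for c in (0, 1)}
phi3 = {(c, d): [[2, 7], [1, 3]][c][d] for c in (0, 1) for d in (0, 1)}
phi4 = {(d, a): [[1, 1], [2, 6]][d][a] for d in (0, 1) for a in (0, 1)}

def joint(a, b, c, d):
    # unnormalized joint; Z cancels in every conditional below
    return phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[d, a]

for a, c in product((0, 1), repeat=2):
    p_ac = sum(joint(a, b, c, d) for b, d in product((0, 1), repeat=2))
    for b, d in product((0, 1), repeat=2):
        p_bd = joint(a, b, c, d) / p_ac                      # P(b,d | a,c)
        p_b = sum(joint(a, b, c, dd) for dd in (0, 1)) / p_ac  # P(b | a,c)
        p_d = sum(joint(a, bb, c, d) for bb in (0, 1)) / p_ac  # P(d | a,c)
        assert abs(p_bd - p_b * p_d) < 1e-12
print("B is independent of D given {A, C}")
```

The check passes for any choice of positive potentials, because once A and C are fixed, the joint splits into a function of B times a function of D.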


27/29

[Graph: four students on the cycle A–B–C–D–A]

In this example

P(A,B,C,D) = (1/Z) [φ1(A,B) φ2(B,C) φ3(C,D) φ4(D,A)]

Alternatively we can rewrite this as

P(A,B,C,D) = (1/Z) [φ1(A,B) φ4(D,A)] [φ2(B,C) φ3(C,D)]
           = (1/Z) φ5(A, {B,D}) φ6(C, {B,D})

We can say that A⊥C|{B,D}, which is indeed true


28/29

For a given Markov network H we define the Markov blanket MB_H(X) of a RV X to be the set of neighbors of X in H

Analogous to the case of Bayesian networks, we can define the local independencies associated with H to be

X ⊥ (U − {X} − MB_H(X)) | MB_H(X)
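A minimal sketch of this definition, using the six-student graph from earlier as the (illustrative) Markov network: the Markov blanket of a node is simply its adjacency set, and the local independence separates the node from everything outside that set.

```python
# adjacency sets for the students' graph
# (edges A-E, A-F, B-F, A-B, A-D, D-E, B-C, C-D)
adj = {
    "A": {"B", "D", "E", "F"},
    "B": {"A", "C", "F"},
    "C": {"B", "D"},
    "D": {"A", "C", "E"},
    "E": {"A", "D"},
    "F": {"A", "B"},
}

def markov_blanket(x):
    # in a Markov network, the blanket is just the set of neighbours
    return adj[x]

U = set(adj)                     # all random variables
x = "C"
blanket = markov_blanket(x)
rest = U - {x} - blanket         # everything X is independent of, given its blanket
print(f"{x} is independent of {sorted(rest)} given {sorted(blanket)}")
```

For C this reads off the local independence C ⊥ {A, E, F} | {B, D}, matching the definition above.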


29/29

Bayesian network

[Figure: Difficulty → Grade, Intelligence → Grade, Intelligence → SAT, Grade → Letter]

Local independencies:

Xi ⊥ NonDescendants_{Xi} | Parents_{Xi}

Markov network

Local independencies:

Xi ⊥ NonNeighbors_{Xi} | Neighbors_{Xi}
