CS7015 (Deep Learning) : Lecture 17
Recap of Probability Theory, Bayesian Networks, Conditional Independence in Bayesian Networks
Mitesh M. Khapra
Department of Computer Science and Engineering, Indian Institute of Technology Madras



Module 17.0: Recap of Probability Theory

We will start with a quick recap of some basic concepts from probability.

Axioms of Probability

[Figure: Venn diagram of disjoint events A1, A2, A3, A4, A5 in the sample space Ω]

For any event A,

P(A) ≥ 0

If A1, A2, A3, ..., An are disjoint events (i.e., Ai ∩ Aj = ∅ ∀i ≠ j) then

P(∪i Ai) = ∑i P(Ai)

If Ω is the universal set containing all events then

P(Ω) = 1
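These axioms can be sanity-checked numerically on a toy discrete sample space (the events A1..A5 and their probabilities below are invented for illustration):

```python
# Toy discrete sample space: five disjoint events A1..A5 partition Ω.
# (The probability values are invented for illustration.)
P = {"A1": 0.1, "A2": 0.2, "A3": 0.25, "A4": 0.15, "A5": 0.3}

# Axiom 1: P(A) >= 0 for every event
assert all(p >= 0 for p in P.values())

# Axiom 2: for disjoint events, P(A1 ∪ A2) = P(A1) + P(A2)
p_union = P["A1"] + P["A2"]

# Axiom 3: P(Ω) = 1, since the disjoint events A1..A5 cover all of Ω
assert abs(sum(P.values()) - 1.0) < 1e-9
```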

Random Variable (intuition)

[Figure: the grades A, B, C viewed as three events in Ω, versus a random variable G mapping Ω to {A, B, C}]

Suppose a student can get one of 3 possible grades in a course: A, B, C

One way of interpreting this is that there are 3 possible events here

Another way of looking at this is that there is a random variable G which maps each student to one of the 3 possible values

And we are interested in P(G = g) where g ∈ {A, B, C}

Of course, both interpretations are conceptually equivalent

Random Variable (intuition)

[Figure: random variables G (Grades: A, B, C), H (Height: Short, Tall) and A (Age: Young, Adult) defined on Ω]

But the second one (using random variables) is more compact

Especially when there are multiple attributes associated with a student (outcome) - grade, height, age, etc.

We could have one random variable corresponding to each attribute

And then ask for outcomes (or students) where Grade = g, Height = h, Age = a and so on

Random Variable (formal)

A random variable is a function which maps each outcome in Ω to a value

In the previous example, G (or f_Grade) maps each student in Ω to a value: A, B or C

The event Grade = A is a shorthand for the event {ω ∈ Ω : f_Grade(ω) = A}
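The function view can be sketched concretely; the student names, grade assignments, and the uniform distribution below are all invented for illustration:

```python
# A random variable is a function from outcomes to values: here the
# hypothetical f_grade maps each student (outcome in Ω) to a grade.
omega = ["s1", "s2", "s3", "s4", "s5"]                  # sample space
f_grade = {"s1": "A", "s2": "B", "s3": "A", "s4": "C", "s5": "B"}

# "Grade = A" is shorthand for the event {ω ∈ Ω : f_grade(ω) = A}
event_grade_A = {w for w in omega if f_grade[w] == "A"}

# P(Grade = A) under a uniform distribution over students (an assumption)
p_grade_A = len(event_grade_A) / len(omega)
```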

Random Variable (continuous v/s discrete)

[Figure: random variables for Grade (A, B, C), Height (120cm ... 200cm) and Weight (45kg ... 120kg) defined on Ω]

A random variable can either take continuous values (for example, weight, height)

Or discrete values (for example, grade, nationality)

For this discussion we will mainly focus on discrete random variables

Marginal Distribution

G   P(G = g)
A   0.1
B   0.2
C   0.7

What do we mean by marginal distribution over a random variable?

Consider our random variable G for grades

Specifying the marginal distribution over G means specifying

P(G = g) ∀g ∈ {A, B, C}

We denote this marginal distribution compactly by P(G)

Joint Distribution

G   I      P(G = g, I = i)
A   High   0.3
A   Low    0.1
B   High   0.15
B   Low    0.15
C   High   0.1
C   Low    0.2

Consider two random variables G (grade) and I (intelligence ∈ {High, Low})

The joint distribution over these two random variables assigns probabilities to all events involving these two random variables

P(G = g, I = i) ∀(g, i) ∈ {A, B, C} × {H, L}

We denote this joint distribution compactly by P(G, I)
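The joint table for G and I can be stored directly as a dictionary keyed by (g, i) pairs; a minimal sketch using the values from the table:

```python
# The joint distribution P(G, I) from the table, keyed by (grade, intelligence).
P_GI = {
    ("A", "High"): 0.30, ("A", "Low"): 0.10,
    ("B", "High"): 0.15, ("B", "Low"): 0.15,
    ("C", "High"): 0.10, ("C", "Low"): 0.20,
}

# A joint distribution assigns a probability to every (g, i) combination
# in {A, B, C} × {High, Low}, and these probabilities sum to 1.
assert len(P_GI) == 6
assert abs(sum(P_GI.values()) - 1.0) < 1e-9
```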

Conditional Distribution

G   P(G | I = H)
A   0.6
B   0.3
C   0.1

G   P(G | I = L)
A   0.3
B   0.4
C   0.3

Consider two random variables G (grade) and I (intelligence)

Suppose we are given the value of I (say, I = H), then the conditional distribution P(G|I) is defined as

P(G = g | I = H) = P(G = g, I = H) / P(I = H)   ∀g ∈ {A, B, C}

More compactly defined as

P(G|I) = P(G, I) / P(I)

or  P(G, I) = P(G|I) · P(I)   (joint = conditional × marginal)
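A minimal sketch of this definition, computing P(G | I = H) from the joint table of the earlier slide (note: that joint is a separate example, so the resulting numbers differ from the conditional tables shown on this slide):

```python
# Conditional distribution from a joint: P(G=g | I=H) = P(G=g, I=H) / P(I=H).
# The joint values below are the ones from the earlier Joint Distribution slide.
P_GI = {
    ("A", "High"): 0.30, ("A", "Low"): 0.10,
    ("B", "High"): 0.15, ("B", "Low"): 0.15,
    ("C", "High"): 0.10, ("C", "Low"): 0.20,
}

# Marginal P(I = High): sum the joint over all grades
p_I_high = sum(p for (g, i), p in P_GI.items() if i == "High")

# Condition on I = High and renormalise
P_G_given_H = {g: P_GI[(g, "High")] / p_I_high for g in "ABC"}

# A conditional distribution is itself a distribution: it sums to 1
assert abs(sum(P_G_given_H.values()) - 1.0) < 1e-9
```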

Joint Distribution (n random variables)

[Table: all assignments (x1, ..., xn) of X1, ..., Xn with P(X1, X2, ..., Xn); the probabilities sum to 1]

The joint distribution of n random variables assigns probabilities to all events involving the n random variables

In other words it assigns

P(X1 = x1, X2 = x2, ..., Xn = xn)

for all possible values that each variable Xi can take

If each random variable Xi can take two values then the joint distribution will assign probabilities to the 2^n possible events

Joint Distribution (n random variables)

The joint distribution over two random variables X1 and X2 can be written as

P(X1, X2) = P(X2|X1)P(X1) = P(X1|X2)P(X2)

Similarly for n random variables

P(X1, X2, ..., Xn)
= P(X2, ..., Xn|X1)P(X1)
= P(X3, ..., Xn|X1, X2)P(X2|X1)P(X1)
= P(X4, ..., Xn|X1, X2, X3)P(X3|X2, X1)P(X2|X1)P(X1)
= P(X1) ∏_{i=2}^{n} P(Xi|X1, ..., Xi−1)   (chain rule)
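The chain rule can be verified numerically: build an arbitrary joint over three binary variables, recover the factors by marginalising and conditioning, and check that their product reproduces the joint (the weights below are invented for illustration):

```python
import itertools

# An arbitrary joint P(X1, X2, X3) over three binary variables,
# built from invented weights and normalised to sum to 1.
raw = {x: w for x, w in zip(itertools.product([0, 1], repeat=3),
                            [3, 1, 4, 1, 5, 9, 2, 6])}
Z = sum(raw.values())
P = {x: w / Z for x, w in raw.items()}

def marg(P, idxs):
    """Marginal over the variables at positions idxs."""
    out = {}
    for x, p in P.items():
        key = tuple(x[i] for i in idxs)
        out[key] = out.get(key, 0.0) + p
    return out

P1 = marg(P, [0])        # P(X1)
P12 = marg(P, [0, 1])    # P(X1, X2)

# Chain rule: P(X1, X2, X3) = P(X1) P(X2|X1) P(X3|X1, X2)
for x in P:
    x1, x2, x3 = x
    chain = (P1[(x1,)]
             * (P12[(x1, x2)] / P1[(x1,)])    # P(X2 | X1)
             * (P[x] / P12[(x1, x2)]))        # P(X3 | X1, X2)
    assert abs(chain - P[x]) < 1e-12
```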

From Joint Distributions to Marginal Distributions

A      B      P(A = a, B = b)
High   High   0.3
High   Low    0.25
Low    High   0.35
Low    Low    0.1

A      P(A = a)
High   0.55
Low    0.45

B      P(B = b)
High   0.65
Low    0.35

Suppose we are given a joint distribution over two random variables A, B

The marginal distributions of A and B can be computed as

P(A = a) = ∑_b P(A = a, B = b)
P(B = b) = ∑_a P(A = a, B = b)

More compactly written as

P(A) = ∑_B P(A, B)
P(B) = ∑_A P(A, B)
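These two marginalisation formulas, applied to the joint table on this slide, recover exactly the marginal tables shown:

```python
# The joint P(A, B) from the slide's table.
P_AB = {
    ("High", "High"): 0.30, ("High", "Low"): 0.25,
    ("Low", "High"): 0.35,  ("Low", "Low"): 0.10,
}

# P(A) = Σ_b P(A, B=b)  and  P(B) = Σ_a P(A=a, B):
# sum out the other variable.
P_A = {}
P_B = {}
for (a, b), p in P_AB.items():
    P_A[a] = P_A.get(a, 0.0) + p
    P_B[b] = P_B.get(b, 0.0) + p

# Matches the slide's marginal tables (up to floating-point tolerance)
assert abs(P_A["High"] - 0.55) < 1e-9 and abs(P_A["Low"] - 0.45) < 1e-9
assert abs(P_B["High"] - 0.65) < 1e-9 and abs(P_B["Low"] - 0.35) < 1e-9
```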

What if there are n random variables?

Suppose we are given a joint distribution over n random variables X1, X2, ..., Xn

The marginal distribution over X1 can be computed as

P(X1 = x1) = ∑_{x2, x3, ..., xn} P(X1 = x1, X2 = x2, ..., Xn = xn)

More compactly written as

P(X1) = ∑_{X2, X3, ..., Xn} P(X1, X2, ..., Xn)
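The same idea for n variables, sketched with a small invented joint over three binary variables:

```python
import itertools

# Marginalise X1 out of a joint over n = 3 binary variables: fix
# X1 = x1 and sum the joint over every assignment to X2, ..., Xn.
# (The joint below is invented for illustration.)
n = 3
states = list(itertools.product([0, 1], repeat=n))
weights = [1, 2, 3, 4, 5, 6, 7, 8]
Z = sum(weights)                                   # normaliser
P = {x: w / Z for x, w in zip(states, weights)}    # a valid joint

def marginal_X1(x1):
    # P(X1 = x1) = Σ_{x2,...,xn} P(X1 = x1, X2 = x2, ..., Xn = xn)
    return sum(p for x, p in P.items() if x[0] == x1)

# The marginal is itself a distribution over the values of X1
assert abs(marginal_X1(0) + marginal_X1(1) - 1.0) < 1e-12
```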

Page 50: CS7015 (Deep Learning) : Lecture 17€¦ · Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17. 6/85 Grades A B G C Height Short Tall H Age Young A Adult Random Variable (intuition)

16/85

Recall that by Chain Rule ofProbability

P (X,Y ) = P (X)P (Y |X)

However, if X and Y are in-dependent, then

P (X,Y ) = P (X)P (Y )

Conditional Independence

Two random variables X and Y are said to beindependent if

P (X|Y ) = P (X)

We denote this as X ⊥⊥ YIn other words, knowing the value of Y doesnot change our belief about X

We would expect Grade to be dependent onIntelligence but independent of Weight

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17
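Independence (P(X|Y) = P(X), equivalently P(X, Y) = P(X)P(Y)) can be checked mechanically from a joint table. The sketch below applies the product test to the (A, B) table shown a couple of slides earlier.

```python
# Joint table for (A, B) from the earlier slide.
joint = {
    ("High", "High"): 0.30, ("High", "Low"): 0.25,
    ("Low", "High"): 0.35, ("Low", "Low"): 0.10,
}

def is_independent(joint, tol=1e-9):
    """True iff P(x, y) == P(x) P(y) for every cell of the table."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():  # compute both marginals
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return all(abs(p - p_x[x] * p_y[y]) <= tol
               for (x, y), p in joint.items())

print(is_independent(joint))  # False: P(High, High) = 0.30 but 0.55 * 0.65 = 0.3575
```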


17/85

Okay, we are now ready to move on to Bayesian Networks or Directed Graphical Models

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17


18/85

Module 17.1: Why are we interested in Joint Distributions?

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17


19/85

[Figure: the variable of interest Y (Oil) and related variables X1 (Salinity), X2 (Pressure), X3 (Depth), X4 (Biodiversity), X5 (Temperature), X6 (Density), with joint distribution P(Y, X1, X2, X3, X4, X5, X6)]

In many real world applications, we have to deal with a large number of random variables.

For example, an oil company may be interested in computing the probability of finding oil at a particular location.

This may depend on various (random) variables.

The company is interested in knowing the joint distribution

P(Y, X1, X2, X3, X4, X5, X6)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17


20/85

[Figure: the variable of interest Y (Oil) and related variables X1 (Salinity), X2 (Pressure), X3 (Depth), X4 (Biodiversity), X5 (Temperature), X6 (Density), with joint distribution P(Y, X1, X2, X3, X4, X5, X6)]

But why the joint distribution?

Aren't we just interested in P(Y | X1, X2, ..., Xn)?

Well, if we know the joint distribution, we can find answers to a bunch of interesting questions.

Let us see some such questions of interest.

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17


21/85

[Figure: the variable of interest Y (Oil) and related variables X1 (Salinity), X2 (Pressure), X3 (Depth), X4 (Biodiversity), X5 (Temperature), X6 (Density), with joint distribution P(Y, X1, X2, X3, X4, X5, X6)]

We can find the conditional distribution,

P(Y | X1, ..., Xn) = P(Y, X1, ..., Xn) / ∑_Y P(Y, X1, ..., Xn)

We can find the marginal distribution,

P(Y) = ∑_{X1,...,Xn} P(Y, X1, X2, ..., Xn)

We can check for independencies, e.g., whether

P(Y, X1) = P(Y)P(X1)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17
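Each of these queries reduces to sums over the explicit joint table. A minimal sketch, using a made-up joint over a binary Y and two binary variables X1, X2 (illustrative numbers only):

```python
# Toy joint P(Y, X1, X2); keys are (y, x1, x2). Illustrative numbers only.
joint = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.10, (0, 1, 0): 0.15, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.15, (1, 1, 0): 0.10, (1, 1, 1): 0.20,
}

def marginal_y(joint):
    """P(Y): sum the joint over X1 and X2."""
    p = {}
    for (y, _x1, _x2), prob in joint.items():
        p[y] = p.get(y, 0.0) + prob
    return p

def conditional_y(joint, x1, x2):
    """P(Y | X1=x1, X2=x2) = P(Y, x1, x2) / sum_y P(y, x1, x2)."""
    num = {y: joint[(y, x1, x2)] for y in (0, 1)}
    z = sum(num.values())  # the normalizer in the denominator
    return {y: v / z for y, v in num.items()}

print(marginal_y(joint))           # P(Y) is approximately {0: 0.5, 1: 0.5}
print(conditional_y(joint, 1, 1))  # approximately {0: 0.2, 1: 0.8}
```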


22/85

Module 17.2: How do we represent a joint distribution?

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17


23/85

[Figure: Y (yes/no) for Oil, with X1 Salinity (high/low), X2 Pressure (high/low), X3 Depth (deep/shallow), X4 Biodiversity (high/low), X5 Temperature (high/low), X6 Density (high/low), and joint distribution P(Y, X1, X2, X3, X4, X5, X6)]

Let us return to the case of n random variables.

For simplicity, assume each of these variables can take binary values.

To specify the joint distribution, we need to specify 2^n − 1 values. Why not 2^n?

If we specify these 2^n − 1 values, we have an explicit representation for the joint distribution.

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17


24/85

X1 X2 X3 X4 ... Xn P
0 0 0 0 ... 0 0.01
1 0 0 0 ... 0 0.03
0 1 0 0 ... 0 0.05
1 1 0 0 ... 0 0.1
...
1 1 1 1 ... 1 0.002

(Once the first 2^n − 1 values are specified, the last value is determined, as the values need to sum to 1.)

Challenges with explicit representation

Computational: expensive to manipulate and too large to store

Cognitive: impossible to acquire so many numbers from a human

Statistical: need huge amounts of data to learn the parameters

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17
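The blow-up behind these challenges is easy to quantify: an explicit table over n binary variables has 2^n rows, one of which is determined by the sum-to-1 constraint.

```python
def num_parameters(n):
    """Free parameters of an explicit joint table over n binary variables.

    The table has 2**n rows, but one entry is determined because the
    probabilities must sum to 1.
    """
    return 2 ** n - 1

for n in (2, 7, 30):
    print(n, num_parameters(n))
# n = 7 (as in the oil example) already needs 127 numbers;
# n = 30 needs over a billion.
```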


25/85

Module 17.3: Can we represent the joint distribution more compactly?

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17


26/85

I S P(I, S)
0 0 0.665
0 1 0.035
1 0 0.06
1 1 0.24

Consider the case of two random variables, Intelligence (I) and SAT Score (S).

Assume that both are binary and take values from High (1), Low (0).

Here is one way of specifying the joint distribution.

This distribution has 2^2 − 1 = 3 parameters. Alternatively, the table has 4 rows, but the last row is determined given the first 3 rows (or parameters).

Of course, there are many such joint distributions possible.

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17


27/85

i = 0 i = 1
P(I) 0.7 0.3
no. of parameters = 1

s = 0 s = 1
P(S|I = 0) 0.95 0.05
P(S|I = 1) 0.2 0.8
no. of parameters = 2

What! So from 3 parameters we have gone to 6 parameters?

Well, not really! (Remember, the sum of each row in the above tables has to be 1.)

The number of parameters is still 3.

Note that there is a natural ordering in these two random variables.

The SAT Score (S) presumably depends upon the Intelligence (I). An alternate and even more natural way to represent the same distribution is

P(I, S) = P(I) × P(S|I)

Instead of specifying the 4 entries in P(I, S), we can specify 2 entries for P(I) and 4 entries for P(S|I).

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17
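Multiplying the two factors reproduces the joint table P(I, S) given a few slides earlier, confirming that the conditional parameterization encodes the same distribution:

```python
# Factors from the slides: P(I) and P(S | I).
p_i = {0: 0.7, 1: 0.3}
p_s_given_i = {
    0: {0: 0.95, 1: 0.05},  # P(S | I = 0)
    1: {0: 0.20, 1: 0.80},  # P(S | I = 1)
}

# P(I, S) = P(I) * P(S | I), for every (i, s) pair.
joint = {(i, s): p_i[i] * p_s_given_i[i][s]
         for i in (0, 1) for s in (0, 1)}

for (i, s), p in sorted(joint.items()):
    print(i, s, round(p, 3))  # matches 0.665, 0.035, 0.06, 0.24
```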


28/85

i = 0 i = 1
P(I) 0.7 0.3
no. of parameters = 1

s = 0 s = 1
P(S|I = 0) 0.95 0.05
P(S|I = 1) 0.2 0.8
no. of parameters = 2

What have we achieved so far?

We were not able to reduce the number of parameters.

But we have a more natural way of representing the distribution.

This is known as conditional parameterization.

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17


29/85

[Figure: the three variables Intelligence, Grade, and SAT]

Now consider a third random variable, Grade (G).

Notice that none of these 3 variables are independent of each other.

Grade and SAT Score are clearly correlated with Intelligence.

Grade and SAT Score are also correlated, because we would expect

P(G = 1|S = 1) > P(G = 1|S = 0)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17


30/85

Intelligence

GradeSAT

However, it is possible that the distribution satisfies a conditional independence

If we know that I = H, then it is possible that S = H does not give any extra information for determining G

In other words, if we know that the student is intelligent, we can make inferences about his grade without even knowing the SAT score

Formally, we assume that (S ⊥ G|I)

Note that this is just an assumption
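The assumption has a concrete numerical meaning: S ⊥ G | I holds exactly when P(S, G | I) = P(S | I) P(G | I) for every value of I. A minimal Python sketch (with invented numbers, not the lecture's) that checks this condition on a joint table:

```python
from itertools import product

# Invented numbers (not from the lecture): a joint over I, S, G built to
# satisfy S ⊥ G | I by construction, i.e. P(i, s, g) = P(i) P(s|i) P(g|i)
P_i = {0: 0.7, 1: 0.3}
P_s = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}
P_g = {0: {"A": 0.2, "B": 0.5, "C": 0.3}, 1: {"A": 0.6, "B": 0.3, "C": 0.1}}
P = {(i, s, g): P_i[i] * P_s[i][s] * P_g[i][g]
     for i, s, g in product([0, 1], [0, 1], "ABC")}

def is_cond_indep(P):
    """True iff P(s, g | i) = P(s | i) P(g | i) for every i, s, g."""
    for i in [0, 1]:
        p_i = sum(p for (ii, _, _), p in P.items() if ii == i)
        for s in [0, 1]:
            p_si = sum(p for (ii, ss, _), p in P.items() if ii == i and ss == s)
            for g in "ABC":
                p_gi = sum(p for (ii, _, gg), p in P.items() if ii == i and gg == g)
                if abs(P[(i, s, g)] / p_i - (p_si / p_i) * (p_gi / p_i)) > 1e-9:
                    return False
    return True

print(is_cond_indep(P))   # True: the joint factorizes, so S ⊥ G | I
```

Any joint built by multiplying P(i), P(s|i), P(g|i) passes this check; a generic joint over the same variables typically does not.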


31/85


We could argue that in many cases S ⊥̸ G | I

For example, a student might be intelligent, but we also have to factor in his/her ability to write time-bound exams

In which case S and G are not independent given I (because the SAT score tells us about the ability to write time-bound exams)

But, for this discussion, we will assume S ⊥ G | I


32/85

Question

Now let’s see the implication of this assumption

Does it simplify things in any way?


33/85

              i = 0   i = 1
P(I)           0.7     0.3          (no. of parameters = 1)

              s = 0   s = 1
P(S|I = 0)     0.95    0.05
P(S|I = 1)     0.2     0.8          (no. of parameters = 2)

              g = A   g = B   g = C
P(G|I = 0)     0.2     0.34    0.46
P(G|I = 1)     0.74    0.17    0.09  (no. of parameters = 4)

Total no. of parameters = 7

How many parameters do we need to specify P(I, G, S)?

(2 × 2 × 3 − 1 = 11)

What if we use conditional parameterization by following the chain rule?

P(I, G, S) = P(S, G|I) P(I)
           = P(S|G, I) P(G|I) P(I)
           = P(S|I) P(G|I) P(I)    since (S ⊥ G|I)

We need the following distributions to fully specify the joint distribution
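The factorization can be checked numerically. A minimal Python sketch (using the table values from this slide) that builds the joint from the three factors and verifies it is a valid distribution:

```python
from itertools import product

P_I = {0: 0.7, 1: 0.3}                                 # P(I), 1 free parameter
P_S_given_I = {0: {0: 0.95, 1: 0.05},                  # P(S|I), 2 free parameters
               1: {0: 0.2,  1: 0.8}}
P_G_given_I = {0: {"A": 0.2,  "B": 0.34, "C": 0.46},   # P(G|I), 4 free parameters
               1: {"A": 0.74, "B": 0.17, "C": 0.09}}

# Factorized joint: P(I, G, S) = P(S|I) P(G|I) P(I), using (S ⊥ G | I)
joint = {(i, g, s): P_I[i] * P_G_given_I[i][g] * P_S_given_I[i][s]
         for i, g, s in product([0, 1], ["A", "B", "C"], [0, 1])}

print(round(sum(joint.values()), 10))   # a valid joint sums to 1.0
print(round(joint[(1, "A", 1)], 6))     # P(I=1, G=A, S=1) = 0.3 * 0.74 * 0.8 = 0.1776
```

Seven numbers (1 + 2 + 4) determine all 12 entries of the joint table, versus 11 free parameters for an unrestricted joint.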


34/85


The alternate parameterization is more natural than that of the joint distribution

The alternate parameterization is more compact than that of the joint distribution

The alternate parameterization is more modular. (When we added G, we could just reuse the tables for P(I) and P(S|I))


35/85

Module 17.4: Can we use a graph to represent a joint distribution?


36/85

(Figure: node C with children X1, X2, X3, ..., Xn)

Suppose we have n random variables, all of which are independent given another random variable C

The joint distribution factorizes as

P(C, X1, ..., Xn) = P(C) P(X1|C) P(X2|X1, C) P(X3|X2, X1, C) ...
                  = P(C) ∏_{i=1}^{n} P(Xi|C)    since Xi ⊥ Xj | C

This is called the Naive Bayes model

It makes the naive assumption that all nC2 pairs (Xi, Xj) are independent given C
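A minimal Python sketch of this factorization (all numbers invented for illustration, not from the lecture), including its usual use: scoring a class C given observed features via Bayes' rule:

```python
# Naive Bayes factorization: P(C, X1, ..., Xn) = P(C) * prod_i P(Xi | C)

P_C = {0: 0.6, 1: 0.4}
# P(Xi = 1 | C = c) for three binary features X1, X2, X3 (invented values)
P_Xi_given_C = {0: [0.9, 0.2, 0.3], 1: [0.1, 0.7, 0.8]}

def joint(c, x):
    """P(C = c, X = x) under the Naive Bayes factorization."""
    p = P_C[c]
    for p1, xi in zip(P_Xi_given_C[c], x):
        p *= p1 if xi == 1 else 1 - p1
    return p

# Classify an observation: P(C | X = x) is proportional to P(C, X = x)
x = (1, 0, 1)
scores = {c: joint(c, x) for c in P_C}
posterior = {c: s / sum(scores.values()) for c, s in scores.items()}
print(max(posterior, key=posterior.get))   # → 0
```

Note the saving: 2n conditional parameters plus one prior, instead of 2^(n+1) − 1 for an unrestricted joint over n binary features and a binary class.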


37/85

(Figures: I → G; I → G and I → S; C with children X1, X2, ..., Xn)

Bayesian networks build on the intuitions that we developed for the Naive Bayes model

But they are not restricted to strong (naive) independence assumptions

We use graphs to represent the joint distribution

Nodes: Random Variables

Edges: Indicate dependence


38/85

(Figure: network with nodes D (Difficulty), I (Intelligence), G (Grade), S (SAT), L (Letter); edges D → G, I → G, I → S, G → L)

Let’s revisit the student example

We will introduce a few more random variables and independence assumptions

The grade now depends on the student's Intelligence & the exam's Difficulty level

The SAT score depends on Intelligence

The recommendation Letter from the course instructor depends on the Grade


39/85


The Bayesian network contains a node for each random variable

The edges denote the dependencies between the random variables

Each variable depends directly on its parents in the network
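This parent structure is all the graph itself stores. A minimal sketch (node names from the student example) of the network as a plain data structure:

```python
# The student network stored as a "parents" map: for each node, the
# list of nodes it depends on directly
parents = {"D": [], "I": [], "G": ["I", "D"], "S": ["I"], "L": ["G"]}

# roots have no parents; edges point from parent to child
roots = [n for n, ps in parents.items() if not ps]
edges = [(p, child) for child, ps in parents.items() for p in ps]
print(roots)           # ['D', 'I']
print(sorted(edges))   # [('D', 'G'), ('G', 'L'), ('I', 'G'), ('I', 'S')]
```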


40/85


The Bayesian network can be viewed as a data structure

It provides a skeleton for representing a joint distribution compactly by factorization

Let us see what this means

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17

41/85

[Graph: D (Difficulty) → G (Grade) ← I (Intelligence); I → S (SAT); G → L (Letter)]

P(D):   d0    d1
        0.6   0.4

P(I):   i0    i1
        0.7   0.3

P(G|I,D):      g1     g2     g3
   i0,d0       0.3    0.4    0.3
   i0,d1       0.05   0.25   0.7
   i1,d0       0.9    0.08   0.02
   i1,d1       0.5    0.3    0.2

P(S|I):    s0     s1
   i0      0.95   0.05
   i1      0.2    0.8

P(L|G):    l0     l1
   g1      0.1    0.9
   g2      0.4    0.6
   g3      0.99   0.01

Each node is associated with a local probability model

Local, because it represents the dependencies of each variable on its parents

There are 5 such local probability models associated with the graph

Each variable (in general) is associated with a conditional probability distribution (conditional on its parents)

42/85

[Graph and CPTs as on the previous slide]

The graph gives us a natural factorization for the joint distribution

In this case,

P(I, D, G, S, L) = P(I) P(D) P(G|I,D) P(S|I) P(L|G)

For example,

P(I = 1, D = 0, G = B, S = 1, L = 0) = 0.3 × 0.6 × 0.08 × 0.8 × 0.4

The graph structure (nodes, edges) along with the conditional probability distributions is called a Bayesian Network
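The factorization is easy to verify numerically: store the five CPTs and multiply the relevant entries. A minimal sketch in Python (the dictionary layout and the `joint` helper are my own, not from the lecture):

```python
# The five CPTs from the slides, as nested dictionaries.
P_I = {"i0": 0.7, "i1": 0.3}
P_D = {"d0": 0.6, "d1": 0.4}
P_G = {  # P(G | I, D); g1 = A, g2 = B, g3 = C
    ("i0", "d0"): {"g1": 0.3,  "g2": 0.4,  "g3": 0.3},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
    ("i1", "d0"): {"g1": 0.9,  "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.5,  "g2": 0.3,  "g3": 0.2},
}
P_S = {"i0": {"s0": 0.95, "s1": 0.05},   # P(S | I)
       "i1": {"s0": 0.2,  "s1": 0.8}}
P_L = {"g1": {"l0": 0.1,  "l1": 0.9},    # P(L | G)
       "g2": {"l0": 0.4,  "l1": 0.6},
       "g3": {"l0": 0.99, "l1": 0.01}}

def joint(i, d, g, s, l):
    """P(I, D, G, S, L) = P(I) P(D) P(G|I,D) P(S|I) P(L|G)."""
    return P_I[i] * P_D[d] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]

# The worked example from the slide: 0.3 * 0.6 * 0.08 * 0.8 * 0.4
print(joint("i1", "d0", "g2", "s1", "l0"))  # ≈ 0.004608
```

Note the compactness this buys: five small tables (26 numbers) recover the full joint over 2 × 2 × 3 × 2 × 2 = 48 assignments.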

43/85

Module 17.5: Different types of reasoning in a Bayesian network

44/85

New Notations

We will denote P(I = 0) by P(i0)

In general, we will denote P(I = 0, D = 1, G = B, S = 1, L = 0) by P(i0, d1, g2, s1, l0)

45/85

[Graph and CPTs as before]

Causal Reasoning

Here, we try to predict downstream effects of various factors

Let us consider an example

What is the probability that a student will get a good recommendation letter, P(l1)?

P(l1) = ∑_{I,D,G,S} P(I, D, G, S, l1)

      = ∑_{I∈{0,1}} ∑_{D∈{0,1}} ∑_{S∈{0,1}} ∑_{G∈{A,B,C}} P(I, D, G, S, l1)

      = 50.2%

46/85

P(l1) = ∑_{I∈{0,1}} ∑_{D∈{0,1}} ∑_{S∈{0,1}} ∑_{G∈{A,B,C}} P(I, D, G, S, l1)

      = ∑_I P(I) ∑_D P(D|I) ∑_S P(S|I,D) ∑_G P(G|I,D,S) · P(l1|G,I,D,S)    (chain rule)

      = ∑_I P(I) ∑_D P(D) ∑_S P(S|I) ∑_G P(G|I,D) · P(l1|G)    (using the independencies encoded in the graph)

[Graph: D → G ← I; I → S; G → L]

47/85

P(l1) = ∑_I P(I) ∑_D P(D) ∑_S P(S|I) ∑_G P(G|I,D) P(l1|G)

      = ∑_I P(I) ∑_D P(D) ∑_S P(S|I) [0.9 P(g1|I,D) + 0.6 P(g2|I,D) + 0.01 P(g3|I,D)]

Similarly, using the other tables, we can evaluate this equation:

P(l1) = 0.502

[Graph, P(L|G) and P(G|I,D) tables as before]
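With only five variables, the marginalization above can be checked by brute-force enumeration of the joint. A sketch assuming the CPTs from the slides (variable names are my own):

```python
from itertools import product

P_I = {"i0": 0.7, "i1": 0.3}
P_D = {"d0": 0.6, "d1": 0.4}
P_G = {("i0", "d0"): {"g1": 0.3,  "g2": 0.4,  "g3": 0.3},
       ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
       ("i1", "d0"): {"g1": 0.9,  "g2": 0.08, "g3": 0.02},
       ("i1", "d1"): {"g1": 0.5,  "g2": 0.3,  "g3": 0.2}}
P_S = {"i0": {"s0": 0.95, "s1": 0.05}, "i1": {"s0": 0.2, "s1": 0.8}}
P_L = {"g1": {"l0": 0.1, "l1": 0.9}, "g2": {"l0": 0.4, "l1": 0.6},
       "g3": {"l0": 0.99, "l1": 0.01}}

def joint(i, d, g, s, l):
    # P(I) P(D) P(G|I,D) P(S|I) P(L|G)
    return P_I[i] * P_D[d] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]

# P(l1): clamp L = l1 and sum the joint over all other assignments.
p_l1 = sum(joint(i, d, g, s, "l1")
           for i, d, g, s in product(["i0", "i1"], ["d0", "d1"],
                                     ["g1", "g2", "g3"], ["s0", "s1"]))
print(round(p_l1, 3))  # 0.502
```

Plain enumeration touches all 24 assignments of (I, D, G, S); pushing the sums inside the factors, as the derivation on the slide does, is exactly what variable elimination automates.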

48/85

[Graph and CPTs as before]

Causal Reasoning

Now what if we start adding information about the factors that could influence l1?

What if someone reveals that the student is not intelligent?

Intelligence will affect the score and hence the grade

49/85

P(l1|i0) = P(l1, i0) / P(i0)

P(l1, i0) = ∑_{D∈{0,1}} ∑_{S∈{0,1}} ∑_{G∈{A,B,C}} P(i0, D, G, S, l1)

          = P(i0) ∑_D P(D) ∑_S P(S|i0) ∑_G P(G|D, i0) P(l1|G)

Dividing by P(i0):

P(l1|i0) = ∑_D P(D) ∑_S P(S|i0) [0.9 P(g1|D, i0) + 0.6 P(g2|D, i0) + 0.01 P(g3|D, i0)]

P(l1|i0) = 0.389

[Graph, P(L|G) and P(G|I,D) tables as before]

50/85

[Graph and CPTs as before]

Causal Reasoning

What if the course was easy?

A not-so-intelligent student may still be able to get a good grade and hence a good letter

P(l1|i0, d0) = (∑_{G∈{A,B,C}} ∑_{S∈{0,1}} P(i0, d0, G, S, l1)) / (P(i0) P(d0))

P(l1|i0, d0) = 0.513 (increases)
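The conditional queries on the last two slides (P(l1|i0) and P(l1|i0, d0)) both follow the same recipe: clamp the evidence, sum the joint over everything else, and normalize by the probability of the evidence. A sketch with the slide CPTs (the `p_l1_given` helper is my own):

```python
from itertools import product

P_I = {"i0": 0.7, "i1": 0.3}
P_D = {"d0": 0.6, "d1": 0.4}
P_G = {("i0", "d0"): {"g1": 0.3,  "g2": 0.4,  "g3": 0.3},
       ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
       ("i1", "d0"): {"g1": 0.9,  "g2": 0.08, "g3": 0.02},
       ("i1", "d1"): {"g1": 0.5,  "g2": 0.3,  "g3": 0.2}}
P_S = {"i0": {"s0": 0.95, "s1": 0.05}, "i1": {"s0": 0.2, "s1": 0.8}}
P_L = {"g1": {"l0": 0.1, "l1": 0.9}, "g2": {"l0": 0.4, "l1": 0.6},
       "g3": {"l0": 0.99, "l1": 0.01}}

def joint(i, d, g, s, l):
    return P_I[i] * P_D[d] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]

def p_l1_given(**ev):
    """P(l1 | evidence) = P(l1, evidence) / P(evidence), by enumeration."""
    domains = {"i": ["i0", "i1"], "d": ["d0", "d1"],
               "g": ["g1", "g2", "g3"], "s": ["s0", "s1"]}
    for var, val in ev.items():
        domains[var] = [val]          # clamp observed variables
    num = sum(joint(i, d, g, s, "l1")
              for i, d, g, s in product(*domains.values()))
    den = sum(joint(i, d, g, s, l)
              for i, d, g, s in product(*domains.values())
              for l in ("l0", "l1"))
    return num / den

print(round(p_l1_given(), 3))                # 0.502 (no evidence)
print(round(p_l1_given(i="i0"), 3))          # 0.389
print(round(p_l1_given(i="i0", d="d0"), 3))  # 0.513
```

Conditioning on i0 pulls P(l1) down from 0.502 to 0.389; additionally conditioning on d0 pushes it back up to 0.513, matching the slides.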

51/85

[Graph and CPTs as before]

Evidential Reasoning

Here, we reason about causes by looking at their effects

What is the probability of the student being intelligent? P(i1) = 0.3

What is the probability of the course being difficult? P(d1) = 0.4

Now let us see what happens if we observe some effects

52/85

[Graph as before]

Evidential Reasoning

What if someone tells us that the student secured a C grade?

P(i1|g3) = 0.079 (drops)

P(d1|g3) = 0.629 (increases)

What if, instead of getting to know the grade, we get to know that the student got a poor recommendation letter?

P(i1|l0) = 0.14 (drops)

What if we know about the grade as well as the recommendation letter?

P(i1|l0, g3) = 0.079 (same as P(i1|g3))

The last case is interesting! (We will return to it later)
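The evidential queries above can be checked with the same enumeration idea, now reasoning "against the arrows": condition on an observed effect and read off the posterior over a cause. A sketch with the slide CPTs (the `posterior` helper is my own):

```python
from itertools import product

P_I = {"i0": 0.7, "i1": 0.3}
P_D = {"d0": 0.6, "d1": 0.4}
P_G = {("i0", "d0"): {"g1": 0.3,  "g2": 0.4,  "g3": 0.3},
       ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
       ("i1", "d0"): {"g1": 0.9,  "g2": 0.08, "g3": 0.02},
       ("i1", "d1"): {"g1": 0.5,  "g2": 0.3,  "g3": 0.2}}
P_S = {"i0": {"s0": 0.95, "s1": 0.05}, "i1": {"s0": 0.2, "s1": 0.8}}
P_L = {"g1": {"l0": 0.1, "l1": 0.9}, "g2": {"l0": 0.4, "l1": 0.6},
       "g3": {"l0": 0.99, "l1": 0.01}}

def joint(i, d, g, s, l):
    return P_I[i] * P_D[d] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]

def posterior(target_var, target_val, **ev):
    """P(target = target_val | evidence), by summing the joint."""
    domains = {"i": ["i0", "i1"], "d": ["d0", "d1"],
               "g": ["g1", "g2", "g3"], "s": ["s0", "s1"],
               "l": ["l0", "l1"]}
    for var, val in ev.items():
        domains[var] = [val]                 # clamp the evidence
    den = sum(joint(*a) for a in product(*domains.values()))
    domains[target_var] = [target_val]       # additionally clamp the target
    num = sum(joint(*a) for a in product(*domains.values()))
    return num / den

print(round(posterior("i", "i1", g="g3"), 3))          # 0.079 (drops from 0.3)
print(round(posterior("d", "d1", g="g3"), 3))          # 0.629 (increases from 0.4)
print(round(posterior("i", "i1", l="l0"), 3))          # 0.14
print(round(posterior("i", "i1", l="l0", g="g3"), 3))  # 0.079
```

The last query confirms the claim on the slide: once G is observed, L carries no extra information about I (I is independent of L given G), so P(i1|l0, g3) = P(i1|g3).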

Page 191: CS7015 (Deep Learning) : Lecture 17€¦ · Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17. 6/85 Grades A B G C Height Short Tall H Age Young A Adult Random Variable (intuition)

52/85

P (i1) = 0.3

P (d1) = 0.4

P (i1|g3) = 0.079(drops)

P (d1|g3) = 0.629(increases)

P (i1|l0) = 0.14(drops)

P (l1|l0, g3) = 0.079

(same as P (i1|g3))

D

Difficulty

I

Intelligence

GGrade S

SAT

LLetter

Evidential Reasoning

What if someone tells us that the stu-dent secured C grade?

What if instead of getting to knowthe grade, we get to know that thestudent got a poor recommendationletter?

What if we know about the grade aswell as the recommendation letter?

The last case is interesting! (We willreturn to it later)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17


53/85

Explaining Away

[Figure: student Bayesian network — D (Difficulty) and I (Intelligence) are parents of G (Grade); I is the parent of S (SAT); G is the parent of L (Letter)]

P(i1) = 0.3
P(i1|g3) = 0.079 (drops)
P(i1|g3, d1) = 0.11 (improves)

Here, we see how different causes of the same effect can interact

We already saw how knowing the grade influences our estimate of intelligence

What if we were told the course was difficult?

Our belief in the student's intelligence improves

Why? Let us see


54/85

Explaining Away

[Figure: student Bayesian network — D (Difficulty) and I (Intelligence) are parents of G (Grade); I is the parent of S (SAT); G is the parent of L (Letter)]

P(i1) = 0.3
P(i1|g3) = 0.079
P(i1|g3, d1) = 0.11
P(i1|g2) = 0.175
P(i1|g2, d1) = 0.34

Knowing that the course was difficult explains away the bad grade

“Oh! Maybe the course was just too difficult and the student might have received a bad grade despite being intelligent!”

The explaining away effect could be even more dramatic

Let us consider the case when the grade was B
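The explaining-away effect follows directly from Bayes' rule. The grade CPT below is an assumption (the table from the standard student example, not given on the slide), but it reproduces all four numbers above: observing d1 raises P(i1 | grade) because a hard course is an alternative explanation for the bad grade.

```python
# Assumed priors and P(G | I, D) from the standard student example;
# grades are g1 > g2 > g3, i/d in {0, 1} with 1 = intelligent / hard.
P_I1, P_D1 = 0.3, 0.4
P_G = {
    (0, 0): {1: 0.30, 2: 0.40, 3: 0.30},
    (0, 1): {1: 0.05, 2: 0.25, 3: 0.70},
    (1, 0): {1: 0.90, 2: 0.08, 3: 0.02},
    (1, 1): {1: 0.50, 2: 0.30, 3: 0.20},
}

def p_i1_given_g(g):
    """P(i1 | g): difficulty is unknown, so marginalize D out."""
    def lik(i):  # P(g | i) = sum_d P(d) * P(g | i, d)
        return (1 - P_D1) * P_G[(i, 0)][g] + P_D1 * P_G[(i, 1)][g]
    num = P_I1 * lik(1)
    return num / (num + (1 - P_I1) * lik(0))

def p_i1_given_g_d1(g):
    """P(i1 | g, d1): difficulty is observed, so Bayes over I alone."""
    num = P_I1 * P_G[(1, 1)][g]
    return num / (num + (1 - P_I1) * P_G[(0, 1)][g])

print(round(p_i1_given_g(3), 3), round(p_i1_given_g_d1(3), 3))  # 0.079 -> ~0.11
print(round(p_i1_given_g(2), 3), round(p_i1_given_g_d1(2), 3))  # 0.175 -> ~0.34
```

Note how much larger the jump is for grade B (0.175 to 0.34) than for grade C: the better the grade, the more strongly a hard course "explains away" the evidence against intelligence.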


55/85

Explaining Away

[Figure: student Bayesian network — D (Difficulty) and I (Intelligence) are parents of G (Grade); I is the parent of S (SAT); G is the parent of L (Letter)]

P(d1) = 0.40
P(d1|g3) = 0.629
P(d1|s1, g3) = 0.76

Suppose we know that the student had a high SAT score; what happens to our belief about the difficulty of the course?

Knowing that the SAT score was high tells us that the student seems intelligent, and perhaps the reason why he scored a poor grade is that the course was difficult
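Here the evidence flows the other way: a high SAT score raises our belief in intelligence, which in turn pushes the blame for the bad grade onto the course. A direct Bayes computation (CPT values assumed from the standard student example; they match the slide) makes this concrete:

```python
# Assumed CPT values from the standard student example.
P_I1, P_D1 = 0.3, 0.4
P_G3 = {(0, 0): 0.30, (0, 1): 0.70, (1, 0): 0.02, (1, 1): 0.20}  # P(g3 | i, d)
P_S1 = {0: 0.05, 1: 0.80}                                        # P(s1 | i)

def p_d1_given_g3(s1_observed=False):
    """P(d1 | g3) or P(d1 | g3, s1): sum out I, then Bayes over D."""
    def w(i):  # weight of each intelligence value, folding in s1 if observed
        return (P_I1 if i == 1 else 1 - P_I1) * (P_S1[i] if s1_observed else 1.0)
    num = P_D1 * sum(w(i) * P_G3[(i, 1)] for i in (0, 1))
    den = num + (1 - P_D1) * sum(w(i) * P_G3[(i, 0)] for i in (0, 1))
    return num / den

print(round(p_d1_given_g3(), 3))      # 0.629
print(round(p_d1_given_g3(True), 2))  # 0.76
```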


56/85

Module 17.6: Independencies encoded by a Bayesian network (Case 1: Node and its parents)

57/85

Why do we care about independencies encoded in a Bayesian network?

We saw that if two variables are independent then the chain rule gets simplified, resulting in simpler factors, which in turn reduces the number of parameters.

In the extreme case, we saw that in the naive Bayes model each factor was very simple (just P(Xi|Y)), and as a result each factor added just 3 parameters

The more the independencies, the fewer the parameters and the lower the inference time

For example, if we want to compute the marginal P(S) then we just need to sum over the values of I and not over any other variables

Hence we are interested in finding the independencies encoded in a Bayesian network
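The P(S) example is easy to make concrete: since I is the only parent of S, the marginal is a two-term sum, no matter how many other variables the network contains. A minimal sketch, with CPT values assumed from the standard student example:

```python
# Assumed CPTs: P(i1) and P(s1 | I) from the standard student example.
P_I1 = 0.3
P_S1 = {0: 0.05, 1: 0.80}   # P(s1 | i)

# P(s1) = sum_i P(i) * P(s1 | i): two terms only,
# with no sums over D, G, or L required.
p_s1 = (1 - P_I1) * P_S1[0] + P_I1 * P_S1[1]
print(round(p_s1, 3))  # 0.275
```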


58/85

In general, given n random variables, we are interested in knowing if

Xi ⊥ Xj

Xi ⊥ Xj | Z, where Z ⊆ {X1, X2, ..., Xn} \ {Xi, Xj}

Let us answer some of these questions for our student Bayesian network


59/85

[Figure: student Bayesian network — D (Difficulty) and I (Intelligence) are parents of G (Grade); I is the parent of S (SAT); G is the parent of L (Letter)]

To understand this, let us return to our student example

First, let us see some independencies which clearly do not exist in the graph

Is L ⊥ G? (No, by construction)

Is G ⊥ D? (No, by construction)

Is G ⊥ I? (No, by construction)

Is S ⊥ I? (No, by construction)

Rule: A node is not independent of its parents


60/85

[Figure: student Bayesian network — D (Difficulty) and I (Intelligence) are parents of G (Grade); I is the parent of S (SAT); G is the parent of L (Letter)]

Let us focus on G and L.

We already know that G is not independent of L.

What if we know the value of I? Does G become independent of L?

No (intuitively, the student may be intelligent or not, but ultimately the letter depends on the performance in the course)

If we know the value of D, does G become independent of L?

No (intuitively, the course may be easy or hard, but the letter would depend on the performance in the course)

What if we know the value of S? Does G become independent of L?

No, the instructor is not going to look at the SAT score but the grade

Rule: A node is not independent of its parents even when we are given the values of other variables
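The rule can also be checked numerically by enumeration: even with every other variable observed, the conditional distribution of G still shifts when its parent I changes. The CPT values below are assumed (the standard student-example tables, which match the numbers used throughout this lecture):

```python
from itertools import product

# Assumed CPTs from the standard student example.
P_I = {0: 0.7, 1: 0.3}
P_D = {0: 0.6, 1: 0.4}
P_G = {(0, 0): {1: .30, 2: .40, 3: .30}, (0, 1): {1: .05, 2: .25, 3: .70},
       (1, 0): {1: .90, 2: .08, 3: .02}, (1, 1): {1: .50, 2: .30, 3: .20}}
P_S1 = {0: 0.05, 1: 0.80}
P_L1 = {1: 0.90, 2: 0.60, 3: 0.01}

def joint(i, d, g, s, l):
    """P(I, D, G, S, L) via the network's factorization."""
    return (P_I[i] * P_D[d] * P_G[(i, d)][g]
            * (P_S1[i] if s else 1 - P_S1[i])
            * (P_L1[g] if l else 1 - P_L1[g]))

def cond(target, evidence):
    """P(target | evidence) by enumeration over all worlds."""
    num = den = 0.0
    for i, d, g, s, l in product((0, 1), (0, 1), (1, 2, 3), (0, 1), (0, 1)):
        w = {'I': i, 'D': d, 'G': g, 'S': s, 'L': l}
        if any(w[k] != v for k, v in evidence.items()):
            continue
        p = joint(i, d, g, s, l)
        den += p
        if all(w[k] == v for k, v in target.items()):
            num += p
    return num / den

# G stays dependent on its parent I even when D, S, and L are all observed:
a = cond({'G': 3}, {'I': 1, 'D': 1, 'S': 1, 'L': 0})
b = cond({'G': 3}, {'I': 0, 'D': 1, 'S': 1, 'L': 0})
print(round(a, 3), round(b, 3))  # different values => G depends on I
```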


61/85

[Bayesian network: D → G, I → G, I → S, G → L]

Rule: A node is not independent of its parents even when we are given the values of other variables

The same argument can be made about the following pairs:

G ⊥̸ D (even when other variables are given)

G ⊥̸ I (even when other variables are given)

S ⊥̸ I (even when other variables are given)
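This rule can be checked numerically. The sketch below enumerates the joint distribution of the student network for one hypothetical set of CPT values (the numbers are illustrative, in the style of the classic Koller & Friedman student example; the lecture specifies none) and confirms that our belief about G still shifts with its parent D even when S and L are both observed.

```python
from itertools import product

# Hypothetical CPTs for the D, I, G, S, L student network.
# All numbers are illustrative; the lecture does not fix any values.
pD = [0.6, 0.4]                                   # P(D): 0 = easy, 1 = hard
pI = [0.7, 0.3]                                   # P(I): 0 = average, 1 = intelligent
pG = {(0, 0): [0.30, 0.40, 0.30],                 # P(G | I=i, D=d), keyed (i, d)
      (0, 1): [0.05, 0.25, 0.70],                 # G in {0: A, 1: B, 2: C}
      (1, 0): [0.90, 0.08, 0.02],
      (1, 1): [0.50, 0.30, 0.20]}
pS = [[0.95, 0.05], [0.20, 0.80]]                 # P(S | I)
pL = [[0.10, 0.90], [0.40, 0.60], [0.99, 0.01]]   # P(L | G)

def prob(**fix):
    """Marginal probability of the partial assignment `fix`, by enumeration."""
    total = 0.0
    for d, i, g, s, l in product(range(2), range(2), range(3), range(2), range(2)):
        v = {'d': d, 'i': i, 'g': g, 's': s, 'l': l}
        if all(v[k] == x for k, x in fix.items()):
            total += pD[d] * pI[i] * pG[(i, d)][g] * pS[i][s] * pL[g][l]
    return total

def cond(target, given):
    """P(target | given) for partial assignments given as dicts."""
    return prob(**target, **given) / prob(**given)

# G vs its parent D, with the other variables S and L both observed:
easy = cond({'g': 0}, {'d': 0, 's': 1, 'l': 1})   # P(G=A | easy course, s1, l1)
hard = cond({'g': 0}, {'d': 1, 's': 1, 'l': 1})   # P(G=A | hard course, s1, l1)
print(easy, hard)  # the two differ, so (G ⊥̸ D) | S, L
```

Any non-degenerate choice of CPTs gives the same qualitative picture: the parent D keeps influencing G no matter which other variables are held fixed.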


62/85

Module 17.7: Independencies encoded by a Bayesian network (Case 2: Node and its non-parents)

63/85

[Bayesian network: D → G, I → G, I → S, G → L]

Now let's look at the relation between a node and its non-parent nodes

Is L ⊥ S?

No, knowing the SAT score tells us about I, which in turn tells us something about G and hence L

Hence we expect P(l1|s1) > P(l1|s0)

Similarly we can argue L ⊥̸ D and L ⊥̸ I
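The expected inequality P(l1|s1) > P(l1|s0) can be sanity-checked by brute-force enumeration. The CPT values below are hypothetical (borrowed in style from Koller & Friedman's student example, not from the lecture); only the direction of the inequality matters.

```python
from itertools import product

# Hypothetical, illustrative CPTs for the student network.
pD = [0.6, 0.4]                                   # P(D)
pI = [0.7, 0.3]                                   # P(I)
pG = {(0, 0): [0.30, 0.40, 0.30],                 # P(G | I=i, D=d), keyed (i, d)
      (0, 1): [0.05, 0.25, 0.70],
      (1, 0): [0.90, 0.08, 0.02],
      (1, 1): [0.50, 0.30, 0.20]}
pS = [[0.95, 0.05], [0.20, 0.80]]                 # P(S | I)
pL = [[0.10, 0.90], [0.40, 0.60], [0.99, 0.01]]   # P(L | G)

def prob(**fix):
    """Marginal probability of the partial assignment `fix`, by enumeration."""
    total = 0.0
    for d, i, g, s, l in product(range(2), range(2), range(3), range(2), range(2)):
        v = {'d': d, 'i': i, 'g': g, 's': s, 'l': l}
        if all(v[k] == x for k, x in fix.items()):
            total += pD[d] * pI[i] * pG[(i, d)][g] * pS[i][s] * pL[g][l]
    return total

def cond(target, given):
    """P(target | given) for partial assignments given as dicts."""
    return prob(**target, **given) / prob(**given)

p_good_sat = cond({'l': 1}, {'s': 1})   # P(good letter | high SAT)
p_poor_sat = cond({'l': 1}, {'s': 0})   # P(good letter | low SAT)
print(p_good_sat, p_poor_sat)           # the first is larger: L ⊥̸ S
```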


64/85

[Bayesian network: D → G, I → G, I → S, G → L]

But what if we know the value of G?

Is (L ⊥ S)|G?

Yes, the grade completely determines the recommendation letter

Once we know the grade, other variables do not add any information

Hence (L ⊥ S)|G

Similarly we can argue (L ⊥ I)|G and (L ⊥ D)|G
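This conditional independence holds exactly for any choice of CPTs, because L enters the factorization only through P(L|G). A short numerical check, again with hypothetical CPT values (illustrative only, not from the lecture):

```python
from itertools import product

# Hypothetical, illustrative CPTs for the student network.
pD = [0.6, 0.4]                                   # P(D)
pI = [0.7, 0.3]                                   # P(I)
pG = {(0, 0): [0.30, 0.40, 0.30],                 # P(G | I=i, D=d), keyed (i, d)
      (0, 1): [0.05, 0.25, 0.70],
      (1, 0): [0.90, 0.08, 0.02],
      (1, 1): [0.50, 0.30, 0.20]}
pS = [[0.95, 0.05], [0.20, 0.80]]                 # P(S | I)
pL = [[0.10, 0.90], [0.40, 0.60], [0.99, 0.01]]   # P(L | G)

def prob(**fix):
    """Marginal probability of the partial assignment `fix`, by enumeration."""
    total = 0.0
    for d, i, g, s, l in product(range(2), range(2), range(3), range(2), range(2)):
        v = {'d': d, 'i': i, 'g': g, 's': s, 'l': l}
        if all(v[k] == x for k, x in fix.items()):
            total += pD[d] * pI[i] * pG[(i, d)][g] * pS[i][s] * pL[g][l]
    return total

def cond(target, given):
    """P(target | given) for partial assignments given as dicts."""
    return prob(**target, **given) / prob(**given)

# For every grade g, P(l | g, s) should not depend on s:
gaps = [abs(cond({'l': 1}, {'g': g, 's': 1}) - cond({'l': 1}, {'g': g, 's': 0}))
        for g in range(3)]
print(max(gaps))  # zero up to floating-point error: (L ⊥ S) | G
```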


65/85

[Bayesian network: D → G, I → G, I → S, G → L]

But, wait a minute!

The instructor may also want to look at the SAT score in addition to the grade

Well, we "assumed" that the instructor only relies on the grade.

That was our "belief" of how the world works

And hence we drew the network accordingly


66/85

[Bayesian network as before, with an additional node C]

Of course we are free to change our assumptions

We may want to assume that the instructor also looks at the SAT score

But if that is the case we have to change the network to reflect this dependence

Why just the SAT score? The instructor may even consult one of his colleagues and seek his/her opinion


67/85

[Bayesian network as before, with the additional node C]

Remember: The graph is a reflection of our assumptions about how the world works

Our assumptions about dependencies are encoded in the graph

Once we build the graph we freeze it and do all the reasoning and analysis (independence) on this graph

It is not fair to ask "what if" questions involving other factors (for example, what if the professor was in a bad mood?)


68/85

(a) [Bayesian network: D → G, I → G, I → S, G → L]

(b) [the same network with L additionally depending on S]

If we believe Graph (a) is how the world works then (L ⊥ S)|G

If we believe Graph (b) is how the world works then (L ⊥̸ S)|G

We will stick to Graph (a) for the discussion


69/85

Let us return to our discussion of finding independence relations in the graph

So far we have seen three cases, as summarized in the next module


70/85

Module 17.8: Independencies encoded by a Bayesian network (Case 3: Node and its descendants)

71/85

[Bayesian network: D → G, I → G, I → S, G → L]

(G ⊥̸ D), (G ⊥̸ I), (S ⊥̸ I), (L ⊥̸ G)
A node is not independent of its parents

(G ⊥̸ D, I) | S, L
(S ⊥̸ I) | D, G, L
(L ⊥̸ G) | D, I, S
A node is not independent of its parents even when other variables are given

(S ⊥ G) | I?
(L ⊥ D, I, S) | G?
(G ⊥ L) | D, I?
A node seems to be independent of other variables given its parents


72/85

[Bayesian network: D → G, I → G, I → S, G → L]

Let us inspect this last rule

Is (G ⊥ L)|D, I?

If you know that d = 0 and i = 1 then you would expect the student to get a good grade

But now if someone tells you that the student got a poor letter, your belief will change

So (G ⊥̸ L)|D, I

The effect (letter) actually gives us information about the cause (grade)
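This "effect informs cause" update is easy to demonstrate numerically. With hypothetical CPT values (illustrative only; here d = 0 denotes an easy course, i = 1 an intelligent student, and l = 0 a poor letter), observing a poor letter lowers the probability of a good grade even though D and I are already known:

```python
from itertools import product

# Hypothetical, illustrative CPTs for the student network.
pD = [0.6, 0.4]                                   # P(D)
pI = [0.7, 0.3]                                   # P(I)
pG = {(0, 0): [0.30, 0.40, 0.30],                 # P(G | I=i, D=d), keyed (i, d)
      (0, 1): [0.05, 0.25, 0.70],
      (1, 0): [0.90, 0.08, 0.02],
      (1, 1): [0.50, 0.30, 0.20]}
pS = [[0.95, 0.05], [0.20, 0.80]]                 # P(S | I)
pL = [[0.10, 0.90], [0.40, 0.60], [0.99, 0.01]]   # P(L | G)

def prob(**fix):
    """Marginal probability of the partial assignment `fix`, by enumeration."""
    total = 0.0
    for d, i, g, s, l in product(range(2), range(2), range(3), range(2), range(2)):
        v = {'d': d, 'i': i, 'g': g, 's': s, 'l': l}
        if all(v[k] == x for k, x in fix.items()):
            total += pD[d] * pI[i] * pG[(i, d)][g] * pS[i][s] * pL[g][l]
    return total

def cond(target, given):
    """P(target | given) for partial assignments given as dicts."""
    return prob(**target, **given) / prob(**given)

prior = cond({'g': 0}, {'d': 0, 'i': 1})              # P(G=A | easy, intelligent)
posterior = cond({'g': 0}, {'d': 0, 'i': 1, 'l': 0})  # ... and a poor letter
print(prior, posterior)  # the poor letter drags the belief down: (G ⊥̸ L) | D, I
```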


73/85

[Bayesian network: D → G, I → G, I → S, G → L]

(G ⊥̸ D), (G ⊥̸ I), (S ⊥̸ I), (L ⊥̸ G)
A node is not independent of its parents

(G ⊥̸ D, I) | S, L
(S ⊥̸ I) | D, G, L
(L ⊥̸ G) | D, I, S
A node is not independent of its parents even when other variables are given

(S ⊥ G) | I
(L ⊥ D, I, S) | G
(G ⊥̸ L) | D, I
Given its parents, a node is independent of all variables except its descendants


74/85

Module 17.9: Bayesian Networks: Formal Semantics

75/85

We are now ready to formally define the semantics of a Bayesian network

Bayesian Network Semantics:

A Bayesian network structure G is a directed acyclic graph whose nodes represent random variables X1, X2, ..., Xn. Let Pa_G(Xi) denote the parents of Xi in G, and let NonDescendants(Xi) denote the variables in the graph that are not descendants of Xi. Then G encodes the following set of conditional independence assumptions, called the local independencies and denoted Iℓ(G): for each variable Xi,

(Xi ⊥ NonDescendants(Xi) | Pa_G(Xi))
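The local independencies can be read off mechanically from the DAG structure alone, with no probabilities involved. The sketch below does this for the student network (the variable names come from the lecture's example; the helper names are my own):

```python
# Parent sets encode the student DAG: D → G ← I, I → S, G → L.
parents = {'D': set(), 'I': set(), 'G': {'D', 'I'}, 'S': {'I'}, 'L': {'G'}}

def descendants(x):
    """All nodes reachable from x by following edges downward."""
    children = {v: {c for c, ps in parents.items() if v in ps} for v in parents}
    seen, stack = set(), list(children[x])
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(children[c])
    return seen

def local_independencies():
    """For each Xi: (Xi ⊥ NonDescendants(Xi) \\ Pa(Xi) | Pa(Xi))."""
    out = {}
    for x in parents:
        nondesc = set(parents) - {x} - descendants(x)
        out[x] = (nondesc - parents[x], parents[x])
    return out

for x, (nd, pa) in sorted(local_independencies().items()):
    print(f"({x} ⊥ {sorted(nd)} | {sorted(pa)})")
```

For instance, the procedure recovers (L ⊥ D, I, S | G) and (G ⊥ S | D, I), matching the case-by-case reasoning from the earlier slides.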


Page 294: CS7015 (Deep Learning) : Lecture 17€¦ · Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17. 6/85 Grades A B G C Height Short Tall H Age Young A Adult Random Variable (intuition)

76/85

We will see some more formal definitions and then return to the question of independencies.

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17

77/85

Module 17.10: I-Maps

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17

78/85

(Figure: Bayesian network over D, I, G, S, L)

Let P be a joint distribution over X = {X1, X2, ..., Xn}

We define I(P) as the set of independence assumptions that hold in P.

For example: I(P) = {(G ⊥ S | I, D), ...}

Each element of this set is of the form (Xi ⊥ Xj | Z), where Z ⊆ X \ {Xi, Xj}

Let I(G) be the set of independence assumptions associated with a graph G.

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17
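The definition above can be checked mechanically: I(P) is just the set of statements (Xi ⊥ Xj | Z) that a brute-force test confirms. A minimal Python sketch, using a small hypothetical joint distribution of our own (three binary variables with X2 = X0 XOR X1; the distribution and variable indices are not from the lecture):

```python
import itertools

# Hypothetical joint distribution P over binary X0, X1, X2, stored as
# {(x0, x1, x2): probability}. X0, X1 are fair independent coins and
# X2 = X0 XOR X1, so each pair is marginally independent.
P = {}
for x0, x1 in itertools.product([0, 1], repeat=2):
    P[(x0, x1, (x0 + x1) % 2)] = 0.25
    P[(x0, x1, 1 - (x0 + x1) % 2)] = 0.0

VARS = [0, 1, 2]

def marginal(P, keep):
    """Marginalize P onto the tuple of variable indices `keep`."""
    out = {}
    for assignment, p in P.items():
        key = tuple(assignment[v] for v in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def holds(P, i, j, Z, tol=1e-9):
    """Check (Xi ⊥ Xj | Z): P(xi, xj, z) * P(z) == P(xi, z) * P(xj, z)."""
    M_ijZ = marginal(P, (i, j) + Z)
    M_iZ = marginal(P, (i,) + Z)
    M_jZ = marginal(P, (j,) + Z)
    M_Z = marginal(P, Z)  # for Z = (), this is {(): 1.0}
    for a in itertools.product([0, 1], repeat=len(VARS)):
        z = tuple(a[v] for v in Z)
        lhs = M_ijZ[(a[i], a[j]) + z] * M_Z[z]
        rhs = M_iZ[(a[i],) + z] * M_jZ[(a[j],) + z]
        if abs(lhs - rhs) > tol:
            return False
    return True

# Enumerate I(P): all (Xi ⊥ Xj | Z) with Z ⊆ X \ {Xi, Xj}
I_P = []
for i, j in itertools.combinations(VARS, 2):
    rest = [v for v in VARS if v not in (i, j)]
    for r in range(len(rest) + 1):
        for Z in itertools.combinations(rest, r):
            if holds(P, i, j, Z):
                I_P.append((i, j, Z))

print(I_P)  # -> [(0, 1, ()), (0, 2, ()), (1, 2, ())]
```

Note that conditioning can destroy independence: here (X0 ⊥ X1) holds but (X0 ⊥ X1 | X2) does not, so the latter is correctly absent from I(P).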

79/85

(Figure: Bayesian network over D, I, G, S, L)

We say that G is an I-map for P if I(G) ⊆ I(P)

G does not mislead us about independencies in P

Any independence that G states must hold in P

But P can have additional independencies.

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17

80/85

X Y P(X,Y)

0 0 0.08

0 1 0.32

1 0 0.12

1 1 0.48

Consider this joint distribution over X, Y

We need to find a G which is an I-map for this P

How do we find such a G?

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17

81/85

X Y P(X,Y)

0 0 0.08

0 1 0.32

1 0 0.12

1 1 0.48

Well, since there are only 2 variables here, the only possibilities are I(P) = {(X ⊥ Y)} or I(P) = φ

From the table we can easily check that P(X, Y) = P(X) · P(Y)

Hence, I(P) = {(X ⊥ Y)}

Now can you come up with a G which satisfies I(G) ⊆ I(P)?

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17
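The check P(X, Y) = P(X) · P(Y) takes only a few lines. A small sketch using exactly the probabilities from the table on the slide:

```python
# The joint distribution P(X, Y) from the slide, as {(x, y): probability}
P = {(0, 0): 0.08, (0, 1): 0.32, (1, 0): 0.12, (1, 1): 0.48}

# Marginals P(X) and P(Y), obtained by summing out the other variable
PX = {x: sum(p for (xx, y), p in P.items() if xx == x) for x in (0, 1)}
PY = {y: sum(p for (x, yy), p in P.items() if yy == y) for y in (0, 1)}

# Verify P(X, Y) = P(X) · P(Y) for every assignment, i.e. X ⊥ Y
for (x, y), p in P.items():
    assert abs(p - PX[x] * PY[y]) < 1e-9

print(PX, PY)  # marginals: P(X=1) = 0.6, P(Y=1) = 0.8
```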

82/85

G1: X → Y, I(G1) = φ    G2: X ← Y, I(G2) = φ    G3: X  Y (no edge), I(G3) = {(X ⊥ Y)}

Since we have only two variables, there are only 3 possibilities for G

Which of these is an I-Map for P?

Well, all three are I-Maps for P

They all satisfy the condition I(G) ⊆ I(P)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17
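Once the independence sets are written down, the I-map condition I(G) ⊆ I(P) is literally a subset test. A toy sketch for the two-variable example above (the set encoding, with a statement (X ⊥ Y) written as a frozenset of variable names, is our own):

```python
# Independence sets for the two-variable example. A statement (X ⊥ Y)
# is encoded as frozenset({"X", "Y"}); conditioning sets are empty here.
I_P = {frozenset({"X", "Y"})}    # X ⊥ Y holds in P
I_G1 = set()                     # X -> Y asserts no independencies
I_G2 = set()                     # X <- Y asserts no independencies
I_G3 = {frozenset({"X", "Y"})}   # disconnected graph asserts X ⊥ Y

def is_imap(I_G, I_P):
    """G is an I-map for P iff every independence asserted by G holds in P."""
    return I_G <= I_P

# All three graphs satisfy I(G) ⊆ I(P), so all three are I-maps for P
assert all(is_imap(I_G, I_P) for I_G in (I_G1, I_G2, I_G3))
```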

83/85

X Y P(X,Y)

0 0 0.08

0 1 0.32

1 0 0.12

1 1 0.48

Of course, this was just a toy example

In practice, we do not know P and hence can't compute I(P)

We just make some assumptions about I(P) and then construct a G such that I(G) ⊆ I(P)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17

84/85

(Figure: Bayesian network over D, I, G, S, L)

So why do we care about I-Maps?

If G is an I-Map for a joint distribution P, then P factorizes over G

What does that mean?

Well, it just means that P can be written as a product of factors, where each factor is a conditional probability distribution (c.p.d.) associated with a node of G

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17

85/85

Theorem
Let G be a BN structure over a set of random variables X and let P be a joint distribution over these variables. If G is an I-Map for P, then P factorizes according to G.
Proof: Exercise

Theorem
Let G be a BN structure over a set of random variables X and let P be a joint distribution over these variables. If P factorizes according to G, then G is an I-Map of P.
Proof: Exercise

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17
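The first theorem can be sanity-checked numerically on the earlier two-variable table: take G to be X → Y (one of the I-maps found above); then the product of the c.p.d.s read off from P must recover the joint exactly. A sketch with the table values from the slide:

```python
# The joint distribution from the earlier slide, as {(x, y): probability}
P = {(0, 0): 0.08, (0, 1): 0.32, (1, 0): 0.12, (1, 1): 0.48}

# c.p.d.s for the graph X -> Y, read off from P:
# P(X) and P(Y | X) = P(X, Y) / P(X)
PX = {x: P[(x, 0)] + P[(x, 1)] for x in (0, 1)}
PY_given_X = {(y, x): P[(x, y)] / PX[x] for x in (0, 1) for y in (0, 1)}

# P factorizes according to G: P(X, Y) = P(X) · P(Y | X) for every assignment
for (x, y), p in P.items():
    assert abs(p - PX[x] * PY_given_X[(y, x)]) < 1e-9
```

(Since X ⊥ Y in this P, the c.p.d. P(Y | X) collapses to P(Y), which is why the disconnected graph G3 is also an I-map.)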

86/85

(Figure: a complete directed graph over X1, X2, X3, X4, X5)

Consider a set of random variables X1, X2, X3, X4, X5

There are many joint distributions possible

Each may entail different independence relations

For example, in some cases L could be independent of S; in some not.

Can you think of a G which will be an I-Map for any distribution P?

Answer: A complete graph

The factorization entailed by the above graph is
P(X3) P(X5|X3) P(X1|X3, X5) P(X2|X1, X3, X5) P(X4|X1, X2, X3, X5),
which is just the chain rule of probability, which holds for any distribution

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 17
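That the complete-graph factorization is just the chain rule, and hence holds for any distribution, can be verified by brute force. A sketch over a randomly generated joint distribution (hypothetical data) using the slide's ordering X3, X5, X1, X2, X4:

```python
import itertools
import random

# An arbitrary joint distribution over five binary variables X1..X5:
# random positive weights, normalized to sum to 1 (hypothetical data).
random.seed(0)
weights = [random.random() + 1e-6 for _ in range(32)]
total = sum(weights)
P = {a: w / total for a, w in zip(itertools.product([0, 1], repeat=5), weights)}

# 0-based indices for the slide's ordering X3, X5, X1, X2, X4
order = [2, 4, 0, 1, 3]

def marginal(keep):
    """Marginalize P onto the tuple of variable indices `keep`."""
    out = {}
    for a, p in P.items():
        k = tuple(a[v] for v in keep)
        out[k] = out.get(k, 0.0) + p
    return out

# Chain rule: P(a) = Π_k P(X_order[k] | X_order[:k]) for every assignment a,
# where each conditional is a ratio of marginals.
for a, p in P.items():
    prod = 1.0
    for k in range(5):
        num = marginal(tuple(order[:k + 1]))[tuple(a[v] for v in order[:k + 1])]
        den = marginal(tuple(order[:k]))[tuple(a[v] for v in order[:k])]
        prod *= num / den
    assert abs(prod - p) < 1e-9

print("chain rule recovers the joint for every assignment")
```

Because the product telescopes, this holds for any distribution and any ordering, which is exactly why the complete graph is a (trivial) I-map for every P: it asserts no independencies at all.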
