On Distributing Probabilistic Inference

Post on 05-Jan-2016


TRANSCRIPT

1

On Distributing Probabilistic Inference

Metron, Inc.

Thor Whalen

2

Outline

• Inference and distributed inference
• Conditional independence (CI)
• Graphical models, CI, and separability
• Using CI: Sufficient information

3

Outline

• Inference and distributed inference
  – Probabilistic inference
  – Distributed probabilistic inference
  – Use of distributed probabilistic inference
• Conditional independence (CI)
• Graphical models, CI, and separability
• Using CI: Sufficient information

4

Probabilistic Inference

• "(Probabilistic) inference, or model evaluation, is the process of updating probabilities of outcomes based upon the relationships in the model and the evidence known about the situation at hand." (research.microsoft.com)

• Let V = {X1, ..., Xn} be the set of variables of interest

• We are given a (prior) probability space P(V) on V

• Bayesian Inference: Given some evidence e, we compute the posterior probability space P′ = P(V|e)

5

Probabilistic Inference

• A probability space on a finite set V of discrete variables can be represented as a table containing the probability of every combination of states of V

• Such a table is commonly called a "potential"

• A receives evidence → have P(e|A) → want P(A,B,C|e)

[Slide figure: the potential P(A,B,C) with entries 0.2240, 0.0960, 0.0120, 0.0080, 0.0540, 0.1260, 0.2880, 0.1920, alongside the evidence likelihood P(e|A) with values 0.9000 and 0.3000.]

6

Probabilistic Inference

• A probability space on a finite set V of discrete variables can be represented as a table containing the probability of every combination of states of V

• Such a table is commonly called a "potential"

• A receives evidence → have P(e|A) → want P(A,B,C|e)

• Assuming evidence e only depends on variable A, we have P(A,B,C|e) ∝ P(A,B,C)P(e|A)

[Slide table, reconstructed; each row is one joint state of (A,B,C):]

  P(A,B,C)   P(e|A)   product   P(A,B,C|e)
  0.2240     0.9000   0.2016    0.2585
  0.0960     0.9000   0.0864    0.1108
  0.0120     0.3000   0.0036    0.0046
  0.0080     0.3000   0.0024    0.0031
  0.0540     0.3000   0.0162    0.0208
  0.1260     0.3000   0.0378    0.0485
  0.2880     0.9000   0.2592    0.3323
  0.1920     0.9000   0.1728    0.2215

(the last column is the product column normalized by its sum, P(e) = 0.78)
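The slide's update can be reproduced in a few lines. This sketch takes the table's row order as given (state labels are not shown on the slide), multiplies the prior potential by the evidence likelihood, and renormalizes:

```python
# Reproducing the slide's table: multiply the prior potential by the
# evidence likelihood P(e|A) row by row, then renormalize.
prior = [0.2240, 0.0960, 0.0120, 0.0080, 0.0540, 0.1260, 0.2880, 0.1920]
lik   = [0.9, 0.9, 0.3, 0.3, 0.3, 0.3, 0.9, 0.9]   # P(e|A), one value per row

unnorm = [p * l for p, l in zip(prior, lik)]
z = sum(unnorm)                                     # P(e) = 0.78
posterior = [round(u / z, 4) for u in unnorm]
print(posterior)
# [0.2585, 0.1108, 0.0046, 0.0031, 0.0208, 0.0485, 0.3323, 0.2215]
```

The printed list matches the posterior column of the slide's table exactly.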

7

Distributed Probabilistic Inference

• Several components, each inferring on a given subspace
• Two components may communicate information about variables they have in common
• Wish to be able to fuse evidence received throughout the system

[Slide figure: two components, one over {A,B} and one over {B,C}, with the shared variable B marked "same" in both; evidence enters one component's probability space.]

8

Use of Distributed Probabilistic Inference

[Slide plot: time to evaluate a full joint table as a function of the number of variables n, assuming one million operations per second.]
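The arithmetic behind that plot is simple: one pass over a full joint table of n binary variables costs on the order of 2^n operations. The rate below is the slide's stated assumption; the particular n values are my own illustration:

```python
# One pass over a 2^n-entry joint table, at the slide's assumed rate.
RATE = 1_000_000  # operations per second

for n in (20, 30, 40, 50):
    seconds = 2 ** n / RATE
    print(f"n = {n}: about {seconds:,.1f} seconds")
```

Already at n = 50 a single pass takes over 35 years, which is the motivation for factoring the joint and distributing the work across agents.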

• Be able to implement probabilistic inference
• Implement multi-agent systems:
  - agents sense their environment and take actions intelligently
  - they can observe given variables of the probability space
  - they need to infer on other variables in order to take action
  - cooperate in exchanging information about the probability space

9

Use of Distributed Probabilistic Inference

• Be able to implement probabilistic inference
• Implement multi-agent systems:
  - agents sense their environment and take actions intelligently
  - they can observe given variables of the probability space
  - they need to infer on other variables in order to take action
  - cooperate in exchanging information about the probability space

[Slide figure: Agent 1 and Agent 2 exchanging information.]

10

Outline

• Inference and distributed inference
• Conditional independence (CI)
• Graphical models, CI, and separability
• Using CI: Sufficient information

11

Conditional independence

• Marginalize out C from P(A,B,C) to get P(A,B) (e.g. 0.0120 + 0.0080 = 0.0200)

• P(C|A,B) = P(A,B,C)/P(A,B)

• P(C|A,B) = P(C|B)

12

Conditional independence

• Marginalize out C from P(A,B,C) to get P(A,B)

• P(C|A,B) = P(A,B,C)/P(A,B)

• P(C|A,B) = P(C|B)

"Wait a minute Dr. Watson! P(C|A,B) is a table of size 8, whereas P(C|B) is of size 4! This is entropicaly impossible!"

"Eh, but I see only four distinct numbers here, doc... and 'entropicaly' is not a word!"

"Why you little!!"

13

Conditional independence

• Marginalize out C from P(A,B,C) to get P(A,B)

• P(C|A,B) = P(A,B,C)/P(A,B)

• P(C|A,B) = P(C|B), i.e.

  P(C = c | A = a, B = b) = P(C = c | B = b) for all (a, b, c) ∈ A × B × C

We say that A and C are conditionally independent given B: the conditional is insensitive to A.
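The collapse from a size-8 table to a size-4 table can be checked numerically. The CPD numbers below are invented for illustration (they are not the slide's table): build a joint P(A,B,C) = P(A)P(B|A)P(C|B) and confirm that P(C|A,B) never actually depends on A:

```python
from itertools import product

# Invented chain A -> B -> C; by construction C is independent of A given B.
pA = {0: 0.4, 1: 0.6}
pB_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # P(B|A)
pC_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # P(C|B)

joint = {(a, b, c): pA[a] * pB_A[a][b] * pC_B[b][c]
         for a, b, c in product([0, 1], repeat=3)}

def p_c_given_ab(a, b, c):
    pab = joint[(a, b, 0)] + joint[(a, b, 1)]        # marginalize out C
    return joint[(a, b, c)] / pab

# The size-8 table P(C|A,B) matches the size-4 table P(C|B) entry by entry.
for a, b, c in product([0, 1], repeat=3):
    assert abs(p_c_given_ab(a, b, c) - pC_B[b][c]) < 1e-12
print("P(C|A,B) == P(C|B): A and C are conditionally independent given B")
```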

14

Outline

• Inference and distributed inference
• Conditional independence (CI)
• Graphical models, CI, and separability
  – Bayes nets
  – Markov networks

• Using CI: Sufficient information

15

Bayes Nets

Note that, by the chain rule,

P(A,B,C,D,E) = P(A)P(B|A)P(C|A,B)P(D|A,B,C)P(E|A,B,C,D)

whereas the independencies encoded in the DAG reduce this to

P(A,B,C,D,E) = P(A)P(B)P(C|A)P(D|B,C)P(E|C)

[Slide figure: a DAG on nodes A, B, C, D, E with edges A→C, B→D, C→D, C→E.]
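To see the saving the factored form buys, here is a sketch with invented CPD numbers for that DAG; only the factorization structure P(A)P(B)P(C|A)P(D|B,C)P(E|C) comes from the slide:

```python
from itertools import product

# Invented CPDs over binary variables; the structure is the slide's,
# the numbers are not.
pA = [0.3, 0.7]
pB = [0.6, 0.4]
pC_A = [[0.8, 0.2], [0.1, 0.9]]                    # pC_A[a][c] = P(C=c|A=a)
pD_BC = {(0, 0): [0.9, 0.1], (0, 1): [0.6, 0.4],
         (1, 0): [0.3, 0.7], (1, 1): [0.2, 0.8]}   # P(D|B,C)
pE_C = [[0.7, 0.3], [0.4, 0.6]]                    # P(E|C)

def joint(a, b, c, d, e):
    return pA[a] * pB[b] * pC_A[a][c] * pD_BC[(b, c)][d] * pE_C[c][e]

# 20 stored numbers determine all 2^5 = 32 joint probabilities.
total = sum(joint(*s) for s in product([0, 1], repeat=5))
print(round(total, 10))  # 1.0
```

With more variables the gap between CPD storage and full-table storage grows exponentially, which is the point of the graphical representation.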

16

Bayes Nets

• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables

17

Bayes Nets

• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables

• A BN consists of
  – A Directed Acyclic Graph (DAG)
    • Nodes: Variables of V
    • Edges: Causal relations

A DAG is a directed graph with no directed cycles.

[Slide figure: one directed graph on nodes X1–X13 that is a DAG, and a second graph that is not a DAG because it contains a directed cycle.]

18

Bayes Net

• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables

• A BN consists of
  – A Directed Acyclic Graph (DAG)
    • Nodes: Variables of V
    • Edges: Causal relations
  – A list of conditional probability distributions (CPDs); one for every node of the DAG

[Slide figure: the DAG on X1–X13, annotated with a CPD table at each node.]

19

Bayes Nets

• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables

• A BN consists of
  – A Directed Acyclic Graph (DAG)
    • Nodes: variables of V
    • Edges: Causal relations
  – A list of conditional probability distributions (CPDs); one for every node of the DAG

• The DAG exhibits particular (in)dependencies of P(V)

[Slide figure: in the DAG, node B lies between A and C.]

A and C are independent given B
- i.e. P(A, C | B) = P(A | B) P(C | B)
- i.e. P(C | A, B) = P(C | B)

We say that B separates A and C.

20

Bayes Nets

• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables

• A BN consists of
  – A Directed Acyclic Graph (DAG)
    • Nodes: variables of V
    • Edges: Causal relations
  – A list of conditional probability distributions (CPDs); one for every node of the DAG

• The DAG characterizes the (in)dependency structure of P(V)

• The CPDs characterize the probabilistic and/or deterministic relations between parent states and children states

[Slide figure: the DAG on X1–X13.]

21

Bayes Nets

• The prior distributions on the variables of parentless nodes, along with the CPDs of the BN, induce prior distributions (called "beliefs" in the literature) on all the variables

• If the system receives evidence on a variable:
  – this evidence impacts its belief,
  – along with the beliefs of all other variables

[Slide figure: the DAG on X1–X13 with the parentless nodes X9 and X10 marked, and evidence entering at X12.]

22

Markov networks

• The edges of a Markov network exhibit direct dependencies between variables
• The absence of an edge means absence of direct dependency
• If a set B of nodes separates the graph into several components, then these components are independent given B

23

Outline

• Inference and distributed inference
• Conditional independence (CI)
• Graphical models, CI, and separability
• Using CI: Sufficient information
  – Specifications for a distributed inference system
  – A naive solution
  – Using separation

24

Using CI: Sufficient information

[Slide figure: a Bayes net on variables A–M, partitioned among four agents, with query variables and evidence variables marked.]

Specifications

• A probability space

• A number of agents, each having
  - query variables
  - evidence variables

What variables must agent 1 represent so that it may fuse the evidence received by other agents?

25

Using CI: Sufficient information

[Slide figure: the Bayes net on A–M; agent 1 holds A and B, while agents 2, 3, and 4 hold {E,F,G}, {H,I,J}, and {K,L,M} respectively.]

Specifications

• A probability space

• A number of agents, each having
  - query variables
  - evidence variables

A naïve solution

• Agents contain their own query and evidence variables

• In order to receive evidence from the other agents, agent 1 must represent variables E, F, G, H, I, J, K, L, and M

26

Using CI: Sufficient information

[Slide figure: the Bayes net on A–M, with sets labeled X, Y, and Z marked among the agents' variables.]

Specifications

• A probability space

• A number of agents, each having
  - query variables
  - evidence variables

A naïve solution

• Agents contain their own query and evidence variables

• In order to receive evidence from the other agents, agent 1 must represent variables E, F, G, H, I, J, K, L, and M

Note that Y separates Z and X, whether Y is equal to:
• {K,L,M},
• {H,J,I}, or
• {E,F,G}.

"Agent 1 must represent many variables! How else could the other agents communicate their evidence?"

27

Using CI: Sufficient information

[Slide figure: agent 1 holds X = {A,B}, agent 2 holds Y = {E,F,G}; they share Z = {C,D}, which separates X and Y. Evidence eY arrives on the Y side.]

Z separates X and Y
→ P(Y|Z) = P(Y|X,Z)
→ P(eY|Z) = P(eY|X,Z)   (the likelihood given Z of evidence on Y equals the likelihood given X and Z)

Agent 2 computes P(Z|eY) = ΣY P(Y,Z|eY) from its posterior P(Y,Z|eY), and agent 1 recovers

P(X,Z|eY) = P(X,Z) P(Z|eY) P(Z)^-1

→ It is sufficient for agent 2 to send its posterior on Z to agent 1 for the latter to compute its posterior on X

28

Using CI: Sufficient information

[Slide figure: the same setup; agent 1 holds X = {A,B}, agent 2 holds Y = {E,F,G}, with shared Z = {C,D} separating X and Y, and evidence eY on Y.]

Z separates X and Y → P(Y|X,Z) = P(Y|Z) → P(eY|X,Z) = P(eY|Z)

Because:

P(X,Z|eY) = P(X,Z,eY) P(eY)^-1
          = P(X,Z) P(eY|X,Z) P(eY)^-1
          = P(X,Z) P(eY|Z) P(eY)^-1
          = P(X,Z) P(Z,eY) P(Z)^-1 P(eY)^-1
          = P(X,Z) P(Z|eY) P(Z)^-1

→ It is sufficient for agent 2 to send its posterior on Z to agent 1 for the latter to compute its posterior on X
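The derivation can be checked numerically on a toy chain X → Z → Y with invented binary CPDs, where Z separates X and Y just as {C,D} separates {A,B} from {E,F,G}:

```python
from itertools import product

# Toy chain X -> Z -> Y; all CPD numbers are invented for illustration.
pX = [0.3, 0.7]
pZ_X = [[0.9, 0.1], [0.4, 0.6]]        # pZ_X[x][z] = P(Z=z|X=x)
pY_Z = [[0.8, 0.2], [0.25, 0.75]]      # P(Y=y|Z=z)
lik = [0.2, 0.9]                       # P(eY|Y=y), the evidence on Y

joint = {(x, z, y): pX[x] * pZ_X[x][z] * pY_Z[z][y]
         for x, z, y in product([0, 1], repeat=3)}

pe = sum(p * lik[y] for (x, z, y), p in joint.items())              # P(eY)
pXZ = {(x, z): joint[(x, z, 0)] + joint[(x, z, 1)]
       for x, z in product([0, 1], repeat=2)}                       # P(X,Z)
pZ = {z: pXZ[(0, z)] + pXZ[(1, z)] for z in (0, 1)}                 # P(Z)
pZ_e = {z: sum(joint[(x, z, y)] * lik[y]
               for x in (0, 1) for y in (0, 1)) / pe
        for z in (0, 1)}                                            # P(Z|eY)

# Agent 1 never sees eY directly; the message P(Z|eY) is enough.
for x, z in product([0, 1], repeat=2):
    direct = sum(joint[(x, z, y)] * lik[y] for y in (0, 1)) / pe    # P(X,Z|eY)
    via_message = pXZ[(x, z)] * pZ_e[z] / pZ[z]
    assert abs(direct - via_message) < 1e-12
print("P(X,Z|eY) = P(X,Z) P(Z|eY) / P(Z) verified")
```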

29

Using CI: Sufficient information

[Slide figure: the Bayes net on A–M; agent 1 holds {A,B} plus the separator {C,D}, and each communication line carries only C and D.]

Specifications

• A Bayes net

• A number of agents, each having
  - query variables
  - evidence variables

A naïve solution

• Agents contain their own query and evidence variables

• In order to receive evidence from the other agents, agent 1 must represent variables E, F, G, H, I, J, K, L, and M

Using separation

• Agent 1 only needs to represent two extra variables

• Agent 1 may compute its posterior queries faster from CD than from EFGHIJKLM

• Communication lines need to transmit two variables instead of three

30

Using CI: Sufficient information

[Slide figure: a Bayes net with query variables, evidence variables, and other variables marked, and separator sets labeled between consecutive agents.]

Let E1 and Qk be respectively the evidence and query sets of two different agents. In order for the agent querying Qk to fuse evidence received on E1 by the other agent, we wish to find a sequence of separators S1,2, S2,3, ..., Sk-1,k such that

P(Qk | Sk-1,k, E1) = P(Qk | Sk-1,k)  and  P(Sj,j+1 | Sj-1,j, E1) = P(Sj,j+1 | Sj-1,j)

31

32

MatLab Tool

Main functionality

• Insert evidence into given agents and propagate their impact inside the subnet
• Initiate communication between agents, followed by the propagation of new information
• View the marginal distributions of the different agents at every step
• Step forward and backward
• Save eye-friendly logs to a file

i + [Enter] to initialize,
v + [Enter] to view all variables (even those containing no information),
e + [Enter] to enter evidence,
c + [Enter] to perform an inter-agent communication,
p + [Enter] to go to the previous step,
n + [Enter] to go to the next step,
a + [Enter] to add a sensor,
r + [Enter] to remove a sensor,
t + [Enter] to turn true marginals view ON,
m + [Enter] to turn discrepancy marking OFF,
s + [Enter] to save to a file,
q + [Enter] to quit.

Enter Command:

33

MatLab Tool: Display

* Configuration 2: After evidence L(e|C) = (2,5) has been entered into subnet number 2
(indicates step number and last action that was taken)

The TRUTH (marginals obtained by inferring on the entire Bayes net; P(True) shown, P(False) is the complement):
  A: 0.2005, C: 0.1434, B: 0.4403, E: 0.5426, D: 0.2780, F: 0.2901

SUBNET 1 (adjacent to subnet 2), Err(ACB) = 0.0527:
  A: 0.3000 (AD = 0.0704), C: 0.2950 (AD = 0.1072), B: 0.5100 (AD = 0.0493); E, D, F not represented

SUBNET 2 (adjacent to subnets 1, 3):
  C: 0.1434, B: 0.4403, E: 0.5426, D: 0.2780; A, F not represented

SUBNET 3 (adjacent to subnet 2), Err(EDF) = 0.0169:
  E: 0.4820 (AD = 0.0429), D: 0.2745 (AD = 0.0025), F: 0.2969 (AD = 0.0048); A, B, C not represented

Enter a command (enter h + [Enter] for help):

(the display shows the marginal distributions of the variables represented in each subnet, and prompts for a new action)

34

Communication Graph Considerations

[Slide figure: a communication graph on agents 1–6.]

Agent 6 receives info from agent 1 through both agents 4 and 5. How should subnet 6 deal with possible redundancy?

One solution (often adopted) would be to impose a tree structure on the communication graph.

35

36

Communication Graph Considerations

• When choosing the communication graph, one should take into consideration
  - The quality of the possible communication lines
  - The processing speed of the agents
  - The importance of given queries

[Slide figure: two candidate communication graphs; if a given agent is the key decision-making agent, one of the graphs is more appropriate than the other.]

37

Problem Specification

Given:
• A prob. space on V = {X1, ..., Xn}
• A number of agents, each having:
  – Qi: a set of query variables
  – Ei: a set of evidence variables

38

Problem Specification

Given:
• A BN on V = {X1, ..., Xn}
• A number of agents, each having:
  – Qi: a set of query variables
  – Ei: a set of evidence variables

Determine:
• An agent communication graph
• A subset Si of V for each agent
• An inference protocol that specifies
  – How to fuse evidence and messages received from other agents
  – The content of messages between agents

39

Distributed Inference Design

• A communication graph:
  – Nodes represent agents
  – Edges represent communication lines

• Each agent i has:
  – Qi: a set of query variables
  – Ei: a set of evidence variables
  – Pi(Si): a probability space on a subset Si of V
  – An inference protocol. This includes a specification of
    • What to do with received evidence or messages
    • What messages must be sent to other agents

40

Distributed Bayesian Inference Problem

Given a set of pairs (Q1, E1), ..., (Qk, Ek), we wish to find an inference scheme so that, given any evidence e = e1, ..., ek (where ei is the set of evidence received by subnet i), the agents may compute the correct posterior on their query variables,

i.e. for all i, Pi (the probability space of agent i) must become consistent with P on its query variables,

i.e. agent i must be able to compute, for every query variable Q of Qi, the probability Pi(Q|e) = P(Q|e)

41

More Definitions

Let X, Y and Z be subsets of V:

• If P is a prob. space on V, I(X,Y|Z)P is the statement "X is independent of Y given Z," i.e. P(X|Y,Z) = P(X|Z)

• If D is a DAG, I(X,Y|Z)D is the statement "X and Y are d-separated by Z"

• If G is a graph, I(X,Y|Z)G is the statement "X and Y are disconnected by Z"

• Theorem: If D is a Bayes Net for P and G is the moral graph of the ancestor hull of X ∪ Y ∪ Z, then

  I(X,Y|Z)G ↔ I(X,Y|Z)D → I(X,Y|Z)P
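A minimal sketch of the theorem's graph-side test (helper and variable names are mine; the 5-node DAG reuses the earlier factorization example with edges A→C, B→D, C→D, C→E): take the ancestor hull of X ∪ Y ∪ Z, moralize it, delete Z, and check connectivity:

```python
# Graph test for d-separation via the moral graph of the ancestor hull.
dag = {"A": [], "B": [], "C": ["A"], "D": ["B", "C"], "E": ["C"]}  # node -> parents

def d_separated(X, Y, Z):
    # 1. ancestor hull of X ∪ Y ∪ Z
    hull, stack = set(), list(X | Y | Z)
    while stack:
        v = stack.pop()
        if v not in hull:
            hull.add(v)
            stack.extend(dag[v])
    # 2. moral graph: link each node to its parents and "marry" co-parents
    adj = {v: set() for v in hull}
    for v in hull:
        parents = [p for p in dag[v] if p in hull]
        for p in parents:
            adj[v].add(p)
            adj[p].add(v)
        for i in range(len(parents)):
            for j in range(i + 1, len(parents)):
                adj[parents[i]].add(parents[j])
                adj[parents[j]].add(parents[i])
    # 3. delete Z, then check whether X can still reach Y
    seen, stack = set(), list(X - Z)
    while stack:
        v = stack.pop()
        if v in seen or v in Z:
            continue
        seen.add(v)
        stack.extend(adj[v])
    return not (seen & Y)

print(d_separated({"A"}, {"E"}, {"C"}))  # True: C blocks the path A -> C -> E
print(d_separated({"A"}, {"B"}, {"D"}))  # False: moralization marries the co-parents B and C of D
```

The second call shows the collider effect: conditioning on D connects its parents, which the parent-marrying step of moralization captures.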

42

Use of Conditional Independence in the Distributed Inference Problem

[Slide figure: subnets S1 and S2, with evidence variables E1 in S1, query variables Q2 in S2, and shared variables A and B on the boundary.]

• What should S1 send to S2 so that Q2 may "feel" the effect of evidence received by S1 on E1?

• We want S2 to be able to update its probability space so that P2(Q2 | e1) = P(Q2 | e1)

• Claim: If I(E1,Q2|A,B)P then P1(A,B|e1) is sufficient information for S2 to update its probability space

• "Proof": P(Q2 | E1,A,B) = P(Q2 | A,B)

43

Using CI: Sufficient information

44

Distributed Bayesian Inference Problem

Given a set of pairs (Q1, E1), ..., (Qk, Ek), we wish to find an inference scheme so that, given any evidence e = e1, ..., ek, where ei is the set of evidence received by subnet i, the subnets may compute the correct posterior on their query variables,

i.e. the Pi must become consistent with P on their query variables,

i.e. subnet i must be able to compute, for all Q of Qi, the probability Pi(Q|e) = P(Q|e)

45

Distributed Bayesian Inference: Inference Protocol

• A message between two subnets is a joint distribution on a common subset of variables, computed from the probability space of the sender

• Each subnet remembers the last message it received from each neighboring subnet

• A subnet divides the new message by the old one and absorbs the result into its probability space
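The divide-and-absorb rule can be sketched as follows. The class and method names are mine, the numbers are toy values (a prior P(X,Z) and an incoming posterior P(Z|eY) rounded to four places), and seeding the "old" message with the receiver's current marginal on the shared variable is my assumption for the first exchange:

```python
# Hypothetical sketch of the protocol: multiply the receiver's potential
# by new_message / old_message on the shared variable z.
class Subnet:
    def __init__(self, potential):
        self.potential = dict(potential)    # (x, z) -> probability, P(X,Z)
        self.last_msg = {}                  # sender -> last message over z

    def marginal_z(self):
        out = {}
        for (x, z), p in self.potential.items():
            out[z] = out.get(z, 0.0) + p
        return out

    def absorb(self, sender, msg):
        old = self.last_msg.get(sender, self.marginal_z())
        for (x, z) in self.potential:
            self.potential[(x, z)] *= msg[z] / old[z]
        self.last_msg[sender] = dict(msg)

# Agent 1 holds P(X,Z); agent 2 sends its posterior P(Z|eY).
agent1 = Subnet({(0, 0): 0.27, (0, 1): 0.03, (1, 0): 0.28, (1, 1): 0.42})
agent1.absorb("agent2", {0: 0.3643, 1: 0.6357})
print({k: round(v, 4) for k, v in agent1.potential.items()})
# {(0, 0): 0.1788, (0, 1): 0.0424, (1, 0): 0.1855, (1, 1): 0.5933}
```

Dividing by the remembered message keeps a second, later message from the same neighbor from double-counting evidence that was already absorbed.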

46

Sufficient information

[Slide figure: agent 1 holds X = {A,B}, agent 2 holds Y = {E,F,G}; they share Z = {C,D}, which separates X and Y. Evidence eY arrives on the Y side.]

Z separates X and Y
→ P(Y|Z) = P(Y|X,Z)
→ P(eY|Z) = P(eY|X,Z)   (the likelihood given Z of evidence on Y equals the likelihood given X and Z)

Agent 2 computes P(Z|eY) = ΣY P(Y,Z|eY) from its posterior P(Y,Z|eY), and agent 1 recovers

P(X,Z|eY) = P(X,Z) P(Z|eY) P(Z)^-1

→ It is sufficient for agent 2 to send its posterior on Z to agent 1 for the latter to compute its posterior on X

Sufficient information

[Slide figure: the same setup; agent 1 holds X = {A,B}, agent 2 holds Y = {E,F,G}, with shared Z = {C,D} separating X and Y, and evidence eY on Y.]

Z separates X and Y → P(Y|X,Z) = P(Y|Z) → P(eY|X,Z) = P(eY|Z)

Because:

P(X,Z|eY) = P(X,Z,eY) P(eY)^-1
          = P(X,Z) P(eY|X,Z) P(eY)^-1
          = P(X,Z) P(eY|Z) P(eY)^-1
          = P(X,Z) P(Z,eY) P(Z)^-1 P(eY)^-1
          = P(X,Z) P(Z|eY) P(Z)^-1

→ It is sufficient for agent 2 to send its posterior on Z to agent 1 for the latter to compute its posterior on X

Communication Graph Considerations

• In a tree communication graph every edge is the only communication line between two parts of the network
• Hence it must deliver enough information so that the evidence received in one part may convey its impact to the query variables of the other part
• We restrict ourselves to the case where every node represented by an agent can be queried or receive evidence
• In this case it is sufficient that the set of variables Z, that will be represented in any communication line, separates the set X of variables of one side of the network from the set Y of variables of the other side

[Slide figure: X and Y on either side of an edge, with Z on the communication line between them.]