On Distributing Probabilistic Inference

Post on 05-Jan-2016


TRANSCRIPT

1

On Distributing Probabilistic Inference

Metron, Inc.

Thor Whalen

2

Outline

• Inference and distributed inference
• Conditional independence (CI)
• Graphical models, CI, and separability
• Using CI: Sufficient information

3

Outline

• Inference and distributed inference
  – Probabilistic inference
  – Distributed probabilistic inference
  – Use of distributed probabilistic inference
• Conditional independence (CI)
• Graphical models, CI, and separability
• Using CI: Sufficient information

4

Probabilistic Inference

• "(Probabilistic) inference, or model evaluation, is the process of updating probabilities of outcomes based upon the relationships in the model and the evidence known about the situation at hand." (research.microsoft.com)

• Let V = {X1, ..., Xn} be the set of variables of interest

• We are given a (prior) probability space P(V) on V

• Bayesian Inference: Given some evidence e, we compute the posterior probability space P′ = P(V|e)

5

Probabilistic Inference

• A probability space on a finite set V of discrete variables can be represented as a table containing the probability of every combination of states of V

• Such a table is commonly called a "potential"

• A receives evidence → have P(e|A) → want P(A,B,C|e)

[Slide figure: the potential P(A,B,C) with entries 0.2240, 0.0960, 0.0120, 0.0080, 0.0540, 0.1260, 0.2880, 0.1920, alongside the evidence likelihood P(e|A) with values 0.9000 and 0.3000.]

6

Probabilistic Inference

• A probability space on a finite set V of discrete variables can be represented as a table containing the probability of every combination of states of V

• Such a table is commonly called a "potential"

• A receives evidence → have P(e|A) → want P(A,B,C|e)

• Assuming evidence e only depends on variable A, we have P(A,B,C|e) ∝ P(A,B,C)P(e|A)

[Slide table, reconstructed; each row is one joint state of (A,B,C):]

  P(A,B,C)   P(e|A)   product   P(A,B,C|e)
  0.2240     0.9000   0.2016    0.2585
  0.0960     0.9000   0.0864    0.1108
  0.0120     0.3000   0.0036    0.0046
  0.0080     0.3000   0.0024    0.0031
  0.0540     0.3000   0.0162    0.0208
  0.1260     0.3000   0.0378    0.0485
  0.2880     0.9000   0.2592    0.3323
  0.1920     0.9000   0.1728    0.2215

(the last column is the product column normalized by its sum, P(e) = 0.78)
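The slide's update can be reproduced in a few lines. This sketch takes the table's row order as given (state labels are not shown on the slide), multiplies the prior potential by the evidence likelihood, and renormalizes:

```python
# Reproducing the slide's table: multiply the prior potential by the
# evidence likelihood P(e|A) row by row, then renormalize.
prior = [0.2240, 0.0960, 0.0120, 0.0080, 0.0540, 0.1260, 0.2880, 0.1920]
lik   = [0.9, 0.9, 0.3, 0.3, 0.3, 0.3, 0.9, 0.9]   # P(e|A), one value per row

unnorm = [p * l for p, l in zip(prior, lik)]
z = sum(unnorm)                                     # P(e) = 0.78
posterior = [round(u / z, 4) for u in unnorm]
print(posterior)
# [0.2585, 0.1108, 0.0046, 0.0031, 0.0208, 0.0485, 0.3323, 0.2215]
```

The printed list matches the posterior column of the slide's table exactly.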

7

Distributed Probabilistic Inference

• Several components, each inferring on a given subspace
• Two components may communicate information about variables they have in common
• Wish to be able to fuse evidence received throughout the system

[Slide figure: two components, one over {A,B} and one over {B,C}, with the shared variable B marked "same" in both; evidence enters one component's probability space.]

8

Use of Distributed Probabilistic Inference

[Slide plot: time to evaluate a full joint table as a function of the number of variables n, assuming one million operations per second.]
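The arithmetic behind that plot is simple: one pass over a full joint table of n binary variables costs on the order of 2^n operations. The rate below is the slide's stated assumption; the particular n values are my own illustration:

```python
# One pass over a 2^n-entry joint table, at the slide's assumed rate.
RATE = 1_000_000  # operations per second

for n in (20, 30, 40, 50):
    seconds = 2 ** n / RATE
    print(f"n = {n}: about {seconds:,.1f} seconds")
```

Already at n = 50 a single pass takes over 35 years, which is the motivation for factoring the joint and distributing the work across agents.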

• Be able to implement probabilistic inference
• Implement multi-agent systems:
  - agents sense their environment and take actions intelligently
  - they can observe given variables of the probability space
  - they need to infer on other variables in order to take action
  - cooperate in exchanging information about the probability space

9

Use of Distributed Probabilistic Inference

• Be able to implement probabilistic inference
• Implement multi-agent systems:
  - agents sense their environment and take actions intelligently
  - they can observe given variables of the probability space
  - they need to infer on other variables in order to take action
  - cooperate in exchanging information about the probability space

[Slide figure: Agent 1 and Agent 2 exchanging information.]

10

Outline

• Inference and distributed inference
• Conditional independence (CI)
• Graphical models, CI, and separability
• Using CI: Sufficient information

11

Conditional independence

• Marginalize out C from P(A,B,C) to get P(A,B) (e.g. 0.0120 + 0.0080 = 0.0200)

• P(C|A,B) = P(A,B,C)/P(A,B)

• P(C|A,B) = P(C|B)

12

Conditional independence

• Marginalize out C from P(A,B,C) to get P(A,B)

• P(C|A,B) = P(A,B,C)/P(A,B)

• P(C|A,B) = P(C|B)

"Wait a minute Dr. Watson! P(C|A,B) is a table of size 8, whereas P(C|B) is of size 4! This is entropicaly impossible!"

"Eh, but I see only four distinct numbers here, doc... and 'entropicaly' is not a word!"

"Why you little!!"

13

Conditional independence

• Marginalize out C from P(A,B,C) to get P(A,B)

• P(C|A,B) = P(A,B,C)/P(A,B)

• P(C|A,B) = P(C|B), i.e.

  P(C = c | A = a, B = b) = P(C = c | B = b) for all (a, b, c) ∈ A × B × C

We say that A and C are conditionally independent given B: the conditional is insensitive to A.
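The collapse from a size-8 table to a size-4 table can be checked numerically. The CPD numbers below are invented for illustration (they are not the slide's table): build a joint P(A,B,C) = P(A)P(B|A)P(C|B) and confirm that P(C|A,B) never actually depends on A:

```python
from itertools import product

# Invented chain A -> B -> C; by construction C is independent of A given B.
pA = {0: 0.4, 1: 0.6}
pB_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # P(B|A)
pC_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # P(C|B)

joint = {(a, b, c): pA[a] * pB_A[a][b] * pC_B[b][c]
         for a, b, c in product([0, 1], repeat=3)}

def p_c_given_ab(a, b, c):
    pab = joint[(a, b, 0)] + joint[(a, b, 1)]        # marginalize out C
    return joint[(a, b, c)] / pab

# The size-8 table P(C|A,B) matches the size-4 table P(C|B) entry by entry.
for a, b, c in product([0, 1], repeat=3):
    assert abs(p_c_given_ab(a, b, c) - pC_B[b][c]) < 1e-12
print("P(C|A,B) == P(C|B): A and C are conditionally independent given B")
```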

14

Outline

• Inference and distributed inference
• Conditional independence (CI)
• Graphical models, CI, and separability
  – Bayes nets
  – Markov networks

• Using CI: Sufficient information

15

Bayes Nets

Note that, by the chain rule,

P(A,B,C,D,E) = P(A)P(B|A)P(C|A,B)P(D|A,B,C)P(E|A,B,C,D)

whereas the independencies encoded in the DAG reduce this to

P(A,B,C,D,E) = P(A)P(B)P(C|A)P(D|B,C)P(E|C)

[Slide figure: a DAG on nodes A, B, C, D, E with edges A→C, B→D, C→D, C→E.]
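To see the saving the factored form buys, here is a sketch with invented CPD numbers for that DAG; only the factorization structure P(A)P(B)P(C|A)P(D|B,C)P(E|C) comes from the slide:

```python
from itertools import product

# Invented CPDs over binary variables; the structure is the slide's,
# the numbers are not.
pA = [0.3, 0.7]
pB = [0.6, 0.4]
pC_A = [[0.8, 0.2], [0.1, 0.9]]                    # pC_A[a][c] = P(C=c|A=a)
pD_BC = {(0, 0): [0.9, 0.1], (0, 1): [0.6, 0.4],
         (1, 0): [0.3, 0.7], (1, 1): [0.2, 0.8]}   # P(D|B,C)
pE_C = [[0.7, 0.3], [0.4, 0.6]]                    # P(E|C)

def joint(a, b, c, d, e):
    return pA[a] * pB[b] * pC_A[a][c] * pD_BC[(b, c)][d] * pE_C[c][e]

# 20 stored numbers determine all 2^5 = 32 joint probabilities.
total = sum(joint(*s) for s in product([0, 1], repeat=5))
print(round(total, 10))  # 1.0
```

With more variables the gap between CPD storage and full-table storage grows exponentially, which is the point of the graphical representation.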

16

Bayes Nets

• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables

17

Bayes Nets

• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables

• A BN consists of
  – A Directed Acyclic Graph (DAG)
    • Nodes: Variables of V
    • Edges: Causal relations

A DAG is a directed graph with no directed cycles.

[Slide figure: one directed graph on nodes X1–X13 that is a DAG, and a second graph that is not a DAG because it contains a directed cycle.]

18

Bayes Net

• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables

• A BN consists of
  – A Directed Acyclic Graph (DAG)
    • Nodes: Variables of V
    • Edges: Causal relations
  – A list of conditional probability distributions (CPDs); one for every node of the DAG

[Slide figure: the DAG on X1–X13, annotated with a CPD table at each node.]

19

Bayes Nets

• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables

• A BN consists of
  – A Directed Acyclic Graph (DAG)
    • Nodes: variables of V
    • Edges: Causal relations
  – A list of conditional probability distributions (CPDs); one for every node of the DAG

• The DAG exhibits particular (in)dependencies of P(V)

[Slide figure: in the DAG, node B lies between A and C.]

A and C are independent given B
- i.e. P(A, C | B) = P(A | B) P(C | B)
- i.e. P(C | A, B) = P(C | B)

We say that B separates A and C.

20

Bayes Nets

• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables

• A BN consists of
  – A Directed Acyclic Graph (DAG)
    • Nodes: variables of V
    • Edges: Causal relations
  – A list of conditional probability distributions (CPDs); one for every node of the DAG

• The DAG characterizes the (in)dependency structure of P(V)

• The CPDs characterize the probabilistic and/or deterministic relations between parent states and children states

[Slide figure: the DAG on X1–X13.]

21

Bayes Nets

• The prior distributions on the variables of parentless nodes, along with the CPDs of the BN, induce prior distributions (called "beliefs" in the literature) on all the variables

• If the system receives evidence on a variable:
  – this evidence impacts its belief,
  – along with the beliefs of all other variables

[Slide figure: the DAG on X1–X13 with the parentless nodes X9 and X10 marked, and evidence entering at X12.]

22

Markov networks

• The edges of a Markov network exhibit direct dependencies between variables
• The absence of an edge means absence of direct dependency
• If a set B of nodes separates the graph into several components, then these components are independent given B

23

Outline

• Inference and distributed inference
• Conditional independence (CI)
• Graphical models, CI, and separability
• Using CI: Sufficient information
  – Specifications for a distributed inference system
  – A naive solution
  – Using separation

24

Using CI: Sufficient information

[Slide figure: a Bayes net on variables A–M, partitioned among four agents, with query variables and evidence variables marked.]

Specifications

• A probability space

• A number of agents, each having
  - query variables
  - evidence variables

What variables must agent 1 represent so that it may fuse the evidence received by other agents?

25

Using CI: Sufficient information

[Slide figure: the Bayes net on A–M; agent 1 holds A and B, while agents 2, 3, and 4 hold {E,F,G}, {H,I,J}, and {K,L,M} respectively.]

Specifications

• A probability space

• A number of agents, each having
  - query variables
  - evidence variables

A naïve solution

• Agents contain their own query and evidence variables

• In order to receive evidence from the other agents, agent 1 must represent variables E, F, G, H, I, J, K, L, and M

26

Using CI: Sufficient information

[Slide figure: the Bayes net on A–M, with sets labeled X, Y, and Z marked among the agents' variables.]

Specifications

• A probability space

• A number of agents, each having
  - query variables
  - evidence variables

A naïve solution

• Agents contain their own query and evidence variables

• In order to receive evidence from the other agents, agent 1 must represent variables E, F, G, H, I, J, K, L, and M

Note that Y separates Z and X, whether Y is equal to:
• {K,L,M},
• {H,J,I}, or
• {E,F,G}.

"Agent 1 must represent many variables! How else could the other agents communicate their evidence?"

27

Using CI: Sufficient information

[Slide figure: agent 1 holds X = {A,B}, agent 2 holds Y = {E,F,G}; they share Z = {C,D}, which separates X and Y. Evidence eY arrives on the Y side.]

Z separates X and Y
→ P(Y|Z) = P(Y|X,Z)
→ P(eY|Z) = P(eY|X,Z)   (the likelihood given Z of evidence on Y equals the likelihood given X and Z)

Agent 2 computes P(Z|eY) = ΣY P(Y,Z|eY) from its posterior P(Y,Z|eY), and agent 1 recovers

P(X,Z|eY) = P(X,Z) P(Z|eY) P(Z)^-1

→ It is sufficient for agent 2 to send its posterior on Z to agent 1 for the latter to compute its posterior on X

28

Using CI: Sufficient information

[Slide figure: the same setup; agent 1 holds X = {A,B}, agent 2 holds Y = {E,F,G}, with shared Z = {C,D} separating X and Y, and evidence eY on Y.]

Z separates X and Y → P(Y|X,Z) = P(Y|Z) → P(eY|X,Z) = P(eY|Z)

Because:

P(X,Z|eY) = P(X,Z,eY) P(eY)^-1
          = P(X,Z) P(eY|X,Z) P(eY)^-1
          = P(X,Z) P(eY|Z) P(eY)^-1
          = P(X,Z) P(Z,eY) P(Z)^-1 P(eY)^-1
          = P(X,Z) P(Z|eY) P(Z)^-1

→ It is sufficient for agent 2 to send its posterior on Z to agent 1 for the latter to compute its posterior on X
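The derivation can be checked numerically on a toy chain X → Z → Y with invented binary CPDs, where Z separates X and Y just as {C,D} separates {A,B} from {E,F,G}:

```python
from itertools import product

# Toy chain X -> Z -> Y; all CPD numbers are invented for illustration.
pX = [0.3, 0.7]
pZ_X = [[0.9, 0.1], [0.4, 0.6]]        # pZ_X[x][z] = P(Z=z|X=x)
pY_Z = [[0.8, 0.2], [0.25, 0.75]]      # P(Y=y|Z=z)
lik = [0.2, 0.9]                       # P(eY|Y=y), the evidence on Y

joint = {(x, z, y): pX[x] * pZ_X[x][z] * pY_Z[z][y]
         for x, z, y in product([0, 1], repeat=3)}

pe = sum(p * lik[y] for (x, z, y), p in joint.items())              # P(eY)
pXZ = {(x, z): joint[(x, z, 0)] + joint[(x, z, 1)]
       for x, z in product([0, 1], repeat=2)}                       # P(X,Z)
pZ = {z: pXZ[(0, z)] + pXZ[(1, z)] for z in (0, 1)}                 # P(Z)
pZ_e = {z: sum(joint[(x, z, y)] * lik[y]
               for x in (0, 1) for y in (0, 1)) / pe
        for z in (0, 1)}                                            # P(Z|eY)

# Agent 1 never sees eY directly; the message P(Z|eY) is enough.
for x, z in product([0, 1], repeat=2):
    direct = sum(joint[(x, z, y)] * lik[y] for y in (0, 1)) / pe    # P(X,Z|eY)
    via_message = pXZ[(x, z)] * pZ_e[z] / pZ[z]
    assert abs(direct - via_message) < 1e-12
print("P(X,Z|eY) = P(X,Z) P(Z|eY) / P(Z) verified")
```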

29

Using CI: Sufficient information

[Slide figure: the Bayes net on A–M; agent 1 holds {A,B} plus the separator {C,D}, and each communication line carries only C and D.]

Specifications

• A Bayes net

• A number of agents, each having
  - query variables
  - evidence variables

A naïve solution

• Agents contain their own query and evidence variables

• In order to receive evidence from the other agents, agent 1 must represent variables E, F, G, H, I, J, K, L, and M

Using separation

• Agent 1 only needs to represent two extra variables

• Agent 1 may compute its posterior queries faster from CD than from EFGHIJKLM

• Communication lines need to transmit two variables instead of three

30

Using CI: Sufficient information

[Slide figure: a Bayes net with query variables, evidence variables, and other variables marked, and separator sets labeled between consecutive agents.]

Let E1 and Qk be respectively the evidence and query sets of two different agents. In order for the agent querying Qk to fuse evidence received on E1 by the other agent, we wish to find a sequence of separators S1,2, S2,3, ..., Sk-1,k such that

P(Qk | Sk-1,k, E1) = P(Qk | Sk-1,k)  and  P(Sj,j+1 | Sj-1,j, E1) = P(Sj,j+1 | Sj-1,j)

31

32

MatLab Tool

Main functionality

• Insert evidence into given agents and propagate their impact inside the subnet
• Initiate communication between agents, followed by the propagation of new information
• View the marginal distributions of the different agents at every step
• Step forward and backward
• Save eye-friendly logs to a file

i + [Enter] to initialize,
v + [Enter] to view all variables (even those containing no information),
e + [Enter] to enter evidence,
c + [Enter] to perform an inter-agent communication,
p + [Enter] to go to the previous step,
n + [Enter] to go to the next step,
a + [Enter] to add a sensor,
r + [Enter] to remove a sensor,
t + [Enter] to turn true marginals view ON,
m + [Enter] to turn discrepancy marking OFF,
s + [Enter] to save to a file,
q + [Enter] to quit.

Enter Command:

33

MatLab Tool: Display

* Configuration 2: After evidence L(e|C) = (2,5) has been entered into subnet number 2
(indicates step number and last action that was taken)

The TRUTH (marginals obtained by inferring on the entire Bayes net; P(True) shown, P(False) is the complement):
  A: 0.2005, C: 0.1434, B: 0.4403, E: 0.5426, D: 0.2780, F: 0.2901

SUBNET 1 (adjacent to subnet 2), Err(ACB) = 0.0527:
  A: 0.3000 (AD = 0.0704), C: 0.2950 (AD = 0.1072), B: 0.5100 (AD = 0.0493); E, D, F not represented

SUBNET 2 (adjacent to subnets 1, 3):
  C: 0.1434, B: 0.4403, E: 0.5426, D: 0.2780; A, F not represented

SUBNET 3 (adjacent to subnet 2), Err(EDF) = 0.0169:
  E: 0.4820 (AD = 0.0429), D: 0.2745 (AD = 0.0025), F: 0.2969 (AD = 0.0048); A, B, C not represented

Enter a command (enter h + [Enter] for help):

(the display shows the marginal distributions of the variables represented in each subnet, and prompts for a new action)

34

Communication Graph Considerations

[Slide figure: a communication graph on agents 1–6.]

Agent 6 receives info from agent 1 through both agents 4 and 5. How should subnet 6 deal with possible redundancy?

One solution (often adopted) would be to impose a tree structure on the communication graph.

35

36

Communication Graph Considerations

• When choosing the communication graph, one should take into consideration
  - The quality of the possible communication lines
  - The processing speed of the agents
  - The importance of given queries

[Slide figure: two candidate communication graphs; if a given agent is the key decision-making agent, one of the graphs is more appropriate than the other.]

37

Problem Specification

Given:
• A prob. space on V = {X1, ..., Xn}
• A number of agents, each having:
  – Qi: a set of query variables
  – Ei: a set of evidence variables

38

Problem Specification

Given:
• A BN on V = {X1, ..., Xn}
• A number of agents, each having:
  – Qi: a set of query variables
  – Ei: a set of evidence variables

Determine:
• An agent communication graph
• A subset Si of V for each agent
• An inference protocol that specifies
  – How to fuse evidence and messages received from other agents
  – The content of messages between agents

39

Distributed Inference Design

• A communication graph:
  – Nodes represent agents
  – Edges represent communication lines

• Each agent i has:
  – Qi: a set of query variables
  – Ei: a set of evidence variables
  – Pi(Si): a probability space on a subset Si of V
  – An inference protocol. This includes a specification of
    • What to do with received evidence or messages
    • What messages must be sent to other agents

40

Distributed Bayesian Inference Problem

Given a set of pairs (Q1, E1), ..., (Qk, Ek), we wish to find an inference scheme so that, given any evidence e = e1, ..., ek (where ei is the set of evidence received by subnet i), the agents may compute the correct posterior on their query variables,

i.e. for all i, Pi (the probability space of agent i) must become consistent with P on its query variables,

i.e. agent i must be able to compute, for every query variable Q of Qi, the probability Pi(Q|e) = P(Q|e)

41

More Definitions

Let X, Y and Z be subsets of V:

• If P is a prob. space on V, I(X,Y|Z)P is the statement "X is independent of Y given Z," i.e. P(X|Y,Z) = P(X|Z)

• If D is a DAG, I(X,Y|Z)D is the statement "X and Y are d-separated by Z"

• If G is a graph, I(X,Y|Z)G is the statement "X and Y are disconnected by Z"

• Theorem: If D is a Bayes Net for P and G is the moral graph of the ancestor hull of X ∪ Y ∪ Z, then

  I(X,Y|Z)G ↔ I(X,Y|Z)D → I(X,Y|Z)P
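A minimal sketch of the theorem's graph-side test (helper and variable names are mine; the 5-node DAG reuses the earlier factorization example with edges A→C, B→D, C→D, C→E): take the ancestor hull of X ∪ Y ∪ Z, moralize it, delete Z, and check connectivity:

```python
# Graph test for d-separation via the moral graph of the ancestor hull.
dag = {"A": [], "B": [], "C": ["A"], "D": ["B", "C"], "E": ["C"]}  # node -> parents

def d_separated(X, Y, Z):
    # 1. ancestor hull of X ∪ Y ∪ Z
    hull, stack = set(), list(X | Y | Z)
    while stack:
        v = stack.pop()
        if v not in hull:
            hull.add(v)
            stack.extend(dag[v])
    # 2. moral graph: link each node to its parents and "marry" co-parents
    adj = {v: set() for v in hull}
    for v in hull:
        parents = [p for p in dag[v] if p in hull]
        for p in parents:
            adj[v].add(p)
            adj[p].add(v)
        for i in range(len(parents)):
            for j in range(i + 1, len(parents)):
                adj[parents[i]].add(parents[j])
                adj[parents[j]].add(parents[i])
    # 3. delete Z, then check whether X can still reach Y
    seen, stack = set(), list(X - Z)
    while stack:
        v = stack.pop()
        if v in seen or v in Z:
            continue
        seen.add(v)
        stack.extend(adj[v])
    return not (seen & Y)

print(d_separated({"A"}, {"E"}, {"C"}))  # True: C blocks the path A -> C -> E
print(d_separated({"A"}, {"B"}, {"D"}))  # False: moralization marries the co-parents B and C of D
```

The second call shows the collider effect: conditioning on D connects its parents, which the parent-marrying step of moralization captures.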

42

Use of Conditional Independence in the Distributed Inference Problem

[Slide figure: subnets S1 and S2, with evidence variables E1 in S1, query variables Q2 in S2, and shared variables A and B on the boundary.]

• What should S1 send to S2 so that Q2 may "feel" the effect of evidence received by S1 on E1?

• We want S2 to be able to update its probability space so that P2(Q2 | e1) = P(Q2 | e1)

• Claim: If I(E1,Q2|A,B)P then P1(A,B|e1) is sufficient information for S2 to update its probability space

• "Proof": P(Q2 | E1,A,B) = P(Q2 | A,B)

43

Using CI: Sufficient information

44

Distributed Bayesian Inference Problem

Given a set of pairs (Q1, E1), ..., (Qk, Ek), we wish to find an inference scheme so that, given any evidence e = e1, ..., ek, where ei is the set of evidence received by subnet i, the subnets may compute the correct posterior on their query variables,

i.e. the Pi must become consistent with P on their query variables,

i.e. subnet i must be able to compute, for all Q of Qi, the probability Pi(Q|e) = P(Q|e)

45

Distributed Bayesian Inference: Inference Protocol

• A message between two subnets is a joint distribution on a common subset of variables, computed from the probability space of the sender

• Each subnet remembers the last message it received from each neighboring subnet

• A subnet divides the new message by the old one and absorbs the result into its probability space
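The divide-and-absorb rule can be sketched as follows. The class and method names are mine, the numbers are toy values (a prior P(X,Z) and an incoming posterior P(Z|eY) rounded to four places), and seeding the "old" message with the receiver's current marginal on the shared variable is my assumption for the first exchange:

```python
# Hypothetical sketch of the protocol: multiply the receiver's potential
# by new_message / old_message on the shared variable z.
class Subnet:
    def __init__(self, potential):
        self.potential = dict(potential)    # (x, z) -> probability, P(X,Z)
        self.last_msg = {}                  # sender -> last message over z

    def marginal_z(self):
        out = {}
        for (x, z), p in self.potential.items():
            out[z] = out.get(z, 0.0) + p
        return out

    def absorb(self, sender, msg):
        old = self.last_msg.get(sender, self.marginal_z())
        for (x, z) in self.potential:
            self.potential[(x, z)] *= msg[z] / old[z]
        self.last_msg[sender] = dict(msg)

# Agent 1 holds P(X,Z); agent 2 sends its posterior P(Z|eY).
agent1 = Subnet({(0, 0): 0.27, (0, 1): 0.03, (1, 0): 0.28, (1, 1): 0.42})
agent1.absorb("agent2", {0: 0.3643, 1: 0.6357})
print({k: round(v, 4) for k, v in agent1.potential.items()})
# {(0, 0): 0.1788, (0, 1): 0.0424, (1, 0): 0.1855, (1, 1): 0.5933}
```

Dividing by the remembered message keeps a second, later message from the same neighbor from double-counting evidence that was already absorbed.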

46

Sufficient information

[Slide figure: agent 1 holds X = {A,B}, agent 2 holds Y = {E,F,G}; they share Z = {C,D}, which separates X and Y. Evidence eY arrives on the Y side.]

Z separates X and Y
→ P(Y|Z) = P(Y|X,Z)
→ P(eY|Z) = P(eY|X,Z)   (the likelihood given Z of evidence on Y equals the likelihood given X and Z)

Agent 2 computes P(Z|eY) = ΣY P(Y,Z|eY) from its posterior P(Y,Z|eY), and agent 1 recovers

P(X,Z|eY) = P(X,Z) P(Z|eY) P(Z)^-1

→ It is sufficient for agent 2 to send its posterior on Z to agent 1 for the latter to compute its posterior on X

Sufficient information

[Slide figure: the same setup; agent 1 holds X = {A,B}, agent 2 holds Y = {E,F,G}, with shared Z = {C,D} separating X and Y, and evidence eY on Y.]

Z separates X and Y → P(Y|X,Z) = P(Y|Z) → P(eY|X,Z) = P(eY|Z)

Because:

P(X,Z|eY) = P(X,Z,eY) P(eY)^-1
          = P(X,Z) P(eY|X,Z) P(eY)^-1
          = P(X,Z) P(eY|Z) P(eY)^-1
          = P(X,Z) P(Z,eY) P(Z)^-1 P(eY)^-1
          = P(X,Z) P(Z|eY) P(Z)^-1

→ It is sufficient for agent 2 to send its posterior on Z to agent 1 for the latter to compute its posterior on X

Communication Graph Considerations

• In a tree communication graph every edge is the only communication line between two parts of the network
• Hence it must deliver enough information so that the evidence received in one part may convey its impact to the query variables of the other part
• We restrict ourselves to the case where every node represented by an agent can be queried or receive evidence
• In this case it is sufficient that the set of variables Z, that will be represented in any communication line, separates the set X of variables of one side of the network from the set Y of variables of the other side

[Slide figure: X and Y on either side of an edge, with Z on the communication line between them.]