
Learning distributions and hypothesis testing via social learning

Anand D. Sarwate
Department of Electrical and Computer Engineering
Rutgers, The State University of New Jersey

UMich EECS, September 29, 2015

(Joint work with Tara Javidi and Anusha Lalitha (UCSD))
Work sponsored by NSF under award CCF-1440033


Introduction


Some philosophical questions

• How do we (as a network of social agents) make common choices or inferences about the world?

• If I want to help you learn, should I tell you my evidence or just my opinion?

• How much do we need to communicate with each other?


Which may have some applications (?)

• Distributed monitoring in networks (estimating a state).

• Hypothesis testing or detection using multi-modal sensors.

• Models for vocabulary evolution.

• Social learning in animals.


Estimation

First simple model: estimate a histogram of local data.

• Each agent starts with a single color.

• Pass messages to learn the histogram of initial colors, or sample from that histogram.

• Main focus: simple protocols with limited communication.


Hypothesis testing

Second simple model: estimate a global parameter θ∗.

• Each agent takes observations over time conditioned on θ∗.

• Can do local updates followed by communication with neighbors.

• Main focus: simple rule and rate of convergence.


Social learning

[Figure: network of agents with local parameters θ1, θ2, θ3, θ4, . . . , θi]

Social learning focuses on simple models for how (human) networks can form consensus opinions:

• Consensus-based DeGroot model: gossip, average consensus, etc.

• Bayesian social learning (Acemoglu et al., Bala and Goyal): agents make decisions and are observed by other agents.

• Opinion dynamics where agents change beliefs based on beliefs of nearby neighbors.


On limited messages

Both of our problems involve some sort of average consensus step. In the first part we are interested in exchanging approximate messages.

• Lots of work in quantized consensus (Aysal-Coates-Rabbat, Carli et al., Kashyap et al., Lavaei and Murray, Nedic et al., Srivastava and Nedic, Zhu and Martinez).

• Time-varying network topologies (even more references).

• Pretty mature area at this point.



A roadmap

[Figure: three-node network with edge weights W1,2, W1,3, W2,1, W3,1, and a generic edge (i, j) with weights Wij, Wji]

• “Social sampling” and estimating histograms

• Distributed hypothesis testing and network divergence

• Some ongoing work and future ideas.


Social sampling and merging opinions

A.D. Sarwate, T. Javidi, “Distributed Learning of Distributions via Social Sampling,” IEEE Transactions on Automatic Control 60(1): 34–45, January 2015.


Consensus and dynamics in networks

• Collection of individuals or agents

• Agents observe part of a global phenomenon

• Network of connections for communication


Phenomena vs. protocols

[Figure: two example networks with numbered nodes]

Engineering:

• Focus on algorithms

• Minimize communication cost

• How much do we lose vs. centralized?

Phenomenological:

• Focus on modeling

• Simple protocols

• What behaviors emerge?


Why simple protocols?


We are more interested in developing simple models that can exhibit different phenomena.

• Simple source models.

• Simple communication that uses fewer resources.

• Simple update rules that are easier to analyze.


Communication and graph

[Figure: each agent i broadcasts its message Yi(t) to its neighbors]

• The n agents are arranged in a connected graph G.

• Agent i broadcasts to neighbors Ni in the graph.

• Message Yi(t) lies in a discrete set.


The problem

• Each agent starts with θi ∈ {1, 2, . . . , M}.

• Agent i knows θi (no noise).

• Maintain estimates Qi(t) of the empirical distribution Π of {θi}.


Social sampling

We model the messages as random samples from local estimates.

1. Update rule from Qi(t − 1) to Qi(t):

   Qi(t) = Wi(Qi(t − 1), Xi(t), Yi(t − 1), {Yj(t − 1) : j ∈ Ni}, t).

2. Build a sampling distribution on {0, 1, . . . , M}:

   Pi(t) = Vi(Qi(t), t).

3. Sample the message: Yi(t) ∼ Pi(t).


[Figure: agents i and j with trust weights Wij, Wji exchange sampled messages Yi(t), Yj(t); their estimates evolve from Qi(0), Qj(0) through Qi(1), Qj(1) to Qi(t), Qj(t) as t → ∞]


Possible phenomena


[Plot: estimate at a single node vs. time, showing the four components Q1–Q4 of one node’s estimate]

Coalescence: all agents converge to singletons


[Plot: node estimates of a single bin vs. time (true value and nodes 3, 8, 13, 18, 23)]

Consensus: agents converge to a common Π̂ ≠ Π


[Plot: node estimates of a single bin vs. time (true value and nodes 3, 8, 13, 18, 23); estimates approach the true value]

Convergence: agents converge to Π


Linear update rule

Qi(t) = Ai(t) Qi(t − 1) + Bi(t) Yi(t − 1) + ∑_{j∈Ni} Wij(t) Yj(t − 1)

• Linear update rule combining Yi ∼ Pi and Qi.

• Exhibits different behavior depending on Ai(t), Bi(t), and W(t).
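For concreteness, below is a minimal Python sketch of one synchronous round under this linear rule. It is an illustration under simplifying assumptions, not the paper’s implementation: the weights are held constant rather than time-varying, the sampling distribution is taken to be Pi(t) = Qi(t), and the helper name social_sampling_round is ours.

```python
import numpy as np

def social_sampling_round(Q, A, B, W, neighbors, rng):
    """One round: Q is (n, M) local estimates; A, B are length-n weight
    vectors; W is an (n, n) trust matrix; neighbors[i] lists N_i.
    For Q to stay a distribution, A[i] + B[i] + sum_j W[i, j] should be 1."""
    n, M = Q.shape
    # Steps 2-3 of the protocol: sample a one-hot message Y_i ~ P_i = Q_i.
    Y = np.eye(M)[[rng.choice(M, p=Q[i]) for i in range(n)]]
    # Step 1, linear form: Q_i <- A_i Q_i + B_i Y_i + sum_{j in N_i} W_ij Y_j.
    Q_next = A[:, None] * Q + B[:, None] * Y
    for i in range(n):
        for j in neighbors[i]:
            Q_next[i] += W[i, j] * Y[j]
    return Q_next
```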


Convergence

Main idea: massage the update rule into matrix form

Q(t + 1) = Q(t) + δ(t) [ H̄ Q(t) + C(t) + M(t) ],

with

1. step size δ(t) = 1/t,

2. perturbation C(t) = O(δ(t)),

3. martingale difference term M(t).

This is a stochastic approximation: it converges to a fixed point of H̄.


Example: censored updates

Suppose we make the distribution Pi(t) a censored version of Qi(t):

Pi,m(t) = Qi,m(t) · 1(Qi,m(t) > δ(t)(1 − Wii)),   m = 1, . . . , M,

Pi,0(t) = ∑_{m=1}^{M} Qi,m(t) · 1(Qi,m(t) ≤ δ(t)(1 − Wii)).

The agent sends Yi(t) = 0 if it samples a “rare” element of Qi.
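A sketch of just this censoring step, using the slide’s definitions and δ(t) = 1/t from the previous slide; the extra symbol 0 collects the censored mass, and the function name is ours:

```python
import numpy as np

def censored_distribution(Qi, t, Wii):
    """Build P_i(t) on {0, 1, ..., M} from Q_i(t) (length-M array): entries
    at or below the threshold delta(t)(1 - W_ii) are zeroed and their mass
    is moved to the dummy symbol 0."""
    delta = 1.0 / t                  # step size delta(t) = 1/t, for t >= 1
    keep = Qi > delta * (1.0 - Wii)  # indicator 1(Q_{i,m}(t) > threshold)
    P = np.zeros(len(Qi) + 1)
    P[1:][keep] = Qi[keep]           # surviving entries, m = 1, ..., M
    P[0] = Qi[~keep].sum()           # censored ("rare") mass -> symbol 0
    return P
```

Sampling Yi(t) ∼ Pi(t) then proceeds as before; a received message of 0 carries no bin information.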


Phenomena captured by censored model

[Plot: node estimates of a single bin vs. time under the censored rule (true value and nodes 3, 8, 13, 18, 23)]

• Censored distribution Pi(t) guards against “marginal opinions.”

• Sampled messages Yi(t) ∼ Pi(t) are simple messages.

• Decaying weights δ(t) represent solidifying of opinions.

Result: all estimates converge almost surely to Π.


Future directions

• Find the rate of convergence and the dependence of the rate on the parameters.

• Investigate the robustness of the update rule to noise and perturbations.

• Continuous distributions?

• Other message passing algorithms?

• Distributed optimization?


“Non-Bayesian” social learning

A. Lalitha, T. Javidi, A. Sarwate, “Social Learning and Distributed Hypothesis Testing,” arXiv:1410.4307 [math.ST], October 2014.


Model

• Set of n nodes.

• Set of hypotheses Θ = {θ1, θ2, . . . , θM}.

• Observations X_i^(t) are i.i.d.

• Fixed known distributions {fi(·; θ1), fi(·; θ2), . . . , fi(·; θM)}.

• θ∗ ∈ Θ is a fixed, global, unknown parameter.

• X_i^(t) ∼ fi(·; θ∗).

GOAL: parametric inference of the unknown θ∗.
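A minimal sketch of this observation model with finite alphabets; the sizes, the random likelihood tables, and the names f and observe are illustrative assumptions, not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, A = 3, 4, 5        # nodes, hypotheses |Theta|, observation alphabet size

# f[i, m] is the local likelihood f_i(.; theta_m), a distribution over the
# observation alphabet; random tables stand in for the fixed known model.
f = rng.dirichlet(np.ones(A), size=(n, M))

theta_star = 0           # index of the true (unknown) parameter

def observe(i):
    """Draw X_i^(t) ~ f_i(.; theta_star), i.i.d. over t."""
    return rng.choice(A, p=f[i, theta_star])
```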


Hypothesis Testing

If θ∗ is globally identifiable, then collecting all the observations

X(t) = {X_1^(t), X_2^(t), . . . , X_n^(t)}

at a central location yields a centralized hypothesis testing problem, with exponentially fast convergence to the true hypothesis.

Can this be achieved locally, with low-dimensional observations?


Example: Low-dimensional Observations

If all observations are not collected centrally, node 1 individually cannot learn θ∗ =⇒ nodes must communicate.


Distributed Hypothesis Testing

• Define Θ̄i = {θ ∈ Θ : fi(·; θ) = fi(·; θ∗)}.

• θ ∈ Θ̄i =⇒ θ and θ∗ are observationally equivalent for node i.

• Suppose {θ∗} = Θ̄1 ∩ Θ̄2 ∩ · · · ∩ Θ̄n.

GOAL: parametric inference of the unknown θ∗.


Learning Rule

• At t = 0, node i begins with an initial estimate vector q_i^(0) > 0, where the components of q_i^(t) form a probability distribution on Θ.

• At t > 0, node i draws X_i^(t).


Learning Rule

• Node i computes a belief vector b_i^(t) via the Bayesian update

  b_i^(t)(θ) = f_i(X_i^(t); θ) q_i^(t−1)(θ) / ∑_{θ′∈Θ} f_i(X_i^(t); θ′) q_i^(t−1)(θ′).

• Sends the message Y_i^(t) = b_i^(t).


Learning Rule

• Receives messages from its neighbors at the same time.

• Updates q_i^(t) via averaging of log beliefs,

  q_i^(t)(θ) = exp(∑_{j=1}^{n} W_ij log b_j^(t)(θ)) / ∑_{θ′∈Θ} exp(∑_{j=1}^{n} W_ij log b_j^(t)(θ′)),

  where the weight W_ij denotes the influence of node j on the estimate of node i.

• Put t = t + 1 and repeat.
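Putting the two steps together, here is a minimal sketch of one round of the rule under the finite-alphabet model sketched earlier (likelihood tables f[i, m]); the max subtraction is a standard numerical stabilization, and the function name is ours:

```python
import numpy as np

def learning_rule_round(q, f, W, X):
    """q: (n, M) current estimates; f: (n, M, A) likelihood tables;
    W: (n, n) stochastic weight matrix; X: length-n observations X_i^(t)."""
    n, M = q.shape
    # Local Bayesian update: b_i(theta) proportional to f_i(X_i; theta) q_i(theta).
    b = np.array([f[i, :, X[i]] for i in range(n)]) * q
    b /= b.sum(axis=1, keepdims=True)
    # Social update: q_i(theta) proportional to exp(sum_j W_ij log b_j(theta)).
    log_q = W @ np.log(b)
    q_next = np.exp(log_q - log_q.max(axis=1, keepdims=True))  # stabilize
    return q_next / q_next.sum(axis=1, keepdims=True)
```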


In a picture

[Figure: block diagram — local observations X_i(t) ∼ f_i(·; θ∗) drive the Bayesian update producing beliefs b_i(θ, t); log beliefs {b_j(θ, t)} received from neighbors are averaged to form Q_i(θ, t + 1)]


An example

When connected in a network and using the proposed learning rule, node 1 learns θ∗.


Assumptions

Assumption 1

For every pair θ ≠ θ∗, fi(·; θ∗) ≠ fi(·; θ) for at least one node i, i.e., the KL divergence D(fi(·; θ∗) ‖ fi(·; θ)) > 0.

Assumption 2

The stochastic matrix W is irreducible.

Assumption 3

For all i ∈ [n], the initial estimate q_i^(0)(θ) > 0 for every θ ∈ Θ.


Convergence Results

• Let θ∗ be the unknown fixed parameter.

• Suppose Assumptions 1–3 hold.

• The eigenvector centrality v = [v1, v2, . . . , vn] is the left eigenvector of W for eigenvalue 1.

Theorem: Rate of rejecting θ ≠ θ∗

Every node i’s estimate of θ ≠ θ∗ almost surely converges to 0 exponentially fast. Mathematically,

− lim_{t→∞} (1/t) log q_i^(t)(θ) = K(θ∗, θ)   P-a.s.,

where K(θ∗, θ) = ∑_{j=1}^{n} vj D(fj(·; θ∗) ‖ fj(·; θ)).
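A sketch of this rate constant under the finite-alphabet model above: compute v as the left eigenvector of W for eigenvalue 1, then take the v-weighted sum of the per-node KL divergences. It assumes full-support likelihood tables f[i, m], and the function name is ours:

```python
import numpy as np

def network_divergence(W, f, theta_star, theta):
    """K(theta*, theta) = sum_j v_j D(f_j(.; theta*) || f_j(.; theta))."""
    evals, evecs = np.linalg.eig(W.T)               # left eigenvectors of W
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    v = v / v.sum()                                 # normalize (Perron vector)
    p, r = f[:, theta_star, :], f[:, theta, :]      # (n, A) likelihood rows
    kl = np.sum(p * np.log(p / r), axis=1)          # per-node KL divergence
    return float(v @ kl)
```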


Example: Network-wide Learning

• Θ = {θ1, θ2, θ3, θ4} and θ∗ = θ1.

• If i and j are connected, Wij = 1/(degree of node i); otherwise Wij = 0.

• v = [1/12, 1/8, 1/12, 1/8, 1/6, 1/8, 1/12, 1/8, 1/12].
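As a quick check of the centrality computation in the sketch above, an assumed 3-node path graph (a stand-in for the slide’s 9-node network, whose edge list is not reproduced here) with the same degree-weighted rule gives v proportional to the node degrees:

```python
import numpy as np

# Assumed 3-node path graph: W_ij = 1/degree(i) for connected i, j.
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
evals, evecs = np.linalg.eig(W.T)
v = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
print(v / v.sum())   # ~ [0.25, 0.5, 0.25], proportional to degrees [1, 2, 1]
```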


Example

Two scenarios: Θ̄1 = {θ∗}, Θ̄i = Θ for i ≠ 1; and Θ̄5 = {θ∗}, Θ̄i = Θ for i ≠ 5.


Corollaries

Theorem: Rate of rejecting θ ≠ θ∗ (restated)

− lim_{t→∞} (1/t) log q_i^(t)(θ) = K(θ∗, θ)   P-a.s.,

where K(θ∗, θ) = ∑_{j=1}^{n} vj D(fj(·; θ∗) ‖ fj(·; θ)).

Lower bound on rate of convergence to θ∗

For every node i, the rate at which the error in the estimate of θ∗ goes to zero can be lower bounded as

− lim_{t→∞} (1/t) log(1 − q_i^(t)(θ∗)) = min_{θ≠θ∗} K(θ∗, θ)   P-a.s.


Corollaries

Lower bound on rate of learning

The rate of learning λ across the network can be lower bounded as

λ ≥ min_{θ∗∈Θ} min_{θ≠θ∗} K(θ∗, θ)   P-a.s.,

where

λ = lim inf_{t→∞} (1/t) |log e_t|

and

e_t = (1/2) ∑_{i=1}^{n} ‖q_i^(t)(·) − 1_{θ∗}(·)‖₁ = ∑_{i=1}^{n} ∑_{θ≠θ∗} q_i^(t)(θ).


Example: Periodicity

[Figure: four hypotheses θ1, θ2, θ3, θ4; annotations mark which pairs node 1 and node 2 can each distinguish]

• Θ = {θ1, θ2, θ3, θ4} and θ∗ = θ1.

• The underlying graph is periodic:

  W = [ 0  1
        1  0 ].


Example: Networks with Large Mixing Times

[Figure: four hypotheses θ1, θ2, θ3, θ4; annotations mark which pairs node 1 and node 2 can each distinguish]

• Θ = {θ1, θ2, θ3, θ4} and θ∗ = θ1.

• The underlying graph is aperiodic:

  W = [ 0.9  0.1
        0.4  0.6 ].


Concentration Result

Assumption 4

For k ∈ [n], X ∈ Xk, and any given θi, θj ∈ Θ with θi ≠ θj, the log-likelihood ratio |log(fk(·; θi)/fk(·; θj))| is bounded; denote the bound by L.

Theorem

Under Assumptions 1–4, for every ε > 0 there exists a T such that for all t ≥ T, every θ ≠ θ∗, and every i ∈ [n],

Pr( log q_i^(t)(θ) ≥ −(K(θ∗, θ) − ε) t ) ≤ γ(ε, L, t)

and

Pr( log q_i^(t)(θ) ≤ −(K(θ∗, θ) + ε) t ) ≤ γ(ε/2, L, t),

where L is the finite constant above and γ(ε, L, t) = 2 exp(−ε²t / (2L²d)).


Related Work and Contribution

Jadbabaie et al. use a local Bayesian update of beliefs followed by averaging of the beliefs.

• Show exponential convergence with no closed form for the convergence rate. [’12]

• Provide an upper bound on the learning rate. [’13]

We average the log beliefs instead.

• Provide a lower bound on the learning rate λ̃.

• Our lower bound on the learning rate is greater than their upper bound =⇒ our learning rule converges faster.

Shahrampour and Jadbabaie [’13] formulated a stochastic-optimization learning problem and obtained a dual-based learning rule for doubly stochastic W.

• Provide a closed-form lower bound on the rate of identifying θ∗.

• Using our rule we achieve the same lower bound (from Corollary 1):

  min_{θ≠θ∗} (1/n) ∑_{j=1}^{n} D(fj(·; θ∗) ‖ fj(·; θ)).


Related Work and Contribution

An update rule similar to ours was used by Rahnama Rad and Tahbaz-Salehi (2010) to show that each node’s belief converges in probability to the true parameter, though under certain analytic assumptions.

For a general model and discrete parameter spaces, we show almost-sure, exponentially fast convergence.

Shahrampour et al. and Nedic et al. (independently) showed that our learning rule coincides with a learning rule based on distributed stochastic optimization (for W irreducible and aperiodic).


Social sampling to estimate histograms


• Simple model of randomized message exchange.

• Unified analysis captures different qualitative behaviors.

• “Censoring rule” to achieve consensus to true histogram.


Hypothesis testing and “semi-Bayes”

[Figure: per-round pipeline — Get Data Xi(t) → Local Update → Send Msg / receive {Yj(t)} → Social Update, taking Qi(t) to Qi(t + 1)]

• Combination of local Bayesian updates and averaging.

• Network divergence: an intuitive measure for the rate of convergence.

• “Posterior consistency” gives a Bayesio-frequentist analysis.


Looking forward


• Continuous distributions and parameters.

• Applications to distributed optimization.

• Time-varying case.


Thank You!
