THE BRAIN AS A STATISTICAL INFORMATION PROCESSOR
And you can too!


TRANSCRIPT

Page 1:

THE BRAIN AS A STATISTICAL INFORMATION PROCESSOR

And you can too!

Page 2:

My History and Ours

[Timeline figure: 1972, 1992 (SBS), 2011]

Page 3:

The Brain as a Statistical IP

Introduction
Evidence for Statistics
Bayes Law
Informative Priors
Joint Models
Inference
Conclusion

Page 4:

Evidence for Statistics

Two examples that seem to indicate that the brain is indeed processing statistical information.

Page 5:

Statistics for Word Segmentation

Saffran, Aslin, and Newport (1996), "Statistical Learning by 8-Month-Old Infants"

The infants listen to strings of nonsense words with no auditory cues to word boundaries.

E.g., "bidakupa …", where "bidaku" is the first word.

They learn to distinguish words from other syllable combinations that occur (less frequently) across word boundaries.
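The standard account of this result is that infants track transitional probabilities between syllables, which are high inside a word and low across word boundaries. A minimal sketch of that statistic, using an invented nonsense-word lexicon (not the actual Saffran et al. stimuli):

```python
# Sketch (not from the talk): transitional probabilities between syllables
# are higher inside words than across word boundaries.  The nonsense-word
# lexicon below is invented for illustration.
import random
from collections import Counter

random.seed(0)
words = ["bidaku", "padoti", "golabu"]     # hypothetical nonsense words

def syllables(word):
    """Split a CV-patterned nonsense word into two-character syllables."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]

stream = []                                # continuous syllable stream
for _ in range(300):
    stream.extend(syllables(random.choice(words)))

pair_counts = Counter(zip(stream, stream[1:]))
first_counts = Counter(stream[:-1])

def transitional_prob(s1, s2):
    """P(next syllable = s2 | current syllable = s1)."""
    return pair_counts[(s1, s2)] / first_counts[s1]

print(transitional_prob("bi", "da"))   # within a word: 1.0
print(transitional_prob("ku", "pa"))   # across a word boundary: about 1/3
```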

Page 6:

They Pay More Attention to Non-Words

[Diagram of the experimental setup: speaker, light, child]

Page 7:

Statistics in Low-level Vision

Based on Rosenholtz et al. (2011)

[Figure: panels A and B]

Page 8:

Statistics in Low-level Vision

Based on Rosenholtz et al. (2011)

A N O B E L

Page 9:

Are summary statistics a good choice of representation?

A much better idea than spatial subsampling

Original patch ~1000 pixels

Page 10:

Original patch

Are summary statistics a good choice of representation?

A rich set of statistics can capture a lot of useful information

Patch synthesized to match ~1000 statistical parameters (Portilla & Simoncelli, 2000)

Page 11:

Discrimination based on P&S stats predicts crowded letter recognition

Balas, Nakano, & Rosenholtz, JoV, 2009

Page 12:

Bayes Law and Cognitive Science

To my mind, at least, it packs a lot of information

Page 13:

Bayes Law and Cognitive Science

P(M|E) = P(M) P(E|M) / P(E)

M = Learned model of the world
E = Learner's environment (sensory input)
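As a tiny numeric illustration of the formula, here it is spelled out in Python; the two hypothetical models and all probabilities are made up for the example.

```python
# Tiny numeric illustration of P(M|E) = P(M) P(E|M) / P(E); the two
# hypothetical models and all probabilities are made up for the example.
prior      = {"M1": 0.7, "M2": 0.3}    # P(M)
likelihood = {"M1": 0.2, "M2": 0.9}    # P(E|M) for one observed environment E

p_e = sum(prior[m] * likelihood[m] for m in prior)                 # P(E)
posterior = {m: prior[m] * likelihood[m] / p_e for m in prior}     # P(M|E)
print(posterior)   # {'M1': ~0.34, 'M2': ~0.66}
```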

Page 14:

Bayes Law

P(M|E) = P(M) P(E|M) / P(E)

It divides up responsibility correctly.

It requires a generative model (big, joint).

It (obliquely) suggests that, as far as learning goes, we ignore the programs that use the model.

But which M?

Page 15:

Bayes Law Does not Pick M

Don’t pick M. Integrate over all of them.

Pick the M that maximizes P(M)P(E|M).

Pick the average P(M) (Gibbs sampling).

P(E) = Σ_M P(M) P(E|M)
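A minimal sketch, assuming a toy coin-flip setup with two candidate models (not from the talk), of how the first two options differ: integrating over M when predicting the next observation versus committing to the single MAP M.

```python
# Toy coin example (not from the talk): two candidate models of the world,
# observed data E = three heads.  Compare integrating over M with picking
# the MAP M when predicting the next flip.
import math

models = {"fair": 0.5, "biased": 0.9}     # P(heads | M)
prior  = {"fair": 0.5, "biased": 0.5}     # P(M)
data   = "HHH"                            # observed environment E

def p_e_given_m(m):
    p = models[m]
    return math.prod(p if flip == "H" else 1 - p for flip in data)

p_e = sum(prior[m] * p_e_given_m(m) for m in models)    # P(E) = sum_M P(M) P(E|M)
posterior = {m: prior[m] * p_e_given_m(m) / p_e for m in models}

# Option 1: don't pick M; integrate over all of them.
p_next_heads = sum(posterior[m] * models[m] for m in models)

# Option 2: pick the M that maximizes P(M) P(E|M).
map_m = max(models, key=lambda m: prior[m] * p_e_given_m(m))

print(posterior)        # {'fair': ~0.15, 'biased': ~0.85}
print(p_next_heads)     # about 0.84 (hedged prediction)
print(models[map_m])    # 0.9 (committed prediction)
```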

Page 16:

My Personal Opinion

Don’t sweat it.

Page 17:

Informative Priors

Three examples where they are critical.

Page 18:

Parsing Visual Scenes (Sudderth, Jordan)

[Example scene parses, with region labels such as trees, skyscraper, sky, bell, dome, temple, buildings]

Page 19:

Spatially Dependent Pitman-Yor

• Cut random surfaces (samples from a GP) with thresholds (as in Level Set Methods)
• Assign each pixel to the first surface which exceeds its threshold (as in Layered Models)

Duan, Guindani, & Gelfand, Generalized Spatial DP, 2007
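A rough sketch of that layered construction, assuming a small pixel grid, an RBF kernel for the GP, and hand-picked thresholds; this is illustrative only, not the Sudderth/Jordan implementation.

```python
# Rough sketch of the layered construction above: sample smooth random
# surfaces from a GP, then assign each pixel to the first surface that
# exceeds its threshold.  Grid size, RBF kernel, and thresholds are
# assumptions made for illustration.
import numpy as np

rng = np.random.default_rng(0)
side = 20
pixels = np.stack(np.meshgrid(np.arange(side), np.arange(side)), axis=-1).reshape(-1, 2)

# Squared-exponential (RBF) covariance over pixel locations.
sq_dists = ((pixels[:, None, :] - pixels[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / (2 * 5.0 ** 2)) + 1e-6 * np.eye(len(pixels))

n_layers = 4
thresholds = np.array([0.3, 0.1, -0.2, -np.inf])   # last layer catches every pixel
surfaces = rng.multivariate_normal(np.zeros(len(pixels)), K, size=n_layers)

# Label each pixel with the first surface that exceeds its threshold.
exceeds = surfaces > thresholds[:, None]
labels = exceeds.argmax(axis=0).reshape(side, side)
print(labels)   # a spatially coherent segmentation into up to 4 regions
```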

Page 20:

Samples from Spatial Prior

Comparison: Potts Markov Random Field

Page 21:

Prior for Word Segmentation

Based on the work of Goldwater et al. Separate one "word" from the next in child-directed speech.

E.g., yuwanttusiD6bUk ("You want to see the book")

Page 22:

Bag of Words

Generative Story:
  For each utterance:
    Repeatedly pick a word w (or STOP) with probability P(w).
    If w = STOP, break.

If we pick M to maximize P(E|M), the model memorizes the data: it creates one "word" that is the concatenation of all the words in the sentence.
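A minimal sketch of this generative story with an invented lexicon; the closing comment restates why maximizing P(E|M) without a prior degenerates.

```python
# Minimal sketch of the unigram generative story; the lexicon and its
# probabilities are invented for illustration.
import random

P = {"you": 0.2, "want": 0.15, "to": 0.2, "see": 0.1,
     "the": 0.2, "book": 0.1, "STOP": 0.05}      # word probabilities, sum to 1

def generate_utterance():
    words = []
    while True:
        w = random.choices(list(P), weights=P.values())[0]
        if w == "STOP":
            return " ".join(words)
        words.append(w)

print(generate_utterance())
# Without a prior, maximizing P(E|M) lets the lexicon contain each whole
# utterance as a single "word", so the model just memorizes the data.
```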

Page 23:

Results Using a Dirichlet Prior

Precision: 61.6 Recall: 47.6

Example: youwant to see thebook

Page 24:

Part-of-speech Induction

Primarily based on Clark (2003)

Given a sequence of words, deduce their parts of speech (e.g., DT, NN, etc.)

Generative story: for each word position i in the text:
  1) propose a part of speech t using p(t | t-1)
  2) propose a word w using p(w | t)
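That generative story is essentially a hidden Markov model. A sketch of the generative direction, with an invented tag set, transition table, and emission table (induction runs the other way: recovering the tags given only the words):

```python
# Sketch of the generative story as a tiny HMM; tag set, transition table,
# and emission table are invented for illustration.
import random

transitions = {                      # p(t | t-1), with START as the initial state
    "START": {"DT": 0.7, "NN": 0.3},
    "DT":    {"NN": 0.9, "DT": 0.1},
    "NN":    {"NN": 0.2, "DT": 0.3, "STOP": 0.5},
}
emissions = {                        # p(w | t)
    "DT": {"the": 0.6, "a": 0.4},
    "NN": {"dog": 0.5, "bone": 0.5},
}

def generate_sentence():
    tag, pairs = "START", []
    while True:
        choices = transitions[tag]
        tag = random.choices(list(choices), weights=choices.values())[0]
        if tag == "STOP":
            return pairs
        words = emissions[tag]
        w = random.choices(list(words), weights=words.values())[0]
        pairs.append((w, tag))

print(generate_sentence())   # e.g. [('the', 'DT'), ('dog', 'NN')]
```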

Page 25:

Sparse Tag Distributions

We could put a Dirichlet prior on P(w|t), but what we really want is sparse P(t|w): almost all words (by type) have only one part of speech, and we do best by only allowing this. E.g., "can" is only a modal verb (we hope!).

Putting a sparse prior on P(word-type|t) also helps.
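For intuition, a quick illustration (not from the talk) of why a symmetric Dirichlet with a small concentration parameter acts as a sparse prior: most distributions drawn from it put nearly all of their mass on a single outcome.

```python
# Illustration (not from the talk): a symmetric Dirichlet with small
# concentration is a "sparse" prior, since most sampled distributions put
# nearly all of their mass on one outcome.
import numpy as np

rng = np.random.default_rng(0)
sparse = rng.dirichlet(alpha=[0.05] * 5, size=3)   # concentration << 1
flat   = rng.dirichlet(alpha=[5.0] * 5, size=3)    # concentration >> 1
print(np.round(sparse, 2))   # rows dominated by a single entry
print(np.round(flat, 2))     # rows spread across all five entries
```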

Page 26:

Joint Generative Modeling

Two examples that show the strengths of modeling many phenomena jointly.

Page 27:

Joint POS Tagging and Morphology

Clark's POS tagger also includes something sort of like a morphology model.

It assumes POS tags are correlated with spelling.

True morphology would recognize that "ride", "riding", and "rides" share a root.

I do not know of any true joint tagging-morphology model.

Page 28:

Joint Reference and (Named) Entity Recognition

Based on Haghighi & Klein 2010

Weiner said the problems were all Facebook’s fault. They should never have given him an account.

Type1 (person): Obama, Weiner, father
Type2 (organization): IBM, Facebook, company

Page 29:

Inference

Otherwise known as hardware.

Page 30:

It is not EM

More generally, it is not any mechanism that requires tracking all expectations.

Consider word boundaries: between every two phonemes there may or may not be a boundary.

abcde a|bcde ab|cde abc|de abcd|e a|b|cde …
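A small sketch making the combinatorics explicit: with n phonemes there are 2^(n-1) possible segmentations, which is why tracking expectations over all of them quickly becomes hopeless.

```python
# All segmentations of a phoneme string: one binary choice (boundary or not)
# between each pair of adjacent phonemes, so 2**(n-1) possibilities.
from itertools import product

def segmentations(s):
    out = []
    for cuts in product([False, True], repeat=len(s) - 1):
        seg, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:
                seg.append(s[start:i])
                start = i
        seg.append(s[start:])
        out.append("|".join(seg))
    return out

print(segmentations("abcde")[:5])    # ['abcde', 'abcd|e', 'abc|de', ...]
print(len(segmentations("abcde")))   # 2**4 = 16
```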

Page 31:

Gibbs Sampling

Start out with random guesses.

Do (roughly) forever:
  Pick a random point.
  Compute p(split) and p(join).
  Pick r, 0 < r < 1:
    if p(split) / (p(split) + p(join)) > r, split;
    else join.
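A sketch of that split/join step in code; p_split and p_join are placeholders for the model-dependent probabilities the slide leaves unspecified.

```python
# Sketch of the split/join Gibbs step above.  p_split and p_join are
# placeholders for the model-dependent (unnormalized) probabilities of a
# boundary being present or absent at `site`.
import random

def gibbs_step(boundaries, site, p_split, p_join):
    """Resample one potential word boundary."""
    ps = p_split(boundaries, site)
    pj = p_join(boundaries, site)
    r = random.random()                  # pick r, 0 < r < 1
    if ps / (ps + pj) > r:
        boundaries.add(site)             # split: put a boundary here
    else:
        boundaries.discard(site)         # join: remove any boundary here

def gibbs_sampler(n_sites, p_split, p_join, iters=100000):
    boundaries = {i for i in range(n_sites) if random.random() < 0.5}
    for _ in range(iters):               # "do (roughly) forever"
        gibbs_step(boundaries, random.randrange(n_sites), p_split, p_join)
    return boundaries
```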

Page 32:

Gibbs has Very Nice Properties

Page 33:

Gibbs has Very Nice Properties

Page 34:

It is not Gibbs Either

First, the nice properties only hold for "exchangeable" distributions, and it seems likely that most of the distributions we care about are not (e.g., Haghighi & Klein).

But critically, Gibbs assumes we have all the training data at once and go over it many times.

Page 35:

It is Particle Filtering

Or something like it. At the level of detail here, just think "beam search."
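A very schematic sketch of the idea: keep a small set of weighted partial hypotheses, extend them one observation at a time, and prune. The extend and score functions are placeholders for a real model; a true particle filter would resample hypotheses in proportion to their weights where this sketch simply keeps the top few.

```python
# Schematic sketch only: weighted partial hypotheses, extended one
# observation at a time and pruned.  `extend` and `score` are placeholders
# for a real model; a particle filter would resample by weight instead of
# keeping the top few.
import heapq

def beam_search(observations, extend, score, beam_size=10):
    beam = [("", 0.0)]                   # (partial hypothesis, log weight)
    for obs in observations:
        candidates = [
            (new_hyp, weight + score(new_hyp, obs))
            for hyp, weight in beam
            for new_hyp in extend(hyp, obs)
        ]
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[1])
    return beam
```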

Page 36:

Parsing and CKY

[Figure: CKY chart for "Dogs like bones", i.e. (S (NP (NNS Dogs)) (VP (VBS like) (NP (NNS bones)))), with an "Information Barrier" marked between chart cells]
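For contrast with the incremental beam shown on the next slide, a compact CKY recognizer over a toy grammar matching the tree above (the grammar is invented for illustration):

```python
# Compact CKY recognizer over a toy grammar matching the tree in the figure;
# the grammar itself is invented for illustration.
from collections import defaultdict

unary  = {("Dogs", "NNS"), ("like", "VBS"), ("bones", "NNS"), ("NNS", "NP")}
binary = {("NP", "VP"): "S", ("VBS", "NP"): "VP"}

def cky(words):
    n = len(words)
    chart = defaultdict(set)             # (i, j) -> labels spanning words[i:j]
    for i, w in enumerate(words):
        chart[i, i + 1].add(w)
        changed = True                   # apply unary rules to saturation
        while changed:
            changed = False
            for child, parent in unary:
                if child in chart[i, i + 1] and parent not in chart[i, i + 1]:
                    chart[i, i + 1].add(parent)
                    changed = True
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left in chart[i, k]:
                    for right in chart[k, j]:
                        if (left, right) in binary:
                            chart[i, j].add(binary[left, right])
    return chart[0, n]

print(cky(["Dogs", "like", "bones"]))    # {'S'}
```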

Page 37:

It is Particle Filtering

Or something like it. At the level of detail here, just think "beam search."

Beam of partial parses:
  (ROOT (S (NP (NNS Dogs)
  (ROOT (NP (NNS Dogs)
  (ROOT (S (NP (NNS Dogs)) (VP (VBS eat)

Page 38:

Conclusion

The brain operates by manipulating probabilities.

World-model induction is governed by Bayes Law.

This implies we have a large joint generative model.

It seems overwhelmingly likely that we have a very informative prior.

Something like particle filtering is the inference/use mechanism.

Page 39:

THANK YOU