
Primitive probabilistic auditory scene analysis

Richard E. Turner (ret26@cam.ac.uk)

Computational and Biological Learning Lab, University of Cambridge

Maneesh Sahani (maneesh@gatsby.ucl.ac.uk)

Gatsby Computational Neuroscience Unit, UCL, London

Motivation

A hierarchy of representations, illustrated in vision and in audition:

  level                  | vision                     | audition
  object                 | trees                      | bird song
  parts of objects       | bark                       | motif
  structural primitives  | edges                      | AM tones and noise
  sensory input          | photoreceptor activities   | auditory filter activities

Auditory Scene Analysis

The same hierarchy (object / parts of objects / structural primitives / sensory input; e.g. bird song / motif / AM tones and noise / auditory filter activities), linked by three grouping processes:

  sensory input → structural primitives : primitive grouping
  structural primitives → parts of objects : schema-based grouping
  parts of objects → object : super schema-based grouping

Generative model: Probabilistic Auditory Scene Synthesis

Read top-down, the hierarchy defines a generative model for sounds:

  p(source)
  p(parts | source)
  p(primitive | part)
  p(sound | primitives)

Recognition model: Probabilistic Auditory Scene Analysis

Analysis inverts the generative model, inferring each level of the hierarchy from the sound waveform:

  p(source | sound)
  p(part | sound)
  p(primitive | sound)


Primitive Probabilistic Auditory Scene Analysis

Part 1: Probabilistic generative model: primitive auditory scene synthesis

What are the important low-level statistics of natural sounds?

Part 2: Probabilistic recognition model: primitive auditory scene analysis

Provocative computational theory: grouping rules arise from inferences based on the statistics of natural sounds.


Heuristic Analysis: Fire sound

[Figure: a fire-sound waveform y is passed through a filter bank into sub-bands centred at 4.1 kHz, 2.4 kHz, 1 kHz and 0.4 kHz, shown over 0.5–0.62 s; two stages are shown, "filter" and "filter and demodulate", with the demodulated envelopes overlaid on each sub-band.]

Heuristic Analysis: Water

Heuristic Analysis: Rain

Heuristic Analysis: Speech

Summary

sound → filter bank → demodulate → transformed envelopes

• Important statistics include

  – energy in sub-bands (power spectrum)
  – patterns of co-modulation (local power-sharing)
  – time-scale of the modulation
  – depth of the modulation (sparsity)

• Josh's texture synthesis work tomorrow

• Formulate a probabilistic model to capture these statistics:

sound ← modulate carriers ← modulators ← transformed envelopes

Structural Primitives = Co-modulated narrow-band processes
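The filter → demodulate pipeline above can be sketched in a few lines. This is a minimal illustration, assuming a Butterworth filter bank and Hilbert-envelope demodulation; the function name, centre frequencies, and bandwidth are my own choices, not those used in the talk:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def demodulate_subbands(y, fs, centres, bandwidth_octaves=0.5):
    """Split a sound into sub-bands and demodulate each one.

    Returns (carriers, envelopes); the envelope is taken as the magnitude
    of the analytic signal (Hilbert transform), one common demodulator.
    """
    carriers, envelopes = [], []
    for fc in centres:
        lo = fc * 2 ** (-bandwidth_octaves / 2)
        hi = fc * 2 ** (bandwidth_octaves / 2)
        b, a_f = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        sub = filtfilt(b, a_f, y)                      # band-limited sub-band
        env = np.abs(hilbert(sub))                     # slow envelope
        carriers.append(sub / np.maximum(env, 1e-8))   # fast carrier, |c| ~ 1
        envelopes.append(env)
    return np.array(carriers), np.array(envelopes)

# Toy input: a 1 kHz tone, amplitude-modulated at 4 Hz
fs = 16000
t = np.arange(fs) / fs
y = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
c, a = demodulate_subbands(y, fs, centres=[400, 1000, 2400])
```

The envelope of the 1 kHz sub-band carries the 4 Hz modulation, while the off-frequency bands stay near zero.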


Model Schematic

[Figure: the sound y_t is assembled as a sum over sub-bands, y_t = Σ_d c_{d,t} × a_{d,t}: each slowly varying envelope a_{d,t} (derived from the transformed envelopes x_{d,t}) multiplies a quickly varying carrier c_{d,t}, and the products are summed; panels show example traces over 0–0.1 s.]

Probabilistic Model

• Transformed envelopes: slow and +/–

  p(x_{e,1:T} | \Gamma_{e,1:T,1:T}) = \mathrm{Norm}(x_{e,1:T}; 0, \Gamma_{e,1:T,1:T}),
  \Gamma_{e,t-t'} = \sigma_e^2 \exp\left(-\tfrac{1}{2\tau_e^2}(t-t')^2\right)

• Envelopes: positive mixture of transformed envelopes with controllable sparsity

  a_{d,t} = a\left(\sum_{e=1}^{E} g_{d,e}\, x_{e,t} + \mu_d\right), e.g. a(z) = \log(1 + \exp(z)).

• Carriers: band-limited Gaussian noise

  p(c_{d,t} | c_{d,t-1:t-2}, \theta) = \mathrm{Norm}\left(c_{d,t}; \sum_{t'=1}^{2} \lambda_{d,t'}\, c_{d,t-t'},\ \sigma_d^2\right)

• Observed sound: modulated carriers plus noise

  y_t = \sum_{d=1}^{D} c_{d,t}\, a_{d,t} + \sigma_y \epsilon_t
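One forward sample from this generative model can be sketched as follows. The parameter values are toy settings chosen for illustration, not those fitted in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_model(T=1000, D=2, E=1, tau=50.0, sigma_x=3.0, mu=-2.0,
                 lam=((1.7, -0.9), (1.2, -0.8)), sigma_c=0.1, sigma_y=0.01):
    """Draw one sound from the model (toy parameter values)."""
    ts = np.arange(T)
    # Transformed envelopes x_e: slow Gaussian processes (SE covariance)
    K = sigma_x**2 * np.exp(-0.5 * (ts[:, None] - ts[None, :])**2 / tau**2)
    x = rng.multivariate_normal(np.zeros(T), K + 1e-6 * np.eye(T), size=E)

    # Envelopes a_d: softplus of a weighted mixture of the x_e -> positive
    g = rng.standard_normal((D, E))            # co-modulation weights g_de
    a = np.log1p(np.exp(g @ x + mu))           # a(z) = log(1 + exp(z))

    # Carriers c_d: AR(2) band-limited Gaussian noise
    c = np.zeros((D, T))
    for d in range(D):
        for t in range(2, T):
            c[d, t] = (lam[d][0] * c[d, t - 1] + lam[d][1] * c[d, t - 2]
                       + sigma_c * rng.standard_normal())

    # Observed sound: sum of modulated carriers plus observation noise
    y = (c * a).sum(axis=0) + sigma_y * rng.standard_normal(T)
    return y, a, c

y, a, c = sample_model()
```

The softplus nonlinearity guarantees positive envelopes, and the AR(2) coefficients λ make each carrier band-limited noise, matching the three bullet points above.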

What do the parameters control?

• λ_d control the centre frequency and bandwidth of each sub-band
• σ_d^2 control the power in each sub-band
• τ_e controls the time-scale of the modulators
• μ_d and σ_e^2 control the modulation depth, skew and sparsity
• g_{d,e} control the patterns of modulation across sub-bands

[Figure: bandwidth plotted against centre frequency for the sub-bands.]
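To see concretely how λ_d sets centre frequency and bandwidth: an AR(2) process has a pair of complex poles r·e^{±iω}, with λ_1 = 2r cos ω and λ_2 = −r²; the pole angle ω sets the resonant frequency and the radius r the bandwidth. A sketch, where the bandwidth formula is a standard approximation rather than anything stated in the talk:

```python
import numpy as np

def ar2_coefficients(fc, bw, fs):
    """AR(2) coefficients (lambda_1, lambda_2) for a resonance at centre
    frequency fc with approximate bandwidth bw (both in Hz).

    Poles at r*exp(+/- i*w): l1 = 2*r*cos(w), l2 = -r**2, and
    bw ~ -fs*ln(r)/pi is the standard bandwidth approximation.
    """
    w = 2.0 * np.pi * fc / fs
    r = np.exp(-np.pi * bw / fs)
    return 2.0 * r * np.cos(w), -r**2

def ar2_peak_frequency(l1, l2, fs, n=4096):
    """Locate the spectral peak of the AR(2) filter on a frequency grid."""
    w = np.linspace(0.0, np.pi, n)
    H = 1.0 / np.abs(1.0 - l1 * np.exp(-1j * w) - l2 * np.exp(-2j * w))
    return w[np.argmax(H)] * fs / (2.0 * np.pi)

# lambda_d for a sub-band centred at 1 kHz, ~50 Hz wide, at fs = 16 kHz
l1, l2 = ar2_coefficients(fc=1000.0, bw=50.0, fs=16000.0)
peak = ar2_peak_frequency(l1, l2, 16000.0)
```

For narrow bands (r close to 1) the spectral peak sits essentially at ω, so the recovered peak frequency lands within a few Hz of the requested 1 kHz.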

Intuitions for role of model parameters: Generation

Generation: Increasing sparsity


Generation: Increasing co-modulation


Generation: Decreasing time-scale


Inference and Learning

• Key observation: with the envelopes fixed, the posterior over carriers is Gaussian.

• Leads to MAP estimation of the envelopes (like HMMs, ICA, NMF):

  X_{\mathrm{MAP}} = \arg\max_X p(X|Y)

  p(X|Y) = \tfrac{1}{Z}\, p(X,Y) = \tfrac{1}{Z} \int \mathrm{d}C\, p(X,Y,C) = \tfrac{1}{Z}\, p(X) \int \mathrm{d}C\, p(Y|A,C)\, p(C)

• Compute the integral efficiently using Kalman smoothing

• Leads to gradient-based optimisation for the transformed amplitudes

• Learning: approximate maximum likelihood, θ = arg max_θ p(Y|θ)

• See Turner and Sahani, ICASSP 2010.
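The key observation can be checked directly: with the envelopes A fixed, the model is linear-Gaussian in the carriers, so they integrate out in closed form. For short signals the resulting marginal p(Y|A) can be computed by naive O(T³) linear algebra rather than Kalman smoothing. A sketch under that assumption, with the helper names and parameter values being my own illustrative choices:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def ar2_covariance(l1, l2, sigma2, T):
    """Stationary covariance matrix of an AR(2) carrier over T steps."""
    F = np.array([[l1, l2], [1.0, 0.0]])       # companion form of the AR(2)
    Q = np.array([[sigma2, 0.0], [0.0, 0.0]])
    P = solve_discrete_lyapunov(F, Q)          # stationary state covariance
    gam = np.empty(T)                          # autocovariance gamma(k)
    M = P.copy()
    for k in range(T):
        gam[k] = M[0, 0]                       # E[c_t c_{t-k}] = (F^k P)[0,0]
        M = F @ M
    idx = np.arange(T)
    return gam[np.abs(idx[:, None] - idx[None, :])]

def log_lik_given_envelopes(y, A, ar_params, sigma_y):
    """log p(Y|A): with envelopes fixed, the Gaussian carriers integrate
    out, so y ~ N(0, sum_d diag(a_d) K_d diag(a_d) + sigma_y^2 I)."""
    T = len(y)
    S = sigma_y**2 * np.eye(T)
    for a_d, (l1, l2, s2) in zip(A, ar_params):
        K = ar2_covariance(l1, l2, s2, T)
        S += a_d[:, None] * K * a_d[None, :]
    _, logdet = np.linalg.slogdet(2.0 * np.pi * S)
    return -0.5 * (logdet + y @ np.linalg.solve(S, y))

# Toy check on a short random signal with one flat envelope
rng = np.random.default_rng(1)
y = rng.standard_normal(50)
ll = log_lik_given_envelopes(y, [np.ones(50)], [(1.2, -0.8, 0.1)], 0.05)
```

Gradients of this log likelihood with respect to the transformed amplitudes then drive the MAP optimisation; Kalman smoothing gives the same quantity in O(T).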


Synthetic Sounds

[Audio demos: original ("Orig") versus model-synthesised ("Model") versions of ten sounds — stream (×3), wind (×2), fire (×2), rain, twigs and speech.]

Primitive Probabilistic Auditory Scene Analysis

Part 1: Probabilistic generative model: primitive auditory scene synthesis

What are the important low-level statistics of natural sounds?

Part 2: Probabilistic recognition model: primitive auditory scene analysis

Provocative computational theory: grouping rules arise from inferences based on the statistics of natural sounds.

Introduction to the psychophysics

• Probe the model with stimuli used to study the psychophysics of auditory scene analysis

• Fix the carriers to be 'auditory filter'-like

• Assume perceptual access to:

  – the instantaneous frequency of the carriers (not the phase)
  – the modulators

How does the model 'hear' these stimuli?

Continuity Illusion

y_t = a_{1,t} c_{1,t} + a_{2,t} c_{2,t}

[Figure: a tone interrupted by bursts of noise; panels show the composite signal, its frequency content (Hz against time over 0–1.5 s), and the model's inferred decomposition into a tone component y_tone and a noise component y_noise.]
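A stimulus of this type is easy to generate. The sketch below is illustrative: the durations, levels, frequencies and the function name are my own choices, not the talk's actual stimulus parameters:

```python
import numpy as np

def continuity_stimulus(fs=8000, dur=1.5, f_tone=100.0, gap=(0.5, 0.75),
                        noise_level=5.0, tone_in_gap=False):
    """Tone interrupted by a loud noise burst: the classic continuity-
    illusion stimulus.  Listeners report hearing the tone continue
    through the noise even when it is physically absent in the gap
    (tone_in_gap=False)."""
    t = np.arange(int(fs * dur)) / fs
    tone = np.sin(2 * np.pi * f_tone * t)
    in_gap = (t >= gap[0]) & (t < gap[1])
    if not tone_in_gap:
        tone = tone * ~in_gap                  # silence the tone in the gap
    rng = np.random.default_rng(0)
    noise = noise_level * rng.standard_normal(t.size) * in_gap
    return tone + noise

y = continuity_stimulus()
```

Under the model, a loud broadband burst is consistent with the tone's envelope a_1 remaining high throughout, which is one reading of why inference reproduces the illusion.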

Common Amplitude Modulation

y_t = (a_{1,t} + a_{3,t}) c_{1,t} + (a_{2,t} + a_{3,t}) c_{2,t}

[Figure: two tones sharing a common modulator a_3; panels show the composite signal and its frequency content (100–200 Hz against time over 0–4 s).]

Old Plus New Heuristic

y_t = a_{1,t}(c_{1,t} + c_{2,t} + c_{3,t}) + a_{2,t}(c_{2,t} + c_{4,t}) + a_{3,t} c_{2,t} + a_{4,t}(c_{1,t} + c_{3,t})

[Figure: the composite signal and its frequency content (up to ~400 Hz against time over 0–4 s), together with the four inferred envelopes a_1–a_4.]

Comodulation Masking Release

y_t = a_{1,t} c_{1,t} + a_{2,t} c_{2,t}

[Figure: two stimulus conditions (signal 1, signal 2) with a tone in a noise masker around 80–120 Hz; panels show the inferred y_noise and y_tone components over 0–2 s for each condition.]

Good Continuation

y_t = a_{1,t} c_{1,t} + a_{2,t} c_{2,t}

[Figure: two stimulus conditions (signal 1, signal 2); panels show each signal and its frequency trajectories (roughly 50–100 Hz, 80–160 Hz and 160–320 Hz) over 0–5 s.]

Proximity

y_t = a_{1,t} c_{1,t} + a_{2,t} c_{2,t}

[Figure: tone-sequence stimuli (signal 1, signal 2) alternating in frequency (0–100 Hz scale) over 0–5 s, probing how grouping depends on the frequency proximity of successive tones.]

Conclusions

• Developed a model for natural sounds comprising quickly varying carriers and slowly varying modulators

• Inspired by natural sound statistics

• Inference replicates many of the characteristics of auditory scene analysis

Future work

• Train on a large corpus of natural sounds: quantitatively predict psychophysical results

• Trading between grouping principles learned from natural sounds

• Psychophysics: prediction of grouping on new stimuli

• Experimental stimulus generation: samples from the model are controlled, but natural sounding

• Model extensions: hierarchical (cascades of modulators), asymmetries, more general filter shapes

Additional Slides

Inference with fixed amplitudes, a_{d,t} = 1

[Figure: carrier spectra for three learned filters over 700–1600 Hz, with the corresponding filter outputs and the waveform y shown over 5.35–5.4 s.]
