
Post on 21-Dec-2015


What do we hear for?

Seeing is knowing what is where by looking

(David Marr)

Seeing is predicting what is where, verified by looking, in order to drink that cup of coffee

(Reza Shadmehr)


Hearing is predicting what will happen next, verified by listening, in order to know as much as possible about what’s out there

(Eli Nelken)

Even simple sounds tell stories

A stupid story

The calm of the sea

Vox Balaenae (Voice of the Whale), for flute, cello and piano (cello and piano playing). George Crumb

A shout of despair

Wozzeck, orchestral transition between scenes 2 and 3 of act 3. Alban Berg

Auditory worlds

• What are sounds?

• What do we hear?

• How do we hear?

Sound As a Pressure Wave

Vibrations of objects set up pressure waves in the surrounding air. The “elastic” property of air allows these pressure waves to propagate (spread).

Structure of sounds

What happens without structure?

Introducing structure

The bird and Chopin

©Gabriel J. Arsante


What are sounds?

• Structure at a lot of time scales

• Perceptual correlates:
– Melodies (1 s)
– Notes (0.1 s)
– Pitch (much faster than 0.01 s)

Peripheral processing of sounds

Inner Ear

Middle Ear

Outer Ear


Cross Section of Cochlea

“Travelling Wave” Along the Basilar Membrane

Von Békésy

Travelling Wave Peaks at Different Locations As the Frequency Changes
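The place-frequency relation named on this slide can be sketched with a Greenwood-type map. This is an illustration under stated assumptions, not something taken from the slides: the constants below are the values commonly quoted for the human cochlea, position is expressed as a fraction of basilar membrane length measured from the apex, and the function names are hypothetical.

```python
import math

# Greenwood-type place-frequency map: f = A * (10**(ALPHA * x) - K).
# Constants are commonly quoted human values (an assumption here).
A, ALPHA, K = 165.4, 2.1, 0.88

def place_to_frequency(x):
    """Frequency in Hz at relative position x (0 = apex, 1 = base)."""
    return A * (10 ** (ALPHA * x) - K)

def frequency_to_place(f_hz):
    """Relative position (0 = apex, 1 = base) where f_hz peaks."""
    return math.log10(f_hz / A + K) / ALPHA

# Higher frequencies peak closer to the base, lower ones closer to the apex.
for f in (100, 1000, 10000):
    print(f, "Hz peaks at relative position", round(frequency_to_place(f), 2))
```

The map is roughly logarithmic over most of its range, so equal steps along the membrane correspond to roughly equal frequency ratios rather than equal frequency differences.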

Outer Hair Cells

Inner Hair Cells

A simple neuron in the auditory system

BF

The auditory pathways

Responses of simple neurons to complex sounds

Orig Slow

A set of complex sounds

In consequence…

The neurogram

We get a very rich and precise representation of the incoming sound at the level of the auditory nerve

The sound and its components

full, 337, 600, 2000

Brahms, Geistliches Wiegenlied, Op. 91 no. 2. Kathleen Ferrier, Phyllis Spurr, Max Gilbert

Is that enough?

(Do we hear the spectrogram?)

What are the perceptual qualities of sounds?

“The basic elements of any sound are loudness, pitch, contour, duration (or rhythm), tempo, timbre, spatial location, and reverberation.”

(D.J. Levitin, This is Your Brain on Music: The Science of a Human Obsession, p. 14)

The Long Road from Spectrogram to Perception

• How do we go from the ‘neurogram’ to ‘loudness, pitch, contour, duration (or rhythm), tempo, timbre, spatial location, and reverberation’?

Relationships with low-level features…

• Loudness with sound intensity
– Encoded by some population-averaged activity

• Pitch with periodicity

Pitch: examples

[Figure: panels for pure tones, filtered clicks, iterated ripple noise (IRN), and AM (3 kHz) (SAM); horizontal axis: time]
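The link between pitch and periodicity can be illustrated with a "missing fundamental" stimulus. The sketch below is only an illustration under assumptions not found in the slides: the sample rate, the choice of harmonics 3 to 5 of 200 Hz, and the use of a plain autocorrelation peak as the periodicity estimate are all hypothetical choices. The signal contains no energy at 200 Hz, yet its waveform repeats every 5 ms.

```python
import math

FS = 8000              # sample rate in Hz (assumed)
F0 = 200               # repetition rate in Hz; absent from the spectrum
HARMONICS = (3, 4, 5)  # components at 600, 800 and 1000 Hz
N = 400                # 50 ms of signal

signal = [sum(math.sin(2 * math.pi * h * F0 * n / FS) for h in HARMONICS)
          for n in range(N)]

def dft_magnitude(x, freq_hz):
    """Normalized magnitude of a single DFT component at freq_hz."""
    re = sum(v * math.cos(2 * math.pi * freq_hz * n / FS) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq_hz * n / FS) for n, v in enumerate(x))
    return math.hypot(re, im) / len(x)

def autocorrelation(x, lag):
    """Unnormalized autocorrelation of x at a non-negative lag in samples."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

def periodicity_pitch(x, fmin=50, fmax=500):
    """Rate in Hz of the strongest autocorrelation peak between fmin and fmax."""
    lags = range(FS // fmax, FS // fmin + 1)
    best_lag = max(lags, key=lambda lag: autocorrelation(x, lag))
    return FS / best_lag

print("energy at 200 Hz:", dft_magnitude(signal, 200))
print("energy at 600 Hz:", dft_magnitude(signal, 600))
print("periodicity estimate (Hz):", periodicity_pitch(signal))
```

The spectral measure finds nothing at 200 Hz, while the periodicity measure lands on the 200 Hz repetition rate, which is the sense in which the two quantities differ.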

Relationships with low-level features…

• Loudness with sound intensity
– Encoded by some population-averaged activity
• Pitch with periodicity
– Periodicity IS NOT frequency!
• Contour with slow amplitude modulations
– Encoded in the range of 1-10 Hz very clearly at the level of A1 (e.g. Shamma and collaborators)
– But not slower than that (probably)
• Duration/rhythm with ???
• Tempo with ???
• Timbre with spatial activation patterns (e.g. in A1)
• Spatial location with ITD/ILD/spectral activation patterns
– Low-level information available at the CN/SOC
– But requires integration
• Reverberation with ???????
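The two binaural cues named in the list, interaural time difference (ITD) and interaural level difference (ILD), can be sketched as simple signal measurements. Everything specific below is an assumption for illustration: the sample rate, the 12-sample delay, the gain of 0.5, and the use of a cross-correlation peak as the ITD estimate. It is not a claim about how the CN/SOC compute these cues.

```python
import math
import random

FS = 48000         # sample rate in Hz (assumed)
TRUE_DELAY = 12    # right ear lags the left by 12 samples (250 microseconds)
RIGHT_GAIN = 0.5   # right ear receives a quieter copy

rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # noise burst
left = source
right = [0.0] * TRUE_DELAY + [RIGHT_GAIN * v for v in source[:-TRUE_DELAY]]

def cross_correlation(a, b, lag):
    """Sum of a[n] * b[n + lag] over the overlapping samples."""
    if lag >= 0:
        return sum(a[n] * b[n + lag] for n in range(len(a) - lag))
    return sum(a[n - lag] * b[n] for n in range(len(b) + lag))

def estimate_itd(a, b, max_lag=40):
    """Delay of b relative to a, in seconds, from the cross-correlation peak."""
    lags = range(-max_lag, max_lag + 1)
    best_lag = max(lags, key=lambda lag: cross_correlation(a, b, lag))
    return best_lag / FS

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

itd_s = estimate_itd(left, right)
ild_db = 20 * math.log10(rms(left) / rms(right))
print("ITD (microseconds):", itd_s * 1e6)
print("ILD (dB):", ild_db)
```

Both numbers are available from the raw ear signals, which matches the point that the low-level information exists early; turning them into a perceived location still requires combining them across frequency and with spectral cues.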

The Long Road from Spectrogram to Perception

• Pitch, timbre, phonemic identity, and so on are ‘separable’ – they are independent of each other

• They represent high-level generalizations
– Many different sounds have the same pitch (violin and trumpet), same timbre (trumpet on two different tones), same phonemic identity (two different people talking)

– The neurograms of these pairs of sounds are very different from each other

• The generalizations should be derivable from the neurogram, but are not explicitly represented at that level

The Long Road from Spectrogram to Perception

Problem no. 1: we do not hear the physics of sounds, but rather their derived properties

(Reverse hierarchies: we perceive high representation levels unless we make serious efforts to go down into the details)

The Long Road from Spectrogram to Perception

Problem no. 2: In natural conditions, sounds rarely occur by themselves

We have to group and segregate ‘bits of sounds’ in order to form representations of ‘auditory objects’

What comes first, the sound or its properties?

• We may need to start by forming objects (solve problem no. 2) and only later assign properties to them (solve problem no. 1)

Hypothesis: the early auditory system (presumably up to the level of primary auditory cortex) deals with the formation of auditory objects

Evidence A: Object representation in primary auditory cortex

The auditory pathways

Primary auditory cortex is a higher brain area!

Visual system:

Photoreceptors

Bipolar cells

Retinal ganglion cells

LGN

V1

IT

Face cells

Auditory system:

Hair cells

Auditory nerve fibers

Cochlear nucleus

Superior Olive

Inferior Colliculus

MGB

Auditory cortex

[Figure axes: frequency, sound level]

Localization and binaural detection

Species-specific calls?

Auditory scene analysis?

The auditory pathways

A1 neurons have a large variety of frequency response areas (FRAs)

Memory in primary auditory cortex

Neurons in auditory cortex represent the weak components of sounds (evidence for the representation of auditory objects in primary auditory cortex)

Strong effects of weak backgrounds…

[Figure: frequency axis 0.1 to 40 kHz; level axis in dB attenuation with ticks at 100 and 10; three time axes, 0 to 100 ms]

Some cortical neurons respond to weak noise in mixture with high-level tones

Tones in modulated and unmodulated background

Noise (bandwidth: BF, 10 Hz trapezoidal envelope); Tone (BF); Tone+Noise

Weak tones in strong noise

Las et al. 2005

Responses to high-level tones in silence and to low-level tones in noise are similar

Evidence B: Coding of surprising events in primary auditory cortex

[Figure: sequences of low-frequency and high-frequency tones over time; tone probabilities 95%, 50%, 5%; responses labelled Deviant and Standard; SSA = 0.34, 0.32, 0.23]
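The deviant/standard comparison in this figure can be sketched as an oddball sequence plus a normalized contrast between the two responses. The slides do not state how their SSA values were computed, so the index below, (deviant - standard) / (deviant + standard), is an assumption, as are the function names and the example spike counts.

```python
import random

def oddball_sequence(n_trials, p_deviant, seed=0):
    """Random sequence in which each trial is a deviant with probability p_deviant."""
    rng = random.Random(seed)
    return ["deviant" if rng.random() < p_deviant else "standard"
            for _ in range(n_trials)]

def contrast_index(deviant_response, standard_response):
    """Normalized contrast: 0 = equal responses, positive = larger response to deviants."""
    return ((deviant_response - standard_response)
            / (deviant_response + standard_response))

# The same tone can be rare (5%) in one block and common (95%) in another.
sequence = oddball_sequence(n_trials=1000, p_deviant=0.05)
print("fraction of deviants:", sequence.count("deviant") / len(sequence))

# Hypothetical mean spike counts for the same tone in its two roles.
print("contrast index:", contrast_index(deviant_response=6.0, standard_response=3.0))
```

Because the same physical tone is compared across its rare and common roles, a positive index reflects the tone's recent history rather than its frequency.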

…Also with spikes…

Evidence C: Perceptual qualities such as pitch are coded outside primary auditory cortex

Activation of auditory cortex by noise and pitched stimuli

Activation by intelligible speech

Take-home messages

• Auditory perception is far removed from the ‘physical’, low-level representation of sounds

• A major problem of early processing is the definition of the ‘objects’ to which properties will be assigned

• There is evidence that objects are defined first and that properties are assigned to them later, in higher brain areas

Reverse Hierarchy Theory

• The hierarchical trade-offs that dictate the relations between processing and perception

• We perceive the high-order constructs rather than the low-level physics

Interactions between high- and low-level representations


From Hochstein and Ahissar 2002

Change blindness

Name the color of the letters

נשר (eagle)

אדום (red)

כחול (blue)

Visual Reverse Hierarchy Theory (RHT)

(Ahissar & Hochstein, 1997; Hochstein & Ahissar, 2002)

[Figure labels: feedback reverse hierarchy; feed-forward hierarchy]

Low levels are sensitive to fine temporal cues, at μs resolution

Phonological/semantic level

[Figure labels: day, bay, night, dream, …]

Initial perception is based on high levels, which represent phonological entities

See: Nahum, Nelken and Ahissar, PLoS 2008

We can either hear the sounds or understand the words, but not both at the same time