hst 725 lecture 6 temporal models for musical pitchpitch : basic properties • highly precise...

www.cariani.com

HST 725 Lecture 6 Temporal models for

musical pitch

Harvard-MIT Division of Health Sciences and TechnologyHST.725: Music Perception and CognitionProf. Peter Cariani

(Image removed due to copyright considerations.)

The search for the missing fundamental: theories & models of musical pitch

• Temporal theories of pitch– Residues: Beatings of unresolved harmonics (Schouten, 1940’s)– Problems with residues and envelopes– Temporal autocorrelation models (Licklider, 1951)– Interspike interval models (Moore, 1980)– Correlogram demonstration (Slaney & Lyon, Apple demo video)– Population-interval models (Meddis & Hewitt, Cariani & Delgutte)

• Interval-based representations of the spectrum– Vowels

• Competition between interspike interval patterns– Pitch multiplicity– Masking

• Problems & prospects– The fate of temporal information in the auditory system– Rethinking the problem of pitch: neural timing nets

• Pitch and auditory scene analysis• Formation and separation of auditory objects using temporal patterns

Synthetic vowelsWaveforms Power Spectra Autocorrelations

Pitch periods, 1/F0

0 1 2 3 4Frequency (kHz)

0 10 20Time (ms)

0 5 10 15Interval (ms)

100 Hz125 HzFormant-relatedVowel qualityTimbre

[ae]F0 = 100 Hz

[ae]F0 = 125 Hz

[er]F0 = 100 Hz

[er]F0 = 125 Hz

Synthetic vowels

Pitch : basic properties• Highly precise percepts

– Musical half step: 6% change F0– Minimum JND's: 0.2% at 1 kHz (20 usec time difference, comparable to ITD jnd)

• Highly robust percepts– Robust quality Salience is maintained at high stimulus intensities– Level invariant (pitch shifts < few % over 40 dB range)– Phase invariant (largely independent of phase spectrum, f < 2 kHz)

• Strong perceptual equivalence classes– Octave similarities are universally shared– Musical tonality (octaves, intervals, melodies) 30 Hz - 4 kHz

• Perceptual organization (“scene analysis”)– Fusion: Common F0 is a powerful factor for grouping of frequency components

• Two mechanisms? Temporal (interval-based) & place (rate-based)– Temporal: predominates for periodicities < 4 kH (level-independent, tonal)– Place: predominates for frequencies > 4 kHz(level-dependent, atonal)

Strongpitches

Weakerlow

pitches

Periodic sounds produce distinct pitches

Many different sounds produce the same pitches

Strong• Pure tones• Harmonic complexes• Iterated noise

Weaker• High harmonics• Narrowband noise

Very weak• AM noise• Repeated noise

Pitch as a perceptual emergent

0 200 400 600 800 1000 1200 1400 1600 1800 2000

0 5 10 15 200

5

10

0 5 10 15 20

600

Pure tone200 Hz

0 200 400 600 800 10001200140016001800200

MissingF0

Line spectra Autocorrelation (positive part)

Frequency ranges of (tonal) musical instruments

27 Hz 4 kHz262Hz

440Hz

880Hz

110Hz

10k86543210.50.25

Duplex time-place representations"Pitch is not simply frequency"

Musical tonality: octaves, intervals, melodies

Strong phase-locking (temporal information)

temporal representationlevel-invariant, precise

place representationlevel-dependent, coarse

30 100 1k 10k

Frequency (kHz)

Pitch height and pitch chroma

Please see Figure 1, 2, and 7 in Roger N. Shepard. "Geometrical Approximations to the Structureof Musical Pitch." Psychological Review 89 no. 4 (1982): 305-322.

Duplex time-place representations

temporal representationlevel-invariant

• strong (low fc, low n)• weak (high fc, high n; F0 < 100 Hz)

place-basedrepresentation

level-dependentcoarse

30 100 1k 10k

Similarity to place pattern

Similarity to

intervalpattern

cf. Terhardt'sspectral and virtual pitch

A "two-mechanism" perspective (popular with some psychophysicists)

unresolved harmonicsweak temporal mechanism

phase-dependent; first-order intervals

30 100 1k 10k

place-basedrepresentationslevel-dependent

coarseharm

onic

num

ber

place-basedrepresentation

level-independentfine

resolved harmonicsstrong spectral pattern mechanism

phase-independentrate-place? interval-place?

Dominance regionf, F0

n= 5-10

Rate-place

1/F0

Population interval

Interval

Population-interval

Central spectrum

CF

Central spectrum

CF

Synchrony-place

Interval-place

Local

Global

Somepossibleauditoryrepresentations

Peristimulus time (ms)

10

1

Cha

ract

eris

tic fr

eq. (

kHz)

0 5010 20 30 40

Masking phenomenaLoudness

Pure tone pitch JNDs: Goldstein

Complex tone pitch

Phase-place

Stages of integration

All-at-once

General theories of pitch

1. Distortion theories– reintroduce F0 as a cochlear distortion component (Helmholtz)–sound delivery equipment can reintroduce F0 through distortion–however, masking F0 region does not mask the low pitch (Licklider)–low pitch thresholds and growth of salience with level not consistent with distortion processes (Plomp, Small)–binaurally-created pitches exist

2. Spectral pattern theories–Operate in frequency domain–Recognize harmonic relationson resolved components

3. Temporal pattern theories–Operate in time domain–Analyze interspike interval dists.

AutocorrelationrepresentationTime domain

Stimulus

array of cochlear band-pass filters

interspike intervalsdischarge rates

F0 = 200 HzF0 = 160 HzF0 = 100 Hz

harmonic templates

frequency(linear scale)optimal

match

auditorynerve fibertuning curves

0 155 10Interval (ms)

# in

terv

als

1/F01/F1

Populationinterspike interval

distribution

Pitch → best fitting template

Population rate-place profile

0 3

dB

Frequency (kHz)

50

0

F0= 80 Hz

Power spectrumrepresentation

Frequency domain

Basic problems to be solved• "Hyperacuity problem" • Account for the precision of pitch discriminations given the relatively coarse tunings of auditory neurons (at all levels), especially lower-frequency ones (BFs < 2 kHz)

• "Dynamic range problem"• Account for the ability of listeners to discriminate small fractional changes (∆I/I) in intensity over a large dynamic range, and especially at high SPLs, where the vast majority of firing rates are saturated.

• "Level-invariance problem"• Account for the invariance (and precision) of auditory percepts over large dynamic ranges given the profound changes in neural response patterns that occur over those ranges (rate saturation, rate non-montonicities).

•Pitch equivalence•Account for the ability to precisely match pitches of pure and complex tones (pitch equivalence, metamery) given differences in spectra and under conditions where stimulus levels are roved 20 dB or more

•Multiple pitches•Account for the abilityof the auditory system to simultaneously represent multiple pitches (∆F0 > 10% apart)

•Relative nature of pitch & transpositional invariance•Account for the ability to precisely match pitches an octave apart (and/or to recognize patterns of pitch sequences) in the absence of an ability to identify absolute frequencies/periodicities. Account for ability to recognize transposed melodies as similar.

Neural tuning as a function

of CF

10

8

6

4

2

00

20

40

60

80

100

120

0.1

Characteristic frequency (kHz)

Q 1

0dB

Thre

shol

d to

ne le

vel (

dB S

PL)

1.0 10

Broad tuning and rate saturation at moderate levels in low-CF auditory nerve fibers confounds rate-based resolution of harmonics.

Low SR auditory nerve fiber

Alan Palmer

0300 500 1000

Frequency (Hz)

Spik

es/s

1500 2000

85 dB75 dB

65 dB

55 dB45 dB

35 dB

25 dB

95 dB

B.F. = 1700 Hz

Spon.act.

No sync.

2500 3000

50

100

150

• Schouten’s temporal theory (1940’s) depended on interactions between unresolved (high) harmonics. It was displaced by discovery of dominance region and binaural combination pitches in the 1960’s. The idea persists, however in the form of spectral mechanisms for resolved harmonics and temporal ones for unresolved harmonics.

Temporal pattern theoriesΣ First-order intervals(renewal density)

Please see van Noorden, L. Two channel pitch perception. In Music, Mind and Brain. Editedby M. Clynes. New York: Plenum.

Licklider (1951)Σ All-order intervals (temporal autocorrelation)

Please see Moore, BCJ. An introduction tothe psychology of hearing. 5th ed. San Diego:Academic Press. 2003.

Please see Figure 1 in Meddis, R., and M. J. Hewitt. Virtual pitch and phase sensitivity of acomputer model of the Auditory periphery. I. Pitch identification. J. Acoust. Soc. Am.89 no. 6 (1991): 2866-2882.

B1

C1

A

C2 C3 C4 C5 C6

D1 D2

T

D3 D4 D5 D6

B2 B3 B4 B5 B6 B7

Basic schema of neuronal autocorrelator. A is the inputneuron, B1, B2, B3, ..... is a delay chain.

.....

.....

Interval-basedtheories of pitch

First-order intervals(renewal density)

Please see van Noorden, L. Two channel pitch perception. In Music, Mind and Brain. Editedby M. Clynes. New York: Plenum.

Please see Moore, BCJ. An introduction to the psychology of hearing. 5th ed. San Diego:Academic Press. 2003.

All-order intervals(temporal autocorrelation)

Licklider (1951)

Please see Figure 1 in Meddis, R., and M. J. Hewitt. Virtual pitch and phase sensitivity of acomputer model of the Auditory periphery. I. Pitch identification. J. Acoust. Soc. Am. 89no. 6 (1991): 2866-2882.

B1

C1

A

C2 C3 C4 C5 C6

D1 D2

T

D3 D4 D5 D6

B2 B3 B4 B5 B6 B7


.....

.....

Licklider’s (1951) duplex model of pitch perception

J.C.R. Licklider“Three AuditoryTheories” In Psychology: A Study of a Science, Vol. 1, S. Koch, ed., McGraw-Hill, 1959, 41-144.

Licklider’s binaural triplex model

B1

C1

A

C2 C3 C4 C5 C6

D1 D2

T

D3 D4 D5 D6

B2 B3 B4 B5 B6 B7


.....

.....

(Image removed due to copyright considerations.)

Delay lines

B1

C1

A

C2 C3 C4 C5 C6

D1 D2

T

D3 D4 D5 D6

B2 B3 B4 B5 B6 B7


.....

.....

Autocorrelation and interspike intervals

1/F0

Fundamentalperiod

time lagτ

S(t ) S( t - τ)Corr(τ) =Σt

ShiftMultiplySum the productsfor each delay τto computeautocorrelationfunction

Autocorrelation functions = Histograms ofall-order intervals

Autocorrelationsof spike trains

0000001000000000100000000000000000000100000000001000000000000010000000001000000000000000000001000000000010000000

Please see Figure 6.16A-D in Lyon, Richard, and Shihab Shamma. "Auditory Representationsof Timbre and Pitch." In Auditory Computation. Edited by R. R. Fay. New York: Springer Verlag. 1996.

Correlograms: interval-place displays (Slaney & Lyon)

Please see Slaney, Malcolm, and Richard F. Lyon. "On the Importance of Time -A Temporal Representation of Sound." In Visual Representations of Speech Signals. Edited by M. Crawford. New York: John Wiley. 1993.

Correlograms

Please see Figure 6.17 in Slaney, Malcolm, and Richard F. Lyon. "On the Importance of Time -A Temporal Representation of Sound." In Visual Representations of Speech Signals. Editedby M. Crawford. New York: John Wiley. 1993.

Auditory nerve


10

1

Cha

ract

eris

tic fr

eq. (

kHz)

0 5010 20 30 40

Temporal codingin the auditory nerve

Work with Bertrand DelgutteCariani & Delgutte (1996ab)

Dial-anesthetized cats.100 presentations/fiber60 dB SPL

Population-interval distributions are compiled by summing together intervals from all auditory nerve fibers.

The most common intervals present in the auditory nerve are invariably related to the pitches heard at the fundamentals of harmonic complexes.

The population-interval distribution of the auditory nerveStimulus AutocorrelationStimulus waveform 1/F0

Pitch =1/F0

Interval (ms)151050

# in

terv

als

500


# sp

ikes

250

290280260240 270250

1/F0

.5

.2

1

2

5

Cha

ract

eris

tic fr

eque

ncy

(kH

z)Post-stimulus time histograms All-order interval histograms

Population-interval histogramPopulation-PST histogram

Stimulus Autocorrelation Pitch =1/F0

Interval (ms)151050

# in

terv

als

500

All-order interval histograms

Population-interval histogram

Cha

ract

eris

tic fr

eque

ncy

(kH

z)

.5

.2

1

2

5

1/F0

Fundamentalperiod

time lagτ

S( t ) S( t - τ)Corr(τ) =Σt

ShiftMultiplySum the productsfor each delay τto computeautocorrelationfunction

Autocorrelation functions

= Histograms ofall-order intervals

Autocorrelationsof spike trains

0000001000000000100000000000000000000100000000001000000000000010000000001000000000000000000001000000000010000000

Percept-driven search for neural codes:1. Use stimuli that produce equivalent percepts2. Look for commonalities in neural response3. Eliminate those aspects that are not invariant

Neuralresponsepattern

Metamericstimuli

(same percept,different power spectra)

Stimulus A(AM tone, fm = 200 Hz)

Stimulus B(Click train, F0 = 200 Hz)

Common aspectsof neural response

that covarywith the percept

of interest

Candidate"neural codes"

or representations

Other parametersfor which percept is

invariant

Neural codes, representations:those aspects of the

neural responsethat play a functional role

in subservingthe perceptual distinction

Intensity

Location

Duration

Populationinterval

distribution

Interspikeinterval (ms)

150

Six stimuli that produce a low pitch at 160 Hz

Pure tone 160 Hz

AM tone Fm:160 Hz Fc:640 Hz

Harms 6-12

AM tone

Click train

AM noise

F0 :160 Hz

Fm :160 HzFc:6400 Hz

Fm :160 Hz

Waveform Power spectrum Autocorrelation

Pitch frequency Pitch period

F0 : 160 Hz

Lag (ms)Frequency (Hz)TIme (ms)150

WEAKPITCHES

STRONGPITCHES

Pitch equivalence

Level-Invariance Pitch Equivalence

0

1

2

2

2

1

2

5

Interval (ms)

400

1000

150

0 25All-Order Interval (msec) Interval (ms)

10 15 0 5 10 15

# In

terv

als

40 db SPL

60 db SPL

80 db SPL

E Pure Tone F=160 Hz

Neural Data Autocorrelation

Fc=640 Hz, Fm=160Hz

Fm=160Hz

F0=160 Hz

F AM Tone

G Click Train

H AM Noise

The running population-interval distribution

Population intervalhistograms

(cross sections)

1000

0 155 10

Interval (ms)#

inte

rval

s A

Interval (ms)

1000

0 155 10# in

terv

als B


Please see Figure 1, 2, and 7 in Roger N. Shepard. "Geometrical approximations to the structureof musical pitch." Psychological Review 89 no. 4 (1982): 305-322.

Pitch shift of inharmonic complex tones

800

800

800

800 1 /F m

300 340320

260 300280

220 260240

* *

* *

Interval (ms)0 5 10 15

Lag (ms)0 5 10 15

Peristimulus time (ms)340 380360

1/Fm

Population intervaldistributionsStimulus autocorrelationStimulus waveform

1/Fm

0 1

Freq (kHz)0 1

0 1

0 1

Fm = 125 HzFc = 750 Hz

n = 5.86

n = 5.5

n = 5

n=6

Pitch shift of inharmonic complex tonesPhase-invariant nature of all-order interval code

Cochlear nucleus IV: Pitch shift

.

Unit 35-40 CF: 2.1 kHzThr: 5.3 SR: 17.7

5

10

15Chop-S (PVCN)

45-15-8 CF:4417Thr: -18, SR: 39 s/s

10

20

30

0 500100 200 300 400Peristimulus time (ms)

Pauser (DCN)45-17-4 CF: 408 HzThr: 21.3 SR: 159

Inte

rval

(ms)

0 500100 200 300 400

30

Primarylike(AVCN)


de Boer'srule

(pitches)15sub-

harmonicsof Fc

1/Fc

de Boer'srule

(pitches)

5

10

15

Inte

rval

(ms)

1/Fm

Fm = 125 Hz Fc = 500-750 Hz Pitch ~ de Boer's ruleV ariable-Fc AM t one

500 500750 Fc(Hz)

Pooled ANF (n=47)

0 100 200 300 4000 500100 200 300 400

Dominance region for pitch (harmonics 3-5 or partials 500-1500 Hz)

n = 6 5.86 5.5

AM tone Power spectrum Autocorrelation

PitchEquivalence

Pitch of the"missingfundamental"

Level invariance

Phase invariance

Dominance region

Pitch shift ofinharmonicAM tones

Pitch salienceV. weak pitchWeak pitchStrong pitch

40 dB SPL 60 80

Population-interval representation of pitchat the level of the auditory nerve

Pure tone AM tone Click train AM noise

AM tone

QFM tone

1/F06-12 1/F03-5

F03-5=160 Hz 240 Hz 320 Hz 480 Hz

Summary

Temporal theories - pros & consMake use of spike-timing properties of

elements in early processing (to midbrain at least)Interval-information is precise & robust & level-

insensitiveNo strong neurally-grounded theory of how this

information is used

Unified model: account for pitches of perceptually-resolved & unresolved harmonics in an elegant way (dominant periodicity)

Explain well existence region for F0 (albeit with limits on max interval durations)

Do not explain low pitches of unresolved harmonics

Interval analyzers require precise delays & short coincidence windows

Little or no neural evidence for required analyzers

Pitch classes and perceptual similarity

Innate mechanisms:Harmonic similarity relations

are direct consequencesof the inherent structure

of interval codes and the neurocomputational

mechanisms usedfor their analysis

Associationist theory:Build up harmonic

associationsfrom repeated

exposure to harmoniccomplex tones

A Harmony wheel of Boomsliter and Creel (1962)

Temporal coding and music: a few references

Boomsliter P, Creel W (1962) The long pattern hypothesis in harmony and hearing. JMusic Theory 5:2-31.

Cariani P (2002) Temporal codes, timing nets, and music perception. J New Music Res30:107-136.

McKinney MF, Delgutte B (1999) A possible neurophysiological basis of the octaveenlargement effect. J Acoust Soc Am 106:2679-2692.

McKinney MF, Tramo MJ, Delgutte B (2001) Neural correlates of musical dissonance inthe inferior colliculus. In: Physiological and Psychophysical Bases of AuditoryFunction (Houtsma AJM, ed), pp 71-77. Maastricht: Shaker Publishing.

Ohgushi K (1978) On the role of spatial and temporal cues in the perception of the pitchof complex tones. J Acoust Soc Am 64:764-771.

Ohgushi K (1983) The origin of tonality and a possible explanation for the octaveenlargement phenomenon. J Acoust Soc Am 73:1694-1700.

Patterson RD (1986) Spiral detection of periodicity and the spiral form of musical scales.Psychology of Music 14:44-61.

Rose JE (1980) Neural correlates of some psychoacoustical experiences. In: NeuralMechanisms of Behavior (McFadden D, ed), pp 1-33. New York: SpringerVerlag.

Tramo MJ, Cariani PA, Delgutte B, Braida LD (2001) Neurobiological foundations forthe theory of harmony in western tonal music. Ann N Y Acad Sci 930:92-116.

INTERVALDISTRIBUTIONS

ANDOCTAVE

SIMILARITY

Octavesimilarity

Please see Cariani, Peter. Journal of New Music Research 30, no. 2 (2001): 107-135.


Please see Figure 1, 2, and 7 in Roger N. Shepard. "Geometrical Approximations to the Structure of Musical Pitch." Psychological Review 89, no. 4 (1982): 305-322.


Musical tonal relations

Please see Shepard, Roger N. "Geometrical Approximations to the Structure of Musical Pitch." Psychological Review 89, no. 4 (1982): 305-322.


Note-chordrelations(Harms 1-12)


Vowel properties Waveforms Power Spectra Autocorrelations

Pitch periods, 1/F0

0 1 2 3 4Frequency (kHz)

0 10 20Time (ms)

0 5 10 15Interval (ms)

100 Hz 125 HzFormant-relatedVowel qualityTimbre

[ae]F0 = 100 Hz

[ae]F0 = 125 Hz

[er]F0 = 100 Hz

[er]F0 = 125 Hz

Phase-locking of discharges in the auditory nerve


10

1

Cha

ract

eris

tic fr

eq. (

kHz)

0 5010 20 30 40

Time domain analysis of auditory-nerve fiber firing rates.Hugh Secker-Walker & Campbell Searle, J. Acoust. Soc. 88(3), 1990Neural responses to /da/ @ 69 dB SPL from Miller and Sachs (1983)

High CFs

Low CFs

F1

F2

F3


Reprinted with permission from Secker-Walker HE, Searle CL. 1990. Time-domain analysis of auditory-nerve-fiber firing rates.J. Acoust. Soc. Am. 88 (3): 1427-36. Copyright 1990, Acoustical Society of America. Used with permission.

Vowels

.

Formant-structure

0 5 10 15

Voice pitch

1/F1

Interval (ms)

Formant-structure

# i n t e r v a l s

Population interval histogram0 5 10 15

01

Signal autocorrelation [ae]Voice pitch

Population-interval coding of timbre (vowel formant structure)

[ α]

0 5

[ u]

Interval (ms)

[ ∋]

[ æ]

0 5

Population-wide distributions ofshort intervals for 4 vowels

# i n t e r v a l s

m a g n i t u d e

Coding of vowel quality (timbre)(Reprinted with permission from Secker-Walker HE, Searle CL. 1990. Time-domain analysis of auditory-nerve-fiber firing rates.J. Acoust. Soc. Am. 88 (3): 1427-36. Copyright 1990, Acoustical Society of America.)

Population-interval representations thus far

• Periodicity: capable of accounting for a wide range of pure and complex tone pitches that are heard; salience model less developed.

• Spectrum: capable of robustly representing the stimulus spectrum below 4 kHz (our neural data, Palmer neural data Meddis simulations)

• Sound separation: multiple interval bands apparent for ∆F0 > 10%

Level ( dB SPL )

Avg. rate (s/s)

250

COMPRESSION

50

800

RECTIFICATION

1 100.1Frequency (kHz)

1INPUT SIGNAL FILTER BANK

1

100 1k 10k

Mag

LOW PASS FILTER

0 5 10 15 20 25 30 35 40 45 50100

1000

8000

500

Peris timulus t ime (ms)

Cen

ter f

requ

enc

PST NEUROGRAM

0 5 100

2

4

POPULATION- INTERVAL

DiSTRIBUTION

All-order interspike interval (ms)#

inte

rval

s/m

ean

Frequency (Hz)

15

Auditory nerve model

Basic features:Gammatone filtersAdaptive gain controlMapping of output amplitude to firing rates

Spontaneous activity

Please see Figure 60 and 61 in Keidel, D., Wolf., S. Kallert, M. Korth. The physiological basis of hearing: a review. Edited by Larry Humes. New York: Thieme-Stratton, Stuttgart, GeorgeThiemeVerlag, 1983. ISBN: 0865770727.

Please see Hind, J. E., D. J. Anderson, J. F. Brugge, and J. E. Rose. "Coding of Information Pertaining to Paired Low-frequency Tones in Single Auditory Nerve Fibers of the Squirrel Monkey." J Neurophysiol 30 (July, 1967): 794 -816.

Phase-locking to a 300 Hz Pure Tone

Spike

Stimulus Waveform (0.3 kHz)

Period Histogram(1100 Hz)

# Spikes

1.1 kHz90dB SPL

200

100

0

0 5 10 15 20ms

0

50

100

150

Num

ber

of I

nter

vals

Interval Duration (ms)

First-order Interval Histogram

1.5 kHz80dB SPL

Stimulus Period

Time

Temporal discharge patterns 40-90 dB SPL

Please see Rose, Hind, Anderson, Brugge J. Neurophysiology 34 (1971): 685-99. Reprinted inD. Keidel, Wolf., S. Kallert, and M. Korth. The Physiological Basis of Hearing: A Review. Edited by Larry Humes. New York: Thieme-Stratton, Stuttgart, George ThiemeVerlag, 1983. ISBN: 0865770727.

Medium-SR model fiber

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2

-200

0

200

30 dB SPL

50 dB SPL

70 dB SPL

90 dB SPL

Period time (ms)

Temporal discharge patterns to pure tone dyad: 600 & 800 Hz

Please see Rose, Hind, Anderson, Brugge J. Neurophysiology 34 (1971): 685-99. Reprinted inD. Keidel, Wolf., S. Kallert, and M. Korth. The Physiological Basis of Hearing: A Review. Edited by Larry Humes. New York: Thieme-Stratton, Stuttgart, George ThiemeVerlag, 1983. ISBN: 0865770727.

Physiological vs. simulated activity

0 5 10 15 20 25 30 35 40 45 50100

1000


PST NEUROGRAM

1000 2000 3000 40000

50

100

150

200

CF (Hz)

rate

(s/s

)

RATE-PLACEPROFILE

0 5 10 15 20 250

1

2

3

4

5

Interval (ms)

POPULATION-INTERVALDISTRIBUTION

4000

500

SIMULATED RESPONSES

Single-formant vowelF0 = 80 HzF1 = 640 Hz60 dB SPL

CATDATA


1

10

.3

5010 20 30 400

0 255 10 15 20Interspike interval (ms)

1/F01/F1

Rate-placeprofile

Characteristic frequency (kHz)0.1 1 10

180

100

0

Frequency (kHz)

C

D

F1E

F2500

50

0

F0

Pow

e(d

B)

A

B

Pitch period

1/F0 = Pitch period

Simulated vs. observed auditory nerve activity We are mainly interested here in replicating basic phase-locked time-frequency patterns.

0 5 10 15 20 25 3010

2

103

104

Cha

ract

eris

tic fr

eque

ncy

(Hz)


NEUROGRAM

102

103

104

0

100

200

300

Characteristic frequency (Hz)

Dis

char

ge r

ate

(s/s

)

RATE-PLACE PROFILE

0 0.005 0.010

1

2

3

4

Interval (ms)

peak

/mea

n

POPULATION-INTERVAL DISTRIBUTION

Competition between soundsCompetition between respective interval patterns

Towards interval-based accounts of interacting sounds:• mutual masking• pitch multiplicity• roots of dyads and triads (chords)

Pitch masking

s/n: 8 dB p/b = 1.11

s/n: 20 dB p/b = 1.57

s/n: 32 dB p/b = 2.38

s/n: no noise p/b =3.26

Inte

rval

s (m

s)In

terv

als

(ms)

Variable-F0 click train in broad-band noise Click train: F0= 160-320 Hz, positive polarity, 80 dB SPL Peak/background ratio: intervals @ 1/ F0 ± 150 usec/mean over all intervals Informal pitch thresholds, s/n dB: 12 (MT), 13 ( PC), 16 ( BD)


What degree of temporal correlation is necessary for pitch to become audible?

Click trains were presented in white noise (0-11 kHz, 70 dB total SPL) at different s/n ratios. My threshold for pitch detection with low-isolation headphones is -6dB.

Masking audiograms

Wegel & Lane, 1924

f1 + f2 together40

60

80

100

120

140

160

180

1

1.5

2

2.5

1

1.5

2

2.5

1000 1100 1200 1300 1400 1500

1

1.5

2

2.5

0

1

2

x 106

0

1

2

x 106

0 5 10 150

1

2

x 106

We assume that a pitch can be heard if its salience is >1.3.Masking functions can be then estimated for those conditions

that involve pitch detection (vs. ∆ loudness)

0 20 40 60 80 100 120 140 160 180 200102

103

104

Characteristic frequency (Hz)

Dis

char

ge ra

te (s

/s)

RATE-PLACE PROFILE

0 0.01 0.02 0.030

1

2

3

Interval (ms)

peak

/mea

n

POPULATION-INTERVAL DISTRIBUTION

500 Hz tone 65 dB SPL

Cochlear dead zone < 2 kHz

0 5 10 1505

10x 105

Interval (ms)

30

50

55

60

65

70

90

SPL

0 5 10 150

10x 105

Interval (ms)

CFs > 2kHz CFs > 100 Hz

Evolution of population-interval models

• Analysis of peaks -> whole histograms

• Comparisons between whole histograms– Pattern similarity (octave, probe-tone:chord, FF timing nets)

• Importance of limited temporal windows– Finite frequency resolution of autocorrelation analysis

– Carlyon's experiments: low F0's of high harmonics

– Lower limits of F0-pitch (30 Hz, Pressnitzer & Patterson)

• Pitch multiplicity and salience estimation

• CF-dependent temporal analysis windows• Recurrent timing nets - periodicity & object formation/sep.

are intimately linked mechanisms

Divergences between autocorrelation & interval models

0 20 40 0 10 20 30 0 10 20 300123

Masking of click patterns by random clicks

ABX 1.28KX 1.30KXX 1.21KXXX 1.20

Physiological and functional representations

Rate-place profiles

Interval-place Spatiotemporal pattern

(Place & time)Global interval

Central spectrum Frequency-domain

representation

Central autocorrelation Time-domain

representation

Different representations can support analogous strategies for pitch extraction, recognition, and comparison

Central spectrum Frequency-domain

representation

Central autocorrelation Time-domain

representation

Explicitidentificationof individualharmonics &

deduction of F0 via common

subharmonics or pattern recognition

Template-based global recognition of

pitch-related patterns (neural networks,

harmonic templates, interval sieves)

Relative pitchcomparison mechanisms (matching,

octaves, musical intervals)

Spectral patterns Temporal patterns

Neural timing nets

RECURRENT TIMING NETS• Build up pattern invariances• Detect periodic patterns• Separate auditory objects

FEED-FORWARD TIMING NETS• Temporal sieves• Extract (embedded) similarities• Multiply autocorrelations

Relative delay τ

Timet

Sj(t)

Si(t)

Si(t) Sj(t - τ)individual

multiplicativeterm

Si(tm) Sj(tm - t)Στconvolution

time-seriesterm m

two setsof input

spike trains

Input time sequence

All time delays presentτ0

τ1

τ2

τ3

Coincidenceunits

Direct inputs

Recurrent,indirect inputs

Time patterns reverberate through delay loops

0lagi leads j j leads i

Delay τRelative delay between inputs i, j

Sj(t)

Si(t)

Στ Si(tm) Sj(tm - τ)

Σt

Si(t) Sj(t - τn)

Timet

cross-correlationlag term n

Si(t) Sj(t - τ)individual

multiplicativeterm

convolutiontime-series

term m

coincidencesof two nearlysimultaneous

pulses requiredto produce

output spikein coincidence

detectors

population autocorrelation

Σt

Στ

[Si(t) Sj(t - τ) Si(t-Τ) Sj(t - τ - Τ)]

Feedforward coincidence net

Relative delay τ

Timet

Sj(t)

Si(t)

Si(t) Sj(t - τ)

convolution time-series

term

two sets of input

spike trainsM = 2 τ0 intervals N = 3 τ0 intervals

τ0

Population interval outputM*N= 2*3 = 6 τ0 intervals

C Higher-order interval

sequence

B

Individual output lines contain higher ordersequence if and only if present in both inputs

• Networks act as temporal sieves, passing only common patterns• These are the only networks for which it is irrelevant which

particular neurons are activated• Signals are no longer tied to particular transmission lines• Processing on the level of ensemble mass-statistics is made possible• Extraction of complex, embedded patterns is made possible• Time-domain multiplexing is greatly facilitated

Common timbre

0 5 10 15 0 5 10 15

[ae] - 100

[ae] - 125

[er] - 125

[er] - 100

Interval (ms)0 5 10 15 0 5 10 15

Population autocorrelations of the outputof a coincidence array for all vowel combinations

Same vowel (same F0s & formants)

Same F0s, different formants

Same formants, different F0s

Different F0s and formants

[ae] - 100

[ae] - 125

[er] - 100

[er] - 125

Detection of arbitrary periodic patternsPeriodic patterns invariably build up indelay loops whose recurrence times equals the period of the pattern and its multiples.

τ0

τ2

τ3

Input pattern

1010110010110101100101101011001011010...

τ1 = 11 ms = recurrence time of input pattern10101100101

τ1

Time patterns reverberate throughdelays loops

Input waveform

Direct inputs

All time delay present

Recurrent, indirect inputs

Coincidence units

τ3

τ1 = 11 timesteps = recurrence timeof repeating 10101100101 pattern

Buildupof activationin loop withrecurrence timeof 11 timesteps

1010110010110101100101101011001011010...Timesteps

0 200100

30

1

Cha

nnel

recu

rren

ce ti

me

Figure 8. Recurrent tming nets. Top. Behavior of a simple recurrent timing net for periodic pulsetrain patterns. The network generates many sets of expectations. The delay loop whose recurrencetime equals the period of the pattern builds up that pattern. Below. Response of a recurrent timingnet to the beat pattern of La Marseillaise. Arrows indicate periodic subpatterns at 16, 32, and 64timesteps that are built up by the network. The example points to potential applications of recurrenttiming nets for rhythm induction and analysis. From Cariani, 1999, working paper on timing netsand rhythm.

τ0

τ1

Timesteps (samples)

1100110001000100010001000000110011001100010000000100110000000000...

Delay loop(recurrence t ime,

in samples)

La Marseillaise rhyt hm

Periodicpattern builds up

Rhythmicsubpatterns

Build-up and separation of two auditory objects

Two vowels with different fundamental frequencies (F0s) are added togetherand passed through the simple recurrent timing net. The two patterns build upIn the delay loops that have recurrence times that correspond to their periods.

Vowel [er]F0 = 125 HzPeriod = 8 ms

Period = 10 ms

Vowel [ae]F0 = 100 Hz

Time (ms)

Cha

ract

eris

tic d

elay

cha

nnel

(ms)

0 10 20 30 40 50Time (msec)

Pitch and auditory scene analysis

Two different strategies for scene analysis:1. Group channels by common properties (local features)2. Build up invariant temporal patterns as objects (invariant relations)

Separation of multiple recurrent time patterns:two vowels with different F0's

0 10 20 30 40 50Time (msec)

Vowel [ae]F0 = 100 Hz

Double vowel[ae]+[er]

Vowel [er]F0 = 125 Hz

Recurrenttiming netwith single input channel

Revised buildup rule:Min(direct, circulating) plus a fraction of their absolute difference

Auditory "pop-out"phenomena suggest a period-by-periodwaveform comparison

Last 2 periods - first 2

transientdisparity

ongoingdisparity

Traditional approach (Frequency domain)

Segregate frequency channels

Assign channels to objects

AE-ER

AEER

Double vowels drive the same frequency channelsin the auditory nerve

FREQUENCY CHANNELS

0 50Peristimulus time (ms)

PST NEUROGRAM ae100-er106

1k

4k

125

CF (Hz)

The information needed to separate sounds on the basis of F0 differences

is multiplexed in the phase-locked fine time structure of the neural firing patterns

(Traditional spectrographic approaches throw awaythis information at an early stage.)

FREQUENCY CHANNELS



1k

4k

125

CF (Hz)

How might a neural system (e.g. the auditory system) exploit this information?

Strongly agree that we shouldn't be limited by ignorance of auditory system, but we need to reverse-

engineer that system if we are to find new functional principles

Behavior of the loop array

Activity in the loop arrayis described in terms of3 dimensions:• frequency channel• delay loop• peristimulus time

15

Summary autocorrelogram(collapse over all CFs)

LoopDelay(ms)

0 50 100Peristimulus time (ms)

10

8ae100(10 ms period)

6er106

(9.4 ms period)

Neural spectrogram(collapse over all delay loops)

CF(Hz)

4k

125

time

Frequency

Loopdelay

Processing of tonotopically-organized inputs

Σ Σ Σ Σ

τ

FREQUENCYCHANNELS

0

0.05

0

0.05

0

er106

0 5 10 15

Interval (ms)0

5

ae100

Loop delays = 9.7 ms(period of er106)

Loop delays = 10 ms(period of ae100)

COMPUTECORRELATIONCOEFFICIENT

FORSHORT

INTERVALS

0 5 10 15

Interval (ms)

POPULATION-INTERVALDISTRIBUTIONS FORDIFFERENT DELAYS

COMBINE INTERVALSFOR EACH DELAYFROM ALL FREQ'S

BUILD UP CORRELATIONPATTERNS

IN DELAY LOOPS FREQUENCY-BY-FREQUENCY

POPULATION-INTERVALDISTRIBUTIONS FOR

CONSTITUENT SINGLE VOWELSPRESENTED INDIVIDUALLY



1k

4k

125

CF(Hz)

2

3

2000 Peristimulus time (ms)

CF(Hz)

H(t,t+t) = min(X(CF, t), H(t,t)) + 0.1*abs(X(CF, t) - H(t,t));

X(CF, t)H(τ,t) H(τ,t+τ)

BUILDUP RULE:pattern entering the delay loop is theminimum of direct & circulating inputs,plus a fraction of their difference

direct

circulating

loopsignalH(τ,t)

loopsignal

Concurrent vowels

WAVEFORM SPECTRUM AUTOCORRELATION

0 5 10 15 20Time (ms)

Am

plitu

de

102 103-20

-10

0

10

20

30

Autocorrelation lag (ms)

Mag

nitu

de

ae-100

er-124

ae-100 +

er-124

Separation of objects with increasing ∆F0

∆F0 increases0 10.5semitones 4

Those sets of delay loops whoserecurrence times equal thefundamental periods of the twovowels are invariably activated themost.

As time goes on, the signalspresent in these sets of delay loopsare amplified.

When the two vowels have the samefundamental (100 Hz), only one setof delay loops is activated (10 msdelay).

When the two vowels havefundamentals separated by asemitone or more (ae100-er106 &ae100-er126) two sets of delay loopsare activated. The recurrence times ofthese sets of delay loops equal therespective vowel periods (10, 9.4, 7.9ms) .

ae100-er100

0 5 10 150

0.1

ae100-er103

0 5 10 15

Loop delay (ms)

ae100-er106

0 5 10 15

ae100-er126

0 5 10 15

10-30 ms pst

30-50 ms pst

80-100 ms pst

Simulation performance vs. psychophysics

0 1 2 3 40

0.2

0.4

0.6

0.8

1

0 1 2 3 40

0.2

0.4

0.6

0.8

1

0 1 2 3 40

0.2

0.4

0.6

0.8

1

0 1 2 3 40

0.2

0.4

0.6

0.8

1

0 1 2 3 40

0.2

0.4

0.6

0.8

1

0 1 2 3 40

0.2

0.4

0.6

0.8

1

² F0 (semitones) ² F0 (semitones)

coef

ficie

nt

ae-ah ae-ee

ae-er ah-ee

ah-er er-ee

2nd vowel

1st vowel

0-200 MS INTEGRATION TIME

For 200 ms integrations, one (dominant) vowelproduced high correlations for all ²F0's.

Correlations of nondominant vowels increasedas ²F0's increased to a semitone, then levelling off.

Assmann &Summerfield

ScheffersB

oth

vow

els

corr

ect (

%)

Mea

ns o

ver v

owel

set

sZwicker

0 1 2 3 4

100

80

60

40

20

0

F0 Separation (semitones)

Identification of double vowels

Simulations results Psychophysical observations

Listeners almost invariably get one vowel correct.

Correct identification of both vowels is slightlyenhanced (~15%) by F0 differences of a semitone ormore, where performance saturates.

Potential applications

• Pitch tracking of speakers& recovery of waveforms

• Separation of multiple speakers for speech recognition

• Enhance voiced speech segments in background noise

• Separation of tonal music from background noise

• Analogous 2-D nets for image stabilization and object formation from correlational invariants

• Reverberating temporal memories via transiently-facilitated synapses: periodic stimuli dynamically create loops on the fly

Conclusions

• Population-wide distributions of all-order interspike intervals in early stations of the auditory pathway explain many diverse aspects of pitch and timbre perception: level-invariance, discriminatory precision, pitch equivalence, octave similarity, pitch fusion.

• These distributions form autocorrelation-like temporal representations that convey spectral information up to the limits of strong phase-locking (~5 kHz).

• The mechanisms by which the central auditory system might make use of this information remain enigmatic.

• Neural timing nets can be envisioned that operate entirely in the time domain to carry out musical interval processing and scene analysis.

Reading/assignment for next meeting

• Thursday. Feb. 26• Timbre

• Reading:Moore Chapter 8, pp. 269-273Chapter in Deutsch by Risset & Wessel on Timbre

(first sections up to p. 113-118)Handel, chapter on Identification

hst 725 lecture 6 temporal models for musical pitchpitch : basic properties • highly precise...

Documents