hst 725 lecture 6 temporal models for musical pitchpitch : basic properties • highly precise...
TRANSCRIPT
www.cariani.com
HST 725 Lecture 6 Temporal models for
musical pitch
Harvard-MIT Division of Health Sciences and TechnologyHST.725: Music Perception and CognitionProf. Peter Cariani
(Image removed due to copyright considerations.)
The search for the missing fundamental: theories & models of musical pitch
• Temporal theories of pitch– Residues: Beatings of unresolved harmonics (Schouten, 1940’s)– Problems with residues and envelopes– Temporal autocorrelation models (Licklider, 1951)– Interspike interval models (Moore, 1980)– Correlogram demonstration (Slaney & Lyon, Apple demo video)– Population-interval models (Meddis & Hewitt, Cariani & Delgutte)
• Interval-based representations of the spectrum– Vowels
• Competition between interspike interval patterns– Pitch multiplicity– Masking
• Problems & prospects– The fate of temporal information in the auditory system– Rethinking the problem of pitch: neural timing nets
• Pitch and auditory scene analysis• Formation and separation of auditory objects using temporal patterns
Synthetic vowelsWaveforms Power Spectra Autocorrelations
Pitch periods, 1/F0
0 1 2 3 4Frequency (kHz)
0 10 20Time (ms)
0 5 10 15Interval (ms)
100 Hz125 HzFormant-relatedVowel qualityTimbre
[ae]F0 = 100 Hz
[ae]F0 = 125 Hz
[er]F0 = 100 Hz
[er]F0 = 125 Hz
Synthetic vowels
Pitch : basic properties• Highly precise percepts
– Musical half step: 6% change F0– Minimum JND's: 0.2% at 1 kHz (20 usec time difference, comparable to ITD jnd)
• Highly robust percepts– Robust quality Salience is maintained at high stimulus intensities– Level invariant (pitch shifts < few % over 40 dB range)– Phase invariant (largely independent of phase spectrum, f < 2 kHz)
• Strong perceptual equivalence classes– Octave similarities are universally shared– Musical tonality (octaves, intervals, melodies) 30 Hz - 4 kHz
• Perceptual organization (“scene analysis”)– Fusion: Common F0 is a powerful factor for grouping of frequency components
• Two mechanisms? Temporal (interval-based) & place (rate-based)– Temporal: predominates for periodicities < 4 kH (level-independent, tonal)– Place: predominates for frequencies > 4 kHz(level-dependent, atonal)
Strongpitches
Weakerlow
pitches
Periodic sounds produce distinct pitches
Many different sounds produce the same pitches
Strong• Pure tones• Harmonic complexes• Iterated noise
Weaker• High harmonics• Narrowband noise
Very weak• AM noise• Repeated noise
Pitch as a perceptual emergent
0 200 400 600 800 1000 1200 1400 1600 1800 2000
0 5 10 15 200
5
10
0 5 10 15 20
600
Pure tone200 Hz
0 200 400 600 800 10001200140016001800200
MissingF0
Line spectra Autocorrelation (positive part)
Frequency ranges of (tonal) musical instruments
27 Hz 4 kHz262Hz
440Hz
880Hz
110Hz
10k86543210.50.25
Duplex time-place representations"Pitch is not simply frequency"
Musical tonality: octaves, intervals, melodies
Strong phase-locking (temporal information)
temporal representationlevel-invariant, precise
place representationlevel-dependent, coarse
30 100 1k 10k
Frequency (kHz)
Pitch height and pitch chroma
Please see Figure 1, 2, and 7 in Roger N. Shepard. "Geometrical Approximations to the Structureof Musical Pitch." Psychological Review 89 no. 4 (1982): 305-322.
Duplex time-place representations
temporal representationlevel-invariant
• strong (low fc, low n)• weak (high fc, high n; F0 < 100 Hz)
place-basedrepresentation
level-dependentcoarse
30 100 1k 10k
Similarity to place pattern
Similarity to
intervalpattern
cf. Terhardt'sspectral and virtual pitch
A "two-mechanism" perspective (popular with some psychophysicists)
unresolved harmonicsweak temporal mechanism
phase-dependent; first-order intervals
30 100 1k 10k
place-basedrepresentationslevel-dependent
coarseharm
onic
num
ber
place-basedrepresentation
level-independentfine
resolved harmonicsstrong spectral pattern mechanism
phase-independentrate-place? interval-place?
Dominance regionf, F0
n= 5-10
Rate-place
1/F0
Population interval
Interval
Population-interval
Central spectrum
CF
Central spectrum
CF
Synchrony-place
Interval-place
Local
Global
Somepossibleauditoryrepresentations
Peristimulus time (ms)
10
1
Cha
ract
eris
tic fr
eq. (
kHz)
0 5010 20 30 40
Masking phenomenaLoudness
Pure tone pitch JNDs: Goldstein
Complex tone pitch
Phase-place
Stages of integration
All-at-once
General theories of pitch
1. Distortion theories– reintroduce F0 as a cochlear distortion component (Helmholtz)–sound delivery equipment can reintroduce F0 through distortion–however, masking F0 region does not mask the low pitch (Licklider)–low pitch thresholds and growth of salience with level not consistent with distortion processes (Plomp, Small)–binaurally-created pitches exist
2. Spectral pattern theories–Operate in frequency domain–Recognize harmonic relationson resolved components
3. Temporal pattern theories–Operate in time domain–Analyze interspike interval dists.
AutocorrelationrepresentationTime domain
Stimulus
array of cochlear band-pass filters
interspike intervalsdischarge rates
F0 = 200 HzF0 = 160 HzF0 = 100 Hz
harmonic templates
frequency(linear scale)optimal
match
auditorynerve fibertuning curves
0 155 10Interval (ms)
# in
terv
als
1/F01/F1
Populationinterspike interval
distribution
Pitch → best fitting template
Population rate-place profile
0 3
dB
Frequency (kHz)
50
0
F0= 80 Hz
Power spectrumrepresentation
Frequency domain
Basic problems to be solved• "Hyperacuity problem" • Account for the precision of pitch discriminations given the relatively coarse tunings of auditory neurons (at all levels), especially lower-frequency ones (BFs < 2 kHz)
• "Dynamic range problem"• Account for the ability of listeners to discriminate small fractional changes (∆I/I) in intensity over a large dynamic range, and especially at high SPLs, where the vast majority of firing rates are saturated.
• "Level-invariance problem"• Account for the invariance (and precision) of auditory percepts over large dynamic ranges given the profound changes in neural response patterns that occur over those ranges (rate saturation, rate non-montonicities).
•Pitch equivalence•Account for the ability to precisely match pitches of pure and complex tones (pitch equivalence, metamery) given differences in spectra and under conditions where stimulus levels are roved 20 dB or more
•Multiple pitches•Account for the abilityof the auditory system to simultaneously represent multiple pitches (∆F0 > 10% apart)
•Relative nature of pitch & transpositional invariance•Account for the ability to precisely match pitches an octave apart (and/or to recognize patterns of pitch sequences) in the absence of an ability to identify absolute frequencies/periodicities. Account for ability to recognize transposed melodies as similar.
Neural tuning as a function
of CF
10
8
6
4
2
00
20
40
60
80
100
120
0.1
Characteristic frequency (kHz)
Q 1
0dB
Thre
shol
d to
ne le
vel (
dB S
PL)
1.0 10
Broad tuning and rate saturation at moderate levels in low-CF auditory nerve fibers confounds rate-based resolution of harmonics.
Low SR auditory nerve fiber
Alan Palmer
0300 500 1000
Frequency (Hz)
Spik
es/s
1500 2000
85 dB75 dB
65 dB
55 dB45 dB
35 dB
25 dB
95 dB
B.F. = 1700 Hz
Spon.act.
No sync.
2500 3000
50
100
150
ANFs
• Schouten’s temporal theory (1940’s) depended on interactions between unresolved (high) harmonics. It was displaced by discovery of dominance region and binaural combination pitches in the 1960’s. The idea persists, however in the form of spectral mechanisms for resolved harmonics and temporal ones for unresolved harmonics.
Temporal pattern theoriesΣ First-order intervals(renewal density)
Please see van Noorden, L. Two channel pitch perception. In Music, Mind and Brain. Editedby M. Clynes. New York: Plenum.
Licklider (1951)Σ All-order intervals (temporal autocorrelation)
Please see Moore, BCJ. An introduction tothe psychology of hearing. 5th ed. San Diego:Academic Press. 2003.
Please see Figure 1 in Meddis, R., and M. J. Hewitt. Virtual pitch and phase sensitivity of acomputer model of the Auditory periphery. I. Pitch identification. J. Acoust. Soc. Am.89 no. 6 (1991): 2866-2882.
B1
C1
A
C2 C3 C4 C5 C6
D1 D2
T
D3 D4 D5 D6
B2 B3 B4 B5 B6 B7
Basic schema of neuronal autocorrelator. A is the inputneuron, B1, B2, B3, ..... is a delay chain.
.....
.....
Interval-basedtheories of pitch
First-order intervals(renewal density)
Please see van Noorden, L. Two channel pitch perception. In Music, Mind and Brain. Editedby M. Clynes. New York: Plenum.
Please see Moore, BCJ. An introduction to the psychology of hearing. 5th ed. San Diego:Academic Press. 2003.
All-order intervals(temporal autocorrelation)
Licklider (1951)
Please see Figure 1 in Meddis, R., and M. J. Hewitt. Virtual pitch and phase sensitivity of acomputer model of the Auditory periphery. I. Pitch identification. J. Acoust. Soc. Am. 89no. 6 (1991): 2866-2882.
B1
C1
A
C2 C3 C4 C5 C6
D1 D2
T
D3 D4 D5 D6
B2 B3 B4 B5 B6 B7
Basic schema of neuronal autocorrelator. A is the inputneuron, B1, B2, B3, ..... is a delay chain.
.....
.....
Licklider’s (1951) duplex model of pitch perception
J.C.R. Licklider“Three AuditoryTheories” In Psychology: A Study of a Science, Vol. 1, S. Koch, ed., McGraw-Hill, 1959, 41-144.
Licklider’s binaural triplex model
B1
C1
A
C2 C3 C4 C5 C6
D1 D2
T
D3 D4 D5 D6
B2 B3 B4 B5 B6 B7
Basic schema of neuronal autocorrelator. A is the inputneuron, B1, B2, B3, ..... is a delay chain.
.....
.....
(Image removed due to copyright considerations.)
Delay lines
B1
C1
A
C2 C3 C4 C5 C6
D1 D2
T
D3 D4 D5 D6
B2 B3 B4 B5 B6 B7
Basic schema of neuronal autocorrelator. A is the inputneuron, B1, B2, B3, ..... is a delay chain.
.....
.....
Autocorrelation and interspike intervals
1/F0
Fundamentalperiod
time lagτ
S(t ) S( t - τ)Corr(τ) =Σt
ShiftMultiplySum the productsfor each delay τto computeautocorrelationfunction
Autocorrelation functions = Histograms ofall-order intervals
Autocorrelationsof spike trains
0000001000000000100000000000000000000100000000001000000000000010000000001000000000000000000001000000000010000000
Please see Figure 6.16A-D in Lyon, Richard, and Shihab Shamma. "Auditory Representationsof Timbre and Pitch." In Auditory Computation. Edited by R. R. Fay. New York: Springer Verlag. 1996.
Correlograms: interval-place displays (Slaney & Lyon)
Please see Slaney, Malcolm, and Richard F. Lyon. "On the Importance of Time -A Temporal Representation of Sound." In Visual Representations of Speech Signals. Edited by M. Crawford. New York: John Wiley. 1993.
Correlograms
Please see Figure 6.17 in Slaney, Malcolm, and Richard F. Lyon. "On the Importance of Time -A Temporal Representation of Sound." In Visual Representations of Speech Signals. Editedby M. Crawford. New York: John Wiley. 1993.
Auditory nerve
Peristimulus time (ms)
10
1
Cha
ract
eris
tic fr
eq. (
kHz)
0 5010 20 30 40
Temporal codingin the auditory nerve
Work with Bertrand DelgutteCariani & Delgutte (1996ab)
Dial-anesthetized cats.100 presentations/fiber60 dB SPL
Population-interval distributions are compiled by summing together intervals from all auditory nerve fibers.
The most common intervals present in the auditory nerve are invariably related to the pitches heard at the fundamentals of harmonic complexes.
The population-interval distribution of the auditory nerveStimulus AutocorrelationStimulus waveform 1/F0
Pitch =1/F0
Interval (ms)151050
# in
terv
als
500
Peristimulus time (ms)
# sp
ikes
250
290280260240 270250
1/F0
.5
.2
1
2
5
Cha
ract
eris
tic fr
eque
ncy
(kH
z)Post-stimulus time histograms All-order interval histograms
Population-interval histogramPopulation-PST histogram
Stimulus Autocorrelation Pitch =1/F0
Interval (ms)151050
# in
terv
als
500
All-order interval histograms
Population-interval histogram
Cha
ract
eris
tic fr
eque
ncy
(kH
z)
.5
.2
1
2
5
1/F0
Fundamentalperiod
time lagτ
S( t ) S( t - τ)Corr(τ) =Σt
ShiftMultiplySum the productsfor each delay τto computeautocorrelationfunction
Autocorrelation functions
= Histograms ofall-order intervals
Autocorrelationsof spike trains
0000001000000000100000000000000000000100000000001000000000000010000000001000000000000000000001000000000010000000
Percept-driven search for neural codes:1. Use stimuli that produce equivalent percepts2. Look for commonalities in neural response3. Eliminate those aspects that are not invariant
Neuralresponsepattern
Metamericstimuli
(same percept,different power spectra)
Stimulus A(AM tone, fm = 200 Hz)
Stimulus B(Click train, F0 = 200 Hz)
Common aspectsof neural response
that covarywith the percept
of interest
Candidate"neural codes"
or representations
Other parametersfor which percept is
invariant
Neural codes, representations:those aspects of the
neural responsethat play a functional role
in subservingthe perceptual distinction
Intensity
Location
Duration
Populationinterval
distribution
Interspikeinterval (ms)
150
Six stimuli that produce a low pitch at 160 Hz
Pure tone 160 Hz
AM tone Fm:160 Hz Fc:640 Hz
Harms 6-12
AM tone
Click train
AM noise
F0 :160 Hz
Fm :160 HzFc:6400 Hz
Fm :160 Hz
Waveform Power spectrum Autocorrelation
Pitch frequency Pitch period
F0 : 160 Hz
Lag (ms)Frequency (Hz)TIme (ms)150
WEAKPITCHES
STRONGPITCHES
Pitch equivalence
Level-Invariance Pitch Equivalence
0
1
2
2
2
1
2
5
Interval (ms)
400
1000
150
0 25All-Order Interval (msec) Interval (ms)
10 15 0 5 10 15
# In
terv
als
40 db SPL
60 db SPL
80 db SPL
E Pure Tone F=160 Hz
Neural Data Autocorrelation
Fc=640 Hz, Fm=160Hz
Fm=160Hz
F0=160 Hz
F AM Tone
G Click Train
H AM Noise
The running population-interval distribution
Population intervalhistograms
(cross sections)
1000
0 155 10
Interval (ms)#
inte
rval
s A
Interval (ms)
1000
0 155 10# in
terv
als B
Pitch height and pitch chroma
Please see Figure 1, 2, and 7 in Roger N. Shepard. "Geometrical approximations to the structureof musical pitch." Psychological Review 89 no. 4 (1982): 305-322.
Pitch shift of inharmonic complex tones
800
800
800
800 1 /F m
300 340320
260 300280
220 260240
* *
* *
Interval (ms)0 5 10 15
Lag (ms)0 5 10 15
Peristimulus time (ms)340 380360
1/Fm
Population intervaldistributionsStimulus autocorrelationStimulus waveform
1/Fm
0 1
Freq (kHz)0 1
0 1
0 1
Fm = 125 HzFc = 750 Hz
n = 5.86
n = 5.5
n = 5
n=6
Pitch shift of inharmonic complex tonesPhase-invariant nature of all-order interval code
Cochlear nucleus IV: Pitch shift
.
Unit 35-40 CF: 2.1 kHzThr: 5.3 SR: 17.7
5
10
15Chop-S (PVCN)
45-15-8 CF:4417Thr: -18, SR: 39 s/s
10
20
30
0 500100 200 300 400Peristimulus time (ms)
Pauser (DCN)45-17-4 CF: 408 HzThr: 21.3 SR: 159
Inte
rval
(ms)
0 500100 200 300 400
30
Primarylike(AVCN)
Peristimulus time (ms)
de Boer'srule
(pitches)15sub-
harmonicsof Fc
1/Fc
de Boer'srule
(pitches)
5
10
15
Inte
rval
(ms)
1/Fm
Fm = 125 Hz Fc = 500-750 Hz Pitch ~ de Boer's ruleV ariable-Fc AM t one
500 500750 Fc(Hz)
Pooled ANF (n=47)
0 100 200 300 4000 500100 200 300 400
Dominance region for pitch (harmonics 3-5 or partials 500-1500 Hz)
n = 6 5.86 5.5
AM tone Power spectrum Autocorrelation
PitchEquivalence
Pitch of the"missingfundamental"
Level invariance
Phase invariance
Dominance region
Pitch shift ofinharmonicAM tones
Pitch salienceV. weak pitchWeak pitchStrong pitch
40 dB SPL 60 80
Population-interval representation of pitchat the level of the auditory nerve
Pure tone AM tone Click train AM noise
AM tone
QFM tone
1/F06-12 1/F03-5
F03-5=160 Hz 240 Hz 320 Hz 480 Hz
Summary
Temporal theories - pros & consMake use of spike-timing properties of
elements in early processing (to midbrain at least)Interval-information is precise & robust & level-
insensitiveNo strong neurally-grounded theory of how this
information is used
Unified model: account for pitches of perceptually-resolved & unresolved harmonics in an elegant way (dominant periodicity)
Explain well existence region for F0 (albeit with limits on max interval durations)
Do not explain low pitches of unresolved harmonics
Interval analyzers require precise delays & short coincidence windows
Little or no neural evidence for required analyzers
Pitch classes and perceptual similarity
Innate mechanisms:Harmonic similarity relations
are direct consequencesof the inherent structure
of interval codes and the neurocomputational
mechanisms usedfor their analysis
Associationist theory:Build up harmonic
associationsfrom repeated
exposure to harmoniccomplex tones
A Harmony wheel of Boomsliter and Creel (1962)
Temporal coding and music: a few references
Boomsliter P, Creel W (1962) The long pattern hypothesis in harmony and hearing. JMusic Theory 5:2-31.
Cariani P (2002) Temporal codes, timing nets, and music perception. J New Music Res30:107-136.
McKinney MF, Delgutte B (1999) A possible neurophysiological basis of the octaveenlargement effect. J Acoust Soc Am 106:2679-2692.
McKinney MF, Tramo MJ, Delgutte B (2001) Neural correlates of musical dissonance inthe inferior colliculus. In: Physiological and Psychophysical Bases of AuditoryFunction (Houtsma AJM, ed), pp 71-77. Maastricht: Shaker Publishing.
Ohgushi K (1978) On the role of spatial and temporal cues in the perception of the pitchof complex tones. J Acoust Soc Am 64:764-771.
Ohgushi K (1983) The origin of tonality and a possible explanation for the octaveenlargement phenomenon. J Acoust Soc Am 73:1694-1700.
Patterson RD (1986) Spiral detection of periodicity and the spiral form of musical scales.Psychology of Music 14:44-61.
Rose JE (1980) Neural correlates of some psychoacoustical experiences. In: NeuralMechanisms of Behavior (McFadden D, ed), pp 1-33. New York: SpringerVerlag.
Tramo MJ, Cariani PA, Delgutte B, Braida LD (2001) Neurobiological foundations forthe theory of harmony in western tonal music. Ann N Y Acad Sci 930:92-116.
INTERVALDISTRIBUTIONS
ANDOCTAVE
SIMILARITY
Octavesimilarity
Please see Cariani, Peter. Journal of New Music Research 30, no. 2 (2001): 107-135.
Pitch height and pitch chroma
Please see Figure 1, 2, and 7 in Roger N. Shepard. "Geometrical Approximations to the Structure of Musical Pitch." Psychological Review 89, no. 4 (1982): 305-322.
Please see Cariani, Peter. Journal of New Music Research 30, no. 2 (2001): 107-135.
Musical tonal relations
Please see Shepard, Roger N. "Geometrical Approximations to the Structure of Musical Pitch." Psychological Review 89, no. 4 (1982): 305-322.
Please see Cariani, Peter. Journal of New Music Research 30, no. 2 (2001): 107-135.
Note-chordrelations(Harms 1-12)
Please see Cariani, Peter. Journal of New Music Research 30, no. 2 (2001): 107-135.
Vowel properties Waveforms Power Spectra Autocorrelations
Pitch periods, 1/F0
0 1 2 3 4Frequency (kHz)
0 10 20Time (ms)
0 5 10 15Interval (ms)
100 Hz 125 HzFormant-relatedVowel qualityTimbre
[ae]F0 = 100 Hz
[ae]F0 = 125 Hz
[er]F0 = 100 Hz
[er]F0 = 125 Hz
Phase-locking of discharges in the auditory nerve
Peristimulus time (ms)
10
1
Cha
ract
eris
tic fr
eq. (
kHz)
0 5010 20 30 40
Time domain analysis of auditory-nerve fiber firing rates.Hugh Secker-Walker & Campbell Searle, J. Acoust. Soc. 88(3), 1990Neural responses to /da/ @ 69 dB SPL from Miller and Sachs (1983)
High CFs
Low CFs
F1
F2
F3
Peristimulus time (ms)
Reprinted with permission from Secker-Walker HE, Searle CL. 1990. Time-domain analysis of auditory-nerve-fiber firing rates.J. Acoust. Soc. Am. 88 (3): 1427-36. Copyright 1990, Acoustical Society of America. Used with permission.
Vowels
.
Formant-structure
0 5 10 15
Voice pitch
1/F1
Interval (ms)
Formant-structure
# i n t e r v a l s
Population interval histogram0 5 10 15
01
Signal autocorrelation [ae]Voice pitch
Population-interval coding of timbre (vowel formant structure)
[ α]
0 5
[ u]
Interval (ms)
[ ∋]
[ æ]
0 5
Population-wide distributions ofshort intervals for 4 vowels
# i n t e r v a l s
m a g n i t u d e
Coding of vowel quality (timbre)(Reprinted with permission from Secker-Walker HE, Searle CL. 1990. Time-domain analysis of auditory-nerve-fiber firing rates.J. Acoust. Soc. Am. 88 (3): 1427-36. Copyright 1990, Acoustical Society of America.)
Population-interval representations thus far
• Periodicity: capable of accounting for a wide range of pure and complex tone pitches that are heard; salience model less developed.
• Spectrum: capable of robustly representing the stimulus spectrum below 4 kHz (our neural data, Palmer neural data Meddis simulations)
• Sound separation: multiple interval bands apparent for ∆F0 > 10%
Level ( dB SPL )
Avg. rate (s/s)
250
COMPRESSION
50
800
RECTIFICATION
1 100.1Frequency (kHz)
1INPUT SIGNAL FILTER BANK
1
100 1k 10k
Mag
LOW PASS FILTER
0 5 10 15 20 25 30 35 40 45 50100
1000
8000
500
Peris timulus t ime (ms)
Cen
ter f
requ
enc
PST NEUROGRAM
0 5 100
2
4
POPULATION- INTERVAL
DiSTRIBUTION
All-order interspike interval (ms)#
inte
rval
s/m
ean
Frequency (Hz)
15
Auditory nerve model
Basic features:Gammatone filtersAdaptive gain controlMapping of output amplitude to firing rates
Spontaneous activity
Please see Figure 60 and 61 in Keidel, D., Wolf., S. Kallert, M. Korth. The physiological basis of hearing: a review. Edited by Larry Humes. New York: Thieme-Stratton, Stuttgart, GeorgeThiemeVerlag, 1983. ISBN: 0865770727.
Please see Hind, J. E., D. J. Anderson, J. F. Brugge, and J. E. Rose. "Coding of Information Pertaining to Paired Low-frequency Tones in Single Auditory Nerve Fibers of the Squirrel Monkey." J Neurophysiol 30 (July, 1967): 794 -816.
Phase-locking to a 300 Hz Pure Tone
Spike
Stimulus Waveform (0.3 kHz)
Period Histogram(1100 Hz)
# Spikes
1.1 kHz90dB SPL
200
100
0
0 5 10 15 20ms
0
50
100
150
Num
ber
of I
nter
vals
Interval Duration (ms)
First-order Interval Histogram
1.5 kHz80dB SPL
Stimulus Period
Time
Temporal discharge patterns 40-90 dB SPL
Please see Rose, Hind, Anderson, Brugge J. Neurophysiology 34 (1971): 685-99. Reprinted inD. Keidel, Wolf., S. Kallert, and M. Korth. The Physiological Basis of Hearing: A Review. Edited by Larry Humes. New York: Thieme-Stratton, Stuttgart, George ThiemeVerlag, 1983. ISBN: 0865770727.
Medium-SR model fiber
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2
-200
0
200
30 dB SPL
50 dB SPL
70 dB SPL
90 dB SPL
Period time (ms)
Temporal discharge patterns to pure tone dyad: 600 & 800 Hz
Please see Rose, Hind, Anderson, Brugge J. Neurophysiology 34 (1971): 685-99. Reprinted inD. Keidel, Wolf., S. Kallert, and M. Korth. The Physiological Basis of Hearing: A Review. Edited by Larry Humes. New York: Thieme-Stratton, Stuttgart, George ThiemeVerlag, 1983. ISBN: 0865770727.
Physiological vs. simulated activity
0 5 10 15 20 25 30 35 40 45 50100
1000
Peristimulus time (ms)
PST NEUROGRAM
1000 2000 3000 40000
50
100
150
200
CF (Hz)
rate
(s/s
)
RATE-PLACEPROFILE
0 5 10 15 20 250
1
2
3
4
5
Interval (ms)
POPULATION-INTERVALDISTRIBUTION
4000
500
SIMULATED RESPONSES
Single-formant vowelF0 = 80 HzF1 = 640 Hz60 dB SPL
CATDATA
Peristimulus time (ms)
1
10
.3
5010 20 30 400
0 255 10 15 20Interspike interval (ms)
1/F01/F1
Rate-placeprofile
Characteristic frequency (kHz)0.1 1 10
180
100
0
Frequency (kHz)
C
D
F1E
F2500
50
0
F0
Pow
e(d
B)
A
B
Pitch period
1/F0 = Pitch period
Simulated vs. observed auditory nerve activity We are mainly interested here in replicating basic phase-locked time-frequency patterns.
0 5 10 15 20 25 3010
2
103
104
Cha
ract
eris
tic fr
eque
ncy
(Hz)
Peristimulus time (ms)
NEUROGRAM
102
103
104
0
100
200
300
Characteristic frequency (Hz)
Dis
char
ge r
ate
(s/s
)
RATE-PLACE PROFILE
0 0.005 0.010
1
2
3
4
Interval (ms)
peak
/mea
n
POPULATION-INTERVAL DISTRIBUTION
Competition between soundsCompetition between respective interval patterns
Towards interval-based accounts of interacting sounds:• mutual masking• pitch multiplicity• roots of dyads and triads (chords)
Pitch masking
s/n: 8 dB p/b = 1.11
s/n: 20 dB p/b = 1.57
s/n: 32 dB p/b = 2.38
s/n: no noise p/b =3.26
Inte
rval
s (m
s)In
terv
als
(ms)
Variable-F0 click train in broad-band noise Click train: F0= 160-320 Hz, positive polarity, 80 dB SPL Peak/background ratio: intervals @ 1/ F0 ± 150 usec/mean over all intervals Informal pitch thresholds, s/n dB: 12 (MT), 13 ( PC), 16 ( BD)
Peristimulus time (ms)
What degree of temporal correlation is necessary for pitch to become audible?
Click trains were presented in white noise (0-11 kHz, 70 dB total SPL) at different s/n ratios. My threshold for pitch detection with low-isolation headphones is -6dB.
Masking audiograms
Wegel & Lane, 1924
f1 + f2 together40
60
80
100
120
140
160
180
1
1.5
2
2.5
1
1.5
2
2.5
1000 1100 1200 1300 1400 1500
1
1.5
2
2.5
0
1
2
x 106
0
1
2
x 106
0 5 10 150
1
2
x 106
We assume that a pitch can be heard if its salience is >1.3.Masking functions can be then estimated for those conditions
that involve pitch detection (vs. ∆ loudness)
0 20 40 60 80 100 120 140 160 180 200102
103
104
Characteristic frequency (Hz)
Dis
char
ge ra
te (s
/s)
RATE-PLACE PROFILE
0 0.01 0.02 0.030
1
2
3
Interval (ms)
peak
/mea
n
POPULATION-INTERVAL DISTRIBUTION
500 Hz tone 65 dB SPL
Cochlear dead zone < 2 kHz
0 5 10 1505
10x 105
Interval (ms)
30
50
55
60
65
70
90
SPL
0 5 10 150
10x 105
Interval (ms)
CFs > 2kHz CFs > 100 Hz
Evolution of population-interval models
• Analysis of peaks -> whole histograms
• Comparisons between whole histograms– Pattern similarity (octave, probe-tone:chord, FF timing nets)
• Importance of limited temporal windows– Finite frequency resolution of autocorrelation analysis
– Carlyon's experiments: low F0's of high harmonics
– Lower limits of F0-pitch (30 Hz, Pressnitzer & Patterson)
• Pitch multiplicity and salience estimation
• CF-dependent temporal analysis windows• Recurrent timing nets - periodicity & object formation/sep.
are intimately linked mechanisms
Divergences between autocorrelation & interval models
0 20 40 0 10 20 30 0 10 20 300123
Masking of click patterns by random clicks
ABX 1.28KX 1.30KXX 1.21KXXX 1.20
Physiological and functional representations
Rate-place profiles
Interval-place Spatiotemporal pattern
(Place & time)Global interval
Central spectrum Frequency-domain
representation
Central autocorrelation Time-domain
representation
Different representations can support analogous strategies for pitch extraction, recognition, and comparison
Central spectrum Frequency-domain
representation
Central autocorrelation Time-domain
representation
Explicitidentificationof individualharmonics &
deduction of F0 via common
subharmonics or pattern recognition
Template-based global recognition of
pitch-related patterns (neural networks,
harmonic templates, interval sieves)
Relative pitchcomparison mechanisms (matching,
octaves, musical intervals)
Spectral patterns Temporal patterns
Neural timing nets
RECURRENT TIMING NETS• Build up pattern invariances• Detect periodic patterns• Separate auditory objects
FEED-FORWARD TIMING NETS• Temporal sieves• Extract (embedded) similarities• Multiply autocorrelations
Relative delay τ
Timet
Sj(t)
Si(t)
Si(t) Sj(t - τ)individual
multiplicativeterm
Si(tm) Sj(tm - t)Στconvolution
time-seriesterm m
two setsof input
spike trains
Input time sequence
All time delays presentτ0
τ1
τ2
τ3
Coincidenceunits
Direct inputs
Recurrent,indirect inputs
Time patterns reverberate through delay loops
0lagi leads j j leads i
Delay τRelative delay between inputs i, j
Sj(t)
Si(t)
Στ Si(tm) Sj(tm - τ)
Σt
Si(t) Sj(t - τn)
Timet
cross-correlationlag term n
Si(t) Sj(t - τ)individual
multiplicativeterm
convolutiontime-series
term m
coincidencesof two nearlysimultaneous
pulses requiredto produce
output spikein coincidence
detectors
population autocorrelation
Σt
Στ
[Si(t) Sj(t - τ) Si(t-Τ) Sj(t - τ - Τ)]
Feedforward coincidence net
Relative delay τ
Timet
Sj(t)
Si(t)
Si(t) Sj(t - τ)
convolution time-series
term
two sets of input
spike trainsM = 2 τ0 intervals N = 3 τ0 intervals
τ0
Population interval outputM*N= 2*3 = 6 τ0 intervals
C Higher-order interval
sequence
B
Individual output lines contain higher ordersequence if and only if present in both inputs
• Networks act as temporal sieves, passing only common patterns• These are the only networks for which it is irrelevant which
particular neurons are activated• Signals are no longer tied to particular transmission lines• Processing on the level of ensemble mass-statistics is made possible• Extraction of complex, embedded patterns is made possible• Time-domain multiplexing is greatly facilitated
Common timbre
0 5 10 15 0 5 10 15
[ae] - 100
[ae] - 125
[er] - 125
[er] - 100
Interval (ms)0 5 10 15 0 5 10 15
Population autocorrelations of the outputof a coincidence array for all vowel combinations
Same vowel (same F0s & formants)
Same F0s, different formants
Same formants, different F0s
Different F0s and formants
[ae] - 100
[ae] - 125
[er] - 100
[er] - 125
Detection of arbitrary periodic patternsPeriodic patterns invariably build up indelay loops whose recurrence times equals the period of the pattern and its multiples.
τ0
τ2
τ3
Input pattern
1010110010110101100101101011001011010...
τ1 = 11 ms = recurrence time of input pattern10101100101
τ1
Time patterns reverberate throughdelays loops
Input waveform
Direct inputs
All time delay present
Recurrent, indirect inputs
Coincidence units
τ3
τ1 = 11 timesteps = recurrence timeof repeating 10101100101 pattern
Buildupof activationin loop withrecurrence timeof 11 timesteps
1010110010110101100101101011001011010...Timesteps
0 200100
30
1
Cha
nnel
recu
rren
ce ti
me
Figure 8. Recurrent tming nets. Top. Behavior of a simple recurrent timing net for periodic pulsetrain patterns. The network generates many sets of expectations. The delay loop whose recurrencetime equals the period of the pattern builds up that pattern. Below. Response of a recurrent timingnet to the beat pattern of La Marseillaise. Arrows indicate periodic subpatterns at 16, 32, and 64timesteps that are built up by the network. The example points to potential applications of recurrenttiming nets for rhythm induction and analysis. From Cariani, 1999, working paper on timing netsand rhythm.
τ0
τ1
Timesteps (samples)
1100110001000100010001000000110011001100010000000100110000000000...
Delay loop(recurrence t ime,
in samples)
La Marseillaise rhyt hm
Periodicpattern builds up
Rhythmicsubpatterns
Build-up and separation of two auditory objects
Two vowels with different fundamental frequencies (F0s) are added togetherand passed through the simple recurrent timing net. The two patterns build upIn the delay loops that have recurrence times that correspond to their periods.
Vowel [er]F0 = 125 HzPeriod = 8 ms
Period = 10 ms
Vowel [ae]F0 = 100 Hz
Time (ms)
Cha
ract
eris
tic d
elay
cha
nnel
(ms)
0 10 20 30 40 50Time (msec)
Pitch and auditory scene analysis
Two different strategies for scene analysis:1. Group channels by common properties (local features)2. Build up invariant temporal patterns as objects (invariant relations)
Separation of multiple recurrent time patterns:two vowels with different F0's
0 10 20 30 40 50Time (msec)
Vowel [ae]F0 = 100 Hz
Double vowel[ae]+[er]
Vowel [er]F0 = 125 Hz
Recurrenttiming netwith single input channel
Revised buildup rule:Min(direct, circulating) plus a fraction of their absolute difference
Auditory "pop-out"phenomena suggest a period-by-periodwaveform comparison
Last 2 periods - first 2
transientdisparity
ongoingdisparity
Traditional approach (Frequency domain)
Segregate frequency channels
Assign channels to objects
AE-ER
AEER
Double vowels drive the same frequency channelsin the auditory nerve
FREQUENCY CHANNELS
0 50Peristimulus time (ms)
PST NEUROGRAM ae100-er106
1k
4k
125
CF (Hz)
The information needed to separate sounds on the basis of F0 differences
is multiplexed in the phase-locked fine time structure of the neural firing patterns
(Traditional spectrographic approaches throw awaythis information at an early stage.)
FREQUENCY CHANNELS
0 50Peristimulus time (ms)
PST NEUROGRAM ae100-er106
1k
4k
125
CF (Hz)
How might a neural system (e.g. the auditory system) exploit this information?
Strongly agree that we shouldn't be limited by ignorance of auditory system, but we need to reverse-
engineer that system if we are to find new functional principles
Behavior of the loop array
Activity in the loop arrayis described in terms of3 dimensions:• frequency channel• delay loop• peristimulus time
15
Summary autocorrelogram(collapse over all CFs)
LoopDelay(ms)
0 50 100Peristimulus time (ms)
10
8ae100(10 ms period)
6er106
(9.4 ms period)
Neural spectrogram(collapse over all delay loops)
CF(Hz)
4k
125
time
Frequency
Loopdelay
Processing of tonotopically-organized inputs
Σ Σ Σ Σ
τ
FREQUENCYCHANNELS
0
0.05
0
0.05
0
er106
0 5 10 15
Interval (ms)0
5
ae100
Loop delays = 9.7 ms(period of er106)
Loop delays = 10 ms(period of ae100)
COMPUTECORRELATIONCOEFFICIENT
FORSHORT
INTERVALS
0 5 10 15
Interval (ms)
POPULATION-INTERVALDISTRIBUTIONS FORDIFFERENT DELAYS
COMBINE INTERVALSFOR EACH DELAYFROM ALL FREQ'S
BUILD UP CORRELATIONPATTERNS
IN DELAY LOOPS FREQUENCY-BY-FREQUENCY
POPULATION-INTERVALDISTRIBUTIONS FOR
CONSTITUENT SINGLE VOWELSPRESENTED INDIVIDUALLY
0 50Peristimulus time (ms)
PST NEUROGRAM ae100-er106
1k
4k
125
CF(Hz)
2
3
2000 Peristimulus time (ms)
CF(Hz)
H(t,t+t) = min(X(CF, t), H(t,t)) + 0.1*abs(X(CF, t) - H(t,t));
X(CF, t)H(τ,t) H(τ,t+τ)
BUILDUP RULE:pattern entering the delay loop is theminimum of direct & circulating inputs,plus a fraction of their difference
direct
circulating
loopsignalH(τ,t)
loopsignal
Concurrent vowels
WAVEFORM SPECTRUM AUTOCORRELATION
0 5 10 15 20Time (ms)
Am
plitu
de
102 103-20
-10
0
10
20
30
Autocorrelation lag (ms)
Mag
nitu
de
ae-100
er-124
ae-100 +
er-124
Separation of objects with increasing ∆F0
∆F0 increases0 10.5semitones 4
Those sets of delay loops whoserecurrence times equal thefundamental periods of the twovowels are invariably activated themost.
As time goes on, the signalspresent in these sets of delay loopsare amplified.
When the two vowels have the samefundamental (100 Hz), only one setof delay loops is activated (10 msdelay).
When the two vowels havefundamentals separated by asemitone or more (ae100-er106 &ae100-er126) two sets of delay loopsare activated. The recurrence times ofthese sets of delay loops equal therespective vowel periods (10, 9.4, 7.9ms) .
ae100-er100
0 5 10 150
0.1
ae100-er103
0 5 10 15
Loop delay (ms)
ae100-er106
0 5 10 15
ae100-er126
0 5 10 15
10-30 ms pst
30-50 ms pst
80-100 ms pst
Simulation performance vs. psychophysics
0 1 2 3 40
0.2
0.4
0.6
0.8
1
0 1 2 3 40
0.2
0.4
0.6
0.8
1
0 1 2 3 40
0.2
0.4
0.6
0.8
1
0 1 2 3 40
0.2
0.4
0.6
0.8
1
0 1 2 3 40
0.2
0.4
0.6
0.8
1
0 1 2 3 40
0.2
0.4
0.6
0.8
1
² F0 (semitones) ² F0 (semitones)
coef
ficie
nt
ae-ah ae-ee
ae-er ah-ee
ah-er er-ee
2nd vowel
1st vowel
0-200 MS INTEGRATION TIME
For 200 ms integrations, one (dominant) vowelproduced high correlations for all ²F0's.
Correlations of nondominant vowels increasedas ²F0's increased to a semitone, then levelling off.
Assmann &Summerfield
ScheffersB
oth
vow
els
corr
ect (
%)
Mea
ns o
ver v
owel
set
sZwicker
0 1 2 3 4
100
80
60
40
20
0
F0 Separation (semitones)
Identification of double vowels
Simulations results Psychophysical observations
Listeners almost invariably get one vowel correct.
Correct identification of both vowels is slightlyenhanced (~15%) by F0 differences of a semitone ormore, where performance saturates.
Potential applications
• Pitch tracking of speakers& recovery of waveforms
• Separation of multiple speakers for speech recognition
• Enhance voiced speech segments in background noise
• Separation of tonal music from background noise
• Analogous 2-D nets for image stabilization and object formation from correlational invariants
• Reverberating temporal memories via transiently-facilitated synapses: periodic stimuli dynamically create loops on the fly
Conclusions
• Population-wide distributions of all-order interspike intervals in early stations of the auditory pathway explain many diverse aspects of pitch and timbre perception: level-invariance, discriminatory precision, pitch equivalence, octave similarity, pitch fusion.
• These distributions form autocorrelation-like temporal representations that convey spectral information up to the limits of strong phase-locking (~5 kHz).
• The mechanisms by which the central auditory system might make use of this information remain enigmatic.
• Neural timing nets can be envisioned that operate entirely in the time domain to carry out musical interval processing and scene analysis.
Reading/assignment for next meeting
• Thursday. Feb. 26• Timbre
• Reading:Moore Chapter 8, pp. 269-273Chapter in Deutsch by Risset & Wessel on Timbre
(first sections up to p. 113-118)Handel, chapter on Identification