EE141
Hearing and Speech
Janusz A. Starzyk
Based on the book Cognition, Brain and Consciousness, ed. Bernard J. Baars
Cognitive Architectures

Sound and hearing basics
Complex sound signals can be decomposed into a series of sinewave signals of various frequencies.
The human auditory system detects sounds in the range of 20 Hz to 20 kHz; bats and whales can hear up to 100 kHz.
Musicians can detect the difference between 1000 Hz and 1001 Hz.
Time-domain sinewave signal and the same signal in the time-frequency domain
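The decomposition into sinewaves can be illustrated with a discrete Fourier transform (DFT). The sketch below is not part of the lecture: the sample rate, the two test frequencies (50 Hz and 120 Hz) and the helper name `dft_magnitudes` are assumptions chosen for illustration.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return the normalized magnitude of each DFT bin (0 .. N/2) of a real signal."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc) / n)
    return mags

SAMPLE_RATE = 400  # Hz (assumed)
N = 400            # one second of signal, so bin k corresponds to k Hz

# A "complex" sound built from two sinewaves: 50 Hz (amplitude 1.0) and 120 Hz (amplitude 0.5)
signal = [math.sin(2 * math.pi * 50 * t / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 120 * t / SAMPLE_RATE)
          for t in range(N)]

mags = dft_magnitudes(signal)

# The two strongest bins identify the sinewave components of the signal
peaks = sorted(sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2])
print(peaks)
```

With these assumed values the two strongest bins fall at the two component frequencies, which is the sense in which the complex signal "contains" those sinewaves.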

Sound and hearing basics
20 msec is needed for the onset of a consonant.
200 msec is the time of an average syllable.
2000 msec is needed for a sentence.
These various time scales, and other parameters of the sound such as timbre or intensity, must be properly processed to recognize speech or music.
A spectrogram of a speech signal – frequency is represented on the y-axis

Sound and hearing basics
The dynamic range of the human hearing system is very broad: from 1 SPL (the sound pressure level at which hearing begins to occur) to 10^15 SPL, or 150 dB SPL.
Human and cat hearing sensitivity
Near total silence - 0 dB
A whisper - 15 dB
Normal conversation - 60 dB
A lawnmower - 90 dB
A car horn - 110 dB
A rock concert - 120 dB
A gunshot - 140 dB
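The link between a ratio of 10^15 and 150 dB follows from the decibel definition for intensity ratios, dB = 10 log10(I / I0). The sketch below assumes the ratio is an intensity (power) ratio; the function names are hypothetical.

```python
import math

def intensity_ratio_to_db(ratio):
    """Convert an intensity ratio I / I0 to decibels."""
    return 10.0 * math.log10(ratio)

def db_to_intensity_ratio(db):
    """Convert a level in decibels back to an intensity ratio I / I0."""
    return 10.0 ** (db / 10.0)

# The full dynamic range quoted on the slide: a ratio of 10^15
full_range_db = intensity_ratio_to_db(1e15)

# How many times more intense is normal conversation (60 dB) than a whisper (15 dB)?
conversation_vs_whisper = db_to_intensity_ratio(60 - 15)

print(full_range_db, conversation_vs_whisper)
```

Because the scale is logarithmic, each 10 dB step in the list above corresponds to a tenfold increase in intensity.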

Sound and hearing basics
A sound wave caused by vibrating objects moves through the air and enters the external auditory canal, reaching the tympanic membrane, or eardrum.
Vibrations propagate through the middle ear by the mechanical action of three bones: the hammer, anvil and stirrup (or malleus, incus and stapes).
Because of its length, the ear canal amplifies sounds with frequencies of approximately 3000 Hz.
There are two cochlear windows – oval and round.
The stapes conveys sound vibrations through the oval window to the inner ear fluids.
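The dependence of the amplified frequency on ear canal length can be estimated with a quarter-wavelength resonator model, f = c / (4L). This is a simplification that is not stated in the lecture: it treats the canal as a uniform tube closed at the eardrum, and the canal length of 2.5 cm and the speed of sound of 343 m/s are assumed typical values.

```python
def quarter_wave_resonance_hz(length_m, speed_of_sound_m_s=343.0):
    """First resonance of a tube open at one end and closed at the other: f = c / (4 L)."""
    return speed_of_sound_m_s / (4.0 * length_m)

EAR_CANAL_LENGTH_M = 0.025  # assumed adult ear canal length (about 2.5 cm)

resonance_hz = quarter_wave_resonance_hz(EAR_CANAL_LENGTH_M)
print(round(resonance_hz))
```

Under these assumptions the estimate lands in the low kilohertz range, the same order as the approximately 3000 Hz quoted above.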

Sound and hearing basics
The cochlea and the semicircular canals are filled with a water-like fluid.
The cochlea in the inner ear contains a basilar membrane.
A traveling wave of sound moves across the basilar membrane, moving the small hair-like nerve cells.

Sound and hearing basics
The inner surface of the cochlea is lined with over 16 000 hair-like nerve cells, which perform one of the most critical roles in our ability to hear.
Each hair cell has a natural sensitivity to a particular frequency of vibration.
The brain decodes the sound frequencies based on which hair cells along the basilar membrane are activated; this is known as the place principle.
Pathways at the auditory brainstem
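The place principle can be made concrete with a position-to-frequency map. The sketch below uses the Greenwood function with commonly quoted human parameters (A = 165.4, a = 2.1, k = 0.88). Neither the function nor the parameter values come from the lecture, so treat them as assumptions; the function names are hypothetical.

```python
import math

# Greenwood function parameters commonly quoted for the human cochlea (assumed here)
A = 165.4
ALPHA = 2.1
K = 0.88

def place_to_frequency_hz(x):
    """Characteristic frequency at relative position x along the basilar membrane.

    x = 0 is the apex (low frequencies), x = 1 is the base (high frequencies).
    """
    return A * (10.0 ** (ALPHA * x) - K)

def frequency_to_place(freq_hz):
    """Inverse map: relative position (0..1) most sensitive to freq_hz."""
    return math.log10(freq_hz / A + K) / ALPHA

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"position {x:.2f} -> {place_to_frequency_hz(x):8.0f} Hz")
```

With these parameters the two ends of the membrane map to roughly 20 Hz and 20 kHz, consistent with the hearing range given earlier.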

Inner ear details
Figures, including Figure 30-5, from E. R. Kandel et al., “Principles of Neural Science”, McGraw-Hill, 2000.

The central auditory system
The auditory system has many stages: from the ear, to the brainstem, to subcortical nuclei, and to the cortex.
Ascending (afferent) pathways transmit information from the periphery to the cortex.
The neural signals travel from the auditory nerve to the lower (ventral) cochlear nucleus.
The signal then travels through the lateral lemniscus, inferior colliculus and thalamus to the auditory cortex.
A key task of the ascending pathway is to localize sound in space.

The central auditory system
The descending (efferent) pathways from the auditory cortex go down to the periphery under cortical control.
This control extends all the way to the hair cells in the cochlea.
The descending pathway provides ‘top down’ information critical for selective attention and perception in a noisy environment.
Besides the ascending and descending pathways, there are connections between the left and right auditory pathways through the corpus callosum and other brain regions.

Auditory cortex
The auditory cortex specializes in sound processing.
It serves as a hub for sound processing and interacts with other systems within the cortex and back down the descending path to the cochlea.
These processes provide a wide range of perceptual abilities, such as selecting a single person's voice in a crowded space or recognizing a melody even when it is played off-key.

Auditory cortex
In humans, primary auditory cortex is located within Heschl’s gyrus.
Heschl’s gyrus corresponds to Brodmann’s area 41.
Another important region in auditory cortex is the planum temporale, located posterior to Heschl’s gyrus.
The planum temporale is much larger in the left hemisphere (up to 10 times) in right-handed individuals.
It plays an important role in language understanding.
Posterior to the planum temporale is Brodmann’s area 22, which Carl Wernicke associated with speech comprehension (Wernicke’s area).

Auditory cortex
Main cells of the cochlear nucleus and their corresponding post-stimulus time (PST) histograms.
The sound stimulus is typically a 25 ms tone burst at the center frequency, at a sound level 30 dB above threshold.
There are several types of neurons in the auditory system.
They have different response properties for coding frequency, intensity, and timing information in sounds, as well as for encoding spatial information for localizing sounds in space.

Auditory cortex
Receptive fields of auditory neurons have different sensitivity to the location of the sound source (in azimuth angle) and its loudness (in dB).
The top neuron is sensitive to a broad range of sound intensities located to the right, with greater sensitivity to louder signals.
The lower neuron is more narrowly tuned, to sound levels of 30-60 dB located slightly to the left of center.
Broadly tuned neurons are useful for detecting the sound source, while narrowly tuned neurons give the more precise information needed to locate it, such as the direction of the sound and its loudness level.

Auditory cortex
Auditory tonotopic cortical fields of a cat: a) lateral view; b) lateral view ‘unfolded’ to show parts hidden within sulci.
The four tonotopic fields are: anterior (A), primary (AI), posterior (P) and ventroposterior (VP).
Positions of the lowest and highest center frequencies in these fields are indicated in (b).
Other cortical areas show little tonotopy: secondary (AII), ventral (V), temporal (T), and dorsoposterior (DP).

Functional mapping of auditory processing
The location of the planum temporale (PT), close to Wernicke’s area for speech comprehension, points towards its role as a site for auditory speech and language processing.
However, neuroimaging studies of PT provide evidence that its functional role is not limited to speech.
PT is a hub for auditory scene analysis, decoding sensory inputs and comparing them to memories and past experiences.
PT further directs cortical processing to decode spatial location and identify auditory objects.
Planum temporale and its major associations: lateral superior temporal gyrus (STG), superior temporal sulcus (STS), middle temporal gyrus (MTG), parieto-temporal operculum (PTO), inferior parietal lobe (IPL).

Functional mapping of auditory processing
PT as a hub for auditory and spatial analysis.
In a crowded environment it is important to decode auditory objects such as a friend’s voice, an alarm signal or a squeaking wheel.
To do so, the auditory system must determine where sounds are occurring in space and what they represent.
All of these are associated with other sensory inputs such as vision, smell or touch, and with memory associations.

Functional mapping of auditory processing
To determine where a sound is coming from, two cues are used:
Interaural (between-ear) time difference
Interaural level difference
Sensitivity to the time difference must be finer than a millisecond.
The head produces a ‘sound shadow’, so that the sound reaching the farther ear is slightly weaker.
Neurons’ response to interaural time difference (ITD) and interaural level difference (ILD)
Abbreviations:
CN – cochlear nucleus
MSO – medial superior olive
LSO – lateral superior olive
MNTB – medial nucleus of the trapezoid body
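A rough calculation shows why the time cue requires sub-millisecond sensitivity. The sketch below uses the simplest path-difference model, ITD = d sin(θ) / c, which ignores the curvature of the head. The ear separation of 0.18 m, the speed of sound of 343 m/s and the function name are assumptions, not values from the lecture.

```python
import math

def interaural_time_difference(azimuth_deg, ear_distance_m=0.18, speed_of_sound_m_s=343.0):
    """Approximate ITD in seconds for a distant source at the given azimuth.

    0 degrees is straight ahead; positive angles are to the right.
    Uses the straight-line path difference d * sin(theta) between the ears.
    """
    return ear_distance_m * math.sin(math.radians(azimuth_deg)) / speed_of_sound_m_s

for angle in (0, 10, 45, 90):
    itd_us = interaural_time_difference(angle) * 1e6
    print(f"azimuth {angle:2d} deg -> ITD about {itd_us:5.0f} microseconds")
```

Even for a source directly to one side, the difference in this model stays well below one millisecond, and small changes in direction near the midline change it by only tens of microseconds.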

Functional mapping of auditory processing
It was demonstrated that musical conductors were better able to locate sound sources in a musical score.
They demonstrated higher sensitivity to sounds presented in peripheral listening than other groups, including other musicians.

Functional mapping of auditory processing
Auditory objects are categorized into human voices, musical instruments, animal sounds, etc.
Auditory objects are learned over our lifetime, and associations are stored in memory.
Auditory areas in superior temporal cortex are activated by both recognized and unrecognized sounds.
Recognized sounds also activate the superior temporal sulcus and middle temporal gyrus (MTG).
Fig. (c) shows the difference between activations for recognized sounds and unrecognized sounds.

Functional mapping of auditory processing
Binder and colleagues propose that the middle temporal gyrus (MTG) is the region that associates sounds and images.
This is in agreement with case studies of patients who suffered from auditory agnosia (inability to recognize sounds).
Research results showed that auditory object perception is a complex process that involves multiple brain regions in both hemispheres.
Brain activities in auditory processing – cross sections at different depths

Cocktail party effect
How does the auditory system separate sounds coming from different sources?
Bregman (1990) proposed a model for such segregation.
It contains four elements: the source, the stream, grouping, and stream segregation.
The source is the sound signal. It represents physical features such as frequency, intensity and spatial location.
The stream is the percept of the sound and represents psychological aspects that depend on the individual.
Grouping creates the stream:
Simultaneous grouping, e.g. instruments in the orchestra
Sequential grouping, e.g. grouping sounds across time
Stream segregation into objects.

Cocktail party effect
Bregman grouping principles:
Proximity: sounds that are close in time are grouped.
Closure: sounds that do not belong to the stream (like a cough during a lecture) are excluded.
Good continuation: sounds that follow each other smoothly (similar to proximity).
Common fate: sounds that come from the same location or coincide in time (orchestra).
Exclusive allocation: selective listening (focus on one stream).
Cortical areas of auditory stream analysis: the intraparietal sulcus (IPS) is involved in binding multimodal information (vision, touch, sound).

Cocktail party effect
There is growing evidence that, as in the visual system, ‘what’ and ‘where’ information in sound is decoded by cortical networks in separate but highly interactive processing streams.
Audio (blue) and visual (pink) processing areas in the macaque brain, and the ‘what’ and ‘where’ audio processing streams
Human brain processing:
Blue – language-specific phonological structure
Lilac – phonetic cues and speech features
Purple – intelligible speech
Pink – verbal short-term memory
Green – auditory spatial tasks

Speech perception
There is no agreement on how speech is coded in the brain. What are the speech ‘building blocks’?
A natural way would be to code words based on phonemes. The word ‘dig’ would be obtained by identifying a sequence of phonemes.
Perhaps a syllable is the appropriate unit?
We must decode not only ‘what’ but also ‘who’ and ‘when’ to understand the temporal order of phonemes, syllables, words and sentences.
The speech signal must be evaluated on time scales from 20 ms to 2000 ms, independently of pitch (high for a child, low for a man), loudness (loud or quiet) and rate (fast or slow).

Speech perception
Early attempts at simplifying speech processing were made at Bell Labs by Homer Dudley, who developed the vocoder.
The vocoder (voice + coder) was able to reduce the speech signal for transmission over long telephone circuits by analyzing and recoding speech.
Cochlear implants that stimulate the auditory system, used for some types of hearing loss, are based on vocoder technology.

Speech perception
A second invention, the spectrograph, developed at Bell Labs during World War II, produced a picture of the voice with frequency on the y-axis, time on the x-axis and intensity as a level of grey.
Problems in analyzing spectrograms:
Gaps or silences do not mark where a word begins and ends.
Individual phonemes change depending on which phonemes come before and after them.
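The idea behind a spectrogram (frequency against time, with intensity as the plotted value) can be sketched as a sequence of short-time spectra. The example below is illustrative only: the test signal, sample rate, frame length and function names are assumptions, and a real spectrograph or speech analysis tool would use overlapping, windowed frames.

```python
import cmath
import math

def frame_spectrum(frame):
    """Normalized DFT magnitudes (bins 0 .. N/2) of one short frame."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
            for k in range(n // 2 + 1)]

def spectrogram(samples, frame_len):
    """One spectrum per non-overlapping frame: rows are time, columns are frequency bins."""
    return [frame_spectrum(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, frame_len)]

SAMPLE_RATE = 1000  # Hz (assumed)
FRAME_LEN = 100     # 100 ms frames, so each bin is 10 Hz wide

# Test signal: 50 Hz for the first 200 ms, then 120 Hz for the next 200 ms
signal = [math.sin(2 * math.pi * (50 if t < 200 else 120) * t / SAMPLE_RATE)
          for t in range(400)]

rows = spectrogram(signal, FRAME_LEN)

# Strongest frequency in each time frame
dominant_hz = [max(range(len(row)), key=row.__getitem__) * SAMPLE_RATE / FRAME_LEN
               for row in rows]
print(dominant_hz)
```

Each row corresponds to one column of the grey-level picture described above; the change in the dominant frequency between frames is what appears as a change in the height of the dark band.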

What is wrong with the short-term spectrum?
Inconsistent (same message, different representation).
Shannon (1998) showed that the minimum information for speech decoding is contained in the shape of the speech signal, called the temporal envelope.
Figure: short-term spectrum (axis: frequency)
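One common way to estimate a temporal envelope is to rectify the signal and smooth it with a low-pass filter. The sketch below illustrates that general technique on a synthetic amplitude-modulated tone; it is not claimed to be the procedure used in the cited study, and the signal parameters and function names are assumptions.

```python
import math

SAMPLE_RATE = 2000   # Hz (assumed)
CARRIER_HZ = 200     # fast oscillation standing in for fine spectral detail (assumed)
MODULATION_HZ = 4    # slow amplitude change, roughly syllable rate (assumed)
N = 2000             # one second of signal

true_envelope = [0.5 * (1 + math.sin(2 * math.pi * MODULATION_HZ * t / SAMPLE_RATE))
                 for t in range(N)]
signal = [true_envelope[t] * math.sin(2 * math.pi * CARRIER_HZ * t / SAMPLE_RATE)
          for t in range(N)]

def temporal_envelope(samples, window):
    """Full-wave rectify, then smooth with a moving average of `window` samples."""
    rectified = [abs(s) for s in samples]
    return [sum(rectified[i:i + window]) / window
            for i in range(len(rectified) - window + 1)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

WINDOW = SAMPLE_RATE // CARRIER_HZ  # average over one carrier period
estimate = temporal_envelope(signal, WINDOW)

# Align each estimate with the true envelope near the center of its averaging window
aligned_truth = true_envelope[WINDOW // 2: WINDOW // 2 + len(estimate)]
similarity = pearson(estimate, aligned_truth)
print(round(similarity, 3))
```

The estimate follows the slow modulation up to a constant scale factor while discarding the fast carrier, which is the sense in which the envelope captures the overall shape of the signal rather than its detailed spectrum.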

Speech perception
The lack of invariant features in the speech spectrogram forced researchers to look for other accounts of speech perception.
The motor theory developed by Liberman (1985) assumes a domain-specific approach to speech.
This theory suggests that speech perception is tightly coupled with speech production.
While the acoustics of phonemes lack invariance, the motor gestures that produce speech are invariant and can be accessed in speech perception.
Another theory, developed by Tallal, assumes that speech and language are domain-general.
In this theory, left-hemisphere language organization is not the result of domain-specific development, but results from a domain-general bias of the left hemisphere for decoding rapidly changing sounds (such as those contained in speech).
It is likely that the neural system uses a combination of domain-specific and domain-general processing for speech perception.

Speech perception
A process model for word comprehension.
Language areas.

Speech perception
Binder and colleagues (1997) studied the activation of brain areas by words, reversed speech and pseudowords, and found that Heschl’s gyrus and the planum temporale were activated similarly by all stimuli.
This supports the notion of hierarchical processing of sounds, with Heschl’s gyrus representing early sensory analysis.
Speech signals activated a larger portion of auditory cortex than non-speech sounds in the posterior superior temporal gyrus and superior temporal sulcus, but there was no difference in activation between words, pseudowords and reversed speech.
The conclusion is that these regions do not reflect semantic processing of the words but reflect phonological processing of the speech sounds.
Brain response to: words, pseudowords, reversed speech

Speech perception and production
Speech perception and production are tightly coupled.
One explanation is that when we speak we hear our own voice.
Wernicke proposed a model for language processing that links a pathway from auditory speech perception to motor speech production.
The verbal signal enters the primary auditory cortex (A) and then Wernicke’s area (WA).
The response is formulated in Broca’s area (B) and the primary motor cortex (M).
We can listen and respond to our own speech using the same brain regions.
Producing an internal response to a question results in silent speaking to ourselves.

Damage to speech perceptual system
Damage to the speech perceptual system may be caused by strokes that block blood flow to a brain area and cause the death of neurons.
When a stroke impairs language functions, the condition is called aphasia.
Paul Broca discovered aphasia associated with a region in the frontal lobe important for speech production.
Carl Wernicke discovered a region in the temporal lobe important for speech perception.
Experiments by Blumstein tested phonetic and semantic deficits by giving patients four choices in the test: the correct word, a semantic foil, a phonetic foil and an unrelated foil (e.g. peas, carrots, keys, and calculator).
Phonetic foils

Learning and plasticity
An important theme in studying human cognition is how new information is encoded during learning and how the brain adapts – plasticity.
Much of what is known about plasticity of the auditory system comes from deprivation studies in animals.
Both the cochlea and the brainstem are organized tonotopically, and this organization is reflected in auditory cortex.
After the cochlea or brainstem is lesioned, some frequencies are no longer transmitted to auditory cortex, and the cortex is then studied for changes reflecting neural plasticity.
Changes in neural response in auditory cortex were observed in humans after sudden hearing loss.
Children with hearing loss showed some maturational lag compared with typical development; however, after receiving cochlear implants, their auditory system continued to mature in a typical fashion.
This indicates plasticity of the auditory cortex.

Learning and plasticity
Plasticity due to learning was observed in laboratory animals using classical conditioning: presented tones were paired with a mild electrical shock, so the animal learned the sounds more relevant to survival (avoiding shock).
Plasticity-related changes were more pronounced for higher motivational levels.
Trained tones were 4.1-8 kHz, and motivational levels were high (red), medium (black) and low (blue).
Cortical area change for the desired signal frequency for different motivational levels (untrained vs. trained)

Auditory awareness
The auditory system is the last to fall asleep and the first to wake up.
People who are asleep respond to their names better than to other sounds.
The figure compares responses in auditory cortex during awake and sleep states.

Auditory imagery
Sounds are played in our head all day even if we do not hear them.
Some are involuntary and uncalled for, like a melody or your inner voice.
Some are planned, like when you rehearse a verse or a telephone number in your head.
Halpern and colleagues (2004) showed that non-primary auditory cortex is active during imagined (and not heard) sounds.
Brain areas active for imagined sounds

Auditory imagery
Related results were obtained by Jancke and colleagues (2005).
They used fMRI to compare neural responses to real sounds and to imagined sounds.
Imagined sounds activate regions in auditory cortex similar to those activated by real ones.

Summary
We discussed the organization of the acoustic system:
Learned sound and hearing basics
Traced auditory pathways
Analyzed the organization of auditory cortex
Observed functional mapping of auditory processing
Discussed sound and music perception
Effect of learning on sound processing
Research on animals confirmed the existence of ‘what’ and ‘where’ pathways in the auditory system; however, these pathways may be organized differently in humans.
When you hear an uncalled melody in your head, think about which of your brain areas are activated.