EE141

Hearing and Speech

Janusz A. Starzyk

Based on the book Cognition, Brain and Consciousness, ed. Bernard J. Baars

Cognitive Architectures

Sound and hearing basics

Complex sound signals can be decomposed into a series of sinewave signals of various frequencies.
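The decomposition described above can be illustrated with a discrete Fourier transform. The following is a minimal sketch, not part of the original slides: the test signal (a mix of 5 Hz and 12 Hz sine waves sampled at 64 Hz) and the helper name `dft_magnitudes` are assumptions chosen only to keep the example small.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the normalized magnitude of each DFT bin for real samples."""
    n = len(samples)
    mags = []
    for k in range(n):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(acc) / n)
    return mags


# Hypothetical test signal: 1 second at 64 samples/s, mixing 5 Hz and 12 Hz.
sample_rate = 64
signal = [
    math.sin(2 * math.pi * 5 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 12 * t / sample_rate)
    for t in range(sample_rate)
]

mags = dft_magnitudes(signal)

# With a 1-second window, bin k corresponds to k Hz; keep bins below Nyquist.
half = mags[: sample_rate // 2]
peaks = sorted(range(len(half)), key=lambda k: half[k], reverse=True)[:2]
print(sorted(peaks))  # prints [5, 12]
```

The two strongest bins fall at the two frequencies that were mixed into the signal, which is the sense in which a complex signal is a sum of sine waves.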

The human auditory system detects sounds in the range of 20 Hz to 20 kHz; bats and whales can hear up to 100 kHz.

Musicians can detect the difference between 1000 Hz and 1001 Hz.

A time-domain sine-wave signal and the same signal in the time-frequency domain

20 msec is needed for the onset of a consonant.

200 msec is the time of an average syllable.

2000 msec is needed for a sentence.

These various time scales and other parameters of the sound, like timbre or intensity, must be properly processed to recognize speech or music.

A spectrogram of a speech signal: frequency is represented on the y-axis.

The dynamic range of the human hearing system is very broad: from a relative intensity of 1 (the sound pressure level at which hearing starts to occur, 0 dB SPL) to a relative intensity of 10^15, or 150 dB SPL.
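The link between a 150 dB range and a ratio of 10^15 can be checked with the usual decibel definition for intensity, dB = 10 log10(I / I0). The sketch below assumes the ratio refers to sound intensity (power) rather than pressure; the function names are illustrative only.

```python
import math


def intensity_ratio_to_db(ratio):
    """Convert an intensity (power) ratio to decibels."""
    return 10.0 * math.log10(ratio)


def db_to_intensity_ratio(db):
    """Convert a level difference in decibels to an intensity ratio."""
    return 10.0 ** (db / 10.0)


# A ratio of 10^15 between the strongest and weakest audible intensities.
print(intensity_ratio_to_db(1e15))

# How much more intense is normal conversation (60 dB) than a whisper (15 dB)?
print(db_to_intensity_ratio(60 - 15))
```

Because the scale is logarithmic, every additional 10 dB multiplies the intensity by ten, which is why such a wide range fits into a span of 150 dB.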

Human and cat hearing sensitivity

Near total silence - 0 dB
A whisper - 15 dB
Normal conversation - 60 dB
A lawnmower - 90 dB
A car horn - 110 dB
A rock concert - 120 dB
A gunshot - 140 dB


Sound waves caused by vibrating objects move through the air and enter the external auditory canal, reaching the tympanic membrane, or eardrum.

Vibrations propagate through the middle ear by the mechanical action of three bones: the hammer, anvil and stirrup (or malleus, incus and stapes).

Because of the length of the ear canal, it is capable of amplifying sounds with frequencies of approximately 3000 Hz.

There are two cochlear windows – oval and round.

The stapes conveys sound vibrations through the oval window to the inner ear fluids.


The cochlea and the semicircular canals are filled with a water-like fluid.

The cochlea in the inner ear contains a basilar membrane.

A traveling wave of sound moves across the basilar membrane, moving the small hair-like nerve cells.


The inner surface of the cochlea is lined with over 16 000 hair-like nerve cells which perform one of the most critical roles in our ability to hear.

Each hair cell has a natural sensitivity to a particular frequency of vibration.

The brain decodes the sound frequencies based on which hair cells along the basilar membrane are activated; this is known as the place principle.

Pathways at the auditory brainstem

Inner ear details

Figures, including Figure 30-5, from E. R. Kandel et al., “Principles of Neural Science”, McGraw-Hill, 2000.

The central auditory system

The auditory system has many stages: from the ear, to the brainstem, to subcortical nuclei, and to cortex.

Ascending (afferent) pathways transmit information from the periphery to cortex.

Neural signals travel from the auditory nerve to the lower (ventral) cochlear nucleus.

The signal then travels through the lateral lemniscus, inferior colliculus, and thalamus to the auditory cortex.

A key task of the ascending pathway is to localize sound in space.

The descending (efferent) pathways run from the auditory cortex down to the periphery, under cortical control.

This control extends all the way to hair cells in the cochlea.

The descending pathway provides ‘top down’ information critical for selective attention and perception in a noisy environment.

Besides the ascending and descending pathways, there are connections between the left and right auditory pathways through the corpus callosum and other brain regions.

Auditory cortex

The auditory cortex specializes in sound processing.

It serves as a hub for sound processing and interacts with other systems within cortex and back down the descending path to the cochlea.

These processes provide a wide range of perceptual abilities, like selecting a single person's voice in a crowded space or recognizing a melody even when it is played off-key.

In humans, the primary auditory cortex is located within Heschl’s gyrus.

Heschl’s gyrus corresponds to Brodmann’s area 41.

Another important region in the auditory cortex is the planum temporale, located posterior to Heschl’s gyrus.

The planum temporale is much larger in the left hemisphere (up to 10 times) in right-handed individuals.

It plays an important role in language understanding.

Posterior to the planum temporale is Brodmann’s area 22, which Carl Wernicke associated with speech comprehension (Wernicke’s area).

Main cells of cochlear nucleus and their corresponding post stimulus time (PST) histograms.

The sound stimulus used is typically a 25 ms tone burst at the center frequency, at a sound level 30 dB above threshold.

There are several types of neurons in the auditory system.

They have different response properties for coding frequency, intensity, and timing information in sounds as well as encoding spatial information for localizing sounds in space.

Receptive fields of auditory neurons have different sensitivity to the location of the sound source (in azimuth angle) and its loudness (in dB).

The top neuron is sensitive to a broad range of sound intensities located to the right, with greater sensitivity to louder signals.

The lower neuron is more narrowly tuned, to sound levels of 30-60 dB located slightly to the left of center.

Broadly tuned neurons are useful for detecting the sound source, while narrowly tuned neurons give the more precise information needed to locate it, such as the direction of the sound and its loudness level.

Auditory tonotopic cortical fields of a cat: (a) lateral view; (b) lateral view “unfolded” to show parts hidden within sulci.

The four tonotopic fields are: anterior (A), primary (AI), posterior (P), and ventroposterior (VP).

Positions of the lowest and highest center frequencies in these fields are indicated in (b).

Other cortical areas have little tonotopy: secondary (AII), ventral (V), temporal (T), and dorsoposterior (DP).

Functional mapping of auditory processing

The location of the planum temporale (PT), close to Wernicke’s area for speech comprehension, points towards its role as the site for auditory speech and language processing.

However, neuroimaging studies of PT provide evidence that the functional role of PT is not limited to speech.

PT is a hub for auditory scene analysis, decoding sensory inputs and comparing them to memories and past experiences.

PT further directs cortical processing to decode spatial location and auditory object identification.

Planum temporale and its major associations: lateral superior temporal gyrus (STG), superior temporal sulcus (STS), middle temporal gyrus (MTG), parieto-temporal operculum (PTO), inferior parietal lobe (IPL).

PT as a hub for auditory and spatial analysis.

In a crowded environment it is important to decode auditory objects such as a friend’s voice, an alarm signal or a squeaking wheel.

To do so, the auditory system must determine where sounds are occurring in space and what they represent.

All of these are associated with other sensory inputs, like vision, smell, or touch, and with memory associations.


To determine where a sound is coming from, two cues are used:
Interaural (between-ear) time difference
Interaural level difference

Sensitivity to the time difference must be finer than a millisecond.

The head produces a ‘sound shadow’, so the sound reaching the farther ear is slightly weaker.
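To see why sub-millisecond sensitivity is needed, the interaural time difference can be estimated with a simplified path-length model, ITD = d sin(azimuth) / c. This is an illustrative sketch only: the model ignores diffraction around the head, and the ear separation (0.18 m) and speed of sound (343 m/s) are assumed values, not figures from the slides.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly room temperature
EAR_SEPARATION_M = 0.18     # assumed, illustrative distance between the ears


def interaural_time_difference(
    azimuth_deg,
    ear_separation=EAR_SEPARATION_M,
    speed=SPEED_OF_SOUND_M_S,
):
    """Simplified model: ITD = d * sin(azimuth) / c, returned in seconds.

    Azimuth 0 is straight ahead; positive angles are towards one side.
    """
    return ear_separation * math.sin(math.radians(azimuth_deg)) / speed


for azimuth in (0, 30, 90):
    itd_us = interaural_time_difference(azimuth) * 1e6
    print(azimuth, "degrees:", round(itd_us), "microseconds")
```

Under these assumptions the largest difference, for a source directly to one side, is about 0.52 ms, so all useful time differences lie below one millisecond.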


Neurons’ response to interaural time difference (ITD) and interaural level difference (ILD)

Abbreviations:
CN – cochlear nucleus
MSO – medial superior olive
LSO – lateral superior olive
MNTB – medial nucleus of the trapezoid body

It was demonstrated that musical conductors were better able to locate sound sources in a musical score.

They demonstrated higher sensitivity to sounds presented in peripheral listening than other groups, including other musicians.

Auditory objects are categorized into human voices, musical instruments, animal sounds, etc.

Auditory objects are learned over our lifetime, and associations are stored in the memory.

Auditory areas in superior temporal cortex are activated both by recognized and unrecognized sounds.

Recognized sounds also activate superior temporal sulcus and middle temporal gyrus (MTG).


Fig. (c) shows the difference between activations for recognized sounds and unrecognized sounds.

Binder and colleagues propose that middle temporal gyrus (MTG) is the region that associates sounds and images.

This is in agreement with case studies of patients who suffered from auditory agnosia (inability to recognize sounds).

Research results showed that auditory object perception is a complex process and involves multiple brain regions in both hemispheres.


Brain activities in auditory processing – cross sections at different depth

Cocktail party effect

How does the auditory system separate sounds coming from different sources?

Bregman (1990) proposed a model for such segregation.

It contains four elements:
The source
The stream
Grouping
Stream segregation

The source is the sound signal. It represents physical features like frequency, intensity, and spatial location.

The stream is the percept of the sound and represents psychological aspects that depend on the individual.

Grouping creates the stream:
Simultaneous grouping, e.g. instruments in the orchestra
Sequential grouping, e.g. grouping sounds across time

Stream segregation into objects.

Bregman’s grouping principles:
Proximity: sounds that are close in time are grouped.
Closure: sounds that do not belong to the stream (like a cough during a lecture) are excluded.
Good continuation: sounds that follow each other smoothly are grouped (similar to proximity).
Common fate: sounds that come from the same location or coincide in time are grouped (orchestra).
Exclusive allocation: selective listening (focus on one stream).

Cortical areas of auditory stream analysis: the intraparietal sulcus (IPS) is involved in binding multimodal information (vision, touch, sound).

There is growing evidence that, as in the visual system, ‘what’ and ‘where’ information in sound is decoded by cortical networks organized into separate but highly interactive processing streams.


Audio (blue) and visual (pink) processing areas in macaque brain, and ‘what’, ‘where’ audio processing streams

Human brain processing:
Blue – language-specific phonological structure
Lilac – phonetic cues and speech features
Purple – intelligible speech
Pink – verbal short-term memory
Green – auditory spatial tasks

Speech perception

There is no agreement on how speech is coded in the brain.

What are the speech ‘building blocks’?

A natural way would be to code words based on phonemes. The word ‘dig’ would be obtained by identifying a sequence of phonemes.

Perhaps a syllable is the appropriate unit?

We must decode not only ‘what’ but also ‘who’ and ‘when’ to understand the temporal order of phonemes, syllables, words, and sentences.

The speech signal must be evaluated on time scales from 20 ms to 2000 ms, independently of whether the pitch is high (a child) or low (a man), and whether the speech is loud or quiet, fast or slow.

Early attempts at simplifying speech processing were made at Bell Labs by Homer Dudley, who developed the vocoder.

The vocoder (voice + coder) was able to reduce the speech signal for transmission over long telephone circuits by analyzing and recoding speech.

Cochlear implants that stimulate the auditory system, used for some types of hearing loss, are based on vocoder technology.

A second invention, the spectrograph, developed at Bell Labs during World War II, produced a voice picture with frequency on the y-axis, time on the x-axis, and intensity as a level of grey.

Problems in analyzing spectrograms:
Gaps or silences do not mark where a word begins and ends.
Individual phonemes change depending on which phonemes come before and after them.


What is wrong with the short-term spectrum?

It is inconsistent (same message, different representation).

Shannon (1998) showed that the minimum information needed for speech decoding is contained in the shape of the speech signal, called the temporal envelope.
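One simple way to approximate a temporal envelope is to rectify the signal and smooth it with a moving average. The sketch below is only an illustration of the idea, not the procedure used by Shannon; the synthetic signal (a 200 Hz carrier modulated at 4 Hz) and the 10 ms smoothing window are assumptions.

```python
import math


def temporal_envelope(samples, window):
    """Full-wave rectify, then smooth with a centered moving average."""
    rectified = [abs(s) for s in samples]
    half = window // 2
    envelope = []
    for i in range(len(rectified)):
        lo = max(0, i - half)
        hi = min(len(rectified), i + half + 1)
        envelope.append(sum(rectified[lo:hi]) / (hi - lo))
    return envelope


# Hypothetical signal: a 200 Hz carrier whose amplitude rises and falls at 4 Hz.
sample_rate = 8000
signal = [
    0.5 * (1 - math.cos(2 * math.pi * 4 * t / sample_rate))
    * math.sin(2 * math.pi * 200 * t / sample_rate)
    for t in range(sample_rate)
]

# Assumed smoothing window of 80 samples (10 ms at 8000 samples/s).
env = temporal_envelope(signal, window=80)

# The envelope is large where the modulation peaks and near zero in the gaps.
print(env[1000] > env[2000])
```

The envelope follows the slow 4 Hz rise and fall while discarding the fine structure of the carrier, which is the kind of slowly varying shape the slide refers to.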

Figure: short-term spectrum plotted against frequency.

The lack of invariant features in the speech spectrogram forced researchers to look for other accounts of speech perception.

The motor theory developed by Liberman (1985) assumes a domain-specific approach to speech.

This theory suggests that speech perception is tightly coupled with speech production.

While the acoustics of phonemes lack invariance, the motor gestures that produce speech are invariant and can be accessed in speech perception.

Another theory, developed by Tallal, assumes that speech and language are domain-general.

In this theory, left-hemisphere language organization is not the result of domain-specific development, but results from a domain-general bias of the left hemisphere for decoding rapidly changing sounds (such as those contained in speech).

It is likely that the neural system uses a combination of domain-specific and domain-general processing for speech perception.

A process model for word comprehension.

Language areas.


Binder and colleagues (1997) studied the activation of brain areas by words, reversed speech and pseudowords, and found that Heschl’s gyrus and the planum temporale were activated similarly by all stimuli.

This supports the notion of hierarchical processing of sounds, with Heschl’s gyrus representing early sensory analysis.

Speech signals activated a larger portion of auditory cortex than non-speech sounds in the posterior superior temporal gyrus and superior temporal sulcus, but there was no difference in activation between words, pseudowords and reversed speech.

The conclusion is that these regions do not reflect semantic processing of the words but reflect phonological processing of the speech sounds.

Brain response to: words, pseudowords, reversed speech

Speech perception and production

Speech perception and production are tightly coupled.

One explanation is that when we speak we hear our own voice.

Wernicke proposed a model for language processing that links a pathway from auditory speech perception to motor speech production.

The verbal signal enters the primary auditory cortex (A) and then Wernicke’s area (WA).

The response is formulated in Broca’s area (B) and the primary motor cortex (M).

We can listen and respond to our own speech using the same brain regions.

Producing an internal response to a question results in silent speaking to ourselves.

Damage to speech perceptual system

Damage to the speech perceptual system may be caused by strokes that block the blood flow to a brain area and cause the death of neurons.

When a stroke impairs language functions, the condition is called aphasia.

Paul Broca linked aphasia to a region in the frontal lobe that is important for speech production.

Carl Wernicke discovered a region in the temporal lobe important for speech perception.

Experiments by Blumstein tested phonetic deficits and semantic deficits by providing patients with four choices in the test: the correct word, a semantic foil, a phonetic foil and an unrelated foil (e.g. peas, carrots, keys, and calculator).

Phonetic foils

Learning and plasticity

An important theme in studying human cognition is to find out how new information is encoded during learning and how the brain adapts (plasticity).

Much of what is known about the plasticity of the auditory system comes from deprivation studies in animals.

Both the cochlea and the brainstem are organized tonotopically, and this organization is reflected in auditory cortex.

After the cochlea or brainstem is lesioned, some frequencies are no longer transmitted to auditory cortex, and the cortex is then studied for changes reflecting neural plasticity.

Changes in neural response in auditory cortex were observed in humans after sudden hearing loss.

Children with hearing loss showed some maturational lag compared to typical development; however, after receiving cochlear implants, their auditory system continued to mature in a typical fashion.

This indicates plasticity of the auditory cortex.

Plasticity due to learning was observed in laboratory animals using classical conditioning: presented tones were paired with a mild electrical shock, so the animal learned the sounds more relevant to survival (avoiding shock).

Plasticity-related changes were more pronounced for higher motivational levels.

Trained tones were 4.1-8 kHz, and motivational levels were high (red), medium (black) and low (blue).

Cortical area change for the desired signal frequency for different motivational levels (untrained vs. trained)

Auditory awareness

The auditory system is the last to fall asleep and the first to wake up.

People who are asleep respond to their names better than to other sounds.

The figure compares responses in auditory cortex during awake and sleep states.

Auditory imagery

Sounds are played in our head all day even if we do not hear them.

Some are involuntary and uncalled for, like a melody or your inner voice.

Some are planned, like when you rehearse a verse or a telephone number in your head.

Halpern and colleagues (2004) showed that non-primary auditory cortex is active during imagined (and not heard) sounds.

Brain areas active for imagined sounds

Related results were obtained by Jancke and colleagues (2005).

They used fMRI images to compare neural responses to real sounds and to imagined sounds.

Imagined sounds activate similar regions in auditory cortex as real ones.

Summary

We discussed the organization of the acoustic system:
Learned sound and hearing basics
Traced auditory pathways
Analyzed organization of auditory cortex
Observed functional mapping of auditory processing
Discussed sound and music perception
Effect of learning on sound processing

Research on animals confirmed the existence of ‘what’ and ‘where’ pathways in the auditory system; however, these pathways may be organized differently in humans.

When you hear an uncalled-for melody in your head, think about which of your brain areas are activated.