
Audio Workgroup

Neuro-inspired Speech Recognition

Group Members

Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor, David Anderson, Shihab Shamma, Hynek Hermansky, Shih-Chii Liu, Giacomo Indiveri, Malcolm Slaney


Audio Projects

Localization

Speech Recognition

More ASR


Shihab is Running

See http://www.hardrock100.com/index.asp

Shihab arriving in Telluride in 2004

(should happen around 4PM today)


Localization Effort

Interaural Time Difference (ITD)

Estimated from time difference between spikes of two matching channels.

Interaural Intensity Difference (IID)

Difference of spike counts between two cochleae.

Azimuth: Combination of ITD and IID
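
The two cues above can be sketched in a few lines. This is only an illustration under stated assumptions, not the workgroup's implementation: the function names, the nearest-spike pairing, the median, the 1 ms plausibility window, and the normalized count difference are all choices made here. How ITD and IID were combined into an azimuth estimate is not specified on the slide, so that step is left out.

```python
import numpy as np

def estimate_itd(left_spikes, right_spikes, max_itd=1e-3):
    """Median time difference (right minus left, in seconds) between each
    left-channel spike and its nearest spike in the matching right channel.
    max_itd is an assumed plausibility window, not a value from the slides."""
    left = np.sort(np.asarray(left_spikes, dtype=float))
    right = np.sort(np.asarray(right_spikes, dtype=float))
    if left.size == 0 or right.size == 0:
        return float("nan")
    # Candidate neighbours: the right spike just before and just after each left spike.
    idx = np.searchsorted(right, left)
    before = right[np.clip(idx - 1, 0, right.size - 1)] - left
    after = right[np.clip(idx, 0, right.size - 1)] - left
    diffs = np.where(np.abs(before) <= np.abs(after), before, after)
    # Drop pairings that are too far apart to be the same acoustic event.
    diffs = diffs[np.abs(diffs) <= max_itd]
    return float(np.median(diffs)) if diffs.size else float("nan")

def estimate_iid(left_counts, right_counts):
    """Normalized spike-count difference in [-1, 1]; positive means the
    right cochlea produced more spikes."""
    left_total = float(np.sum(left_counts))
    right_total = float(np.sum(right_counts))
    total = left_total + right_total
    return (right_total - left_total) / total if total > 0 else 0.0

# Synthetic example: the right channel lags the left by 0.3 ms.
left = np.arange(0.0, 0.1, 0.005)
right = left + 0.0003
print(estimate_itd(left, right))          # approximately 0.0003
print(estimate_iid([10, 20], [30, 40]))   # 0.4
```

A positive ITD here means the sound reached the left cochlea first; the sign convention is arbitrary and only needs to be consistent with the IID sign.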

ITD estimation from pure tones

Azimuth estimation from music

Speaker

Microphones


Localization Effort


FPAA/Mote – Word Recognition



Field Programmable Analog Array (FPAA)-based analog cochlea (non-spiking) with envelope detection.

MOTE-based pattern matching using matched filtering with “receptive fields”

Robosapien: listens to the spoken commands…
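
As a rough sketch of the pattern-matching stage described above, the code below slides a "receptive field" template over a channels-by-time envelope array and picks the word whose template gives the largest peak response. The array shapes, the zero-mean and unit-norm template scaling, the function names, and the toy "go"/"stop" templates are all assumptions made for illustration; the slides do not describe the actual templates or the code running on the mote.

```python
import numpy as np

def matched_filter_score(envelopes, receptive_field):
    """Peak sliding inner product between channel envelopes and one template.
    envelopes: array (channels, time); receptive_field: array (channels, width)."""
    env = np.asarray(envelopes, dtype=float)
    rf = np.asarray(receptive_field, dtype=float)
    rf = rf - rf.mean()              # zero mean, so overall loudness is not rewarded
    norm = np.linalg.norm(rf)
    if norm > 0:
        rf = rf / norm               # unit norm, so templates are comparable
    width = rf.shape[1]
    n_shifts = env.shape[1] - width + 1
    if n_shifts < 1:
        raise ValueError("input is shorter than the receptive field")
    scores = [np.sum(env[:, t:t + width] * rf) for t in range(n_shifts)]
    return float(np.max(scores))

def recognize(envelopes, templates):
    """Return the best-matching word and the per-word scores."""
    scores = {word: matched_filter_score(envelopes, rf)
              for word, rf in templates.items()}
    return max(scores, key=scores.get), scores

# Toy templates (hypothetical): two channels, three time steps each.
templates = {
    "go": np.array([[1, 0, 0], [0, 0, 1]]),    # low channel first, high channel last
    "stop": np.array([[0, 0, 1], [1, 0, 0]]),  # the reverse order
}
envelopes = np.zeros((2, 8))
envelopes[:, 2:5] = templates["go"]            # embed "go" at time step 2
word, scores = recognize(envelopes, templates)
print(word)                                    # go
```

Taking the peak over all shifts makes the decision independent of where the word starts in the buffer, which is one simple way to avoid explicit word-onset detection.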



Status:

FPAA – (we are using a new FPAA) 2nd-order sections synthesized, but a full auditory filter bank is not yet up.

MOTE – real-time communication with Matlab and sampling operational.


Relational Network (Simple)

[Figure: quantities X, Y, and Z, each linked to relation nodes labeled M]

Patches of neurons

Each measures one quantity

Bidirectional relations for feedback/feedforward

Thanks to Rodney Douglas


Relational Network (example)

Input here

Relational feedback

Relational specification

Relational feedback


ASR Relational Network

Cochlea

Delay

Phone Recognizer

Word Recognizer

A patch of neurons (one of N outputs)

Note: We don’t know how to represent delays

Phone Recognizer

Bidirectional links enforce phoneme/word constraints


Relational Advantages

Not an HMM
HMMs are great, but…

Incorporate other knowledge
Bottom-up perception
Top-down word hypothesis

Hallucinate
Based on experience
Hear “ba..” and know that bad, bat, bar, bass, band follow
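
The "ba.." example can be illustrated with a plain dictionary lookup. The sketch below is not a relational network: it only shows what a top-down word hypothesis could contribute, namely the set of words still consistent with what has been heard and the symbols expected next. The lexicon, the function names, and the use of letters as a stand-in for phonemes are all assumptions made here.

```python
from collections import Counter

# Hypothetical lexicon; letters stand in for phonemes.
LEXICON = ["bad", "bat", "bar", "bass", "band", "dog", "cat"]

def word_hypotheses(prefix, lexicon):
    """Words that are still consistent with the prefix heard so far."""
    prefix = prefix.lower()
    return sorted(w for w in lexicon if w.lower().startswith(prefix))

def next_symbol_expectations(prefix, lexicon):
    """Relative frequency of each symbol that could follow the prefix."""
    candidates = word_hypotheses(prefix, lexicon)
    counts = Counter(w.lower()[len(prefix)]
                     for w in candidates if len(w) > len(prefix))
    total = sum(counts.values())
    return {sym: n / total for sym, n in counts.items()} if total else {}

print(word_hypotheses("ba", LEXICON))
# → ['bad', 'band', 'bar', 'bass', 'bat']
print(next_symbol_expectations("ba", LEXICON))
```

In a relational network these expectations would be fed back to the phoneme patches as a bias; here they are simply returned as a dictionary.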



Silicon Cochlea

[Figure: silicon cochlea stages: basilar membrane (high frequency to low frequency), inner hair cells, ganglion cells]

(van Schaik, Liu, 2004)


Silicon Frequency Response


Tone ramps into two cochleas




Cochlear Rate Profiles

[Figure: rate profiles for the left cochlea and the right cochlea; axis label: spikes per utterance]


Learning Algorithms

Statistical
SAS (pick best channels for decision)
Least squares (for software demo)

Liquid State Machine
Take input to high dimensions with spiking net

Spike Timing Dependent Plasticity (STDP)
Giacomo/Srinjoy Chip
Brader/Fusi
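
The least-squares option listed above could look roughly like the sketch below: a linear readout fitted to per-channel spike counts. The feature layout (one row of per-channel counts per utterance), the +1/-1 labels, the bias column, the function names, and the toy numbers are assumptions for illustration; the slides do not describe the features or the data used in the software demo.

```python
import numpy as np

def _with_bias(features):
    """Append a constant column so the readout can learn an offset."""
    X = np.asarray(features, dtype=float)
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_least_squares(features, labels):
    """Least-squares weights mapping per-channel spike counts to +1/-1 labels."""
    y = np.asarray(labels, dtype=float)
    weights, *_ = np.linalg.lstsq(_with_bias(features), y, rcond=None)
    return weights

def predict(features, weights):
    """Classify by the sign of the linear readout."""
    return np.where(_with_bias(features) @ weights >= 0, 1, -1)

# Toy data (made up): class +1 excites channel 0, class -1 excites channel 2.
train_counts = [[9, 1, 0], [8, 2, 1], [1, 1, 9], [0, 2, 8]]
train_labels = [1, 1, -1, -1]
weights = fit_least_squares(train_counts, train_labels)
print(predict([[7, 1, 1], [1, 2, 7]], weights))
```

Inspecting the largest-magnitude weights also gives a crude answer to the "pick best channels" question, although that is a different procedure from the one named on the slide.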

[Figure: LSM spiking output for Vowel 1 (V1) and Vowel 2 (V2)]


Learning Chip Architecture

[Figure: learning chip architecture. Labels: immediate cochlea, delayed cochlea, cochlea chip, learning chip neurons, plastic synapses, non-plastic synapses (excit., inhib.), binary synaptic weights, relational network, Phoneme 1, Phoneme 2]






Tone Results

Tone recognition
Spike input from silicon cochlea

Training
Two tones

Duplicated input

Positive and negative examples

Testing


Phoneme recognition
Spike input from silicon cochlea

Training
Two phonemes

Duplicated inputs

Positive and negative examples

Testing





Phoneme Results


Behind the Curtain


Hardware Overview

[Figure: hardware overview. Labels: Cochlea, PCI-AER (for remapping), Learning, Phoneme, Word, implemented in MATLAB; names shown: Shih-Chii Liu, Giacomo Indiveri]


Infrastructure Difficulties

Remapper
Owing to problems with the AER mapper boards, remapping the AER data from the silicon cochlea to the learning chip had to be done in Matlab (very slow).

Power
Unpredictable problems caused by supply-voltage variation of as much as 1 V.

Sharing chips
The learning chip had to be shared with two other workgroups.

PC replacement


Impedance Difficulties

Cochlear firing rates
Cochlea: 6M spikes/second (30k channels, 200 spikes/second each)
Silicon Cochlea: 30k spikes/second (30 channels, 1k spikes/second each)
Learning Chip: 3k spikes/second (30 channels, 100 spikes/second each)

Dynamic range
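
The totals quoted above are simply channels times per-channel rate, and the ratios between stages show the size of the mismatch. A small check using only the figures from this slide (the dictionary layout and variable names are illustrative):

```python
# (channels, spikes per second per channel), as quoted on the slide
stages = {
    "Cochlea": (30_000, 200),
    "Silicon Cochlea": (30, 1_000),
    "Learning Chip": (30, 100),
}

# Aggregate spike rate of each stage.
totals = {name: channels * rate for name, (channels, rate) in stages.items()}
for name, total in totals.items():
    print(f"{name}: {total:,} spikes/second")

# How far apart neighbouring stages are.
bio_to_silicon = totals["Cochlea"] / totals["Silicon Cochlea"]
silicon_to_chip = totals["Silicon Cochlea"] / totals["Learning Chip"]
print(bio_to_silicon, silicon_to_chip)   # 200.0 10.0
```

With these numbers the silicon cochlea emits ten times as many spikes per second as the figure quoted for the learning chip, so some form of rate reduction is needed between the two.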


Desired Results

/A/ Phoneme Patch

/I/ Phoneme Patch

AI Word Patch

IA Word Patch

Phoneme input: A A A I

[Panels: without and with relational feedback]




Simulation




Simulation 2


Simulation 3




Great Job!

Student Members

Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor


Silicon Cochlea

[Figure: mean firing rate per channel; captions: "Mean firing rates in response to two tones" and "Mean firing rates for two different vowel inputs" (/a/, /i/); axes: channel number, mean firing rate]

[Figure: raster plots for two different tone inputs (200 Hz, 1000 Hz); axes: time in microseconds, channel number]


Word Recognizer

Four example raster plots (silence, A_, A_ with relational feedback, AI)


Software Simulation


Behind the Curtain
