Audio Workgroup: Neuro-inspired Speech Recognition

Upload: jacob-mcelroy

Post on 27-Mar-2015


Page 1: Audio Workgroup Neuro-inspired Speech Recognition Group Members Ismail UysalYoojin Chung Ramin Pichevar Rich Hammett Tarek Massoud Ross Gaylor David Anderson

Audio Workgroup

Neuro-inspired Speech Recognition

Group Members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor, David Anderson, Shihab Shamma, Hynek Hermanski, Shih-Chii Liu, Giacomo Indiveri, Malcolm Slaney

Page 2: Audio Projects

Localization

Speech Recognition

More ASR

Page 3: Shihab is Running

See http://www.hardrock100.com/index.asp

Shihab arriving in Telluride in 2004

(should happen around 4PM today)

Page 4: Localization Effort

Interaural Time Difference (ITD)

Estimated from time difference between spikes of two matching channels.

Interaural Intensity Difference (IID)

Difference of spike counts between two cochleae.

Azimuth: Combination of ITD and IID

ITD estimation from pure tones

Azimuth estimation from music

Speaker

Microphones
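
As a rough illustration of the two cues on this slide, the sketch below estimates ITD as the median time difference between nearest spikes on a matched pair of channels, and IID as a normalized spike-count difference between the two cochleae. The function names, the 1 ms lag window, and the normalization are assumptions made for this example, not details of the workgroup's implementation.

```python
from bisect import bisect_left
from statistics import median


def estimate_itd(left_spikes, right_spikes, max_lag=1e-3):
    """Median (right - left) time difference between nearest spikes, in seconds.

    max_lag is a hypothetical window that discards unrelated spike pairs.
    Returns None when no spike pair falls inside the window.
    """
    right = sorted(right_spikes)
    diffs = []
    for t in left_spikes:
        i = bisect_left(right, t)
        neighbours = right[max(i - 1, 0):i + 1]  # spikes just before and after t
        if not neighbours:
            continue
        nearest = min(neighbours, key=lambda r: abs(r - t))
        if abs(nearest - t) <= max_lag:
            diffs.append(nearest - t)
    return median(diffs) if diffs else None


def estimate_iid(left_count, right_count):
    """Normalized spike-count difference in [-1, 1]; positive means right is stronger."""
    total = left_count + right_count
    return 0.0 if total == 0 else (right_count - left_count) / total


# Synthetic example: the right channel lags the left by 0.3 ms.
left = [k * 0.005 for k in range(20)]
right = [t + 0.0003 for t in left]
itd = estimate_itd(left, right)
iid = estimate_iid(120, 80)
```

An azimuth estimate would then combine the two values; how they are weighted is not stated on the slide.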

Page 5: Localization Effort

Page 6: FPAA/Mote – Word Recognition

Page 7: FPAA/Mote – Word Recognition

Field Programmable Analog Array (FPAA)-based analog cochlea (non-spiking) with envelope detection.

MOTE-based pattern matching using matched filtering with “receptive fields”.

Robosapien listens to the spoken commands…
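
One way to picture matched filtering with "receptive fields" is as a correlation between an incoming channel-by-time envelope pattern and one stored template per word. The sketch below is a minimal version under that assumption; the template values, the word labels, and the use of normalized correlation are hypothetical and not taken from the Mote code.

```python
import math


def normalized_correlation(pattern, template):
    """Cosine similarity between two equally sized channel-by-time grids."""
    flat_p = [v for row in pattern for v in row]
    flat_t = [v for row in template for v in row]
    dot = sum(p * t for p, t in zip(flat_p, flat_t))
    norm = math.sqrt(sum(p * p for p in flat_p)) * math.sqrt(sum(t * t for t in flat_t))
    return dot / norm if norm else 0.0


def recognize(pattern, templates):
    """Return the best-matching word and the score of every template."""
    scores = {word: normalized_correlation(pattern, tpl) for word, tpl in templates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical 2-channel x 2-frame templates ("receptive fields").
templates = {
    "go": [[1.0, 0.0], [0.0, 1.0]],
    "stop": [[0.0, 1.0], [1.0, 0.0]],
}
word, scores = recognize([[0.9, 0.1], [0.2, 0.8]], templates)
```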

Page 8: FPAA/Mote – Word Recognition

Status:

FPAA – (we are using a new FPAA) 2nd-order sections synthesized, but a full auditory filter bank is not yet up.

MOTE – real-time communication with Matlab and sampling operational.

Page 9: Relational Network (Simple)

[Figure: diagram with patches labeled X, Y, Z and relations labeled M]

Patches of neurons

Each measures one quantity

Bidirectional relations for feedback/feedforward

Thanks to Rodney Douglas
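
To make the idea of bidirectional relations concrete, here is a toy relaxation in which three scalar quantities stand in for the patches X, Y, and Z. The relation Z = X + Y, the update rate, and the function name are all assumptions chosen for illustration; the slide does not say which relation the network encodes, and real patches would be populations of neurons rather than single numbers.

```python
def relax(state, clamped, rate=0.5, steps=50):
    """Pull each unclamped quantity toward the value implied by the other two.

    The relation z = x + y is a hypothetical stand-in for the relation M.
    """
    state = dict(state)
    for _ in range(steps):
        implied = {
            "x": state["z"] - state["y"],
            "y": state["z"] - state["x"],
            "z": state["x"] + state["y"],
        }
        for name, target in implied.items():
            if name not in clamped:
                state[name] += rate * (target - state[name])
    return state


# Feedforward use: X and Y are driven by input, Z settles to their sum.
forward = relax({"x": 2.0, "y": 3.0, "z": 0.0}, clamped={"x", "y"})

# Feedback use: X and Z are driven, the same relation fills in Y.
backward = relax({"x": 2.0, "y": 0.0, "z": 5.0}, clamped={"x", "z"})
```

The same relation serves both directions, which is the property the slide attributes to bidirectional links.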

Page 10: Relational Network (example)

[Figure labels: Input here; Relational specification; Relational feedback]

Page 11: ASR Relational Network

[Figure labels: Cochlea; Delay; Phone Recognizer (shown twice); Word Recognizer]

A patch of neurons (one of N output)

Note: We don’t know how to represent delays

Bidirectional links enforce phoneme/word constraints
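
The effect of bidirectional phoneme/word links can be sketched with a small numeric example: phoneme scores feed word scores bottom-up, and word scores are fed back to boost the phonemes each word expects at each position. The two-word vocabulary, the averaging rule, and the feedback gain are hypothetical choices for this sketch; delays are side-stepped by treating the input as a list of frames.

```python
def word_scores(phoneme_scores, words):
    """Bottom-up: average the score of each word's expected phoneme per frame."""
    return {
        word: sum(frame[p] for frame, p in zip(phoneme_scores, seq)) / len(seq)
        for word, seq in words.items()
    }


def feedback(phoneme_scores, words, gain=1.0):
    """Top-down: boost phonemes that active word hypotheses expect, then renormalize."""
    scores = word_scores(phoneme_scores, words)
    updated = []
    for position, frame in enumerate(phoneme_scores):
        boosted = dict(frame)
        for word, seq in words.items():
            if position < len(seq):
                boosted[seq[position]] += gain * scores[word]
        total = sum(boosted.values())
        updated.append({p: v / total for p, v in boosted.items()})
    return updated


# Hypothetical vocabulary and a clear first frame followed by an ambiguous one.
words = {"AI": ["A", "I"], "IA": ["I", "A"]}
frames = [{"A": 0.9, "I": 0.1}, {"A": 0.55, "I": 0.45}]
after = feedback(frames, words)
```

In this example the second frame slightly favors A on its own, but the stronger AI hypothesis tips it toward I once feedback is applied.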

Page 12: Relational Advantages

Not an HMM: HMMs are great, but…

Incorporate other knowledge: bottom-up perception; top-down word hypothesis

Hallucinate: based on experience. Hear “ba..” and know that bad, bat, bar, bass, band follow
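
The "ba.." example amounts to a top-down lookup: given a heard prefix, list the words that experience says can follow. A minimal sketch, assuming a hypothetical table of word counts stands in for "experience":

```python
def hypothesize(prefix, experience):
    """Words starting with the heard prefix, most frequently experienced first."""
    matches = [word for word in experience if word.startswith(prefix)]
    return sorted(matches, key=lambda word: (-experience[word], word))


# Hypothetical counts; only the "ba" words come from the slide.
experience = {"bad": 5, "bat": 3, "bar": 2, "bass": 1, "band": 4, "dog": 7}
candidates = hypothesize("ba", experience)
```

In a relational network the same information would live in the bidirectional links rather than in an explicit table.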

Page 13: Silicon Cochlea

[Figure labels: Basilar membrane (high frequency to low frequency); Inner hair cells; Ganglion cells]

(van Schaik, Liu, 2004)

Page 14: Silicon Frequency Response

Tone ramps into two cochleas

Page 15: Cochlear Rate Profiles

[Figure: Left Cochlea and Right Cochlea panels; axis label: Spikes per utterance]

Page 16: Learning Algorithms

Statistical: SAS (pick best channels for decision); least squares (for software demo)

Liquid State Machine: take input to high dimensions with spiking net

Spike Timing Dependent Plasticity (STDP): Giacomo/Srinjoy chip; Brader/Fusi

[Figure: LSM Spiking Output for Vowel 1 (V1) and Vowel 2 (V2)]
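
For readers unfamiliar with STDP, the sketch below shows the generic pair-based textbook form, in which the sign and size of the weight change depend on the time between a presynaptic and a postsynaptic spike. This is not the rule implemented on the learning chip; the amplitudes and time constants are placeholder values.

```python
import math


def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau_plus=0.02, tau_minus=0.02):
    """Weight change for one spike pair, with dt = t_post - t_pre in seconds.

    Pre-before-post (dt > 0) potentiates; post-before-pre (dt < 0) depresses.
    All parameter values are hypothetical.
    """
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


potentiation = stdp_dw(0.005)   # presynaptic spike 5 ms before postsynaptic
depression = stdp_dw(-0.005)    # presynaptic spike 5 ms after postsynaptic
```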

Page 17: Learning Chip Architecture

[Figure labels: Cochlea Chip; Immediate Cochlea; Delayed Cochlea; Plastic synapses; Learning Chip Neurons; Phoneme 1; Phoneme 2; Relational Network; Non-plastic synapses (Excit., Inhib.); Binary synaptic weights]

Page 18: Tone Results

Tone recognition: spike input from silicon cochlea

Training: two tones; duplicated input; positive and negative examples

Testing

Page 19: Phoneme Results

Phoneme recognition: spike input from silicon cochlea

Training: two phonemes; duplicated inputs; positive and negative examples

Testing

Page 20: Behind the Curtain

Page 21: Hardware Overview

[Figure labels: Cochlea; PCI-AER (for remapping); Learning; Phoneme; Word; Shih-Chii Liu; Giacomo Indiveri; Implemented in MATLAB]

Page 22: Infrastructure Difficulties

Remapper: because of problems with the AER mapper boards, remapping the AER data from the silicon cochlea to the learning chip had to be done in Matlab (very slow).

Power: unpredictable problems caused by supply-voltage variation of as much as 1 V.

Sharing chips: the learning chip had to be shared with two other workgroups.

PC replacement
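
Remapping AER data is essentially a table lookup: each event carries a source address, and a table says which target addresses should receive it. The sketch below shows that idea in a few lines; the event layout, the table contents, and the one-to-many fan-out are assumptions for illustration, not the workgroup's Matlab code.

```python
def remap_events(events, table):
    """Translate (timestamp, source_address) events into target-chip events.

    table maps a source address to a list of target addresses, so one cochlea
    event can be duplicated onto several synapses. Unmapped addresses are dropped.
    """
    remapped = []
    for timestamp, address in events:
        for target in table.get(address, ()):
            remapped.append((timestamp, target))
    return remapped


# Hypothetical mapping: cochlea channel 0 drives synapses 10 and 11.
table = {0: [10, 11], 1: [12]}
events = [(5, 0), (7, 1), (9, 3)]
out = remap_events(events, table)
```

Doing this per event in software is simple but slow, which matches the difficulty noted on this slide.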

Page 23: Impedance Difficulties

Cochlear firing rates

Cochlea: 6M spikes/second (30k channels, 200 spikes/second)

Silicon Cochlea: 30k spikes/second (30 channels, 1k spikes/second)

Learning Chip: 3k spikes/second (30 channels, 100 spikes/second)

Dynamic range
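
The totals on this slide are each the channel count multiplied by the per-channel rate, and the mismatch between stages is the ratio of those totals. The sketch below reproduces the arithmetic and adds a simple spike-thinning helper as one hypothetical way to bridge a tenfold rate gap; the thinning step is an illustration and not something the slide describes.

```python
# (channels, spikes per second per channel), taken from the slide.
STAGES = {
    "Cochlea": (30_000, 200),
    "Silicon Cochlea": (30, 1_000),
    "Learning Chip": (30, 100),
}


def aggregate_rate(channels, per_channel_rate):
    """Total spikes per second across all channels."""
    return channels * per_channel_rate


def thin(spike_times, keep_every):
    """Keep every n-th spike of one channel (hypothetical rate-matching step)."""
    return spike_times[::keep_every]


totals = {name: aggregate_rate(*spec) for name, spec in STAGES.items()}
mismatch = totals["Silicon Cochlea"] // totals["Learning Chip"]
thinned = thin(list(range(1000)), mismatch)
```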

Page 24: Desired Results

[Figure: /A/ Phoneme Patch, /I/ Phoneme Patch, AI Word Patch, and IA Word Patch responses to the phoneme input A A A I, without and with relational feedback]

Page 25: Simulation

Page 26: Simulation 2

Page 27: Simulation 3

Page 28: Great Job!

Student Members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor


Page 30: Silicon Cochlea

[Figure: Mean firing rates for two different vowel inputs (/a/, /i/); plot title: Mean firing rates in response to two tones; axes: Channel Number, Mean firing rate]

[Figure: Raster plot for two different tone inputs (200 Hz, 1000 Hz); axes: Time in microseconds, Channel number]
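
A mean-firing-rate profile like the one plotted on this slide can be computed from raw spike events by counting spikes per channel and dividing by the recording duration. A minimal sketch, assuming events arrive as (time in microseconds, channel number) pairs; that layout and the function name are assumptions:

```python
from collections import Counter


def mean_firing_rates(events, n_channels, duration_us):
    """Spikes per second for each channel, from (time_us, channel) events."""
    counts = Counter(channel for _, channel in events)
    seconds = duration_us / 1e6
    return [counts.get(channel, 0) / seconds for channel in range(n_channels)]


# Three synthetic events over a one-second recording.
events = [(0, 0), (500_000, 0), (250_000, 2)]
rates = mean_firing_rates(events, n_channels=3, duration_us=1_000_000)
```

The same event list, plotted as time against channel, gives the raster view.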

Page 31: Word Recognizer

Four example raster plots (silence, A_, A_ with relational, AI)

Page 32: Software Simulation

Page 33: Software Simulation

Page 34: Behind the Curtain