

Page 1: Automatic Speech Attribute Transcription (ASAT) Project Period: 10/01/04 – 9/30/08 The ASAT Team –Mark Clements (clements@ece.gatech.edu) –Sorin Dusan

Automatic Speech Attribute Transcription (ASAT)

• Project Period: 10/01/04 – 9/30/08

• The ASAT Team

– Mark Clements (clements@ece.gatech.edu)

– Sorin Dusan ([email protected])

– Eric Fosler-Lussier ([email protected])

– Keith Johnson ([email protected])

– Fred Juang ([email protected])

– Larry Rabiner ([email protected])

– Chin Lee (Coordinator, [email protected])

• NSF HLC Program Director: ([email protected])


ASAT Paradigm and SoW

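[Figure: ASAT block diagram. Components: 1. Bank of Speech Attribute Detectors, 2. Event Merger, 3. Evidence Verifier, 4. Knowledge Sources]

5. Overall System Prototypes and Common Platform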


1. Bank of Speech Attribute Detectors

• Each detected attribute is represented by a time series (event)

– An example: frame-based detector (0-1 simulating posterior probability)

• ANN-based Attribute Detectors

– An example: nasal and stop detectors

• Sound-specific parameters and feature detectors

– An example: “VOT” for V/UV stop discrimination

• Biologically-motivated processors and detectors

– Analog detectors, short-term and long-term detectors

• Perceptually-motivated processors and detectors

– Converting speech into neural activity level functions

• Others?
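A minimal sketch of the frame-based detector idea above: one score between 0 and 1 per frame, simulating a posterior probability. The two-dimensional features, the weights, and the function name `detect_attribute` are hypothetical and chosen only for illustration; the slides do not specify a detector architecture beyond "ANN-based", so a single logistic unit stands in here.

```python
import math


def sigmoid(x: float) -> float:
    """Squash an activation into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def detect_attribute(frames, weights, bias):
    """Return one score per frame; the list of scores is the attribute's time series."""
    scores = []
    for frame in frames:
        activation = bias + sum(w * f for w, f in zip(weights, frame))
        scores.append(sigmoid(activation))
    return scores


# Hypothetical 2-dimensional features for four consecutive frames.
frames = [[0.1, 0.9], [0.8, 0.2], [0.9, 0.1], [0.2, 0.7]]

# Hypothetical weights: the first feature supports the attribute, the second opposes it.
nasal_scores = detect_attribute(frames, weights=[4.0, -4.0], bias=0.0)
```

Each attribute (nasal, stop, vowel, and so on) would get its own detector of this shape, and the bank is simply the collection of their output series.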


An Example: More Visible than Spectrogram?

[Figure: Nasal, Stop, and Vowel detector outputs over time, with the label track "j+ve d+ing z+ii j+i g+ong h+e g+uo d+e m+ing +vn"]

Early acoustic-to-linguistic mapping!


2. Event Merger

• Merge multiple time series into another time series

– Maintaining the same detector output characteristics

• Combine temporal events

– An example: combining phones into words (word detectors)

• Combine spatial events

– An example: combining vowel and nasal features into nasalized vowels

• Extreme: Build a 20K-word recognizer by implementing 20K keyword detectors

• Others: OOV, partial recognition
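A minimal sketch of a spatial merge, combining a vowel series and a nasal series into a nasalized-vowel series. The frame-wise product is an assumption made for illustration (the slides do not name a combination rule); it is used here because the merged output stays in the 0-1 range, so it keeps the same characteristics as a detector output. The function name and the score values are hypothetical.

```python
def merge_spatial(series_a, series_b):
    """Combine two aligned 0-1 series frame by frame into one 0-1 series."""
    if len(series_a) != len(series_b):
        raise ValueError("series must cover the same frames")
    # Product rule (an assumption): high only where both attributes are high.
    return [a * b for a, b in zip(series_a, series_b)]


# Hypothetical detector outputs for five frames.
vowel = [0.1, 0.8, 0.9, 0.7, 0.2]
nasal = [0.0, 0.5, 0.9, 0.8, 0.9]

nasalized_vowel = merge_spatial(vowel, nasal)
peak_frame = max(range(len(nasalized_vowel)), key=nasalized_vowel.__getitem__)
```

Because the merged series looks like any other detector output, it can be fed to a further merger or to a verifier without special handling.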


3. Evidence Verifier

• Provide confidence measures for events and evidence

– Utterance verification algorithms can be used

• Output recognized evidence (words and others)

– Hypothesis testing is needed in every stage

• Prune event and evidence lattices

– Pruning threshold decisions

• Minimum verification error (MVE) verifiers

• Many new theories can be developed

• Others?
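A minimal sketch of hypothesis testing with a pruning threshold, using a log-likelihood ratio between a target and an alternative hypothesis as the confidence measure. This is one common form of utterance verification, not necessarily the verifier the team built, and it does not show MVE training. The candidate labels, probabilities, threshold, and function names are all hypothetical.

```python
import math


def log_likelihood_ratio(p_target: float, p_alternative: float) -> float:
    """Confidence measure: positive when the target hypothesis is more likely."""
    return math.log(p_target) - math.log(p_alternative)


def prune(events, threshold):
    """Keep only the events whose confidence reaches the pruning threshold.

    events: list of (label, p_target, p_alternative) tuples.
    Returns a list of (label, confidence) tuples in the original order.
    """
    kept = []
    for label, p_target, p_alternative in events:
        confidence = log_likelihood_ratio(p_target, p_alternative)
        if confidence >= threshold:
            kept.append((label, confidence))
    return kept


# Hypothetical candidates with made-up likelihoods.
events = [("one", 0.8, 0.2), ("won", 0.5, 0.5), ("wan", 0.1, 0.6)]
accepted = prune(events, threshold=0.5)
```

Raising the threshold trades more false rejections for fewer false acceptances, which is the pruning decision the slide refers to.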


Word and Phone Verifiers (/w/+/ /+/n/ = “one”)


4. Knowledge Sources: Definition & Evaluation

• Explore large body of speech science literature

• Define training, evaluation and testing databases

• Develop Objective Evaluation Methodology

– Defining detectors, mergers, verifiers, recognizers

– Defining/collecting evaluation data for all

• Document all pieces on the web


5. Prototype ASR Systems and Platform

• Continuous Phone Recognition: TIMIT?

• Continuous Speech Recognition

– Connected digit recognition

– Wall Street Journal

– Switchboard?

• Establishment of a collaborative platform

– Implementing divide-’n’-conquer strategy

– Developing a user community


Summary

• ASAT Goal: Go beyond state-of-the-art

• ASAT Spirit: Work for team excellence

• ASAT team member responsibilities

– MAC: Event Fusion

– SD: Perception-based processing

– EF: Knowledge Integration (Event Merger)

– KJ: Acoustic Phonetics

– BHJ: Evidence Verifier

– LRR: Attribute Detector

– CHL: Overall