
Automatic Speech Attribute Transcription (ASAT)

• Project Period: 10/01/04 – 9/30/08
• The ASAT Team

– Mark Clements (clements@ece.gatech.edu)

– Sorin Dusan (sdusan@speech.rutgers.edu)

– Eric Fosler-Lussier (fosler@cse.ohio-state.edu)

– Keith Johnson (kjohnson@ling.ohio-state.edu)

– Fred Juang (juang@ece.gatech.edu)

– Larry Rabiner (lrr@caip.rutgers.edu)

– Chin Lee (Coordinator, chl@ece.gatech.edu)

• NSF HLC Program Director (mharper@nsf.gov)

ASAT Paradigm and SoW

[Block diagram: the ASAT paradigm connects five numbered modules, (1) a bank of speech attribute detectors, (2) an event merger, (3) an evidence verifier, (4) knowledge sources, and (5) overall system prototypes and a common platform]

1. Bank of Speech Attribute Detectors

• Each detected attribute is represented by a time series (event)
  – An example: a frame-based detector whose output ranges over 0-1, simulating a posterior probability (see the sketch after this list)

• ANN-based attribute detectors
  – An example: nasal and stop detectors

• Sound-specific parameters and feature detectors
  – An example: voice onset time (VOT) for voiced/unvoiced stop discrimination

• Biologically-motivated processors and detectors
  – Analog detectors, short-term and long-term detectors

• Perceptually-motivated processors and detectors
  – Converting speech into neural activity level functions

• Others?
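For concreteness, here is a minimal Python sketch of a frame-based attribute detector, assuming precomputed per-frame acoustic features such as MFCCs; the toy network shape, the untrained random weights, and the `nasal_detector` name are illustrative assumptions, not taken from the ASAT implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-hidden-layer ANN; in practice the weights would be
# trained on attribute-labeled speech frames.
W1 = rng.normal(size=(13, 32)) * 0.1   # 13 MFCCs -> 32 hidden units
b1 = np.zeros(32)
W2 = rng.normal(size=(32, 1)) * 0.1    # hidden -> one attribute score
b2 = np.zeros(1)

def nasal_detector(frames):
    """Map (T, 13) feature frames to a (T,) time series in [0, 1]
    simulating the posterior probability of the 'nasal' attribute."""
    h = np.tanh(frames @ W1 + b1)          # (T, 32) hidden activations
    logits = (h @ W2 + b2).ravel()         # (T,) raw scores
    return 1.0 / (1.0 + np.exp(-logits))   # sigmoid -> 0-1 event curve

# Example: 100 frames of 13-dim features yield one detection event.
features = rng.normal(size=(100, 13))
event = nasal_detector(features)
assert event.shape == (100,) and event.min() >= 0.0 and event.max() <= 1.0
```

Each such detector contributes one time series to the bank; running several in parallel (nasal, stop, vowel, ...) gives the multi-curve picture in the figure below.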

[Figure: nasal, stop, and vowel detector output curves over the utterance "j+ve d+ing z+ii j+i g+ong h+e g+uo d+e m+ing +vn"]

An Example: More Visible than Spectrogram?

Early acoustic-to-linguistic mapping!

2. Event Merger

• Merge multiple time series into another time series
  – Maintaining the same detector output characteristics

• Combine temporal events
  – An example: combining phones into words (word detectors)

• Combine spatial events
  – An example: combining vowel and nasal features into nasalized vowels (see the sketch after this list)

• Extreme case: build a 20K-word recognizer by implementing 20K keyword detectors

• Others: OOV handling, partial recognition
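A minimal sketch of both combination styles, assuming every detector emits a per-frame score in [0, 1]; the frame-wise product for spatial merging, the windowed ordered-peak rule for temporal merging, and all function names are illustrative choices rather than the project's specified merger algorithms.

```python
import numpy as np

def merge_spatial(vowel, nasal):
    """Frame-wise merge: evidence for a nasalized vowel is high only
    where vowel and nasal evidence are simultaneously high; the
    product keeps the output in [0, 1], preserving the detector
    output characteristics."""
    return vowel * nasal

def merge_temporal(phone_events, win=30):
    """Temporal merge (a toy word detector): score a word at frame t
    by requiring its phones to peak in order inside a sliding window.
    Each phone gets an ordered sub-window; the geometric mean of the
    per-phone peaks becomes the word-level event curve."""
    T, n = len(phone_events[0]), len(phone_events)
    seg = win // n                    # one ordered slot per phone
    out = np.zeros(T)
    for t in range(T - win):
        peaks = [phone_events[i][t + i * seg : t + (i + 1) * seg].max()
                 for i in range(n)]
        out[t] = float(np.prod(peaks)) ** (1.0 / n)
    return out

# Example: a 3-phone word detector built from 3 phone event curves.
rng = np.random.default_rng(1)
phones = [rng.random(200) for _ in range(3)]
word_curve = merge_temporal(phones)   # another (T,) time series
```

Because the output is again a 0-1 time series, merged events can feed further mergers or the evidence verifier unchanged.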

3. Evidence Verifier

• Provide confidence measures for events and evidence
  – Utterance verification algorithms can be used (a sketch follows below)

• Output recognized evidence (words and others)
  – Hypothesis testing is needed at every stage

• Prune event and evidence lattices
  – Pruning threshold decisions

• Minimum verification error (MVE) verifiers
• Many new theories can be developed
• Others?

Word and Phone Verifiers (/w/ + /ʌ/ + /n/ = "one")
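A minimal sketch of the verification step, assuming each hypothesized event carries log-likelihoods under a target model and a competing (anti) model; the log-likelihood-ratio confidence and the fixed pruning threshold are standard utterance-verification machinery, while the data structures and numbers here are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Event:
    label: str          # e.g., the word "one" from /w/ + /ʌ/ + /n/
    ll_target: float    # log P(observations | target model)
    ll_anti: float      # log P(observations | competing model)

def confidence(ev):
    """Log-likelihood ratio as a confidence measure: larger values
    mean the target hypothesis better explains the acoustics."""
    return ev.ll_target - ev.ll_anti

def prune(lattice, threshold=0.0):
    """Keep only events whose LLR clears the pruning threshold; an
    MVE-trained verifier would instead adjust models and threshold
    to minimize verification errors on held-out data."""
    return [ev for ev in lattice if confidence(ev) > threshold]

# Example: prune a tiny two-event lattice.
lattice = [Event("one", -41.2, -45.0), Event("won", -44.8, -43.1)]
print([ev.label for ev in prune(lattice)])   # -> ['one']
```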

4. Knowledge Sources: Definition & Evaluation

• Explore the large body of speech science literature
• Define training, evaluation, and testing databases
• Develop an objective evaluation methodology
  – Defining detectors, mergers, verifiers, and recognizers
  – Defining/collecting evaluation data for all of them

• Document all pieces on the web

5. Prototype ASR Systems and Platform

• Continuous phone recognition: TIMIT?
• Continuous speech recognition
  – Connected digit recognition
  – Wall Street Journal
  – Switchboard?

• Establishment of a collaborative platform
  – Implementing a divide-'n'-conquer strategy
  – Developing a user community

Summary

• ASAT Goal: Go beyond the state of the art
• ASAT Spirit: Work for team excellence
• ASAT team member responsibilities

– MAC: Event Fusion
– SD: Perception-based Processing
– EF: Knowledge Integration (Event Merger)
– KJ: Acoustic Phonetics
– BHJ: Evidence Verifier
– LRR: Attribute Detector
– CHL: Overall coordination
