Automatic Speech Attribute Transcription (ASAT)
• Project Period: 10/01/04 – 9/30/08
• The ASAT Team
– Mark Clements ([email protected])
– Sorin Dusan ([email protected])
– Eric Fosler-Lussier ([email protected])
– Keith Johnson ([email protected])
– Fred Juang ([email protected])
– Larry Rabiner ([email protected])
– Chin Lee (Coordinator, [email protected])
• NSF HLC Program Director: ([email protected])
ASAT Paradigm and SoW
[Figure: ASAT paradigm diagram with numbered components 1 through 4, and 5. Overall System Prototypes and Common Platform]
1. Bank of Speech Attribute Detectors
• Each detected attribute is represented by a time series (event)
– An example: frame-based detector (0-1 simulating posterior probability)
• ANN-based Attribute Detectors
– An example: nasal and stop detectors
• Sound-specific parameters and feature detectors
– An example: “VOT” for V/UV stop discrimination
• Biologically-motivated processors and detectors
– Analog detectors, short-term and long-term detectors
• Perceptually-motivated processors and detectors
– Converting speech into neural activity level functions
• Others?
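The frame-based detector idea above can be sketched in a few lines. This is only an illustration, not the project's detector: the function names, the frame sizes, and the fixed logistic weights are all assumptions, and a simple energy feature stands in for whatever a trained ANN would actually compute. It shows only the output format, one score between 0 and 1 per frame.

```python
import math

def frame_energies(samples, frame_len=4, hop=2):
    """Split a signal into overlapping frames and return the mean energy of each."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def sigmoid(z):
    """Squash a real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def detect_attribute(samples, weight=10.0, bias=-2.0, frame_len=4, hop=2):
    """Return one posterior-like score per frame.

    weight and bias are hypothetical, hand-picked values; a real detector
    would learn its parameters from labeled speech.
    """
    return [sigmoid(weight * e + bias)
            for e in frame_energies(samples, frame_len, hop)]

# Toy signal: near-silence, a burst of activity, then near-silence again.
signal = [0.0, 0.01, -0.01, 0.0, 0.8, -0.9, 0.7, -0.8, 0.0, 0.01]
scores = detect_attribute(signal)
```

The resulting `scores` list is the "event" time series: low where the toy signal is quiet and close to 1 where it is active.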
An Example: More Visible than Spectrogram?
[Figure: Nasal, Stop, and Vowel detector outputs, shown with the label sequence “j+ve d+ing z+ii j+i g+ong h+e g+uo d+e m+ing +vn”]
Early acoustic to linguistic mapping!
2. Event Merger
• Merge multiple time series into another time series
– Maintaining the same detector output characteristics
• Combine temporal events
– An example: combining phones into words (word detectors)
• Combine spatial events
– An example: combining vowel and nasal features into nasalized vowels
• Extreme: Build a 20K-word recognizer by implementing 20K keyword detectors
• Others: OOV, partial recognition
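As a rough sketch of the two kinds of merging listed above, the code below combines detector outputs spatially (frame by frame) and temporally (in sequence). The function names, the product rule, and the 0.5 threshold are assumptions made for illustration; the slides do not specify how merging is done.

```python
def merge_spatial(series_a, series_b):
    """Combine two co-occurring attribute series frame by frame.

    Uses a product of scores, which assumes the two attributes are
    independent. The output is again a 0-1 time series, so it keeps the
    same characteristics as the detector outputs it was built from.
    """
    if len(series_a) != len(series_b):
        raise ValueError("series must cover the same frames")
    return [a * b for a, b in zip(series_a, series_b)]

def merge_temporal(series_list, threshold=0.5):
    """Look for the given events firing one after another, in order.

    Assumes all series have the same length. Returns (score, frames),
    where frames holds the first in-order frame at which each series
    reaches the threshold and score is the weakest activation among
    them. Returns (0.0, []) if the sequence is not found.
    """
    t = 0
    picked = []
    for series in series_list:
        while t < len(series) and series[t] < threshold:
            t += 1
        if t == len(series):
            return 0.0, []
        picked.append(t)
        t += 1
    score = min(series[i] for series, i in zip(series_list, picked))
    return score, picked

# Spatial: vowel and nasal evidence together suggest a nasalized vowel.
vowel = [0.1, 0.8, 0.9, 0.7, 0.2]
nasal = [0.0, 0.1, 0.9, 0.8, 0.1]
nasalized_vowel = merge_spatial(vowel, nasal)

# Temporal: three hypothetical phone detectors firing in order form a word.
p1 = [0.9, 0.2, 0.1, 0.0]
p2 = [0.1, 0.8, 0.3, 0.1]
p3 = [0.0, 0.1, 0.2, 0.7]
word_score, word_frames = merge_temporal([p1, p2, p3])
```

A word detector built this way returns a single score rather than a time series; sliding it over the utterance would be one way to turn it back into a series.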
3. Evidence Verifier
• Provide confidence measures to events and evidences
– Utterance verification algorithms can be used
• Output recognized evidences (words and others)
– Hypothesis testing is needed in every stage
• Prune event and evidence lattices
– Pruning threshold decisions
• Minimum verification error (MVE) verifiers
• Many new theories can be developed
• Others?
Word and Phone Verifiers (/w/+/ /+/n/ = “one”)
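The hypothesis-testing and pruning steps above can be illustrated with a log-likelihood ratio test, a common form of utterance verification. This is a minimal sketch under stated assumptions: the target and anti-model log-likelihoods are made-up numbers, the function names and threshold are hypothetical, and no MVE training is shown.

```python
import math

def log_likelihood_ratio(target_loglik, anti_loglik):
    """Score a hypothesis as log p(x | target model) - log p(x | anti-model)."""
    return target_loglik - anti_loglik

def confidence(llr, slope=1.0):
    """Map a log-likelihood ratio to a 0-1 confidence measure (logistic)."""
    return 1.0 / (1.0 + math.exp(-slope * llr))

def prune(hypotheses, threshold=0.0):
    """Keep hypotheses whose log-likelihood ratio reaches the threshold.

    hypotheses: list of (label, target_loglik, anti_loglik) tuples.
    Returns a list of (label, confidence) for the survivors.
    """
    kept = []
    for label, target_ll, anti_ll in hypotheses:
        llr = log_likelihood_ratio(target_ll, anti_ll)
        if llr >= threshold:
            kept.append((label, confidence(llr)))
    return kept

# Toy lattice with illustrative scores only.
lattice = [
    ("one", -10.0, -14.0),   # target model fits much better: accept
    ("won", -12.0, -11.5),   # anti-model fits slightly better: borderline
    ("nine", -20.0, -13.0),  # anti-model fits much better: reject
]
kept = prune(lattice, threshold=0.0)
kept_loose = prune(lattice, threshold=-1.0)
```

Lowering the threshold keeps more of the lattice at the cost of more false acceptances, which is the pruning threshold decision the slide refers to.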
4. Knowledge Sources: Definition & Evaluation
• Explore large body of speech science literature
• Define training, evaluation and testing databases
• Develop Objective Evaluation Methodology
– Defining detectors, mergers, verifiers, recognizers
– Defining/collecting evaluation data for all
• Document all pieces on the web
5. Prototype ASR Systems and Platform
• Continuous Phone Recognition: TIMIT?
• Continuous Speech Recognition
– Connected digit recognition
– Wall Street Journal
– Switchboard?
• Establishment of a collaborative platform
– Implementing divide-’n’-conquer strategy
– Developing a user community
Summary
• ASAT Goal: Go beyond state-of-the-art
• ASAT Spirit: Work for team excellence
• ASAT team member responsibilities
– MAC: Event Fusion
– SD: Perception-based processing
– EF: Knowledge Integration (Event Merger)
– KJ: Acoustic Phonetics
– BHJ: Evidence Verifier
– LRR: Attribute Detector
– CHL: Overall