Word-subword based keyword spotting with implications in OOV detection

Post on 06-Feb-2016

DESCRIPTION

Word-subword based keyword spotting with implications in OOV detection. Jan “Honza” Černocký, Igor Szöke, Mirko Hannemann, Stefan Kombrink, Brno University of Technology, BUT Speech@FIT. 44th Asilomar Conference on Signals, Systems and Computers, 8.11.2010.

TRANSCRIPT

  • Word-subword based keyword spotting with implications in OOV detection

    Jan “Honza” Černocký, Igor Szöke, Mirko Hannemann, Stefan Kombrink

    Brno University of Technology, BUT Speech@FIT

    44th Asilomar Conference on Signals, Systems and Computers, 8.11.2010

    Zhlav (99.99.9999)

  • Agenda
      • Word-based STD, OOV problem, subwords
      • Experiments
      • Sub-word units
      • Hybrid word-subword system
      • What can we do with OOVs
      • Conclusion

  • Goal of STD and glossary of terms

    Goal: detect keywords or key phrases in input speech. For each detection, output:
      • Identity
      • Position
      • Score

    Glossary:
      • Large Vocabulary Continuous Speech Recognizer (LVCSR): a system converting speech into text.
      • Out-of-vocabulary (OOV): a word that is not in the LVCSR vocabulary.
      • Term: a textual entry consisting of one or more words in sequence.
      • Spoken Term Detection (STD): a way to search for a term in spoken data.
      • Subword(s): unit(s) that are parts of words (phones, syllables, automatically found units, etc.).

  • Word-based STD

    Thanks to the language model, word-based STD systems reach better accuracy than purely acoustic ones.

  • Implementation

    The term is searched for in the recognition lattice. This makes it possible to estimate the posterior probability of the term.
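
    The slide does not say how the posterior is obtained. One common way is the forward-backward algorithm over the lattice, sketched below on a toy lattice. The lattice layout, the arc scores and the function names are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict


def logadd(a, b):
    """Return log(exp(a) + exp(b)) in a numerically stable way."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def arc_posteriors(arcs, start, final):
    """Forward-backward over a DAG whose node ids are in topological order.

    arcs: list of (from_node, to_node, label, log_score).
    Returns one posterior per arc, in the same order as `arcs`.
    """
    alpha = defaultdict(lambda: -math.inf)  # log-sum of paths start -> node
    beta = defaultdict(lambda: -math.inf)   # log-sum of paths node -> final
    alpha[start] = 0.0
    beta[final] = 0.0
    for u, v, _, w in sorted(arcs, key=lambda a: a[0]):
        alpha[v] = logadd(alpha[v], alpha[u] + w)
    for u, v, _, w in sorted(arcs, key=lambda a: a[1], reverse=True):
        beta[u] = logadd(beta[u], w + beta[v])
    total = alpha[final]
    return [math.exp(alpha[u] + w + beta[v] - total) for u, v, _, w in arcs]


# Hypothetical two-path lattice: "AN EXAMPLE" competes with "AMEX APPLE".
toy_arcs = [
    (0, 1, "AN", math.log(0.6)),
    (1, 3, "EXAMPLE", math.log(0.5)),
    (0, 2, "AMEX", math.log(0.4)),
    (2, 3, "APPLE", math.log(0.5)),
]
posteriors = arc_posteriors(toy_arcs, start=0, final=3)
```

    For a single-word term, the term posterior is the posterior of the matching arc; here the arc labelled EXAMPLE gets the share of lattice probability carried by the path through it.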

  • The OOV problem

    REF: THIS IS AN EXAMPLE OF RECOGNIZER OUTPUT
    REC: THIS IS AMEX APPLE OF RECOGNIZER OUTPUT

    One OOV causes several errors:
      • The OOV cannot be found (in the output of the LVCSR).
      • The OOV impairs recognition of neighboring words.
      • An OOV usually carries a lot of information (named entity).

    We need to handle OOVs! Aspects to consider:
      • Word accuracy.
      • Spoken term detection accuracy.
      • Practical (memory, CPU, index size, etc.).

  • Answer to the OOV problem: sub-word STD
      • A subword recognizer is built (its output is a subword lattice).
      • The term is converted from words to a sequence of subwords.
      • This sequence is searched for in the subword lattice.

  • Evaluation - TWV

    Term-Weighted Value (TWV), defined by NIST for the NIST STD 2006 evaluation:
      • One number
      • Higher is better
      • Depends on normalization
      • Requires a full STD system
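
    The formula itself did not survive in this transcript. As a reminder, TWV at a threshold is 1 minus the average over terms of P_miss + beta * P_FA. The sketch below assumes the NIST STD 2006 settings as commonly stated (beta = 999.9, one non-target trial per second of speech, terms without reference occurrences excluded); the function name and the counts are hypothetical.

```python
def term_weighted_value(stats, speech_seconds, beta=999.9):
    """Compute TWV at one fixed decision threshold.

    stats maps each term to (n_true, n_correct, n_spurious):
    reference occurrences, correct detections and false alarms.
    """
    penalties = []
    for n_true, n_correct, n_spurious in stats.values():
        if n_true == 0:
            continue  # P_miss is undefined for terms that never occur
        p_miss = 1.0 - n_correct / n_true
        # Non-target trials: one trial per second of speech minus true occurrences.
        p_fa = n_spurious / (speech_seconds - n_true)
        penalties.append(p_miss + beta * p_fa)
    return 1.0 - sum(penalties) / len(penalties)


# Hypothetical counts for two terms in 3 hours (10800 s) of speech.
example_stats = {"PRIME MINISTER": (4, 3, 1), "ASILOMAR": (2, 2, 0)}
example_twv = term_weighted_value(example_stats, speech_seconds=10800)
```

    With beta this large, a single false alarm costs roughly as much as missing a tenth of a term's occurrences, which is why score normalization across terms matters so much for TWV.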

  • Normalization-independent evaluation - UBTWV

    UBTWV: Upper Bound Term Weighted Value
      • Finds the optimum threshold for each term
      • One number
      • Higher is better
      • Independent of normalization
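
    A minimal sketch of the per-term threshold search, assuming the same term value as TWV (1 - P_miss - beta * P_FA with beta = 999.9 and one non-target trial per second). The data layout and function names are hypothetical.

```python
def best_term_value(detections, n_true, speech_seconds, beta=999.9):
    """Best achievable value for one term over all score thresholds.

    detections: list of (score, is_hit) pairs for this term.
    """
    best = 0.0  # threshold above every score: all occurrences missed, no false alarms
    hits = false_alarms = 0
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    for i, (score, is_hit) in enumerate(ordered):
        if is_hit:
            hits += 1
        else:
            false_alarms += 1
        # A threshold cannot separate detections that share the same score.
        if i + 1 < len(ordered) and ordered[i + 1][0] == score:
            continue
        p_miss = 1.0 - hits / n_true
        p_fa = false_alarms / (speech_seconds - n_true)
        best = max(best, 1.0 - (p_miss + beta * p_fa))
    return best


def ubtwv(per_term, speech_seconds, beta=999.9):
    """Average the per-term optimum; per_term maps term -> (detections, n_true)."""
    values = [
        best_term_value(dets, n_true, speech_seconds, beta)
        for dets, n_true in per_term.values()
        if n_true > 0
    ]
    return sum(values) / len(values)


# Hypothetical scored detections for two terms in 10800 s of speech.
example_terms = {
    "PRIME MINISTER": ([(0.9, True), (0.8, False), (0.7, True), (0.2, False)], 2),
    "ASILOMAR": ([(0.4, True)], 1),
}
example_ubtwv = ubtwv(example_terms, speech_seconds=10800)
```

    Because each term gets its own threshold, the result does not depend on how scores are normalized across terms.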

  • Data
      • NIST STD 2006 evaluations.
      • 3h of English telephone conversations.
      • 373 terms, 1 to 4 words long, occurring 4737/196 times.

  • Recognizer I.
      • LVCSR developed in the AMI/AMIDA project.
      • State-of-the-art system including VTLN, MPE, posterior features, SAT, 3 passes.
      • Acoustic models trained on 278h of speech.
      • Language model trained on 977M word tokens (50k vocabulary).
      • Dictionary pruned to generate OOVs -> WRDRED. Word accuracy 69.04%.

  • Recognizer II.

  • Results
      • Words
      • Words converted to phones
      • Phone recognizer

    Phones are too small => longer units are needed.

  • Better subwords: phone multigrams
      • Statistics of phone n-grams (up to length 6) are collected from training data (phone transcriptions of speech).
      • Probabilities of all units are estimated.
      • Training data are segmented into the most probable sequence of multigrams.
      • Statistics are recomputed and rarely occurring units are deleted. Several iterations.
      • An n-gram language model is estimated on top of the multigram segmentation of the training data.
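
    The segmentation step above can be sketched as a Viterbi search over unit boundaries. This is only an illustration: it treats units as independent (unigram probabilities), and the unit inventory and probabilities are made up, not taken from the trained system.

```python
import math


def segment(phones, unit_logprob, max_len=6):
    """Most probable segmentation of `phones` into multigram units (Viterbi).

    unit_logprob maps a tuple of phones to its log-probability.
    Returns (list of units, total log-probability).
    """
    n = len(phones)
    best = [-math.inf] * (n + 1)  # best[i]: best score for phones[:i]
    back = [0] * (n + 1)          # back[i]: start of the last unit in that path
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            unit = tuple(phones[start:end])
            if unit in unit_logprob and best[start] + unit_logprob[unit] > best[end]:
                best[end] = best[start] + unit_logprob[unit]
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError("no segmentation with the given unit inventory")
    units, end = [], n
    while end > 0:
        units.append(tuple(phones[back[end]:end]))
        end = back[end]
    return units[::-1], best[n]


# Hypothetical unit inventory with made-up probabilities.
toy_units = {
    ("p",): math.log(0.1),
    ("r",): math.log(0.1),
    ("ay",): math.log(0.1),
    ("m",): math.log(0.1),
    ("p", "r", "ay"): math.log(0.05),
    ("ay", "m"): math.log(0.02),
}
toy_segmentation, toy_score = segment("p r ay m".split(), toy_units)
```

    In training, such a segmentation would be followed by recounting the units, pruning the rare ones and repeating, as the list above describes.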

  • Constrained multigrams
      • nosil: sil is not part of a multigram unit.
      • noxwrd: word boundary information is added to the multigram unit.

    Term (word representation): PRIME MINISTER
    Term pronunciation: p r ay m m ih n ih s t axr
    Term (subword representation): *p-r-ay m* *m-ih-n ih-s t-axr*

  • Results
      • Subword search can process OOV terms.
      • Subword search is not as accurate as word search for in-vocabulary terms.
      • Subword search consumes more index space.

    => A combination of word and subword searches is needed.

  • Parallel word-subword

    Works, but requires maintaining and running 2 systems.

  • Hybrid word-subword

  • Implementation by composition of networks

  • Multigram dictionary for the hybrid system
      • For the hybrid system, phone multigrams must not be trained on utterances.
      • Phone multigrams are trained on a dictionary.
      • Experiments with LVCSR vs. big vs. OOV dictionary.

  • Results: different configurations
      • Pruning factors play a role in memory consumption, index size and RT factor.
      • Reasonable system:
          • ~2.5x slower than the word system
          • ~2.5x bigger index than the word system
          • Matches the accuracy of the word system for IV terms
          • OOVs are found

  • OOV detection by the hybrid system

    Comparison of the subword confidence measure to a threshold => detection of OOVs
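
    A minimal sketch of the thresholding step. It assumes the hybrid recognizer's 1-best output is available as tokens tagged word or subword with a confidence each, and it uses the mean confidence of a run of consecutive subword tokens as the region score. Both the token layout and the averaging are assumptions, not the confidence measure from the slides.

```python
from itertools import groupby


def detect_oovs(tokens, threshold=0.5):
    """Flag runs of subword tokens as OOV candidates.

    tokens: list of (label, is_subword, confidence) from a hybrid 1-best output.
    Returns a list of (subword labels, mean confidence) for flagged regions.
    """
    regions = []
    for is_subword, group in groupby(tokens, key=lambda t: t[1]):
        group = list(group)
        if not is_subword:
            continue  # in-vocabulary words are never OOV candidates
        confidence = sum(t[2] for t in group) / len(group)
        if confidence >= threshold:
            regions.append(([t[0] for t in group], confidence))
    return regions


# Hypothetical hybrid output: words mixed with multigram units.
toy_tokens = [
    ("OFFICE", False, 0.9),
    ("m-ae", True, 0.8),
    ("k-s", True, 0.6),
    ("IS", False, 0.7),
    ("ax", True, 0.2),
]
oov_regions = detect_oovs(toy_tokens, threshold=0.5)
```

    Raising the threshold trades missed OOVs for fewer false detections; the value 0.5 here is arbitrary.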

  • OOV recovery

    Use of phoneme-to-grapheme (P2G) conversion to derive the word form of a detected OOV.

  • Alignment error model
      • Some detected OOVs could even be converted back to in-vocabulary words!
      • But the phone pronunciation in the 1-best output is not ideal => alignment error model.
      • Parameters (probabilities of deletion, insertion, substitution) are trained from data.
      • Can process a dictionary and look up detected OOVs.
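
    One simple way to realise such a model is a weighted edit distance whose costs are negative log-probabilities of match, substitution, insertion and deletion. The sketch below uses fixed, context-independent probabilities and a made-up two-word dictionary; in the slides the parameters are trained from data, so every number and name here is an assumption.

```python
import math


def alignment_cost(recognized, canonical, p_match=0.9, p_sub=0.04, p_ins=0.03, p_del=0.03):
    """Negative log-probability of the best alignment of two phone sequences."""
    c_match, c_sub, c_ins, c_del = (-math.log(p) for p in (p_match, p_sub, p_ins, p_del))
    rows, cols = len(recognized) + 1, len(canonical) + 1
    cost = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = cost[i - 1][0] + c_ins  # recognized phone with no canonical partner
    for j in range(1, cols):
        cost[0][j] = cost[0][j - 1] + c_del  # canonical phone missing from the output
    for i in range(1, rows):
        for j in range(1, cols):
            same = recognized[i - 1] == canonical[j - 1]
            cost[i][j] = min(
                cost[i - 1][j - 1] + (c_match if same else c_sub),
                cost[i - 1][j] + c_ins,
                cost[i][j - 1] + c_del,
            )
    return cost[-1][-1]


def closest_entry(recognized, dictionary):
    """Return the dictionary word whose pronunciation aligns most cheaply."""
    return min(dictionary, key=lambda word: alignment_cost(recognized, dictionary[word]))


# Hypothetical pronunciations; the recognized string has one wrong vowel.
toy_dictionary = {
    "ALCOHOL": "ae l k ax hh aa l".split(),
    "OFFICE": "aa f ax s".split(),
}
recovered = closest_entry("aa f ih s".split(), toy_dictionary)
```

    The same cost can serve as a similarity score between two detected OOVs.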

  • Going more complex

    Can construct a WFST accounting for:
      • Sequences of in-vocabulary words
      • In-vocabulary words + common prefixes and suffixes
      • OOVs
      • And combinations

    m ey sh en -> INFORMATION
    ae l k ax hh aa l ih z em (ALCOHOLISM) -> ALCOHOL / ISM
    aa f ax s m ae k s (Office Max) -> OFFICE OOV1572

  • OOV clustering
      • The alignment model allows the evaluation of similarity.
      • Clustering is possible.

  • Conclusion
      • Subword system with constrained multigrams: very good STD performance and an OOV-tolerant system.
      • Improved hybrid word-subword system tested from the STD accuracy and real-application points of view.
      • The hybrid system brings a better accuracy/size ratio and is faster than the standalone system.
      • It works well in a real indexing & search engine.
      • With a hybrid system, we can:
          • Recover OOVs (simple P2G or a more elaborate model)
          • Measure the similarity of OOVs
          • Cluster them, find re-occurring ones, update the vocabulary.

  • Reading and playing with
      • Igor Szöke: Hybrid word-subword spoken term detection, Ph.D. thesis, Brno University of Technology, Oct 2010.
      • Stefan Kombrink, Mirko Hannemann, Lukáš Burget, and Hynek Heřmanský: Recovery of Rare Words in Lecture Speech, in Proc. Text, Speech and Dialogue (TSD) 2010, Brno, 2010.
      • Mirko Hannemann, Stefan Kombrink, Martin Karafiát, and Lukáš Burget: Similarity Scoring for Recognizing Repeated Out-of-Vocabulary Words, in Proc. Interspeech 2010, Makuhari, Japan, 2010.
      • Publications section of http://speech.fit.vutbr.cz/
      • http://www.superlectures.com/odyssey/

  • Thank you for your attention
