3rd Progress Meeting for Sphinx 3.6 Development
Arthur Chan, David Huggins-Daines, Yitao Sun
Carnegie Mellon University
Jan 25, 2006

This meeting
- 3rd progress report on 3.6 development (40 pages)
- Agenda
  - What happened in Fall 2005? (4 slides)
  - Progress of Sphinx development in Fall 2005 (17 slides)
  - Summary of progress in 2005 (10 slides)
  - Discussion: should we create a release candidate? (1 slide)

What happened in Fall 2005?

What happened in Fall 2005?
- Major events in Sphinx development
  - We participate in GALE in Oct 2006
  - Conformance of the recognizers (Sphinx 3 and Sphinx 4) became an issue
  - The lack of advanced acoustic modeling techniques became very glaring
  - Sphinx 3 and 4 have gone through bug fixes
- The CALO effort is now split in two
  - Off-line recognizer: requires major improvement in LM and AM. The AM issue is shared with GALE
  - On-line recognizer (CALO jargon: Smartnote)
    - Now has a new LM and AM
    - Requires significant development work

Time distribution (estimated)
- Arthur: 50% on GALE, 20% on CALO, 30% on Sphinx
- Dave: 65% on CALO, 30% on PocketSphinx, 5% on Sphinx
- Yitao: 90% on CALO, 10% on Sphinx
[Figure: one pie chart per person; legend entries: GALE, CALO, SphinxDev, PocketSphinx]

The Two Funded Projects
- Upside:
  - They point to issues that need to be solved
  - Need significant reprioritization of tasks
  - Balance of effort on the 2 projects is now achieved
- Downside:
  - Code development of Sphinx has become a slower process
  - Also, we haven't released s3 for a while => Should we release the code now?
  - Tired students and staff can be found everywhere

Progress of Sphinx 3.6 in Fall 2005

Overview
- Work on the second stage
  - Merging of best-path search in the 2nd stage of tree search
  - IBM lattice generation
  - Word confidence estimation
- Behavior changes and bug fixes
  - Treatment of acoustic scores
  - Assertion in vithist.c
- Attempts at search algorithm improvements
  - Mode 3: flat-lexicon decoding
  - Mode 4: tree-lexicon decoding
- Sphinx on Mandarin and coded language
- New tools: conf, dp

Work Schedule
- Sep 1 to Oct 1: implementation of triphones in the flat-lexicon decoder
- Oct 1 to Nov 1: implementation of triphones in the tree-lexicon decoder (incomplete)
- Nov 1 to Dec 8:
  - IBM lattice generation
  - Confidence score generation
  - Fixed issues in scores
- Dec 8 to Jan 3: the concept of "vacation" was tried
- Jan 3 to now: fixed bugs, prepared the release

Second-stage Processing
- Best-path search can now be specified in decode
  - Implementation requires write-back. (urgh.)
- The recognizer can now generate lattices in IBM format
  - The word is attached to the link
  - Sphinx format attaches the word to the node
  - Scores are normalized with the best senone scores
- Rong's confidence-based routine is now in Sphinx (conf)
  - Goodies: it uses the Sphinx logs3 routine -> significantly reduces the alpha-beta score mismatch

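The node-versus-link distinction can be illustrated with a small sketch. It uses a toy in-memory lattice, not the actual Sphinx or IBM file formats; `Node`, `Link` and `to_link_labelled` are hypothetical names introduced only for this example. The assumption is that an edge u -> v means u's word ends where v's word starts, so the edge becomes a link between the two start frames carrying u's word.

```python
from collections import namedtuple

# Hypothetical toy structures, NOT the real Sphinx or IBM lattice formats.
Node = namedtuple("Node", "word start_frame")          # word attached to the node
Link = namedtuple("Link", "src_frame dst_frame word")  # word attached to the link


def to_link_labelled(nodes, edges):
    """Turn a node-labelled lattice into a link-labelled one.

    nodes: dict mapping node id -> Node
    edges: list of (from_id, to_id) pairs
    Each edge u -> v becomes a link from u's start frame to v's start frame
    that carries u's word. Duplicate links are merged.
    """
    links = set()
    for u, v in edges:
        links.add(Link(nodes[u].start_frame, nodes[v].start_frame, nodes[u].word))
    return sorted(links)


nodes = {
    0: Node("<s>", 0),
    1: Node("hello", 5),
    2: Node("yellow", 5),
    3: Node("world", 30),
    4: Node("</s>", 60),
}
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]

link_lattice = to_link_labelled(nodes, edges)
for link in link_lattice:
    print(link.src_frame, link.dst_frame, link.word)
```

In this simplified conversion the final node has no outgoing edge, so its word produces no link; a real converter would need an explicit end time for it.
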
Second-stage Processing (cont.)
- Further work
  - Best-path generation doesn't conform to past 3.5 -> bugs caused by 3.6 development
  - Also, the best path is not always in the lattice -> legacy bug
- Confidence-based method
  - Lattice-based: can currently only be used off-line
  - 10% of the data still have an alpha-beta mismatch
- Consensus network generation needs special focus

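As background for the alpha-beta mismatch mentioned above: in a forward-backward pass over a lattice, the total forward score at the end state and the total backward score at the start state should agree, and a gap between them points to numerical or bookkeeping problems. The following is a minimal sketch of that check, not Rong's routine or the conf program. It assumes floating-point natural logs instead of Sphinx's logs3 domain, and `log_add` and `link_posteriors` are hypothetical names.

```python
import math


def log_add(a, b):
    """Return log(exp(a) + exp(b)) while staying in the log domain."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def link_posteriors(num_states, links):
    """Forward-backward over a link-labelled lattice.

    links: list of (src, dst, word, log_score) with src < dst, so that
    state numbers give a topological order. State 0 is the start and
    state num_states - 1 is the end.
    Returns (posteriors, mismatch), where mismatch is |alpha(end) - beta(start)|.
    """
    alpha = [-math.inf] * num_states
    beta = [-math.inf] * num_states
    alpha[0] = 0.0
    beta[-1] = 0.0
    # Forward pass: visit links by increasing source state.
    for src, dst, _, score in sorted(links, key=lambda link: link[0]):
        alpha[dst] = log_add(alpha[dst], alpha[src] + score)
    # Backward pass: visit links by decreasing destination state.
    for src, dst, _, score in sorted(links, key=lambda link: link[1], reverse=True):
        beta[src] = log_add(beta[src], score + beta[dst])
    total = alpha[-1]
    mismatch = abs(alpha[-1] - beta[0])
    posteriors = [
        (word, math.exp(alpha[src] + score + beta[dst] - total))
        for src, dst, word, score in links
    ]
    return posteriors, mismatch


links = [
    (0, 1, "hello", math.log(0.006)),
    (0, 1, "yellow", math.log(0.004)),
    (1, 2, "world", math.log(0.5)),
]
posteriors, mismatch = link_posteriors(3, links)
for word, p in posteriors:
    print(word, round(p, 3))
print("alpha-beta mismatch:", mismatch)
```

The link posterior can serve as a simple word-level confidence; because it needs the complete lattice, it fits the off-line use noted on the slide.
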
Scores we see (Change 1)
- Tree search now truly generates unnormalized scores
  - Was normalized by the ending frame only
  - Caused by a bug introduced in mid-2005
- All 1st-stage searches use the same score logging functions
  - Includes align, allphone, decode_anytopo, decode
  - matchseg_write and match_write are the current versions
  - log_* is still used but will soon be totally replaced

Scores we see (Change 2)
- Multi-stream GMM computation (ms_gauden)
  - By default, it no longer quantizes the log pdf to 8 bits
- Single-stream GMM computation
  - Vectors with zero means and variances are removed (-remove_zero_var_gau)
- Scores and performance will change
  - Testing resources have changed. (Evandro grins at this point)

Scores we see (Change 3)
- Sphinx now supports generation of different hypseg formats (-hypseg_fmt)
  - SPHINX 2 format
  - SPHINX 3 format
  - ctm format
- Always requires more processing, but it is better than nothing.

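For readers unfamiliar with the ctm format: a NIST CTM record lists the file, the channel, the start time in seconds, the duration in seconds and the word, optionally followed by a confidence. The sketch below converts frame-based word segments into such lines. It is not the decoder's own output code; `to_ctm`, the segment tuples and the frame rate of 100 frames per second are assumptions made for the example.

```python
FRAMES_PER_SECOND = 100  # assumption: 10 ms frame shift


def to_ctm(utt_id, segments, channel="1"):
    """Format (word, start_frame, end_frame) segments as CTM lines.

    end_frame is treated as inclusive, so a segment covering frames 5..29
    lasts 25 frames.
    """
    lines = []
    for word, start_frame, end_frame in segments:
        start = start_frame / FRAMES_PER_SECOND
        duration = (end_frame - start_frame + 1) / FRAMES_PER_SECOND
        lines.append(f"{utt_id} {channel} {start:.2f} {duration:.2f} {word}")
    return lines


ctm_lines = to_ctm("utt001", [("hello", 5, 29), ("world", 30, 59)])
for line in ctm_lines:
    print(line)
```

Filler entries such as sentence-boundary markers would normally be dropped before scoring; that step is left out here.
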
Scores: a summary
- Unnormalized (true) acoustic and language scores are generated by (-hypsegscore_unscale)
  - 1st-stage search
  - Best-path search right after the 1st stage
- Normalized acoustic scores are generated by
  - Lattice generation
- If developers want true scores in the lattice
  - They can get the best scores from the decoder (-bestsenscrdir) and do their own processing

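The "do their own processing" step can be sketched as follows, assuming the normalization subtracts the best senone score of each frame and that those per-frame best scores are available as a plain list. How the decoder stores them on disk is not covered here, and `normalize_frames` and `restore_true_score` are hypothetical helper names.

```python
def normalize_frames(senone_scores):
    """Subtract the best senone score from every score in each frame.

    senone_scores: list of per-frame lists of log scores.
    Returns (normalized, best), where best[t] is the best score in frame t.
    """
    best = [max(frame) for frame in senone_scores]
    normalized = [[s - b for s in frame] for frame, b in zip(senone_scores, best)]
    return normalized, best


def restore_true_score(normalized_score, best, start_frame, end_frame):
    """Add the per-frame best scores back over frames start_frame..end_frame (inclusive)."""
    return normalized_score + sum(best[start_frame:end_frame + 1])


# Three frames, three senones each (made-up integer log scores).
senone_scores = [
    [-120, -90, -300],
    [-80, -200, -95],
    [-150, -110, -60],
]
path = [0, 2, 1]  # senone chosen by some path in frames 0, 1, 2

normalized, best = normalize_frames(senone_scores)
true_score = sum(senone_scores[t][s] for t, s in enumerate(path))
normalized_score = sum(normalized[t][s] for t, s in enumerate(path))
restored = restore_true_score(normalized_score, best, 0, 2)
print(true_score, normalized_score, restored)
```

Because the normalization is a per-frame offset, adding the offsets back over a segment's frames recovers the unnormalized score exactly.
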
Other important bug fixes
- Bug in vithist.c
  - Caused an assertion failure that stopped the recognizer
  - Now fixed; an error message is returned to the search abstraction routine

Attempts in search algorithm improvements (Mode 3)
- Flat-lexicon decoder
  - Search implementation is completed
  - decode can now use flat-lexicon decoding (-op_mode 3)
- Decoder revamping is completed
  - Mode 2 (FST)
  - Mode 3 (Flat-lexicon)
  - Mode 4 (Ravi's Tree-Lexicon)
  - Mode 5 (Arthur's Tree-Lexicon)
- decode_anytopo is still there for backward compatibility
  - decode_anytopo = decode in mode 3

No Further Re-factoring
- Avoid re-factoring before the next check-in
- Align and allphone have different input/output file formats
  - It doesn't make sense to stuff them into a single executable
- Using an XML configuration and control file would be one choice
  - But it takes too much time to implement

Algorithmic Work - Flat Lexicon Decoder
- Full triphone completed in flat-lexicon decoding
  - 2.5% relative improvement in accuracy
  - But requires 100xRT (urgh)
  - Useful for debugging
- Also considered a full trigram implementation
  - Would result in another 5-10 times slowdown
- Conclusion
  - Flat-lexicon search has come to its limit

Algorithmic Work - Tree Lexicon Decoder
- Current full triphone implementation
  - Has flaws in score propagation
- Tree copies
  - No time to do it at all; Q4's workload nearly killed AC
- Benchmarking results
  - GALE results: Full Lexicon = Tree Lexicon
  - CALO/Communicator results: Tree Lexicon is 5% relative poorer
- Conclusion
  - Half a year on search is expected to give us another 5%

Conclusion on Search
- Need to seriously consider: is working on search a good idea?
- In both CALO/GALE, gains come from
  - SAT and cross adaptation
  - Second-stage processing
    - Confusion network
    - Confidence annotation
    - First-stage SD -> Second-stage SA
- VTLN also gives only 5% relative, but it takes only 5 days to implement

Sphinx on Different Text Encodings
- There is already non-CMU work for
  - Spanish
  - French
- Big question mark
  - Could it work on other encodings?

Sphinx on Mandarin (gb2312)

Sphinx on Mandarin (cont.)
- Thanks to Ravi
- Bugs we fixed to get it through
  - 1236322: libutil\str2words special character bug
  - 1236166: special character wasn't supported
- This should give us a fairly good foundation to start on most languages

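The slides do not describe the two bugs in detail. As a loosely related illustration of why byte-oriented string splitting needs care with gb2312 text: every Chinese character in GB2312 is encoded as two bytes, both at or above 0xA1, so a tokenizer must split only on ASCII whitespace and leave high bytes alone. The sketch below shows this in Python; `split_words` is a hypothetical helper and is not a port of the libutil str2words function.

```python
def split_words(raw_line):
    """Split a gb2312-encoded byte line on ASCII whitespace, then decode each token."""
    # bytes.split() with no argument splits on ASCII whitespace only,
    # so the two-byte characters are never cut in half.
    return [token.decode("gb2312") for token in raw_line.split()]


# A made-up dictionary-style line: a Mandarin word followed by phone-like symbols.
raw_line = "中文 zh ong w en\n".encode("gb2312")

high_bytes = [b for b in raw_line if b >= 0x80]
print([hex(b) for b in high_bytes])
print(split_words(raw_line))
```
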
Summary of Sphinx in Fall 2005
- We have done something
- A strong focus on search research doesn't seem to get us far
- Fires to fight on the modeling side
- Sounds like the time to check in and move on

Progress of Sphinx 3.X (From X=5 to X=6)

Progress of Sphinx 3.X (From X=5 to X=6)
- New features (4 slides)
  - Items that are significant
- Gentle, mild and simple re-factoring and its consequences (4 slides)
- Documentation (1 slide)
- Regression testing (1 slide)
- Pruned features?

New Features (Search)
- Speed
  - Further enhancement of CIGMMS
  - BBI tree implementation (by Dave, in SphinxTrain)
- Search
  - FST search
  - Full triphone implementation in decode_anytopo
  - Separation of search abstraction/implementation in 3.X

New Features (Adaptation)
- Adaptation
  - Multiple classes for MLLR (by Dave)
  - MAP adaptation (by Dave, in SphinxTrain)

New Features (Others)
- New executables
  - lm_convert
    - lm3g2dmp++
  - dp
    - If Evandro asks, "Why do we need dp in sphinx 3?"
    - Say this: "I don't know, we found the executable at ./s3/src/misc/dp.c"
  - conf
    - Off-line word-level confidence annotation program
- Mismatch dict-LM
  - Unmatched entries can be generated automatically (-lts_mismatch)

Gentle, mild and simple re-factoring (GMM computation)
- GMM computation is now shared among decode, decode_anytopo, align, allphone
- So, e.g.,
  - decode_anytopo can use fast GMM computation
  - decode can use SCHMM

Gentle, mild and simple re-factoring (Search)
- Its consequences in search programming:
  - FST, flat, and tree search now share the same interface (decode)
    - Just like Sphinx 2 and 4
  - Writing a new search won't mean replacing an existing search
  - 2nd stage now works for decode
    - Alright, not for FST search

Gentle, mild and simple re-factoring (Others)
- Score output is now rationalized
- Several bugs causing seg faults are eliminated
  - vithist.c bugs
  - Class-based LM is now working correctly
- Command lines among applications are now synchronized and re-factored

Documentation/Tutorial
- Hieroglyph
  - Now writing 2nd draft
- Doxygen documentation (by Evandro)
- Tutorial now works
  - archive_s3
  - Sphinx 2
  - Sphinx 3
  - Sphinx 4

Regression Testing
- Our weakest link
- Now daily
  - Standard regression test is done
  - Performance check on Communicator/TIDIGITS/TI46
  - Doxygen documentation will be made and tested
- make check now has 50 tests (3.5: 11)
  - Fairly robust to careless mistakes

Expected Trimmed Features
- Search
  - Mode 0: alignment (?)
  - Mode 1: allphone
  - Mode 5: word tree copies
  - If full triphone in Ravi's tree search can't be done quickly, trim it as well
- (?) Yitao's PCFG rescoring

Conclusion of Sphinx 3.X (From X=5 to X=6)
- We have done something
- Development last year has enriched the code
  - Niceified a lot of things internal to the code
- There are hiccups in our development
  - Not perfect
  - Well, compare this with NASDAQ.

Discussion: What should we do now?
- Option 1: keep on working without a release
- Option 2: merge the crazy branch with the trunk without a release
- Option 3: merge the crazy branch with the trunk and create release candidate Sphinx 3.6 RCI

End