Carnegie Mellon
NoD and Multilingual Status Report
April 1998
Carnegie Mellon University
Howard D. Wactlar
Digital Video Library
MLI and NoD Tasks
• Data collection & preparation - English, Serbo-Croatian, and German
• Multilingual speech recognition enhancements
• Video and audio segmentation
• Multilingual indexing, retrieval, search
• Summarization-on-demand
• Annotations
• User studies
• Additional languages and functionalities
• Demonstration as a network-based service
Accomplishments to Apr 98
We are achieving what we proposed and beyond
• Advances in capability (research => integrated function)
• Infrastructure evolution & growth
• Testbed activity and extension
• Related research and outreach
Accomplishments to Apr 98 (cont’d)
• Serbo-Croatian demonstration system
• Automated and dynamic abstraction and summarization for improved navigation
• Topic detection and assignment for subject browsing
• Dynamically improved speech recognition for index generation
• Coherent story segmentation through corpus specific, rule-based analysis
more ...
Accomplishments to Apr 98 (cont’d)
• Video-OCR for improved name/face identification
• Multi-level annotations to mark and share commentary
• Web interface enabling “slide show” viewing over slow links
• Database restructuring to enable size growth and function evolution
• Remote testbeds with access to daily updated news
Automated Abstraction and Summarization
• Critical to efficient navigation of video
• Improved automatic title generation
• Dynamic “poster frame” icons - query based
• Skims smoothed through enhanced language models and rule-based scene selection
“Naïve” Poster Frame Result List (Uses First Shot Image)
Query-based Poster Frame Result List
Query-based Poster Frame Selection Process
1. Decompose video segment into shots.
2. Compute representative frame for each shot.
3. Locate query scoring words (shown by arrows).
4. Use frame from highest scoring shot.
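The four steps above can be sketched roughly as follows. This is an illustration only: the `Shot` structure, the word-count scoring, and the fallback to the first shot are assumptions made for the example, not details taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """One shot of a video segment (hypothetical structure for illustration)."""
    representative_frame: str  # e.g. a frame identifier or image path
    transcript_words: list     # words aligned to this shot


def select_poster_frame(shots, query_terms):
    """Return the representative frame of the shot that best matches the query."""
    terms = {t.lower() for t in query_terms}

    def score(shot):
        # Assumed scoring: count transcript words that match a query term.
        return sum(1 for w in shot.transcript_words if w.lower() in terms)

    best = max(shots, key=score)
    # Fall back to the first shot (the "naive" poster frame) when nothing matches.
    return best.representative_frame if score(best) > 0 else shots[0].representative_frame


shots = [
    Shot("f0", ["good", "evening"]),
    Shot("f1", ["the", "hurricane", "hit", "florida"]),
    Shot("f2", ["weather", "next"]),
]
print(select_poster_frame(shots, ["hurricane", "florida"]))
```

With this toy data the second shot scores highest, so its frame is chosen instead of the first-shot image.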
Topic Detection and Tracking
Enhances browsing and discovery over directed search
Different methods from several areas being evaluated
• Information retrieval
– vector space methods
– relevance feedback
• Speech recognition
– hidden Markov models
• Statistics
– k-nearest neighbors
– exponential models
KNN-based Topic Detection
• Build training index with pre-labeled topics
– 45000 Broadcast News stories from 1995 and 1996
– 3178 different news topics occurring > 10 times
• Search for top 10 related stories in training index
• Lookup topics for related stories
• Re-weight topics by story relevance (select top 5)
• At 5 topics: recall 0.491, relevance 0.482
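The KNN procedure above might look roughly like the sketch below. The term-frequency vectors, cosine similarity, and summing of similarities as the re-weighting rule are assumptions chosen for brevity; the slides do not say which retrieval model or weighting was used.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_topics(story, training, k=10, top_n=5):
    """training is a list of (story_text, [topic, ...]) pairs with pre-labeled topics."""
    query = tf_vector(story)
    # Search for the k most similar stories in the training index.
    scored = sorted(
        ((cosine(query, tf_vector(text)), topics) for text, topics in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Look up each neighbor's topics and weight them by story similarity.
    weights = defaultdict(float)
    for similarity, topics in scored:
        for topic in topics:
            weights[topic] += similarity
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [topic for topic, weight in ranked[:top_n] if weight > 0]


training = [
    ("hurricane hits florida coast", ["weather", "disaster"]),
    ("senate votes on budget bill", ["politics"]),
    ("storm floods coast towns", ["weather"]),
]
print(knn_topics("hurricane floods florida coast", training, k=2, top_n=2))
```

The defaults `k=10` and `top_n=5` mirror the numbers on the slide; the toy call uses smaller values because the example index has only three stories.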
Speech Recognition for Index Generation
• Integrate closed captioning with speech recognition generated transcription
• Improve accuracy by automatic daily expansion of the language model from closed captioning, e.g. “Dodi Fayed”
• Participated (with Claritech) in TREC Spoken Document track
– large text retrieval evaluation benchmarks (NIST/DARPA)
– scored second, held back by out-of-vocabulary (OOV) words (CIA, well-known, torched)
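As a rough illustration of the daily expansion idea, the sketch below collects caption words that are missing from the recognizer vocabulary. The function name, the tokenization, and the set-based vocabulary are hypothetical; a real recognizer would also need pronunciations and updated language model probabilities for the new words, which this sketch does not attempt.

```python
def expand_vocabulary(vocabulary, caption_text):
    """Return (updated vocabulary, new words) after adding unseen caption words."""
    # Assumed tokenization: split on whitespace and strip common punctuation.
    words = {w.strip('.,!?;:"()').lower() for w in caption_text.split()}
    new_words = sorted(w for w in words if w and w not in vocabulary)
    return vocabulary | set(new_words), new_words


vocabulary = {"the", "of", "death", "princess"}
vocabulary, added = expand_vocabulary(vocabulary, "The death of Dodi Fayed.")
print(added)
```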
Segmentation - Creating the Video Paragraph
Break up a video stream into semantically coherent pieces
• corpus-specific analysis
• language model approaches
• video structure analysis
Segmentation - Commercial Detection
Look for several potential indicators in multiple passes
• detect lapses in closed-caption (cc) capture greater than some threshold
• occurrence of black frames
• rate of scene change and motion
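A minimal two-pass sketch of the first two indicators follows. The timestamps, the 20-second gap threshold, and the 2-second tolerance are invented for illustration, and the scene-change and motion indicator is left out to keep the example short.

```python
def caption_gaps(caption_times, min_gap):
    """Pass 1: intervals where no caption arrived for more than min_gap seconds."""
    times = sorted(caption_times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > min_gap]


def likely_commercials(caption_times, black_frame_times, min_gap=20.0, tolerance=2.0):
    """Pass 2: keep caption gaps that have a black frame near either edge."""
    def near(t):
        return any(abs(t - b) <= tolerance for b in black_frame_times)

    return [(a, b) for a, b in caption_gaps(caption_times, min_gap) if near(a) or near(b)]


captions = [0, 5, 10, 15, 75, 80, 85, 120]   # seconds at which captions were captured
black_frames = [16.0, 74.5]                  # seconds at which black frames occurred
print(likely_commercials(captions, black_frames))
```

Here the gap from 15 s to 75 s is kept because black frames fall near its edges, while the gap from 85 s to 120 s is dropped.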
Ad Removal based on Black Frame and Scene Change Detection
[Figure: timelines labeled Truth, Hypothesis, Black frames, and Scene change]
Segmentation - Language Models
Novel application to find shift in topic within a document
• Adaptive exponential language models improve as they see more material from current topic
e.g., probable distance of “managed care” to “physicians”
• Static language models give pre-computed likelihoods of short-range adjacency (e.g. trigrams)
• Compare the predictive performance of the two models
i.e., the probability each assigns to the next observed words
• A segment boundary is likely to exist when the adaptive model shows a dip in performance relative to the short-range model
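The comparison can be sketched with much simpler stand-ins for the two models. In the example below the static model is an add-one smoothed unigram (not a trigram) and the adaptive model is a cache mixture (not an exponential model); both substitutions, along with the window size and mixing weight, are assumptions made so the sketch stays short and runnable.

```python
import math
from collections import Counter, deque


def log_ratios(words, background, cache_size=50, mix=0.5):
    """Per-word log(P_adaptive / P_static) under toy stand-in models."""
    counts = Counter(background)
    total = len(background)
    vocab_size = len(set(background) | set(words))
    cache = deque(maxlen=cache_size)
    cache_counts = Counter()
    ratios = []
    for w in words:
        # Static model: add-one smoothed unigram from a fixed background corpus.
        p_static = (counts[w] + 1) / (total + vocab_size)
        if cache:
            # Adaptive model: mix the static model with recently seen words.
            p_cache = cache_counts[w] / len(cache)
            p_adaptive = (1 - mix) * p_static + mix * p_cache
        else:
            p_adaptive = p_static
        ratios.append(math.log(p_adaptive / p_static))
        if len(cache) == cache.maxlen:
            cache_counts[cache[0]] -= 1  # the oldest word is about to be evicted
        cache.append(w)
        cache_counts[w] += 1
    return ratios


def likely_boundary(ratios, window=5):
    """Index where the windowed mean log-ratio is lowest, skipping the warm-up."""
    best_index, best_mean = None, float("inf")
    for i in range(window, len(ratios) - window + 1):
        mean = sum(ratios[i:i + window]) / window
        if mean < best_mean:
            best_index, best_mean = i, mean
    return best_index


topic_a = ["care", "doctor", "patient", "hospital"] * 5
topic_b = ["goal", "team", "score", "match"] * 5
background = ["care", "doctor", "patient", "hospital", "goal", "team", "score", "match"]
ratios = log_ratios(topic_a + topic_b, background)
print(likely_boundary(ratios))
```

While the text stays on one topic the cache helps and the ratio stays positive; when the topic changes the cached words stop helping and the ratio dips, which is where the boundary is placed.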
[Figure: a plot of the ratio of the two language models as a function of the relative position in a segment; x-axis from -500 to 500, y-axis from -0.05 to 0.25.]
Video OCR
Image component crucial to news corpus
Capture of text overlaid on the video image
Detected, filtered, OCR’d, incorporated into content and indexed
Video OCR Block Diagram
Video => Text Area Detection => Text Area Preprocessing => Commercial OCR => ASCII Text
[Figure: video frames (1/2 s intervals), filtered frames, and AND-ed frames]
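The AND-ing step shown on this slide can be illustrated as below. The reading that overlay text stays in place across consecutive frames while the background changes is an interpretation rather than something the slide states, and the binary frames are assumed to be the output of some filter that is not specified here.

```python
def and_frames(binary_frames):
    """Pixel-wise AND of equally sized binary frames (lists of rows of 0/1)."""
    result = [row[:] for row in binary_frames[0]]
    for frame in binary_frames[1:]:
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                # A pixel survives only if it is set in every frame.
                result[y][x] &= value
    return result


frames = [
    [[1, 1, 0], [0, 1, 1]],
    [[1, 0, 0], [0, 1, 1]],
    [[1, 1, 1], [0, 1, 0]],
]
print(and_frames(frames))
```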
Text Detection False Alarms
[Figure: video frame alongside its filtered and AND-ed frame]
Text Detection Misses
[Figure: video frame alongside its filtered and AND-ed frame]
Challenges for VOCR Preprocessing
• The resolution of video text is very low (<10×10 pixels per character).
• Text detection and extraction are complicated by complex backgrounds.
VOCR Preprocessing Problems
Video OCR - Results
Character recognition - 83%
Word recognition - 70%
Language model post-processing will improve the word recognition rate, but new names and places will not be in the language model
Important adjunct to Name-It: name/face correlation through co-occurrence matrices
Annotations
Annotation fields contain metadata automatically derived from the content (e.g. topics, chyron)
Annotations are included in the index (searchable separately or combined with transcript)
Personal annotations are typed or spoken comments that are established on a per-user basis
• bookmarking or commentary
• fully indexed and searchable with other data
Web Interface
Long-time concern about video fidelity on internet
Compromise is slide show of high quality JPEG images and continuous audio
Not all navigation tools translate directly
Required substantive change in interface specification
Browsing improved over full video interface
User effectiveness versus full video to be explored
Infrastructure Evolution and Growth
Conversion of underlying database architecture (ONGOING)
• extends functionality
– e.g. date filtering => “What’s new?” query
• improved interoperability
– fully distributed, replicated function
• increased scale
• negative impact on query performance (improving)
Summer-long ruggedization program for reliable processing and quality control
900 hours on-line, terabyte data store
12 Alphas for parallel processing (and experiments)
Testbeds
Corpus
• CNN data: 620 hours + 12 hrs/wk
– Early Prime, World View, Impact, Science & Technology Week, Earth Matters, Travel Guide, Your Health
Distant high speed network access
• Informedia-Net attached to both vBNS and AAI nets
• enables attachment of clients to CMU servers from selected locations
• clients at DARPA, SPAWAR (forthcoming), NSA
Serbo-Croatian LVCSR on the Dictation and Broadcast News Domain
• Informedia (English)
– CMU Informedia Group (Howard Wactlar, Alex Hauptmann, Ricky Houghton, et al.)
– CMU Sphinx Group
• Multilingual Speech Recognition
– CMU/UKA Interactive Systems Labs - JanusRTk (Alex Waibel, Michael Finke, Petra Geutner, Peter Scheytt)
• Translation/Cross Language Retrieval
– CMU Language Technologies Institute (Jaime Carbonell, Eric Nyberg, Bob Frederking, Paul Kennedy, et al.)
Serbo-Croatian Broadcast News Recognition
• Initial database: Globalphone Serbo-Croatian (UKA)
• Broadcast news: Collected by satellite from Germany (UKA)
• 15 hours transcribed
• Janus recognition toolkit: 15 languages
• Janus applied to Serbo-Croatian broadcast news
• Problem: Morphology, large number of inflections
• Competitive performance already: 26% WER
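For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition; it is not the scoring tool used for the figures on this slide, and the example sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("a b c d", "a x c d"))
```

Highly inflected languages tend to raise WER because each wrong ending counts as a full word error, which is consistent with the morphology problem noted above.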
Vocabulary Growth Per Broadcast
[Figure: Broadcast News System; vocabulary size in words (axis 0 to 25000) versus news broadcasts (1 to 21)]
Serbo-Croatian BN Speech Performance
[Figure: Broadcast News System, WER [%] by month, axis 0 to 80]
August: 73.6
September: 43.6
October: 36.0
December: 29.5
January: 26.0
Chart annotations: Language Normalization; Hypothesis Driven Lexicon Adaptation
Proposed National Research Data Testbed
Informedia dataset and infrastructure as a benchmarkable testbed for research in spoken language and visual documents
Potential for establishing on-line public domain video archive
• e.g. all government produced video for training and public information
• fully indexed and searchable
Project Genoa Contributions
• Code to extract video to place in a CIP
• Processing changes to index I-frames
• Code to run Web browser to play the MPEG segment
• Working towards a generic Web-based interface
• Other CMU: Meeting browser
• Full access to client but not full source code
[Figure: system architecture diagram. Labels recovered from the slide:
Sites: Pittsburgh, PA; DIA, Wash, DC; CIA, Langley, VA; SAIC, San Diego, CA; DARPA TIE, Arlington, VA
Components: CMU Informedia Server; CMU Informedia Client (NOD); CrisisBrowse Server; CrisisBrowse Client (SpIKE/Visage/NOD?); CIP Server; Mass Storage; Netscape; Starlight; BWD; JTF Planner
Data sources: MIDB (S), Sybase; MDITDS (S), Sybase; JEDS; OSIS (U); CIA Factbook (U); JANES (U); HPKB (U); World Energy Database (U), Access; WWW (U); Intelink-S
Networks and access: Internet; SIPRNET; DISN LES; dial-in; Network Neighborhood; http
Media types: mpeg, jpeg, txt, html
Classification legend: Pseudo-TS/SCI, Secret, Unclassified]
Future Plans - Near Term
• Complete full-function Web interface
• Foreign language system unification
• S-C language models for improved query and selection
• S-C segmentation
• System completeness, robustness
• Should we pursue?
– Regular capture & processing
– Delivery to testbeds
Future Plans - Long Term
• NSA’s formal evaluation will help guide modifications and new features
• Other languages - Korean? Chinese?
• Translation? Translation tools?
• Named entity extraction: people, places, faces
• Geospatial correlation and visualization
• More content and multiple sources
• Multidocument summarization