Question Answering from Errorful Multimedia Streams
AQUAINT PI Meeting – June 2002
Howard D. Wactlar
Carnegie Mellon University, USA
Digital Video Library
Outline
• Goals for QA from multimedia
• Background
- Informedia
- Information extraction
• Determining answer information
• Presenting the answer and follow-up
Why is Multimedia Important
• TV and radio broadcasts record human events across the globe
• Broadcast interviews, analysis and opinions created globally provide varied interpretive perspectives and context
• Images of people, events, maps and charts provide additional content not conveyed orally
- May be correlated with the spoken words
• Some pictures are worth a thousand words
Annual Video and Audio Production
Commercial
• 4500 motion pictures -> 9,000 hours/year (4.5 TB)
• 33,000 TV stations x 4 hrs/day -> 48,000,000 hrs/yr (24,000 TB)
• 44,000 radio stations x 4 hrs/day -> 65,500,000 hrs/yr (3,275 TB)
Personal
• Photographs: 80 billion images -> 410,000 TB/yr
• Home videos: 1.4 billion tapes -> 300,000 TB/yr
• X-rays: 2 billion -> 17,000 TB/yr
Surveillance
• Airports: 14,000 terminals x 140 cameras x 24 hrs/day -> 48 M hrs/day
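The hour totals above follow from "sources x hours/day x days/year", and each TB figure implies a storage rate per hour. The sketch below recomputes both. It assumes 365 days per year and 1 TB = 1000 GB, neither of which is stated on the slide, and the function names are illustrative only.

```python
# Recompute the slide's hour totals and the storage rate implied by each
# TB figure. Assumptions (not on the slide): 365 days/year, 1 TB = 1000 GB.

DAYS_PER_YEAR = 365
GB_PER_TB = 1000


def hours_per_year(sources: int, hours_per_day: float) -> float:
    """Total hours produced per year by `sources` running `hours_per_day`."""
    return sources * hours_per_day * DAYS_PER_YEAR


def implied_gb_per_hour(terabytes: float, hours: float) -> float:
    """Storage rate implied by a (TB, hours) pair quoted on the slide."""
    return terabytes * GB_PER_TB / hours


tv_hours = hours_per_year(33_000, 4)      # slide quotes 48,000,000
radio_hours = hours_per_year(44_000, 4)   # slide quotes 65,500,000

print(f"TV hours/year:    {tv_hours:,.0f}")
print(f"Radio hours/year: {radio_hours:,.0f}")
print(f"Film  GB/hour: {implied_gb_per_hour(4.5, 9_000):.2f}")
print(f"TV    GB/hour: {implied_gb_per_hour(24_000, 48_000_000):.2f}")
print(f"Radio GB/hour: {implied_gb_per_hour(3_275, 65_500_000):.2f}")
```

Under these assumptions the TV total comes to 48,180,000 hours, matching the slide's rounded figure, while the radio total comes to 64,240,000 hours, slightly below the 65,500,000 quoted, so the slide presumably used unrounded inputs. The film and TV figures both imply about 0.5 GB per hour, and radio about 0.05 GB per hour.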
Background
REQUIREMENTS:
- Automated process for information extraction from video
- Full-content search and retrieval from any spoken language and visual document
Establishment of large video libraries as a network searchable information resource
Mission: Enable Search and Discovery in the Video Medium
APPROACH: Integration of machine speech, image and natural language understanding for library creation and exploration
Exploit operational Informedia DVL infrastructure and technology
Informedia System Architecture
[Figure: system architecture diagram. Recoverable labels:]
• OFFLINE: Information Collection & Analysis
- Sources: Surveillance, Broadcast TV, Radio
- Digital Encoding
- Processing: Speech Analysis, Image Analysis, Entity Extraction, Face, OCR Text Recognition
- Indexing
- Indexed Database: Indexed Segmented Transcript, Compressed Audio/Video & Images
- Distribution To Users
• ONLINE: Information Exploration & Discovery
- Analyst
- Multimodal Queries
- Relevant Result Set
- Browsing and Query Refinement
- Requested Segment or Summarization
Related Language Processing Work
• MUC, DUC, TREC (especially the QA track)
- Pronoun and anaphora resolution
- Part-of-speech tagging
- Fact extraction
- Summarization
- Question-answering
…Electronic text focus
Why is Multimedia Hard
• It’s a fundamentally linear, temporal medium
• Speech, image and language understanding are all errorful, ambiguous and incomplete
• Information must be time-synchronized and correlated across modalities for both produced and natural video
• Verbal content lacks the cues that enable syntactic analysis:
- sentence boundaries
- punctuation
- capitalization
• Image recognition w/o known context is very limited
• Many errors from many sources!
Why We Think the Problems are Tractable
• Lots of data enables LEARNING systems
• Have shown complete or perfect information is not necessary
• Utilize multiple sources of information jointly:
- text, image, audio, web text and databases
Research Focus
• Determining the answer information
- Resolving co-references
- Discovering semantic relations
- Learning Information flow
- Hardening uncertain information
• Organizing and presenting the answer result
- Text summaries
- Augmenting contextual material
- Maps, charts and images to allow follow-up questions
- Explicit representation of uncertainty
Resolving Co-references
• When is the same person mentioned (or seen, or identified)
• Places referenced (in words, on signs, on maps)
• Organizations cited (verbally, on signage, in charts)
• Requires:
- Pronoun resolution
- Merge multiple spellings, abbreviations and contractions
- Merge across media (OCR, audio, text, faces)
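One way to picture the "merge multiple spellings, abbreviations and contractions" requirement is a small matcher that normalizes each mention, looks it up in an alias table, and falls back to fuzzy string similarity for OCR or transcription errors. This is only a sketch, not the Informedia method: the alias table, the 0.85 threshold, and the function names are all assumptions, and pronoun resolution and face matching are out of scope here.

```python
import re
from difflib import SequenceMatcher

# Hypothetical alias table; a real system would curate or learn this.
ALIASES = {"cmu": "carnegie mellon university"}


def normalize(mention: str) -> str:
    """Lowercase, drop punctuation, collapse spaces, expand known aliases."""
    key = re.sub(r"[^a-z0-9 ]", "", mention.lower())
    key = re.sub(r"\s+", " ", key).strip()
    return ALIASES.get(key, key)


def same_entity(a: str, b: str, threshold: float = 0.85) -> bool:
    """True if two mentions normalize to the same or a very similar string."""
    na, nb = normalize(a), normalize(b)
    return na == nb or SequenceMatcher(None, na, nb).ratio() >= threshold


def cluster(mentions):
    """Greedy clustering: attach each mention to the first matching cluster."""
    clusters = []
    for mention in mentions:
        for group in clusters:
            if same_entity(mention, group[0]):
                group.append(mention)
                break
        else:
            clusters.append([mention])
    return clusters


# Mentions as they might arrive from speech, OCR and text (made-up examples).
print(cluster(["C.M.U.", "Carnegie Mellon University", "Wactlar", "Wactler", "CNN"]))
```

The greedy comparison against the first member of each cluster keeps the sketch short; it is order-dependent, so a production system would more likely use a proper clustering step.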
Mining Links and Learning Semantic Relations
• Visualize co-occurrence in documents, in location, in time
- Location can be variably sized regions
- Times can be arbitrary periods
• Finding semantic roles for related named entities
- E.g. Dr. X is CEO of company Y
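Co-occurrence in time over "arbitrary periods" can be sketched as counting entity pairs whose mentions fall within a configurable window. The data layout (a list of entity and timestamp pairs), the function name, and the sample mentions below are assumptions made for illustration; the same pattern would apply to locations by replacing the time window with a region test.

```python
from collections import Counter
from datetime import datetime, timedelta


def cooccurrences(mentions, window=timedelta(days=1)):
    """Count unordered entity pairs mentioned within `window` of each other.

    mentions: iterable of (entity, timestamp) tuples.
    """
    ordered = sorted(mentions, key=lambda m: m[1])
    counts = Counter()
    for i, (ent_a, t_a) in enumerate(ordered):
        for ent_b, t_b in ordered[i + 1:]:
            if t_b - t_a > window:
                break  # later mentions are even further away in time
            if ent_a != ent_b:
                counts[tuple(sorted((ent_a, ent_b)))] += 1
    return counts


# Made-up mentions using the slide's placeholder names.
sample = [
    ("Dr. X", datetime(2002, 6, 1, 9)),
    ("Company Y", datetime(2002, 6, 1, 10)),
    ("Dr. X", datetime(2002, 6, 5, 9)),
    ("Company Y", datetime(2002, 6, 5, 12)),
    ("Company Z", datetime(2002, 6, 20, 9)),
]
print(cooccurrences(sample))
```

Pairs that co-occur often are candidates for a semantic relation such as "is CEO of"; deciding which relation actually holds needs further analysis that this sketch does not attempt.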
Active Hardening of Evidence
• Extracted information is noisy
• Acquire new supporting or falsifying evidence from other sources (web)
- On-demand or
- Automatically when original evidence is weak
…Result is higher fidelity information
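The hardening loop can be illustrated with a simple Bayesian odds update: each new piece of evidence carries a likelihood ratio, above 1 if it supports the extracted fact and below 1 if it falsifies it, and extra evidence is fetched only when the current confidence is weak. The slides do not say how evidence is actually combined, so the odds update, the 0.7 threshold, and the `fetch_evidence` callback are all assumptions of this sketch.

```python
def update(confidence: float, likelihood_ratio: float) -> float:
    """Bayesian odds update; confidence must be strictly between 0 and 1."""
    odds = confidence / (1.0 - confidence) * likelihood_ratio
    return odds / (1.0 + odds)


def harden(confidence: float, fetch_evidence, weak_below: float = 0.7) -> float:
    """Fetch and apply extra evidence only when the original is weak.

    fetch_evidence: hypothetical callable returning likelihood ratios,
    e.g. derived from a web search for corroborating reports.
    """
    if confidence >= weak_below:
        return confidence
    for ratio in fetch_evidence():
        confidence = update(confidence, ratio)
    return confidence


# A weak extraction corroborated by two independent supporting reports.
print(harden(0.5, lambda: [3.0, 3.0]))
# A strong extraction is left alone; no evidence is fetched.
print(harden(0.9, lambda: [0.1]))
```

Multiplying likelihood ratios treats the pieces of evidence as independent, which rarely holds for news reports that copy one another, so the result should be read as optimistic.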
Learning Information Flow
[Figure: information flow graph among news sources. Recoverable labels:]
- Link types: tightly correlated information flow; conditional information flow
- Sources: CNN, ABC, Radio Deutsche Welle (Germany), Radio Tehran (Iran), Wiretap 1 (Saudi Arabia), Hidden Sources 1-4
- Topics: Lifestyle news; News on Middle East
- Delay labels: 3-6 days (twice), 407 days
Learning Information Flow
• Where did a fact originate?
• Multiple sources report facts over time, with small changes
- E.g. Different newspapers get the same story from AP or Reuters source. Story ‘looks’ different.
- Imagery frequently is reused as well
• Columbia’s Newsblaster exploits this idea for summarization of the core story sentences
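A rough way to approach "where did a fact originate?" is to find stories that are near-duplicates of a given report and take the earliest one seen. The sketch below uses Jaccard similarity over word shingles, a standard near-duplicate technique that is swapped in here for illustration and is not taken from the slides. The story tuples, the 0.5 threshold, and the source names are made up.

```python
from datetime import date


def shingles(text: str, n: int = 3) -> set:
    """Set of overlapping n-word sequences from the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_origin(stories, query_text: str, threshold: float = 0.5):
    """Earliest story that is a near-duplicate of query_text, or None.

    stories: list of (source, published, text) tuples.
    """
    query = shingles(query_text)
    matches = [s for s in stories if jaccard(query, shingles(s[2])) >= threshold]
    return min(matches, key=lambda s: s[1], default=None)


# Made-up stories: a wire report, a lightly edited reprint, an unrelated item.
stories = [
    ("Wire", date(2002, 6, 1),
     "officials said the talks will resume next week in geneva"),
    ("Paper A", date(2002, 6, 2),
     "officials said the talks will resume next week in geneva on monday"),
    ("Paper B", date(2002, 6, 2),
     "local team wins the championship after a dramatic final game"),
]
print(likely_origin(stories, stories[1][2]))
```

The earliest observed copy is only a candidate origin: the true source may never have been collected, and errorful speech transcripts would need a lower threshold than clean newspaper text.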
Integrated Analysis Environment
• Summarize multimedia information visually and textually
• Allow explicit display of and control over acceptable level of uncertainty
• Show link structure of entities and relations
• Interactive visualization for drill-down and follow-up
Strategic Advantages of Multimedia Analysis and Response
• Collect Large Amounts of Data
• Learning Approaches
• Leverage across media types
• Perfection is not necessary (80% solution may be ok)
• User in the loop filters remaining errors
• Effective interfaces and visualizations