© 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Searching Video
Ananth Sankar, Distinguished Engineer, Cisco
Outline
• Value of video to enterprise
• Video search today
• Audio/video analytics for video search/navigation
• Accuracy of analytics
• Summary
The Value of Video Content in the Enterprise
• Organizational & Executive Communication
- 38% of videos on Cisco’s portal in 2010 provided organizational updates
- Recorded communications allow broader reach to global teams
- Employees can interact with executives via comments
• Training/Meetings
In the first 2.5 years Cisco used video content, it conducted 14,000 video training sessions.
- Saved $57M in travel costs for trainers
- Saved $21M in productivity time for trainers
- Saved 62,000 hours of productivity time for attendees who didn’t have to travel to sessions
• Internal/External Events
Recording sessions at events such as Cisco Live and the annual sales meeting has expanded audiences by thousands of attendees.
• Marketing
Video is a valuable marketing asset. People who view videos on Cisco.com:
- View 44% more pages and are 41% more likely to return to Cisco.com
- Are five times more likely to click through on a blog post containing a video
- Are twice as likely to click through on an email containing a video
The Volume of Video Content is Growing
- Cisco employees view more than 85,000 videos on demand each month
- On YouTube alone, there are over 18 million “how-to” videos
- Over 4 million visitors per month take a lesson on KhanAcademy.com
Enterprise Video Content Length Going Beyond the Average Attention Span
- Enterprises generate videos that are 30 to 60 minutes long.
- The average YouTube video is about 6.4 minutes long.
- TED Talks are 18 minutes long. “It’s long enough to be serious and short enough to hold people’s attention” - Chris Anderson, TED curator
We Need New Ways to Effectively Engage with Video Content
1. Find video content that is relevant to us
- Today we rely on manual tagging and titles
- Search isn’t effective
2. Efficiently navigate video content
- Today navigation is linear
Video Search Today
Suppose we want to find this video based on content
Professor Ng talks about “parametric learning algorithms” in a Stanford lecture on machine learning.
The obvious search does not work
Extra terms must be added to find the video
- An extra search term had to be added to the query
- Even then, we are left with two lectures to sort through
Linear Playback – Play, Fast Forward, Rewind
Can’t find “parametric learning algorithms” buried in the video!
Video vs. Text Search
Video Search | Text Search
Keywords must be entered in title or description | Keywords already exist in the text
Search depends on user’s SEO skills | Search depends mainly on document content
Content is opaque, hidden in media | Text can be quickly browsed
How can we make video search better?
Automatically extract information and convert it to useful metadata
Video Analytics Technology
Information Contained In Videos
• Speakers
• Speech
• Text
• People
• Pictures or slides
• Behaviors
• Sentiments
• Events
• Landmarks
• Places
Extracting this information to create metadata enables much better video search and navigation.
Analytics System Overview
1. Video ingested into analytics engine
- Through recording workflow or the video portal
2. Analytics engine processes video
- Generates speaker and key-phrase metadata
3. Augmented video available on portal
- Indexed using metadata
- Player also augmented with metadata
[Diagram: metadata extraction (speech recognition, speaker recognition, slide detection, ...) indexes each video with its metadata; the video portal and video player send queries to the index and receive a video pointer and metadata.]
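The flow above (extract metadata, index it, answer portal queries with a video pointer) can be sketched as a small in-memory index. This is only an illustration under stated assumptions, not the system described in the talk: the `Segment` record, the `MetadataIndex` class, and the sample video IDs, speaker label, and timestamps are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One piece of extracted metadata (hypothetical record layout)."""
    video_id: str
    start_sec: float
    kind: str   # "keyphrase" or "speaker"
    value: str


class MetadataIndex:
    """Toy inverted index: lower-cased metadata value -> list of segments."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, segment):
        self._postings[segment.value.lower()].append(segment)

    def search(self, query):
        """Return sorted (video_id, start_sec) pointers for an exact phrase."""
        hits = self._postings.get(query.lower(), [])
        return sorted((s.video_id, s.start_sec) for s in hits)


index = MetadataIndex()
# Invented sample output of the metadata-extraction step.
index.add(Segment("lecture-03", 95.0, "speaker", "Instructor"))
index.add(Segment("lecture-03", 1310.5, "keyphrase", "parametric learning algorithms"))
index.add(Segment("lecture-07", 42.0, "keyphrase", "support vector machines"))

# A portal query returns a pointer the player can seek to.
pointers = index.search("Parametric Learning Algorithms")
print(pointers)
```

Because each hit carries a start time, the same lookup serves both search (which video) and navigation (where in the video), which is what removes the need for linear playback.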
Demo
Plain old video player
Video player with speakers and keywords
Expected Speech Recognition Accuracy
Task | Word accuracy (100% - WER) | Precision/recall of key-phrases
Broadcast news (prepared speech) | > 85% | > 85%
Meetings, interviews, lectures (somewhat conversational) | 60-75% | > 70%
YouTube (varied) | 30-90% | 40-90%
Inaccurate recognition can still support some key-phrase applications
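Word accuracy in the table is defined as 100% minus the word error rate (WER). A minimal sketch of that calculation follows, using the standard word-level edit distance; the function name and the two sample transcripts are invented for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev holds edit distances for the previous reference prefix.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substituted word and one dropped word.
reference = "parametric learning algorithms have a fixed set of parameters"
hypothesis = "parametric learning algorithm have fixed set of parameters"
wer = word_error_rate(reference, hypothesis)
word_accuracy = 100.0 * (1.0 - wer)
print(f"WER = {wer:.1%}, word accuracy = {word_accuracy:.1f}%")
```

Because insertions count as errors, WER can exceed 100% on very poor output, in which case word accuracy by this definition becomes negative.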
Many Factors Influence Accuracy
Speaking styles
- Conversational
- Presentation style
Acoustic conditions
- Clean
- Noisy, reverberant
Accents
- Native
- Non-native
Domain-specific language
- Vocabularies
- Word patterns
Speech Recognition Accuracy Improvement
• Keywords are important words within a domain
• An out-of-the-box model may not know these words
• Adapting the system to learn language for specific domains increases accuracy
• Adapted domain models can give > 70% precision and recall of keywords
• Challenges with adapting
- Acquiring sample training data while maintaining privacy
- Handling new vocabulary items, e.g., acronyms
- Multiple accents within the same domain or customer
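Precision and recall of keywords, as quoted above, can be computed by comparing the recognized key phrases against a hand-labeled reference set. The sketch below assumes exact, case-insensitive phrase matching; the function name and both phrase sets are invented for illustration.

```python
def precision_recall(reference_phrases, recognized_phrases):
    """Precision and recall of recognized key phrases against a reference set."""
    ref = {p.lower() for p in reference_phrases}
    hyp = {p.lower() for p in recognized_phrases}
    correct = len(ref & hyp)
    precision = correct / len(hyp) if hyp else 0.0   # share of output that is right
    recall = correct / len(ref) if ref else 0.0      # share of reference that was found
    return precision, recall


# Invented example: three correct phrases, one misrecognition, one miss.
reference = {"pancreatic cancer", "metastatic melanoma",
             "radiation treatment", "chemotherapy"}
recognized = {"pancreatic cancer", "radiation treatment",
              "chemotherapy", "key motor therapy"}
precision, recall = precision_recall(reference, recognized)
print(f"precision = {precision:.0%}, recall = {recall:.0%}")
```

Only the key phrases are scored, so a transcript with many errors in function words can still do well on this measure.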
How Adaptation Works
Training: training data -> training algorithm -> models
Adaptation: models + domain data -> adaptation algorithm -> adapted models
Generic vocabulary:
- Health care
- Finance
- Economy
- Education
Domain vocabulary:
- Pancreatic cancer
- Metastatic melanoma
- Radiation treatment
- Chemotherapy
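The slides do not say which adaptation algorithm is used. One common technique for the language-model side is linear interpolation between a generic model and a model trained on domain text, sketched here with unigram probabilities. The interpolation weight `lam`, the helper names, and the two tiny corpora are assumptions made for illustration.

```python
from collections import Counter


def unigram_model(text):
    """Maximum-likelihood unigram probabilities from whitespace-split text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def interpolate(generic, domain, lam=0.5):
    """P_adapted(w) = lam * P_domain(w) + (1 - lam) * P_generic(w)."""
    vocab = set(generic) | set(domain)
    return {w: lam * domain.get(w, 0.0) + (1 - lam) * generic.get(w, 0.0)
            for w in vocab}


# Invented stand-ins for generic training data and domain data.
generic = unigram_model("health care finance economy education health finance")
domain = unigram_model("chemotherapy radiation treatment chemotherapy melanoma")
adapted = interpolate(generic, domain, lam=0.5)

# The generic model assigns no probability to the domain term;
# the adapted model does, while still covering the generic vocabulary.
print(generic.get("chemotherapy", 0.0))
print(round(adapted["chemotherapy"], 3))
print(round(adapted["finance"], 3))
```

This covers only vocabulary and word patterns. Adapting to accents or acoustic conditions would involve the acoustic model and is outside this sketch.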
What else is possible?
• Transcripts
• Closed captioning
• Topics
• Summaries
• Sentiments
• Translation
Closing
• Video provides a very rich experience, but is an opaque medium
• Keywords, phrases and speakers are examples of useful metadata
• Accuracy is impacted by the large variation in data
• Adaptation is a useful technique to improve accuracy
• Audio and video analytics make video as easy to find & navigate as text
Thank you.