video hyperlinking tutorial (part c)

Information Technologies Institute Centre for Research and Technology Hellas Video Hyperlinking Part C: Insights into Hyperlinking Video Content Benoit Huet EURECOM (Sophia-Antipolis, France) IEEE ICIP’14 Tutorial, Oct. 2014 ACM MM’14 Tutorial, Nov. 2014

Upload: linkedtv

Post on 20-Aug-2015






Page 1: Video Hyperlinking Tutorial (Part C)

Information Technologies Institute Centre for Research and Technology Hellas

Video Hyperlinking

Part C: Insights into Hyperlinking Video Content

Benoit Huet EURECOM

(Sophia-Antipolis, France)

IEEE ICIP’14 Tutorial, Oct. 2014 ACM MM’14 Tutorial, Nov. 2014

Page 2: Video Hyperlinking Tutorial (Part C)


Overview

• Introduction
  – overall motivation
• The General Framework
• Indexing Video for Hyperlinking
  – Apache Solr
• Evaluation Measures
• Challenge 1: Temporal Granularity
  – Feature Alignment and Index Granularity
• Challenge 2: Crafting the Query
  – Selecting Keywords
  – Selecting Visual Concepts
• Hyperlinking Evaluation: MediaEval S&H
• Hyperlinking Demos and LinkedTV Video
• Conclusion and Outlook
• Additional Reading

Page 3: Video Hyperlinking Tutorial (Part C)



• Why Video Hyperlinking?
  – Linking multimedia documents with related content
  – Automatic Hyperlink Creation
    • Different from Search (no user query)
    • Query automatically crafted from source document
• Outreach
  – Recommendation system
  – Second screen applications

Page 4: Video Hyperlinking Tutorial (Part C)


Insights in Hyperlinking

• Hyperlinking – Creating “links” between media

• Video Hyperlinking – video to video – video fragment to video fragment

Page 5: Video Hyperlinking Tutorial (Part C)


Characterizing - Video

• Video – Title / Episode – Cast – Synopsis / Summary – Broadcast channel – Broadcast date – URI – Named Entities

Page 6: Video Hyperlinking Tutorial (Part C)


Characterizing – Video Fragment

• Video Fragment – Temporal location (Start and End) – Subtitles / Transcripts – Named Entities – Visual Concepts – Events – OCR – Character / Person

Page 7: Video Hyperlinking Tutorial (Part C)


General framework

• Index Creation: Video Dataset → Segmentation → Feature Extraction → Indexing

• Hyperlinking: Video Anchor Fragment → Feature Selection → Retrieval → Personalisation

Page 8: Video Hyperlinking Tutorial (Part C)


Search and Hyperlinking Framework

[Diagram: BroadCast Media, Content Analysis, Metadata (Subtitles, ..), Lucene/Solr, Media DB, Solr Index]

Items shown in the diagram: Title, Cast, Channel, Subtitles, Transcript 1, Transcript 2, …, Shots, Scene, OCR, Visual concepts

Page 9: Video Hyperlinking Tutorial (Part C)


Indexing Video for Hyperlinking

• Indexing systems: – Apache Lucene/Solr – TerrierIR – ElasticSearch – Xapian – …

• Popular for text-based indexing/search/retrieval
• How can they be used to index video for hyperlinking?

Page 10: Video Hyperlinking Tutorial (Part C)


Solr Indexing

• Solr engine (Apache Lucene) for data indexing
  – Index at different temporal granularities (shot, scene, sliding window)
  – Index different features at each temporal granularity (metadata, OCR, transcripts, visual concepts)
• All information stored in a unified structured way
  – flexible tool to perform search and hyperlinking

Page 11: Video Hyperlinking Tutorial (Part C)


Solr indexing – Sample Schema

• Schema = structure of document using fields of different types
• Fields:
  – name
  – type (see next slide)
  – indexed="true|false"
  – stored="true|false"
  – multiValued="true|false"
  – required="true|false"

Page 12: Video Hyperlinking Tutorial (Part C)


Solr indexing – Sample Schema

• Field types:
  – text (analysed: stop word removal, etc.)
  – string (not analysed)
  – date
  – float
  – int

• uniqueKey – unique document id

Page 13: Video Hyperlinking Tutorial (Part C)


Solr indexing – Sample Schema

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="subtitles" version="1.5">
  <fields>
    <field name="videoId" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
    <field name="serie_title" type="text_ws" indexed="false" stored="true" multiValued="false" required="true"/>
    <field name="short_synopsis" type="text_en_splitting" indexed="false" stored="true" multiValued="false" required="true"/>
    <field name="episode_title" type="text_en_splitting" indexed="false" stored="true" multiValued="false" required="true"/>
    <field name="channel" type="text_ws" indexed="false" stored="true" multiValued="false" required="true"/>
    <field name="cast" type="text_en_splitting" indexed="false" stored="true" multiValued="false" required="true"/>
    <field name="description" type="text_en_splitting" indexed="false" stored="true" multiValued="false" required="true"/>
    <field name="synopsis" type="text_en_splitting" indexed="false" stored="true" multiValued="false" required="true"/>
    <field name="subtitle" type="text_en_splitting" indexed="true" stored="true" multiValued="false" required="true"/>
    <field name="duration" type="int" indexed="false" stored="true" multiValued="false" required="true"/>
    <field name="shots_number" type="int" indexed="false" stored="true" multiValued="false" required="true"/>
    <field name="text" type="text_en_splitting" indexed="true" stored="false" multiValued="true" required="true"/>
    <field name="names" type="text_ws" indexed="true" stored="false" multiValued="true" required="true"/>
    <field name="keywords" type="text_ws" indexed="true" stored="false" multiValued="true" required="true"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>videoId</uniqueKey>
  …

Page 14: Video Hyperlinking Tutorial (Part C)


Solr Indexing – Sample Document

<?xml version="1.0" encoding="UTF-8"?>
<add>
  <doc>
    <field name="videoId">20080506_183000_bbcfour_pop_goes_the_sixties</field>
    <field name="subtitle">SCREAMING APPLAUSE Subtitles by Red Bee Media Ltd E-mail [email protected] HELICOPTER WHIRRS TRAIN SPEEDS SIREN WAILS ENGINE REVS Your town, your street, your home - it's all in our database. New technology means it's easyto pay your TV licence and impossible to hide if you don't. KNOCKING</field>
    <field name="serie_title">Pop Goes the Sixties</field>
    <field name="short_synopsis">A colourful nugget of pop by The Shadows, mined from the BBC's archive.</field>
    <field name="description">The Shadows play their song Apache in a classic performance from the BBC's archives.</field>
    <field name="duration">300</field>
    <field name="episode_title">The Shadows</field>
    <field name="channel">BBC Four</field>
    <field name="cast" />
    <field name="synopsis" />
    <field name="shots_number">14</field>
    <field name="keywords">SCREAMING SPEEDS HELICOPTER WHIRRS REVS KNOCKING WAILS ENGINE SIREN APPLAUSE TV TRAIN Ltd E-mail Bee Subtitles Media Red</field>
  </doc>
</add>
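A document of this shape can be generated rather than written by hand. The sketch below builds the `<add><doc>` payload from a dictionary with Python's standard library only. The helper name `build_add_xml` and the choice to emit one `<field>` element per list item are illustrative assumptions, not the code used in the tutorial.

```python
import xml.etree.ElementTree as ET


def build_add_xml(fields):
    """Return a Solr <add><doc>...</doc></add> XML string from a field dict.

    Hypothetical helper: list values become repeated <field> elements,
    which is how multi-valued fields are commonly sent to Solr.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


# A few fields taken from the sample document above.
sample_fields = {
    "videoId": "20080506_183000_bbcfour_pop_goes_the_sixties",
    "serie_title": "Pop Goes the Sixties",
    "channel": "BBC Four",
    "duration": 300,
    "shots_number": 14,
    "keywords": ["HELICOPTER", "SIREN", "TRAIN"],
}
xml_payload = build_add_xml(sample_fields)
```

The resulting string would then be posted to the update handler of the Solr core that holds the schema.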

Page 15: Video Hyperlinking Tutorial (Part C)


Solr Indexing

• Analysis step:
  – Dependent on each type
  – Automatically performed: tokenization, removing stop words, etc.
  – It creates tokens that are added to the index
    • inverted index
    • query is made on tokens
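To make the analysis step concrete, here is a toy inverted index in plain Python. It is only a sketch of the idea: the stop word list and the `analyse` function are simplified stand-ins chosen for illustration, and Solr's real analysers (configured per field type) do considerably more.

```python
from collections import defaultdict

# Illustrative stop word list, not the one shipped with Solr.
STOP_WORDS = {"the", "a", "in", "from", "of", "by", "their"}


def analyse(text):
    """Lower-case, split on whitespace, strip punctuation, drop stop words."""
    tokens = []
    for raw in text.lower().split():
        token = raw.strip(".,;:!?'\"")
        if token and token not in STOP_WORDS:
            tokens.append(token)
    return tokens


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyse(text):
            index[token].add(doc_id)
    return index


docs = {
    "doc1": "The Shadows play their song Apache in a classic performance",
    "doc2": "A colourful nugget of pop by The Shadows",
}
index = build_inverted_index(docs)
hits = index["shadows"]  # documents whose tokens include "shadows"
```

A query is analysed with the same function, and each resulting token is looked up in the index.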

Page 16: Video Hyperlinking Tutorial (Part C)


Solr Query

• Very easy with web interface

Page 17: Video Hyperlinking Tutorial (Part C)


Indexing Video Fragments with Solr

• Demo



Page 20: Video Hyperlinking Tutorial (Part C)


Evaluation measures

• Search
  – Mean Reciprocal Rank (MRR): assesses the rank of the relevant segment
  – Mean Generalized Average Precision (mGAP): takes into account starting time of the segment
  – Mean Average Segment Precision (MASP): measures both ranking and segmentation of relevant segments
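As an illustration of the first measure, the sketch below computes MRR for known-item search. It assumes, for simplicity, that a result is correct when its identifier equals the target identifier; temporal tolerance between the returned and the expected segment is not modelled here, and the function name is hypothetical.

```python
def mean_reciprocal_rank(ranked_lists, relevant_items):
    """Average of 1/rank of the known item over all queries (0 if missing)."""
    total = 0.0
    for query_id, ranking in ranked_lists.items():
        target = relevant_items[query_id]
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)
    return total / len(ranked_lists)


# Toy data: segment identifiers returned for three queries.
runs = {
    "q1": ["s3", "s1", "s7"],  # target at rank 2 -> 1/2
    "q2": ["s2", "s9"],        # target at rank 1 -> 1
    "q3": ["s4", "s5"],        # target not returned -> 0
}
truth = {"q1": "s1", "q2": "s2", "q3": "s8"}
mrr = mean_reciprocal_rank(runs, truth)  # (0.5 + 1 + 0) / 3 = 0.5
```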

Page 21: Video Hyperlinking Tutorial (Part C)


Evaluation measures

• Hyperlinking
  – Precision at rank n: how many relevant segments appear in the top n results
  – Mean Average Precision (MAP)
    – taking temporal segment to target offset into account

Aly, R., Ordelman, R. J.F., Eskevich, M., Jones, G. J.F., Chen, S. Linking Inside a Video Collection - What and How to Measure? In Proceedings of ACM WWW International Conference on World Wide Web Companion. ACM, Rio de Janeiro, Brazil, 457-460.
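The two hyperlinking measures can be sketched as follows. This is the textbook form of precision at rank n and average precision on identifier matches; the variants that account for the temporal offset between a returned segment and its target are not reproduced, and the function names are illustrative.

```python
def precision_at_n(ranking, relevant, n):
    """Fraction of the top-n results that are relevant."""
    top = ranking[:n]
    return sum(1 for item in top if item in relevant) / n


def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant item occurs."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


ranking = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
p5 = precision_at_n(ranking, relevant, 5)   # 2 relevant in top 5 -> 0.4
ap = average_precision(ranking, relevant)   # (1/1 + 2/3) / 3
```

MAP is then the mean of `average_precision` over all anchors.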

Page 22: Video Hyperlinking Tutorial (Part C)


Challenge 1: Temporal Granularity

[Diagram: BroadCast Media, Content Analysis, Metadata (Subtitles, ..), Lucene/Solr, Media DB, Solr Index]

• Program level: title, cast, …
• Audio-frame level: transcripts, subtitles, …
• Shot/Keyframe level: visual concepts, OCR

Page 23: Video Hyperlinking Tutorial (Part C)


Challenge 1: Temporal Granularity

• Aligning features with different temporal granularity
  – Shots and Scenes
    – Aligned by construction

[Figure: subtitle and shot timelines]




Page 25: Video Hyperlinking Tutorial (Part C)


Challenge 1: Temporal Granularity

• Aligning features with different temporal granularity
  – Subtitles and Scenes
    – Alignment based on feature start

[Figure: subtitle and shot timelines]


Page 26: Video Hyperlinking Tutorial (Part C)


Challenge 1: Temporal Granularity

• Aligning features with different temporal granularity
  – Subtitles and Scenes
    – Alignment based on feature end

[Figure: subtitle and shot timelines]


Page 27: Video Hyperlinking Tutorial (Part C)


Challenge 1: Temporal Granularity

• Aligning features with different temporal granularity
  – Subtitles and Scenes
    – Feature duplication (bias?)

[Figure: subtitle and shot timelines]


Page 28: Video Hyperlinking Tutorial (Part C)


Challenge 1: Temporal Granularity

• Aligning features with different temporal granularity
  – Subtitles and Scenes
    – Alignment based on temporal overlap

[Figure: subtitle and shot timelines]
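The four subtitle-to-scene alignment strategies can be sketched in a few lines of Python. The code below is one reading of the slide titles, not the tutorial's implementation: times are in seconds, scenes are assumed to be sorted and contiguous, and the strategy names `begin`, `end`, `duplicate` and `overlap` are labels chosen for this example.

```python
def assign_subtitle(subtitle, scenes, strategy):
    """Return the indices of the scenes a subtitle is attached to.

    subtitle: (start, end); scenes: sorted, contiguous list of (start, end).
    """
    s_start, s_end = subtitle
    overlaps = []  # (scene index, overlap duration) for every touched scene
    for i, (c_start, c_end) in enumerate(scenes):
        overlap = min(s_end, c_end) - max(s_start, c_start)
        if overlap > 0:
            overlaps.append((i, overlap))
    if not overlaps:
        return []
    if strategy == "begin":      # scene in which the subtitle starts
        return [overlaps[0][0]]
    if strategy == "end":        # scene in which the subtitle ends
        return [overlaps[-1][0]]
    if strategy == "duplicate":  # copy the subtitle into every touched scene
        return [i for i, _ in overlaps]
    if strategy == "overlap":    # scene sharing the longest time span
        return [max(overlaps, key=lambda pair: pair[1])[0]]
    raise ValueError(f"unknown strategy: {strategy}")


scenes = [(0.0, 10.0), (10.0, 25.0), (25.0, 40.0)]
subtitle = (9.0, 13.0)  # 1 s in scene 0, 3 s in scene 1
by_begin = assign_subtitle(subtitle, scenes, "begin")
by_end = assign_subtitle(subtitle, scenes, "end")
by_duplicate = assign_subtitle(subtitle, scenes, "duplicate")
by_overlap = assign_subtitle(subtitle, scenes, "overlap")
```

Duplication indexes the same text under two scenes, which is where the bias question raised on the slide comes from.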

Page 29: Video Hyperlinking Tutorial (Part C)


Performance Impact - Alignment

[Chart legend: Scene-Subtitle-End, Scene-Subtitle-Begin, Scene-Subtitle-Duplicate, Scene-Subtitle-Overlap]

Page 30: Video Hyperlinking Tutorial (Part C)


Performance Impact - Granularity

Page 31: Video Hyperlinking Tutorial (Part C)


Challenge 1: Discussion

• Subtitle to scene alignment:
  – Similar performance across approaches
  – Slight advantage when aligning on segment start
• Granularity impact:
  – Shots are too short
  – Scenes better reflect user’s requirements

Page 32: Video Hyperlinking Tutorial (Part C)


Let’s Hyperlink!

[Diagram: BroadCast Media, Content Analysis, Metadata (Subtitles, ..), Lucene/Solr, Media DB, Solr Index]

<anchor>
  <anchorId>anchor_1</anchorId>
  <fileName>v20080511_203000_bbctwo_TopGear</fileName>
  <startTime>13.07</startTime>
  <endTime>14.03</endTime>
</anchor>

Page 33: Video Hyperlinking Tutorial (Part C)


Challenge 2 : Crafting the Query

[Diagram: BroadCast Media, Content Analysis, Metadata (Subtitles, ..), Lucene/Solr, Media DB, Solr Index]

<anchor>
  <anchorId>anchor_1</anchorId>
  <fileName>v20080511_203000_bbctwo_TopGear</fileName>
  <startTime>13.07</startTime>
  <endTime>14.03</endTime>
</anchor>

Query crafted from the anchor:
• Extract text from subtitles aligned with the anchor
• Identify relevant visual concepts from the subtitles
• Select visual concepts occurring in the anchor

Page 34: Video Hyperlinking Tutorial (Part C)


Challenge 2a : Keyword Selection

• A long anchor may generate a long text query
• Important keywords (or entities) should be …


Page 35: Video Hyperlinking Tutorial (Part C)


Challenge 2a : Keyword Selection

• Keyword extraction based on a term frequency-inverse document frequency (TF-IDF) approach
• IDF computed on English news, with curated stop words (~200 entries)
• Incorporates Snowball stemming (as part of the Lucene project)
• 50 weighted keywords per document, singletons removed
• Keyword gluing for frequencies larger than 2

S. Tschöpel and D. Schneider. A lightweight keyword and tag-cloud retrieval algorithm for automatic speech recognition transcripts. In Proc. ISCA, 2010, Japan.
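A minimal TF-IDF keyword selector, in the spirit of the bullets above, might look like the sketch below. It is not the cited algorithm: stop word filtering, Snowball stemming and keyword gluing are left out, the `+1` smoothing in the IDF is an assumption, and the document frequencies are made-up toy values.

```python
import math
from collections import Counter


def tfidf_keywords(tokens, doc_freq, n_docs, top_k=50, min_count=2):
    """Rank the terms of one document by tf * idf, dropping singletons."""
    counts = Counter(tokens)
    scored = {}
    for term, tf in counts.items():
        if tf < min_count:  # singletons removed
            continue
        # Smoothed IDF from a background collection (assumed form).
        idf = math.log(n_docs / (1 + doc_freq.get(term, 0)))
        scored[term] = tf * idf
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Toy anchor text and toy background statistics (hypothetical numbers).
tokens = "car engine car race engine car driver".split()
doc_freq = {"car": 200, "engine": 10, "race": 5, "driver": 5}
keywords = tfidf_keywords(tokens, doc_freq, n_docs=1000)
top_terms = [term for term, _ in keywords]
```

Here "engine" outranks "car" although it is less frequent in the anchor, because it is much rarer in the background collection.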

Page 36: Video Hyperlinking Tutorial (Part C)


Keyword Selection Performance

Page 37: Video Hyperlinking Tutorial (Part C)


Challenge 2b: Visual concept generality

[Diagram: BroadCast Media, Content Analysis, Metadata (Subtitles, ..), Lucene/Solr, Media DB, Solr Index]

• No training data for visual concepts
• Use 151 visual concept detectors trained on TrecVid

Page 38: Video Hyperlinking Tutorial (Part C)


151 Visual Concepts (TrecVid 2012) • 3_Or_More_People • Actor • Adult • Adult_Female_Human • Adult_Male_Human • Airplane • Airplane_Flying • Airport_Or_Airfield • Anchorperson • Animal • Animation_Cartoon • Armed_Person • Athlete • Baby • Baseball • Basketball • Beach • Bicycles • Bicycling • Birds • Boat_Ship • Boy

• Building • Bus • Car • Car_Racing • Cats • Cattle • Chair • Charts • Child • Church • City • Cityscape • Classroom • Clouds • Construction_Vehicles • Court • Crowd • Dancing • Daytime_Outdoor • Demonstration_Or_Protest • Desert • Dogs

• Emergency_Vehicles • Explosion_Fire • Face • Factory • Female-Human-Face-Closeup • Female_Anchor • Female_Human_Face • Female_Person • Female_Reporter • Fields • Flags • Flowers • Football • Forest • Girl • Golf • Graphic • Greeting • Ground_Combat • Gun • Handshaking • Harbors

• Helicopter_Hovering • Helicopters • Highway • Hill • Hockey • Horse • Hospital • Human_Young_Adult • Indoor • Insect • Kitchen • Laboratory • Landscape • Machine_Guns • Male-Human-Face-Closeup • Male_Anchor • Male_Human_Face • Male_Person • Male_Reporter • Man_Wearing_A_Suit • Maps • Meeting • …

Page 39: Video Hyperlinking Tutorial (Part C)


Solr Query

• How to include the visual concepts in Solr?
  – Using float typed fields
  – <field name="Animal" type="float" indexed="true" stored="true" multiValued="false" required="true"/>
  – <field name="Animal">0.74</field>
  – <field name="Building">0.12</field>
• Query can be made through http request
  – http://localhost:8983/solr/collection_mediaEval/s
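Since concept scores are stored as float fields, a query can combine a text clause with range filters on those fields. The sketch below only builds such a URL. It assumes Solr's standard `/select` handler with the `q`, `fq`, `rows` and `wt` parameters; the core name `example_core`, the helper name and the threshold values are hypothetical, and the URL on the slide is truncated, so this is not a reconstruction of it.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_select_url(base_url, text_query, concept_thresholds, rows=20):
    """Build a select URL with one range filter per visual concept field."""
    params = [("q", text_query), ("rows", rows), ("wt", "json")]
    for concept, threshold in concept_thresholds.items():
        # Range query: keep documents whose concept score is >= threshold.
        params.append(("fq", f"{concept}:[{threshold} TO *]"))
    return f"{base_url}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr/example_core",  # hypothetical core name
    "subtitle:castle",
    {"Animal": 0.5, "Building": 0.3},
)
parsed = parse_qs(urlparse(url).query)  # decode again to inspect the parameters
```

Filter queries (`fq`) restrict the result set without influencing the text relevance score, which keeps the two kinds of evidence separate.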


Page 40: Video Hyperlinking Tutorial (Part C)


Challenge 2b: Visual concept detectors confidence

[Diagram: BroadCast Media, Content Analysis, Metadata (Subtitles, ..), Lucene/Solr, Media DB, Solr Index]

• No training data for visual concepts
• Use 151 visual concept detectors trained on TrecVid
• Unknown performance

Page 41: Video Hyperlinking Tutorial (Part C)


Challenge 2b: Visual concept detector confidence

• 100 top images for the concept “Animal”
• 58 out of 100 are manually evaluated as valid
• Confidence w = 0.58

Page 42: Video Hyperlinking Tutorial (Part C)


Challenge 2c: Map keywords to visual concepts


















[Figure: keywords mapped to visual concepts through WordNet]

Page 43: Video Hyperlinking Tutorial (Part C)


Mapping keywords to visual concepts

• Concepts mapped to the keyword “Castle”
• Semantic similarity computed using the “Lin”

Concept     β
Windows     0.4533
Plant       0.4582
Court       0.5115
Church      0.6123
Building    0.701

Page 44: Video Hyperlinking Tutorial (Part C)


Fusing Text and Visual Scores

Text-based scores Lucene indexing

Visual-based scores

WordNet similarity

Selected concepts

Ranking Fusion

One score for each scene (t)

$f_i = t_i^{\alpha} + v_i$

One score for each scene (v): computed from the scores of the selected concepts for each scene

$v_i^q = \sum_{c \in C'_q} w_c \times vs_i^c$
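The fusion of the two scores can be sketched as below. This follows the formulas as they survive on the slide, read as a visual score that sums the confidence-weighted detector scores of the selected concepts, and a fused score equal to the text score raised to the power α plus the visual score. That reading, the function names and the toy values are assumptions.

```python
def visual_score(scene_concept_scores, selected_concepts, confidence):
    """Sum of w_c * vs_c over the concepts selected for the query."""
    return sum(
        confidence.get(c, 1.0) * scene_concept_scores.get(c, 0.0)
        for c in selected_concepts
    )


def fused_score(text_score, vis_score, alpha):
    """Fusion as read from the slide: t ** alpha + v."""
    return text_score ** alpha + vis_score


# Detector outputs for one scene (toy values).
scene = {"Building": 0.8, "Church": 0.5, "Animal": 0.9}
selected = ["Building", "Church"]          # concepts mapped from the keywords
w = {"Building": 1.0, "Church": 1.0}       # the w = 1.0 setting
v = visual_score(scene, selected, w)       # 0.8 + 0.5
f = fused_score(0.64, v, alpha=0.5)        # sqrt(0.64) + 1.3
```

Setting every `w` to 1.0 corresponds to ignoring detector confidence; replacing the values with per-concept confidences such as 0.58 for "Animal" gives the weighted variant.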

Page 45: Video Hyperlinking Tutorial (Part C)


Challenge 2c: Performance Results

• Low impact of visual concept detector confidence (w)
• Significant improvement can be achieved by combining only mapped concepts with θ ≥ 0.3.
• Best performance is obtained when θ ≥ 0.8 (gain ≈ 11-12%).

[Chart legend: w=1.0, w=confidence(c)]

B. Safadi, M. Sahuguet and B. Huet, When textual and visual information join forces for multimedia retrieval, ICMR 2014, April 1-4, 2014, Glasgow, Scotland

Page 46: Video Hyperlinking Tutorial (Part C)


Challenge 2d: Visual Concept Selection

• 151 visual concept scores characterize each shot
• Anchors may refer to 1 or more shots
• Selection of relevant shots for the anchors using a threshold
• For those selected visual concepts, identify a good search threshold
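The two thresholds can be illustrated with a short sketch. It assumes that a concept is selected when its score reaches the selection threshold in at least one shot of the anchor, and that the search threshold is then applied as a Solr range filter on the concept field; both choices, and the function names, are illustrative assumptions.

```python
def select_anchor_concepts(shot_scores, selection_threshold):
    """Concepts whose score reaches the threshold in at least one anchor shot."""
    selected = set()
    for scores in shot_scores:
        for concept, score in scores.items():
            if score >= selection_threshold:
                selected.add(concept)
    return sorted(selected)


def concept_filters(concepts, search_threshold):
    """One range clause per selected concept, usable as Solr filter queries."""
    return [f"{concept}:[{search_threshold} TO *]" for concept in concepts]


# Detector scores for the two shots covered by an anchor (toy values).
anchor_shots = [
    {"Car": 0.82, "Highway": 0.41, "Animal": 0.05},
    {"Car": 0.77, "Highway": 0.66, "Animal": 0.02},
]
selected = select_anchor_concepts(anchor_shots, selection_threshold=0.6)
filters = concept_filters(selected, search_threshold=0.7)
```

Sweeping both thresholds over 0.1 to 0.9 yields a 9 by 9 grid of runs.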

Page 47: Video Hyperlinking Tutorial (Part C)


Visual Concept Selection Performance


Solr queries / Concepts selection

        0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
0.1     0.0892  0.0316  0.0558  0.0842  0.1183  0.168   0.1914  0.1919  0.1898
0.2     0.1741  0.1366  0.1152  0.1312  0.1503  0.1777  0.1922  0.1919  0.1898
0.3     0.184   0.1819  0.1806  0.1652  0.1731  0.1848  0.1927  0.1919  0.1898
0.4     0.1874  0.1883  0.1914  0.1868  0.1889  0.1897  0.1937  0.1919  0.1898
0.5     0.1875  0.1874  0.1886  0.1928  0.1937  0.1896  0.1939  0.1919  0.1898
0.6     0.1892  0.1884  0.1886  0.1913  0.1931  0.1946  0.1952  0.1923  0.1898
0.7     0.1901  0.1901  0.1901  0.191   0.1917  0.1943  0.1948  0.1905  0.1891
0.8     0.1935  0.1935  0.1935  0.1943  0.1947  0.1959  0.1954  0.1964  0.19
0.9     0.1946  0.1946  0.1946  0.1952  0.1953  0.1962  0.1961  0.1958  0.1945


Page 49: Video Hyperlinking Tutorial (Part C)


Visual Concept Selection Performance

• Precision@5

Solr queries / Concepts selection

        0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
0.1     0.5533  0.26    0.3133  0.46    0.5467  0.66    0.7     0.7333  0.7333
0.2     0.72    0.6667  0.5267  0.6267  0.64    0.7     0.7067  0.7333  0.7333
0.3     0.6867  0.72    0.7067  0.6467  0.7     0.7267  0.7067  0.7333  0.7333
0.4     0.7     0.7     0.7267  0.6933  0.7133  0.7467  0.7133  0.7333  0.7333
0.5     0.7133  0.7133  0.7067  0.72    0.74    0.74    0.7133  0.7333  0.7333
0.6     0.7267  0.7267  0.7267  0.7333  0.7333  0.74    0.7133  0.7333  0.7333
0.7     0.72    0.72    0.72    0.7267  0.7333  0.7333  0.7133  0.7333  0.7333
0.8     0.74    0.74    0.74    0.74    0.74    0.7533  0.7467  0.74    0.74
0.9     0.74    0.74    0.74    0.74    0.74    0.7533  0.7533  0.7533  0.74


Page 51: Video Hyperlinking Tutorial (Part C)


Visual Concept Selection Performance

• Precision@10

Solr queries / Concepts selection

        0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
0.1     0.4033  0.1667  0.2333  0.3233  0.4367  0.55    0.6033  0.6167  0.6267
0.2     0.5733  0.5     0.43    0.4967  0.51    0.5733  0.6067  0.6167  0.6267
0.3     0.6033  0.5733  0.5767  0.57    0.5567  0.5967  0.6067  0.6167  0.6267
0.4     0.59    0.5867  0.6     0.59    0.6     0.6067  0.6067  0.6167  0.6267
0.5     0.59    0.59    0.5967  0.6     0.59    0.6     0.61    0.6167  0.6267
0.6     0.61    0.61    0.61    0.61    0.6067  0.5933  0.61    0.6133  0.6267
0.7     0.61    0.61    0.61    0.61    0.61    0.5967  0.6133  0.6133  0.6233
0.8     0.6167  0.6167  0.6167  0.62    0.6233  0.6133  0.6233  0.6267  0.6233
0.9     0.63    0.63    0.63    0.6333  0.6333  0.63    0.6367  0.6367  0.6333


Page 53: Video Hyperlinking Tutorial (Part C)


Visual Concept Selection Performance

• Precision@20

Solr queries / Concepts selection

        0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
0.1     0.2683  0.105   0.17    0.2267  0.3033  0.4017  0.44    0.4483  0.44
0.2     0.4167  0.345   0.3033  0.3383  0.3933  0.4317  0.44    0.4483  0.44
0.3     0.435   0.4333  0.4317  0.405   0.4233  0.4417  0.44    0.4483  0.44
0.4     0.4433  0.4367  0.4433  0.4433  0.4433  0.4433  0.4417  0.4483  0.44
0.5     0.445   0.4417  0.4417  0.4467  0.4583  0.4483  0.4417  0.4483  0.44
0.6     0.4467  0.445   0.445   0.45    0.4567  0.4483  0.4417  0.4483  0.44
0.7     0.4533  0.4533  0.4533  0.455   0.4583  0.4583  0.4417  0.4483  0.4383
0.8     0.4517  0.4517  0.4517  0.4517  0.4533  0.4517  0.445   0.4483  0.44
0.9     0.45    0.45    0.45    0.45    0.45    0.4483  0.4483  0.4483  0.4483


Page 55: Video Hyperlinking Tutorial (Part C)


Challenge 2e: Combining Visual Concept Selection and Fusion

• Logic (AND/OR) vs Fusion (weighted sum) • Text vs Visual Concepts weight • Visual Concept selection threshold

Page 56: Video Hyperlinking Tutorial (Part C)


Challenge 2e: Combining Visual Concept Selection and Fusion

• MAP

Text vs Visual concept weight / Visual Concept Selection Threshold

        0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
0.1     0.227   0.232   0.233   0.233   0.233   0.233   0.233   0.233   0.232
0.2     0.206   0.228   0.23    0.231   0.232   0.231   0.231   0.231   0.233
0.3     0.185   0.219   0.225   0.227   0.228   0.228   0.229   0.23    0.232
0.4     0.168   0.21    0.22    0.225   0.227   0.228   0.229   0.23    0.232
0.5     0.138   0.201   0.215   0.221   0.223   0.226   0.226   0.23    0.231
0.6     0.138   0.199   0.213   0.219   0.223   0.225   0.227   0.23    0.232
0.7     0.132   0.197   0.213   0.219   0.223   0.228   0.229   0.232   0.233
0.8     0.091   0.139   0.169   0.186   0.196   0.204   0.213   0.222   0.231
0.9     0.195   0.206   0.213   0.218   0.22    0.221   0.224   0.228   0.231

Page 57: Video Hyperlinking Tutorial (Part C)


Challenge 2e: Combining Visual Concept Selection and Fusion




















[Figure: plot with axes “Text vs Visual Concept Fusion Weight” and “Visual Concept Selection Threshold”]










Page 58: Video Hyperlinking Tutorial (Part C)


Challenge 2: Discussion

• Keyword selection is important
• Mapping text to visual concepts isn’t straightforward
  – But can boost performance
• Visual concept detector confidence has limited effect on performance
• Selecting visual concepts from the anchor is easier than mapping from text

Page 59: Video Hyperlinking Tutorial (Part C)


Hyperlinking Evaluation

• Evaluate LinkedTV / MediaMixer Technologies for Analysing and Connecting together video fragments with related content

• Relevance to users • Large-scale video collection

MediaEval Benchmarking Initiative for Multimedia Evaluation The "multi" in multimedia: speech, audio, visual content, tags, users, context

Page 60: Video Hyperlinking Tutorial (Part C)


The MediaEval Search and Hyperlinking Task

• Information seeking in a video dataset: retrieving video/media fragments

Eskevich, M., Aly, R., Ordelman, R., Chen, S., Jones, G. J.F. The Search and Hyperlinking Task at MediaEval 2013. In Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, 1043, ISSN: 1613-0073. Barcelona, Spain, 2013.


Page 62: Video Hyperlinking Tutorial (Part C)


The MediaEval Search and Hyperlinking Task

• The 2013 dataset: 2323 BBC videos of different genres (440 programs) – ~1697h of video + audio – Two types of ASR transcript (LIUM/LIMSI) – Manual subtitle – Metadata (channel, cast, synopsis, etc…) – Shot boundaries and keyframes – Face detection and similarity information – Concept detection

Page 63: Video Hyperlinking Tutorial (Part C)


The 2013 MediaEval Search and Hyperlinking Task

• Search: find a known segment in the collection given a query (text)

<top>
  <itemId>item_18</itemId>
  <queryText>What does a ball look like when it hits the wall during Squash</queryText>
  <visualCues>ball hitting a wall in slow motion</visualCues>
</top>

• Hyperlinking: find relevant segments relative to an “anchor” segment (+- context)

<anchor>
  <anchorId>anchor_1</anchorId>
  <startTime>13.07</startTime>
  <endTime>13.22</endTime>
  <item>
    <fileName>v20080511_203000_bbcthree_little_britain</fileName>
    <startTime>13.07</startTime>
    <endTime>14.03</endTime>
  </item>
</anchor>

Page 64: Video Hyperlinking Tutorial (Part C)


The 2013 MediaEval Search and Hyperlinking Task

• Queries are user generated for both search and hyperlinking
  – Search: 50 queries from 29 users
    • Known-item: the target is known to be in the dataset
  – Hyperlinking: 98 anchors
• Evaluation:
  – For search, searched segments are pre-defined
  – For hyperlinking, crowd-sourcing (on 30 anchors only)

Page 65: Video Hyperlinking Tutorial (Part C)


MediaEval 2013 Submissions

• Search Runs:
  – scenes-S(-U,-I): scenes search using only textual features from subtitles (I and U: transcript type)
  – scenes-noC (-C): scenes search using textual (and visual) features
  – cl10-noC (-C): temporal shot clustering within a video using textual features (and visual cues).

Page 66: Video Hyperlinking Tutorial (Part C)


Search Results

• Best performance obtained with scenes
• Impact of visual concept: smaller than expected

Run         MRR       mGAP      MASP
scenes-C    0.324931  0.187194  0.199647
scenes-noC  0.324603  0.186916  0.199237
scenes-S    0.338594  0.182194  0.210934
scenes-I    0.261996  0.144708  0.158552
scenes-U    0.268045  0.152094  0.164817
cl10-C      0.294770  0.154178  0.181982
cl10-noC    0.286806  0.149530  0.171888

Page 67: Video Hyperlinking Tutorial (Part C)


mGAP results (60s window)

Page 68: Video Hyperlinking Tutorial (Part C)


Example Search and Result

• Text query: what to cook with everyday ingredients on a budget, denise van outen, john barrowman, ainsley harriot, seabass, asparagus, ostrich, mushrooms, sweet potato, mango, tomatoes

• Visual cues: denise van outen, john barrowman, ainsley harriot, seabass, asparagus, ostrich, mushrooms, sweet potato, mango, tomatoes

Expected Anchor  20080506_153000_bbctwo_ready_steady_cook.webm#t=67,321
Scenes           20080506_153000_bbctwo_ready_steady_cook.webm#t=48,323
cl10             20080506_153000_bbctwo_ready_steady_cook.webm#t=1287,1406

Page 69: Video Hyperlinking Tutorial (Part C)


MediaEval 2013 Submissions

• Hyperlinking Runs:
  – LA-scenes (-cl10/-MLT): only information from the anchor is used
  – LC-scenes (-cl10/-MLT): a segment containing the anchor is used (context)

Page 70: Video Hyperlinking Tutorial (Part C)


2013 Hyperlinking Results

• Scenes offer the best precision at ranks 5, 10 and 20 (MLT reaches a slightly higher MAP)
• Using context (LC) improves performance
• Precision at rank n decreases with n

Run        MAP     P-5     P-10    P-20
LA cl10    0.0337  0.3467  0.2533  0.1517
LA MLT     0.1201  0.4200  0.4200  0.3217
LA scenes  0.1196  0.6133  0.5133  0.3400
LC cl10    0.0550  0.4600  0.4000  0.2167
LC MLT     0.1820  0.5667  0.5667  0.4300
LC scenes  0.1654  0.6933  0.6367  0.4333

Page 71: Video Hyperlinking Tutorial (Part C)


2013 Hyperlinking Results (P=10 - 60s windows)

Page 72: Video Hyperlinking Tutorial (Part C)


The Search and Hyperlinking Demo

[Diagram: BroadCast Media, Content Analysis, Metadata (Subtitles), Lucene/Solr, Media DB, Solr Index, WebService (HTML5/AJAX/PHP), User Interface]

Page 73: Video Hyperlinking Tutorial (Part C)


• LinkedTV hyperlinking scenario


Page 74: Video Hyperlinking Tutorial (Part C)


Conclusions and Outlook

• Scenes offer the best temporal granularity
• Current algorithm based on visual features only
• Future work: including semantic and audio features

• Importance of Context • Visual features integration is challenging

• Visual concept detectors (accuracy and coverage) • Combination of multimodal features • Mapping between text/entities and visual concepts

• Person identification

Page 75: Video Hyperlinking Tutorial (Part C)



• Mrs Mathilde Sahuguet (EURECOM/DailyMotion)
• Dr. Bahjat Safadi (EURECOM)
• Mr Hoang-An Le (EURECOM)
• Mr Quoc-Minh Bui (EURECOM)
• LinkedTV Partners (CERTH/ITI, UEP, Fraunhofer IAIS)

Page 76: Video Hyperlinking Tutorial (Part C)

Additional Reading

• E. Apostolidis, V. Mezaris, M. Sahuguet, B. Huet, B. Cervenkova, D. Stein, S. Eickeler, J.-L. Redondo Garcia, R. Troncy, L. Pikora, "Automatic fine-grained hyperlinking of videos within a closed collection using scene segmentation", Proc. ACM Multimedia (MM'14), Orlando, FL, US, 3-7 Nov. 2014.

• B. Safadi, M. Sahuguet and B. Huet, When textual and visual information join forces for multimedia retrieval, ICMR 2014, ACM International Conference on Multimedia Retrieval, April 1-4, 2014, Glasgow, Scotland

• M. Sahuguet and B. Huet. Mining the Web for Multimedia-based Enriching. Multimedia Modeling MMM 2014, 20th International Conference on MultiMedia Modeling, 8-10th January 2014, Dublin, Ireland

• M. Sahuguet, B. Huet, B. Cervenkova, E. Apostolidis, V. Mezaris, D. Stein, S. Eickeler, J-L. Redondo Garcia, R. Troncy, L. Pikora. LinkedTV at MediaEval 2013 search and hyperlinking task, MEDIAEVAL 2013, Multimedia Benchmark Workshop, October 18-19, 2013, Barcelona, Spain

• Stein, D.; Öktem, A.; Apostolidis, E.; Mezaris, V.; Redondo García, J. L.; Troncy, R.; Sahuguet, M. & Huet, B., From raw data to semantically enriched hyperlinking: Recent advances in the LinkedTV analysis workflow, NEM Summit 2013, Networked & Electronic Media, 28-30 October 2013, Nantes, France

• W. Bailer, M. Lokaj, and H. Stiegler. Context in video search: Is close-by good enough when using linking? In ACM ICMR, Glasgow, UK, April 1-4 2014.

• C. A. Bhatt, N. Pappas, M. Habibi, et al. Multimodal reranking of content-based recommendations for hyperlinking video snippets. In ACM ICMR, Glasgow, UK, April 1-4 2014.

• D. Stein, S. Eickeler, R. Bardeli, et al. Think before you link! Meeting content constraints when linking television to the web. In NEM Summit 2013, 28-30, October 2013, Nantes, France.

• P. Over, G. Awad, M. Michel, et al. TRECVID 2012 An overview of the goals, tasks, data, evaluation mechanisms and metrics. In Proc. of TRECVID 2012. NIST, USA, 2012.

• M. Eskevich, G. Jones, C. Wartena, M. Larson, R. Aly, T. Verschoor, and R. Ordelman. Comparing retrieval effectiveness of alternative content segmentation methods for Internet video search. In Content-Based Multimedia Indexing (CBMI), 2012.

Page 77: Video Hyperlinking Tutorial (Part C)

Additional Reading

• Lei Pang, Wei Zhang, Hung-Khoon Tan, and Chong-Wah Ngo. 2012. Video hyperlinking: libraries and tools for threading and visualizing large video collection. In Proceedings of the 20th ACM international conference on Multimedia (MM '12). ACM, New York, NY, USA, 1461-1464.

• A. Habibian, K. E. van de Sande, and C. G. Snoek. Recommendations for Video Event Recognition Using Concept Vocabularies. In Proceedings of the 3rd ACM Conference on International Conference on Multimedia Retrieval, ICMR ’13, pages 89–96, Dallas, Texas, USA, April 2013.

• A. Hauptmann, R. Yan, W.-H. Lin, M. Christel, and H. Wactlar. Can High-Level Concepts Fill the Semantic Gap in Video Retrieval? A Case Study With Broadcast News. Multimedia, IEEE Transactions on, 9(5):958–966, 2007.

• A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1349–1380, 2000.

• A. Rousseau, F. Bougares, P. Deléglise, H. Schwenk, and Y. Estève. LIUM's systems for the IWSLT 2011 Speech Translation Tasks. In Proceedings of IWSLT 2011, San Francisco, USA, 2011.

• Gauvain, J.-L., Lamel, L. and Adda, G., 2002. The LIMSI broadcast news transcription system. Speech Communication 37, 89-108

• C. Fellbaum, editor. WordNet: an electronic lexical database. MIT Press, 1998.

• Carles Ventura, Marcel Tella-Amo, Xavier Giro-I-Nieto, “UPC at MediaEval 2013 Hyperlinking Task”, Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, Barcelona, Spain, October 18-19, 2013.

• Camille Guinaudeau, Anca-Roxana Simon, Guillaume Gravier, Pascale Sébillot, “HITS and IRISA at MediaEval 2013: Search and Hyperlinking Task”, Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, Barcelona, Spain, October 18-19, 2013.

• Mathilde Sahuguet, Benoit Huet, Barbora Červenková, Evlampios Apostolidis, Vasileios Mezaris, Daniel Stein, Stefan Eickeler, Jose Luis Redondo Garcia, Lukáš Pikora, “LinkedTV at MediaEval 2013 Search and Hyperlinking Task” , Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, Barcelona, Spain, October 18-19, 2013.

Page 78: Video Hyperlinking Tutorial (Part C)

Additional Reading

• Tom De Nies, Wesley De Neve, Erik Mannens, Rik Van de Walle, “Ghent University-iMinds at MediaEval 2013: An Unsupervised Named Entity-based Similarity Measure for Search and Hyperlinking” , Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, Barcelona, Spain, October 18-19, 2013.

• Fabrice Souvannavong, Bernard Mérialdo, Benoit Huet, Video content modeling with latent semantic analysis, CBMI 2003, 3rd International Workshop on Content-Based Multimedia Indexing, September 22-24, 2003, Rennes, France

• Itheri Yahiaoui, Bernard Merialdo, Benoit Huet, Comparison of multiepisode video summarization algorithms, EURASIP Journal on applied signal processing, 2003

• Chidansh Bhatt, Nikolaos Pappas, Maryam Habibi, Andrei Popescu-Belis, “Idiap at MediaEval 2013: Search and Hyperlinking Task” , Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, Barcelona, Spain, October 18-19, 2013.

• Petra Galuščáková, Pavel Pecina, “CUNI at MediaEval 2013 Search and Hyperlinking Task” , Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, Barcelona, Spain, October 18-19, 2013.

• Shu Chen, Gareth J.F. Jones, Noel E. O'Connor, “DCU Linking Runs at MediaEval 2013: Search and Hyperlinking Task” , Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, Barcelona, Spain, October 18-19, 2013.

• Michal Lokaj, Harald Stiegler, Werner Bailer, “TOSCA-MP at Search and Hyperlinking of Television Content Task” , Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, Barcelona, Spain, October 18-19, 2013.

• Bahjat Safadi, Mathilde Sahuguet, Benoit Huet, Linking text and visual concepts semantically for cross modal multimedia search, 21st IEEE International Conference on Image Processing, October 27-30, 2014, Paris, France

Indexing Systems • • • •

Projects
• LinkedTV: Television linked to the web.
• MediaMixer: Community set-up and networking for the remixing of online media fragments.
• Axes: Access to audiovisual archives.

Page 79: Video Hyperlinking Tutorial (Part C)

Thank you!

More information: [email protected]