the journey is the reward - towards a change in (web) search paradigms
The Journey is the Reward Towards new Paradigms in (Web) Search
Harald Sack
Hasso-Plattner-Institute for IT Systems Engineering
University of Potsdam, Germany
18th International Conference on Business Information Systems, BIS 2015 / 4th DBpedia Community Meeting
Poznan, Poland, 25 June 2015
CC BY-NC-SA 2.0
Agenda:
● Search and Find and why we are not always content with the result
● Semantic Multimedia Analysis: to better “understand” the content
● Exploratory Search and Intelligent Recommendation: from retrieval to discovery
www.yovisto.com
sorry, no results found for “Neil Armstrong”, ...
...but maybe you are interested in:
● Buzz Aldrin (1 video)
● John Glenn (1 video)
● Yuri Gagarin (2 videos)
● Richard Nixon (3 videos)
● Apollo 11 (1 video)
● NASA (20 videos)
● Moon (14 videos)
● space exploration (34 videos)
● technology (1,205 videos)
The Information Retrieval Paradigm
● Set of Queries → Query Formulation → query
● Set of Documents → Indexing → index
● query ↔ index: matches based on string similarity
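A minimal sketch of this paradigm, assuming exact token matching as a simple stand-in for string similarity. The documents and the names `tokenize`, `build_index` and `search` are made up for illustration. It shows why a query for a name finds nothing when the name never occurs as a string in the indexed text:

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase a text and split it into word tokens, dropping punctuation."""
    tokens = (word.strip('.,:;!?"()').lower() for word in text.split())
    return [token for token in tokens if token]

def build_index(documents):
    """Indexing: map every token to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Query formulation and matching: documents containing all query tokens."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits

# Hypothetical video descriptions; neither mentions the name "Neil Armstrong".
DOCUMENTS = {
    "video-1": "Apollo 11: the first man on the Moon",
    "video-2": "Buzz Aldrin talks about space exploration",
}
INDEX = build_index(DOCUMENTS)

print(search(INDEX, "Neil Armstrong"))  # no match, although video-1 is about him
print(search(INDEX, "Apollo 11"))
```

Matching on strings alone cannot relate “Neil Armstrong” to “the first man on the Moon”; that connection only exists in the context.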
How do you express a search if you don’t know the proper name?
● sometimes text or media alone are not sufficient to answer the information needs
● what is missing are often the relational connections and circumstances
● i.e. the contextual information is needed to answer the queries...
● therefore you have to better understand the media
Semantic Media Analysis
How to Search a Multimedia Archive?
(Selected) Automated Media Analysis
Structural Video Analysis
Multimedia Annotation
media fragments:
● http://example.com/example.ogv#t=10,20
● http://example.com/example.ogv#t=20,30
● http://example.com/example.ogv#t=10,20

MPEG-7 metadata:
...
<SpatialDecomposition>
  <TextAnnotation>
    <KeywordAnnotation>
      <Keyword>Neil Armstrong</Keyword>
    </KeywordAnnotation>
  </TextAnnotation>
  <SpatialMask>
    <SubRegion>
      <Polygon>
        <Coords> 480 150 140 330 </Coords>
      </Polygon>
    </SubRegion>
  </SpatialMask>
  ...
</SpatialDecomposition>
...

media fragment URI:
<a href="http://example.com/armstrong.ogv#t=20,30&xywh=480,150,140,330">Neil Armstrong</a>
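A media fragment URI such as the one on the Multimedia Annotation slide packs a time interval (`t`) and a spatial region (`xywh`) into the URI fragment. The following is a rough sketch of how such a URI could be taken apart. It assumes plain seconds and pixel values only and ignores the other notations that the W3C Media Fragments specification allows; the function name `parse_media_fragment` is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def parse_media_fragment(uri):
    """Extract the temporal (t) and spatial (xywh) dimension of a media fragment URI.

    Simplification: only plain seconds and pixel coordinates are handled.
    """
    params = parse_qs(urlsplit(uri).fragment)
    result = {}
    if "t" in params:
        start, _, end = params["t"][0].partition(",")
        # A missing start means "from the beginning", a missing end "until the end".
        result["t"] = (float(start) if start else 0.0,
                       float(end) if end else None)
    if "xywh" in params:
        x, y, width, height = (int(v) for v in params["xywh"][0].split(","))
        result["xywh"] = (x, y, width, height)
    return result

fragment = parse_media_fragment(
    "http://example.com/armstrong.ogv#t=20,30&xywh=480,150,140,330")
print(fragment)
```

The result describes seconds 20 to 30 of the video and a 140 x 330 pixel region whose top left corner is at (480, 150).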
What is the Meaning of Metadata?
● Authoritative Metadata
○ structured data
○ semi-structured data
○ natural language text
● Non-authoritative Metadata
○ User Tags (free keywords)
○ User Comments
○ controlled vocabulary
● Media Analysis Metadata
○ low level features
○ high level features
Interpretation
level of abstraction
reliability
accuracy
context
pragmatics
time & space constraints
Semantic Analysis
Semantic Media Analysis
“Neil Armstrong” is more than just a character string
Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: “Houston, Tranquility Base here, The Eagle has landed.”
● Neil Armstrong is an Astronaut
● Buzz Aldrin is an Astronaut
● Astronaut is a Person
● Person has a Name
● Person has a Date of Birth
● Space Mission has crew member: Neil Armstrong, Buzz Aldrin
● Space Mission is an Event
● Event has a Begin Date
● Event has an End Date
● Space Mission has a Crew Size
http://dbpedia.org/resource/Neil_Armstrong
● dbpedia:Neil_Armstrong rdf:type dbpedia-owl:Astronaut
● dbpedia-owl:Astronaut rdfs:subClassOf dbpedia-owl:Person
● dbpedia:Neil_Armstrong dbpedia-owl:birth_name "Neil Armstrong"@en
● dbpedia:Neil_Armstrong dbpedia-owl:birth_date "1930-08-05"^^xsd:date
● dbpedia:Apollo_11 dbpprop:crewMembers dbpedia:Neil_Armstrong
● dbpedia:Apollo_11 rdf:type umbel:SpaceMission
● dbpedia:Apollo_11 dbpprop:crewSize "3"^^xsd:integer
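To illustrate what the step from a character string to an entity buys, here is a small sketch that keeps a few of these statements as (subject, predicate, object) tuples and derives all classes of an entity by following rdfs:subClassOf. It is plain Python rather than an RDF library, and the helper names `objects` and `types_of` are made up for illustration.

```python
# A handful of the statements from the slide as (subject, predicate, object) tuples.
TRIPLES = {
    ("dbpedia:Neil_Armstrong", "rdf:type", "dbpedia-owl:Astronaut"),
    ("dbpedia-owl:Astronaut", "rdfs:subClassOf", "dbpedia-owl:Person"),
    ("dbpedia:Apollo_11", "rdf:type", "umbel:SpaceMission"),
    ("dbpedia:Apollo_11", "dbpprop:crewMembers", "dbpedia:Neil_Armstrong"),
}

def objects(triples, subject, predicate):
    """All objects o for which (subject, predicate, o) is stated."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def types_of(triples, entity):
    """All classes of an entity, following rdfs:subClassOf transitively."""
    found = set()
    todo = list(objects(triples, entity, "rdf:type"))
    while todo:
        cls = todo.pop()
        if cls not in found:
            found.add(cls)
            todo.extend(objects(triples, cls, "rdfs:subClassOf"))
    return found

print(types_of(TRIPLES, "dbpedia:Neil_Armstrong"))
print(objects(TRIPLES, "dbpedia:Apollo_11", "dbpprop:crewMembers"))
```

A search for persons can now return the astronaut, although the word “person” never appears in the annotated text.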
Semantic Annotation
media fragments:
● http://example.com/example.ogv#t=10,20
● http://example.com/example.ogv#t=20,30
● http://example.com/example.ogv#t=10,20
http://dbpedia.org/resource/Neil_Armstrong
● rdf:type dbpedia-owl:Astronaut
● dbpedia-owl:Astronaut rdfs:subClassOf dbpedia-owl:Person
● dbpedia-owl:birth_name "Neil Armstrong"@en
● dbpedia-owl:birth_date "1930-08-05"^^xsd:date
HTML with RDFa:
...
<div vocab="http://www.w3.org/ns/oa#"
     prefix="dctypes: http://purl.org/dc/dcmitype/ foaf: http://xmlns.com/foaf/0.1/"
     typeof="Annotation" resource="#contentAnnotation-001">
  <div property="hasTarget"
       resource="http://example.com/armstrong.ogv#t=20,30&xywh=480,150,140,330"
       typeof="dctypes:video">
  </div>
  <div property="hasBody" typeof="SemanticTag">
    <a property="foaf:page" href="http://dbpedia.org/resource/Neil_Armstrong"> Neil Armstrong </a>
  </div>
</div>
...
Named Entity Resolution
Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: “Houston, Tranquility Base here, The Eagle has landed.”
http://dbpedia.org/resource/Neil_Armstrong
1) Detect Named Entities in text
2) Determine possible Candidate Entities
3) Filter Entity Candidates
4) Disambiguate Entity Candidates according to Context
Named Entity Resolution
(1) Detect Named Entities in Text
● Linguistic Analysis with
○ POS Tagging
○ Named Entity Recognition
○ n-gram Analysis
○ Normalization
Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: “Houston, Tranquility Base here, The Eagle has landed.”
● Named Entity Recognition classes: Person, Location, Organization
● n-gram Analysis of “mission control room”:
○ 1-grams: mission, control, room
○ 2-grams: mission control, control room
○ 3-grams: mission control room
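The n-gram step can be sketched in a few lines. This is a generic word n-gram function, not taken from the talk; the name `ngrams` is chosen for illustration.

```python
def ngrams(tokens, n):
    """All contiguous word sequences of length n, joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "mission control room".split()
for n in (1, 2, 3):
    print(n, ngrams(tokens, n))
```

Every n-gram is a possible surface form of an entity, so “mission control” can be looked up as a whole and not only as two unrelated words.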
Named Entity Resolution
(2) Determine possible Entity Candidates
● Gazetteer: semantic entities ↔ assigned labels
● ca. 40M labels for 9M DBpedia entities
● Homonyms
○ disambiguation pages via dbpedia-owl:wikiPageDisambiguates
● Synonyms
○ redirected via dbpedia-owl:wikiPageRedirects
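A gazetteer can be thought of as a dictionary from labels to the entities that carry them. The sketch below assumes a tiny hand-made label table; the `ex:` identifiers and the labels are hypothetical and do not come from DBpedia. A homonym then simply shows up as a label with more than one candidate.

```python
from collections import defaultdict

# Hypothetical entities and labels, for illustration only.
ENTITY_LABELS = {
    "ex:Neil_Armstrong": ["Neil Armstrong", "Armstrong"],
    "ex:Louis_Armstrong": ["Louis Armstrong", "Armstrong", "Satchmo"],
    "ex:Houston": ["Houston"],
}

def build_gazetteer(entity_labels):
    """Invert entity -> labels into (lowercased) label -> set of entities."""
    gazetteer = defaultdict(set)
    for entity, labels in entity_labels.items():
        for label in labels:
            gazetteer[label.lower()].add(entity)
    return gazetteer

def candidates(gazetteer, surface_form):
    """All entities whose labels match the surface form, ignoring case."""
    return set(gazetteer.get(surface_form.lower(), set()))

GAZETTEER = build_gazetteer(ENTITY_LABELS)
print(candidates(GAZETTEER, "Armstrong"))  # homonym: two candidates
print(candidates(GAZETTEER, "Satchmo"))    # synonym: one candidate
```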
Named Entity Resolution
(2) Determine possible Entity Candidates
● resolve chains and cycles in DBpedia link graph
● aggregate all labels from redirect and disambiguation paths within the leaves
(Figure: redirect and disambiguation paths between nodes carrying the labels a1 to a6; after aggregation the leaves carry combined label lists such as a3, a2, a1 and a6, a5, a3, a2, a1.)
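The two bullet points can be sketched for the redirect case as follows. This is an illustration under simplifying assumptions: each page redirects to at most one other page, disambiguation paths (one page pointing to many) are left out, and cyclic redirects are simply dropped. The page names and function names are hypothetical.

```python
def resolve_redirect(redirects, entity):
    """Follow redirect links to the final target; return None if a cycle is hit."""
    seen = set()
    while entity in redirects:
        if entity in seen:
            return None
        seen.add(entity)
        entity = redirects[entity]
    return entity

def aggregate_labels(redirects, labels):
    """Collect the labels of all pages along a redirect path at the final target."""
    aggregated = {}
    for entity, entity_labels in labels.items():
        target = resolve_redirect(redirects, entity)
        if target is not None:
            aggregated.setdefault(target, set()).update(entity_labels)
    return aggregated

# Hypothetical pages: a chain of two redirects and a cycle of two pages.
REDIRECTS = {"ex:A1": "ex:A2", "ex:A2": "ex:A3", "ex:C1": "ex:C2", "ex:C2": "ex:C1"}
LABELS = {
    "ex:A1": {"label a1"},
    "ex:A2": {"label a2"},
    "ex:A3": {"label a3"},
    "ex:C1": {"label c1"},
}
print(aggregate_labels(REDIRECTS, LABELS))
```

After aggregation the leaf `ex:A3` answers to all three labels, which is what the gazetteer lookup needs.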
Named Entity Resolution
(3) Filter Entity Candidates
Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: “Houston, Tranquility Base here, The Eagle has landed.”
Person, Location, Organization
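One plausible reading of the filter step is that candidates are kept only if their class agrees with the class that Named Entity Recognition assigned to the text span. The sketch below assumes exactly that; the talk does not spell out the filter criteria, and all identifiers and class assignments are hypothetical.

```python
# Hypothetical candidates for the surface form "Armstrong" and their classes.
ENTITY_CLASSES = {
    "ex:Neil_Armstrong": {"Person"},
    "ex:Louis_Armstrong": {"Person"},
    "ex:Armstrong_Town": {"Location"},
    "ex:Armstrong_Company": {"Organization"},
}

def filter_candidates(candidates, entity_classes, expected_class):
    """Keep only candidates whose known classes include the expected class."""
    return [c for c in candidates
            if expected_class in entity_classes.get(c, set())]

all_candidates = ["ex:Neil_Armstrong", "ex:Louis_Armstrong",
                  "ex:Armstrong_Town", "ex:Armstrong_Company"]
# Assume the NER step tagged the span "Neil Armstrong" as a Person.
persons = filter_candidates(all_candidates, ENTITY_CLASSES, "Person")
print(persons)
```

The filter narrows the candidate set but leaves the remaining ambiguity between the two persons to the disambiguation step.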
Named Entity Resolution
(4) Disambiguate Entity Candidates
● Context Analysis
○ considers ambiguity, accuracy, reliability of source data (provenance, static properties) as well as of mapping [label -> entity]
N.Steinmetz, H.Sack: Semantic Multimedia Information Retrieval Based on Contextual Descriptions, ESWC 2013
Named Entity Resolution
(4) Disambiguate Entity Candidates
Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: “Houston, Tranquility Base here, the Eagle has landed.”
● Link Graph Analysis
● Co-Occurrence Analysis
● Relevance Ranking
induced Link Graph
Connected Component Analysis
(1) Identify Connected Components
(2) Identify Connected Components that cover most term partitions
(3) Strongly Connected Components consolidate disambiguation
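Steps (1) and (2) can be sketched as follows. This is an illustration under stated assumptions, not the KEA implementation: the link graph is treated as undirected, every candidate belongs to exactly one term, every edge connects two listed candidates, and step (3) with strongly connected components is left out. All identifiers and links are hypothetical.

```python
def connected_components(nodes, edges):
    """Connected components of an undirected graph, found by depth-first search."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components

def best_component(candidates_by_term, edges):
    """The component whose candidates cover the largest number of terms."""
    term_of = {c: term for term, cands in candidates_by_term.items() for c in cands}
    components = connected_components(list(term_of), edges)
    return max(components, key=lambda comp: len({term_of[c] for c in comp}))

# Hypothetical candidates per term and hypothetical links between them.
CANDIDATES = {
    "Armstrong": ["ex:Neil_Armstrong", "ex:Louis_Armstrong"],
    "Eagle": ["ex:Lunar_Module_Eagle", "ex:Eagle_(bird)"],
    "Houston": ["ex:Houston"],
}
LINKS = [("ex:Neil_Armstrong", "ex:Lunar_Module_Eagle"),
         ("ex:Lunar_Module_Eagle", "ex:Houston")]

print(best_component(CANDIDATES, LINKS))
```

Candidates that are linked to each other support each other: the component around the lunar module covers all three terms, while the musician and the bird each stay isolated.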
Hierarchical Named Entity Resolution
General Approach:
1. Always start disambiguation with the most reliable algorithm on the most accurate and most reliable data
2. Resolve the remaining ambiguity with less reliable algorithms on less reliable data
KEA Named Entity Resolution
1. Connected Component Analysis on Link Graph
2. Co-occurrence on Wikipedia text corpus
3. Popularity Based Link Graph Analysis
4. Negative Context Analysis
N. Steinmetz, H. Sack: Semantic Multimedia Information Retrieval Based on Contextual Descriptions, ESWC 2013
N. Steinmetz, H. Sack: About the Influence of Negative Context, ICSC 2013
R. Usbeck et al.: GERBIL - General Entity Annotator Benchmark, WWW 2014
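The general approach amounts to a cascade: each algorithm decides what it can, and only the remaining ambiguity is passed on to the next, less reliable one. A minimal sketch of such a cascade follows. The two resolvers are simple stand-ins chosen for illustration and are not the KEA algorithms listed above; the popularity scores and identifiers are made up.

```python
def resolve_hierarchically(candidates, resolvers):
    """Apply resolvers in order of decreasing reliability.

    Each resolver receives the still ambiguous terms and returns a dict
    term -> entity for the terms it is able to decide.
    """
    resolved = {}
    remaining = dict(candidates)
    for resolver in resolvers:
        for term, entity in resolver(remaining).items():
            resolved[term] = entity
            remaining.pop(term, None)
        if not remaining:
            break
    return resolved, remaining

def single_candidate(remaining):
    """Stand-in for a reliable step: decide only terms with exactly one candidate."""
    return {term: cands[0] for term, cands in remaining.items() if len(cands) == 1}

POPULARITY = {"ex:Neil_Armstrong": 90, "ex:Louis_Armstrong": 80}  # made-up scores

def most_popular(remaining):
    """Stand-in for a less reliable step: pick the most popular candidate."""
    return {term: max(cands, key=lambda c: POPULARITY.get(c, 0))
            for term, cands in remaining.items() if cands}

CANDIDATES = {
    "Armstrong": ["ex:Louis_Armstrong", "ex:Neil_Armstrong"],
    "Houston": ["ex:Houston"],
}
resolved, unresolved = resolve_hierarchically(
    CANDIDATES, [single_candidate, most_popular])
print(resolved, unresolved)
```

The order of the resolver list carries the design decision: a guess from a weaker method never overrides a decision that a stronger method has already made.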
Feedback-based Named Entity Resolution
General Approach:
1. KEA Automated Entity Resolution
2. Manual Correction of wrongly assigned entities
3. Include induced Link Graph from annotated text for future Link Graph Analysis
4. Include annotated text for future Co-Occurrence Analysis
Implementation
● WordPress PlugIn refer.cx for automated & manual NER annotation of blog posts
http://refer.cx
Retrieval vs Exploration
● Find another interesting book for me
● Find books with similar topics
● Find books from similar authors
● ...
J. Waitelonis, H. Sack: Towards exploratory video search using linked data, Multimedia Tools and Applications, Volume 59, Number 2 (2012)
Linked Data Based Exploration
● dbpedia:Neil_Armstrong dbpedia-owl:mission dbpedia:Apollo_11
● dbpedia:Michael_Collins dbpedia-owl:mission dbpedia:Apollo_11
● dbpedia:Buzz_Aldrin dbpedia-owl:mission dbpedia:Apollo_11
● dbpedia:Apollo_11 dcterms:subject category:Apollo_program
● dbpedia:Apollo_13 dcterms:subject category:Apollo_program
● dbpedia:Apollo_13 rdf:type yago:Space_accidents_and_incidents
● dbpedia:Space_Shuttle_Challenger rdf:type yago:Space_accidents_and_incidents
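Exploration along such a graph can be sketched as a breadth-first walk from a start entity that records how many hops away each resource is. The statements are the ones from the slide; ignoring the edge direction and the names `neighbours` and `explore` are assumptions made for this illustration.

```python
TRIPLES = [
    ("dbpedia:Neil_Armstrong", "dbpedia-owl:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Michael_Collins", "dbpedia-owl:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Buzz_Aldrin", "dbpedia-owl:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Apollo_11", "dcterms:subject", "category:Apollo_program"),
    ("dbpedia:Apollo_13", "dcterms:subject", "category:Apollo_program"),
    ("dbpedia:Apollo_13", "rdf:type", "yago:Space_accidents_and_incidents"),
    ("dbpedia:Space_Shuttle_Challenger", "rdf:type",
     "yago:Space_accidents_and_incidents"),
]

def neighbours(triples, node):
    """Resources directly connected to node, ignoring the edge direction."""
    result = set()
    for s, _, o in triples:
        if s == node:
            result.add(o)
        elif o == node:
            result.add(s)
    return result

def explore(triples, start, max_hops):
    """Breadth-first exploration: hop distance of every resource within max_hops."""
    distance = {start: 0}
    frontier = [start]
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for node in frontier:
            for neighbour in sorted(neighbours(triples, node)):
                if neighbour not in distance:
                    distance[neighbour] = hop
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return distance

for resource, hops in explore(TRIPLES, "dbpedia:Neil_Armstrong", 5).items():
    print(hops, resource)
```

Two hops lead to the other crew members of Apollo 11, five hops to the Space Shuttle Challenger: the further the walk, the less obvious and potentially the more surprising the suggestion.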
Linked Data Based Exploration
e.g. via refer.cx WordPress PlugIn at http://blog.yovisto.com
Now, which movie should I watch next?
...and please ...don’t make boring suggestions!
Thank you for your Attention!
Contact:
Harald Sack
Hasso-Plattner-Institute for IT Systems Engineering
University of Potsdam, Germany
email: [email protected]
@lysander07 / @yovisto