the journey is the reward - towards a change in (web) search paradigms

The Journey is the Reward Towards new Paradigms in (Web) Search Harald Sack Hasso-Plattner-Institute for IT Systems Engineering University of Potsdam, Germany 18th International Conference on Business Information Systems, BIS 2015 / 4th DBpedia Community Meeting Poznan, Poland, 25. June 2015 CC BY-NC-SA 2.0 1

Upload: harald-sack

Post on 31-Jul-2015


TRANSCRIPT

The Journey is the Reward Towards new Paradigms in (Web) Search

Harald Sack

Hasso-Plattner-Institute for IT Systems Engineering

University of Potsdam, Germany

18th International Conference on Business Information Systems, BIS 2015 / 4th DBpedia Community Meeting

Poznan, Poland, 25. June 2015

CC BY-NC-SA 2.0


Harald Sack, BIS 2015 / 4th Int. DBpedia Community Meeting 2015, Poznan, Poland, 25 June 2015

Agenda:

● Search and Find: why we are not always content with the result

● Semantic Multimedia Analysis: to better “understand” the content

● Exploratory Search and Intelligent Recommendation: from retrieval to discovery

Web Search today...

autocompletion

Google Knowledge Graph

Query by example

visual analysis

search recommendations

www.yovisto.com

sorry, no results found for “Neil Armstrong”, ...

...but maybe you are interested in:

● Buzz Aldrin (1 video)

● John Glenn (1 video)

● Yuri Gagarin (2 videos)

● Richard Nixon (3 videos)

● Apollo 11 (1 video)

● NASA (20 videos)

● Moon (14 videos)

● space exploration (34 videos)

● technology (1,205 videos)

The Information Retrieval Paradigm

Set of Queries → Query Formulation → query

Set of Documents → Indexing → index

query ↔ index: matches based on string similarity
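The limitation of pure string matching can be illustrated with a minimal sketch. The tiny document list, the threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration; they are not taken from the talk or from any particular search engine.

```python
from difflib import SequenceMatcher

# Hypothetical toy collection (document titles only, invented for this sketch).
DOCUMENTS = [
    "Buzz Aldrin on the Apollo 11 mission",
    "NASA and the history of space exploration",
    "The first man on the Moon",
]


def similarity(query: str, text: str) -> float:
    """Character-level similarity ratio in [0, 1], ignoring case."""
    return SequenceMatcher(None, query.lower(), text.lower()).ratio()


def search(query: str, documents: list[str], threshold: float = 0.6) -> list[str]:
    """Return the documents whose text is similar enough to the query string."""
    return [doc for doc in documents if similarity(query, doc) >= threshold]


# The third document is about Neil Armstrong, but no string in the
# collection resembles the query, so nothing is returned.
results = search("Neil Armstrong", DOCUMENTS)
print(results)
```

The relevant document exists, yet it can only be found if the system knows that "the first man on the Moon" and "Neil Armstrong" refer to the same entity.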

Harald Sack, From Script Idea to TV Rerun, LIME 2015, Florence, Italy, May 18, 2015

How do you express a search if you don't know the proper name?

Moon

Moon Landing

Moon Landing Hit

Moon Landing Hit silent movie


● sometimes text or media alone are not sufficient to answer the information needs

● what is missing are often the relational connections and circumstances

● i.e. the contextual information is needed to answer the queries...

● therefore you have to better understand the media

Semantic Media Analysis

Agenda:

● Search and Find: why we are not always content with the result

● Semantic Multimedia Analysis: to better “understand” the content

● Exploratory Search and Intelligent Recommendation: from retrieval to discovery

How to Search a Multimedia Archive?

(Selected) Automated Media Analysis

Harald Sack/Barbara Fichte, Linked Data for Media Production, EBU MDN Workshop 2015, Geneva, Switzerland, June 10, 2015

Structural Video Analysis


Multimedia Annotation

media fragments:

http://example.com/example.ogv#t=10,20

http://example.com/example.ogv#t=20,30

http://example.com/example.ogv#t=10,20

MPEG-7 metadata:

    ...
    <SpatialDecomposition>
      <TextAnnotation>
        <KeywordAnnotation>
          <Keyword>Neil Armstrong</Keyword>
        </KeywordAnnotation>
      </TextAnnotation>
      <SpatialMask>
        <SubRegion>
          <Polygon>
            <Coords> 480 150 140 330 </Coords>
          </Polygon>
        </SubRegion>
      </SpatialMask>
      ...
    </SpatialDecomposition>
    ...

media fragment URI:

    <a href="http://example.com/armstrong.ogv#t=20,30&xywh=480,150,140,330">Neil Armstrong</a>
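How such a media fragment URI decomposes into a time span and a spatial region can be shown with a small sketch. The function name `parse_media_fragment` is hypothetical, and the parser only handles the simple `t=start,end` and `xywh=x,y,w,h` forms seen on the slide; the Media Fragments URI specification allows further variants (open-ended intervals, unit prefixes) that are ignored here.

```python
from urllib.parse import parse_qs, urlsplit


def parse_media_fragment(uri: str) -> dict:
    """Split a media fragment URI into base URI, time span, and region.

    Only the forms 't=start,end' (seconds) and 'xywh=x,y,w,h' (pixels)
    are handled; this is a simplification for illustration.
    """
    parts = urlsplit(uri)
    dimensions = parse_qs(parts.fragment)  # {"t": ["20,30"], "xywh": ["480,..."]}
    result = {"base": uri.split("#", 1)[0]}
    if "t" in dimensions:
        start, end = dimensions["t"][0].split(",")
        result["time"] = (float(start), float(end))
    if "xywh" in dimensions:
        x, y, w, h = (int(v) for v in dimensions["xywh"][0].split(","))
        result["region"] = (x, y, w, h)
    return result


fragment = parse_media_fragment(
    "http://example.com/armstrong.ogv#t=20,30&xywh=480,150,140,330"
)
print(fragment)
```

The same four numbers appear in the MPEG-7 `<Coords>` element, so both notations address the same region of the video frame.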

www.yovisto.com

What is the Meaning of Metadata?

● Authoritative Metadata

○ structured data

○ semi-structured data

○ natural language text

● Non-authoritative Metadata

○ User Tags (free keywords)

○ User Comments

○ controlled vocabulary

● Media Analysis Metadata

○ low level features

○ high level features

Interpretation

level of abstraction

reliability

accuracy

context

pragmatics

time & space constraints

Semantic Analysis

Semantic Media Analysis

“Neil Armstrong” is more than just a character string

Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: “Houston, Tranquility Base here, The Eagle has landed.”

[Figure: concept graph]

● Neil Armstrong is an Astronaut

● Buzz Aldrin is an Astronaut

● Astronaut is a Person

● Person has a Name, has a Date of Birth

● Space Mission has crew member Neil Armstrong, Buzz Aldrin

● Space Mission is an Event

● Space Mission has a Crew Size

● Event has a Begin Date, has an End Date

Semantic Media Analysis

“Neil Armstrong” is more than just a character string

text: Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: “Houston, Tranquility Base here, The Eagle has landed.”

[Figure: RDF graph]

http://dbpedia.org/resource/Neil_Armstrong rdf:type dbpedia-owl:Astronaut

dbpedia-owl:Astronaut rdfs:subClassOf dbpedia-owl:Person

http://dbpedia.org/resource/Neil_Armstrong dbpedia-owl:birth_name “Neil Armstrong”@en

http://dbpedia.org/resource/Neil_Armstrong dbpedia-owl:birth_date “1930-08-05”^^xsd:date

dbpedia:Apollo_11 dbpprop:crewMembers http://dbpedia.org/resource/Neil_Armstrong

dbpedia:Apollo_11 rdf:type umbel:SpaceMission

dbpedia:Apollo_11 dbpprop:crewSize “3”^^xsd:integer
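The graph on this slide can be read as a set of subject-predicate-object triples. The following is a minimal sketch in plain Python, without an RDF library; the helper names `match` and `types_of` are hypothetical, and the prefixed names are used as opaque strings rather than resolved URIs.

```python
# Triples from the slide, written as (subject, predicate, object) tuples.
TRIPLES = [
    ("dbpedia:Neil_Armstrong", "rdf:type", "dbpedia-owl:Astronaut"),
    ("dbpedia-owl:Astronaut", "rdfs:subClassOf", "dbpedia-owl:Person"),
    ("dbpedia:Neil_Armstrong", "dbpedia-owl:birth_name", '"Neil Armstrong"@en'),
    ("dbpedia:Neil_Armstrong", "dbpedia-owl:birth_date", '"1930-08-05"^^xsd:date'),
    ("dbpedia:Apollo_11", "dbpprop:crewMembers", "dbpedia:Neil_Armstrong"),
    ("dbpedia:Apollo_11", "rdf:type", "umbel:SpaceMission"),
    ("dbpedia:Apollo_11", "dbpprop:crewSize", '"3"^^xsd:integer'),
]


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def types_of(entity):
    """Direct rdf:type classes plus all superclasses via rdfs:subClassOf."""
    found = set()
    todo = [o for _, _, o in match(s=entity, p="rdf:type")]
    while todo:
        cls = todo.pop()
        if cls not in found:
            found.add(cls)
            todo.extend(o for _, _, o in match(s=cls, p="rdfs:subClassOf"))
    return found


print(types_of("dbpedia:Neil_Armstrong"))
print(match(p="dbpprop:crewMembers", o="dbpedia:Neil_Armstrong"))
```

Unlike the character string "Neil Armstrong", the entity can be asked what it is (an astronaut, hence a person) and what it is connected to (the Apollo 11 mission).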

Semantic Annotation

media fragments:

http://example.com/example.ogv#t=10,20

http://example.com/example.ogv#t=20,30

http://example.com/example.ogv#t=10,20

annotated with:

http://dbpedia.org/resource/Neil_Armstrong rdf:type dbpedia-owl:Astronaut

dbpedia-owl:Astronaut rdfs:subClassOf dbpedia-owl:Person

http://dbpedia.org/resource/Neil_Armstrong dbpedia-owl:birth_name “Neil Armstrong”@en

http://dbpedia.org/resource/Neil_Armstrong dbpedia-owl:birth_date “1930-08-05”^^xsd:date

Semantic Annotation

HTML with RDFa:

    ...
    <div vocab="http://www.w3.org/ns/oa#"
         prefix="dctypes: http://purl.org/dc/dcmitype/ foaf: http://xmlns.com/foaf/0.1/"
         typeof="Annotation"
         resource="#contentAnnotation-001">
      <div property="hasTarget"
           resource="http://example.com/armstrong.ogv#t=20,30&xywh=480,150,140,330"
           typeof="dctypes:video">
      </div>
      <div property="hasBody" typeof="SemanticTag">
        <a property="foaf:page" href="http://dbpedia.org/resource/Neil_Armstrong">
          Neil Armstrong
        </a>
      </div>
    </div>
    ...

Named Entity Resolution

Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: “Houston, Tranquility Base here, The Eagle has landed.”

http://dbpedia.org/resource/Neil_Armstrong

1) Detect Named Entities in text

2) Determine possible Candidate Entities

3) Filter Entity Candidates

4) Disambiguate Entity Candidates according to Context

Named Entity Resolution

(1) Detect Named Entities in Text

● Linguistic Analysis with

○ POS Tagging

○ Named Entity Recognition

○ n-gram Analysis

○ Normalization

Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: “Houston, Tranquility Base here, The Eagle has landed.”

entity types: Person, Location, Organization

1-grams: mission, control, room

2-grams: mission control, control room

3-grams: mission control room
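The n-gram analysis for the phrase "mission control room" can be reproduced with a few lines. This is a minimal sketch that splits on whitespace only; the function name `ngrams` is an assumption, and real tokenization and normalization would be more involved.

```python
def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return all contiguous word sequences of length n, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "mission control room".split()
all_ngrams = {n: ngrams(tokens, n) for n in (1, 2, 3)}

for n, grams in all_ngrams.items():
    print(f"{n}-grams: {grams}")
```

Each n-gram is then a lookup key for the next step, so that "mission control" can be found as an entity label even though "mission" and "control" alone would point elsewhere.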


Named Entity Resolution

(2) Determine possible Entity Candidates

● Gazetteer: semantic entities ↔ assigned labels

● ca. 40M labels for 9M DBpedia entities

● Homonyms

○ disambiguation pages via dbpedia-owl:wikiPageDisambiguates

● Synonyms

○ redirected via dbpedia-owl:wikiPageRedirects
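A gazetteer lookup of this kind can be sketched as a dictionary from normalized labels to sets of candidate entities. The label/entity pairs below are invented for illustration and have not been checked against DBpedia; the function names are hypothetical as well.

```python
from collections import defaultdict

# Hypothetical miniature gazetteer entries: (label, candidate entity).
LABEL_PAIRS = [
    ("Neil Armstrong", "dbpedia:Neil_Armstrong"),
    ("Armstrong", "dbpedia:Neil_Armstrong"),
    ("Armstrong", "dbpedia:Louis_Armstrong"),
    ("Houston", "dbpedia:Houston"),
]


def build_gazetteer(pairs):
    """Map each lowercased label to the set of entities carrying it."""
    gazetteer = defaultdict(set)
    for label, entity in pairs:
        gazetteer[label.lower()].add(entity)
    return gazetteer


def candidates(terms, gazetteer):
    """Return the sorted candidate entities for every term with a known label."""
    return {
        term: sorted(gazetteer[term.lower()])
        for term in terms
        if term.lower() in gazetteer
    }


GAZETTEER = build_gazetteer(LABEL_PAIRS)
found = candidates(["Armstrong", "Houston", "mission"], GAZETTEER)
print(found)
```

A homonym such as "Armstrong" yields several candidates, which is why the later filtering and disambiguation steps are needed.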

Named Entity Resolution

(2) Determine possible Entity Candidates

● resolve chains and cycles in DBpedia link graph

● aggregate all labels from redirect and disambiguation paths within the leaves

[Figure: redirect and disambiguation graph with labels a1 to a6. After aggregation, one leaf carries label a1, label a2, label a3; another leaf carries label a1, label a2, label a3, label a5, label a6.]
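Resolving redirect chains and collecting the labels at the final page can be sketched as follows. The page names, the redirect table, and the decision to simply drop pages caught in a cycle are assumptions for illustration; the sketch covers redirects only and leaves out disambiguation links.

```python
# Hypothetical redirect table: page -> redirect target. X <-> Y forms a cycle.
REDIRECTS = {"A": "B", "B": "C", "X": "Y", "Y": "X"}

# Hypothetical labels attached to each page before aggregation.
LABELS = {"A": {"label a1"}, "B": {"label a2"}, "C": {"label a3"}, "X": {"label x"}}


def resolve(page, redirects):
    """Follow redirects to the final page; return None if a cycle is detected."""
    seen = set()
    while page in redirects:
        if page in seen:
            return None
        seen.add(page)
        page = redirects[page]
    return page


def aggregate_labels(labels, redirects):
    """Collect the labels of every page at the leaf its redirect chain ends in."""
    merged = {}
    for page, page_labels in labels.items():
        target = resolve(page, redirects)
        if target is not None:  # pages inside a redirect cycle are skipped
            merged.setdefault(target, set()).update(page_labels)
    return merged


aggregated = aggregate_labels(LABELS, REDIRECTS)
print(aggregated)
```

After aggregation, a lookup for any of the three labels leads directly to the leaf page `C`, without following redirects at query time.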


Named Entity Resolution

(3) Filter Entity Candidates

Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: “Houston, Tranquility Base here, The Eagle has landed.”

entity types: Person, Location, Organization

Named Entity Resolution

(4) Disambiguate Entity Candidates

● Context Analysis

○ considers ambiguity, accuracy, and reliability of the source data (provenance, static properties) as well as of the mapping [label -> entity]

N. Steinmetz, H. Sack: Semantic Multimedia Information Retrieval Based on Contextual Descriptions, ESWC 2013

Named Entity Resolution

(4) Disambiguate Entity Candidates

Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: “Houston, Tranquility Base here, the Eagle has landed.”

● Link Graph Analysis

● Co-Occurrence Analysis

● Relevance Ranking

induced Link Graph

Named Entity Resolution

(4) Disambiguate Entity Candidates

Connected Component Analysis

(1) Identify Connected Components

(2) Identify Connected Components that cover most term partitions

(3) Strongly Connected Components consolidate disambiguation
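Steps (1) and (2) above can be sketched on a toy graph. Everything in the example is assumed for illustration: the candidate lists, the links between candidates, and the simplification of treating links as undirected. Step (3), which relies on strongly connected components in the directed link graph, is not covered, and this is not the KEA implementation.

```python
from collections import defaultdict

# Hypothetical candidate entities per term and hypothetical links among them.
CANDIDATES = {
    "Armstrong": ["Neil_Armstrong", "Louis_Armstrong"],
    "Houston": ["Houston", "Whitney_Houston"],
    "Eagle": ["Lunar_Module_Eagle", "Eagle_(bird)"],
}
LINKS = [
    ("Neil_Armstrong", "Lunar_Module_Eagle"),
    ("Neil_Armstrong", "Houston"),
    ("Louis_Armstrong", "Whitney_Houston"),
]


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as sets."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for node in nodes:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current not in component:
                component.add(current)
                stack.extend(adjacency[current] - component)
        seen |= component
        components.append(component)
    return components


def disambiguate(candidates, edges):
    """Pick, per term, the candidate in the component covering the most terms."""
    nodes = [e for entities in candidates.values() for e in entities]

    def coverage(component):
        return sum(1 for entities in candidates.values() if component & set(entities))

    best = max(connected_components(nodes, edges), key=coverage)
    return {
        term: next(e for e in entities if e in best)
        for term, entities in candidates.items()
        if best & set(entities)
    }


resolved = disambiguate(CANDIDATES, LINKS)
print(resolved)
```

The component containing the astronaut, the city, and the lunar module covers all three terms, so it wins over the component that links the two musicians.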


Hierarchical Named Entity Resolution

General Approach:

1. Always start disambiguation with the most reliable algorithm on most accurate and most reliable data

2. Resolve the remaining ambiguity with less reliable algorithms on less reliable data

KEA Named Entity Resolution

1. Connected Component Analysis on Link Graph

2. Co-occurrence on Wikipedia text corpus

3. Popularity Based Link Graph Analysis

4. Negative Context Analysis

N. Steinmetz, H. Sack: Semantic Multimedia Information Retrieval Based on Contextual Descriptions, ESWC 2013

N. Steinmetz, H. Sack: About the Influence of Negative Context, ICSC 2013

R. Usbeck et al.: GERBIL - General Entity Annotator Benchmark, WWW 2014
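The general approach amounts to a cascade: resolvers are tried in order of reliability, and a less reliable one is consulted only for terms that are still ambiguous. The sketch below uses two stand-in resolvers with invented lookup tables and scores; they only mimic the role of link graph analysis and popularity ranking and do not reproduce the KEA algorithms.

```python
from typing import Callable, Optional

Resolver = Callable[[str, list[str]], Optional[str]]

# Hypothetical outcome of a link graph analysis (only one term resolved).
LINKED = {"Armstrong": "Neil_Armstrong"}

# Hypothetical popularity scores for candidate entities.
POPULARITY = {
    "Neil_Armstrong": 0.80,
    "Louis_Armstrong": 0.85,
    "Houston": 0.90,
    "Whitney_Houston": 0.70,
}


def by_link_graph(term: str, candidates: list[str]) -> Optional[str]:
    """More reliable stand-in: answers only where the link graph decided."""
    choice = LINKED.get(term)
    return choice if choice in candidates else None


def by_popularity(term: str, candidates: list[str]) -> Optional[str]:
    """Less reliable stand-in: always picks the most popular candidate."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: POPULARITY.get(c, 0.0))


def resolve_hierarchically(terms: dict, resolvers: list) -> dict:
    """Try resolvers in order (most reliable first); stop at the first answer."""
    resolved = {}
    for term, candidates in terms.items():
        for resolver in resolvers:
            choice = resolver(term, candidates)
            if choice is not None:
                resolved[term] = choice
                break
    return resolved


TERMS = {
    "Armstrong": ["Neil_Armstrong", "Louis_Armstrong"],
    "Houston": ["Houston", "Whitney_Houston"],
}
result = resolve_hierarchically(TERMS, [by_link_graph, by_popularity])
print(result)
```

Popularity alone would have chosen the musician for "Armstrong"; because the more reliable resolver runs first, its answer takes precedence and popularity only decides "Houston".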


Feedback-based Named Entity Resolution

General Approach:

1. KEA Automated Entity Resolution

2. Manual Correction of wrongly assigned entities

3. Include induced Link Graph from annotated text for future Link Graph Analysis

4. Include annotated text for future Co-Occurrence Analysis

Implementation

● WordPress plugin refer.cx for automated & manual NER annotation of blog posts

http://refer.cx

Agenda:

● Search and Find: why we are not always content with the result

● Semantic Multimedia Analysis: to better “understand” the content

● Exploratory Search and Intelligent Recommendation: from retrieval to discovery

Retrieval vs Exploration

● Find another interesting book for me

● Find books with similar topics

● Find books from similar authors

● ...

J. Waitelonis, H. Sack: Towards exploratory video search using linked data, Multimedia Tools and Applications, Volume 59, Number 2 (2012)

traditional libraries enable exploratory search

...and intelligent recommendations

Linked Data Based Exploration

dbpedia:Neil_Armstrong dbpedia-owl:mission dbpedia:Apollo_11

dbpedia:Michael_Collins dbpedia-owl:mission dbpedia:Apollo_11

dbpedia:Buzz_Aldrin dbpedia-owl:mission dbpedia:Apollo_11

dbpedia:Apollo_11 dcterms:subject category:Apollo_program

dbpedia:Apollo_13 dcterms:subject category:Apollo_program

dbpedia:Apollo_13 rdf:type yago:Space_accidents_and_incidents

dbpedia:Space_Shuttle_Challenger rdf:type yago:Space_accidents_and_incidents
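Exploration over such a graph can be sketched as "find other entities that share a property value with the current one". The triples are those of the slide; the function name `related` and the choice of shared (predicate, object) pairs as the notion of relatedness are assumptions made for this example.

```python
# Triples from the slide as (subject, predicate, object) tuples.
TRIPLES = [
    ("dbpedia:Neil_Armstrong", "dbpedia-owl:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Michael_Collins", "dbpedia-owl:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Buzz_Aldrin", "dbpedia-owl:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Apollo_11", "dcterms:subject", "category:Apollo_program"),
    ("dbpedia:Apollo_13", "dcterms:subject", "category:Apollo_program"),
    ("dbpedia:Apollo_13", "rdf:type", "yago:Space_accidents_and_incidents"),
    ("dbpedia:Space_Shuttle_Challenger", "rdf:type", "yago:Space_accidents_and_incidents"),
]


def related(entity, triples):
    """Map each other entity to the (predicate, object) pairs it shares with `entity`."""
    shared = {(p, o) for s, p, o in triples if s == entity}
    hits = {}
    for s, p, o in triples:
        if s != entity and (p, o) in shared:
            hits.setdefault(s, []).append((p, o))
    return hits


# One hop from Apollo 11, and a second hop from Apollo 13.
first_hop = related("dbpedia:Apollo_11", TRIPLES)
second_hop = related("dbpedia:Apollo_13", TRIPLES)
print(first_hop)
print(second_hop)
```

Starting at Apollo 11, the shared category leads to Apollo 13, and from there the shared type leads on to the Space Shuttle Challenger, a path that a keyword query for "Apollo 11" would not surface.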

Linked Data Based Exploration

e.g. via refer.cx WordPress PlugIn at http://blog.yovisto.com

Now, which movie should I watch next?

...and please ...don’t make boring suggestions!

Linked Data Based Recommendations


The Journey is the Reward Towards new Paradigms in Search

● Search and Find: why we are not always content with the result

● Semantic Multimedia Analysis: to better “understand” the content

● Exploratory Search and Intelligent Recommendation: from retrieval to discovery

Contact:

Harald Sack

Hasso-Plattner-Institute for IT Systems Engineering

University of Potsdam, Germany

email: [email protected]

@lysander07 / @yovisto

Thank you for your Attention!