Entity Retrieval (tutorial organized by Radialpoint in Montreal)
DESCRIPTION
This is Part II of the tutorial "Entity Linking and Retrieval for Semantic Search" (together with E. Meij and D. Odijk), given at an event organized by Radialpoint. Previous versions of the tutorial were given at WWW'13, SIGIR'13, and WSDM'14. The current version contains an overhaul of the type-aware ranking part. For the complete tutorial material (including slides for the other parts) visit http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
TRANSCRIPT
![Page 1: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/1.jpg)
Part II Entity Retrieval
Tutorial organized by Radialpoint | Montreal, Canada, 2014
Krisztian Balog University of Stavanger
![Page 2: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/2.jpg)
Entity retrieval
Addressing information needs that are better answered by returning specific objects (entities) instead of just any type of documents.
![Page 3: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/3.jpg)
Distribution of web search queries [Pound et al. 2010]
- Entity (“1978 cj5 jeep”): 41%
- Type (“doctors in barcelona”): 12%
- Attribute (“zip code waterville Maine”): 5%
- Relation (“tom cruise katie holmes”): 1%
- Other (“nightlife in Barcelona”): 36%
- Uninterpretable: 6%
![Page 4: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/4.jpg)
Distribution of web search queries [Lin et al. 2011]
[Pie chart. Categories: Entity, Entity+refiner, Category, Category+refiner, Other, Website. Slice values: 28%, 15%, 10%, 4%, 14%, 29%]
![Page 5: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/5.jpg)
What’s so special here?
- Entities are not always directly represented
  - Recognize and disambiguate entities in text (that is, entity linking)
  - Collect and aggregate information about a given entity from multiple documents and even multiple data collections
- More structure than in document-based IR
  - Types (from some taxonomy)
  - Attributes (from some ontology)
  - Relationships to other entities (“typed links”)
![Page 6: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/6.jpg)
Semantics in our context
- Working definition: references to meaningful structures
- How to capture, represent, and use structure?
  - It concerns all components of the retrieval process!

[Diagram: matching an information need against an entity, under a text-only representation vs. a text+structure representation]
![Page 7: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/7.jpg)
Overview of core tasks

| Task | Queries | Data set | Results |
|---|---|---|---|
| (adhoc) entity retrieval | keyword | unstructured/semi-structured | ranked list |
| | keyword++ (target type(s)) | semi-structured | ranked list |
| list completion | keyword++ (examples) | semi-structured | ranked list |
| related entity finding | keyword++ (target type, relation) | unstructured & semi-structured | ranked list |
![Page 8: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/8.jpg)
In this part
- Input: keyword(++) query
- Output: a ranked list of entities
- Data collection: unstructured and (semi)structured data sources (and their combinations)
- Main RQ: How to incorporate structure into text-based retrieval models?
![Page 9: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/9.jpg)
Outline
1. Ranking based on entity descriptions
2. Incorporating entity types
3. Entity relationships
Attributes (/Descriptions)
Type(s)
Relationships
![Page 10: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/10.jpg)
Probabilistic models (mostly)
- Estimating conditional probabilities: $P(A \mid B)$

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

- Conditional independence

$$P(A, B \mid C) = P(A \mid C) \cdot P(B \mid C)$$

$$P(A \mid B) = P(A)$$

- Conditional dependence

$$P(A, B \mid C) = P(A \mid B, C)\,P(B \mid C)$$
![Page 11: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/11.jpg)
Ranking entity descriptions
Attributes (/Descriptions)
Type(s)
Relationships
![Page 12: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/12.jpg)
Task: ad-hoc entity retrieval
- Input: unconstrained natural language query
  - “telegraphic” queries (neither well-formed nor grammatically correct sentences or questions)
- Output: ranked list of entities
- Collection: unstructured and/or semi-structured documents
![Page 13: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/13.jpg)
Example information needs
meg ryan war
american embassy nairobi
ben franklin
Chernobyl
Worst actor century
Sweden Iceland currency
![Page 14: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/14.jpg)
Two settings
1. With ready-made entity descriptions
2. Without explicit entity representations
![Page 15: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/15.jpg)
Ranking with ready-made entity descriptions
![Page 16: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/16.jpg)
This is not unrealistic...
![Page 17: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/17.jpg)
Document-based entity representations
- Most entities have a “home page”
  - I.e., each entity is described by a document
- In this scenario, ranking entities is much like ranking documents
  - unstructured
  - semi-structured
![Page 18: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/18.jpg)
Crash course in language modeling
![Page 19: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/19.jpg)
Example
In the town where I was born, Lived a man who sailed to sea, And he told us of his life, In the land of submarines,
So we sailed on to the sun, Till we found the sea green,And we lived beneath the waves, In our yellow submarine, We all live in yellow submarine, yellow submarine, yellow submarine, We all live in yellow submarine, yellow submarine, yellow submarine.
![Page 20: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/20.jpg)
Empirical document LM
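[Bar chart: empirical probability $P(t \mid d)$ per term of the example document, y-axis from 0.00 to 0.14. Terms on the x-axis: submarine, yellow, we, all, live, lived, sailed, sea, beneath, born, found, green, he, his, i, land, life, man, our, so, submarines, sun, till, told, town, us, waves, where, who]

$$P(t \mid d) = \frac{n(t, d)}{|d|}$$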
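The empirical document model above is just a normalized term count. A minimal sketch, assuming lowercase whitespace tokenization and no stopword removal (the slide's chart appears to drop stopwords, so its values would differ); the function name `empirical_lm` and the toy document are illustrative only:

```python
from collections import Counter


def empirical_lm(text):
    """Return the maximum-likelihood estimate P(t|d) = n(t, d) / |d|."""
    # Assumption: lowercase + whitespace tokenization, stopwords kept.
    tokens = text.lower().split()
    counts = Counter(tokens)  # n(t, d)
    length = len(tokens)      # |d|
    return {term: count / length for term, count in counts.items()}


doc = "we all live in a yellow submarine yellow submarine yellow submarine"
lm = empirical_lm(doc)

# Show the most probable terms first.
for term, prob in sorted(lm.items(), key=lambda kv: -kv[1]):
    print(f"{term}\t{prob:.3f}")
```

Terms that never occur in the document get no entry at all, i.e. probability zero, which is the motivation for smoothing with a collection model.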
![Page 21: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/21.jpg)
Alternatively...
![Page 22: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/22.jpg)
Scoring a query

$$q = \{\text{sea}, \text{submarine}\}$$

$$P(q \mid d) = P(\text{“sea”} \mid \theta_d) \cdot P(\text{“submarine”} \mid \theta_d)$$
![Page 23: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/23.jpg)
Language Modeling
- Estimate a multinomial probability distribution from the text
- Smooth the distribution with one estimated from the entire collection

$$P(t \mid \theta_d) = (1 - \lambda)\,P(t \mid d) + \lambda\,P(t \mid C)$$
![Page 24: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/24.jpg)
Standard Language Modeling approach
- Rank documents d according to their likelihood of being relevant given a query q: P(d|q)

$$P(d \mid q) = \frac{P(q \mid d)\,P(d)}{P(q)} \propto P(q \mid d)\,P(d)$$

- Document prior $P(d)$: probability of the document being relevant to any query
- Query likelihood $P(q \mid d)$: probability that query q was “produced” by document d

$$P(q \mid d) = \prod_{t \in q} P(t \mid \theta_d)^{n(t, q)}$$
![Page 25: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/25.jpg)
Standard Language Modeling approach (2)
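$$P(q \mid d) = \prod_{t \in q} P(t \mid \theta_d)^{n(t, q)}$$

- $n(t, q)$: number of times t appears in q
- $P(t \mid \theta_d)$: document language model, a multinomial probability distribution over the vocabulary of terms

$$P(t \mid \theta_d) = (1 - \lambda)\,P(t \mid d) + \lambda\,P(t \mid C)$$

- $P(t \mid d)$: empirical document model
- $P(t \mid C)$: collection model
- $\lambda$: smoothing parameter
- Maximum likelihood estimates:

$$P(t \mid d) = \frac{n(t, d)}{|d|} \qquad P(t \mid C) = \frac{\sum_{d} n(t, d)}{\sum_{d} |d|}$$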
![Page 26: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/26.jpg)
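Scoring a query

$$q = \{\text{sea}, \text{submarine}\}$$

$$P(q \mid d) = P(\text{“sea”} \mid \theta_d) \cdot P(\text{“submarine”} \mid \theta_d)$$

| t | $P(t \mid d)$ | $P(t \mid C)$ |
|---|---|---|
| submarine | 0.14 | 0.0001 |
| sea | 0.04 | 0.0002 |
| ... | | |

With $\lambda = 0.1$:

$$P(\text{“sea”} \mid \theta_d) = (1 - \lambda)\,P(\text{“sea”} \mid d) + \lambda\,P(\text{“sea”} \mid C) = 0.9 \cdot 0.04 + 0.1 \cdot 0.0002 = 0.03602$$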
![Page 27: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/27.jpg)
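Scoring a query

$$q = \{\text{sea}, \text{submarine}\}$$

| t | $P(t \mid d)$ | $P(t \mid C)$ |
|---|---|---|
| submarine | 0.14 | 0.0001 |
| sea | 0.04 | 0.0002 |
| ... | | |

With $\lambda = 0.1$:

$$P(\text{“submarine”} \mid \theta_d) = (1 - \lambda)\,P(\text{“submarine”} \mid d) + \lambda\,P(\text{“submarine”} \mid C) = 0.9 \cdot 0.14 + 0.1 \cdot 0.0001 = 0.12601$$

$$P(q \mid d) = P(\text{“sea”} \mid \theta_d) \cdot P(\text{“submarine”} \mid \theta_d) = 0.03602 \cdot 0.12601 \approx 0.004539$$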
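The worked example can be checked with a few lines of code. This is a minimal sketch of linear-interpolation smoothing followed by the query-likelihood product; the helper names `smoothed_prob` and `query_likelihood` are made up for illustration, and the probabilities are the ones from the two tables on the slide with λ = 0.1:

```python
def smoothed_prob(p_doc, p_coll, lam):
    """P(t|theta_d) = (1 - lambda) * P(t|d) + lambda * P(t|C)."""
    return (1 - lam) * p_doc + lam * p_coll


def query_likelihood(query_terms, p_doc, p_coll, lam):
    """P(q|d) as a product over query term occurrences.

    Iterating over the term list (rather than the term set) realises
    the exponent n(t, q) for repeated query terms.
    """
    score = 1.0
    for term in query_terms:
        score *= smoothed_prob(p_doc.get(term, 0.0), p_coll[term], lam)
    return score


# Values taken from the slide's tables.
p_doc = {"submarine": 0.14, "sea": 0.04}
p_coll = {"submarine": 0.0001, "sea": 0.0002}
lam = 0.1

p_sea = smoothed_prob(p_doc["sea"], p_coll["sea"], lam)
p_sub = smoothed_prob(p_doc["submarine"], p_coll["submarine"], lam)
p_q = query_likelihood(["sea", "submarine"], p_doc, p_coll, lam)
print(p_sea, p_sub, p_q)
```

In practice the product is usually computed as a sum of logarithms to avoid numerical underflow on longer queries.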
![Page 28: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/28.jpg)
Here, documents == entities, so

$$P(e \mid q) \propto P(e)\,P(q \mid \theta_e) = P(e) \prod_{t \in q} P(t \mid \theta_e)^{n(t, q)}$$

- Entity prior $P(e)$: probability of the entity being relevant to any query
- Entity language model $P(t \mid \theta_e)$: multinomial probability distribution over the vocabulary of terms
![Page 29: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/29.jpg)
Semi-structured entity representation
- Entity description documents are rarely unstructured
- Representing entities as
  - Fielded documents – the IR approach
  - Graphs – the DB/SW approach
![Page 30: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/30.jpg)
![Page 31: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/31.jpg)
dbpedia:Audi_A4

| Predicate | Value |
|---|---|
| foaf:name | Audi A4 |
| rdfs:label | Audi A4 |
| rdfs:comment | The Audi A4 is a compact executive car produced since late 1994 by the German car manufacturer Audi, a subsidiary of the Volkswagen Group. The A4 has been built [...] |
| dbpprop:production | 1994 2001 2005 2008 |
| rdf:type | dbpedia-owl:MeanOfTransportation dbpedia-owl:Automobile |
| dbpedia-owl:manufacturer | dbpedia:Audi |
| dbpedia-owl:class | dbpedia:Compact_executive_car |
| owl:sameAs | freebase:Audi A4 |
| is dbpedia-owl:predecessor of | dbpedia:Audi_A5 |
| is dbpprop:similar of | dbpedia:Cadillac_BLS |
![Page 32: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/32.jpg)
Mixture of Language Models [Ogilvie & Callan 2003]
- Build a separate language model for each field
- Take a linear combination of them

$$P(t \mid \theta_d) = \sum_{j=1}^{m} \mu_j\,P(t \mid \theta_{d_j})$$

- Field language model $P(t \mid \theta_{d_j})$: smoothed with a collection model built from all document representations of the same type in the collection
- Field weights $\mu_j$, with $\sum_{j=1}^{m} \mu_j = 1$
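To make the linear combination concrete, here is a minimal sketch of a two-field Mixture of Language Models. The field names, field weights, and per-field collection probabilities are all invented for illustration; each field model is smoothed with its own field-specific collection model, as described above:

```python
from collections import Counter


def smoothed_field_prob(term, field_text, field_collection_prob, lam=0.1):
    """Field LM P(t|theta_dj): empirical field model smoothed with the
    collection model of the same field type."""
    tokens = field_text.lower().split()
    p_field = Counter(tokens)[term] / len(tokens) if tokens else 0.0
    return (1 - lam) * p_field + lam * field_collection_prob


def mlm_term_prob(term, entity_fields, collection_probs, weights, lam=0.1):
    """P(t|theta_d) = sum_j mu_j * P(t|theta_dj), with sum_j mu_j = 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("field weights must sum to 1")
    return sum(
        mu * smoothed_field_prob(
            term, entity_fields[field], collection_probs[field].get(term, 0.0), lam
        )
        for field, mu in weights.items()
    )


# Hypothetical fielded entity description and statistics.
entity = {
    "name": "audi a4",
    "comment": "the audi a4 is a compact executive car",
}
collection = {
    "name": {"audi": 0.01, "car": 0.001},
    "comment": {"audi": 0.002, "car": 0.005},
}
weights = {"name": 0.6, "comment": 0.4}

p_audi = mlm_term_prob("audi", entity, collection, weights)
print(p_audi)
```

A query is then scored as before, by multiplying the mixed term probabilities over the query terms.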
![Page 33: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/33.jpg)
Setting field weights
- Heuristically
  - Proportional to the length of text content in that field, to the field’s individual performance, etc.
- Empirically (using training queries)
- Problems
  - Number of possible fields is huge
  - It is not possible to optimise their weights directly
- Entities are sparse w.r.t. different fields
  - Most entities have only a handful of predicates
![Page 34: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/34.jpg)
Predicate folding
- Idea: reduce the number of fields by grouping them together
- Grouping based on (BM25F and)
  - type [Pérez-Agüera et al. 2010]
  - manually determined importance [Blanco et al. 2011]
![Page 35: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/35.jpg)
Hierarchical Entity Model [Neumayer et al. 2012]
- Organize fields into a 2-level hierarchy
  - Field types (4) on the top level
  - Individual fields of that type on the bottom level
- Estimate field weights
  - Using training data for field types
  - Using heuristics for bottom-level types
![Page 36: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/36.jpg)
Two-level hierarchy [Neumayer et al. 2012]
| Field type | Predicate | Value |
|---|---|---|
| Name | foaf:name | Audi A4 |
| | rdfs:label | Audi A4 |
| Attributes | rdfs:comment | The Audi A4 is a compact executive car produced since late 1994 by the German car manufacturer Audi, a subsidiary of the Volkswagen Group. The A4 has been built [...] |
| | dbpprop:production | 1994 2001 2005 2008 |
| Out-relations | rdf:type | dbpedia-owl:MeanOfTransportation dbpedia-owl:Automobile |
| | dbpedia-owl:manufacturer | dbpedia:Audi |
| | dbpedia-owl:class | dbpedia:Compact_executive_car |
| | owl:sameAs | freebase:Audi A4 |
| In-relations | is dbpedia-owl:predecessor of | dbpedia:Audi_A5 |
| | is dbpprop:similar of | dbpedia:Cadillac_BLS |
![Page 37: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/37.jpg)
Comparison of models
[Diagram: generative structure of the three models]
- Unstructured document model: d → t
- Fielded document model: d → df → t
- Hierarchical document model: d → dF → df → t
![Page 38: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/38.jpg)
Probabilistic Retrieval Model for Semistructured data [Kim et al. 2009]
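- Extension to the Mixture of Language Models
- Find which document field each query term may be associated with

Mixture of Language Models:

$$P(t \mid \theta_d) = \sum_{j=1}^{m} \mu_j\,P(t \mid \theta_{d_j})$$

Probabilistic Retrieval Model for Semistructured data:

$$P(t \mid \theta_d) = \sum_{j=1}^{m} P(d_j \mid t)\,P(t \mid \theta_{d_j})$$

- Mapping probability $P(d_j \mid t)$: estimated for each query term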
![Page 39: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/39.jpg)
Estimating the mapping probability
$$P(d_j \mid t) = \frac{P(t \mid d_j)\,P(d_j)}{P(t)}, \qquad P(t) = \sum_{d_k} P(t \mid d_k)\,P(d_k)$$

- Term likelihood $P(t \mid d_j)$: probability of a query term occurring in a given field type

$$P(t \mid C_j) = \frac{\sum_{d} n(t, d_j)}{\sum_{d} |d_j|}$$

- Prior field probability $P(d_j)$: probability of mapping the query term to this field before observing collection statistics
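The mapping probability is a direct application of Bayes' rule over fields. A minimal sketch, assuming a uniform prior over fields when none is given; the function name `mapping_probs` and the collection statistics are hypothetical:

```python
def mapping_probs(term, field_collection_probs, field_priors=None):
    """P(d_j|t) = P(t|C_j) * P(d_j) / sum_k P(t|C_k) * P(d_k)."""
    fields = list(field_collection_probs)
    if field_priors is None:
        # Assumption: uniform prior over fields.
        field_priors = {f: 1.0 / len(fields) for f in fields}
    joint = {
        f: field_collection_probs[f].get(term, 0.0) * field_priors[f]
        for f in fields
    }
    normalizer = sum(joint.values())  # P(t)
    if normalizer == 0.0:
        # Term unseen in every field: fall back to the prior.
        return dict(field_priors)
    return {f: p / normalizer for f, p in joint.items()}


# Hypothetical field-level collection models P(t|C_j).
field_collection_probs = {
    "cast": {"ryan": 0.006},
    "team": {"ryan": 0.003},
    "title": {"ryan": 0.001},
}
mapping = mapping_probs("ryan", field_collection_probs)
print(mapping)
```

The resulting per-term distribution over fields replaces the fixed field weights of the Mixture of Language Models, so no field weights need to be trained.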
![Page 40: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/40.jpg)
Example
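Query: meg ryan war

| Query term t | $d_j$ | $P(t \mid d_j)$ |
|---|---|---|
| meg | cast | 0.407 |
| | team | 0.382 |
| | title | 0.187 |
| ryan | cast | 0.601 |
| | team | 0.381 |
| | title | 0.017 |
| war | genre | 0.927 |
| | title | 0.07 |
| | location | 0.002 |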
![Page 41: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/41.jpg)
Evaluation initiatives
- INEX Entity Ranking track (2007-09)
  - Collection is the (English) Wikipedia
  - Entities are represented by Wikipedia articles
- Semantic Search Challenge (2010-11)
  - Collection is a Semantic Web crawl (BTC2009)
    - ~1 billion RDF triples
  - Entities are represented by URIs
- INEX Linked Data track (2012-13)
  - Wikipedia enriched with RDF properties from DBpedia and YAGO
![Page 42: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/42.jpg)
Ranking without explicit entity representations
![Page 43: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/43.jpg)
Scenario
- Entity descriptions are not readily available
- Entity occurrences are annotated
  - manually
  - automatically (~entity linking)
![Page 44: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/44.jpg)
The basic idea
Use documents to go from queries to entities
- Query-document association: the document’s relevance
- Document-entity association: how well the document characterises the entity
![Page 45: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/45.jpg)
Two principal approaches
- Profile-based methods
  - Create a textual profile for entities, then rank them (by adapting document retrieval techniques)
- Document-based methods
  - Indirect representation based on mentions identified in documents
  - First ranking documents (or snippets) and then aggregating evidence for associated entities
![Page 46: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/46.jpg)
Profile-based methods
![Page 47: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/47.jpg)
Document-based methods
![Page 48: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/48.jpg)
Many possibilities in terms of modeling
- Generative (probabilistic) models
- Discriminative (probabilistic) models
- Voting models
- Graph-based models
![Page 49: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/49.jpg)
Generative probabilistic models
- Candidate generation models (P(e|q))
  - Two-stage language model
- Topic generation models (P(q|e))
  - Candidate model, a.k.a. Model 1
  - Document model, a.k.a. Model 2
  - Proximity-based variations
- Both families of models can be derived from the Probability Ranking Principle [Fang & Zhai 2007]
![Page 50: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/50.jpg)
Candidate models (“Model 1”) [Balog et al. 2006]
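$$P(q \mid \theta_e) = \prod_{t \in q} P(t \mid \theta_e)^{n(t, q)}$$

- Smoothing with a collection-wide background model:

$$P(t \mid \theta_e) = (1 - \lambda)\,P(t \mid e) + \lambda\,P(t)$$

$$P(t \mid e) = \sum_{d} P(t \mid d, e)\,P(d \mid e)$$

- Term-candidate co-occurrence $P(t \mid d, e)$: in a particular document. In the simplest case: $P(t \mid d)$
- Document-entity association $P(d \mid e)$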
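A minimal sketch of Model 1 in its simplest form, where the term-candidate co-occurrence P(t|d,e) is replaced by P(t|d). The document models, association weights P(d|e), and background probabilities are toy values invented for this example, and `model1_score` is a hypothetical name:

```python
def model1_score(query_terms, doc_models, associations, background, lam=0.1):
    """Candidate model: build an entity language model, then score the query.

    doc_models:   {doc_id: {term: P(t|d)}}
    associations: {doc_id: P(d|e)} for the entity being scored
    background:   {term: P(t)} collection-wide model used for smoothing
    """
    score = 1.0
    for term in query_terms:
        # P(t|e) = sum_d P(t|d) * P(d|e)   (simplest case of P(t|d,e))
        p_t_e = sum(
            doc_models[d].get(term, 0.0) * p_d_e
            for d, p_d_e in associations.items()
        )
        # P(t|theta_e) = (1 - lambda) * P(t|e) + lambda * P(t)
        score *= (1 - lam) * p_t_e + lam * background[term]
    return score


doc_models = {
    "d1": {"sea": 0.2},
    "d2": {"submarine": 0.4},
}
associations = {"d1": 0.5, "d2": 0.5}
background = {"sea": 0.01, "submarine": 0.01}

m1 = model1_score(["sea", "submarine"], doc_models, associations, background)
print(m1)
```

Because evidence is pooled into one entity model before scoring, the query terms may be matched by different documents associated with the same entity.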
![Page 51: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/51.jpg)
Document models (“Model 2”) [Balog et al. 2006]
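$$P(q \mid e) = \sum_{d} P(q \mid d, e)\,P(d \mid e)$$

- Document relevance $P(q \mid d, e)$: how well document d supports the claim that e is relevant to q

$$P(q \mid d, e) = \prod_{t \in q} P(t \mid d, e)^{n(t, q)}$$

- Simplifying assumption (t and e are conditionally independent given d): $P(t \mid d, e) = P(t \mid \theta_d)$
- Document-entity association $P(d \mid e)$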
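A minimal sketch of Model 2 under the simplifying assumption above, so that each document is scored with its own smoothed language model and the scores are then aggregated per entity. All numbers are toy values, and `model2_score` is a hypothetical name:

```python
def model2_score(query_terms, doc_models, associations, background, lam=0.1):
    """Document model: score each document, then aggregate over documents.

    P(q|e) = sum_d [ prod_t P(t|theta_d) ] * P(d|e)
    """
    total = 0.0
    for d, p_d_e in associations.items():
        p_q_d = 1.0
        for term in query_terms:
            # Smoothed document model P(t|theta_d).
            p_q_d *= (1 - lam) * doc_models[d].get(term, 0.0) + lam * background[term]
        total += p_q_d * p_d_e
    return total


doc_models = {
    "d1": {"sea": 0.2},
    "d2": {"submarine": 0.4},
}
associations = {"d1": 0.5, "d2": 0.5}
background = {"sea": 0.01, "submarine": 0.01}

m2 = model2_score(["sea", "submarine"], doc_models, associations, background)
print(m2)
```

In this toy setup neither document contains both query terms, so every per-document score relies on smoothing for one term and the aggregate stays small. This illustrates the design difference from a profile-style model: here each supporting document has to match the whole query by itself.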
![Page 52: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/52.jpg)
Document-entity associations
- Boolean (or set-based) approach
- Weighted by the confidence in entity linking
- Consider other entities mentioned in the document
![Page 53: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/53.jpg)
Proximity-based variations
- So far, conditional independence has been assumed between candidates and terms when computing the probability P(t|d,e)
- The relationship between terms and entities that appear in the same document is ignored
  - Entity is equally strongly associated with everything discussed in that document
- Let’s capture the dependence between entities and terms
  - Use their distance in the document
![Page 54: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/54.jpg)
Using proximity kernels [Petkova & Croft 2007]
$$P(t \mid d, e) = \frac{1}{Z} \sum_{i=1}^{N} \delta_d(i, t)\,k(t, e)$$

- Indicator function $\delta_d(i, t)$: 1 if the term at position i is t, 0 otherwise
- Normalizing constant $Z$
- Proximity-based kernel $k(t, e)$
  - constant function
  - triangle kernel
  - Gaussian kernel
  - step function
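A minimal sketch of a proximity-weighted term distribution with a Gaussian kernel. Several details are assumptions made for illustration and are not specified on the slide: the kernel is evaluated on the distance between position i and the nearest mention of the entity, the entity mention positions are given as input, mention tokens themselves are skipped, and Z is chosen so that the probabilities sum to one. The names `gaussian_kernel` and `proximity_lm` are hypothetical:

```python
import math
from collections import defaultdict


def gaussian_kernel(distance, sigma=2.0):
    """Weight that decays smoothly with the distance to the entity mention."""
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def proximity_lm(tokens, entity_positions, kernel=gaussian_kernel):
    """Estimate P(t|d,e) by weighting each term occurrence with a kernel."""
    if not entity_positions:
        raise ValueError("the entity must be mentioned at least once")
    weights = defaultdict(float)
    for i, term in enumerate(tokens):
        if i in entity_positions:
            continue  # skip the entity mention itself
        distance = min(abs(i - p) for p in entity_positions)
        weights[term] += kernel(distance)
    z = sum(weights.values())  # normalizing constant Z
    return {term: w / z for term, w in weights.items()}


tokens = "ENTITY sailed the sea while others stayed in town".split()
plm = proximity_lm(tokens, entity_positions={0})
print(sorted(plm.items(), key=lambda kv: -kv[1])[:3])
```

Passing `kernel=lambda distance: 1.0` gives the constant function, which reduces the estimate to plain relative term frequencies over the non-mention tokens.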
![Page 55: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/55.jpg)
Figure taken from D. Petkova and W.B. Croft. Proximity-based document representation for named entity retrieval. CIKM'07.
![Page 56: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/56.jpg)
Many possibilities in terms of modeling
- Generative probabilistic models
- Discriminative probabilistic models
- Voting models
- Graph-based models
![Page 57: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/57.jpg)
Discriminative models
- Vs. generative models:
  - Fewer assumptions (e.g., term independence)
  - “Let the data speak”
    - Sufficient amounts of training data required
  - Incorporating more document features, multiple signals for document-entity associations
  - Estimating P(r=1|e,q) directly (instead of P(e,q|r=1))
  - Optimization can get trapped in a local maximum/minimum
![Page 58: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/58.jpg)
Arithmetic Mean Discriminative (AMD) model [Yang et al. 2010]
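$$P_\theta(r = 1 \mid e, q) = \sum_{d} P(r_1 = 1 \mid q, d)\,P(r_2 = 1 \mid e, d)\,P(d)$$

- Query-document relevance: logistic function over a linear combination of features

$$P(r_1 = 1 \mid q, d) = \sigma\Big(\sum_{i=1}^{N_f} \alpha_i f_i(q, d_t)\Big)$$

- Document-entity relevance: logistic function over a linear combination of features

$$P(r_2 = 1 \mid e, d) = \sigma\Big(\sum_{j=1}^{N_g} \beta_j g_j(e, d_t)\Big)$$

- Document prior: $P(d)$
- $\sigma$: standard logistic function
- $\alpha_i$, $\beta_j$: weight parameters (learned)
- $f_i$, $g_j$: features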
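A minimal sketch of the AMD scoring function. The feature vectors and weights below are fixed by hand purely for illustration, whereas in the model they would be learned from training data; the function names and the two-feature setup are assumptions:

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(weights, features):
    return sum(w * f for w, f in zip(weights, features))


def amd_score(query_doc_feats, entity_doc_feats, alpha, beta, doc_prior):
    """P(r=1|e,q) = sum_d P(r1=1|q,d) * P(r2=1|e,d) * P(d)."""
    total = 0.0
    for d, prior in doc_prior.items():
        p_r1 = sigmoid(dot(alpha, query_doc_feats[d]))   # query-document relevance
        p_r2 = sigmoid(dot(beta, entity_doc_feats[d]))   # document-entity relevance
        total += p_r1 * p_r2 * prior
    return total


# Hypothetical features f_i(q, d) and g_j(e, d) for two documents.
query_doc_feats = {"d1": [2.0, 0.5], "d2": [0.1, 0.0]}
entity_doc_feats = {"d1": [1.0, 1.0], "d2": [0.0, 0.2]}
alpha = [1.0, 0.5]   # hand-picked, not learned
beta = [0.8, 0.3]    # hand-picked, not learned
doc_prior = {"d1": 0.5, "d2": 0.5}

score = amd_score(query_doc_feats, entity_doc_feats, alpha, beta, doc_prior)
print(score)
```

The arithmetic mean in the model's name corresponds to the prior-weighted sum over documents; with a uniform prior it is an average of per-document products.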
![Page 59: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/59.jpg)
Learning to rank & entity retrieval
- Pointwise
  - AMD, GMD [Yang et al. 2010]
  - Multilayer perceptrons, logistic regression [Sorg & Cimiano 2011]
  - Additive Groves [Moreira et al. 2011]
- Pairwise
  - Ranking SVM [Yang et al. 2009]
  - RankBoost, RankNet [Moreira et al. 2011]
- Listwise
  - AdaRank, Coordinate Ascent [Moreira et al. 2011]
![Page 60: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/60.jpg)
Voting models [Macdonald & Ounis 2006]
- Inspired by techniques from data fusion
  - Combining evidence from different sources
- Documents ranked w.r.t. the query are seen as “votes” for the entity
![Page 61: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/61.jpg)
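Voting models
Many different variants, including...
- Votes: number of documents mentioning the entity

$$Score(e, q) = |M(e) \cap R(q)|$$

- Reciprocal Rank: sum of inverse ranks of documents

$$Score(e, q) = \sum_{d \in M(e) \cap R(q)} \frac{1}{rank(d, q)}$$

- CombSUM: sum of scores of documents

$$Score(e, q) = \sum_{d \in M(e) \cap R(q)} s(d, q)$$

- CombMNZ: CombSUM multiplied by the number of voting documents

$$Score(e, q) = |M(e) \cap R(q)| \sum_{d \in M(e) \cap R(q)} s(d, q)$$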
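A minimal sketch of these voting aggregations, reading M(e) as the set of documents mentioning the entity and R(q) as the ranked list of documents retrieved for the query (both readings are assumptions based on the bullet descriptions). A sum of scores multiplied by the number of votes is usually called CombMNZ in the data fusion literature, so it is reported separately here; the function name and toy ranking are hypothetical:

```python
def voting_scores(ranking, mentions):
    """Aggregate document evidence into entity scores.

    ranking:  list of (doc_id, score) sorted by decreasing score, i.e. R(q)
    mentions: set of doc_ids mentioning the entity, i.e. M(e)
    """
    votes = 0
    reciprocal_rank = 0.0
    comb_sum = 0.0
    for rank, (doc_id, score) in enumerate(ranking, start=1):
        if doc_id in mentions:          # d in M(e) ∩ R(q)
            votes += 1
            reciprocal_rank += 1.0 / rank
            comb_sum += score
    return {
        "votes": votes,
        "reciprocal_rank": reciprocal_rank,
        "comb_sum": comb_sum,
        "comb_mnz": votes * comb_sum,
    }


ranking = [("d1", 3.0), ("d2", 2.0), ("d3", 1.0)]
mentions = {"d1", "d3"}
scores = voting_scores(ranking, mentions)
print(scores)
```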
![Page 62: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/62.jpg)
Graph-based models [Serdyukov et al. 2008]
- One particular way of constructing graphs
  - Vertices are documents and entities
  - Only document-entity edges
- Search can be approached as a random walk on this graph
  - Pick a random document or entity
  - Follow links to entities or other documents
  - Repeat it a number of times
![Page 63: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/63.jpg)
Infinite random walk [Serdyukov et al. 2008]
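$$P_i(d) = \lambda\,P_J(d) + (1 - \lambda) \sum_{e \to d} P(d \mid e)\,P_{i-1}(e),$$

$$P_i(e) = \sum_{d \to e} P(e \mid d)\,P_{i-1}(d),$$

$$P_J(d) = P(d \mid q),$$

[Diagram: graph with document nodes d and entity nodes e, connected by document-entity edges]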
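A minimal sketch of iterating these update equations on a small document-entity graph. The transition probabilities P(e|d) and P(d|e) are assumed to be uniform over graph neighbours, the walk starts from the jump distribution P_J(d) = P(d|q), and a fixed number of iterations stands in for the infinite walk; none of these choices is specified on the slide, and the toy graph is invented:

```python
def random_walk(p_jump, doc_entities, lam=0.5, iterations=100):
    """Iterate P_i(d) and P_i(e) and return the entity scores.

    p_jump:       {doc_id: P(d|q)}, the jump distribution P_J(d)
    doc_entities: {doc_id: [entity_id, ...]}, the document-entity edges
    """
    # Invert the edges: entity -> documents mentioning it.
    entity_docs = {}
    for d, entities in doc_entities.items():
        for e in entities:
            entity_docs.setdefault(e, []).append(d)

    p_d = dict(p_jump)                   # P_0(d) = P_J(d)
    p_e = {e: 0.0 for e in entity_docs}  # P_0(e) = 0

    for _ in range(iterations):
        # P_i(d) = lam * P_J(d) + (1 - lam) * sum_e P(d|e) * P_{i-1}(e)
        new_d = {
            d: lam * p_jump[d]
            + (1 - lam) * sum(p_e[e] / len(entity_docs[e]) for e in doc_entities[d])
            for d in doc_entities
        }
        # P_i(e) = sum_d P(e|d) * P_{i-1}(d)
        new_e = {
            e: sum(p_d[d] / len(doc_entities[d]) for d in entity_docs[e])
            for e in entity_docs
        }
        p_d, p_e = new_d, new_e
    return p_e


p_jump = {"d1": 0.7, "d2": 0.3}
doc_entities = {"d1": ["e1"], "d2": ["e1", "e2"]}
entity_scores = random_walk(p_jump, doc_entities)
print(entity_scores)
```

Entities are ranked by their final score; the jump term keeps the walk anchored to documents that are relevant to the query.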
![Page 64: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/64.jpg)
Evaluation
- Expert finding task @TREC Enterprise track
  - Enterprise setting (intranet of a large organization)
  - Given a query, return people who are experts on the query topic
  - List of potential experts is provided
- We assume that the collection has been annotated with <person>...</person> tokens
![Page 65: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/65.jpg)
Incorporating entity types
Attributes (/Descriptions)
Type(s)
Relationships
![Page 66: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/66.jpg)
people
products
organizations
locations
![Page 67: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/67.jpg)
Interacting with types grouping results
people
companies
jobs
(more) people
![Page 68: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/68.jpg)
Interacting with types filtering results
![Page 69: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/69.jpg)
Interacting with types filtering results
![Page 70: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/70.jpg)
Type-aware ranking
- Typically, a two-component model:
$$P(q \mid e) = P(q_T, q_t \mid e) = P(q_T \mid e)\,P(q_t \mid e)$$

- Type-based similarity: $P(q_T \mid e)$
- Term-based similarity: $P(q_t \mid e)$
#1 Where to get the target type from?
#2 How to estimate type-based similarity?
![Page 71: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/71.jpg)
Target type
- Provided by the user
  - keyword++ query
- Need to be automatically identified
  - keyword query
![Page 72: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/72.jpg)
Target type(s) are provided
faceted search, form fill-in, etc.
![Page 73: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/73.jpg)
But what about very many types?
which are typically hierarchically organized
![Page 74: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/74.jpg)
Challenges
- Users are not familiar with the type system
![Page 75: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/75.jpg)
In general, categorizing things can be hard
- What is King Arthur?
  - Person / Royalty / British royalty
  - Person / Military person
  - Person / Fictional character
![Page 76: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/76.jpg)
Which King Arthur?!
![Page 77: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/77.jpg)
Upshot for type-aware ranking
- Need to be able to handle the imperfections of the type system
  - Inconsistencies
  - Missing assignments
  - Granularity issues
    - Entities labeled with too general or too specific types
- User input is to be treated as a hint, not as a strict filter
![Page 78: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/78.jpg)
Two settings
- Target type(s) are provided by the user
  - keyword++ query
- Target types need to be automatically identified
  - keyword query
![Page 79: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/79.jpg)
Identifying target types for queries
- Types can be ranked much like entities [Balog & Neumayer 2012]
  - Direct term-based representations (“Model 1”)
  - Types of top ranked entities (“Model 2”)
[Vallet & Zaragoza 2008]
![Page 80: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/80.jpg)
Type-centric vs. entity-centric type ranking
[Figure: two panels, “Type-centric” and “Entity-centric”, each relating the query q to types t via entity descriptions.]
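The contrast between the two strategies can be sketched as follows. This is an illustrative toy, not the models from the cited work: the data, the `term_score` overlap scorer, and the cutoff `k` are all assumptions. Type-centric ranking scores one pseudo-document per type, while entity-centric ranking scores entities first and then aggregates over the types of the top-ranked ones.

```python
from collections import Counter, defaultdict

# Toy data (hypothetical): entity descriptions and type assignments.
DESCRIPTIONS = {
    "e1": "baseball team from san diego",
    "e2": "baseball team from boston",
    "e3": "annual baseball championship series",
}
TYPES = {"e1": ["team"], "e2": ["team"], "e3": ["event"]}

def term_score(query: str, text: str) -> float:
    """Toy scorer: fraction of query terms that occur in the text."""
    q_terms = query.split()
    words = set(text.split())
    return sum(t in words for t in q_terms) / len(q_terms)

def type_centric(query: str) -> dict:
    """Build one pseudo-document per type, then score it directly."""
    docs = defaultdict(list)
    for e, types in TYPES.items():
        for t in types:
            docs[t].append(DESCRIPTIONS[e])
    return {t: term_score(query, " ".join(parts)) for t, parts in docs.items()}

def entity_centric(query: str, k: int = 2) -> dict:
    """Rank entities first, then sum the scores of the top-k per type."""
    ranked = sorted(DESCRIPTIONS,
                    key=lambda e: term_score(query, DESCRIPTIONS[e]),
                    reverse=True)
    scores = Counter()
    for e in ranked[:k]:
        for t in TYPES[e]:
            scores[t] += term_score(query, DESCRIPTIONS[e])
    return dict(scores)

print(type_centric("baseball team"))
print(entity_centric("baseball team"))
```

In a real system the toy overlap scorer would be replaced by a proper retrieval model; the point here is only where the aggregation over entities happens.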
![Page 81: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/81.jpg)
Hierarchical target type identification
- Finding the single most specific type [from an ontology] that is general enough to cover all entities that are relevant to the query.
- Finding the right granularity is difficult…
  - Models are good at finding either general (top-level) or specific (leaf-level) types
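If the relevant entities and their types were known, the target type defined above would be the lowest common ancestor of those types in the hierarchy. A minimal sketch, assuming a tree-shaped hierarchy stored as a child-to-parent map and one type per entity (the `PARENT` map reuses the King Arthur example types; the function names are hypothetical):

```python
# Toy type hierarchy, given as child -> parent (None marks the root).
PARENT = {
    "Person": None,
    "Royalty": "Person",
    "British royalty": "Royalty",
    "Military person": "Person",
    "Fictional character": "Person",
}

def path_to_root(t):
    """Return [t, parent(t), ..., root]."""
    path = []
    while t is not None:
        path.append(t)
        t = PARENT[t]
    return path

def most_specific_covering_type(entity_types):
    """Lowest common ancestor of the given types (None if there is none)."""
    paths = [path_to_root(t) for t in entity_types]
    common = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first type, the first common one is the deepest.
    for t in paths[0]:
        if t in common:
            return t
    return None

print(most_specific_covering_type(["British royalty", "Military person"]))
```

In practice the set of relevant entities is unknown at query time, which is why the target type has to be estimated and why picking the right level of the hierarchy is hard.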
![Page 82: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/82.jpg)
Type-based similarity
- Measuring similarity
  - Set-based
  - Content-based (based on type labels)
- Need “soft” matching to deal with the imperfections of the category system
  - Lexical similarity of type labels
  - Distance based on the hierarchy
  - Query expansion
$P(q|e) = P(q_T, q_t|e) = P(q_T|e)\,P(q_t|e)$
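As one possible instance of a set-based measure, the overlap between the query's target types and an entity's types can be computed with the Jaccard coefficient. The choice of Jaccard and the function name are assumptions made for illustration; the slides do not prescribe a specific measure.

```python
def set_based_similarity(query_types, entity_types) -> float:
    """Jaccard overlap between two collections of type labels."""
    q, e = set(query_types), set(entity_types)
    if not q or not e:
        return 0.0  # no type information on one side: no evidence of a match
    return len(q & e) / len(q | e)

print(set_based_similarity(["british films", "american films"],
                           ["american films", "drama films"]))
```

An exact-match measure like this one is brittle when type assignments are missing or too specific, which motivates the “soft” matching options listed above.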
![Page 83: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/83.jpg)
Modeling types as probability distributions [Balog et al. 2011]
- Analogously to term-based representations
  - Query: $p(c|\theta_q^C)$
  - Entity: $p(c|\theta_e^C)$
  - Compared using $KL(\theta_q^C \,\|\, \theta_e^C)$
- Advantages
  - Sound modeling of uncertainty associated with category information
  - Category-based feedback is possible
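A minimal sketch of ranking by the KL divergence between the query's and the entity's category distributions. The distributions below are made-up toy values, and the additive `epsilon` is a crude stand-in for proper smoothing; both are assumptions.

```python
import math

def kl_divergence(p_query: dict, p_entity: dict, epsilon: float = 1e-9) -> float:
    """KL(theta_q || theta_e) over categories c with p(c|theta_q) > 0."""
    total = 0.0
    for c, p_q in p_query.items():
        if p_q > 0.0:
            p_e = p_entity.get(c, 0.0) + epsilon  # crude smoothing (assumption)
            total += p_q * math.log(p_q / p_e)
    return total

# Hypothetical category distributions.
query_types = {"baseball teams": 0.7, "sports teams": 0.3}
entities = {
    "San Diego Padres": {"baseball teams": 0.6, "sports teams": 0.4},
    "1998 World Series": {"sports events": 1.0},
}
# Smaller divergence means a better type match.
ranking = sorted(entities, key=lambda e: kl_divergence(query_types, entities[e]))
print(ranking)
```

Because the query side is itself a distribution, it can be updated from feedback, which is what makes category-based feedback possible in this representation.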
![Page 84: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/84.jpg)
Joint type detection and entity ranking [Sawant & Chakrabarti 2013]
- Assumes “telegraphic” queries with target type
  - woodrow wilson president university
  - dolly clone institute
  - lead singer led zeppelin band
- Type detection is integrated into the ranking
  - Multiple query interpretations are considered
- Both generative and discriminative formulations
![Page 85: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/85.jpg)
Approach
- Each query term is either a “type hint” ($h(\vec{q}, \vec{z})$) or a “word matcher” ($s(\vec{q}, \vec{z})$)
- Number of possible partitions is manageable ($2^{|q|}$)
[Figure: example for the query “losing baseball team world series 1998”. Type: Major league baseball teams. Entity: San Diego Padres (instanceOf the type). Evidence snippet (mentionOf the entity): “By comparison, the Padres have been to two World Series, losing in 1984 and 1998.”]
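The partitioning of a query into type hints and word matchers can be enumerated directly, since every term gets a binary label. A minimal sketch (the function name `interpretations` is hypothetical):

```python
from itertools import product

def interpretations(query_terms):
    """Yield (hint_words, matcher_words) for every binary assignment z."""
    for z in product((0, 1), repeat=len(query_terms)):
        hints = [w for w, flag in zip(query_terms, z) if flag == 1]
        matchers = [w for w, flag in zip(query_terms, z) if flag == 0]
        yield hints, matchers

query = "losing baseball team world series 1998".split()
all_z = list(interpretations(query))
print(len(all_z))
```

For the six-term example query this gives 2^6 = 64 interpretations, which is why exhaustive enumeration stays feasible for short telegraphic queries.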
![Page 86: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/86.jpg)
Generative approach: generate query from entity
[Figure taken from Sawant & Chakrabarti (2013). Learning Joint Query Interpretation and Response Ranking. In WWW ’13. The entity E (San Diego Padres) has type T (Major league baseball team) and context “Padres have been to two World Series, losing in 1984 and 1998”. For the query q “losing team baseball world series 1998”, a switch Z assigns each term either to the type model θ (type hint: baseball, team) or to the context model ϕ (context matchers: lost, 1998, world series).]
![Page 87: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/87.jpg)
Generative formulation
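$P(e|q) \propto P(e) \sum_{t,\vec{z}} P(t|e)\, P(\vec{z})\, P(h(\vec{q}, \vec{z})|t)\, P(s(\vec{q}, \vec{z})|e)$
- Entity prior: $P(e)$
- Type prior: $P(t|e)$, estimated from answer types in the past
- Query switch: $P(\vec{z})$, the probability of the interpretation
- Type model: $P(h(\vec{q}, \vec{z})|t)$, the probability of observing the hint words in the type model
- Entity model: $P(s(\vec{q}, \vec{z})|e)$, the probability of observing the matcher words in the entity model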
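The sum over types and interpretations can be sketched by brute force. This is an illustrative toy under stated assumptions, not the estimation procedure of the cited paper: terms are generated independently from unigram models, each term is a hint with a fixed probability `p_hint`, and unseen terms receive a small `floor` probability. All names and parameters are hypothetical.

```python
from itertools import product

def generative_score(query_terms, p_e, p_t_given_e, type_lm, entity_lm,
                     p_hint=0.5, floor=1e-6):
    """Unnormalized P(e|q) = P(e) * sum over t, z of
    P(t|e) * P(z) * P(h(q,z)|t) * P(s(q,z)|e).

    Assumptions of this sketch: unigram models with independent terms,
    P(z) factorizes with a per-term hint probability p_hint, and
    unseen terms get probability `floor`.
    """
    total = 0.0
    for t, p_t in p_t_given_e.items():
        for z in product((0, 1), repeat=len(query_terms)):
            p = p_t
            for w, is_hint in zip(query_terms, z):
                if is_hint:
                    p *= p_hint * type_lm[t].get(w, floor)
                else:
                    p *= (1.0 - p_hint) * entity_lm.get(w, floor)
            total += p
    return p_e * total

# Hypothetical models for a single candidate entity with one type.
score = generative_score(
    ["baseball", "1998"],
    p_e=1.0,
    p_t_given_e={"baseball team": 1.0},
    type_lm={"baseball team": {"baseball": 0.4, "1998": 0.1}},
    entity_lm={"baseball": 0.2, "1998": 0.3},
)
print(score)
```

Under these independence assumptions the sum over interpretations factorizes into a per-term mixture of the type model and the entity model, so the explicit enumeration is shown only to mirror the formula.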
![Page 88: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/88.jpg)
Discriminative approach: separate correct and incorrect entities
[Figure taken from Sawant & Chakrabarti (2013). Learning Joint Query Interpretation and Response Ranking. In WWW ’13. For the query q “losing team baseball world series 1998”, candidate interpretations pair the entity San_Diego_Padres with t = baseball team and the entity 1998_World_Series with t = series.]
![Page 89: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/89.jpg)
Discriminative formulation
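$\phi(q, e, t, \vec{z}) = \langle \phi_1(q, e), \phi_2(t, e), \phi_3(q, \vec{z}, t), \phi_4(q, \vec{z}, e) \rangle$
- $\phi_1(q, e)$: models the entity prior P(e)
- $\phi_2(t, e)$: models the type prior P(t|e)
- $\phi_3(q, \vec{z}, t)$: comparability between hint words and type
- $\phi_4(q, \vec{z}, e)$: comparability between matchers and snippets that mention e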
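A feature vector of this shape is typically scored with a learned weight vector. The sketch below assumes a linear model and assumes that an entity is scored by its best-scoring (t, z) interpretation; the function names, weights, and feature values are hypothetical.

```python
def feature_vector(phi1, phi2, phi3, phi4):
    """Concatenate the four feature groups into one vector."""
    return list(phi1) + list(phi2) + list(phi3) + list(phi4)

def score(weights, features):
    """Linear score w . phi(q, e, t, z)."""
    return sum(w * f for w, f in zip(weights, features))

def best_interpretation_score(weights, interpretations):
    """Score an entity by its best-scoring (t, z) interpretation."""
    return max(score(weights, phi) for phi in interpretations)

# Two hypothetical interpretations of the same query for one entity.
weights = [1.0, 1.0, 1.0, 1.0]
candidates = [
    feature_vector([0.5], [0.2], [1.0], [0.3]),  # hints match the type well
    feature_vector([0.5], [0.2], [0.0], [0.1]),  # poor hint/type match
]
print(best_interpretation_score(weights, candidates))
```

The weights would be learned so that correct entities score above incorrect ones, which is the separation the discriminative approach aims for.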
![Page 90: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/90.jpg)
Evaluation
- INEX Entity Ranking track
  - Entities are represented by Wikipedia articles
  - Topic definition includes target categories
Example topic: “Movies with eight or more Academy Awards”; target categories: best picture oscar, british films, american films
![Page 91: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/91.jpg)
![Page 92: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/92.jpg)
Entity relationships
Attributes (/Descriptions)
Type(s)
Relationships
![Page 93: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/93.jpg)
Related entities
![Page 94: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/94.jpg)
![Page 95: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/95.jpg)
![Page 96: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/96.jpg)
Searching for arbitrary relations*
- airlines that currently use Boeing 747 planes (target type: ORG; input entity: Boeing 747)
- Members of The Beaux Arts Trio (target type: PER; input entity: The Beaux Arts Trio)
- What countries does Eurail operate in? (target type: LOC; input entity: Eurail)
*given an input entity and target type
![Page 97: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/97.jpg)
Modeling related entity finding [Bron et al. 2010]
- Ranking entities of a given type (T) that stand in a required relation (R) with an input entity (E)
- Three-component model
$p(e|E, T, R) \propto p(e|E) \cdot p(T|e) \cdot p(R|E, e)$
- $p(e|E)$: co-occurrence model
- $p(T|e)$: type filtering
- $p(R|E, e)$: context model
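The three components can be combined as in the sketch below. It makes several assumptions for illustration: the co-occurrence model is a maximum-likelihood estimate from raw counts, type filtering is a hard 0/1 match, and the context model scores are given. The candidate names, counts, and scores are made up.

```python
def cooccurrence_model(cooc_counts: dict) -> dict:
    """Maximum-likelihood estimate of p(e|E) from co-occurrence counts."""
    total = sum(cooc_counts.values())
    return {e: n / total for e, n in cooc_counts.items()}

def rank_related(cooc_counts, entity_types, target_type, p_context):
    """Rank candidates by p(e|E) * p(T|e) * p(R|E,e)."""
    p_cooc = cooccurrence_model(cooc_counts)
    scores = {}
    for e, p in p_cooc.items():
        # Hard type filter (assumption): 1 if the type matches, else 0.
        p_type = 1.0 if entity_types.get(e) == target_type else 0.0
        scores[e] = p * p_type * p_context.get(e, 0.0)
    return sorted((e for e in scores if scores[e] > 0),
                  key=scores.get, reverse=True)

# Hypothetical candidates for the input entity "Boeing 747", target type ORG.
counts = {"Airline A": 6, "Airline B": 3, "Seattle": 1}
types = {"Airline A": "ORG", "Airline B": "ORG", "Seattle": "LOC"}
context = {"Airline A": 0.2, "Airline B": 0.8, "Seattle": 0.5}
print(rank_related(counts, types, "ORG", context))
```

Here the candidate that co-occurs less often still ranks first because its context matches the relation better, which shows why co-occurrence alone is not enough.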
![Page 98: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/98.jpg)
Evaluation
- TREC Entity track
  - Given
    - Input entity (defined by name and homepage)
    - Type of the target entity (PER/ORG/LOC)
    - Narrative (describing the nature of the relation in free text)
  - Return (homepages of) related entities
![Page 99: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/99.jpg)
Wrapping up
- Entity retrieval in different flavors using generative approaches based on language modeling techniques
- Discriminative approaches are increasingly favored over generative ones
  - Increasing number of components (and parameters)
  - Easier to incrementally add informative but correlated features
  - But (massive amounts of) training data is required!
![Page 100: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/100.jpg)
Future challenges
- It’s “easy” when the “query intent” is known
  - Desired results: single entity, ranked list, set, …
  - Query type: ad-hoc, list search, related entity finding, …
- Methods specifically tailored to specific types of requests
- Understanding query intent still has a long way to go
![Page 101: Entity Retrieval (tutorial organized by Radialpoint in Montreal)](https://reader034.vdocuments.net/reader034/viewer/2022051616/5558a265d8b42a2a738b4eab/html5/thumbnails/101.jpg)