information extraction - max planck institute for informatics
TRANSCRIPT
![Page 1: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/1.jpg)
Information extraction
9. Applications
Simon Razniewski
Winter semester 2019/20
![Page 2: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/2.jpg)
Announcements
• Results assignment 8
• Evaluation (w/ break in middle)
• Tentative exam schedule
• 15 minutes/exam
• Sample questions today’s lab
• Conflicts? Forum
2
![Page 3: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/3.jpg)
Assignment 8 - Sample rules
3
![Page 4: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/4.jpg)
Outline
1. Academic projects
• Scraping and Harvesting
• Pattern-based text extraction and OpenIE
2. Industrial Knowledge Bases
3. Knowledge Base Question Answering
4. Semantic Web
4
![Page 5: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/5.jpg)
DBpedia (2007)
• Large-scale Wikipedia infobox+category scraping
• Manually designed mappings to consolidate synonymous
attributes
• See lecture/assignment 3
• Multilingual
• No persistent IDs
• For long considered the “core” of Semantic Web (see later)
• Data access
• Per entity: http://dbpedia.org/page/Max_Planck_Institute_for_Informatics
• SPARQL endpoint:
• http://dbpedia.org/snorql/?query=SELECT+%3Fitem+WHERE+%7B%0D%0A%3Fi
tem+dbo%3AalmaMater+dbr%3ASaarland_University%0D%0A%7D
• Data dumps
• https://wiki.dbpedia.org/develop/datasets
• https://wiki.dbpedia.org/downloads-2016-10
5
![Page 6: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/6.jpg)
YAGO (2007)
• Precision-oriented Wikipedia infobox+category extraction
• Subset of 76 important relations, cleaning steps (>95%
precision)
• Much focus on type extraction from categories
• “French writers” “Writer” + “French person”
• WordNet disambiguation and linking
• Data access
• Per-entity access: https://gate.d5.mpi-
inf.mpg.de/webyago3spotlx/Browser
• Or https://gate.d5.mpi-
inf.mpg.de/webyago3spotlxComp/SvgBrowser/
• SPARQL access: (currently down)
• Data dumps: https://www.mpi-
inf.mpg.de/departments/databases-and-information-
systems/research/yago-naga/yago/downloads/
6
![Page 7: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/7.jpg)
BabelNet (2012)
7
Focus on general terms, sense disambiguation, instead of
named entities
http://babelfy.org/
![Page 8: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/8.jpg)
Wikidata (2012)
• Largely supersedes YAGO and DBpedia
• Not itself built using automated IE techniques
• Community generally disapproves of automated extraction
• Isolated projects, e.g. https://github.com/google/sling
• https://www.wikidata.org/wiki/User:Anders-sandholm
• Nonetheless highly important for IE
• Disambiguation reference
• Training data source (distant supervision)
• Data access:
• SPARQL: https://w.wiki/DKU
• Individual entities: https://www.wikidata.org/wiki/Q565400
• JSON:
https://www.wikidata.org/wiki/Special:EntityData/Q565400.json
• Dumps:
https://www.wikidata.org/wiki/Wikidata:Database_download
• ~65 GB zipped
8
![Page 9: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/9.jpg)
Outline
1. Academic projects
• Scraping and Harvesting
• Pattern-based text extraction and OpenIE
2. Industrial Knowledge Bases
3. Knowledge Base Question Answering
4. Semantic Web
9
![Page 10: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/10.jpg)
10
(2010)
327 manually
designed relations
each with a few
curated training
examples
Sales point: Continuous nature of extraction and learning
http://rtw.ml.cmu.edu/rtw/
![Page 11: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/11.jpg)
11
![Page 12: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/12.jpg)
ReVerb/OpenIE 4.0
• Knowledge base built using open information
extraction
• 5 billion extractions from general web crawls
• https://openie.allenai.org/
• (previous lecture)
12
![Page 13: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/13.jpg)
13
![Page 14: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/14.jpg)
Outline
1. Academic projects
• Scraping and Harvesting
• Pattern-based text extraction and OpenIE
2. Industrial Knowledge Bases
3. Knowledge Base Question Answering
4. Semantic Web
14
![Page 15: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/15.jpg)
Industrial projects
15
- Baidu
![Page 16: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/16.jpg)
Google Knowledge Vault (2014)
• Ambitious project combining text extraction,
semistructured extraction, and predictive models
• See lecture 8
• Usage status unknown
16
[Dong, Xin, et al. "Knowledge vault: A
web-scale approach to probabilistic
knowledge fusion, KDD 2014]
![Page 17: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/17.jpg)
17
• https://developers.google.com/knowledge-graph
• Wikidata noise copied (see lecture 3)
(since ~2012)
![Page 18: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/18.jpg)
18
![Page 19: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/19.jpg)
19
![Page 20: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/20.jpg)
20
[Noy, Natasha, et al. "Industry-scale Knowledge Graphs:
Lessons and Challenges." Queue 17.2 (2019): 20]
![Page 21: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/21.jpg)
21
![Page 22: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/22.jpg)
22
![Page 23: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/23.jpg)
23
Watson outperformed the 74-fold human
winner in the Jeopardy quiz show
![Page 24: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/24.jpg)
24
![Page 25: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/25.jpg)
Baidu
• Non-English languages traditionally underrepresented
• Open (academic) solutions:
• Zhishi.me: Chinese-language equivalent of DBpedia
• Based on Baidu Baike, Hudong Baike, Chinese Wikipedia
• Xlore: English-Chinese alignment KB
• Baidu has apparently three internal knowledge graphs
• https://www.mdpi.com/2071-1050/10/9/3245/htm
• Huawei building a knowledge graph?
• https://www.huawei.com/en/press-
events/news/2019/9/atlas-series-products-cloud-services-
all-scenario-ai-solutions
25
![Page 26: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/26.jpg)
Outline
1. Academic projects
• Scraping and Harvesting
• Pattern-based text extraction and OpenIE
2. Industrial Knowledge Bases
3. Knowledge Base Question Answering
4. Semantic Web
26
![Page 27: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/27.jpg)
Evaluation
27
![Page 28: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/28.jpg)
Advertisement: Thesis topics
• http://simonrazniewski.com/#theses
1. Social commonsense knowledge extraction
• “Americans like guns, Germans speed on highways, Japanese bow for greetings”
• Treasure of world knowledge, yet risk of bias and prejudice
• Context: IE, ML
2. Commonsense extraction from children (audio)books
• Does infant content make commonsense more explicit?
• Context: IE
3. Stability and completeness prediction in Wikidata
• What information is complete, and which one is stable?
• Context: Data management, Machine Learning
4. Topical image representativeness and coverage
• What’s missing in an image collection?
• Context: Data management, (Computer Vision)
5. Knowledge-grounded story generation
• Can structured knowledge yield better stories?
• Context: Text generation
• Limited availability
For planning please write me till Feb 17
(actual start flexible)
28
![Page 29: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/29.jpg)
Outline
1. Scraping and Harvesting
• DBpedia, Yago, BabelNet, (Wikidata)
2. Pattern-based text extraction and OpenIE
• NELL and ReVerb
3. Industrial Knowledge Bases
4. Knowledge Base Question Answering
5. Semantic Web
29
![Page 30: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/30.jpg)
30
Question answering: Vital for information access
★ Direct answers to questions
★ Saves time and effort
★ Natural in voice UI
![Page 31: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/31.jpg)
31
Question answering: Vital for information access
![Page 32: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/32.jpg)
32
Approaches to question answering
• Traditional IR-style approach: Match question with
text phrases in documents
• “What is the capital of Belgium”
• “Brussels is the capital of Belgium”
• Works only for simple questions
• Misses additional conditions
• Google, Siri, Echo et al.
• Precision much more important than recall
• Answer origin needs to be debuggable/explainable
Question answering from structured sources much preferred
![Page 33: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/33.jpg)
33
Question answering is a hot topic
★ QA over knowledge graphs [Abujabal et al. 2018]
★ Reading comprehension QA [Reddy et al. 2018]
★ Visual and multimodal QA [Lu et al. 2016]
★ Community QA [Hoogeveen et al. 2018]
★ Passage retrieval and sentence selection [Shen et al. 2018]
★ Non-factoid: Causal, procedural, ...
![Page 34: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/34.jpg)
34
QA over Knowledge Graphs
Which Oscar nominations did Nolan
receive?
ChristopherNolan
gender
Male Director
type
BestDirector
nominatedFor
Inception
directed
AcademyAwardtype
<ChristopherNolan, gender, Male>
<ChristopherNolan, type, Director>
<ChristopherNolan, directed, Inception>
<ChristopherNolan, nominatedFor,
BestDirector>
<BestDirector, type, AcademyAward>
<ChristopherNolan, birthDate, 30 July 1970>birthDate 30 July 1970
![Page 35: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/35.jpg)
<ChristopherNolan, gender, Male>
<ChristopherNolan, type, Director>
<ChristopherNolan, directed, Inception>
<ChristopherNolan, nominatedFor, BestDirector>
<BestDirector, type, AcademyAward>
<ChristopherNolan, birthDate, 30 July 1970>
35
QA over Knowledge Graphs
Which Oscar nominations did Nolan
receive?
ChristopherNolan
?ANS
nominatedFor
AcademyAwardtype
SELECT ?ANSWHERE {ChristopherNolan nominatedFor ?ANS .?ANS type AcademyAward }
BestDirector
SPARQL
![Page 36: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/36.jpg)
36
Generalizing QA
★ If we can answer:
○ What are the Oscar award nominations of Nolan?
★ Then we should be able to answer:
○ What are the Cannes award nominations of Ryan Coogler?
○ Which Oscar award nominations did Nolan receive?
Same syntax!
Same semantics!
![Page 37: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/37.jpg)
★ Interpretable
Template-based Question Answering
37
Question
Who is Inception’s director?
Question template
Who is <NOUN1>’s <NOUN2>?
Query
?ANS director Inception
Query template
?ANS <PRED1> <ENT1>
1 SPARQL
triple pattern
![Page 38: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/38.jpg)
★ Generalizes to new domains
Template-based Question Answering
38
Question
Who is Inception’s director?
Question template
Who is <NOUN1>’s <NOUN2>?
Query
?ANS director Inception
Query template
?ANS <PRED1> <ENT1>
Who is Libya’s president?
Who is Messi’s manager?
1 SPARQL
triple pattern
![Page 39: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/39.jpg)
★ Generalizes to new domains
Template-based Question Answering
39
Question
Who plays the role of Cobb in
Inception?
Question template
Who <VERB> <DT> <NOUN>
<PREP> <NOUN>?
Query
?ANS playsIn Inception
?ANS role Cobb
Query template
?ANS <PRED1> <ENT1>
?ANS <PRED2> <ENT2>
2 SPARQL
triple patterns
![Page 40: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/40.jpg)
★ Hand-crafted by experts
(Fader et al. 2014; Unger et al. 2013)
★ Low coverage
★ Solution: Learn templates
○ Question templates
○ Query templates
○ Slot alignments
Challenges with templates
40
![Page 41: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/41.jpg)
Question: What are the Oscar award nominations of Nolan?
Dependency parse
Question template (labeled nodes and edges)
41
Dependency-parse-based templates
![Page 42: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/42.jpg)
Query: ChristopherNolan nominatedFor ?VAR .
?VAR awardTitle ?ANS .
?ANS type AcademyAward
Graphical Query
Query template
42
Graphical query templates
ChristopherNolan nominatedFor
awardTitle
?VAR
?ANStypeAcademyAward
ENT PRED1
PRED2
?VAR
?ANStypeTYPE
![Page 43: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/43.jpg)
43
Slot Alignments
ENT PRED1 PRED2?VAR ?ANS type TYPE
![Page 44: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/44.jpg)
Template-based question answering
Question-Query
log
TrainTemplate-based
answering
Template
bank
Match
New question
44
No answer found
Similarity-based
answering
Match
User feedback on
answers Add
QA pairs
Generalize
![Page 45: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/45.jpg)
45
Training template-based QA
★ Collecting question-query pairs difficult
★ Start with question-answer pairs instead
★ Create queries by distant supervision
★ Generalize to create slot-aligned templates
![Page 46: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/46.jpg)
46
ChristopherNolan nominatedFor
awardTitle
Award
BestDirector
type
AcademyAward
★ Retain shortest path between question and
answer entities
★ Retain answer type information
Question: What are the Oscar award nominations of Nolan?
Answer: Best Director
Distant supervision from Q-A pairs
![Page 47: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/47.jpg)
Question: What are the Oscar award nominations of Nolan?
Answer: Best Director
Query: ChristopherNolan nominatedFor ?VAR .
?VAR awardTitle ?ANS .
?ANS Type AcademyAward
47
ChristopherNolan nominatedFor
awardTitle
?VAR
?ANS
type
AcademyAward
Distant supervision from Q-A pairs
![Page 48: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/48.jpg)
Question-schema alignment
4848
nominatedFor AcademyAwardawardTitle
oscar nominations oscarnominations ofwhat nominations
award nominationsaward
oscar award
oscar award nominations nominationswhat are
Question
KB schema
![Page 49: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/49.jpg)
★ Bipartite graph with edge weights (Yahya et al. 2012)
★ Weights from lexicons LP
and LT
(Abujabal et al. 2017,
Berant and Liang 2013)
Create Candidate Alignments
4949
nominatedFor AcademyAwardawardTitle
oscar nominations oscarnominations ofwhat nominations
award nominationsaward
oscar award
oscar award nominations nominationswhat are
0.90.5
0.7
![Page 50: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/50.jpg)
Create Candidate Alignments
5050
nominatedFor AcademyAwardawardTitle
oscar nominations oscarnominations ofwhat nominations
award nominationsaward
oscar award
oscar award nominations nominationswhat are
0.90.5
0.7
Phrase KG Predicate Weight
nominee for nominatedFor 0.8
nominations of nominatedFor 0.9
oscar nominations nominatedFor 0.5
Phrase KG Type Weight
Academy Award AcademyAward 0.9
Oscar AcademyAward 0.7
Oscar Award AcademyAward 0.8
![Page 51: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/51.jpg)
★ Best alignment of items with Integer Linear Program (ILP)
★ At least/at most constraints
★ Type coherence
Optimal Mapping via ILP
51
nominatedFor AcademyAwardawardTitle
oscar nominations oscarnominations ofwhat nominations
award nominationsaward
oscar award
oscar award nominations nominationswhat are
0.90.5
0.7
![Page 52: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/52.jpg)
★ Best alignment of items with Integer Linear Program (ILP)
★ At least/at most constraints
★ Type coherence
Optimal Mapping via ILP
5252
nominatedFor AcademyAwardawardTitle
oscarnominations of
award
![Page 53: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/53.jpg)
53
Apply Alignment to Question-Query
ChristopherNolan nominatedFor awardTitle?VAR ?ANS
type
AcademyAward
![Page 54: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/54.jpg)
Answering with templates
New question: What are the Cannes award nominations of Ryan
Coogler?
54
Match Template
bank
![Page 55: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/55.jpg)
Instantiating Queries
What are the Cannes award nominations of Ryan Coogler?
Query template 1 Query template 2 ...
...Question template 1 Question template 2
Query 1 Query 2 ...
RyanCoogler nominated
?VAR .
?VAR awardTitle ?ANS .
?ANS Type CannesAward
Rank queries with learning to
rank and execute best query55
RyanCoogler awarded
?VAR .
?VAR awardTitle ?ANS .
?ANS Type GoldenGlobe
![Page 56: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/56.jpg)
★ So far, assumed all answers were correct: Pseudo-relevance
★ Pseudo-relevance degrades quality
★ Users provide feedback on answers
Question: Which Oscar nominations did Nolan receive?
Answer: Best Director
User:
★ Positive feedback:
○ Learn new template from question-query
○ Add new question-query to log
○ Update learning-to-rank model
Closing the Loop with User Feedback
56
![Page 57: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/57.jpg)
Outline
1. Academic projects
• Scraping and Harvesting
• Pattern-based text extraction and OpenIE
2. Industrial Knowledge Bases
3. Knowledge Base Question Answering
4. Semantic Web
57
![Page 58: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/58.jpg)
58
![Page 59: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/59.jpg)
59
![Page 60: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/60.jpg)
60
![Page 61: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/61.jpg)
61
![Page 62: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/62.jpg)
62
![Page 63: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/63.jpg)
63
Reminder: RDF
![Page 64: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/64.jpg)
64
![Page 65: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/65.jpg)
65
![Page 66: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/66.jpg)
66
![Page 67: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/67.jpg)
67
![Page 68: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/68.jpg)
68
![Page 69: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/69.jpg)
69
![Page 70: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/70.jpg)
70
![Page 71: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/71.jpg)
71
RDF and RDFS vocabularies
![Page 72: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/72.jpg)
72
![Page 73: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/73.jpg)
73
![Page 74: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/74.jpg)
74
![Page 75: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/75.jpg)
75
![Page 76: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/76.jpg)
76
![Page 77: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/77.jpg)
77
https://www.wikidata.org/wiki/Special:EntityData/Q565400.rdf
![Page 78: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/78.jpg)
78
![Page 79: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/79.jpg)
79
![Page 80: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/80.jpg)
80
Interlinking on the Semantic Web
![Page 81: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/81.jpg)
81
![Page 82: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/82.jpg)
82
![Page 83: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/83.jpg)
83
![Page 84: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/84.jpg)
84
![Page 85: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/85.jpg)
85
![Page 87: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/87.jpg)
87
• JSON-LD
![Page 88: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/88.jpg)
88
![Page 89: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/89.jpg)
89
![Page 90: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/90.jpg)
90
![Page 91: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/91.jpg)
91
![Page 92: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/92.jpg)
92
![Page 93: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/93.jpg)
References
• Selected references
• Further reading
• qa.mpi-inf.mpg.de
• Slides
• Adapted from Fabian Suchanek and Rishiraj Saha Roy
93
![Page 94: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/94.jpg)
Assignment 9
• No assignment
• Tutorial today: Exam questions
94
![Page 95: Information extraction - Max Planck Institute for Informatics](https://reader033.vdocuments.net/reader033/viewer/2022051323/627c45063afdca41d95ed174/html5/thumbnails/95.jpg)
Take home
• IE important tool for building structured knowledge
• Wikipedia popular resource
• Free text extraction harder but possible
• KBs in widespread use in tech companies
• Actual methods guarded secrets
• Source of data not always known
• Signature application: Question answering
• Challenge: From unstructured user question to structured KB query
• Semantic web: Vision of interlinked and machine-readable
internet
• Schema reuse essential for (simple) machine-readability
95