![Page 1: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/1.jpg)
Generating Linked Data by inferring the semantics of tables
Varish Mulwad (@varish), University of Maryland, Baltimore County
September 2, 2011
Dr. Tim Finin Dr. Anupam Joshi
![Page 2: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/2.jpg)
Goal
Image from: Zagari RM, Bianchi-Porro G, Fiocca R, Gasbarrini G, Roda E, Bazzoli F. Comparison of 1 and 2 weeks of omeprazole, amoxicillin and clarithromycin treatment for Helicobacter pylori eradication: the HYPER Study. Gut. 2007;56:475-9. [PMID: 17028126]
![Page 3: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/3.jpg)
Contribution
| Name | Team | Position | Height |
| --- | --- | --- | --- |
| Michael Jordan | Chicago | Shooting guard | 1.98 |
| Allen Iverson | Philadelphia | Point guard | 1.83 |
| Yao Ming | Houston | Center | 2.29 |
| Tim Duncan | San Antonio | Power forward | 2.11 |
http://dbpedia.org/class/yago/NationalBasketballAssociationTeams
http://dbpedia.org/resource/Allen_Iverson
Map literals as values of properties
dbprop:team
![Page 4: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/4.jpg)
Contribution
| Name | Team | Position | Height |
| --- | --- | --- | --- |
| Michael Jordan | Chicago | Shooting guard | 1.98 |
| Allen Iverson | Philadelphia | Point guard | 1.83 |
| Yao Ming | Houston | Center | 2.29 |
| Tim Duncan | San Antonio | Power forward | 2.11 |
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix yago: <http://dbpedia.org/class/yago/> .
"Name"@en is rdfs:label of dbpedia-owl:BasketballPlayer .
"Team"@en is rdfs:label of yago:NationalBasketballAssociationTeams .
"Michael Jordan"@en is rdfs:label of dbpedia:Michael_Jordan .
dbpedia:Michael_Jordan a dbpedia-owl:BasketballPlayer .
"Chicago Bulls"@en is rdfs:label of dbpedia:Chicago_Bulls .
dbpedia:Chicago_Bulls a yago:NationalBasketballAssociationTeams .
All of this in a completely automated way!
![Page 5: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/5.jpg)
Introduction & Motivation
![Page 6: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/6.jpg)
Tables are everywhere!
389,697 raw and geospatial datasets
The web – 154 million high-quality relational tables (Cafarella et al. 2008)
Introduction Related Work Baseline Results Joint Inference Conclusion
![Page 7: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/7.jpg)
Evidence-based medicine
Figure: Evidence-Based Medicine - the Essential Role of Systematic Reviews, and the Need for Automated Text Mining Tools, IHI 2010
The idea behind evidence-based medicine is to judge the efficacy of treatments or tests by meta-analyses or reviews of clinical trials. Key information in such trials is encoded in tables.
However, the rate at which meta-analyses are published remains very low … hampers effective health care treatment …
# of clinical trials published in 2008
# of meta-analyses published in 2008
![Page 8: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/8.jpg)
Related Work
• Extracting tables from documents and web pages: Hurst (2006), Embley et al. (2006)
• Understanding the semantics of tables: Wang et al. (2011), Venetis et al. (2011), Limaye et al. (2010)
![Page 9: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/9.jpg)
Current systems
• Use ‘semantically poor’ knowledge bases
• Only one system focuses on complete table interpretation
• Do not generate Linked Data
• No system tackles literal data
  • Critical piece of evidence for interpreting medical tables
• No system deals with tables in specialized domains (e.g. tables found in medical literature)
![Page 10: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/10.jpg)
Building a table interpretation framework
• Preliminary work / Baseline system
• Analysis and Evaluation of baseline
• Framework grounded in graphical models and probabilistic reasoning
![Page 11: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/11.jpg)
The System’s Brain (Knowledgebase)
Yago
Wikitology¹ – A hybrid knowledgebase where structured data meets unstructured data
¹ Wikitology was created as part of Zareen Syed’s Ph.D. dissertation
Syed, Z., and Finin, T. 2011. Creating and Exploiting a Hybrid Knowledge Base for Linked Data, volume 129 of Revised Selected Papers Series: Communications in Computer and Information Science. Springer.
![Page 12: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/12.jpg)
The Baseline System
![Page 13: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/13.jpg)
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
![Page 14: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/14.jpg)
Predicting Class Labels for column
Class (column header): Team
Instances (cell values): Chicago, Philadelphia, Houston, San Antonio
1. Chicago Bulls
2. Chicago
3. Judy Chicago
{dbpedia-owl:Place, dbpedia-owl:City, yago:WomenArtist, yago:LivingPeople, yago:NationalBasketballAssociationTeams}
{dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Film, yago:NationalBasketballAssociationTeams, …}
{…}
dbpedia-owl:Place, dbpedia-owl:City, yago:WomenArtist, yago:LivingPeople, yago:NationalBasketballAssociationTeams, dbpedia-owl:PopulatedPlace, dbpedia-owl:Film, …
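The pooling of class sets shown above can be sketched as a simple vote: every cell contributes the classes of its candidate entities, and classes are ranked by how many cells support them. This is a minimal illustration under that assumption, not the scoring function T2LD actually uses; `rank_column_classes` is a hypothetical name and the two class sets only echo the ones on the slide.

```python
from collections import Counter

def rank_column_classes(classes_per_cell):
    """Rank candidate classes by the number of cells that support them."""
    votes = Counter()
    for classes in classes_per_cell:
        votes.update(set(classes))  # each cell votes at most once per class
    # Highest vote count first; ties broken alphabetically for a stable order.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

# Illustrative class sets for two cells of the "Team" column.
classes_per_cell = [
    {"dbpedia-owl:Place", "dbpedia-owl:City", "yago:WomenArtist",
     "yago:LivingPeople", "yago:NationalBasketballAssociationTeams"},
    {"dbpedia-owl:Place", "dbpedia-owl:PopulatedPlace", "dbpedia-owl:Film",
     "yago:NationalBasketballAssociationTeams"},
]
ranking = rank_column_classes(classes_per_cell)
print(ranking[:2])
```

With plain counting, a general class such as `dbpedia-owl:Place` ties with the specific team class, so a real ranker needs additional evidence to prefer the specific label.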
![Page 15: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/15.jpg)
Linking table cells to entities
Michael Jordan + Chicago + Shooting Guard + 1.98 + dbpedia-owl:BasketballPlayer
1. Michael Jordan
2. Michael-Hakim Jordan
Classifier 1 – SVM Rank (Ranks the set of entities)
Classifier 2 – SVM (Computes Confidence)
Link to the top ranked entity
Don’t link
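The two-stage decision above (rank the candidates, then decide whether to link at all) can be sketched as follows. The scoring callables are hypothetical stand-ins for the trained SVM Rank and SVM models, and the threshold value is an assumption made for illustration.

```python
def link_cell(candidates, rank_score, confidence, threshold=0.5):
    """Return the top-ranked candidate, or None when confidence is too low."""
    if not candidates:
        return None
    best = max(candidates, key=rank_score)  # stage 1: rank the candidate entities
    if confidence(best) >= threshold:       # stage 2: link / don't link
        return best
    return None

# Hypothetical scores standing in for both classifiers' outputs.
scores = {"Michael Jordan": 0.9, "Michael-Hakim Jordan": 0.4}
linked = link_cell(list(scores), rank_score=scores.get, confidence=scores.get)
unlinked = link_cell(list(scores), rank_score=scores.get,
                     confidence=scores.get, threshold=0.95)
print(linked, unlinked)
```

Keeping the link/don't-link decision separate from ranking lets the system leave a cell unlinked when even the best candidate is a poor match.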
![Page 16: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/16.jpg)
Identify Relations
| Name | Team |
| --- | --- |
| Michael Jordan | Chicago |
| Allen Iverson | Philadelphia |
| Yao Ming | Houston |
| Tim Duncan | San Antonio |
Rel ‘A’
Rel ‘A’
Rel ‘A’, ‘C’
Rel ‘A’, ‘B’, ‘C’
Rel ‘A’, ‘B’
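One way to read the candidate sets above is as a vote: each row pair proposes the relations that hold between its two linked entities, and the relation supported by the most rows is chosen for the column pair. The sketch below assumes that reading; the per-row sets are illustrative placeholders, and `pick_column_relation` is a hypothetical name.

```python
from collections import Counter

def pick_column_relation(relations_per_row):
    """Choose the relation proposed by the largest number of rows."""
    votes = Counter()
    for relations in relations_per_row:
        votes.update(set(relations))  # each row votes once per relation
    if not votes:
        return None
    # Iterate in alphabetical order so ties resolve deterministically.
    return max(sorted(votes), key=lambda rel: votes[rel])

# Illustrative candidate relations for four (Name, Team) row pairs.
relations_per_row = [{"A"}, {"A", "C"}, {"A", "B", "C"}, {"A", "B"}]
chosen = pick_column_relation(relations_per_row)
print(chosen)
```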
![Page 17: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/17.jpg)
Generating a linked RDF representation
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix yago: <http://dbpedia.org/class/yago/> .
"Name"@en is rdfs:label of dbpedia-owl:BasketballPlayer .
"Team"@en is rdfs:label of yago:NationalBasketballAssociationTeams .
"Michael Jordan"@en is rdfs:label of dbpedia:Michael_Jordan .
dbpedia:Michael_Jordan a dbpedia-owl:BasketballPlayer .
"Chicago Bulls"@en is rdfs:label of dbpedia:Chicago_Bulls .
dbpedia:Chicago_Bulls a yago:NationalBasketballAssociationTeams .
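As a rough sketch of how the RDF on this slide could be produced from the interpretation results, the function below serializes column classes and cell links as Turtle. It writes the statements in subject-first form (`entity rdfs:label "..."@en`), which states the same triples as the `is rdfs:label of` form on the slide. The function name and input shapes are assumptions made for illustration, and labels are assumed to contain no double quotes.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "yago": "http://dbpedia.org/class/yago/",
}

def to_turtle(column_classes, cell_links):
    """Serialize column-class and cell-entity assignments as Turtle text."""
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    lines.append("")
    # Column header -> predicted class.
    for header, cls in column_classes.items():
        lines.append(f'{cls} rdfs:label "{header}"@en .')
    # Cell string -> (linked entity, class of that entity).
    for label, (entity, cls) in cell_links.items():
        lines.append(f'{entity} rdfs:label "{label}"@en ;')
        lines.append(f"    a {cls} .")
    return "\n".join(lines)

turtle = to_turtle(
    {"Name": "dbpedia-owl:BasketballPlayer",
     "Team": "yago:NationalBasketballAssociationTeams"},
    {"Michael Jordan": ("dbpedia:Michael_Jordan", "dbpedia-owl:BasketballPlayer"),
     "Chicago Bulls": ("dbpedia:Chicago_Bulls",
                       "yago:NationalBasketballAssociationTeams")},
)
print(turtle)
```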
![Page 18: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/18.jpg)
Evaluation of the baseline system
![Page 19: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/19.jpg)
Dataset summary

| | |
| --- | --- |
| Number of tables | 15 |
| Total number of rows | 199 |
| Total number of columns | 56 (52)* |
| Total number of entities | 639 (611)* |

* The number in brackets is the count excluding columns that contained numbers
![Page 20: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/20.jpg)
Evaluation # 1 (MAP)
• Compared the system’s ranked list of labels against a human-ranked list of labels
• Metric: Average Precision (a.p.) [Mean Average Precision is the mean of a.p. over a set of queries]
• Commonly used in the Information Retrieval domain to compare two ranked sets
![Page 21: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/21.jpg)
Evaluation # 1 (MAP)
[Figure: Average Precision per column. x-axis: Column # (0 to 60); y-axis: Average Precision (0 to 1.2)]
MAP = 0.411
System Ranked: 1. Person 2. Politician 3. President
Evaluator Ranked: 1. President 2. Politician 3. OfficeHolder
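To make the metric concrete, the sketch below computes average precision for the two ranked lists above. It uses the standard Information Retrieval definition and treats the evaluator's labels as an unordered set of relevant items; the slides do not say which variant was used, so this is an assumption.

```python
def average_precision(system_ranked, relevant):
    """Standard IR average precision of a ranked list against a relevant set."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, label in enumerate(system_ranked, start=1):
        if label in relevant:
            hits += 1
            total += hits / k  # precision at rank k, counted only at relevant ranks
    return total / len(relevant)

system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
ap = average_precision(system, evaluator)
print(round(ap, 3))  # (1/2 + 2/3) / 3
```

Mean Average Precision is then the mean of this value over all evaluated columns.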
![Page 22: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/22.jpg)
Evaluation # 2 (Correctness)
• Evaluated whether our predicted class labels were “fair and correct”
• A class label may not be the most accurate one, but may still be correct
  – E.g. dbpedia-owl:PopulatedPlace is not the most accurate, but still a correct label for a column of cities
• Three human judges evaluated our predicted class labels
![Page 23: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/23.jpg)
Evaluation # 2 (Correctness)
[Figure: % of correctly and incorrectly predicted class labels, by category]

| Category | Correct | Incorrect |
| --- | --- | --- |
| Person | 76.92% | 23.08% |
| Place | 90.48% | 9.52% |
| Organization | 66.67% | 33.33% |
| Other | 58.33% | 41.67% |

Column – Nationality; Prediction – MilitaryConflict
Column – Birth Place; Prediction – PopulatedPlace
Overall Accuracy: 76.92%
![Page 24: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/24.jpg)
Accuracy for Entity Linking
[Figure: % of correct and incorrect instances linked, by category]

| Category | Correct | Incorrect |
| --- | --- | --- |
| Person | 83.05% | 16.95% |
| Place | 80.43% | 19.57% |
| Organization | 61.90% | 38.10% |
| Other | 29.22% | 70.78% |

Overall Accuracy: 66.12%
![Page 25: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/25.jpg)
Lessons Learnt
• Sequential system – errors percolated from one phase to the next
• Current system favors general classes over specific ones (MAP score = 0.411)
• Largely, a system driven by “heuristics”
• Although we consider evidence, we don’t do assignment jointly
![Page 26: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/26.jpg)
Joint Inference over evidence in a table
Probabilistic Graphical Models
Markov Logic Networks
![Page 27: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/27.jpg)
A graphical model for tables
[Figure: graph with nodes C1, C2, C3 and R11, R12, R13, R21, R22, R23, R31, R32, R33]
![Page 28: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/28.jpg)
Parameterized graphical model
Variable nodes:
• Column headers: C1, C2, C3
• Row values: R11 R12 R13, R21 R22 R23, R31 R32 R33
Factor nodes:
• ψ3 (one per column): function that captures the affinity between the column headers and row values
• ψ4 (one per column): captures interaction between row values
• ψ5: captures interaction between column headers
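Read as a factor graph, the model above suggests a joint distribution that factorizes over the three kinds of factor nodes. The form below is a sketch under stated assumptions rather than the authors' exact formulation: $R_{ij}$ is taken to be the $j$-th row value in column $i$, $\psi_3$ is assumed to connect each header to its own column's values, $\psi_4$ to connect the values within a column, $\psi_5$ to connect the three headers, and $Z$ is the usual normalizing constant.

```latex
P(C_1, C_2, C_3, R_{11}, \ldots, R_{33})
  = \frac{1}{Z}\,
    \psi_5(C_1, C_2, C_3)
    \prod_{i=1}^{3}
      \psi_3(C_i, R_{i1}, R_{i2}, R_{i3})\,
      \psi_4(R_{i1}, R_{i2}, R_{i3})
```

Under this form, joint inference means choosing header classes and cell entities together so that the product is maximized, instead of fixing one before the other as the sequential baseline does.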
![Page 29: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/29.jpg)
Challenges - Abbreviations
• Other examples:
  • State abbreviations
  • Stock tickers
  • Airport codes
  • Currency codes
• Preprocessing – parse and identify such columns
• Replace abbreviations with expanded forms
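The preprocessing idea above can be sketched in two steps: decide whether a column consists mostly of known abbreviations, and if so replace each one with its expanded form. The lookup table, the coverage threshold, and the function name below are hypothetical; the slides do not say how such columns are identified.

```python
def expand_abbreviations(cells, lookup, min_coverage=0.8):
    """Expand a column's cells when most of them are known abbreviations."""
    known = sum(1 for cell in cells if cell.strip().upper() in lookup)
    if not cells or known / len(cells) < min_coverage:
        return list(cells)  # not an abbreviation column; leave it untouched
    # Unknown cells in an abbreviation column are kept as they are.
    return [lookup.get(cell.strip().upper(), cell) for cell in cells]

# Tiny illustrative lookup; a real one would cover all state codes.
STATE_CODES = {"MD": "Maryland", "CA": "California", "TX": "Texas"}

expanded = expand_abbreviations(["MD", "CA", "TX"], STATE_CODES)
untouched = expand_abbreviations(["Chicago", "MD"], STATE_CODES)
print(expanded, untouched)
```

Expanding before lookup matters because a knowledge-base query for a bare code is far more ambiguous than one for the full name.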
![Page 30: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/30.jpg)
Challenges - Literals
Population: 690,000; 345,000; 510,020; 120,000
Age: 75; 65; 50; 25
![Page 31: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/31.jpg)
Conclusion
• Presented a framework for inferring the semantics of tables and generating Linked Data
• Evaluation of the baseline system shows the feasibility of tackling the problem
• Work in progress on building a framework grounded in graphical models and probabilistic reasoning
• Working on tackling challenges posed by tables from domains such as medicine and open government data
![Page 32: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/32.jpg)
References
1. Cafarella, M. J.; Halevy, A. Y.; Wang, Z. D.; Wu, E.; and Zhang, Y. 2008. WebTables: exploring the power of tables on the web. PVLDB 1(1):538–549.
2. Hurst, M. 2006. Towards a theory of tables. IJDAR 8(2-3):123–131.
3. Embley, D. W.; Lopresti, D. P.; and Nagy, G. 2006. Notes on contemporary table recognition. In Document Analysis Systems, 164–175.
4. Wang, J.; Shao, B.; Wang, H.; and Zhu, K. Q. 2010. Understanding tables on the web. Technical report, Microsoft Research Asia.
5. Venetis, P.; Halevy, A.; Madhavan, J.; Pasca, M.; Shen, W.; Wu, F.; Miao, G.; and Wu, C. 2011. Recovering semantics of tables on the web. In Proc. of the 37th Int'l Conference on Very Large Databases (VLDB).
6. Limaye, G.; Sarawagi, S.; and Chakrabarti, S. 2010. Annotating and searching web tables using entities, types and relationships. In Proc. of the 36th Int'l Conference on Very Large Databases (VLDB).
![Page 33: Linked Data inferringsemanticstables Generating Linked Data by inferring the semantics of tables Varish Mulwad University of Maryland, Baltimore](https://reader036.vdocuments.net/reader036/viewer/2022062401/5a4d1b7a7f8b9ab0599b8d1a/html5/thumbnails/33.jpg)
Thank you! Questions?
@varish Web: http://goo.gl/NVu8N
“A little semantics goes a long way” ~ Jim Hendler