Towards Efficient and Effective Semantic Table Interpretation
Presentation given by Ziqi Zhang at ISWC2014 on "Towards Efficient and Effective Semantic Table Interpretation"
Towards Efficient and Effective Semantic Table Interpretation
Ziqi Zhang Department of Computer Science, University of Sheffield
Outline
• Define semantic table interpretation
• State-of-the-art and motivation
• The method – TableMiner
• Evaluation
Z. Zhang / Towards Efficient and Effective Semantic Table Interpretation
Semantic Table Interpretation
• Input
  • Ontology
  • Relational table
• Goals/Tasks
  • Label columns by concepts
  • Link cells to named entities
  • Connect columns by relations

[Figure: an ontology fragment (Thing, Work, Artist, Location, Film, Actor/Actress, Country; entities Ent:USA, Ent:UK, …; relation Rel:performIn) used to annotate the example table below]

Table of Best Actor/Actress

|    | Name            | Film         | Country      |
|----|-----------------|--------------|--------------|
| 1  | Tom Hanks       | Philadelphia | USA          |
| 2  | Jamie Foxx      | Ray          | USA          |
| 3  | Kate Winslet    | The Reader   | UK           |
| …  | …               | …            | …            |
| 99 | Charlize Theron | Monster      | South Africa |
Semantic Table Interpretation
• Goals/Tasks and their names
  • Label columns by concepts: column classification / header disambiguation
  • Link cells to named entities: cell disambiguation
  • Connect columns by relations: relation interpretation
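To make the three tasks concrete, the following is a minimal sketch of how their outputs could be stored for the example table. The `TableAnnotation` class and its field names are hypothetical and do not come from the talk. The labels are taken from the slide's ontology fragment, but which column each concept or relation attaches to is an illustrative assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TableAnnotation:
    """Hypothetical container for the three outputs of table interpretation."""
    column_concepts: dict = field(default_factory=dict)   # column index -> concept
    cell_entities: dict = field(default_factory=dict)     # (row, column) -> entity
    column_relations: dict = field(default_factory=dict)  # (column, column) -> relation


header = ["Name", "Film", "Country"]
rows = [
    ["Tom Hanks", "Philadelphia", "USA"],
    ["Jamie Foxx", "Ray", "USA"],
    ["Kate Winslet", "The Reader", "UK"],
]

annotation = TableAnnotation()

# Column classification: label columns with ontology concepts (assumed mapping).
annotation.column_concepts[0] = "Actor/Actress"
annotation.column_concepts[1] = "Film"
annotation.column_concepts[2] = "Country"

# Cell disambiguation: link cells to named entities.
annotation.cell_entities[(0, 2)] = "Ent:USA"
annotation.cell_entities[(2, 2)] = "Ent:UK"

# Relation interpretation: connect two columns (assumed to be Name and Film).
annotation.column_relations[(0, 1)] = "Rel:performIn"

for (row, col), entity in sorted(annotation.cell_entities.items()):
    print(f"{rows[row][col]!r} in column {header[col]!r} -> {entity}")
```

Keeping the three outputs in separate maps mirrors the way the tasks are named separately on the slide, even though a method may solve them jointly.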
Motivation and State-of-the-art
• 154 million relational tables on the Web, and growing [Cafarella2008]
• Classic Information Extraction methods do not work on tables [Limaye2010, Lu2013]
  • They cannot model the complex interdependence among table components
Motivation and State-of-the-art
• State-of-the-art (SoA) semantic table interpretation methods, e.g. [Limaye2010, Venetis2011, Mulwad2013]
Limitation 1: Inference is ‘exhaustive’, which is unnecessary
[Figure: the example Table of Best Actor/Actress (rows 1 to 99), with one column marked for interpretation]
Goal: Assign a concept to the marked column
Hint: Content in the column gives useful clues
How much of the content do we need for inference (99 rows in this example)?
- Human: SOME (learn by example)
- SoA: ALL
Motivation and State-of-the-art
• SoA semantic table interpretation methods, e.g. [Limaye2010, Venetis2011, Mulwad2013]
Limitation 2: Contextual features for inference
[Figure: Table of Best Actor/Actress]
SoA: features come only from within the table
Context outside the table also gives hints for interpretation. E.g., the words in the accompanying paragraph are often found in descriptions of actors
TableMiner
TableMiner
• Two tasks:
  • Column classification
  • Cell disambiguation
• Non-exhaustive inference in a bootstrapping pattern
  • Phase 1 – inference with partial content
  • Phase 2 – propagation and update
• Contextual features both inside and outside tables
TableMiner – Phase 1 I-Inf
• Incremental inference with stopping (I-Inf)
Tj – a column; Cj – candidate concepts for the column; Ei,j – candidate entities for a cell
[Figure: the I-Inf loop, shown over iterations Itr.1, Itr.2, Itr.3, … (until stop)]
In each iteration:
• Score the candidate entities of a cell: Ei,j = {<e1,s1>, <e2,s2>, …}
• Derive scored concepts from them: concepts = {<c1,s1>, <c2,s2>, …}
• Merge these into the column's candidate concepts: Cj = {<c1,s1'>, <c2,s2'>, …}
• Check |H(Cj) – H(prevCj)| < t? Yes – stop; No – next itr.
Example of Cj growing across iterations:
• Itr.1: concepts = {<c1,s1>, <c2,s2>, …}; Cj = {<c1,s1'>, <c2,s2'>}
• Itr.2: concepts = {<c1,s1>, <c3,s3>, …}; Cj = {<c1,s1'>, <c2,s2'>, <c3,s3'>}
• Itr.3: concepts = {<c11,s11>}; Cj = {<c1,s1'>, <c2,s2'>, <c3,s3'>, …, <c11,s11'>}
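The incremental loop with its stopping test can be sketched as follows. This is a minimal illustration, not the TableMiner implementation. It assumes that H is the Shannon entropy of the normalised concept scores and that the scores from each cell are merged into Cj by summation; the slides specify neither. The names `entropy`, `i_inf` and `concepts_of_cell` are hypothetical.

```python
import math


def entropy(scores):
    """Shannon entropy of the scores after normalising them to sum to 1 (assumed H)."""
    total = sum(scores.values())
    if total <= 0:
        return 0.0
    probs = [s / total for s in scores.values() if s > 0]
    return -sum(p * math.log2(p) for p in probs)


def i_inf(cells, concepts_of_cell, t=0.01):
    """Process cells one at a time; stop once |H(Cj) - H(prevCj)| < t.

    cells            -- the cell contents of one column, in processing order
    concepts_of_cell -- hypothetical callable: cell -> {concept: score}
    Returns the candidate concepts Cj and the number of cells consumed.
    """
    cj = {}
    prev_h = None
    used = 0
    for cell in cells:
        used += 1
        # Merge this cell's concept scores into Cj (summation is an assumption).
        for concept, score in concepts_of_cell(cell).items():
            cj[concept] = cj.get(concept, 0.0) + score
        h = entropy(cj)
        if prev_h is not None and abs(h - prev_h) < t:
            break  # Cj has stabilised: stop without reading the remaining cells
        prev_h = h
    return cj, used


# Toy column of 99 cells in which every cell yields the same concept evidence.
column = ["Tom Hanks", "Jamie Foxx", "Kate Winslet"] + ["..."] * 96
cj, used = i_inf(column, lambda cell: {"Actor": 1.0, "Person": 0.5})
print(max(cj, key=cj.get), "after", used, "of", len(column), "cells")
```

In this toy run the distribution over concepts does not change after the second cell, so the loop stops early. On real data the number of cells consumed depends on the threshold t and on how noisy the per-cell evidence is.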
TableMiner – Phase 1 I-Inf
• Computing the scores of candidate named entities (NEs), e.g. <e1,s1>, and of candidate concepts, e.g. <c1,s1'>
  • Candidate NE
    • Build a feature vector for the candidate using the ontology
    • Build a feature vector for the cell/column header using its context
    • Compute vector similarity
  • Candidate concept: same principle, but the score also depends on the scores of the contributing NEs
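The scoring principle above (two feature vectors compared by similarity) can be illustrated with a small sketch. It assumes bag-of-words vectors and cosine similarity, which is one common choice; the slides do not say which features or similarity measure TableMiner uses. The entity identifiers and descriptions are made up for the example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn text into a sparse term-frequency vector (assumed feature representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_candidates(context_text, candidate_descriptions):
    """Score each candidate NE by similarity between its description and the cell context."""
    context = bag_of_words(context_text)
    return {
        entity: cosine(context, bag_of_words(description))
        for entity, description in candidate_descriptions.items()
    }


# Hypothetical candidates for the cell "Ray", with made-up ontology descriptions.
candidates = {
    "Ent:Ray_film": "2004 film about ray charles",
    "Ent:Ray_fish": "cartilaginous fish related to sharks",
}
# Context drawn from the table (header, caption) and from text outside it.
scores = score_candidates("best actor film award", candidates)
print(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

Adding words from outside the table to `context_text` is how out-of-table context would enter such a score in this simplified setting.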
TableMiner – Phase 2 Propagate, Update
• When I-Inf stops
  • Select the highest-scoring candidate concept c+ to label the column
  • Propagate: use c+ as a constraint to disambiguate the remaining cells – candidate NEs not belonging to c+ are discarded
  • Update:
    • Re-compute c+ after all cells are disambiguated
    • If the new c+ is different, revise the disambiguation across the entire column with it as the new constraint
    • Repeat until no change
[Figure: Cj = {<c1,s1'>, <c2,s2'>, <c3,s3'>, …, <c11,s11'>} → rank and select c+ → use c+ as constraint to disambiguate cells]
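The propagate-and-update cycle can be sketched as a small fixed-point loop. This is an illustration under stated assumptions rather than the method as implemented: concept scores are recomputed by summing the scores of the chosen entities, and a cell with no candidate under c+ falls back to its best unconstrained candidate. Neither detail is given on the slide, and the function and entity names are hypothetical.

```python
def propagate_and_update(cell_candidates, concepts_of, c_plus, max_rounds=10):
    """Alternate between constraining cells by c+ and re-computing c+.

    cell_candidates -- one {entity: score} dict per cell in the column
    concepts_of     -- {entity: set of concepts the entity belongs to}
    c_plus          -- the concept selected when I-Inf stopped
    """
    assignment = []
    for _ in range(max_rounds):
        # Propagate: discard candidate NEs that do not belong to c+.
        assignment = []
        for candidates in cell_candidates:
            allowed = {e: s for e, s in candidates.items()
                       if c_plus in concepts_of.get(e, set())}
            pool = allowed or candidates  # assumed fallback when nothing fits c+
            assignment.append(max(pool, key=pool.get) if pool else None)

        # Update: re-compute concept scores from all disambiguated cells.
        totals = {}
        for candidates, entity in zip(cell_candidates, assignment):
            if entity is None:
                continue
            for concept in concepts_of.get(entity, set()):
                totals[concept] = totals.get(concept, 0.0) + candidates[entity]
        if not totals:
            break
        # Ties are resolved in favour of the current c+.
        new_c_plus = max(totals, key=lambda c: (totals[c], c == c_plus))
        if new_c_plus == c_plus:
            break  # no change: the column label and cell annotations are stable
        c_plus = new_c_plus
    return c_plus, assignment


# Toy column where partial evidence first suggested the wrong concept, "Film".
cells = [
    {"Ent:Ray_film": 0.6, "Ent:Ray_Charles": 0.5},
    {"Ent:Tom_Hanks": 0.9},
    {"Ent:Kate_Winslet": 0.8},
]
concepts = {
    "Ent:Ray_film": {"Film"},
    "Ent:Ray_Charles": {"Person"},
    "Ent:Tom_Hanks": {"Person"},
    "Ent:Kate_Winslet": {"Person"},
}
label, entities = propagate_and_update(cells, concepts, "Film")
print(label, entities)
```

In the toy run the first round keeps "Film" for the first cell only, the update then switches c+ to "Person", and the second round revises the first cell accordingly before the loop stabilises.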
Evaluation
TableMiner – Evaluation
• Data
  • Freebase as reference ontology/background knowledge
  • Limaye112 – 112 Web tables from [Limaye2010], originally annotated with Wikipedia
    • Cells are automatically mapped to Freebase – some are unmapped
    • Columns are manually annotated
  • IMDB – 7,354 “cast” tables of films mapped to Freebase
TableMiner – Evaluation
• Baselines (both use exhaustive inference)
  • Bfirst
    - cell disambiguation: choose the top-ranked NE candidate in the Freebase search result
    - column classification: each disambiguated cell casts a vote for the set of concepts its NE belongs to, and the majority wins
  • Bsim
    - cell disambiguation: string similarity + feature vector similarity (in-table context only)
    - column classification: the majority vote method as above + string similarity
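The majority-vote column classification used by the baselines can be sketched in a few lines. The function name, entity identifiers and concept assignments below are hypothetical, and the tie-breaking behaviour (first concept counted wins) is an assumption of this sketch.

```python
from collections import Counter


def majority_vote(cell_entities, concepts_of):
    """Each disambiguated cell votes for every concept its NE belongs to."""
    votes = Counter()
    for entity in cell_entities:
        votes.update(concepts_of.get(entity, ()))
    if not votes:
        return None
    # most_common keeps first-counted order among equal counts.
    return votes.most_common(1)[0][0]


# Hypothetical disambiguation result for three cells of one column.
column_entities = ["Ent:Tom_Hanks", "Ent:Jamie_Foxx", "Ent:Ray_film"]
concepts = {
    "Ent:Tom_Hanks": ["Actor"],
    "Ent:Jamie_Foxx": ["Actor", "Musician"],
    "Ent:Ray_film": ["Film"],
}
print(majority_vote(column_entities, concepts))
```

Because every cell in the column votes, this baseline is exhaustive by construction, which is the behaviour TableMiner's partial-content inference is designed to avoid.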
TableMiner – Evaluation Results
• Cell disambiguation
[Table: cell disambiguation results]
Manual validation of 932 cell annotations in Limaye112 not covered by the above results (i.e., unmapped cells)
Considering only those cells where at least one system predicts correctly
TableMiner – Evaluation Results
• Column classification
best only – a column is labelled correctly only if the concept is suitable for the data in the column and is specific enough
best or ok – a column is labelled correctly if the concept is suitable for the data in the column, though not very specific
(E.g., ‘Film Actors’ may be the best, while ‘Artist’ or ‘Person’ is OK, but ‘Engineer’ is incorrect)
TableMiner – Evaluation Results
• Efficiency – TableMiner is efficient because
  • Column classification: processes only partial content from a column (avg. 57% on Limaye112, 43% on IMDB)
  • Cell disambiguation: constrained by column classification, resulting in a smaller NE candidate space (avg. 32% reduction on Limaye112, 24% on IMDB)
  • Fewer candidates => less time spent on retrieval and feature space creation (typically >90% of CPU time in the pipeline [Limaye2010])
TableMiner – Conclusion
• TableMiner take-home messages
  • Message 1: How can it be more effective?
    • Use context both within and outside tables as features for inference
  • Message 2: How can it be more efficient?
    • Perform inference with partial data and follow the bootstrapping pattern of learning
References
• [Cafarella2008] Cafarella, M.J., Halevy, A., Wang, D.Z., Wu, E., Zhang, Y. 2008: Webtables: exploring the power of tables on the web. Proceedings of the VLDB Endowment 1(1), 538–549
• [Limaye2010] Limaye, G., Sarawagi, S., Chakrabarti, S. 2010: Annotating and searching web tables using entities, types and relationships. Proceedings of the VLDB Endowment 3(1-2), 1338–134
• [Lu2013] Lu, C., Bing, L., Lam, W., Chan, K., Gu, Y. 2013: Web entity detection for semi-structured text data records with unlabeled data. International Journal of Computational Linguistics and Applications
• [Mulwad2013] Mulwad, V., Finin, T., Joshi, A. 2013: Semantic message passing for generating linked data from tables. In: International Semantic Web Conference (1). pp. 363–378. Lecture Notes in Computer Science, Springer
• [Venetis2011] Venetis, P., Halevy, A., Madhavan, J., Paşca, M., Shen, W., Wu, F., Miao, G., Wu, C. 2011: Recovering semantics of tables on the web. Proceedings of the VLDB Endowment 4(9), 528–538
Thank you