toward semantic web information extraction b. popov, a. kiryakov, d. manov, a. kirilov, d....
Post on 22-Dec-2015
216 views
TRANSCRIPT
![Page 1: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/1.jpg)
Toward Semantic Web Information Extraction
B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov
Presenter: Yihong Ding
![Page 2: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/2.jpg)
Toward a Semantic Web
Fully automatic methods for the semantic annotation are needed
Related topics Information retrieval (IR) Information extraction (IE) Name-entity recognition (NER) Annotation processes
![Page 3: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/3.jpg)
Semantic Annotation Diagram
![Page 4: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/4.jpg)
Name Entities
Named Entities (NE) people, organizations, locations, and others referred by
name.
May also include scalars and expressions numbers, amounts of money, dates, etc. (NUMEX,
TIMEX)
Hypothesis Named entities (and the relations between them)
mentioned in a resource constitute an important part of its semantics
![Page 5: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/5.jpg)
Semantic Annotation of NEs
Semantic Annotation of the NEs in a text includes: Recognition of the type of the entities in the text Identification of the entity individual
Comparison the traditional NER approach results in:
<Person>Yihong Ding</Person> the Semantic Annotation of NEs should result in something
like the following: <BYUPerson ID=“http://..byu../YihongDing”>Yihong
Ding</BYUPerson>
![Page 6: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/6.jpg)
The KIM Platform
The Knowledge and Information Management Platform provides: Automatic Semantic Annotation of NEs (and
relations between them) Ontology Population with NE individuals and
relations Indexing and Retrieval w.r.t NEs Query and Navigation over the Formal Knowledge
![Page 7: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/7.jpg)
KIM Constituents
KIM Ontology (KIMO)
KIM World KB
KIM Server – with API for remote access and integration
Front-ends: KIM Web UI, Plug-in for Internet Explorer, and KB Explorer
![Page 8: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/8.jpg)
KIM Bases
KIM is based on the following open-source platforms: GATE – NLP and IE platform in
University of Sheffield Sesame – RDF(S) repository
Administrator b.v. Ontology Middleware and
Custom Inference by Ontotext as extensions of Sesame
Lucene – open source IR-engine from Apache
![Page 9: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/9.jpg)
KIM Architecture
![Page 10: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/10.jpg)
KIM Ontology (KIMO)
Light-weight upper-level ontology 250 NE classes 100 relations and
attributes: covers mostly NE
classes, and ignores general concepts
includes classes representing lexical resources
www.ontotext.com/KIM/kimo.rdfs
![Page 11: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/11.jpg)
KIM World KB
A projection of the world (domain ontology) Quasi-exhaustive coverage of the most popular
entities in the world Entities of general importance – like the ones
that appear in the news At present KIM KB consists of about
200,000 entities: 50,000 locations, 130,000 organizations, 6000
people, etc.
![Page 12: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/12.jpg)
Entity Description
NEs are represented in KIM World KB with their Semantic Descriptions consisting of… Aliases (Florida & FL) Relations with other
entities (Person hasPosition Position)
Attributes (latitude & longitude of geographic entities)
Proper class of the NE
![Page 13: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/13.jpg)
KIM Server
APIs for: Semantic Annotation Document Persistence Indexing & Retrieval of documents w.r.t NEs Semantic Repository Access & Exploration
![Page 14: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/14.jpg)
KIM Semantic Information Extraction
Based on GATE NLP IE platform Rules now based on ontology classes instead of a flat
set of NE types
Recognition and Identification of the NEs IE supported by a Semantic Repository
Containing lexical and gazetteer resources Annotations referring to Entity Descriptions
Ontology Population with the newly recognized entities & relations
![Page 15: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/15.jpg)
KIM IE Pipeline
![Page 16: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/16.jpg)
KIM Plug-in
![Page 17: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/17.jpg)
KIM IE Performance
Evaluated over 3 human-annotated corpora of news articles: International Business News, International
Political News, and UK Political News (~500 articles):
Precision 86%, Recall 84% w.r.t the standard NE types
But these metrics are not representative for semantic annotation
![Page 18: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/18.jpg)
Semantic Annotation Metrics
There are no established metrics for semantic annotation: No human-annotated corpora with precise class
and instance information No metrics for various partial matches
When a more specific class is recognized When a more general class is recognized When the class is correctly recognized, but the
individual entity is not correctly identified.
![Page 19: Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding](https://reader030.vdocuments.net/reader030/viewer/2022032523/56649d775503460f94a59cef/html5/thumbnails/19.jpg)
Conclusion
It is possible to adopt traditional IE techniques for semantic annotation
It is worth using almost-exhaustive entity knowledge for IE
KIM is still under development Proper evaluation metrics Precise disambiguation More advanced IE techniques KIM ontology and KB development