Post on 19-Dec-2015
Annotation for the Semantic Web
Yihong Ding
A PhD Research Area Background Study
2
Introduction
• The current web is designed for humans
• The semantic web (next-generation web) is designed for both humans and machines
• Semantic annotation
– Discloses the semantic meaning of web content
– Converts current HTML web pages into machine-understandable semantic web pages
3
Outline
• Historical Review
• Current Status
• Related Research Fields
• Future Challenges
4
Semantic Annotation in Ancient Ages
• There is no record of when humans first started to annotate text
• The history of semantic annotation ≈ the history of ontologies, which dates back to about 350 BC
5
The First Dream of Modern Semantic Annotation
• July 1945, Vannevar Bush, As We May Think, The Atlantic Monthly
• Bush's dream device (the memex), through which
– humans could acquire information from the community (World Wide Web)
– humans could contribute their own ideas to the community (Web Annotation)
6
Web Annotation before 1999
[Heck et al., 1999]
• Developing better user interfaces
• Improving storage structures
• Increasing annotation sharability
• Example systems: ComMentor, Annotator™, Third Voice, CritLink, CoNote, and Futplex
7
Semantic Labeling before 1999
• Dublin Core Metadata Standard [http://dublincore.org/]
– 15 metadata elements describe a resource:
– Title
– Subject
– Description
– Creator
– Publisher
– Contributor
– Date
– Type
– Format
– Identifier
– Source
– Language
– Relation
– Coverage
– Rights
• Superimposed Information [Delcambre et al., 2001]
– Marks in a superimposed layer point into a base layer of information sources
[Figure: a Superimposed Layer of marks over a Base Layer made of Information Source1, Information Source2, …, Information Sourcen]
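To make the Dublin Core bullet concrete, here is a minimal Python sketch that writes a record using the 15 elements above as XML in the Dublin Core 1.1 namespace (`http://purl.org/dc/elements/1.1/`). The `metadata` wrapper element and the helper name `dublin_core_xml` are hypothetical choices for illustration; in practice the elements are usually embedded in a larger format such as RDF or HTML.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements listed on the slide, in the same order.
DC_ELEMENTS = (
    "title", "subject", "description", "creator", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_xml(record: dict) -> str:
    """Serialize a {element: value} record as Dublin Core XML.

    Unknown keys are rejected, so a typo cannot silently produce
    non-standard metadata.
    """
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    # "metadata" is a hypothetical wrapper element chosen for this sketch.
    root = ET.Element("metadata")
    for name in DC_ELEMENTS:  # emit elements in the canonical order
        if name in record:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = record[name]
    return ET.tostring(root, encoding="unicode")


print(dublin_core_xml({
    "title": "Annotation for the Semantic Web",
    "creator": "Yihong Ding",
    "type": "Text",
}))
```

Emitting elements in a fixed order keeps records from different annotators easy to compare line by line.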
8
Status of Current Web Semantic Annotation Studies
• Interactive annotation
• Automatic annotation
9
Interactive Annotation Systems
• Let humans annotate documents through machine interfaces
• Problems
– Inconsistency
– Error-proneness
– Lack of scalability
• Benefits
– Easy to implement
– Suitable for small-scale tasks and experiments
– Helpful for building evaluation corpora
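One reason manually built corpora are valuable is that automatic annotators can be scored against them. A minimal sketch, assuming annotations are represented as hypothetical `(start, end, concept)` tuples and counted as correct only on an exact match:

```python
# Annotations are assumed to be (start_offset, end_offset, concept) tuples.
def evaluate(predicted: set, gold: set) -> dict:
    """Exact-match precision, recall and F1 of automatically produced
    annotations against a manually built (gold) corpus."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


gold = {(0, 5, "Person"), (10, 15, "City"), (20, 28, "Date")}
predicted = {(0, 5, "Person"), (10, 15, "Country")}
print(evaluate(predicted, gold))  # precision 0.5, recall ≈ 0.333, f1 ≈ 0.4
```

Exact matching is the strictest choice; partial-overlap scoring is a common, more lenient alternative.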
10
Interactive Annotation Systems
• Annotea [Kahan et al., 2001]
– W3C project
– An open RDF infrastructure for shared web annotations
• SHOE (Simple HTML Ontology Extensions) [Heflin et al., 2000]
– University of Maryland, College Park
– Manual annotator using SHOE ontologies
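Annotea keeps annotations outside the annotated documents and describes them in RDF. The Python sketch below emits one annotation as N-Triples. The namespace `http://www.w3.org/2000/10/annotation-ns#` and the property names `annotates`, `context`, `body` and `created` reflect our understanding of the Annotea schema and should be checked against the W3C documentation; the helper names and all example URIs are hypothetical.

```python
# Annotea namespace and property names as we understand the W3C Annotea
# schema; verify them against the W3C documentation before reuse.
A = "http://www.w3.org/2000/10/annotation-ns#"
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(u: str) -> str:
    return f"<{u}>"


def literal(s: str) -> str:
    """Quote a string as an N-Triples literal (minimal escaping)."""
    escaped = (s.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def annotation_triples(ann_id, page, context, body, creator, created):
    """Describe one shared annotation as N-Triples lines.

    The annotation is stored outside the annotated page: it points at the
    page (annotates), at a location inside it (context, e.g. an XPointer)
    and at its content (body).
    """
    s = uri(ann_id)
    triples = [
        (s, uri(RDF_TYPE), uri(A + "Annotation")),
        (s, uri(A + "annotates"), uri(page)),
        (s, uri(A + "context"), literal(context)),
        (s, uri(A + "body"), uri(body)),
        (s, uri(DC + "creator"), literal(creator)),
        (s, uri(A + "created"), literal(created)),
    ]
    return [f"{subj} {pred} {obj} ." for subj, pred, obj in triples]


# All URIs below are hypothetical examples.
for line in annotation_triples(
        "http://example.org/annotations/1",
        "http://example.org/page.html",
        "http://example.org/page.html#xpointer(/html/body/p[2])",
        "http://example.org/annotations/1/body.html",
        "Yihong Ding",
        "2004-05-01T10:00:00Z"):
    print(line)
```

Because the page itself is never modified, any number of users can attach annotations to it through a shared annotation server.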
11
Automatic Annotation Systems
• Common feature: use of ontologies
• Typical approaches
– Annotation with automatic ontology generation (1 system)
– Annotation with automatic information extraction (6 systems)
12
Annotation with Ontology Generation
• SCORE (Semantic Content Organization and Retrieval Engine) [Sheth et al., 2002]
– Voquette (now acquired by Semagix Co.), University of Georgia
13
Annotation with Automatic IE
• Ont-O-Mat [Handschuh et al., 2002]
– University of Karlsruhe, Germany
• MnM [Vargas-Vera et al., 2002]
– The Open University, United Kingdom
• Common features
– DAML+OIL ontologies
– Supervised adaptive learning with Lazy-NLP (Amilcare)
– Annotations stored inside web pages
• Differences
– MnM allows multiple ontologies at the same time
– MnM also stores annotations in a knowledge base
– Ont-O-Mat uses OntoBroker both as an annotation server and as a reasoning engine
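The common features above combine learning from hand-annotated examples with storing annotations inside the page. The toy sketch below illustrates both ideas together; it is not Amilcare's actual learning algorithm, and the rule form ("the previous word predicts the concept of the next word"), the helper names and the training data are hypothetical.

```python
from collections import Counter, defaultdict
from html import escape


def learn_rules(examples, min_support=2):
    """Learn 'previous word -> concept' tagging rules from annotated examples.

    Each example is a list of (token, concept or None) pairs. A trigger word
    becomes a rule only if it was always followed by the same concept and
    was seen at least `min_support` times.
    """
    seen = defaultdict(Counter)  # trigger word -> counts of following concepts
    for tokens in examples:
        for (prev, _), (_, concept) in zip(tokens, tokens[1:]):
            seen[prev.lower()][concept] += 1
    rules = {}
    for trigger, counts in seen.items():
        if len(counts) == 1:
            concept, n = next(iter(counts.items()))
            if concept is not None and n >= min_support:
                rules[trigger] = concept
    return rules


def annotate_inline(text, rules):
    """Apply the rules and store the annotations inside the page as spans."""
    tokens = text.split()
    out = [escape(t) for t in tokens[:1]]
    for prev, tok in zip(tokens, tokens[1:]):
        concept = rules.get(prev.lower())
        out.append(f'<span class="{concept}">{escape(tok)}</span>'
                   if concept else escape(tok))
    return " ".join(out)


training = [  # hypothetical hand-annotated examples
    [("Dr.", None), ("Smith", "Person"), ("visited", None), ("Paris", "City")],
    [("Dr.", None), ("Jones", "Person"), ("lives", None), ("in", None),
     ("Rome", "City")],
    [("Talk", None), ("by", None), ("Dr.", None), ("Lee", "Person")],
]
rules = learn_rules(training)
print(rules)  # {'dr.': 'Person'}
print(annotate_inline("Lunch with Dr. Brown today", rules))
# Lunch with Dr. <span class="Person">Brown</span> today
```

The `min_support` threshold is what keeps one-off contexts such as "visited" or "in" from becoming rules; real adaptive IE systems learn far richer patterns.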
14
Annotation with Automatic IE
• KIM Platform [Kiryakov et al., 2004]
– Ontotext Lab., Sirma Group, a Canadian-Bulgarian joint venture
• SemTag [Dill et al., 2003]
– IBM Almaden Research Center
• Similar features
– Each uses one specially designed upper-level ontology: the KIM ontology vs. the TAP ontology
• Specific features
– KIM uses an NLP tool (GATE) to extract information
– KIM stores annotations in a separate file
– SemTag uses inductive learning to extract information
– SemTag annotated 264 million web pages and generated approximately 434 million semantic tags
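SemTag's core task is to spot ontology labels in text and decide which ontology concept each mention refers to. The sketch below shows a simplified context-overlap heuristic for that step; it is not SemTag's published disambiguation algorithm, and the tiny sense inventory stands in for the TAP ontology only for illustration.

```python
import re

# Hypothetical mini-taxonomy: surface label -> candidate senses, each with
# a few cue words that typically surround mentions of that sense.
SENSES = {
    "jaguar": {
        "Animal/Jaguar": {"cat", "rainforest", "prey", "species"},
        "Car/Jaguar": {"car", "engine", "dealer", "sedan"},
    },
    "python": {
        "Animal/Python": {"snake", "reptile", "species"},
        "Language/Python": {"code", "programming", "interpreter"},
    },
}


def spot_and_disambiguate(text, senses=SENSES, window=5):
    """Spot known labels, then pick the sense whose cue words overlap most
    with the surrounding window; ambiguous mentions stay untagged."""
    words = re.findall(r"[a-z]+", text.lower())
    tags = []
    for i, word in enumerate(words):
        if word not in senses:
            continue
        context = set(words[max(0, i - window):i] + words[i + 1:i + 1 + window])
        scores = sorted(((len(context & cues), sense)
                         for sense, cues in senses[word].items()), reverse=True)
        best_score, best_sense = scores[0]
        runner_up = scores[1][0] if len(scores) > 1 else 0
        if best_score > runner_up:  # require a clear winner
            tags.append((word, best_sense))
    return tags


print(spot_and_disambiguate(
    "The jaguar is a large cat that hunts prey in the rainforest."))
# [('jaguar', 'Animal/Jaguar')]
print(spot_and_disambiguate("I wrote python code for the interpreter"))
# [('python', 'Language/Python')]
```

Leaving ties untagged trades recall for precision, which matters when tags are generated at web scale without human review.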
15
Annotation with Automatic IE
• Stony Brook Annotator [Mukherjee et al., 2003]
– Stony Brook University
– Structural analysis of the DOM tree of HTML pages
– Drawbacks: taxonomic relationships only; no generic labeling algorithm disclosed
• RoadRunner Labeller [Arlotta et al., 2003]
– Università di Roma Tre and Università della Basilicata
– Automatically assigns label names based on image recognition
– Drawbacks: semantic meaning of labels unknown; difficulty in associating labels with ontologies
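The Stony Brook drawback ("taxonomic relationships only") follows naturally from analysing DOM structure: nesting says that one item sits under another, not how the two are related. The sketch below, built on Python's standard `html.parser`, derives parent/child pairs from nested HTML lists; it is a deliberately simplified stand-in, not the Stony Brook algorithm, and it assumes well-formed markup with explicitly closed `<li>` elements.

```python
from html.parser import HTMLParser


class ListTaxonomyParser(HTMLParser):
    """Collect (parent, child) label pairs from nested <ul>/<li> markup.

    Only element nesting is used, so only taxonomic links can come out.
    Assumes every <li> is explicitly closed.
    """

    def __init__(self):
        super().__init__()
        self.stack = []  # labels of the <li> elements currently open
        self.edges = []  # collected (parent, child) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.stack.append("")

    def handle_data(self, data):
        if self.stack:  # text belongs to the innermost open <li>
            self.stack[-1] += data

    def handle_endtag(self, tag):
        if tag == "li" and self.stack:
            child = self.stack.pop().strip()
            if self.stack:  # nested item: link it to the enclosing item
                self.edges.append((self.stack[-1].strip(), child))


def taxonomy(html):
    parser = ListTaxonomyParser()
    parser.feed(html)
    parser.close()
    return parser.edges


page = """
<ul>
  <li>Vehicles
    <ul><li>Cars</li><li>Trucks</li></ul>
  </li>
  <li>Animals<ul><li>Cats</li></ul></li>
</ul>
"""
print(taxonomy(page))
# [('Vehicles', 'Cars'), ('Vehicles', 'Trucks'), ('Animals', 'Cats')]
```

Nothing in the markup distinguishes "is a kind of" from "is part of", which is exactly the limitation listed on the slide.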
16
Related Research Fields
• Semantic Web
• Information extraction
• Ontology related topics
• Conceptual modeling
• Logic languages
• Web services
17
Semantic Web
• Weaving the Web [Berners-Lee, 1999]: birth of the Semantic Web
• The Semantic Web [Berners-Lee et al., 2001]
18
Information Extraction [Laender et al., 2002]
1. Human-guided approaches
• Wrapper languages, modeling-based tools
• No annotation examples
• Too much human involvement
2. Non-ontology-based approaches
• HTML-aware tools: Stony Brook tool [Mukherjee et al., 2003], RoadRunner Labeller [Arlotta et al., 2003]
• NLP-based tools: Ont-O-Mat [Handschuh et al., 2002], MnM [Vargas-Vera et al., 2002], KIM platform [Kiryakov et al., 2004]
• ILP-based tools: SemTag [Dill et al., 2003]
• Require extra alignment between extraction categories in wrappers and concepts in ontologies
3. Ontology-based approaches
• Ontology-based tools: my proposal
• No alignment required; resilient to web page layouts
• Slow in execution time
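The ontology-based category attaches extraction knowledge directly to ontology concepts, so no wrapper-to-ontology alignment is needed and the page layout does not matter. A minimal sketch of that general idea, assuming hypothetical concepts with regular-expression value recognizers; it is not the proposed system itself:

```python
import re

# Hypothetical mini data-extraction ontology: every concept carries its own
# value recognizer, so matches come out already labeled with the concept.
ONTOLOGY = {
    "Year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "Price": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),
    "Phone": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
}


def page_text(html):
    # Layout is thrown away: tags become spaces and only text is matched.
    return re.sub(r"<[^>]+>", " ", html)


def extract(html, ontology=ONTOLOGY):
    text = page_text(html)
    # Every recognizer scans the whole page, which is simple but not fast.
    return {concept: rx.findall(text) for concept, rx in ontology.items()}


table = ("<table><tr><td>1998</td><td>$4,500</td>"
         "<td>(801) 555-1234</td></tr></table>")
prose = "<p>A 1998 model, asking $4,500. Call (801) 555-1234.</p>"
print(extract(table) == extract(prose))  # True
print(extract(prose))
```

Both layouts yield the same labeled values. Running every recognizer over the whole page is one plausible source of the slower execution time listed above.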
19
Ontology Related Topics
• Ontology languages [W3C, OWL]
– Knowledge representation and reasoning
• Ontology generation [Ding et al., 2002a]
– Annotation domain specification
• Ontology enrichment [Parekh et al., 2004]
– Expanding the annotation domain specification
• Ontology population [Alani et al., 2003]
– Annotation result output
• Ontology mapping and merging [Ding et al., 2002b]
– Large-scale annotation requires large-scale ontologies
– Small-scale ontologies are less expensive to build
– Ontology mapping creates links among small-scale ontologies
– Ontology merging fuses small-scale ontologies into a large-scale ontology
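Ontology mapping can be as simple or as elaborate as the task demands. The sketch below shows only the most naive lexical variant: concept names are compared with `difflib.SequenceMatcher`, pairs above a threshold are treated as mappings, and merging collapses each mapped pair into one concept. The 0.8 threshold, the helper names and the example ontologies are assumptions; practical mapping systems also exploit structure, instances and background knowledge.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def map_concepts(onto_a, onto_b, threshold=0.8):
    """Link each concept of onto_a to its most similar concept in onto_b,
    keeping only links at or above the threshold."""
    mapping = {}
    for a in onto_a:
        best = max(onto_b, key=lambda b: similarity(a, b))
        if similarity(a, best) >= threshold:
            mapping[a] = best
    return mapping


def merge(onto_a, onto_b, mapping):
    """Fuse two concept sets, collapsing each mapped pair into one concept
    (the name from onto_b is kept)."""
    merged = set(onto_b)
    merged.update(a for a in onto_a if a not in mapping)
    return merged


small_a = ["Author", "Publication", "Venue"]
small_b = ["Authors", "Publications", "Conference"]
links = map_concepts(small_a, small_b)
print(links)
print(sorted(merge(small_a, small_b, links)))
```

Note that "Venue" and "Conference" stay separate: a purely lexical matcher cannot see that they may denote related concepts.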
20
Conceptual Modeling
• Annotation requires knowledge modeling
• An ontology is a type of conceptual model
• ER Model [Chen, 1976]
– The most influential conceptual model
– Influenced the OSM model, the basis of data-extraction ontologies
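Because an ontology is a kind of conceptual model, an ER schema can be read as a rough ontology. The sketch below shows one straightforward translation, borrowing OWL's terms `Class`, `DatatypeProperty` and `ObjectProperty` only as labels; it is a hypothetical illustration, not the OSM model or a complete ER-to-OWL mapping.

```python
from dataclasses import dataclass, field


@dataclass
class EntitySet:
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class RelationshipSet:
    name: str
    source: str  # entity set on one side of a binary relationship
    target: str  # entity set on the other side


def er_to_ontology(entities, relationships):
    """Translate an ER schema into ontology-style statements.

    Entity sets become classes, attributes become datatype properties of
    their class, and binary relationship sets become object properties.
    Keys, cardinalities and n-ary relationships are ignored in this sketch.
    """
    statements = []
    for e in entities:
        statements.append(("Class", e.name))
        for attr in e.attributes:
            statements.append(("DatatypeProperty", attr, e.name))
    for r in relationships:
        statements.append(("ObjectProperty", r.name, r.source, r.target))
    return statements


schema = er_to_ontology(
    [EntitySet("Person", ["name"]), EntitySet("Company", ["title"])],
    [RelationshipSet("worksFor", "Person", "Company")],
)
for statement in schema:
    print(statement)
```

The information this translation drops, such as cardinalities, is precisely what richer conceptual models add on top of ER.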
21
Logic Languages
• A logic foundation gives modeling languages their reasoning and inference power
• Examples
– First-order logic [Smullyan, 1995]
– Description logics [Brachman et al., 1984]
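As a small taste of the inference a logic foundation provides, the sketch below computes the subsumption hierarchy implied by atomic axioms of the form A ⊑ B and uses it to infer every class an individual belongs to. This covers only a tiny fragment of what a description logic reasoner does (no conjunction, negation or role restrictions); the class and individual names are hypothetical.

```python
def superclasses(axioms):
    """Transitive closure of subsumption axioms (sub, sup), read as sub ⊑ sup."""
    supers = {}
    for sub, sup in axioms:
        supers.setdefault(sub, set()).add(sup)
        supers.setdefault(sup, set())
    changed = True
    while changed:  # repeat until no new subsumption can be derived
        changed = False
        for ss in supers.values():
            inferred = set().union(*(supers[s] for s in ss)) - ss
            if inferred:
                ss |= inferred
                changed = True
    return supers


def infer_types(assertions, axioms):
    """Entail every class each individual belongs to from its asserted ones."""
    supers = superclasses(axioms)
    return {ind: set(classes).union(*(supers.get(c, set()) for c in classes))
            for ind, classes in assertions.items()}


axioms = [("Professor", "FacultyMember"), ("FacultyMember", "Person"),
          ("Student", "Person")]
print(infer_types({"alice": {"Professor"}, "bob": {"Student"}}, axioms))
```

Nothing about "alice" being a Person is stated explicitly; it follows from the axioms, which is the kind of conclusion an annotation can gain from a logic-backed ontology.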
22
Web Services
• Web services are increasingly becoming the typical application in semantic web scenarios
• Two ways to align web services with semantic annotation
– Web service annotation [Brodie, 2003]: annotating web services semantically
– Semantic annotation web service: offering semantic annotation as a web service
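For web service annotation, a service's inputs and outputs can be labeled with ontology concepts so that a requester can find services by meaning rather than by name. The sketch below matches such annotations through a subclass hierarchy; the concept hierarchy, service names and matching rule are hypothetical and far simpler than real semantic web service frameworks.

```python
from dataclasses import dataclass

# Hypothetical (acyclic) concept hierarchy used to annotate services:
# child -> parent.
PARENT = {"Sedan": "Car", "Car": "Vehicle", "Truck": "Vehicle",
          "ZipCode": "Location"}


def is_a(concept, target):
    """True if concept equals target or is a (transitive) subclass of it."""
    while concept is not None:
        if concept == target:
            return True
        concept = PARENT.get(concept)
    return False


@dataclass
class AnnotatedService:
    name: str
    inputs: list  # ontology concepts the service consumes
    output: str   # ontology concept the service produces


def find_services(services, have, want):
    """Names of services that produce a `want` and whose every input is
    satisfied by at least one concept the requester has."""
    return [s.name for s in services
            if is_a(s.output, want)
            and all(any(is_a(h, needed) for h in have) for needed in s.inputs)]


services = [
    AnnotatedService("sedanSearch", ["ZipCode"], "Sedan"),
    AnnotatedService("truckSearch", ["ZipCode"], "Truck"),
    AnnotatedService("carByVin", ["VIN"], "Car"),
]
print(find_services(services, have={"ZipCode"}, want="Car"))  # ['sedanSearch']
```

Matching on concepts lets "sedanSearch" answer a request for cars even though neither its name nor its output label mentions "Car".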
23
Summary and Future Challenges
• Annotation for the semantic web
– Enables a machine-understandable web
– Supports semantic searching
– Supports worldwide web services
– Is still an unsolved problem
• Main technical challenges
– Direct ontology-driven annotation mechanism
– Concept disambiguation
– Automatic domain ontology generation
– Scalability