Annotation for the Semantic Web Yihong Ding A PhD Research Area Background Study

Post on 19-Dec-2015

Page 1

Annotation for the Semantic Web

Yihong Ding

A PhD Research Area Background Study

Page 2

Introduction

• The current web is designed for humans
• The semantic web (next-generation web) is designed for both humans and machines
• Semantic annotation
– Disclose semantic meanings of web content
– Convert current HTML web pages to machine-understandable semantic web pages

Page 3

Outline

• Historical Review

• Current Status

• Related Research Fields

• Future Challenges

Page 4

Semantic Annotation in Ancient Ages

• There is no evidence of when humans first started to annotate text

about 350 BC

history of semantic annotation ≈ history of ontologies

Page 5

The First Dream of Modern Semantic Annotation

• July 1945, Vannevar Bush, As We May Think, The Atlantic Monthly

• Bush's dream device
– humans could acquire information from the community (World Wide Web)
– humans could contribute their own ideas to the community (Web Annotation)

Page 6

Web Annotation before 1999

[Heck et al., 1999]

• Developing better user interfaces

• Improving storage structures

• Increasing annotation sharability

• Example systems: ComMentor, Annotator™, Third Voice, CritLink, CoNote, and Futplex

Page 7

Semantic Labeling before 1999

• Dublin Core Metadata Standard [http://dublincore.org/]
– A set of 15 elements encapsulates data: Title, Subject, Description, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
• Superimposed Information [Delcambre et al., 2001]
– [Figure: marks in a Superimposed Layer point into a Base Layer made up of Information Source 1, Information Source 2, ..., Information Source n]
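To make the 15-element set concrete, here is a minimal Python sketch of a Dublin Core style record. The helper names `make_dc_record` and `to_meta_tags` are hypothetical and exist only for this illustration; the `DC.` prefix in HTML `<meta>` names follows the common convention for embedding Dublin Core in web pages.

```python
from html import escape

# The 15 Dublin Core elements, in the order listed on the slide.
DC_ELEMENTS = (
    "title", "subject", "description", "creator", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_dc_record(**fields):
    """Build a record, rejecting any name outside the 15 elements (hypothetical helper)."""
    unknown = sorted(set(fields) - set(DC_ELEMENTS))
    if unknown:
        raise ValueError(f"not Dublin Core elements: {unknown}")
    return dict(fields)


def to_meta_tags(record):
    """Render the record as HTML <meta> tags with the conventional 'DC.' prefix."""
    return [
        f'<meta name="DC.{name}" content="{escape(value, quote=True)}">'
        for name, value in record.items()
    ]


record = make_dc_record(
    title="Annotation for the Semantic Web",
    creator="Yihong Ding",
    type="Text",
)
for tag in to_meta_tags(record):
    print(tag)
```

Restricting the record to a fixed vocabulary is the point of the standard: any consumer that knows the 15 element names can interpret the labels without consulting the author.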

Page 8

Status of Current Web Semantic Annotation Studies

• Interactive annotation

• Automatic annotation

Page 9

Interactive Annotation Systems

• Lets humans interact through machine interfaces to annotate documents

• Problems
– Inconsistency
– Error-proneness
– Lack of scalability
• Values
– Easy to implement
– Suitable for small-scale tasks and experiments
– Helpful for building corpora for evaluations

Page 10

Interactive Annotation Systems

• Annotea [Kahan et al., 2001]
– W3C project
– An open RDF infrastructure for shared web annotations
• SHOE (Simple HTML Ontology Extensions) [Heflin et al., 2000]
– University of Maryland, College Park
– Manual annotator using SHOE ontologies
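As a rough illustration of what "shared web annotations in RDF" can look like, the sketch below stores each annotation as subject-predicate-object triples kept apart from the annotated page. It is not Annotea's actual schema or API: the namespace `http://example.org/annotation#`, the property names, and the functions `annotate` and `annotations_for` are all assumptions made for this example.

```python
from itertools import count

EX = "http://example.org/annotation#"  # hypothetical namespace, not Annotea's
_ids = count(1)


def annotate(store, page_url, author, body):
    """Add one annotation to the shared store as three triples."""
    node = f"_:ann{next(_ids)}"  # blank-node style identifier
    store.add((node, EX + "annotates", page_url))
    store.add((node, EX + "author", author))
    store.add((node, EX + "body", body))
    return node


def annotations_for(store, page_url):
    """Return every annotation attached to page_url as a property dict."""
    nodes = sorted(
        s for (s, p, o) in store if p == EX + "annotates" and o == page_url
    )
    return [
        {p[len(EX):]: o for (s, p, o) in store if s == node}
        for node in nodes
    ]


store = set()  # stands in for a shared annotation server
annotate(store, "http://example.org/a.html", "alice", "Defines 'ontology' too narrowly.")
annotate(store, "http://example.org/a.html", "bob", "See also the OWL overview.")
annotate(store, "http://example.org/b.html", "alice", "Broken link in section 2.")
found = annotations_for(store, "http://example.org/a.html")
```

Because the triples live outside the page, several users can annotate a document they cannot edit, which is what makes the annotations shareable.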

Page 11

Automatic Annotation Systems

• Common feature: use of ontologies

• Typical approaches
– Annotation with automatic ontology generation (1 system)
– Annotation with automatic information extraction (6 systems)

Page 12

Annotation with Ontology Generation

• SCORE (Semantic Content Organization and Retrieval Engine) [Sheth et al., 2002]

• Voquette (now acquired by Semagix Co.), University of Georgia

Page 13

Annotation with Automatic IE

• Ont-O-Mat [Handschuh et al., 2002]
– University of Karlsruhe, Germany
• MnM [Vargas-Vera et al., 2002]
– Open University, United Kingdom
• Common features
– DAML+OIL ontologies
– Supervised adaptive learning with Lazy-NLP (Amilcare)
– Annotations stored inside web pages
• Differences
– MnM allows multiple ontologies at one time
– MnM also stores annotations in a knowledge base
– Ont-O-Mat uses OntoBroker both as an annotation server and as a reasoning engine

Page 14

Annotation with Automatic IE

• KIM Platform [Kiryakov et al., 2004]
– Ontotext Lab., Sirma Group, a Canadian-Bulgarian joint venture
• SemTag [Dill et al., 2003]
– IBM Almaden Research Center
• Similar features
– Each uses one specially designed upper-level ontology: the KIM ontology vs. the TAP ontology
• Specific features
– KIM uses an NLP tool (GATE) to extract information
– KIM stores annotations in a separate file
– SemTag uses inductive learning to extract information
– SemTag annotated 264 million web pages and generated approximately 434 million semantic tags

Page 15

Annotation with Automatic IE

• Stony Brook Annotator [Mukherjee et al., 2003]
– Stony Brook University
– Structural analysis of the DOM tree of HTML pages
– Drawbacks
    • Taxonomic relationships only
    • No generic labeling algorithm disclosed
• RoadRunner Labeller [Arlotta et al., 2003]
– Università di Roma Tre and Università della Basilicata
– Automatically assigns label names based on image recognition
– Drawbacks
    • Semantic meaning of labels unknown
    • Difficulty in associating labels with ontologies
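To give a feel for what structural analysis of a DOM tree involves, the sketch below groups the text of an HTML page by the tag path that leads to it, so that repeated structures (such as list items) fall into one group. This is only an illustration built on Python's standard `html.parser`; it is not the Stony Brook algorithm, which the slide notes was not disclosed in generic form. The names `PathCollector` and `group_text_by_path` are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Record the tag path from the root to every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []                    # currently open tags
        self.by_path = defaultdict(list)   # "html/body/..." -> list of texts

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this tolerates unclosed tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.by_path["/".join(self.stack)].append(text)


def group_text_by_path(html):
    """Return a dict mapping each tag path to the texts found under it."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return dict(collector.by_path)


page = "<html><body><h1>Faculty</h1><ul><li>Ada</li><li>Grace</li></ul></body></html>"
groups = group_text_by_path(page)
# The two list items share one path, which suggests they are instances of the same concept.
```

Grouping by path finds which items belong together but says nothing about what they mean, which mirrors the drawbacks listed above: the labels still have to be tied to ontology concepts by some other means.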

Page 16

Related Research Fields

• Semantic Web

• Information extraction

• Ontology related topics

• Conceptual modeling

• Logic languages

• Web services

Page 17

Semantic Web

• Weaving the Web [Berners-Lee 1999], birth of the Semantic Web

• The Semantic Web [Berners-Lee et al., 2001]

Page 18

Information Extraction [Laender et al., 2002]

1. Human-guided approaches
• Wrapper languages, modeling-based tools
• No annotation examples
• Too much human involvement
2. Non-ontology-based approaches
• HTML-aware tools: Stony Brook tool [Mukherjee et al., 2003], RoadRunner Labeller [Arlotta et al., 2003]
• NLP-based tools: Ont-O-Mat [Handschuh et al., 2002], MnM [Vargas-Vera et al., 2002], KIM platform [Kiryakov et al., 2004]
• ILP-based tools: SemTag [Dill et al., 2003]
• Require extra alignment between extraction categories in wrappers and concepts in ontologies
3. Ontology-based approaches
• Ontology-based tools: my proposal
• Do not require alignment; resilient to web page layouts
• Slow in execution time

Page 19

Ontology Related Topics

• Ontology languages [W3C, OWL]

– Knowledge representation and reasoning

• Ontology generation [Ding et al., 2002a]
– Annotation domain specification
• Ontology enrichment [Parekh et al., 2004]
– Expansion of the annotation domain specification
• Ontology population [Alani et al., 2003]
– Annotation result output
• Ontology mapping and merging [Ding et al., 2002b]
– Large-scale annotation requires large-scale ontologies
– Small-scale ontologies are less expensive to build
– Ontology mapping creates links among small-scale ontologies
– Ontology merging fuses small-scale ontologies into a large-scale ontology
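The difference between mapping and merging can be shown with a toy example. In the sketch below an ontology is reduced to a dict from each concept to the set of its parent concepts, and concepts are matched by case-insensitive name. Both simplifications are assumptions made for illustration, and the functions `map_ontologies` and `merge_ontologies` are hypothetical; real mapping systems use far richer evidence than names.

```python
def map_ontologies(a, b, normalize=str.lower):
    """Mapping: link concepts of a and b whose normalized names match (naive matcher)."""
    index = {normalize(concept): concept for concept in b}
    return {c: index[normalize(c)] for c in a if normalize(c) in index}


def merge_ontologies(a, b, mapping):
    """Merging: fuse b into a, renaming mapped concepts of b to their partner in a."""
    rename = {b_concept: a_concept for a_concept, b_concept in mapping.items()}
    merged = {concept: set(parents) for concept, parents in a.items()}
    for concept, parents in b.items():
        target = rename.get(concept, concept)
        merged.setdefault(target, set()).update(rename.get(p, p) for p in parents)
    return merged


# Two small-scale ontologies: concept -> set of parent concepts.
people = {"Person": set(), "Student": {"Person"}}
school = {"person": set(), "Professor": {"person"}, "Course": set()}

links = map_ontologies(people, school)            # mapping keeps both ontologies intact
merged = merge_ontologies(people, school, links)  # merging yields one larger ontology
```

Mapping leaves `people` and `school` untouched and only records that `Person` and `person` denote the same concept, while merging produces a single ontology in which `Professor` is placed under `Person`.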

Page 20

Conceptual Modeling

• Annotation requires knowledge modeling
• An ontology is a type of conceptual model
• ER Model [Chen 1976]
– The most influential conceptual model
– Influenced the OSM model, the basis of data-extraction ontologies

Page 21

Logic Languages

• Logic foundation provides reasoning and inference power for modeling languages

• Examples
– First-order logic [Smullyan 1995]
– Description logics [Brachman et al., 1984]

Page 22

Web Services

• Web services are increasingly the typical application in semantic web scenarios
• Two ways of aligning web services with semantic annotation
– Web service annotation [Brodie 2003]
– Semantic annotation web service

Page 23

Summary and Future Challenges

• Annotation for the semantic web
– Enables a machine-understandable web
– Supports semantic searching
– Supports global web services
– Is still an unsolved problem
• Main technical challenges
– Direct ontology-driven annotation mechanism
– Concept disambiguation
– Automatic domain ontology generation
– Scalability