XML on Semantic Web

Posted on 21-Dec-2015

Page 1: XML on Semantic Web. Outline The Semantic Web Ontology XML Probabilistic DTD References

XML on Semantic Web

Outline

The Semantic Web
Ontology
XML
Probabilistic DTD
References

The Semantic Web (1/4)

The first-generation Web
The second-generation Web: the current Web
The third-generation Web: the Semantic Web
– The conceptual structuring of the Web in an explicit, machine-readable way
Requirements:
– Universal expressive power
– Support for syntactic interoperability
– Support for semantic interoperability

The Semantic Web (2/4)

Syntactic interoperability concerns parsing the data; semantic interoperability means defining mappings between unknown terms and known terms in the data

Semantic interoperability requires standards for both the syntactic form of documents and their semantic content

A further representation and inference layer is needed on top of the currently available layers of the WWW: ontologies
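The "mapping between unknown terms and known terms" can be pictured as a translation table applied after parsing. The following is a minimal sketch under stated assumptions: the vocabularies, `TERM_MAP`, and the helper functions are hypothetical and only illustrate the idea; a real system would derive such mappings from ontologies rather than a hand-written dictionary.

```python
# Terms our application already understands (hypothetical vocabulary)
KNOWN_TERMS = {"name", "email"}

# Hypothetical mapping from a foreign source's terms to known terms
TERM_MAP = {"fullName": "name", "mailbox": "email"}


def map_terms(record):
    """Rename the keys of an already-parsed record into known terms.

    Keys without a mapping keep their original name, so no data is lost.
    """
    return {TERM_MAP.get(key, key): value for key, value in record.items()}


def unknown_terms(record):
    """Keys that are still not understood after mapping."""
    return set(map_terms(record)) - KNOWN_TERMS


if __name__ == "__main__":
    foreign = {"fullName": "Ada", "mailbox": "ada@example.org", "phone": "123"}
    print(map_terms(foreign))      # {'name': 'Ada', 'email': 'ada@example.org', 'phone': '123'}
    print(unknown_terms(foreign))  # {'phone'}
```

Parsing the record is the syntactic step; only the mapping step adds semantic interoperability, and any term left in `unknown_terms` is exactly what a shared ontology would need to cover.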

The Semantic Web (3/4)

The Semantic Web (4/4)

Ontology (1/5)

An explicit machine-readable specification of a shared conceptualization

Crucial role: representation of a shared conceptualization of a particular domain

Reusable
Makes it possible to find pages that contain syntactically different but semantically similar words
Constructs: concepts (usually organized in taxonomies), relations, functions, axioms, instances
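To make the list of constructs concrete, here is a minimal in-memory sketch, assuming a hypothetical `Ontology` class (not the API of any of the languages named in these slides). Concepts carry their taxonomy parents, relations record a domain and range, a function is modeled as a relation allowing at most one value per subject, and instances are attached to a concept.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical container for concepts, relations, functions and instances."""
    concepts: dict = field(default_factory=dict)   # concept -> set of parent concepts
    relations: dict = field(default_factory=dict)  # relation -> (domain, range, is_function)
    instances: dict = field(default_factory=dict)  # instance -> concept
    facts: set = field(default_factory=set)        # (relation, subject, object) triples

    def add_concept(self, name, parents=()):
        self.concepts[name] = set(parents)

    def add_relation(self, name, domain, range_, is_function=False):
        self.relations[name] = (domain, range_, is_function)

    def add_instance(self, name, concept):
        if concept not in self.concepts:
            raise KeyError(f"unknown concept: {concept}")
        self.instances[name] = concept

    def add_fact(self, relation, subject, obj):
        if subject not in self.instances or obj not in self.instances:
            raise KeyError("facts may only link known instances")
        _, _, is_function = self.relations[relation]
        # A function is a relation that maps each subject to at most one value
        if is_function and any(
            r == relation and s == subject and o != obj for r, s, o in self.facts
        ):
            raise ValueError(f"{relation} already has a value for {subject}")
        self.facts.add((relation, subject, obj))


if __name__ == "__main__":
    onto = Ontology()
    onto.add_concept("Person")
    onto.add_concept("Student", parents=["Person"])
    onto.add_concept("University")
    onto.add_relation("enrolledAt", "Student", "University", is_function=True)
    onto.add_instance("alice", "Student")
    onto.add_instance("mit", "University")
    onto.add_fact("enrolledAt", "alice", "mit")
    print(onto.facts)
```

Axioms are left out here on purpose; the functional-relation check is the only constraint the sketch enforces.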

Ontology (2/5)

Ontology (3/5)

Concepts:
– Anything about which something is said
– Also known as classes (XOL, RDF(S), OIL, DAML+OIL), objects (OML), or categories (SHOE)
Taxonomies:
– Used to organize ontological knowledge through generalization and specialization relationships, to which simple and multiple inheritance can be applied
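Generalization links and multiple inheritance can be illustrated with a small taxonomy. This is a sketch over a hypothetical concept hierarchy (`PARENTS`, `OWN_ATTRIBUTES`); `TeachingAssistant` specializes two concepts at once and inherits attributes from both branches.

```python
# Hypothetical taxonomy: each concept lists its direct parents
PARENTS = {
    "Thing": [],
    "Person": ["Thing"],
    "Employee": ["Person"],
    "Student": ["Person"],
    "TeachingAssistant": ["Employee", "Student"],  # multiple inheritance
}

# Attributes declared directly on each concept (hypothetical)
OWN_ATTRIBUTES = {
    "Thing": {"id"},
    "Person": {"name"},
    "Employee": {"salary"},
    "Student": {"studentNumber"},
    "TeachingAssistant": set(),
}


def ancestors(concept):
    """All generalizations of `concept`, found by walking parent links."""
    seen = set()
    stack = list(PARENTS[concept])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen


def is_a(sub, sup):
    """True if `sub` equals `sup` or is a specialization of it."""
    return sub == sup or sup in ancestors(sub)


def attributes(concept):
    """Own attributes plus everything inherited from all ancestors."""
    result = set(OWN_ATTRIBUTES[concept])
    for parent in ancestors(concept):
        result |= OWN_ATTRIBUTES[parent]
    return result
```

Because `ancestors` uses a `seen` set, a concept reached through two branches (here `Person`) is visited only once.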

Ontology (4/5)

Relations and functions:
– Represent an interaction between concepts of the domain and their attributes
– Called relations in SHOE and OML, and roles in OIL
– Functions are a special kind of relation
Axioms:
– Used for constraining information, verifying correctness, and deducing new information
– Also known as assertions (OML), rules, or logic
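Two of the axiom roles above, deducing new information and verifying correctness, can be sketched with forward chaining over triples. The `partOf`/`hasPart` axioms below are hypothetical examples chosen for illustration, not axioms taken from the cited ontology languages.

```python
def apply_axioms(facts):
    """Forward-chain two hypothetical axioms until no new fact is deduced.

    Axiom 1 (transitivity): partOf(x, y) and partOf(y, z) -> partOf(x, z)
    Axiom 2 (inverse):      partOf(x, y) -> hasPart(y, x)
    """
    facts = set(facts)
    while True:
        new = set()
        part_of = [(s, o) for r, s, o in facts if r == "partOf"]
        for x, y in part_of:
            new.add(("hasPart", y, x))
            for y2, z in part_of:
                if y == y2:
                    new.add(("partOf", x, z))
        if new <= facts:  # fixpoint reached: nothing new was deduced
            return facts
        facts |= new


def violations(facts):
    """Verification axiom (hypothetical): nothing may be part of itself."""
    return {(r, s, o) for r, s, o in facts if r == "partOf" and s == o}
```

Running `violations` on the closure rather than on the raw facts catches inconsistencies that only appear after deduction, such as a cycle of `partOf` statements.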

Ontology (5/5)

Instances:
– Represent elements in the domain attached to a specific concept

Measurement of expressiveness:

– XOL, RDF(S), SHOE, OML, OIL, DAML+OIL

XML (1/7)

As a serialization syntax for other markup languages, e.g., SMIL, XOL, SHOE
As semantic markup of Web pages
As a uniform data-exchange format

XML (2/7)

Universal expressive power: anything can be encoded in XML if a grammar can be defined for it

Syntactic interoperability: an XML parser can parse any XML data and is usually a reusable component

Semantic interoperability: XML offers no way of recognizing a semantic unit from a particular domain of interest (not yet widely recognized)
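The contrast between the two kinds of interoperability shows up with Python's standard `xml.etree.ElementTree` parser. The two documents are hypothetical: the same generic parser handles both (syntactic interoperability), but nothing in the parsed result says that `author` and `creator`, or `title` and `name`, denote the same thing.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents describing the same book with different tags
DOC_A = "<book><author>Ada Lovelace</author><title>Notes</title></book>"
DOC_B = "<publication><creator>Ada Lovelace</creator><name>Notes</name></publication>"


def element_paths(xml_text):
    """Parse any well-formed XML and list (path, text) for the root's direct children."""
    root = ET.fromstring(xml_text)
    return [(f"{root.tag}/{child.tag}", child.text) for child in root]


if __name__ == "__main__":
    print(element_paths(DOC_A))
    print(element_paths(DOC_B))
```

Both calls succeed and return the same text values, yet their tag sets do not overlap at all; recognizing them as the same semantic units requires knowledge the parser does not have.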

XML (3/7)

XML (4/7)

Data exchange:
– Build a model of the domain of interest
– Construct a DTD or an XML Schema from the domain model

Advantage: reusability of the parsing software components

There are multiple ways to encode a given domain model into a DTD, so the direct connection from the DTD to the domain model is lost and cannot easily be reconstructed
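The loss of the connection to the domain model is easy to reproduce: one fact can be encoded in several equally valid ways. This sketch uses three hypothetical encodings of a single domain-model fact (a person named "Ada", born 1815) and shows that each encoding needs its own decoder before the original model is visible again.

```python
import xml.etree.ElementTree as ET

# Encoding 1: properties as attributes
ENC_ATTR = '<person name="Ada" born="1815"/>'
# Encoding 2: properties as child elements
ENC_ELEM = "<person><name>Ada</name><born>1815</born></person>"
# Encoding 3: a generic element carrying the class name as data
ENC_GENERIC = (
    '<entity type="person">'
    '<property key="name">Ada</property>'
    '<property key="born">1815</property>'
    "</entity>"
)


def decode_attr(text):
    root = ET.fromstring(text)
    return {"class": root.tag, **root.attrib}


def decode_elem(text):
    root = ET.fromstring(text)
    return {"class": root.tag, **{child.tag: child.text for child in root}}


def decode_generic(text):
    root = ET.fromstring(text)
    props = {p.get("key"): p.text for p in root.findall("property")}
    return {"class": root.get("type"), **props}
```

The three decoders recover the same dictionary, but none of them could be written by looking at the XML alone without knowing which encoding choice the DTD designer made.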

XML (5/7)

XML (6/7)

A direct mapping based on the different DTDs is not possible

So we first have to define mappings between the different domain models, and then between the different DTDs:

– Reengineering the original domain model from the DTD or XML Schema

– Establishing mappings between the entities in the domain models

– Defining translation procedures for XML documents

Using a more suitable formalism than pure XML can save much of this additional effort
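The three steps can be sketched for two flat, hypothetical vocabularies. Step 1 (reengineering) is represented by a hand-written table from source elements to source domain entities, step 2 by an entity-to-entity mapping, and step 3 by a translation function. All table contents are assumptions for illustration, and the sketch only handles a root with direct children.

```python
import xml.etree.ElementTree as ET

# Step 1 (done by hand here): source elements -> source domain-model entities
SOURCE_MODEL = {"book": "Book", "author": "Author", "title": "Title"}
# Step 2: mapping between the entities of the two domain models
ENTITY_MAP = {"Book": "Publication", "Author": "Creator", "Title": "Name"}
# How the target domain model is encoded in the target DTD
TARGET_ENCODING = {"Publication": "publication", "Creator": "creator", "Name": "name"}


def convert(tag):
    """Source element -> source entity -> target entity -> target element."""
    return TARGET_ENCODING[ENTITY_MAP[SOURCE_MODEL[tag]]]


def translate(xml_text):
    """Step 3: translate a one-level source document into the target vocabulary."""
    src_root = ET.fromstring(xml_text)
    tgt_root = ET.Element(convert(src_root.tag))
    for child in src_root:
        ET.SubElement(tgt_root, convert(child.tag)).text = child.text
    return ET.tostring(tgt_root, encoding="unicode")


if __name__ == "__main__":
    print(translate("<book><author>Ada Lovelace</author><title>Notes</title></book>"))
```

Routing every tag through the domain models, rather than mapping `author` directly to `creator`, mirrors the slide's point that a direct DTD-to-DTD mapping is not available.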

XML (7/7)

Probabilistic DTD (1/11)

Describes the most likely orderings of XML tags and contains statistical properties for each tag

Utilizes association rule discovery algorithms and sequence mining techniques
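As a rough illustration of the statistics involved, the sketch below computes a per-tag support (fraction of documents containing the tag) and the confidence of rules of the form "tag X is directly followed by tag Y" over hypothetical tag sequences. These are the standard association-rule measures; the exact definitions used in the cited work may differ.

```python
from collections import Counter

# Hypothetical tagged documents, each reduced to its sequence of XML tags
DOCS = [
    ["title", "author", "abstract", "body"],
    ["title", "author", "body"],
    ["title", "abstract", "body"],
]


def tag_support(docs):
    """Fraction of documents in which each tag occurs."""
    counts = Counter(tag for doc in docs for tag in set(doc))
    return {tag: n / len(docs) for tag, n in counts.items()}


def precedence_confidence(docs):
    """Confidence of 'X is directly followed by Y': occurrences of the
    adjacent pair (X, Y) divided by occurrences of X that have a successor."""
    pair_counts = Counter()
    lead_counts = Counter()
    for doc in docs:
        for x, y in zip(doc, doc[1:]):
            pair_counts[(x, y)] += 1
            lead_counts[x] += 1
    return {pair: n / lead_counts[pair[0]] for pair, n in pair_counts.items()}
```

On `DOCS`, `title` is followed by `author` with confidence 2/3 and `abstract` is always followed by `body`, which is the kind of evidence a most-likely ordering is built from.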

Probabilistic DTD (2/11)

Objectives: tagging all text documents and deriving an appropriate preliminary flat XML DTD
– A knowledge discovery in textual databases (KDT) process builds clusters of semantically similar text units, after which new documents can be converted into XML documents

Probabilistic DTD (3/11)

UML schema: initially conceived by experts, it serves as a reference for the DTD, but there is no guarantee that the final DTD will be contained in, or contain, this schema

KDT process:
– Tagging the initial text documents
– Domain knowledge, such as a thesaurus and a preliminary UML schema, constitutes input to the process
– Pre-processing
– Iterative clustering
– Post-processing
– Establishing a probabilistic DTD

Probabilistic DTD (4/11)

Probabilistic DTD (5/11)

Pre-processing:
– Setting the level of granularity
– NLP processing such as tokenization, normalization, and word stemming
– Building text unit descriptors, a reduced feature space (currently chosen by the engineer)
– Mapping all text units into Boolean vectors over this feature space
– Extracting named entities
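Most pre-processing steps fit in a few functions. This sketch assumes sentence-level granularity, lowercasing as the only normalization, a deliberately crude suffix-stripping `stem` standing in for a real stemmer such as Porter's, and a hypothetical descriptor list `DESCRIPTORS`. Named-entity extraction is not shown.

```python
import re

# Descriptor set chosen by the engineer (hypothetical), already stemmed
DESCRIPTORS = ["patient", "diagnos", "treat", "dose"]


def split_units(text, level="sentence"):
    """Granularity: split a document into paragraphs or sentences."""
    if level == "paragraph":
        return [p.strip() for p in text.split("\n\n") if p.strip()]
    return re.split(r"(?<=[.!?])\s+", text.strip())


def tokenize(text):
    """Tokenization plus normalization: lowercase alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ments", "ment", "ing", "ed", "is", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def to_boolean_vector(text_unit):
    """Map one text unit onto the descriptor feature space as 0/1 values."""
    stems = {stem(t) for t in tokenize(text_unit)}
    return [1 if d in stems else 0 for d in DESCRIPTORS]


if __name__ == "__main__":
    doc = "The patient was diagnosed with flu. Treatment: a low dose."
    for unit in split_units(doc):
        print(to_boolean_vector(unit), unit)
```

Because the descriptors are stemmed too, "diagnosed" and "diagnosis" both hit the `diagnos` feature, which is the point of stemming before vectorization.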

Probabilistic DTD (6/11)

Clustering:
– Performed in multiple iterations; each iteration outputs a set of clusters
– All text unit vectors are clustered
– Clusters are partitioned into "acceptable" and "unacceptable" according to quality criteria
– Members of "unacceptable" clusters are the input data to the next iteration
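The iteration scheme can be sketched independently of the clustering algorithm. Here a simple leader clustering with Jaccard similarity stands in for whatever algorithm the original work used, the quality criterion is assumed to be a minimum cluster size, and the similarity threshold is assumed to relax by `step` on each iteration so later passes can group the leftovers.

```python
def jaccard(a, b):
    """Similarity of two Boolean vectors: shared 1s over positions with any 1."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def leader_clustering(items, threshold):
    """One pass: add each (id, vector) to the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of items; cluster[0] is its leader
    for item in items:
        for cluster in clusters:
            if jaccard(cluster[0][1], item[1]) >= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def iterative_clustering(items, min_size=2, threshold=0.8, step=0.2, max_iterations=5):
    """Re-cluster the members of unacceptable clusters in each new iteration."""
    accepted = []
    remaining = list(items)
    for _ in range(max_iterations):
        if not remaining:
            break
        clusters = leader_clustering(remaining, threshold)
        # Assumed quality criterion: at least `min_size` members
        accepted += [c for c in clusters if len(c) >= min_size]
        remaining = [item for c in clusters if len(c) < min_size for item in c]
        threshold = max(0.0, threshold - step)
    return accepted, remaining
```

Items still in `remaining` after the last iteration are the text units no cluster could absorb; in the process described above they would be left for the engineer to inspect.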

Probabilistic DTD (7/11)

Post-processing:
– "Acceptable" clusters are semi-automatically assigned a label
– Ultimately, cluster labels are determined by the engineer
– All default cluster labels are derived from text unit descriptors
– An XML DTD is automatically derived from the XML tags
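A possible shape for this step: derive a default label from the descriptors most often set in a cluster, let the engineer override it, and emit a preliminary flat DTD from the final labels. `DESCRIPTORS`, the `top_n` rule, the override dictionary, and the flat content model `(label1 | label2 | ...)*` are all assumptions made for this sketch.

```python
from collections import Counter

# Hypothetical descriptor feature space (same order as the Boolean vectors)
DESCRIPTORS = ["patient", "diagnos", "treat", "dose"]


def default_label(cluster_vectors, top_n=2):
    """Default label from the descriptors most frequently set in the cluster."""
    counts = Counter()
    for vector in cluster_vectors:
        for descriptor, bit in zip(DESCRIPTORS, vector):
            if bit:
                counts[descriptor] += 1
    return "_".join(d for d, _ in counts.most_common(top_n))


def final_labels(default_labels, overrides):
    """The engineer has the last word: overrides replace default labels."""
    return [overrides.get(label, label) for label in default_labels]


def derive_flat_dtd(labels, root="document"):
    """Preliminary flat DTD: the root holds any sequence of labelled text units."""
    lines = [f"<!ELEMENT {root} ({' | '.join(labels)})*>"]
    lines += [f"<!ELEMENT {label} (#PCDATA)>" for label in labels]
    return "\n".join(lines)


if __name__ == "__main__":
    defaults = [default_label([[1, 1, 0, 0], [1, 0, 0, 0]]), default_label([[0, 0, 1, 1]])]
    labels = final_labels(defaults, {"patient_diagnos": "diagnosis"})
    print(derive_flat_dtd(labels))
```

Every tag produced this way holds only text, which is why the result is described as a preliminary flat DTD; the ordering of tags is handled by the probabilistic DTD step.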

Probabilistic DTD (8/11)

Probabilistic DTD (9/11)

Establishing a probabilistic DTD:
– Deriving the most likely ordering of the tags
– Computing the statistical properties of each tag inside the document type definition

Deriving the ordering of the tags:
– Backward construction of DTD sequences: builds "maximal" sequences
– Forward sequence construction

Probabilistic DTD (10/11)

Backward Construction of DTD Sequences:
– Starts with an arbitrary tag ح and then identifies the tag most likely to appear before it
– If no such tag exists, the sequence is complete and the process shifts to the next sequence. If there is exactly one, the next iteration starts from it. If there are k such tags, the incomplete sequence is duplicated k times.

– Each tag Xi leads to ح with a confidence Ci

– If one Ci is larger than all the others, then Xi is the predecessor of ح in the sequence

– If C0, the confidence that ح has no predecessor, is the largest, then ح is the first element

– Confidence is the tag's TagSupport multiplied by its accuracy
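The backward construction can be sketched as a work list of incomplete sequences. Assumptions: the confidences are precomputed inputs, with `pred_conf[t][None]` holding C0 for tag t. Tied maxima (compared with `math.isclose`) branch the sequence as described, a tie that includes C0 also closes one copy, and tags already in a sequence are skipped to avoid cycles. The slide does not specify these details.

```python
import math


def confidence(tag_support, accuracy):
    """Confidence as stated on the slide: TagSupport multiplied by accuracy."""
    return tag_support * accuracy


def backward_sequences(start, pred_conf):
    """Backward construction of DTD sequences from the tag `start`.

    pred_conf[t] maps each candidate predecessor X_i of tag t to its
    confidence C_i; the key None holds C_0 (t has no predecessor).
    """
    complete = []
    pending = [[start]]  # incomplete sequences; element 0 is the current head
    while pending:
        seq = pending.pop()
        head = seq[0]
        # Assumption: tags already in the sequence are skipped to avoid cycles
        candidates = {x: c for x, c in pred_conf.get(head, {}).items() if x not in seq}
        if not candidates:
            complete.append(seq)  # nothing precedes the head: sequence is maximal
            continue
        best = max(candidates.values())
        winners = [x for x, c in candidates.items() if math.isclose(c, best)]
        # One winner extends the sequence; k tied winners duplicate it k times
        for x in winners:
            if x is None:
                complete.append(seq)  # C_0 is largest: head is the first element
            else:
                pending.append([x] + seq)
    return complete


if __name__ == "__main__":
    pred_conf = {
        "body": {"abstract": 0.6, "author": 0.3, None: 0.1},
        "abstract": {"author": 0.5, "title": 0.5, None: 0.2},
        "author": {"title": 0.9, None: 0.1},
        "title": {None: 1.0},
    }
    for seq in backward_sequences("body", pred_conf):
        print(" -> ".join(seq))
```

In the example, `abstract` has two equally likely predecessors, so the sequence ending in `body` splits into `title -> abstract -> body` and `title -> author -> abstract -> body`.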

Probabilistic DTD (11/11)

References

The Semantic Web—on the respective Roles of XML and RDF

– Stefan Decker, Frank van Harmelen, Jeen Broekstra, Michael Erdmann, Dieter Fensel, Ian Horrocks, Michel Klein, Sergey Melnik

Intelligent Information Agent with Ontology on the Semantic Web

– Weihua Li

Ontology Languages for the Semantic Web

– Asuncion Gomez-Perez, Oscar Corcho

Extraction of Semantic XML DTDs from Texts Using Data Mining Techniques

– Karsten Winkler, Myra Spiliopoulou