object orientation in xml dtd & schema

Post on 14-Jan-2016


Object Orientation in XML DTD & Schema

Dunam Kim / Jongdae Han

IDB Lab. / SE Lab.

SNU CSE

April 25, 2007

Contents

Background
Text Processing & Storage
Markup Languages
XML
DTD & Schema for XML
SOAP: an Application for XML Schema
Demonstration
Conclusion

Background

A need arose to process large, complex text data

A kind of ‘language’ was needed to describe such complicated text


Text Processing & Storage

Primitive: a long, simple string
  Too primitive
  Lacks semantic information
  Not machine-intuitive

Title<cr>This text is a sample.<eof>

Advanced: organized structure

Title
  This entity is the title of the article.
  It should be a common string, with a maximum length of 10.

Serialization(1)

The process of saving an object onto a storage medium or transmitting it over a network
Also called deflating or marshalling
Supported in Ruby, Smalltalk, Python, ObjC, Java, .NET
Example in ObjC

Sender: 1089356705

GNU TypedStream 1D@îC¡

Receiver: received 1089356705
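The sender/receiver round trip can be sketched in Python as well. This is only an illustration: it uses Python's standard `pickle` module as a stand-in for the GNU TypedStream format of the ObjC example, and the variable names are made up for this sketch.

```python
import pickle

# The integer from the slide's sender/receiver example.
value = 1089356705

# Sender side: turn the object into a byte stream (marshalling).
payload = pickle.dumps(value)

# Receiver side: rebuild an equal object from the bytes.
received = pickle.loads(payload)
print("received", received)
```

The byte stream is compact but opaque, which is the point the next slide makes about serialization being efficient yet unstructured.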

Serialization(2)

Simple, non-structured
Focuses on efficiency
Not applicable to long text documents
  How can we find a certain phrase in a 5 MB document?

Indexed Text(1)

Inspired by RDB
Increased search speed
Syntactic rather than semantic resolution

His clothing collapsed in a heap. She did not see it, seeing only the naked man who stood before the chair in which the new President had sat.

Chapter 1. the beginning.

So tall was he that his head nearly brushed the ceiling; and so glorious was he that one felt that the ceiling had risen so that his head would not brush it.

Go “chapter 1”
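A minimal sketch of such an index, assuming headings have the form "Chapter N." at the start of a line. The function names `build_index` and `go` are hypothetical and only mirror the “Go "chapter 1"” command on the slide.

```python
import re

text = (
    "His clothing collapsed in a heap.\n"
    "Chapter 1. the beginning.\n"
    "So tall was he that his head nearly brushed the ceiling.\n"
)

def build_index(document):
    """Map each 'chapter N' heading to its character offset."""
    index = {}
    for match in re.finditer(r"^Chapter (\d+)\.", document, flags=re.MULTILINE):
        index["chapter " + match.group(1)] = match.start()
    return index

def go(document, index, key):
    """Jump to an indexed heading and return the line found there."""
    start = index[key.lower()]
    end = document.find("\n", start)
    return document[start:end]

index = build_index(text)
line = go(text, index, "chapter 1")
print(line)
```

The lookup is fast, but the index only knows where a pattern of characters occurs. It has no idea what a chapter is, which is the "syntactic rather than semantic" limitation.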

Indexed Text(2)

Indexed Text(3)

No semantic information, again!

A Gazebo, 15th century, China

Picture taken by Kim, 2006 05 02

How can we index this?


Markup language

The telephone in the study rang ten minutes before the news came on. The new President picked it up and said hello.

"Mister President?"

Title of the Text is: Copperhead

Author of the Text is: Gene Wolfe

Markup language

Copperhead
By Gene Wolfe
The telephone in the stud

<b>Copperhead</b>
<I>Gene Wolfe</I>
<br>
The telephone in the stud

<Title>Copperhead
<Author>Gene Wolfe
<Contents>
The telephone in the stud
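The benefit of the descriptive form is that a program can ask for the title or the author by name. A small sketch with Python's standard `xml.etree.ElementTree`, assuming a well-formed XML variant of the slide's markup (the end tags and the `<Text>` wrapper element are additions made for this example):

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the slide's descriptive markup; the end tags and
# the <Text> wrapper are added here because XML requires them.
doc = """<Text>
  <Title>Copperhead</Title>
  <Author>Gene Wolfe</Author>
  <Contents>The telephone in the study rang ten minutes before the news came on.</Contents>
</Text>"""

root = ET.fromstring(doc)
title = root.findtext("Title")
author = root.findtext("Author")
print(title, "by", author)
```

With the presentational form (`<b>`, `<I>`), the same query would have to guess that bold text means "title".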

Early History of Markup language

GenCode
  William W. Tunnicliffe, 1967
  Gave a rough sketch of the “Markup Language”
troff/nroff
  Typesetting tools for Unix, early 1970s (descended from RUNOFF, mid-1960s)
TeX
  Publishing standard, 1978
GML
  Charles Goldfarb et al., 1969
Scribe
  Brian Reid, 1980

SGML

Standard Generalized Markup Language
Separates structure from presentation
Has a separate syntax for describing which tags are allowed, and where
Ancestor of HTML
Invented by Charles Goldfarb, 1970s

<QUOTE TYPE="example">

typically something like <ITALICS>this</ITALICS>

</QUOTE>

Cons of SGML

Standardized too late
  ISO 8879, 1986
Very complex, hard to learn
Cumbersome, as a side effect of flexibility
  e.g. the start-tag (or end-tag, or both) is sometimes optional
  Why? To save keystrokes

HTML

HyperText Markup Language
Tim Berners-Lee, 1993
Procedural and descriptive
A profile of SGML
  Simple, restricted format


XML

Developed by the World Wide Web Consortium (1998)
Focuses on a particular problem by simplifying SGML
  “The Internet Documents”
DTD also came with XML 1.0 (1998)
  Slightly different from the SGML DTD
XML Schema introduced (2001)
  W3C Recommendation

Characteristics of XML

Derived from SGML
  All XML documents are also SGML documents
  Availability of grammar-based validation (DTDs)
  Separation of contents and additional information about the contents (elements and attributes)
Improvements in XML
  Eliminates complexity
  Improves internationalization
  Can be parsed into a hierarchical structure

<?xml version="1.0" encoding="UTF-8"?>
<俄语>Данные</俄语>
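As a quick check of the internationalization claim, the sample with a Chinese element name and Russian content can be parsed with Python's standard `xml.etree.ElementTree`. This is a sketch; the document is passed as UTF-8 bytes so that it matches its own encoding declaration.

```python
import xml.etree.ElementTree as ET

# Encoded to UTF-8 bytes so the bytes agree with the encoding declaration.
data = '<?xml version="1.0" encoding="UTF-8"?><俄语>Данные</俄语>'.encode("utf-8")

root = ET.fromstring(data)
print(root.tag, root.text)
```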

Structured use of XML(1)

XML documents can be parsed into a hierarchical, tree-based diagram
Parsers follow DOM (tree-based) or SAX (event-based)

<?xml version="1.0" ?>
<Address>
  <city>Seoul</city>
  <street>Sejongro</street>
  <number>145</number>
</Address>
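The two parser styles can be compared on the Address sample. The sketch below uses Python's standard `xml.dom.minidom` and `xml.sax` modules; the closing `</Address>` tag is included so that the document is well-formed, and the `TagCollector` class is a name chosen for this example.

```python
import xml.dom.minidom
import xml.sax

doc = """<?xml version="1.0" ?>
<Address>
  <city>Seoul</city>
  <street>Sejongro</street>
  <number>145</number>
</Address>"""

# DOM: the whole document becomes an in-memory tree that can be walked.
dom = xml.dom.minidom.parseString(doc)
children = [node.tagName for node in dom.documentElement.childNodes
            if node.nodeType == node.ELEMENT_NODE]

# SAX: the parser reports events; nothing is kept unless the handler keeps it.
class TagCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)

collector = TagCollector()
xml.sax.parseString(doc.encode("utf-8"), collector)

print(children)
print(collector.tags)
```

DOM suits documents that fit in memory and need random access; SAX suits large documents that are read once from start to end.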

Structured use of XML(2)

Structured use of XML(3)


Reason why schema is required

It is impossible to recognize the structure of XML without metadata

A single XML file can’t cover every possible form

(Figure: three sample instances: a book with a title; a book with a title and an author; a book with a title and two authors)

Candidate content models:
book (title)
book (title, author)
book (title, author+)
book (title, author*)

Concept of XML Schema, DTD

XML Schema and DTD represent the structure of an XML document

Their main purpose is to validate XML

class : object
DB schema : DB instance
XML Schema, DTD : XML instance

DTD and XML Schema (1/6)

DTD (Document Type Definition)
  Adopted with the XML 1.0 proposal by W3C
  Unable to satisfy the requirements for data transfer
XML Schema
  Invented as an alternative schema language by W3C
  Requirements released in Feb 1999
  Adopted in May 2001

DTD and XML Schema (2/6)

DTD
  DTD constrains the structure of XML data
    What elements can occur
    What attributes an element can/must have
    What subelements can/must occur inside each element, and how many times
  DTD does not constrain data types
  DTD syntax
    <!ELEMENT element (subelements-specification) >
    <!ATTLIST element (attributes) >

DTD and XML Schema (3/6)

DTD (Cont.)
  Subelements can be specified as
    names of elements, or
    #PCDATA (parsed character data), i.e., character strings
    EMPTY (no subelements) or ANY (anything can be a subelement)
  Subelement specification may have regular expressions
    <!ELEMENT library ( ( book | magazine | newspaper)+) >
    Notation:
      “|” - alternatives
      “+” - 1 or more occurrences
      “*” - 0 or more occurrences
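Because a content model is a regular expression over child element names, the idea can be illustrated with an ordinary regex. The sketch below is not a DTD validator: the pattern is translated by hand from the `library` declaration, and `children_match` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated from <!ELEMENT library ((book | magazine | newspaper)+)>.
# Each child name is followed by a space so names cannot run together.
LIBRARY_MODEL = re.compile(r"(?:(?:book|magazine|newspaper) )+")

def children_match(xml_text, model):
    """Return True if the root's child-element sequence fits the model."""
    root = ET.fromstring(xml_text)
    sequence = "".join(child.tag + " " for child in root)
    return model.fullmatch(sequence) is not None

ok = children_match("<library><book/><magazine/><book/></library>", LIBRARY_MODEL)
empty = children_match("<library/>", LIBRARY_MODEL)
wrong = children_match("<library><book/><dvd/></library>", LIBRARY_MODEL)
print(ok, empty, wrong)
```

The empty library fails because "+" demands at least one child; with "*" in its place it would pass.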

DTD and XML Schema (4/6)

XML sample

<?xml version = "1.0"?>
<address> <!--(street , city , zip)-->
  <street>Jongro</street>
  <city>Seoul</city>
  <zip>123456</zip>
</address>

DTD sample

<?xml version='1.0' encoding='UTF-8' ?>
<!ELEMENT address (street , city , zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT zip (#PCDATA)>

(Figure: the address element has the content model (street, city, zip); street, city and zip are #PCDATA with the values Jongro, Seoul and 123456)

DTD and XML Schema (5/6)

XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs
  Typing of values
    E.g. integer, string, etc.
    Also, constraints on min/max values
  User-defined, complex types
  Many more features, including uniqueness and foreign key constraints, inheritance
XML Schema is itself specified in XML syntax, unlike DTDs
XML Schema is integrated with namespaces

DTD and XML Schema (6/6)

XML Schema sample

<?xml version='1.0' encoding='UTF-8' ?>
<xs:schema xmlns:xs = "http://www.w3.org/2001/XMLSchema">
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="zip" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
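Since the schema is itself XML, an ordinary XML parser can read the declared types. The sketch below is a rough illustration and not a schema validator: it only checks that values declared as `xs:int` look like integers, it matches the `xs:` prefix literally, and `declared_types` and `type_errors` are hypothetical helpers. A real project would use a validating library.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

schema = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="zip" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

def declared_types(schema_text):
    """Collect name -> type for element declarations that carry a type."""
    root = ET.fromstring(schema_text)
    return {el.get("name"): el.get("type")
            for el in root.iter(XS + "element") if el.get("type")}

def type_errors(instance_text, types):
    """List children whose text is not an integer although declared xs:int."""
    errors = []
    for child in ET.fromstring(instance_text):
        if types.get(child.tag) == "xs:int":
            try:
                int(child.text)
            except (TypeError, ValueError):
                errors.append(child.tag)
    return errors

types = declared_types(schema)
good = "<address><street>Jongro</street><city>Seoul</city><zip>123456</zip></address>"
bad = "<address><street>Jongro</street><city>Seoul</city><zip>12-3456</zip></address>"
print(type_errors(good, types), type_errors(bad, types))
```

A DTD could not express this check at all, because `zip` would be plain #PCDATA.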

OO Concepts in DTD, Schema (1/8)

Complex Data in DTD
  An element can have elements as child nodes
  Child elements can also have elements as child nodes

(Figure: contact has the content model (name, address); name is (first, last); address is (street, city, zip); every leaf is #PCDATA)

OO Concepts in DTD, Schema (2/8)

Complex Data in DTD (Cont.)

<?xml version='1.0' encoding='UTF-8' ?>
<!ELEMENT contact (name, address)>

<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>

<!ELEMENT address (street , city , zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT zip (#PCDATA)>

OO Concepts in DTD, Schema (3/8)

Complex Data in XML Schema
  Separation of element and complex type
  Sharing of one type among several elements
    Named complexType
    Unnamed complexType

OO Concepts in DTD, Schema (4/8)

Complex Data in XML Schema (Cont.)

<?xml version='1.0' encoding='UTF-8' ?>
<xs:schema xmlns:xs = "http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="nameType" />
        <xs:element name="address" type="addressType" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="addressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string" />
      <xs:element name="city" type="xs:string" />
      <xs:element name="zip" type="xs:int" />
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="nameType">
    <xs:sequence>
      <xs:element name="first" type="xs:string" />
      <xs:element name="last" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
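The class/object analogy from the earlier slide can be made concrete: each named complexType corresponds to a class, and the `contact` element to an object composed of them. A sketch using Python dataclasses; the class names mirror the schema, while the sample values are chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class NameType:          # mirrors <xs:complexType name="nameType">
    first: str
    last: str

@dataclass
class AddressType:       # mirrors <xs:complexType name="addressType">
    street: str
    city: str
    zip: int

@dataclass
class Contact:           # mirrors <xs:element name="contact">
    name: NameType
    address: AddressType

contact = Contact(NameType("Dunam", "Kim"), AddressType("Jongro", "Seoul", 123456))
print(contact.address.city)
```

Because `AddressType` has a name, any other class could reuse it, just as several elements can share one named complexType.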

OO Concepts in DTD, Schema (5/8)

Inheritance in DTD
  DTD emulates inheritance using parameter entities
  A parameter entity is similar to a ‘#define’ statement in C/C++
  Polymorphism is unavailable

<!-- override: Address's content = city, street, zip -->
<!-- must come before the default, because the first declaration of an entity wins -->
<!ENTITY % Address.extra ", zip">

<!-- default: define Address.extra as an empty string -->
<!ENTITY % Address.extra "">

<!-- Address's content = "city, street" + Address.extra -->
<!ELEMENT Address (city, street %Address.extra; )>

(Figure: address with (street, city, zip) next to address with (street, city))
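The macro-like behaviour of parameter entities can be imitated with plain text substitution. The sketch below is only an analogy, not a DTD processor; `expand` is a hypothetical helper. It follows the XML 1.0 rule that when an entity is declared more than once, the first declaration is binding, which is why an override has to be seen before the default.

```python
import re

def expand(declarations, template):
    """Expand %name; references, keeping the first declaration of each name."""
    entities = {}
    for name, value in declarations:
        entities.setdefault(name, value)   # later duplicates are ignored
    return re.sub(r"%([\w.]+);", lambda m: entities[m.group(1)], template)

template = "<!ELEMENT Address (city, street %Address.extra; )>"

# Only the default declaration: the content model stays (city, street).
base = expand([("Address.extra", "")], template)

# Override declared first, default second: the override wins.
extended = expand([("Address.extra", ", zip"), ("Address.extra", "")], template)
print(base)
print(extended)
```

The result is a second, textually different declaration of `Address`. Nothing records that one is derived from the other, so there is no polymorphism.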

OO Concepts in DTD, Schema (6/8)

Inheritance in XML Schema
  XML Schema supports inheritance naturally
  Polymorphism is available with the ‘substitution group’ feature
  Extension and restriction options are available

<xs:complexType name="USA_addressType">
  <xs:complexContent>
    <xs:extension base="addressType">
      <xs:sequence>
        <xs:element name="zip" type="xs:int" />
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
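Type extension behaves like subclassing in an object-oriented language. A sketch of the analogy in Python, assuming a base `addressType` that holds only street and city; the sample address values are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class AddressType:                    # base type (assumed: street and city only)
    street: str
    city: str

@dataclass
class USAAddressType(AddressType):    # like <xs:extension base="addressType">
    zip: int

def city_of(address: AddressType) -> str:
    """Accepts the base type, so any derived type can be passed too."""
    return address.city

usa = USAAddressType("Main St", "Springfield", 12345)   # made-up sample values
print(city_of(usa), isinstance(usa, AddressType))
```

The derived value is accepted wherever the base type is expected. In XML Schema, the comparable polymorphic behaviour is what substitution groups provide.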

OO Concepts in DTD, Schema (7/8)

Object identity in DTD
  DTD implements object identity using ID, IDREF
  All ID values in an XML document share one unique index
  Performance is poor because of this one big unique index

<?xml version="1.0" ?>
<books>
  <book id="b1" authorref="a1">
    <title>Database Concepts</title>
  </book>
  <book id="b2" authorref="a2">
    <title>Operating Systems</title>
  </book>
  <author id="a1">Korth</author>
  <author id="a2">Ullman</author>
</books>

(Figure: book has the attributes id and authorref and the content (title); author has the attribute id and the content (#PCDATA))
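The single document-wide index can be sketched as one dictionary that holds every id, regardless of element type. This is an illustration with Python's standard `xml.etree.ElementTree`, which does not itself know about ID/IDREF; the lookup table is built by hand.

```python
import xml.etree.ElementTree as ET

doc = """<books>
  <book id="b1" authorref="a1"><title>Database Concepts</title></book>
  <book id="b2" authorref="a2"><title>Operating Systems</title></book>
  <author id="a1">Korth</author>
  <author id="a2">Ullman</author>
</books>"""

root = ET.fromstring(doc)

# One document-wide table for every id, whatever the element type.
by_id = {}
for element in root.iter():
    key = element.get("id")
    if key is not None:
        if key in by_id:
            raise ValueError("duplicate ID: " + key)
        by_id[key] = element

# Following an IDREF is a lookup in that single table.
authors = {book.findtext("title"): by_id[book.get("authorref")].text
           for book in root.iter("book")}
print(authors)
```

Books and authors compete for the same id space, so a book and an author could never both be called "1".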

OO Concepts in DTD, Schema (8/8)

Object identity in XML Schema
  Key in XML Schema is designed to support keys as in an RDB
  There can be various keys with different scopes in an XML document
  Several elements may build up one key

<?xml version="1.0" ?>
<tables>
  <table1>
    <row id="1" field1="value1" field2="value2" />
    <row id="2" field1="value1" field2="value2" />
  </table1>
  <table2>
    <row id="1" field1="value1" field2="value2" />
  </table2>
</tables>
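Scoped keys can be illustrated by checking uniqueness per table instead of per document. The sketch below is a hand-written check, not XML Schema's own key mechanism; `duplicate_keys` is a hypothetical helper, and its `fields` list stands for the parts that build up one key.

```python
import xml.etree.ElementTree as ET

doc = """<tables>
  <table1>
    <row id="1" field1="value1" field2="value2"/>
    <row id="2" field1="value1" field2="value2"/>
  </table1>
  <table2>
    <row id="1" field1="value1" field2="value2"/>
  </table2>
</tables>"""

def duplicate_keys(scope, fields):
    """Return key tuples that occur more than once among the scope's rows."""
    seen, duplicates = set(), []
    for row in scope.iter("row"):
        key = tuple(row.get(field) for field in fields)
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates

root = ET.fromstring(doc)
per_table = {table.tag: duplicate_keys(table, ["id"]) for table in root}
whole_document = duplicate_keys(root, ["id"])
print(per_table, whole_document)
```

With a per-table scope, id "1" may appear in both tables. With a document-wide scope, as with DTD IDs, the same data would be rejected.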

DTD vs XML Schema

                | DTD | XML Schema
data type       | Can’t describe data types | Supports data types (int, string, complex type); easy to express objects in an OOPL or data in an RDB
language        | Special kind of language, e.g. <!ELEMENT city (#PCDATA)> | A kind of XML; DOM, XPath and XSLT can be used
inheritance     | Partial support | Support for inheritance and polymorphism
object identity | Unique attribute for selecting an element | Support for keys as in an RDB


SOAP : Application of XML Schema

Platform-dependence | Platform-independence
RPC (COM, CORBA) | SOAP
IDL | WSDL
Binary data | XML
CBD | SOA

SOA
  SOA has many advantages compared to CBD
  The benefits come from XML and XML Schema

SOAP: Application of XML Schema

SOAP (Simple Object Access Protocol)
  Remote procedure call protocol for exchanging objects
UDDI (registry of Web services)
WSDL (Web Services Description Language)

(Figure: Web service provider, UDDI registry and Web service consumer)

1. Build web service

2. Register web service

3. Discover web service

4. Get WSDL

5. Build proxy and client

6. Call Web service (SOAP)
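Step 6 sends an XML message. A minimal sketch of building such a request with Python's standard `xml.etree.ElementTree`, assuming the SOAP 1.1 envelope namespace; the service namespace `http://example.com/address`, the method name `GetAddress` and its `city` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 namespace
SERVICE_NS = "http://example.com/address"                 # hypothetical service

def build_request(method, params):
    """Wrap a method call and its parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, "{%s}%s" % (SERVICE_NS, method))
    for name, value in params.items():
        ET.SubElement(call, "{%s}%s" % (SERVICE_NS, name)).text = str(value)
    return envelope

request = build_request("GetAddress", {"city": "Seoul"})
message = ET.tostring(request, encoding="unicode")
print(message)
```

In practice the proxy built in step 5 generates this message from the WSDL, so client code rarely constructs envelopes by hand.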

SOAP: Application of XML Schema

WSDL (Web Services Description Language)
  WSDL specifies
    names of methods
    names and data types of parameters
    data types of return values
    exceptions which can be thrown
    URL of the Web service
  Data types are defined using XML Schema
    platform-independent
    machine-understandable

SOAP: Application of XML Schema

Sample of WSDL

(Figure: WSDL sample with its XML Schema part marked)

Conclusion

Recent advances in CS require applications to process large text data

XML overcomes the drawbacks of previous description languages

DTD was introduced with XML to describe the structure of XML documents

XML Schema enhances XML with the OO paradigm

