Object Orientation in XML DTD & Schema Dunam Kim / Jongdae Han IDB Lab. / SE Lab. SNU CSE April 25, 2007

Page 1: Object Orientation  in XML DTD & Schema

Object Orientation in XML DTD & Schema

Dunam Kim / Jongdae Han

IDB Lab. / SE Lab.

SNU CSE

April 25, 2007

Page 2: Object Orientation  in XML DTD & Schema

Contents

Background

Text Processing & Storage

Markup Languages

XML

DTD & Schema for XML

SOAP: an Application for XML Schema

Demonstration

Conclusion

Page 3: Object Orientation  in XML DTD & Schema

Background

A need arose to process large, complex text data

Some kind of ‘language’ was needed to describe such complicated text


Page 5: Object Orientation  in XML DTD & Schema

Text Processing & Storage

Primitive: a long, simple sequence of characters

Too primitive

Lacks semantic information

Not easy for machines to interpret

Advanced: an organized structure

Title<cr>This text is a sample.<eof>

Title

This entity is the title of the article.

It should be a plain string with a maximum length of 10.

Page 6: Object Orientation  in XML DTD & Schema

Serialization(1)

Supported by Ruby, Smalltalk, Python, Objective-C, Java, .NET

The process of saving an object onto a storage medium or transmitting it over a network

Also called deflating or marshalling

Example in Objective-C

Sender

GNU TypedStream 1D@îC¡

Receiver

received 1089356705

1089356705
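The sender/receiver round trip above can be sketched in Python. This is a minimal illustration using the standard `pickle` module, not the Objective-C TypedStream mechanism from the slide; the only value taken from the slide is the integer.

```python
import pickle

# Sender side: turn an in-memory object into an opaque byte stream.
value = 1089356705
payload = pickle.dumps(value)

# Receiver side: rebuild an equal object from the bytes.
# Only unpickle data from a trusted source.
received = pickle.loads(payload)

print(type(payload).__name__, len(payload), "bytes on the wire")
print("received", received)
```

The payload is compact but opaque: nothing in the bytes says what the number means, which is the limitation the next slide points out.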

Page 7: Object Orientation  in XML DTD & Schema

Serialization(2)

Simple, non-structured

Focuses on efficiency

Not applicable to long text documents

How can we find a certain phrase in a 5 MB document?

Page 8: Object Orientation  in XML DTD & Schema

Indexed Text(1)

Inspired by RDB

Increased search speed

Resolves syntax rather than semantics

His clothing collapsed in a heap. She did not see it, seeing only the naked man who stood before the chair in which the new President had sat.

Chapter 1. the beginning.

So tall was he that his head nearly brushed the ceiling; and so glorious was he that one felt that the ceiling had risen so that his head would not brush it.

Go “chapter 1”
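A "Go chapter 1" lookup of this kind can be sketched as an index from heading labels to character offsets. The function names and the second chapter heading below are hypothetical, added only so the index has more than one entry.

```python
import re

TEXT = (
    "Chapter 1. the beginning.\n"
    "So tall was he that his head nearly brushed the ceiling.\n"
    "Chapter 2. (hypothetical second chapter)\n"
    "His clothing collapsed in a heap.\n"
)


def build_chapter_index(text):
    """Map each 'chapter N' label to the offset where its heading starts."""
    pattern = re.compile(r"^(Chapter \d+)\.", flags=re.MULTILINE)
    return {m.group(1).lower(): m.start() for m in pattern.finditer(text)}


def go(text, index, label):
    """Jump to an indexed label and return the line found there."""
    start = index[label.lower()]
    return text[start:].splitlines()[0]


index = build_chapter_index(TEXT)
print(go(TEXT, index, "chapter 1"))
```

The index speeds up navigation, but it only records where a pattern occurs. It knows nothing about what the text means.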

Page 9: Object Orientation  in XML DTD & Schema

Indexed Text(2)

Page 10: Object Orientation  in XML DTD & Schema

Indexed Text(3)

No semantic information, again!

A Gazebo, 15th century, China

Picture taken by Kim, 2006 05 02

How can we index this?


Page 12: Object Orientation  in XML DTD & Schema

Markup language

The telephone in the study rang ten minutes before the news came on. The new President picked it up and said hello.

"Mister President?"

Title of the Text is : Copperhead

Author of the Text is : Gene Wolfe

Page 13: Object Orientation  in XML DTD & Schema

Markup language

Copperhead

By Gene Wolfe

The telephone in the stud

<b>Copperhead</b>
<I>Gene Wolfe</I>
<br>
The telephone in the stud

<Title>Copperhead
<Author>Gene Wolfe
<Contents>
The telephone in the stud

Page 14: Object Orientation  in XML DTD & Schema

Early History of Markup language

GenCode: William W. Tunnicliffe, 1967; gave a rough sketch of the “markup language” idea

troff/nroff: typesetting tools for Unix, early 1970s

TeX: typesetting system by Donald Knuth and de facto publishing standard, 1978

Scribe: Brian Reid, late 1970s

Page 15: Object Orientation  in XML DTD & Schema

SGML

Standard Generalized Markup Language

Separates structure from presentation

Has a separate syntax for describing which tags are allowed, and where

Ancestor of HTML

Invented by Charles Goldfarb, 1970s

<QUOTE TYPE="example">

typically something like <ITALICS>this</ITALICS>

</QUOTE>

Page 16: Object Orientation  in XML DTD & Schema

Cons of SGML

Standardized too late: ISO 8879, 1986

Very complex, hard to learn

Cumbersome, as a side effect of flexibility

e.g. the start-tag (or end-tag, or both) is sometimes optional -> why? To save keystrokes

Page 17: Object Orientation  in XML DTD & Schema

HTML

HyperText Markup Language

Tim Berners-Lee, 1993

Both procedural and descriptive

An SGML application (a restricted profile of SGML)

Simple, restricted format


Page 19: Object Orientation  in XML DTD & Schema

XML

Developed by the World Wide Web Consortium (1998)

Focuses on one particular problem, “Internet documents”, by simplifying SGML

DTDs came along with XML 1.0 (1998), slightly different from SGML DTDs

XML Schema introduced (2001) as a W3C Recommendation

Page 20: Object Orientation  in XML DTD & Schema

Characteristics of XML

Derived from SGML

All XML documents are also SGML documents

Grammar-based validation is available (DTDs)

Separation of content from additional information about the content (elements and attributes)

Improvements in XML

Eliminates complexity

Improves internationalization

Can be parsed into a hierarchical structure

<?xml version="1.0" encoding="UTF-8"?>
<俄语>Данные</俄语>
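To illustrate the internationalization point, the sketch below parses the slide's sample with Python's standard `xml.etree.ElementTree`. It assumes the element name is written without the stray spaces after `<` and `</`, since those would make the document ill-formed.

```python
import xml.etree.ElementTree as ET

# Element names and content may use non-ASCII characters.
DOC = '<?xml version="1.0" encoding="UTF-8"?><俄语>Данные</俄语>'

# Parse from bytes so the encoding declaration is honoured by the parser.
root = ET.fromstring(DOC.encode("utf-8"))

print(root.tag, "->", root.text)
```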

Page 21: Object Orientation  in XML DTD & Schema

Structured use of XML(1)

XML documents can be parsed into a hierarchical (tree) structure

Parsers follow DOM (tree-based) or SAX (event-based)

<?xml version="1.0" ?>
<Address>
  <city>Seoul</city>
  <street>Sejongro</street>
  <number>145</number>
</Address>
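The two parser styles can be compared on the Address sample with Python's standard `xml.dom.minidom` and `xml.sax` modules. This is a minimal sketch: it assumes the sample is closed with `</Address>`, and the `TagCollector` class is a hypothetical name.

```python
import xml.sax
from xml.dom import minidom

ADDRESS_XML = """<?xml version="1.0" ?>
<Address>
  <city>Seoul</city>
  <street>Sejongro</street>
  <number>145</number>
</Address>"""

# DOM: the whole document becomes an in-memory tree that can be navigated.
dom = minidom.parseString(ADDRESS_XML)
root = dom.documentElement
dom_children = [
    node.tagName for node in root.childNodes
    if node.nodeType == node.ELEMENT_NODE
]


# SAX: the parser streams start/end events to a handler and builds no tree.
class TagCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


collector = TagCollector()
xml.sax.parseString(ADDRESS_XML.encode("utf-8"), collector)

print(root.tagName, dom_children)
print(collector.events)
```

DOM is convenient when the document fits in memory and needs random access. SAX suits large documents that are read once from start to end.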

Page 22: Object Orientation  in XML DTD & Schema

Structured use of XML(2)

Page 23: Object Orientation  in XML DTD & Schema

Structured use of XML(3)


Page 25: Object Orientation  in XML DTD & Schema

Reason why schema is required

The structure of XML cannot be recognized without metadata

A single XML file cannot cover every possible form

Instances: book (title, author); book (title); book (title, author, author)

Candidate content models:

book (title, author*)

book (title, author)

book (title)

book (title, author+)

Page 26: Object Orientation  in XML DTD & Schema

Concept of XML Schema, DTD

XML Schema and DTD describe the structure of an XML document

Their main purpose is to validate XML

class : object

DB schema : DB instance

XML Schema, DTD : XML instance

Page 27: Object Orientation  in XML DTD & Schema

DTD and XML Schema (1/6)

DTD (Document Type Definition)

Adopted with the XML 1.0 proposal by W3C

Unable to satisfy the requirements for data transfer

XML Schema

Created by W3C as an alternative schema language

Requirements published in February 1999

Adopted in May 2001

Page 28: Object Orientation  in XML DTD & Schema

DTD and XML Schema (2/6)

DTD

A DTD constrains the structure of XML data

What elements can occur

What attributes an element can/must have

What subelements can/must occur inside each element, and how many times

A DTD does not constrain data types

DTD syntax

<!ELEMENT element (subelements-specification) >
<!ATTLIST element (attributes) >

Page 29: Object Orientation  in XML DTD & Schema

DTD and XML Schema (3/6)

DTD (Cont.)

Subelements can be specified as

names of elements, or

#PCDATA (parsed character data), i.e., character strings

EMPTY (no subelements) or ANY (anything can be a subelement)

A subelement specification may use regular expressions

<!ELEMENT library ( ( book | magazine | newspaper)+) >

Notation:

“|” - alternatives

“+” - 1 or more occurrences

“*” - 0 or more occurrences
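Because the notation is regular-expression-like, a content model can be checked against a list of child element names with an ordinary regex. The sketch below is a simplified illustration, not a DTD validator: it handles element names, `,`, `|`, `+`, `*`, `?` and parentheses only, and both function names are hypothetical.

```python
import re


def content_model_to_regex(model: str) -> re.Pattern:
    """Translate a DTD content model into a regex over 'name ' tokens."""
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[()|,*+?]", model):
        if token == "(":
            parts.append("(?:")
        elif token == ",":
            continue  # a sequence is plain concatenation in a regex
        elif token in {")", "|", "*", "+", "?"}:
            parts.append(token)
        else:
            # one child element, encoded as its name plus a separator
            parts.append(f"(?:{re.escape(token)} )")
    return re.compile("".join(parts))


def matches(model: str, children: list) -> bool:
    """True if the sequence of child names satisfies the content model."""
    encoded = "".join(f"{name} " for name in children)
    return content_model_to_regex(model).fullmatch(encoded) is not None


LIBRARY = "( ( book | magazine | newspaper)+)"
print(matches(LIBRARY, ["book", "magazine"]))
print(matches(LIBRARY, []))
print(matches("(title, author*)", ["title", "author", "author"]))
```

Applied to the book models from the earlier slide, only `(title, author*)` accepts a book with zero, one, or two authors.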

Page 30: Object Orientation  in XML DTD & Schema

DTD and XML Schema (4/6)

XML sample

<?xml version="1.0"?>
<address>
  <!--(street , city , zip)-->
  <street>Jongro</street>
  <city>Seoul</city>
  <zip>123456</zip>
</address>

DTD sample

<?xml version='1.0' encoding='UTF-8' ?>
<!ELEMENT address (street , city , zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT zip (#PCDATA)>

Diagram: address (street, city, zip); street, city and zip are each #PCDATA; instance values Jongro, Seoul, 123456
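The element declarations can be read programmatically with Python's standard `xml.parsers.expat`, which reports each `<!ELEMENT ...>` through `ElementDeclHandler`. In this sketch the DTD is placed in an internal subset (a `DOCTYPE` wrapper the slide does not show) so that a single string holds both the DTD and the instance. Note that expat reports the declarations but does not validate the instance against them.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE address [
  <!ELEMENT address (street, city, zip)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT zip (#PCDATA)>
]>
<address>
  <street>Jongro</street>
  <city>Seoul</city>
  <zip>123456</zip>
</address>
"""

declared = {}


def on_element_decl(name, model):
    # model is a tuple (type, quantifier, name, children);
    # every child has the same shape, so child[2] is the child's name.
    declared[name] = [child[2] for child in model[3]]


parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(DOC, True)

for element, children in declared.items():
    print(element, "->", children or "#PCDATA")
```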

Page 31: Object Orientation  in XML DTD & Schema

DTD and XML Schema (5/6)

XML Schema is a more sophisticated schema language that addresses the drawbacks of DTDs

Typing of values

E.g. integer, string, etc.

Also, constraints on min/max values

User-defined, complex types

Many more features, including uniqueness and foreign key constraints, inheritance

XML Schema is itself specified in XML syntax, unlike DTDs

XML Schema is integrated with namespaces

Page 32: Object Orientation  in XML DTD & Schema

DTD and XML Schema (6/6)

XML Schema sample

<?xml version='1.0' encoding='UTF-8' ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="zip" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
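Because a schema is itself XML, ordinary XML tooling can read it. The sketch below loads the address schema with `xml.etree.ElementTree`, collects the declared types, and runs a deliberately crude type check on an instance. It is not a schema validator: the `CONVERTERS` table and `check_types` function are hypothetical helpers that cover only the two types used here, and Python's `int()` is only an approximation of `xs:int`.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="zip" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Hypothetical mapping from schema types to Python conversions.
CONVERTERS = {"xs:string": str, "xs:int": int}

schema = ET.fromstring(SCHEMA)
declared = {
    element.get("name"): element.get("type")
    for element in schema.iter(f"{XS}element")
    if element.get("type")
}


def check_types(instance_xml):
    """Return a list of problems found in the children of the root element."""
    problems = []
    for child in ET.fromstring(instance_xml):
        expected = declared.get(child.tag)
        if expected is None:
            problems.append(f"unexpected element <{child.tag}>")
            continue
        try:
            CONVERTERS[expected](child.text or "")
        except ValueError:
            problems.append(f"<{child.tag}> is not a valid {expected}")
    return problems


GOOD = "<address><street>Jongro</street><city>Seoul</city><zip>123456</zip></address>"
BAD = "<address><street>Jongro</street><city>Seoul</city><zip>abc</zip></address>"
print(declared)
print(check_types(GOOD), check_types(BAD))
```

A DTD could only say that `zip` contains character data. With the typed declaration, a non-numeric zip can be rejected.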

Page 33: Object Orientation  in XML DTD & Schema

OO Concepts in DTD, Schema (1/8)

Complex Data in DTD

An element can have elements as child nodes

Child elements can in turn have elements as child nodes

Diagram: contact (name, address); name (first, last), with first and last each #PCDATA; address (street, city, zip), with street, city and zip each #PCDATA

Page 34: Object Orientation  in XML DTD & Schema

OO Concepts in DTD, Schema (2/8)

Complex Data in DTD (Cont.)

<?xml version='1.0' encoding='UTF-8' ?>
<!ELEMENT contact (name, address)>

<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>

<!ELEMENT address (street , city , zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT zip (#PCDATA)>

Page 35: Object Orientation  in XML DTD & Schema

OO Concepts in DTD, Schema (3/8)

Complex Data in XML Schema

Separation of element and complex type

One type can be shared by several elements

Named complexType

Unnamed (anonymous) complexType

Page 36: Object Orientation  in XML DTD & Schema

OO Concepts in DTD, Schema (4/8)

Complex Data in XML Schema (Cont.)

<?xml version='1.0' encoding='UTF-8' ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="nameType" />
        <xs:element name="address" type="addressType" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="addressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string" />
      <xs:element name="city" type="xs:string" />
      <xs:element name="zip" type="xs:int" />
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="nameType">
    <xs:sequence>
      <xs:element name="first" type="xs:string" />
      <xs:element name="last" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
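The class-to-object analogy can be made concrete by mirroring each named complex type as a Python dataclass and building objects from an instance document. This is a hand-written sketch, not generated data binding. The class names follow the schema's type names, while `parse_contact` and the person's name in the sample are hypothetical.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class NameType:          # mirrors complexType "nameType"
    first: str
    last: str


@dataclass
class AddressType:       # mirrors complexType "addressType"
    street: str
    city: str
    zip: int             # xs:int becomes a Python int


@dataclass
class Contact:           # mirrors the anonymous type of element "contact"
    name: NameType
    address: AddressType


def parse_contact(xml_text):
    """Build a Contact object from a <contact> instance document."""
    root = ET.fromstring(xml_text)
    name = root.find("name")
    address = root.find("address")
    return Contact(
        name=NameType(name.findtext("first"), name.findtext("last")),
        address=AddressType(
            street=address.findtext("street"),
            city=address.findtext("city"),
            zip=int(address.findtext("zip")),
        ),
    )


SAMPLE = """<contact>
  <name><first>Gildong</first><last>Hong</last></name>
  <address><street>Jongro</street><city>Seoul</city><zip>123456</zip></address>
</contact>"""

contact = parse_contact(SAMPLE)
print(contact)
```

Named types such as `addressType` can be reused by any number of elements, just as one class can type many fields.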

Page 37: Object Orientation  in XML DTD & Schema

OO Concepts in DTD, Schema (5/8)

Inheritance in DTD

A DTD emulates inheritance with parameter entities

A parameter entity is similar to a ‘#define’ statement in C/C++

Polymorphism is unavailable

<!-- base: define Address.extra as an empty string -->
<!ENTITY % Address.extra "">

<!-- Address's content = "city, street" + Address.extra -->
<!ELEMENT Address (city, street %Address.extra; )>

<!-- override: Address's content = city, street, zip -->
<!ENTITY % Address.extra ", zip">

The override must be read before the base declaration (for example, in the document's internal subset), because the first declaration of an entity is the one that takes effect

Diagram: derived address (street, city, zip) with children street, city, zip; base address (street, city) with children street, city
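The ‘#define’-like behaviour can be simulated as plain text substitution. The sketch below is a simulation only, not an XML parser: `declare` and `expand` are hypothetical helpers, and the first-declaration-wins rule for entities is modelled with `dict.setdefault`.

```python
import re


def declare(entities, name, value):
    """Record an entity; a later declaration of the same name is ignored."""
    entities.setdefault(name, value)


def expand(text, entities):
    """Replace every %name; reference with the entity's replacement text."""
    return re.sub(r"%([\w.]+);", lambda m: entities[m.group(1)], text)


BASE_MODEL = "(city, street %Address.extra;)"

# Base DTD on its own: the extension point is empty.
base = {}
declare(base, "Address.extra", "")

# Derived document: the override is read first, so the default is ignored.
derived = {}
declare(derived, "Address.extra", ", zip")
declare(derived, "Address.extra", "")

print(expand(BASE_MODEL, base))
print(expand(BASE_MODEL, derived))
```

This is text substitution only. The derived Address is not recognized as a kind of the base Address, which is why polymorphism is unavailable.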

Page 38: Object Orientation  in XML DTD & Schema

OO Concepts in DTD, Schema (6/8)

Inheritance in XML Schema

XML Schema supports inheritance naturally

Polymorphism is available with the ‘substitution group’ feature

Extension and restriction options are available

<xs:complexType name="USA_addressType">
  <xs:complexContent>
    <xs:extension base="addressType">
      <xs:sequence>
        <xs:element name="zip" type="xs:int" />
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
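To see what extension means for the content of the derived type, the sketch below reads a schema with `xml.etree.ElementTree` and follows `xs:extension` back to its base. It assumes a base `addressType` holding only street and city (as in the DTD inheritance diagram), and `effective_elements` is a hypothetical helper that handles only sequences and extension.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="addressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="USA_addressType">
    <xs:complexContent>
      <xs:extension base="addressType">
        <xs:sequence>
          <xs:element name="zip" type="xs:int"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>"""

root = ET.fromstring(SCHEMA)
types = {t.get("name"): t for t in root.findall(f"{XS}complexType")}


def effective_elements(type_name):
    """List child element names of a type, inherited ones first."""
    ctype = types[type_name]
    extension = ctype.find(f"{XS}complexContent/{XS}extension")
    if extension is None:
        own = ctype.findall(f"{XS}sequence/{XS}element")
        return [e.get("name") for e in own]
    inherited = effective_elements(extension.get("base"))
    added = extension.findall(f"{XS}sequence/{XS}element")
    return inherited + [e.get("name") for e in added]


print(effective_elements("addressType"))
print(effective_elements("USA_addressType"))
```

Unlike the parameter-entity trick in a DTD, the base/derived relationship is recorded in the schema itself, so a processor can know that a `USA_addressType` is an `addressType`.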

Page 39: Object Orientation  in XML DTD & Schema

OO Concepts in DTD, Schema (7/8)

Object identity in DTD

A DTD implements object identity using ID and IDREF attributes

All IDs in an XML document share one unique index

Performance is poor because of this one big unique index

<?xml version="1.0" ?>
<books>
  <book id="b1" authorref="a1">
    <title>Database Concepts</title>
  </book>
  <book id="b2" authorref="a2">
    <title>Operating Systems</title>
  </book>
  <author id="a1">Korth</author>
  <author id="a2">Ullman</author>
</books>

book: attributes id, authorref; content (title)

author: attribute id; content (#PCDATA)
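Resolving an IDREF amounts to a lookup in one document-wide index. The sketch below builds that index by hand with `xml.etree.ElementTree`. It assumes `id` is declared as an ID attribute and `authorref` as an IDREF in the DTD, because ElementTree itself does not read DTD attribute types.

```python
import xml.etree.ElementTree as ET

BOOKS = """<books>
  <book id="b1" authorref="a1"><title>Database Concepts</title></book>
  <book id="b2" authorref="a2"><title>Operating Systems</title></book>
  <author id="a1">Korth</author>
  <author id="a2">Ullman</author>
</books>"""

root = ET.fromstring(BOOKS)

# One index for the whole document: book IDs and author IDs live together.
id_index = {}
for element in root.iter():
    ident = element.get("id")
    if ident is None:
        continue
    if ident in id_index:
        raise ValueError(f"duplicate ID {ident!r}")
    id_index[ident] = element


def author_of(book):
    """Follow the book's IDREF to the author element it points at."""
    return id_index[book.get("authorref")].text


authors = {book.findtext("title"): author_of(book) for book in root.findall("book")}
print(authors)
```

Nothing in the index says that `authorref` must point at an author rather than at another book. An IDREF may refer to any ID in the document.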

Page 40: Object Orientation  in XML DTD & Schema

OO Concepts in DTD, Schema (8/8)

Object identity in XML Schema

A key in XML Schema is designed to support the keys of an RDB

There can be several keys with different scopes in one XML document

Several elements may together make up one key

<?xml version="1.0" ?>
<tables>
  <table1>
    <row id="1" field1="value1" field2="value2"/>
    <row id="2" field1="value1" field2="value2"/>
  </table1>
  <table2>
    <row id="1" field1="value1" field2="value2"/>
  </table2>
</tables>
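The effect of scoping can be shown by checking uniqueness of `id` per table and then across the whole document. This sketch performs the check by hand in Python. In a real schema the scope would be expressed with `xs:key` (or `xs:unique`) declared on the table element, and the `duplicates` helper here is hypothetical.

```python
import xml.etree.ElementTree as ET

TABLES = """<tables>
  <table1>
    <row id="1" field1="value1" field2="value2"/>
    <row id="2" field1="value1" field2="value2"/>
  </table1>
  <table2>
    <row id="1" field1="value1" field2="value2"/>
  </table2>
</tables>"""

root = ET.fromstring(TABLES)


def duplicates(values):
    """Return the set of values that occur more than once."""
    seen, repeated = set(), set()
    for value in values:
        (repeated if value in seen else seen).add(value)
    return repeated


# Key scoped to each table: row ids only need to be unique within a table.
per_table = {
    table.tag: duplicates(row.get("id") for row in table.findall("row"))
    for table in root
}

# Document-wide scope, as with DTD IDs: the same ids now clash.
document_wide = duplicates(row.get("id") for row in root.iter("row"))

print(per_table)
print(document_wide)
```

With table-scoped keys the sample is consistent. Under a single document-wide index, the row with id 1 in `table2` would collide with the one in `table1`.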

Page 41: Object Orientation  in XML DTD & Schema

DTD vs XML Schema

Aspect | DTD | XML Schema

data type | Can’t describe data types | Supports data types (int, string, complex type); easy to express an object in an OOPL or data in an RDB

language | Special kind of language, e.g. <!ELEMENT city (#PCDATA)> | A kind of XML; DOM, XPath and XSLT can be used on it

inheritance | Partial support | Support for inheritance and polymorphism

object identity | Unique attribute for selecting an element | Support for keys as in an RDB


Page 43: Object Orientation  in XML DTD & Schema

SOAP : Application of XML Schema

Platform-dependent | Platform-independent

RPC (COM, CORBA) | SOAP

IDL | WSDL

Binary data | XML

CBD | SOA

SOA

SOA has many advantages compared to CBD (component-based development)

The benefits come from XML and XML Schema

Page 44: Object Orientation  in XML DTD & Schema

SOAP: Application of XML Schema

SOAP (Simple Object Access Protocol)

Remote procedure call protocol for exchanging objects

UDDI (registry of Web services)

WSDL (Web Services Description Language)

web service consumer

web service provider

UDDI registry

1. Build web service

2. Register web service

3. Discover web service

4. Get WSDL

5. Build proxy and client

6. Call Web service (SOAP)
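Step 6 sends an XML message wrapped in a SOAP envelope. The sketch below builds such a request with `xml.etree.ElementTree` and does not send it anywhere. The envelope namespace is the SOAP 1.1 one, while the service namespace, the operation name `GetAddress` and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "http://example.com/addressbook"           # hypothetical service


def build_request(operation, params):
    """Wrap an operation call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


request = build_request("GetAddress", {"first": "Gildong", "last": "Hong"})
print(request)
```

In practice the proxy built in step 5 produces this message from the WSDL, so client code calls an ordinary method instead of assembling XML by hand.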

Page 45: Object Orientation  in XML DTD & Schema

SOAP: Application of XML Schema

WSDL (Web Services Description Language)

WSDL specifies

names of methods

names and data types of parameters

data types of return values

exceptions which can be thrown

URL of the Web service

Data types are defined using XML Schema

platform-independent

machine-understandable

Page 46: Object Orientation  in XML DTD & Schema

SOAP: Application of XML Schema

Sample of WSDL (figure not reproduced; the slide labels its XML Schema part)

Page 47: Object Orientation  in XML DTD & Schema

Conclusion

Recent advances in computer science require applications to process large text

XML overcomes the drawbacks of earlier object description languages

DTD was introduced with XML to describe the structure of XML documents

XML Schema enhanced XML with the object-oriented paradigm

Page 48: Object Orientation  in XML DTD & Schema

Reference

Jon Duckett et al, Professional XML Schemas, WROX, 2001

Elliotte Rusty Harold, Effective XML: 50 Specific Ways to Improve Your XML, Addison Wesley, 2003

Russ Basiura et al, Professional ASP.NET Web Services, WROX, 2001

W3C, Extensible Markup Language (XML) 1.0, W3C, 2006

Brett McLaughlin et al., Java and XML, O’Reilly, 2006