Object Orientation in XML DTD & Schema
Dunam Kim / Jongdae Han
IDB Lab. / SE Lab.
SNU CSE
April 25, 2007
Contents
  Background
  Text Processing & Storage
  Markup Languages
  XML
  DTD & Schema for XML
  SOAP: an Application for XML Schema
  Demonstration
  Conclusion
Background
  A need arose to process large and complex text data
  Some kind of 'language' was needed to describe such complicated text
Text Processing & Storage
Primitive: a long, simple sequence of strings
  Too primitive
  Lacks semantic information
  Not machine-friendly
  Example: Title<cr>This text is a sample.<eof>
Advanced: an organized structure
  Example: Title
    This entity is the title of the article.
    It should be a plain string with a maximum length of 10.
Serialization(1)
  Available in Ruby, Smalltalk, Python, Objective-C, Java, .NET
  The process of saving an object onto a storage medium or transmitting it over a network
  Also called deflating or marshalling
  Example in Objective-C
    Sender: GNU TypedStream 1D@îC¡
    Receiver: received 1089356705
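As a rough illustration of the round trip above, the sketch below serializes the same integer with Python's standard `pickle` module. This is a swapped-in technique, not the Objective-C typed stream from the slide, and the variable names are ours.

```python
import pickle

value = 1089356705  # the integer shown on the receiver side above

# Sender: turn the object into a byte string that could be stored or sent.
payload = pickle.dumps(value)

# Receiver: rebuild an equal object from those bytes.
received = pickle.loads(payload)
print("received", received)
```

The payload is an opaque byte string, which is efficient to produce but carries no structure a reader could search or index.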
Serialization(2)
  Simple, non-structured
  Focuses on efficiency
  Not applicable to long text documents
    How can we find a certain phrase in a 5 MB document?
Indexed Text(1)
  Inspired by RDB
  Increased search speed
  Resolves syntax rather than semantics
  Example text:
    His clothing collapsed in a heap. She did not see it, seeing only the naked man who stood before the chair in which the new President had sat.
    Chapter 1. the beginning.
    So tall was he that his head nearly brushed the ceiling; and so glorious was he that one felt that the ceiling had risen so that his head would not brush it.
  Example command: Go "chapter 1"
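The idea of jumping to a chapter through an index can be sketched as follows. This is only a minimal illustration in Python, assuming headings of the form "Chapter N."; the function names and the shortened sample text are ours.

```python
import re

text = (
    "His clothing collapsed in a heap.\n"
    "Chapter 1. the beginning.\n"
    "So tall was he that his head nearly brushed the ceiling.\n"
)

def build_index(document):
    """Map each lower-cased 'chapter N' label to the offset where its heading starts."""
    index = {}
    for match in re.finditer(r"^(Chapter \d+)\.", document, flags=re.MULTILINE):
        index[match.group(1).lower()] = match.start()
    return index

def go(document, index, label):
    """Jump to a heading through the index and return that heading line."""
    start = index[label.lower()]
    end = document.find("\n", start)
    return document[start:end]

index = build_index(text)
print(go(text, index, "chapter 1"))
```

The index speeds up navigation, but it only knows where headings are, not what the text means.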
Indexed Text(2)
Indexed Text(3)
  No semantic information, again!
  Picture caption:
    A gazebo, 15th century, China
    Picture taken by Kim, 2006-05-02
  How can we index this?
Markup language
The telephone in the study rang ten minutes before the
news came on. The new President picked it up and said
hello.
"Mister President?"
Title of the Text is : Copperhead
Author of the Text is : Gene Wolfe
Markup language
Plain text:
Copperhead
By Gene Wolfe
The telephone in the stud
Presentational markup:
<b>Copperhead</b>
<I>Gene Wolfe</I>
<br>
The telephone in the stud
Descriptive markup:
<Title>Copperhead
<Author>Gene Wolfe
<Contents>
The telephone in the stud
Early History of Markup language
  GenCode: William W. Tunnicliffe, 1967; gave a rough sketch of the "markup language" idea
  troff/nroff: typesetting tools for Unix, descended from RUNOFF (mid-1960s)
  TeX: publishing standard, 1978
  GML: Charles Goldfarb, 1960s
  Scribe: Brian Reid, 1980
SGML
  Standard Generalized Markup Language
  Separates structure from presentation
  Has a separate syntax for describing which tags are allowed, and where
  Ancestor of HTML
  Invented by Charles Goldfarb, 1970s
<QUOTE TYPE="example">
typically something like <ITALICS>this</ITALICS>
</QUOTE>
Cons of SGML
  Standardized too late: ISO 8879, 1986
  Very complex, hard to learn
  Cumbersome, as a side effect of its flexibility
    e.g. the start-tag (or end-tag, or both) is sometimes optional. Why? To save keystrokes
HTML
  HyperText Markup Language
  Tim Berners-Lee, 1993
  Procedural and descriptive
  A profile of SGML
  Simple, restricted format
XML
  Developed by the World Wide Web Consortium (1998)
  Simplifies SGML to focus on one particular problem: "Internet documents"
  DTD was carried over with XML 1.0 (1998), slightly different from the SGML DTD
  XML Schema was introduced in 2001 as a W3C Recommendation
Characteristics of XML
Derived from SGML
  All XML documents are also SGML documents
  Grammar-based validation is available (DTDs)
  Separates the contents from additional information about the contents (elements and attributes)
Improvements in XML
  Eliminates complexity
  Improves internationalization
  Can be parsed into a hierarchical structure
<?xml version="1.0" encoding="UTF-8"?>
<俄语>Данные</俄语>
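To see the internationalization point in practice, the sketch below parses the sample with Python's standard `xml.etree.ElementTree`. It assumes the element is written without spaces inside the tags, since a space directly after `<` is not well-formed XML.

```python
import xml.etree.ElementTree as ET

document = '<?xml version="1.0" encoding="UTF-8"?><俄语>Данные</俄语>'

# Parse from bytes so the parser honors the declared UTF-8 encoding.
root = ET.fromstring(document.encode("utf-8"))
print(root.tag, root.text)
```

Both the element name and its content use non-ASCII scripts, and the parser treats them like any other name and text.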
Structured use of XML(1)
  XML documents can be parsed into a hierarchical, tree-based structure
  Parsers follow DOM (tree-based) or SAX (event-based)
<?xml version="1.0" ?>
<Address>
<city>Seoul</city>
<street>Sejongro</street>
<number>145</number>
</Address>
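The two parser styles can be compared on the Address sample with Python's standard `xml.dom.minidom` and `xml.sax` modules. This is a small sketch; the handler class name is ours, and the closing `</Address>` tag is assumed.

```python
import xml.sax
from xml.dom import minidom

document = """<?xml version="1.0" ?>
<Address>
  <city>Seoul</city>
  <street>Sejongro</street>
  <number>145</number>
</Address>"""

# DOM: the whole document becomes an in-memory tree that can be navigated.
dom = minidom.parseString(document)
city = dom.getElementsByTagName("city")[0].firstChild.data

# SAX: the parser reports events while reading; no tree is built.
class TagCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)

collector = TagCollector()
xml.sax.parseString(document.encode("utf-8"), collector)

print(city)
print(collector.tags)
```

DOM is convenient when the program needs random access to the tree; SAX keeps memory use low for large documents because nothing is retained unless the handler stores it.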
Structured use of XML(2)
Structured use of XML(3)
Reason why schema is required
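  The structure of an XML document cannot be recognized without metadata
  A single XML file cannot show every form the data may take
  Sample instances (figure):
    book with children: title, author
    book with children: title
    book with children: title, author, author
  Candidate structures:
    book (title, author*)
    book (title, author)
    book (title)
    book (title, author+)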
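The ambiguity can be made concrete with a short check. The sketch below writes each candidate content model by hand as a regular expression and tests it against three child sequences; the instances are our reading of the figure (one author, no author, two authors), so treat them as an assumption.

```python
import re

# Each content model written by hand as a regex over "tag," sequences.
models = {
    "book (title, author*)": r"title,(author,)*",
    "book (title, author)": r"title,author,",
    "book (title)": r"title,",
    "book (title, author+)": r"title,(author,)+",
}

# Child sequences of the three sample instances (assumed from the figure).
instances = [
    ["title", "author"],
    ["title"],
    ["title", "author", "author"],
]

def accepts(pattern, child_tags):
    sequence = "".join(tag + "," for tag in child_tags)
    return re.fullmatch(pattern, sequence) is not None

accepts_all = [
    model for model, pattern in models.items()
    if all(accepts(pattern, tags) for tags in instances)
]
print(accepts_all)
```

Any single instance is consistent with several models, which is why the structure has to be stated separately rather than guessed from one file.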
Concept of XML Schema, DTD
  XML Schema and DTD represent the structure of an XML document
  Their main purpose is to validate XML
  Analogy:
    class : object
    DB schema : DB instance
    XML Schema, DTD : XML instance
DTD and XML Schema (1/6)
DTD (Document Type Definition)
  Adopted with the XML 1.0 proposal by W3C
  Unable to satisfy the requirements for data transfer
XML Schema
  Invented by W3C as an alternative schema language
  Requirements were released in February 1999
  Adopted in May 2001
DTD and XML Schema (2/6)
DTD
  A DTD constrains the structure of XML data
    What elements can occur
    What attributes an element can or must have
    What subelements can or must occur inside each element, and how many times
  A DTD does not constrain data types
  DTD syntax
    <!ELEMENT element (subelements-specification) >
    <!ATTLIST element (attributes) >
DTD and XML Schema (3/6)
DTD (Cont.)
  Subelements can be specified as
    names of elements, or
    #PCDATA (parsed character data), i.e., character strings
    EMPTY (no subelements) or ANY (anything can be a subelement)
  A subelement specification may use regular expressions
    <!ELEMENT library ( ( book | magazine | newspaper)+) >
  Notation:
    "|" - alternatives
    "+" - 1 or more occurrences
    "*" - 0 or more occurrences
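Because the notation is so close to regular expressions, a content model can be translated almost token by token. The following is a minimal sketch in Python, not a real DTD validator: it handles element names, ",", "|", "+", "*", "?" and parentheses only (no #PCDATA, EMPTY or ANY), and the function names are ours.

```python
import re

def content_model_to_regex(spec):
    """Translate a DTD content model into a regex over 'tag ' sequences."""
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[()|,+*?]", spec):
        if token == ",":
            continue  # a sequence is plain concatenation in a regex
        if token in "()|+*?":
            parts.append(token)  # same meaning in regex syntax
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))

def is_valid(spec, child_tags):
    sequence = "".join(tag + " " for tag in child_tags)
    return content_model_to_regex(spec).fullmatch(sequence) is not None

library = "((book | magazine | newspaper)+)"
print(is_valid(library, ["book", "newspaper", "book"]))
print(is_valid(library, []))
print(is_valid(library, ["book", "cd"]))
```

The first call succeeds, while the empty library fails the "+" requirement and the third fails because `cd` is not one of the alternatives.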
DTD and XML Schema (4/6)
XML sample
<?xml version = "1.0"?>
<address> <!--(street , city , zip)-->
  <street>Jongro</street>
  <city>Seoul</city>
  <zip>123456</zip>
</address>
DTD sample
<?xml version='1.0' encoding='UTF-8' ?>
<!ELEMENT address (street , city , zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
Figure: the DTD tree, address (street, city, zip) with each child declared as #PCDATA, next to the instance tree, address with street "Jongro", city "Seoul", zip "123456".
DTD and XML Schema (5/6)
  XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs
    Typing of values
      E.g. integer, string, etc.
      Also, constraints on min/max values
    User-defined, complex types
    Many more features, including uniqueness and foreign key constraints, and inheritance
  XML Schema is itself specified in XML syntax, unlike DTDs
  XML Schema is integrated with namespaces
DTD and XML Schema (6/6)
XML Schema sample
<?xml version='1.0' encoding='UTF-8' ?>
<xs:schema xmlns:xs = "http://www.w3.org/2001/XMLSchema">
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="zip" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
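Since the schema is itself XML, an ordinary XML parser can read its declarations. The sketch below does that with Python's standard library and then applies a toy type check to an instance. It is only an illustration under strong assumptions: it understands `xs:string` and `xs:int` alone, ignores the 32-bit range of `xs:int`, and is not a substitute for a real schema validator.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

schema_text = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="zip" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

instance_text = (
    "<address><street>Jongro</street><city>Seoul</city>"
    "<zip>123456</zip></address>"
)

# Read the declared child elements and their types out of the schema.
schema = ET.fromstring(schema_text)
sequence = schema.find(f"{XS}element/{XS}complexType/{XS}sequence")
declared = [(e.get("name"), e.get("type")) for e in sequence.findall(f"{XS}element")]

# Toy type check: a stand-in for real validation.
converters = {"xs:string": str, "xs:int": int}

def check(instance_xml):
    children = list(ET.fromstring(instance_xml))
    if [c.tag for c in children] != [name for name, _ in declared]:
        return False
    try:
        for child, (_, type_name) in zip(children, declared):
            converters[type_name](child.text or "")
    except ValueError:
        return False
    return True

print(declared)
print(check(instance_text))
```

A DTD could express the order of the children but not that `zip` must be an integer; that extra check is what the typed schema adds.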
OO Concepts in DTD, Schema (1/8)
Complex Data in DTD
  An element can have elements as child nodes
  Child elements can also have elements as child nodes
Figure: contact (name, address); name (first, last), each #PCDATA; address (street, city, zip), each #PCDATA.
OO Concepts in DTD, Schema (2/8)
Complex Data in DTD (Cont.)
<?xml version='1.0' encoding='UTF-8' ?>
<!ELEMENT contact (name, address)>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT address (street , city , zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
OO Concepts in DTD, Schema (3/8)
Complex Data in XML Schema
  Elements and complex types are separated
  One type can be shared by several elements
  Named ComplexType
  Unnamed ComplexType
OO Concepts in DTD, Schema (4/8)
Complex Data in XML Schema (Cont.)
<?xml version='1.0' encoding='UTF-8' ?>
<xs:schema xmlns:xs = "http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="nameType" />
        <xs:element name="address" type="addressType" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="addressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string" />
      <xs:element name="city" type="xs:string" />
      <xs:element name="zip" type="xs:int" />
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="nameType">
    <xs:sequence>
      <xs:element name="first" type="xs:string" />
      <xs:element name="last" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
OO Concepts in DTD, Schema (5/8)
Inheritance in DTD
  DTD emulates inheritance using parameter entities
  A parameter entity is similar to a '#define' statement in C/C++ (textual substitution)
  Polymorphism is unavailable
<!-- default: define Address.extra as the empty string -->
<!ENTITY % Address.extra "">
<!-- Address's content = "city, street" + Address.extra -->
<!ELEMENT Address (city, street %Address.extra; )>
<!-- extension: Address's content = city, street, zip. The first declaration of an entity is the binding one, so this override must be read before the default above (for example, from the internal subset) -->
<!ENTITY % Address.extra ", zip">
Figure: the base address (street, city) and the extended address (street, city, zip).
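The '#define'-like behavior can be mimicked in a few lines. The sketch below is a plain-Python emulation of parameter entity expansion, not an XML parser; the helper names are ours. It models one rule of XML that matters here: when an entity is declared more than once, the first declaration is the one that counts, so an extension has to be declared before the default.

```python
import re

def declare(entities, name, value):
    """Record a parameter entity; as in XML, the first declaration wins."""
    entities.setdefault(name, value)

def expand(declaration, entities):
    """Replace each %name; reference with the entity's replacement text."""
    return re.sub(r"%([\w.]+);", lambda m: entities[m.group(1)], declaration)

template = "<!ELEMENT Address (city, street %Address.extra; )>"

# Base document type: only the default declaration is seen.
base = {}
declare(base, "Address.extra", "")

# Extended document type: the override is declared first, the default later.
extended = {}
declare(extended, "Address.extra", ", zip")
declare(extended, "Address.extra", "")  # ignored, already declared

print(expand(template, base))
print(expand(template, extended))
```

The "derived" Address is just the same declaration with different text pasted in, which is why this approach gives reuse but no polymorphism.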
OO Concepts in DTD, Schema (6/8)
Inheritance in XML Schema
  XML Schema supports inheritance natively
  Polymorphism is available through the 'substitution group' feature
  Both extension and restriction are available
<xs:complexType name="USA_addressType">
  <xs:complexContent>
    <xs:extension base="addressType">
      <xs:sequence>
        <xs:element name="zip" type="xs:int" />
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
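Type extension maps naturally onto subclassing in an object-oriented language. The sketch below shows the correspondence with Python dataclasses, assuming a base `addressType` that holds only street and city; the class names, the `label` function and the sample values are ours.

```python
from dataclasses import dataclass

@dataclass
class AddressType:
    street: str
    city: str

@dataclass
class USAAddressType(AddressType):  # plays the role of xs:extension base="addressType"
    zip: int

def label(address: AddressType) -> str:
    """Code written against the base type also accepts the extended type."""
    return f"{address.street}, {address.city}"

home = USAAddressType(street="1 Main St", city="Springfield", zip=12345)
print(isinstance(home, AddressType))
print(label(home))
```

The derived type adds `zip` after the inherited fields, just as the extension appends its sequence to the base type's content.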
OO Concepts in DTD, Schema (7/8)
Object identity in DTD
  DTD implements object identity using ID and IDREF
  All ID values in a document share one document-wide unique index
  This single large index can hurt performance
<?xml version="1.0" ?>
<books>
  <book id="b1" authorref="a1">
    <title>Database Concepts</title>
  </book>
  <book id="b2" authorref="a2">
    <title>Operating Systems</title>
  </book>
  <author id="a1">Korth</author>
  <author id="a2">Ullman</author>
</books>
Figure: book has attributes @id, @authorref and content (title); author has attribute @id and content (#PCDATA).
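Resolving an IDREF amounts to a lookup in one document-wide table. The sketch below builds such a table in Python with the standard library; it assumes well-formed input with a closing `</books>` tag, and the helper name `author_of` is ours.

```python
import xml.etree.ElementTree as ET

document = """<books>
  <book id="b1" authorref="a1"><title>Database Concepts</title></book>
  <book id="b2" authorref="a2"><title>Operating Systems</title></book>
  <author id="a1">Korth</author>
  <author id="a2">Ullman</author>
</books>"""

root = ET.fromstring(document)

# One index over every id attribute in the document, as with DTD ID values.
by_id = {}
for element in root.iter():
    identifier = element.get("id")
    if identifier is not None:
        if identifier in by_id:
            raise ValueError(f"duplicate ID: {identifier}")
        by_id[identifier] = element

def author_of(book_id):
    """Follow the book's authorref through the shared index."""
    book = by_id[book_id]
    return by_id[book.get("authorref")].text

print(author_of("b1"))
```

Books and authors live in the same table, so a book and an author could never share an identifier, which is the limitation keys in XML Schema remove.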
OO Concepts in DTD, Schema (8/8)
Object identity in XML Schema
  Keys in XML Schema are designed to support keys as in an RDB
  A document can have several keys with different scopes
  Several elements may together form one key
<?xml version="1.0" ?>
<tables>
  <table1>
    <row id="1" field1="value1" field2="value2" />
    <row id="2" field1="value1" field2="value2" />
  </table1>
  <table2>
    <row id="1" field1="value1" field2="value2" />
  </table2>
</tables>
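The effect of scoping can be sketched without a schema processor. The Python below checks that row ids are unique within each table rather than across the whole document; it only imitates what a key scoped to each table would enforce, and the function name is ours.

```python
import xml.etree.ElementTree as ET

document = """<tables>
  <table1>
    <row id="1" field1="value1" field2="value2"/>
    <row id="2" field1="value1" field2="value2"/>
  </table1>
  <table2>
    <row id="1" field1="value1" field2="value2"/>
  </table2>
</tables>"""

def keys_are_unique(xml_text):
    """Check that row ids are unique within each table, not across the document."""
    for table in ET.fromstring(xml_text):
        seen = set()
        for row in table.findall("row"):
            if row.get("id") in seen:
                return False
            seen.add(row.get("id"))
    return True

print(keys_are_unique(document))
```

Here id "1" appears in both tables without conflict, which a DTD ID attribute would reject because its index spans the whole document.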
DTD vs XML Schema
data type
  DTD: cannot describe data types
  XML Schema: supports data types (int, string, complex type); easy to express objects in an OOPL and data in an RDB
language
  DTD: a special kind of language, e.g. <!ELEMENT city (#PCDATA)>
  XML Schema: a kind of XML; DOM, XPath and XSLT can be used on it
inheritance
  DTD: partial support
  XML Schema: support for inheritance and polymorphism
object identity
  DTD: unique attribute for selecting an element
  XML Schema: support for keys as in an RDB
SOAP : Application of XML Schema
Platform-dependence: RPC (COM, CORBA), IDL, binary data, CBD
Platform-independence: SOAP, WSDL, XML, SOA
SOA
  SOA has many advantages compared to CBD
  The benefits come from XML and XML Schema
SOAP: Application of XML Schema
  SOAP (Simple Object Access Protocol): remote procedure call protocol for exchanging objects
  UDDI: registry of Web services
  WSDL (Web Service Description Language)
Figure: a web service consumer, a web service provider and a UDDI registry, with these steps:
  1. Build web service
  2. Register web service
  3. Discover web service
  4. Get WSDL
  5. Build proxy and client
  6. Call Web service (SOAP)
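For step 6, the message on the wire is an XML envelope. The sketch below builds a minimal SOAP 1.1 request with Python's standard library. The envelope namespace is the SOAP 1.1 one; the service namespace, the `GetAddress` operation and its `name` parameter are hypothetical and stand in for whatever the WSDL of a real service declares.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.com/contacts"              # hypothetical service namespace

def build_request(contact_name):
    """Wrap a hypothetical GetAddress call in a SOAP envelope and body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}GetAddress")  # hypothetical operation
    ET.SubElement(call, f"{{{SERVICE_NS}}}name").text = contact_name
    return ET.tostring(envelope, encoding="unicode")

message = build_request("Kim")
print(message)
```

In practice a generated proxy (step 5) produces such messages from the WSDL, so client code rarely builds them by hand.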
SOAP: Application of XML Schema
  WSDL (Web Service Description Language)
  WSDL specifies
    names of methods
    names and data types of parameters
    data types of return values
    exceptions which can be thrown
    URL of the Web service
  Data types are defined using XML Schema
    platform-independent
    machine-understandable
SOAP: Application of XML Schema
Sample of WSDL (figure, with the XML Schema part labeled)
Conclusion
  Recent advances in computing require applications to process large amounts of text
  XML overcomes the drawbacks of earlier object description languages
  DTD was introduced together with XML to describe the structure of XML documents
  XML Schema extends XML with concepts from the OO paradigm