Feb 2001 C.Watters 1
Grammars, SGML, & XML
Agreeing on the rules
Overview
What is a grammar
BNF notation
Regular expressions
Context free grammars (remember Chomsky)
SGML
HTML
XML (finally) and the DTD
What is a Grammar?
A grammar is a set of rules which can
  generate a construct from a list of terminals
  determine if a construct obeys the rules (i.e. is well formed)
Example - English
  construct is a sentence
  terminals are words
Simple Grammar
Sentence ::= Subject Verb
Subject ::= (Pnoun | (def noun))
Pnoun ::= (Joe | Mary)
Verb ::= (runs | sits)
def ::= (the | a | an)
noun ::= (boy | girl)
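The grammar above is small enough to run. The following is a minimal Python sketch, not part of the original slides: it stores the rules in a dictionary (the names `GRAMMAR`, `expand`, `SENTENCES` and `is_well_formed` are illustrative choices) and uses them both to generate every sentence and to check a candidate sentence.

```python
import itertools

# The slide's grammar as a table: each rule name maps to a list of
# alternatives, and each alternative is a sequence of rule names or words.
GRAMMAR = {
    "Sentence": [["Subject", "Verb"]],
    "Subject": [["Pnoun"], ["def", "noun"]],
    "Pnoun": [["Joe"], ["Mary"]],
    "Verb": [["runs"], ["sits"]],
    "def": [["the"], ["a"], ["an"]],
    "noun": [["boy"], ["girl"]],
}


def expand(symbol):
    """Yield every word sequence that `symbol` can generate."""
    if symbol not in GRAMMAR:  # a terminal generates only itself
        yield [symbol]
        return
    for alternative in GRAMMAR[symbol]:
        # Expand each part, then combine the parts in every possible way.
        parts = [list(expand(part)) for part in alternative]
        for combo in itertools.product(*parts):
            yield [word for words in combo for word in words]


# Generating: the full (finite) language of this grammar.
SENTENCES = {" ".join(words) for words in expand("Sentence")}


def is_well_formed(text):
    """Recognising: a sentence is well formed if the grammar can generate it."""
    return text in SENTENCES
```

Checking against the full generated set only works because this language is finite. Note that "an boy sits" is accepted: the grammar knows nothing about English beyond its six rules.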
Syntax not Semantics
Let’s expand the definition of noun terminals
  noun ::= (car | hat | TV)
No problem with syntax! But "the hat runs" is a little vague.
"They are flying planes."
No problem with syntax. Semantics not so clear.
What is a grammar for anyway?
1. Parse input
  Examines input and determines if it satisfies the rules of a given grammar
  Joe Mary.
  If(a<3){jump}
  <html>my page</hjkl>
2. Generate output
  Use the rules to generate well-formed entities
  Joe runs.
  If(a < 3){b=a}
  <html>my page </html>
BNF
Backus Naur (or Normal) Form
Notation for describing syntax of a language
John Backus and Peter Naur, 1960’s for ALGOL
Meta-symbols used
  ::=  LHS is defined by RHS
  |    or
  < >  category name (defined somewhere else)
Example
  <program> ::= begin <statements> end;
Useful extensions to BNF
Optional items  [else <statements>]
Repetitive items  {<letter> | <digit>}
Recursion is allowed  <integer> ::= <digit> | <integer> <digit>
Brackets for grouping  <letter>(<digit> | <letter> <digit>)
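The recursive rule for <integer> translates almost word for word into a recursive function. This is a small Python sketch with illustrative names (`is_digit`, `is_integer`), assuming <digit> means the ten decimal digits.

```python
DIGITS = "0123456789"


def is_digit(text):
    """<digit>: exactly one character taken from 0-9."""
    return len(text) == 1 and text in DIGITS


def is_integer(text):
    """<integer> ::= <digit> | <integer> <digit>"""
    if is_digit(text):  # first alternative: a single digit
        return True
    # Second alternative: a shorter <integer> followed by one <digit>.
    return len(text) > 1 and is_integer(text[:-1]) and is_digit(text[-1])
```

Each recursive call strips one digit from the right, mirroring the way the rule refers to itself on its left-hand side.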
Regular Expressions
Simple way to express languages or strings
e.g. one ‘a’ followed by (one ‘n’ followed by one ‘d’) or by one ‘t’
  a((n d) | t)
note: may use + instead of |
Regular Expressions include
Concatenation/sequence  A B
Selection  A|B
Kleene Closure (0 or more)  A*
Positive Closure (1 or more)  A+
Bounded Repetition (1 to i)  A^i
e.g. A(n*| t) = {A, An, Ann, …, At}
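These operators can be tried directly with Python's standard `re` module. The sketch below is an illustration rather than part of the slides. In Python's notation `|` is selection, `*` and `+` are the two closures, and bounded repetition is written `{1,i}`; the pattern names are arbitrary.

```python
import re

# a((n d) | t): 'a' followed by "nd" or by 't'
AND_OR_AT = re.compile(r"a((nd)|t)")

# A(n* | t): 'A' followed by zero or more 'n', or by a single 't'
A_CLOSURE = re.compile(r"A(n*|t)")

# Bounded repetition, here with i = 3: between one and three 'A's
A_BOUNDED = re.compile(r"A{1,3}")


def accepts(pattern, text):
    """True if the whole string belongs to the pattern's language."""
    return pattern.fullmatch(text) is not None
```

`fullmatch` is used so that the entire string must belong to the language, not just a prefix of it.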
Context-Free Grammars
A CFG is a set of recursive productions used to generate patterns of strings satisfying the construct of the language
SGML and its derivatives, HTML and XML, are defined by context-free grammars!
CFGs are more powerful than regular expressions, e.g. they can describe tags nested to any depth
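Nesting is the classic case where a recursive production is needed. The sketch below is a toy Python recogniser, not a real SGML or XML parser, for the assumed production Element ::= <name> (Element | text)* </name>. It only understands attribute-free tags, and the names `tokenize`, `parse_element` and `tags_balanced` are illustrative.

```python
import re

# A tag such as <html> or </html>, or a run of text with no '<' in it.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")


def tokenize(text):
    """Split text into ('open', name), ('close', name), ('text', s) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            return None  # a stray '<' that does not start a tag
        slash, name, chars = match.groups()
        if chars is not None:
            tokens.append(("text", chars))
        else:
            tokens.append(("close" if slash else "open", name))
        pos = match.end()
    return tokens


def parse_element(tokens, i):
    """Element ::= <name> (Element | text)* </name>.

    Returns the index just past the element, or None if it is malformed.
    """
    if i >= len(tokens) or tokens[i][0] != "open":
        return None
    name = tokens[i][1]
    i += 1
    while i < len(tokens) and tokens[i][0] != "close":
        if tokens[i][0] == "text":
            i += 1
        else:
            i = parse_element(tokens, i)  # recursion handles the nesting
            if i is None:
                return None
    if i < len(tokens) and tokens[i] == ("close", name):
        return i + 1
    return None


def tags_balanced(text):
    """True if text is exactly one element whose tags all match up."""
    tokens = tokenize(text)
    return tokens is not None and parse_element(tokens, 0) == len(tokens)
```

For example, `tags_balanced("<html>my page</html>")` is true, while the mismatched `<html>my page</hjkl>` is rejected.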
SGML
Standard Generalized Markup Language
Developed by a committee!
Led by Charles Goldfarb, 1978-1986
A grammar to define the structure of documents
  rules define the construct or structure
  terminals are <tags> and strings
XML
DTDs
How to use it
Examples
DTD - grammar definition for a document type
Defines:
  element types (structure)
  attributes (terminals)
  constraints on combinations of these
Element Type Declaration
<!DOCTYPE GarageSale [
<!ELEMENT GarageSale (Date, Place, Notes)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT Place (#PCDATA)>
<!ELEMENT Notes (#PCDATA)>
]>
<GarageSale>
<Date>today</Date>
<Place>myhouse</Place>
<Notes>Rain or shine</Notes>
</GarageSale>
Sub-Elements
<!ELEMENT Figure (Graphic|Code)>
<!ELEMENT Figure (Caption, (Graphic|Code))>
<!ELEMENT Figure (Caption?, (Graphic|Code))>   ? means optional
<!ELEMENT FTNOTE (P+)>   1 or more
<!ELEMENT FTNOTE (P*)>   0 or more
where Graphic etc. are also defined as ELEMENTs
Attributes
Let you add extra information to elements
<!ELEMENT Place (#PCDATA)>
<!ATTLIST Place
  Address CDATA #IMPLIED
  Email CDATA #IMPLIED
  Phone CDATA #IMPLIED>
<Place Address="1234 Oak">
<!ATTLIST Shirt Size (small|medium|large) #IMPLIED>
<Shirt Size="small">
CDATA => character data
Validating Parser
A DTD (document type definition) defines the grammar of a type of document
  memo
  web page
  book
A validating parser uses a DTD to check if a given document satisfies the rules of that grammar
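One way to see what a validating parser does with an element declaration: the content model in a declaration such as (Caption?, (Graphic|Code)) is itself a regular expression over child element names. The Python sketch below is a toy illustration under that observation, not a real validator. It handles element-only content models (no #PCDATA, no attributes), and `model_to_regex` and `children_match` are hypothetical helper names.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model):
    """Translate an element-only DTD content model into a Python regex."""
    pieces = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[()|,?*+]", model):
        if token == "(":
            pieces.append("(?:")
        elif token == ",":
            continue  # a sequence is plain concatenation
        elif token in ")|?*+":
            pieces.append(token)  # same meaning in DTDs and in regexes
        else:
            # An element name; the trailing space separates adjacent names.
            pieces.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(pieces))


def children_match(element, model):
    """Check the names of an element's children against a content model."""
    child_names = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(child_names) is not None


# The garage sale instance from the earlier slide.
sale = ET.fromstring(
    "<GarageSale><Date>today</Date><Place>myhouse</Place>"
    "<Notes>Rain or shine</Notes></GarageSale>"
)
sale_is_valid = children_match(sale, "(Date, Place, Notes)")
```

A real validating parser checks every element this way and also verifies attributes, text content and entity references. That is well beyond this sketch.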
HTML & XML
HTML is an application of SGML with a single shared DTD
  HTMLDOC ::= (<html> HEAD BODY </html>)
XML is a subset of SGML with many DTD’s allowed
“XML is like HTML with the training wheels off” - Dan Connolly, leader of XML activity at W3C
XML
Uses tags to identify semantics of data
Looks like HTML, but isn’t
<slide>
<title>Introduction</title>
<author><first>Carolyn</first> <last>Watters</last></author>
<content>XML this and that</content>
</slide>
Is license free, platform-independent and well-supported
HTML
Hypertext Markup Language
Presents documents via WWW browsers
Document layout and hyperlink specifications
Predefined set of tags (i.e. common DTD)
<HTML>
<TITLE>Statistics Canada</TITLE>
<BODY>
<H3>Welcome to Stats Canada</H3>
Statistics Canada ……. .
<p> We like numbers…..
<img src="mapleleaf.gif">
<ul>What we do
<li><a href="census.html">Census</a>
<li><a href="special.html">Special surveys</a>
<li><a href="online.html">Online data</a>
</ul>
</BODY>
</HTML>
HTML
HTML - Advantages
Simple - fixed set of tags
Portable - used with all browsers
Linking - within and to external documents
HTML - Disadvantages
Limited tag set
Can’t separate the definition from content
Can’t define structure of contents
XML
XML allows anyone to define a document structure separate from its display structure
Explicit Definition - DTD
Some Code
Schema
Entity Passport Details
SubEntities Last Name First Name Address
Entity Address
SubEntities Street City Town State Province ……..
DTD
<!ELEMENT passport_details (last_name,first_name+,address)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT address (street,(city|town),(state|province),(ZIP|postal_code),country,contact_no?,email*)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT town (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT province (#PCDATA)>
<!ELEMENT ZIP (#PCDATA)>
<!ELEMENT postal_code (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT phone_home (#PCDATA)>
<!ELEMENT email (#PCDATA)>
Internal DTD and Instance
<?xml version='1.0'?>
<!DOCTYPE passport_details [
<!ELEMENT passport_details (last_name,first_name+,address)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT address (street,(city|town),(state|province),(ZIP|postal_code),country,contact_no?,email*)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT town (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT province (#PCDATA)>
<!ELEMENT ZIP (#PCDATA)>
<!ELEMENT postal_code (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT phone_home (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
<passport_details>
<last_name>Smith</last_name>
<first_name>Jo</first_name>
<first_name>Stephen</first_name>
<address>
<street>1 Great Street</street>
<city>GreatCity</city>
<state>GreatState</state>
<postal_code>1234</postal_code>
<country>GreatLand</country>
<email>[email protected]</email>
</address>
</passport_details>
Shared DTD
XML Document specifies the DTD
<?xml version='1.0'?>
<!DOCTYPE passport_details SYSTEM "PassportExt.dtd">
<passport_details>
<last_name>Smith</last_name>
<first_name>Jo</first_name>
<first_name>Stephen</first_name>
<address>
<street>1 Great Street</street>
<city>GreatCity</city>
<state>GreatState</state>
<postal_code>1234</postal_code>
<country>GreatLand</country>
<email>[email protected]</email>
</address>
</passport_details>
Importance of XML
Coordinating Heterogeneous Databases
Separation of Structure / Content / Display
Document Validity Checking
Potential Use in Standards
Example
Boeing
Boeing places a DTD on its site
part purchasers use this DTD
Boeing can use multiple XSL stylesheets
Boeing (cont’d)
When a customer creates an order document, they can verify the validity of that document against the DTD.
This ensures they are transmitting only type-valid orders.
In turn, Boeing can ensure it is receiving only type-valid documents.
2. Using XML: DOM & SAX
DOM: Document Object Model
The DOM is a standard object application programming interface that gives developers programmatic control of XML document content, structure, formats, and more.
DOM defines a programmatic API for accessing XML documents.
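As an illustration of that programmatic control, here is a short sketch using Python's standard `xml.dom.minidom`, one DOM implementation among many. The document is a shortened version of the passport example, and `text_of` is a helper written for this sketch, not part of the DOM API.

```python
from xml.dom import minidom

PASSPORT = """<passport_details>
  <last_name>Smith</last_name>
  <first_name>Jo</first_name>
  <first_name>Stephen</first_name>
  <address><street>1 Great Street</street><city>GreatCity</city></address>
</passport_details>"""

dom = minidom.parseString(PASSPORT)
root = dom.documentElement


def text_of(element):
    """Concatenate the text nodes directly inside an element."""
    return "".join(
        node.data for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )


# Reading content: every <first_name> element, in document order.
first_names = [text_of(e) for e in root.getElementsByTagName("first_name")]

# Changing structure: build a new <country> element and attach it.
country = dom.createElement("country")
country.appendChild(dom.createTextNode("GreatLand"))
root.getElementsByTagName("address")[0].appendChild(country)
```

After this runs, `first_names` holds the two given names and `root.toxml()` serialises the document with the added country element.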
3. Using XML: presenting data
Need to convert XML tags into appropriate HTML tags for use in a browser!!
XML: <lastname>Smith</lastname>
HTML: <b>Smith</b>
Displayed: Smith (in bold)
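The conversion itself can be sketched in a few lines of Python. This is only an illustration: the mapping table `TAG_MAP` and the fallback to `span` are assumptions made for the example, and in practice a stylesheet language would express the mapping instead of hand-written code.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from XML element names to HTML tags.
TAG_MAP = {"lastname": "b"}


def to_html(xml_text):
    """Rename each XML element to its HTML counterpart and serialise."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Unknown elements fall back to a neutral <span>.
        element.tag = TAG_MAP.get(element.tag, "span")
    return ET.tostring(root, encoding="unicode", method="html")


html = to_html("<lastname>Smith</lastname>")
```

Here `html` is the string `<b>Smith</b>`, which a browser displays in bold.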
Stylesheets are used to present XML:
The Cascading Stylesheet Specification (CSS)
The Extensible Stylesheet Language (XSL)

                            CSS    XSL
Can be used with HTML?      Yes    No
Can be used with XML?       Yes    Yes
Transformation language?    No     Yes
Syntax                      CSS    XML
CSS and XSL
CSS - Cascading Style Sheets
  can predefine HTML display (font etc.)
  these are shared and reused
XSL - Extensible Stylesheet Language
  predefines display characteristics for XML entities
  transforms XML into HTML (with CSS) for browsers to use
Cascading Style Sheets
CSS
last_name {
  font-family: verdana, arial;
  font-size: 15pt;
  font-weight: bold;
  display: block;
  margin-bottom: 5pt;
}
first_name {
  font-family: verdana, arial;
  font-size: 15pt;
  font-weight: bold;
  display: block;
  margin-bottom: 5pt;
}
street, city, town, state, province, ZIP, postal_code {
  font-family: verdana, arial;
  font-size: 12pt;
  font-weight: bold;
  color: green;
  display: block;
  margin-bottom: 20pt;
  margin-top: 40pt;
}
{
  font-family: verdana, arial;
  font-size: 12pt;
  font-weight: bold;
  color: blue;
  display: block;
  margin-top: 5pt;
}
CSS
Most local definition has precedence
May be referred to (shared)
XSL (Style Language)
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl" xmlns="http://www.w3.org/TR/REC-html40" result-ns="">
<xsl:template><xsl:apply-templates/></xsl:template>
<xsl:template match="/">
<html>
<head><title><xsl:value-of select="/passport/last_name"/></title></head>
<body>
<H1><xsl:value-of select="/passport/last_name, first_name"/></H1>
<H2>Address</H2>
<BLOCKQUOTE><xsl:apply-templates select="/passport/address"/></BLOCKQUOTE>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Understanding A Template
Most templates have the following form:
<xsl:template match="para">
  <p><xsl:apply-templates/></p>
</xsl:template>
The whole <xsl:template> element is a template
The match pattern determines where this template applies
Literal result elements come from non-XSL namespace(s)
XSLT elements come from the XSL namespace
Options for displaying XML
[Diagram. Labels: XML Document; CSS Stylesheet; XSL Stylesheet; XML-enabled Web Browser with XML Display Engine; XSL Transformation spec; XSL Transformation; HTML Document; Web Browser]
2. Using XML: How does a browser read XML?
XML parser: A tool for reading XML documents
Microsoft's Internet Explorer 4.0 was the first Web browser to implement XML
Netscape will support XML metadata in Communicator/Navigator 5.0 as a delivery component code-named Aurora.
XML Architecture
Tiers: Desktop, Middle Tier, Storage
Display
  Multiple views created from the XML-based data
  HTML view #1 (e.g. Purchasing Agent), HTML view #2 (e.g. Consumer)
Data Delivery, Manipulation
  XML exchanged over HTTP, manipulated via the DOM
  Web Server: DB Access, Integration, Business Rules (e.g. Purchase order)
Data Integration
  XML emitted or generated from multiple sources
  XML delivered to other applications or objects for further processing
  Mainframe, Database
4. Case Study
An example of XML Tree structure
A simple example: Portfolio.xml, Portfolio.xsl
http://msdn.microsoft.com/xml/samples/review/review-xsl.xml
Tree Structure of the example
[Tree diagram. Node labels: story, address, bookstore, menu, body, review, logo, name, phone, date, reviewer, person, summary, person, books, office supplies]
Is this for real?
In the major Web Browser products.
In Microsoft Office 2000.
In every major database tool by end of 2000.
In every HTML tool by end of 2000.
CommerceNet believes that XML may just be the “killer application” needed to open up the Worldwide Web for Electronic Commerce.
Summary
XML - Advantages
  Platform and system independent
  User-defined tags
  Doesn’t require explicit DTD
  Display format and content are separate
XML - Disadvantages
  Requires a processing application
  “Pickier” than HTML
  Must be converted to HTML to view in browser
Resources
W3 Consortium: www.w3.org
kazillions of XML books in every bookstore!
6. Reference
Jon Bosak and Tim Bray, Scientific American, May 1999 [http://www.sciam.com/1999/0599issue/0599bosak.html]
Norman Walsh: What is XML? Oct. 3, 1998 [http://xml.com/xml/pub/98/10/guide1.html#AEN58]
Graphic Communications Association web site [http://www.gca.org/whats_xml/default.htm]
University College Cork [http://www.ucc.ie/xml/]
Microsoft MSDN online samples [http://msdn.microsoft.com/xml/samples/review/review-xsl.xml]
[http://www.oasis-open.org/cover/xsl.html]
Charles F. Goldfarb, Paul Prescod, The XML Handbook, 1998