![Page 1: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/1.jpg)
CP3024 Lecture 9
XML: Extensible Markup Language
![Page 2: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/2.jpg)
What is a markup language?
Textual (i.e. person readable) language where significant elements are indicated by markers
– <TITLE>XML</TITLE>
Examples are RTF, HTML, VRML, TeX etc.
Easy to process and can be manipulated by a variety of application programs
![Page 3: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/3.jpg)
What does the Web use?
HTML
– Hypertext Markup Language
Defined as the original Web language
Based on SGML (see later)
Suited for hypertext, multimedia, small simple documents
Currently at version 4.01 (the last?)
![Page 4: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/4.jpg)
Why change? - 1
Change in Web usage
– no longer a mechanism for exchanging scientific papers
– presentational aspects are now seen as of greater importance
– extracting the meaning of a document using a program will be a new growth area
HTML can't grow much more!
![Page 5: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/5.jpg)
Why change? - 2
Extensibility
– HTML does not allow users to specify their own tags
Structure
– HTML cannot represent database schemas or object-oriented hierarchies
Validation
– HTML does not allow applications to check that the structure of data is valid
![Page 6: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/6.jpg)
What is SGML?
Standard Generalised Markup Language
ISO 8879
Can define any document format of any complexity
Enables extensibility, structure and validation
Too many optional features for the Web
![Page 7: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/7.jpg)
What is XML?
Simplified subset of SGML designed for Web applications
Differs from HTML
– Can define new tags
– Structures may be nested to any level of complexity
– XML documents may define a grammar which enables structural validation of that document
![Page 8: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/8.jpg)
Where has XML come from?
Emanates from the World Wide Web Consortium (W3C)
Developed by XML working group chaired by Jon Bosak (Sun Microsystems)
Group includes representatives from Microsoft, Netscape, HP, Adobe, etc.
Last bastion against proprietary markup and Web fragmentation
![Page 9: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/9.jpg)
Design Goals for XML - 1
XML shall be straightforwardly usable over the Internet
XML shall support a wide variety of applications
XML shall be compatible with SGML
It shall be easy to write programs which process XML documents
The number of optional features is to be kept to the absolute minimum
![Page 10: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/10.jpg)
Design Goals for XML - 2
XML documents should be human-legible
The XML design should be prepared quickly
The design of XML shall be formal and concise
XML documents shall be easy to create
Terseness in XML markup is of minimal importance
![Page 11: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/11.jpg)
The XML View of a Document
Taken from an example given by Jon Bosak
![Page 12: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/12.jpg)
Structured Publishing
Taken from an example given by Jon Bosak
![Page 13: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/13.jpg)
XML Example
<?xml version="1.0"?>
<sweepjoke>
  <harry>Say <quote>Bye Bye</quote>, Sweep</harry>
  <sweep><quote>Bye Bye, Sweep</quote></sweep>
  <laughter/>
</sweepjoke>
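A minimal sketch of how a program might read this document, assuming Python's standard xml.etree.ElementTree module. The helper name outline is an illustrative choice, not part of the lecture.

```python
import xml.etree.ElementTree as ET

# The sweepjoke document from the slide, held as a string.
SWEEPJOKE = """<?xml version="1.0"?>
<sweepjoke>
  <harry>Say <quote>Bye Bye</quote>, Sweep</harry>
  <sweep><quote>Bye Bye, Sweep</quote></sweep>
  <laughter/>
</sweepjoke>"""


def outline(element, depth=0):
    """Return one indented line per element, showing how the tags nest."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SWEEPJOKE)
print("\n".join(outline(root)))
```

The outline lists sweepjoke at the top level, with harry, sweep and laughter beneath it and a quote nested inside both harry and sweep.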
![Page 14: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/14.jpg)
XML Markup
Elements
Entity references
Comments
Processing Instructions
Marked sections
Document type declarations (DTD)
![Page 15: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/15.jpg)
Elements
Commonest form of markup
Delimited by angle brackets (<, >)
May be empty but normally consist of start tag and end tag
Start tag may contain attributes
– <a href="www.scit.wlv.ac.uk">
![Page 16: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/16.jpg)
Entity References
In XML (and HTML) certain characters are reserved e.g. <
Entity references are used to insert these into documents
Entity references begin with an ampersand (&) and end with a semicolon (;)
You can define your own entities
Can be used to insert Unicode characters
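As a rough illustration, assuming Python's standard library: escape from xml.sax.saxutils replaces reserved characters with entity references, and the parser turns them back into characters. The sample strings are made up for this sketch.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text containing two reserved characters, < and &.
raw = "if p < q & r"

# escape() replaces them with the predefined entity references.
encoded = escape(raw)

# &#233; is a numeric character reference for the Unicode character U+00E9.
document = "<code>" + encoded + " caf&#233;</code>"

# The parser expands the references back into ordinary characters.
text = ET.fromstring(document).text
print(encoded)
print(text)
```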
![Page 17: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/17.jpg)
Comments
Begin with <!--
End with -->
Can contain any data except --
XML processors are not required to pass comments to an application
![Page 18: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/18.jpg)
Processing Instructions (PIs)
Provide information to an application
XML processors required to pass them on
Have the form <?name pidata?>
The name (PI target) identifies the PI
Data is optional and meaningful to an application that recognises the target
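A small sketch of an application receiving PIs, assuming Python's standard xml.dom.minidom module. The render-hint target and its data are invented for illustration.

```python
from xml.dom import minidom

# "render-hint" is a hypothetical PI target; only an application that
# recognises it would act on the data that follows.
DOCUMENT = '<?xml version="1.0"?><?render-hint mode="print"?><sweepjoke/>'


def processing_instructions(xml_text):
    """Return (target, data) pairs for the PIs found at document level."""
    doc = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


print(processing_instructions(DOCUMENT))
```

The XML declaration on the first line is not reported as a processing instruction, even though it uses similar syntax.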
![Page 19: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/19.jpg)
Marked Sections
Parsers do not interpret markup inside CDATA sections
<![CDATA[
<head>if p < <</head>
]]>
Only character string not allowed is ]]>
Data is passed on to the application
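A short sketch, assuming Python's xml.etree.ElementTree, of how the CDATA content reaches the application as plain text. The wrapping note element is an assumption added so the fragment forms a complete document.

```python
import xml.etree.ElementTree as ET

# The CDATA section from the slide, wrapped in a hypothetical <note> element.
document = "<note><![CDATA[<head>if p < <</head>]]></note>"

note = ET.fromstring(document)

# The angle brackets arrive as character data, not as child elements.
print(note.text)
print(len(note))
```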
![Page 20: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/20.jpg)
Document Type Declarations
Optional in XML (not in SGML)
Specify constraints on the sequence and nesting of tags
Communicates meta-information to the parser about content
Sequence and nesting of tags, attribute values, external files, entities
![Page 21: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/21.jpg)
Kinds of Declaration
Element type declarations
Attribute list declarations
Entity declarations
Notation declarations
![Page 22: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/22.jpg)
Element Type Declaration
<!ELEMENT sweepjoke (harry+, sweep, laughter?)>
A sweepjoke consists of a harry element followed by a sweep element and a laughter element
The harry element may be repeated (+)
– + indicates one or more
The laughter element is optional (?)
![Page 23: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/23.jpg)
Sweepjoke Declaration
<!ELEMENT sweepjoke (harry+, sweep, laughter?)>
<!ELEMENT harry (#PCDATA | quote)*>
<!ELEMENT sweep (#PCDATA | quote)*>
<!ELEMENT quote (#PCDATA)*>
<!ELEMENT laughter EMPTY>
#PCDATA indicates parsed character data
| indicates 'or'
* indicates 'zero or more'
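The content model for sweepjoke reads much like a regular expression. The sketch below is not a DTD validator: it hand-translates (harry+, sweep, laughter?) into a Python regular expression over the child tag names and checks only the top level. The function name and the translation are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of (harry+, sweep, laughter?): each child tag name is
# followed by a space so that names cannot run together.
SWEEPJOKE_MODEL = re.compile(r"(harry )+sweep (laughter )?")


def matches_sweepjoke_model(xml_text):
    """Check the children of <sweepjoke> against the content model."""
    root = ET.fromstring(xml_text)
    if root.tag != "sweepjoke":
        return False
    sequence = "".join(child.tag + " " for child in root)
    return SWEEPJOKE_MODEL.fullmatch(sequence) is not None


print(matches_sweepjoke_model("<sweepjoke><harry/><harry/><sweep/></sweepjoke>"))
print(matches_sweepjoke_model("<sweepjoke><sweep/><laughter/></sweepjoke>"))
```

The first document satisfies the model (two harry elements, no laughter). The second does not, because at least one harry is required. In practice a validating parser would do this work from the DTD itself.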
![Page 24: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/24.jpg)
Attribute List Declaration
Identifies
– which elements may have attributes
– what attributes they may have
– what values are permitted for an attribute
– what value is the default
<!ATTLIST sweepjoke
name ID #REQUIRED
label CDATA #IMPLIED
status ( funny | notfunny ) 'funny'>
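To make the four points concrete, here is a hand-written sketch of the checks this ATTLIST implies. The rules table and check_attributes are hypothetical names, the table is a manual transcription of the declaration, and uniqueness of the ID value is not checked. A validating parser would derive all of this from the DTD.

```python
import xml.etree.ElementTree as ET

# Manual transcription of the ATTLIST for sweepjoke (an assumption of this
# sketch, not something read from a DTD file).
SWEEPJOKE_ATTRIBUTES = {
    "name": {"required": True},                    # ID #REQUIRED
    "label": {},                                   # CDATA #IMPLIED
    "status": {"allowed": {"funny", "notfunny"},   # enumerated values
               "default": "funny"},                # default value
}


def check_attributes(element, rules):
    """Return (attributes with defaults applied, list of problems found)."""
    values = dict(element.attrib)
    problems = []
    for name in values:
        if name not in rules:
            problems.append("undeclared attribute: " + name)
    for name, rule in rules.items():
        if name not in values:
            if rule.get("required"):
                problems.append("missing required attribute: " + name)
            elif "default" in rule:
                values[name] = rule["default"]
        elif "allowed" in rule and values[name] not in rule["allowed"]:
            problems.append("bad value for " + name + ": " + values[name])
    return values, problems


joke = ET.fromstring('<sweepjoke name="j1"/>')
print(check_attributes(joke, SWEEPJOKE_ATTRIBUTES))
```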
![Page 25: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/25.jpg)
Entity Declarations
Allow a name to be associated with some other content
Internal entities associate a name with a string of literal text (e.g. &lt; stands for <)
External entities associate a name with the content of another file
Parameter entities enable text replacement within the DTD
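A minimal sketch of an internal entity, assuming Python's xml.etree.ElementTree. The entity name catchphrase is invented for this example, and it is declared in an inline DTD so the document is self-contained.

```python
import xml.etree.ElementTree as ET

# The inline DTD declares a hypothetical internal entity called "catchphrase".
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE sweepjoke [
  <!ENTITY catchphrase "Bye Bye">
]>
<sweepjoke><harry>Say &catchphrase;, Sweep</harry></sweepjoke>"""

harry = ET.fromstring(DOCUMENT).find("harry")

# The parser replaces &catchphrase; with the declared text.
print(harry.text)
```

Expanding entities from untrusted documents can be abused to exhaust memory, so applications often restrict it.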
![Page 26: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/26.jpg)
Adding a DTD to an XML File
Inline
External
– <?xml version="1.0"?>
– <!DOCTYPE sweepjoke SYSTEM "sweep.dtd">
![Page 27: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/27.jpg)
Links in XML
HTML anchors are a very limited form of hypertext
XML introduces
– XPointers
– XLinks
These standards are outside the scope of the XML standard
![Page 28: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/28.jpg)
Presentation Issues
Use of a stylesheet is implicit
Possible standards:
– DSSSL Document Style Semantics and Specification Language (ISO/IEC 10179)
– CSS Cascading Style Sheets
– XSL Extensible Stylesheet Language (uses XML syntax)
![Page 29: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/29.jpg)
XSL
XSL is an XML stylesheet language
– XSLT is a language for transforming XML documents
– XSL formatting objects specify formatting semantics
A set of rules to transform a document
XML can be transformed into HTML
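The sketch below is not XSLT. It only imitates the idea of rule-driven transformation in plain Python, mapping each source tag to an HTML tag. The RULES table and the choice of HTML tags are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: source tag -> HTML tag. Unlisted tags become <span>.
RULES = {"sweepjoke": "div", "harry": "p", "sweep": "p", "quote": "q"}


def transform(source):
    """Build an HTML element tree by applying the tag rules recursively."""
    target = ET.Element(RULES.get(source.tag, "span"))
    target.text = source.text
    target.tail = source.tail
    for child in source:
        target.append(transform(child))
    return target


source = ET.fromstring(
    "<sweepjoke><harry>Say <quote>Bye Bye</quote>, Sweep</harry></sweepjoke>"
)
html = ET.tostring(transform(source), encoding="unicode", method="html")
print(html)
```

A real XSLT stylesheet expresses such rules as templates written in XML and is applied by an XSLT processor rather than by hand-written code.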
![Page 30: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/30.jpg)
XML Application Areas
Mediation between heterogeneous databases on the Web
Client-centric web applications
Applications requiring different views of the same data
Information discovery tailored to the needs of differing individuals
![Page 31: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/31.jpg)
Languages based on XML
MathML
SMIL
RDF
XHTML
CML
![Page 32: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/32.jpg)
RDF
Resource Description Framework
Integrates a variety of web-based metadata activities
Provides interoperability between applications that exchange metadata
Allows machine readable description of Web resources
![Page 33: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/33.jpg)
RDF Example
<?xml version="1.0"?>
<?xml:namespace ns = "http://www.w3.org/RDF/RDF/" prefix = "RDF" ?>
<?xml:namespace ns = "http://purl.oclc.org/DC/" prefix = "DC" ?>
<RDF:RDF>
  <RDF:Description RDF:HREF = "http://uri-of-Document-1">
    <DC:Creator>John Smith</DC:Creator>
  </RDF:Description>
</RDF:RDF>
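The example above declares its namespaces with processing instructions, an early draft notation. The sketch below swaps in the later xmlns attribute syntax, the published RDF and Dublin Core namespace URIs, and rdf:about in place of RDF:HREF. These substitutions are assumptions of this sketch, not part of the original slide. It then reads the metadata with Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the published RDF and Dublin Core vocabularies
# (swapped in for the draft URIs shown on the slide).
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

DOCUMENT = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://uri-of-Document-1">
    <dc:creator>John Smith</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def creators(xml_text):
    """Map each described resource to its dc:creator value."""
    root = ET.fromstring(xml_text)
    found = {}
    for description in root.findall("{%s}Description" % RDF_NS):
        resource = description.get("{%s}about" % RDF_NS)
        for creator in description.findall("{%s}creator" % DC_NS):
            found[resource] = creator.text
    return found


print(creators(DOCUMENT))
```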
![Page 34: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/34.jpg)
XHTML
New Web languages are defined using XML
HTML 4.0 cannot be defined using XML
XHTML is XML compliant HTML
![Page 35: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/35.jpg)
Major Changes
Documents must be well-formed
Elements and attributes must have lower case names
End tags required in non-empty elements
Attribute values must be in quotes
Empty tags must be terminated
Script and style contents are parsed as XML (so < and & are treated as markup)
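Several of these rules follow from well-formedness, which any XML parser can check. A minimal sketch, assuming Python's xml.etree.ElementTree; the helper name is_well_formed is illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Terminated empty tag and quoted attribute value: accepted.
print(is_well_formed('<p class="intro">line one<br/>line two</p>'))
# HTML-style <br> with no terminator: rejected.
print(is_well_formed("<p>line one<br>line two</p>"))
# Unquoted attribute value: rejected.
print(is_well_formed("<p class=intro>text</p>"))
```

Well-formedness alone does not enforce lower case names, although a start tag and end tag that differ in case are rejected because XML names are case-sensitive.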
![Page 36: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/36.jpg)
XHTML Compatibility
Current browsers unlikely to understand all XHTML
E.g. <br/> may cause an error
Compatibility guidelines defined in XHTML standard
See http://www.w3.org/TR/xhtml1/ Appendix C
![Page 37: CP3024 Lecture 9 XML: Extensible Markup Language](https://reader036.vdocuments.net/reader036/viewer/2022062408/56649edb5503460f94beaf0e/html5/thumbnails/37.jpg)
Summary
XML significantly expands what is possible on the Web
XML preserves the basic Web ideas
Using XML is an order of magnitude more difficult than writing HTML
Software is out there and more will soon follow
The opportunities are endless!