xml_guide

3
A Quick Introduction to XML This document provides a quick introduction to some of the terms and concepts used in the analysis of the XML documents in the Tutorial section of the CellML website. The terms are taken from the origin- al XML specification [http://www.w3.org/TR/1998/REC-xml-19980210] published in February 1998 by the World Wide Web Consortium. The following online resources provide more thorough documentation on XML: http://www.w3.org/XML/ [http://www.w3.org/XML/] — the W3C's XML page. http://www.ucc.ie/xml/ — the official XML FAQ. http://www.xml.com/axml/testaxml.htm — the annotated XML specification. http://www.oasis-open.org/cover/xml.html — the XML Cover pages. The following list of terms is by no means exhaustive, and the definitions are in some cases incomplete: XML XML stands for eXtensible Markup Language, and it is a standard for structured text documents developed by the World Wide Web Consortium [http://www.w3.org/] (W3C). The W3C represents about 500 paying member companies and is responsible for many of the standards relating to the internet, including HTML. XML can be used to structure text in such a way that it is readable by both hu- mans and machines, and it presents a simple format for the exchange of information across the internet between computers. As such, elec- tronic commerce is the principal application area for XML. XML is a simplification (or subset) of the Standard Generalized Markup Language (SGML) which was developed in the 1970s for the large-scale storage of structured text documents. XML document An XML document contains a prolog and a body. The prolog con- sists of an XML declaration, possibly followed by a document type declaration. The body is made up of a single root element, possibly with some comments and/or processing instructions. An XML docu- ment is typically a computer file whose contents meet the require- ments laid out in the XML specification. However, XML documents may also be generated "on the fly" by a computer responding to a re- quest from another computer. For instance, an XML document may be dynamically compiled from information contained in a database.) XML Declaration The first few characters of an XML document must make up an XML declaration. The declaration is used by the processing soft- ware to work out how to deal with the subsequent XML content. A typical XML declaration is shown below. The encoding of a docu- ment is particularly important, as XML processors will default to UTF-8 when reading an 8-bit-per-character document. This will cause characters to be rendered incorrectly if the document uses Lat- in encoding (iso-8859-1). XML processing applications are required 1

Upload: zl2abv

Post on 18-Aug-2015

212 views

Category:

Documents


0 download

DESCRIPTION

XML Guide

TRANSCRIPT

A Quick Introduction to XMLThis document provides a quick introduction to some of the terms and concepts used in the analysis ofthe XML documents in the Tutorial section of the CellML website. The terms are taken from the origin-al XML specification [http://www.w3.org/TR/1998/REC-xml-19980210] published in February 1998 bythe World Wide Web Consortium.The following online resources provide more thorough documentation on XML: http://www.w3.org/XML/ [http://www.w3.org/XML/] the W3C's XML page. http://www.ucc.ie/xml/ the official XML FAQ. http://www.xml.com/axml/testaxml.htm the annotated XML specification. http://www.oasis-open.org/cover/xml.html the XML Cover pages.The following list of terms is by no means exhaustive, and the definitions are in some cases incomplete:XMLXML stands for eXtensible Markup Language, and it is a standardforstructuredtext documentsdevelopedbytheWorldWideWebConsortium [http://www.w3.org/] (W3C). The W3C representsabout 500 paying member companies and is responsible for many ofthe standards relating to the internet, including HTML. XML can beused to structure text in such a way that it is readable by both hu-mans and machines, and it presents a simple format for the exchangeof information across the internet between computers. As such, elec-tronic commerce is the principal application area for XML.XMLis asimplification(or subset) of theStandardGeneralizedMarkup Language (SGML) which was developed in the 1970s forthe large-scale storage of structured text documents.XML documentAn XML document contains a prolog and a body. The prolog con-sists of an XML declaration, possibly followed by a document typedeclaration. The body is made up of a single root element, possiblywith some comments and/or processing instructions. An XML docu-ment is typically a computer file whose contents meet the require-ments laid out in the XML specification. However, XML documentsmay also be generated "on the fly" by a computer responding to a re-quest from another computer. For instance, an XML document maybe dynamically compiled from information contained in a database.)XML DeclarationThefirst fewcharactersof anXMLdocument must makeupanXMLdeclaration. Thedeclarationisusedbytheprocessingsoft-ware to work out how to deal with the subsequent XML content. Atypical XML declaration is shown below. The encoding of a docu-ment isparticularlyimportant, asXMLprocessorswill default toUTF-8whenreadingan8-bit-per-character document. This willcause characters to be rendered incorrectly if the document uses Lat-in encoding (iso-8859-1). XML processing applications are required1to handle 16-bit-per-character documents in the Unicode encoding,which makes XML a truly international format, able to handle mostmodern languages.

Document Type DeclarationA document author can use an optional document type declarationafter the XML declaration to indicate what the root element of theXMLdocument will beandpossiblytopoint toadocument typedefinition. A typical document type declaration for a CellML docu-ment is shown below. Note that the document type declaration facil-ity defined in the XML specification provides a lot more functional-ity than what is discussed or shown here.

Start / End TagThesimplest wayof encodingthemeaningof apieceof text inXML is to enclose it inside start and end tags. A start tag consists ofthetag-nameinbetweenless-thanandgreater-thansigns, andthematching end tag has a slash preceding the tag-name, as shown be-low. Awell-formedXMLdocument hasanend-tagthat matchesevery start-tag. the text data ElementThe combination of start-tag, data and end-tag is known as an ele-ment. The data may be plain text (as in the example above), furtherelements (sub-elements), or a combination of text and sub-elements.A document is usually made up of a tree of elements with a singleroot element as shown below.

data for sub-element 1 data for sub-element 2

AttributeAnother way of putting data into an XML document is by adding at-tributes to start tags. The value of the attribute is usually intended tobe data relevant to the content of the current element. Whitespace isused to separate attributes from the tag-name and each other. Eachattribute has a name followed by an equals sign and the value of theattribute. The value of the attribute is enclosed in single or doublequotes. In the example below, has two attributes:att_1 and att_2. the text data Empty ElementIf an element has no content, the end-tag can be left out. In this case,a slash is added to the end of the start-tag to indicate that this is anempty element. Element content is anything that the XML specifica-tionallowstoappearbetweenastart-tagandanend-tag, suchastext, sub-elements, comments and processing instructions. An emptyelement may still have attributes, as shown below.

Document Type DefinitionA Quick Introduction to XML2The Uniform Resource Identifier (URI) in a document type declara-tion can point to a document known as a document type definition(DTD). The format for a DTD is defined in the XML Specificationand is not the same as for an XML document. A DTD may contain aset of rules that specify how the different tags in an XML documentcan be used together and the attributes that may belong to each tag.Most XML processors provide checking of XML documents againstaDTD,allowingapplicationstoquicklyandpainlesslycheckthatthe structure of an XML document is roughly correct.DTDs do not allow the specification of constraints on element andattributecontentlikethevalueoftheatt_1attributemustbeanumber. Thiskindof validationcanbehandledbyusingXMLSchema [http://www.w3.org/XML/Schema], the successor to DTDswhich defines an XML-based file format.CommentA document author can place comments in XML documents to addannotationsintendedforotherhumansreadingthedocument. Thecontentsofacommentarenotregardedaspartofthedocument'sdata. A comment is started with a less-than sign, exclamation mark,and two hyphens, and is ended with two-hyphens and a greater-thansign, as shown below. Comments may not be placed inside start- orend-tags. content XML NamespaceNamespaces in XML [http://www.w3.org/TR/REC-xml-names/] is acompanion specification to the main XML specification. It providesa facility for associating the elements and/or attributes in all or partof a document with a particular schema, as indicated by a URI. Thekey aspect of the URI is that it is unique. The value of the URI neednot haveanythingtodowiththeXMLdocument that usesit, al-though typically it would be a good location for the XML Schema orDTD that defines the rules for the document type. The URI may bemapped to a prefix which may then be used in front of tag and at-tribute names, separated by a colon. If not mapped to a prefix, theURIsetsthedefaultschemaforthecurrentelementandallofitschildren.Anamespacedeclarationlookslikeanattributeonastarttag,butmay be identified by the keyword xmlns. In the following example,thedefault namespaceis set totheCellMLnamespace, andtheMathML namespace is declared and mapped to the mathml prefix,which is then used on a element. Note that the elementandanychildrenelementswithnodefaultnamespacede-claration or namespace prefix (such as the element)will be in the CellML namespace.

... ... math goes here ...

A Quick Introduction to XML3