XML Grammars: 95-733 Internet Technologies

Post on 15-Jan-2016

  • XML Grammars: 95-733 Internet Technologies

    Internet Technologies

  • XML Grammars: Three Major Uses

    1. Validation

    2. Code Generation

    3. Communication

  • XML Validation

    Sources for this lecture:

    Data on the Web (Abiteboul, Buneman and Suciu)

    XML in a Nutshell (Harold and Means)

    The XML Companion (Bradley)

    The validation examples were originally tested with an older parser, so the specific outputs may differ from those shown.

  • XML Validation

    A batch validating process compares a complete document instance against the DTD and produces a report containing any errors or warnings.

    Batch validation is analogous to program compilation, with similar kinds of errors detected.

    Interactive validation constantly compares the document against the DTD as the document is being created.

  • XML Validation

    The benefits of validating documents against a DTD include:

    Programmers can write extraction and manipulation filters with far less fear of their software processing structurally unexpected input.

    Using an XML-aware word processor, authors and editors can be guided and constrained to produce conforming documents. Consider how NetBeans allows you to edit web.xml files.

  • XML Validation Examples

    XML elements may contain further, embedded elements, and the entire document must be enclosed by a single document element.

    These are recursive hierarchical structures.

    A Document Type Definition (DTD) contains rules for each element allowed within a specific class of documents.

  • Things the DTD does not do:

    Specify the document root.

    Specify the number of instances of each kind of element. (Or, it's rather hard to do.)

    Describe the character data inside an element (the precise syntax).

    DTDs don't naturally handle namespaces.

    The XML Schema language is more recent and improves on DTDs. With it we have programmer-level type specifications.

    To see a real DTD, view source on http://www.silmaril.ie/software/rss2.dtd
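The point that a DTD cannot describe the character data inside an element can be tried out with a small sketch. It assumes the JDK's built-in JAXP parser rather than a named Xerces class. The class name, the `isValid` helper and the one-element DTD are hypothetical, chosen only for illustration (the element name NumYears is borrowed from the examples later in the lecture).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class PcdataLimitSketch {

    // Hypothetical DTD: NumYears may hold any parsed character data.
    static final String NON_NUMERIC =
        "<!DOCTYPE NumYears [ <!ELEMENT NumYears (#PCDATA)> ]>\n"
      + "<NumYears>three-ish</NumYears>";

    /** Returns true when the document parses with no validity or fatal errors. */
    static boolean isValid(String xml) {
        final boolean[] valid = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false; // validity error: remember it, keep parsing
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return false; // not well-formed, or the parser could not be set up
        }
        return valid[0];
    }

    public static void main(String[] args) {
        // The DTD only says "text goes here"; it cannot require an integer.
        System.out.println("Valid document is " + isValid(NON_NUMERIC));
    }
}
```

The DTD accepts the non-numeric text because #PCDATA places no constraint on the characters themselves. Checking that the value is a number is left to the application or to a schema language with data types.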

  • We'll run this program against several XML files with DTDs. We'll study the code soon.

    // Validate.java using Xerces

    import java.io.*;

    import org.xml.sax.ErrorHandler;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.InputSource;
    import org.xml.sax.helpers.XMLReaderFactory;
    import org.xml.sax.helpers.DefaultHandler;

    This slide shows the imported classes.

  • public class Validate {

        public static boolean valid = true;

        public static void main (String argv []) {
            if (argv.length != 1) {
                System.err.println ("Usage: java Validate filename.xml");
                System.exit (1);
            }

    Here we check if the command line is correct.

  • try {
        // get a parser
        XMLReader reader = XMLReaderFactory.createXMLReader(
            "org.apache.xerces.parsers.SAXParser");

        // request validation
        reader.setFeature("http://xml.org/sax/features/validation", true);

        // associate an InputSource object with the file name
        InputSource inputSource = new InputSource(argv[0]);

        // go ahead and parse
        reader.parse(inputSource);
    }

  • catch(org.xml.sax.SAXException e) {
        System.out.println("Error in parsing " + e);
        valid = false;
    }
    catch(java.io.IOException e) {
        System.out.println("Error in I/O " + e);
        System.exit(0);
    }
    System.out.println("Valid Document is " + valid);
    }
    }

    // Catch any errors or fatal errors here.
    // The parser will handle simple warnings.
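The closing comments mention catching errors and fatal errors, but the transcript shows no handler code. The sketch below is one possible way to do that, not the lecture's actual code. It assumes the JDK's JAXP `SAXParserFactory` instead of the named Xerces class, so that it runs without an extra jar. `ValidateWithHandlerSketch`, `FlaggingHandler`, `isValid` and the `note` documents are hypothetical names introduced for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class ValidateWithHandlerSketch {

    /** Records whether the parser reported any error or fatal error. */
    static class FlaggingHandler implements ErrorHandler {
        boolean valid = true;

        @Override
        public void warning(SAXParseException e) {
            // Warnings do not make the document invalid; ignore them here.
        }

        @Override
        public void error(SAXParseException e) {
            // Validity errors are recoverable: print, flag, and keep parsing.
            System.out.println(e);
            valid = false;
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXParseException {
            // Well-formedness errors are not recoverable: flag and stop.
            valid = false;
            throw e;
        }
    }

    /** Parses the XML text with DTD validation on and reports the outcome. */
    static boolean isValid(String xml) {
        FlaggingHandler handler = new FlaggingHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // request validation
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            System.out.println("Error in parsing " + e);
            return false;
        }
        return handler.valid;
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE note [ <!ELEMENT note (#PCDATA)> ]>";
        System.out.println("Valid document is " + isValid(dtd + "<note>hi</note>"));
        System.out.println("Valid document is " + isValid(dtd + "<note><extra/></note>"));
    }
}
```

Without a registered error handler, validity errors do not surface as exceptions from `parse`, so a flag set only in a `catch` block would miss them. That is why the handler sets the flag itself.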

  • 100 5 3 6

    XML Document, DTD

    Valid document is true

  • 100 5 3 6

    XML Document

    DTD on the Web? VERY NICE

    Valid document is true

  • ]>

    100 5 3 6

    XML Document with an internal subset

    Valid document is true

  • 100 5 3 6

    XML Document, DTD

    Valid document is false

  • 100 5 3 6

    100 5 3 6

    XML Document

  • DTD

    Quantity Indicators

    ?   0 or 1 time
    +   1 or more times
    *   0 or more times

    C:\McCarthy\www\examples\sax>java Validate FixedFloatSwap.xml
    Valid document is true
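The three quantity indicators can be exercised with a short sketch. The `order` DTD, its element names and the `isValid` helper are hypothetical and are not the FixedFloatSwap DTD from the lecture. The parser is assumed to be the JDK's built-in JAXP implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class QuantityIndicatorSketch {

    /** Wraps the given children in an order element governed by a hypothetical DTD. */
    static String order(String children) {
        return "<!DOCTYPE order [\n"
             + "  <!ELEMENT order (item+, note?, tag*)>\n" // + one or more, ? zero or one, * zero or more
             + "  <!ELEMENT item (#PCDATA)>\n"
             + "  <!ELEMENT note (#PCDATA)>\n"
             + "  <!ELEMENT tag (#PCDATA)>\n"
             + "]>\n"
             + "<order>" + children + "</order>";
    }

    /** Returns true when the document parses with no validity or fatal errors. */
    static boolean isValid(String xml) {
        final boolean[] valid = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return false;
        }
        return valid[0];
    }

    public static void main(String[] args) {
        // One item, no note, no tags: satisfies item+, note?, tag*.
        System.out.println(isValid(order("<item>a</item>")));
        // No item at all: violates item+.
        System.out.println(isValid(order("<note>n</note>")));
        // Two notes: violates note?.
        System.out.println(isValid(order("<item>a</item><note>n</note><note>m</note>")));
    }
}
```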

  • Is this a valid document?

    ]>

    Alan Turing computer scientist cryptographer

    Sure!

  • The locations where document text data is allowed are indicated by the keyword PCDATA (Parsed Character Data).

    100 5 2000 2002 6

    XML Document

  • DTD

    Output:

    C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
    org.xml.sax.SAXParseException: Element "NumYears" does not allow "StartYear" -- (#PCDATA)
    org.xml.sax.SAXParseException: Element type "StartYear" is not declared.
    org.xml.sax.SAXParseException: Element "NumYears" does not allow "EndYear" -- (#PCDATA)
    org.xml.sax.SAXParseException: Element type "EndYear" is not declared.
    Valid document is false

  • Mixed Content

    There are strict rules which must be applied when an element is allowed to contain both text and child elements.

    The PCDATA keyword must be the first token in the group, and the group must be a choice group (using | not ,).

    The group must be optional and repeatable.

    This is known as a mixed content model.
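The mixed content rules above can be checked against a parser with a sketch like the following. All three documents, their element names (`para`, `sub`, `page`, `paragraph`) and the `isValid` helper are hypothetical. The parser is assumed to be the JDK's built-in JAXP implementation, and a declaration the parser rejects outright is simply counted as not valid.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class MixedContentSketch {

    // Follows the rules: #PCDATA first, choice group, optional and repeatable.
    static final String MIXED =
        "<!DOCTYPE para [\n"
      + "  <!ELEMENT para (#PCDATA | sub)*>\n"
      + "  <!ELEMENT sub (#PCDATA)>\n"
      + "]>\n"
      + "<para>H<sub>2</sub>O is water.</para>";

    // Element-only model: text placed directly inside page is not allowed.
    static final String TEXT_IN_ELEMENT_CONTENT =
        "<!DOCTYPE page [\n"
      + "  <!ELEMENT page (paragraph)+>\n"
      + "  <!ELEMENT paragraph (#PCDATA)>\n"
      + "]>\n"
      + "<page>Stray text.<paragraph>Alan Turing broke codes.</paragraph></page>";

    // Breaks the first rule: #PCDATA is not the first token in the group.
    static final String PCDATA_NOT_FIRST =
        "<!DOCTYPE para [\n"
      + "  <!ELEMENT para (sub | #PCDATA)*>\n"
      + "  <!ELEMENT sub (#PCDATA)>\n"
      + "]>\n"
      + "<para>H<sub>2</sub>O is water.</para>";

    /** Returns true when the document parses with no validity or fatal errors. */
    static boolean isValid(String xml) {
        final boolean[] valid = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return false; // includes a DTD declaration the parser cannot accept
        }
        return valid[0];
    }

    public static void main(String[] args) {
        System.out.println("mixed model:             " + isValid(MIXED));
        System.out.println("text in element content: " + isValid(TEXT_IN_ELEMENT_CONTENT));
        System.out.println("#PCDATA not first:       " + isValid(PCDATA_NOT_FIRST));
    }
}
```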

  • DTD

    H2O is water.

    XML Document

    Valid document is true

  • Is this a valid document?

    ]>

    Alan Turing broke codes during World War II. He very precisely defined the notion of "algorithm". And so he had several professions: computer scientist cryptographer And mathematician

    Sure!

  • How about this one?

    ]>

    The following is a paragraph marked up in XML. Alan Turing broke codes during World War II. He very precisely defined the notion of "algorithm". And so he had several professions: computer scientist cryptographer And mathematician

    java Validate mixed.xml
    org.xml.sax.SAXParseException: The content of element type "page" must match "(paragraph)+".
    Valid document is false

  • CDATA Section

    100 5 3 6 will not be parsed for markup]]>

    XML Document, DTD
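A CDATA section's effect is easiest to see by parsing one. The sketch below is illustrative only: the `note` element and its text are hypothetical, and the standard JAXP DOM API is assumed. It shows that characters that look like tags inside the section arrive as plain text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataSketch {

    // Hypothetical document: the <b> and < inside the CDATA section are just characters.
    static final String DOC =
        "<note><![CDATA[<b>5 < 6</b> will not be parsed for markup]]></note>";

    /** Parses the XML text into a DOM tree; wraps checked exceptions for brevity. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(DOC);
        // The text content is the literal string, angle brackets included.
        System.out.println(doc.getDocumentElement().getTextContent());
        // No b element exists in the tree, because the section was not parsed for markup.
        System.out.println(doc.getElementsByTagName("b").getLength());
    }
}
```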

  • Recursion

    ]>

    A DTD is a context-free grammar

    java Validate recursive1.xml
    Valid document is true

  • How about this one?

    ]>

    Alan Turing would like this Alan Turing would like this

    java Validate recursive1.xml
    org.xml.sax.SAXParseException: The content of element type "tree" must match "(node)".
    Valid document is false
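Because the lecture's recursive DTD did not survive in the transcript, the following sketch uses a hypothetical one in which `node` refers to itself. Only the rule that `tree` must match `(node)` is taken from the output above. The `leaf` element, the helper names and the use of the JDK's built-in JAXP parser are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class RecursiveDtdSketch {

    // Hypothetical recursive grammar: a node is a leaf or a pair of nodes.
    static final String DTD =
        "<!DOCTYPE tree [\n"
      + "  <!ELEMENT tree (node)>\n"
      + "  <!ELEMENT node (leaf | (node, node))>\n"
      + "  <!ELEMENT leaf (#PCDATA)>\n"
      + "]>\n";

    /** Builds a complete binary subtree of the given depth. */
    static String subtree(int depth) {
        if (depth == 0) {
            return "<node><leaf>x</leaf></node>";
        }
        return "<node>" + subtree(depth - 1) + subtree(depth - 1) + "</node>";
    }

    /** Wraps the body in the tree root and prepends the DTD. */
    static String document(String body) {
        return DTD + "<tree>" + body + "</tree>";
    }

    /** Returns true when the document parses with no validity or fatal errors. */
    static boolean isValid(String xml) {
        final boolean[] valid = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return false;
        }
        return valid[0];
    }

    public static void main(String[] args) {
        // Nesting of any depth is accepted by the same three declarations.
        System.out.println(isValid(document(subtree(3))));
        // Two nodes directly under tree violate (node).
        System.out.println(isValid(document(subtree(0) + subtree(0))));
    }
}
```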

  • Relational Databases and XML

    Consider the relational database r1(a,b,c), r2(c,d)

    r1:  a   b   c        r2:  c   d
         a1  b1  c1            c2  d2
         a2  b2  c2            c3  d3
                               c4  d4

    How can we represent this database with an XML DTD?

  • Relations

    ]>

    a1 b1 c1 a1 b1 c1 c2 d2 c3 d3 c4 d4

    java Validate Db.xml
    Valid document is true

    There is a small problem.

  • Relations

    ]>

    a1 b1 c1 a1 b1 c1 c2 d2 c3 d3 c4 d4

    The order of the relations should not count and neither should the order of columns within rows.
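The ordering problem can be made concrete with a sketch. The two content models for `db` below are assumptions, since the lecture's DTDs are not preserved in the transcript. One fixes the order of the relations and the other lets r1 and r2 rows interleave freely. The element names follow the r1(a,b,c), r2(c,d) schema, and the parser is assumed to be the JDK's built-in JAXP implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class RelationOrderSketch {

    /** Builds a document whose r2 row comes before its r1 row, under the given db model. */
    static String document(String dbModel) {
        return "<!DOCTYPE db [\n"
             + "  <!ELEMENT db " + dbModel + ">\n"
             + "  <!ELEMENT r1 (a, b, c)>\n"
             + "  <!ELEMENT r2 (c, d)>\n"
             + "  <!ELEMENT a (#PCDATA)>\n"
             + "  <!ELEMENT b (#PCDATA)>\n"
             + "  <!ELEMENT c (#PCDATA)>\n"
             + "  <!ELEMENT d (#PCDATA)>\n"
             + "]>\n"
             + "<db>"
             + "<r2><c>c2</c><d>d2</d></r2>"
             + "<r1><a>a1</a><b>b1</b><c>c1</c></r1>"
             + "</db>";
    }

    /** Returns true when the document parses with no validity or fatal errors. */
    static boolean isValid(String xml) {
        final boolean[] valid = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return false;
        }
        return valid[0];
    }

    public static void main(String[] args) {
        // Sequence model: all r1 rows must precede all r2 rows.
        System.out.println("(r1*, r2*)  -> " + isValid(document("(r1*, r2*)")));
        // Choice model: rows of either relation may appear in any order.
        System.out.println("(r1 | r2)*  -> " + isValid(document("(r1 | r2)*")));
    }
}
```

The same trick does not scale well to columns within a row. Allowing a, b and c in any order means enumerating the permutations in the content model, which grows quickly with the number of columns.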

  • Attributes

    An attribute is associated with a particular element by the DTD and is assigned an attribute type.

    The attribute type can restrict the range of values it can hold.

    Example attribute types include:

    CDATA indicates a simple string of characters

    NMTOKEN indicates a word or