Post on 20-Dec-2015
Java and XML
Platform independence meets language independence!
CC432 / Short Course 507
Lecturer: Simon Lucas
University of Essex
Spring 2002
Main Topics
• Introduction
• Reading and Writing XML
• SAX
• DOM and JDOM
• Serializing Objects to XML
• XMLC
• Concluding remarks
Introduction
• Java is a platform-independent language – runs anywhere we have a JVM
• And is well-connected – powerful java.net library
• Yet – many people persist in using other languages – C/C++, VB etc!
Why Java and XML?
• The common format that allows applications written in any language to communicate is XML
• Therefore, very important to make Java read and write XML
• Can also design object models in Java – and translate them into XML
• Leverage powerful design tools such as Together for this purpose
Reading and Writing XML
• To gain an insight into what this involves, we’ll work through a simplified model of XML
• Our simplified model is as follows:
– A tree of elements
• Each element either has:
– Text,
• OR
– A set of Child Elements
Element.java
• The Element class defines the object model for this kind of document
• It also includes some String constants that dictate what characters will be used to delimit the elements
• These are chosen to be standard XML characters
• Currently, no checking that node text does not contain these special characters!!!
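One way to close the gap noted in the last bullet is to escape markup characters before the text is stored. The following is a minimal sketch only: the class name XmlEscape is hypothetical and not part of the course code, and it handles just the three characters that matter inside element text.

```java
// Hypothetical helper, not part of the original Element class.
final class XmlEscape {
    private XmlEscape() {}

    /** Replaces the characters that would otherwise be read as markup. */
    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;");  break;
                case '>': out.append("&gt;");  break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b && b > c"));
    }
}
```

A setText() implementation could pass its argument through escape() before wrapping it in the StringBuffer, and a matching unescape step would then be needed when reading.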
Element.java - I
package xml.serial;
import java.util.*;
import java.io.*;

public class Element {
  static String TAG_OPEN = "<";
  static String TAG_CLOSE = ">";
  static String END_TAG_OPEN = "</";
  static int TAB = 2;
  static int INIT_INDENT = 0;
  static char SPACE = ' ';

  protected Vector children;
  protected StringBuffer text;
  protected String name;

  public Element( String name ) {
    this.name = name;
    children = null;
    text = null;
  }
Element.java II
final public Vector getChildren() {
return children;
}
final public String getText() {
// text stays null until setText() is called
return text == null ? null : text.toString();
}
final public String getName() {
return name;
}
Element.java III
final public void setText(String text)
throws Exception {
// should substitute for any nasty characters
// e.g. at least < and >
if ( children == null) {
this.text = new StringBuffer( text );
}
else {
throw new Exception(
"Cannot add text to a node " +
"that already has child elements");
}
}
Element.java IV
final public void addChild(Element child)
throws Exception
{
if ( text == null ) {
if (children == null) {
children = new Vector();
}
children.addElement( child );
}
else {
throw new Exception(
"Cannot add elements to a node " +
"that already has text");
}
}
Reading and Writing Elements
• Given this simple Element class
• We can now write code to serialize a tree of these elements to an XML doc
• And to de-serialize such a document back to the tree of Elements in memory
• Hence, we get to write a simple parser for this subset of XML!
• ElementTest creates an element-only document and writes it to a file
ElementTest.java
package xml.serial;
import java.io.*;

public class ElementTest {
  public static void main(String[] args) throws Exception {
    Element el = new Element("object");
    PrintWriter pw = new PrintWriter( System.out );
    // el.write( pw );
    Element value = new Element( "value" );
    value.setText( "Hello" );
    el.addChild( value );
    el.write( pw );
    pw.println( "And now the static version..." );
    ElementWriter.write( el , pw );
    pw.flush();
  }
}
Running ElementTest
>java xml.serial.ElementTest
<object>
<value>
Hello
</value>
</object>
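The write() method called by ElementTest is not shown on these slides. A recursive writer for this element-only model might look like the sketch below. It is an assumption rather than the course implementation: MiniElement is a hypothetical stand-in for Element, and indenting each level by TAB spaces is a guess at how the TAB constant is used.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Element, reduced to what a writer needs.
class MiniElement {
    static final int TAB = 2;                 // spaces added per nesting level
    final String name;
    String text;                              // null when the element holds children
    final List<MiniElement> children = new ArrayList<>();

    MiniElement(String name) {
        this.name = name;
    }

    /** Writes this element and its subtree, one tag or text run per line. */
    void write(PrintWriter pw, int indent) {
        String pad = spaces(indent);
        pw.println(pad + "<" + name + ">");
        if (text != null) {
            pw.println(spaces(indent + TAB) + text);
        }
        for (MiniElement child : children) {
            child.write(pw, indent + TAB);
        }
        pw.println(pad + "</" + name + ">");
    }

    private static String spaces(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }

    /** Convenience wrapper that renders a tree to a String. */
    static String render(MiniElement root) {
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw);
        root.write(pw, 0);
        pw.flush();
        return sw.toString();
    }

    public static void main(String[] args) {
        MiniElement root = new MiniElement("object");
        MiniElement value = new MiniElement("value");
        value.text = "Hello";
        root.children.add(value);
        System.out.print(render(root));
    }
}
```

Because the text is written on its own indented line, a reader for this format would need to trim whitespace around the text to recover the original value.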
SAX
Event-based XML processing
SAX – Main Features
• Serial processing of an XML document
• Register an event handler
• The SAX parser then reads the XML document from start to end
• Calls the methods of the event handler in response to various parts of the document
Example Events
• startDocument()
• startElement()
• characters()
• endElement()
• endDocument()
• + many others!
SAX-based program pattern
• Define a class that implements the ContentHandler interface
• Easiest way is to extend DefaultHandler
• DefaultHandler provides NO-OP implementations of all the methods in the ContentHandler interface
• Override whichever methods you need to for your application
Using your Custom ContentHandler
• Import the necessary packages
• Create a new SAXParser
• Get an XMLReader from the Parser
• Set the ContentHandler for the XMLReader to be your own Customized ContentHandler
• Set up an ErrorHandler for the XMLReader – this is a class to handle any parsing errors
• Call the XMLReader to parse an XML Document
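The steps above can be sketched with the JAXP classes that ship with the JDK. This is an illustration under assumptions: the class SaxSteps and its handler StartTagCounter are hypothetical names, and the document is parsed from an in-memory string so that the example needs no file.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxSteps {
    /** Hypothetical handler: counts start tags and rethrows recoverable parse errors. */
    static class StartTagCounter extends DefaultHandler {
        int startTags = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            startTags++;
        }

        @Override
        public void error(SAXParseException e) throws SAXParseException {
            throw e;   // DefaultHandler silently ignores these unless overridden
        }
    }

    static int countStartTags(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();   // create a new SAXParser
            XMLReader reader = parser.getXMLReader();    // get an XMLReader from it
            StartTagCounter handler = new StartTagCounter();
            reader.setContentHandler(handler);           // our customized ContentHandler
            reader.setErrorHandler(handler);             // DefaultHandler is also an ErrorHandler
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.startTags;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countStartTags("<a><b/><b/></a>"));
    }
}
```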
Counting Node Types
• This program is the Hello World of SAX
• At the start of the document we create a Hashtable to count the occurrences of each type of element
• We override startElement() to update the count in the Hashtable with each element name we see
• Override endDocument() to print a summary
SAXTest Program Structure
• SAXTest uses CountNodes
• CountNodes extends DefaultHandler
[Class diagram: SAXTest, DefaultHandler, CountNodes]
SAXTest
package courses.xml;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class SAXTest extends DefaultHandler {
  static String parserClass = "org.apache.xerces.parsers.SAXParser";
  public static void main(String[] args) throws Exception {
    XMLReader reader = XMLReaderFactory.createXMLReader( parserClass );
    reader.setContentHandler( new CountNodes() );
    reader.setErrorHandler( new SimpleErrorHandler(System.err));
    reader.parse( args[0] );
  }
}
CountNodes
• We shall override the following:
– startDocument()
– startElement()
– endDocument()
CountNodes - declaration
package courses.xml;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.util.*;
public class CountNodes extends DefaultHandler
{
private Hashtable tags;
// …
CountNodes: startDocument()
• Create a new hashtable for each new document
public void startDocument() throws SAXException
{
tags = new Hashtable();
}
CountNodes: startElement()
public void startElement(String namespaceURI, String localName,
    String rawName, Attributes atts) throws SAXException
{
  String key = localName;
  Object value = tags.get(key);
  if (value == null) {
    // Add a new entry
    tags.put(key, new Integer(1));
  } else {
    // Get the current count and increment it
    int count = ((Integer)value).intValue();
    count++;
    tags.put(key, new Integer(count));
  }
}
CountNodes: endDocument()
• Summarise the Hashtable contents
public void endDocument() throws SAXException {
Enumeration e = tags.keys();
while (e.hasMoreElements()) {
String tag = (String)e.nextElement();
int count =
((Integer) tags.get(tag)).intValue();
System.out.println(
"Tag <" + tag + "> occurs " +
count + " times");
}
}
Running SAXTest: Hello.xml
<?xml version="1.0" ?>
<greetings>
  <greeting lang="english"> hello </greeting>
  <greeing> bonjour </greeing>
  <greeting> hola! </greeting>
</greetings>
Output
>java courses.xml.SAXTest courses\xml\hello.xml
Tag <greeing> occurs 1 times
Tag <greetings> occurs 1 times
Tag <greeting> occurs 2 times
Notes on CountNodes
• Note the parameters to startElement()
• We get direct access to that element only – that is its:
– Namespace
– Attributes
– Element Name (local name)
– Raw Name (namespace prefix + local name)
• We must work for any access beyond this!
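To illustrate the last point: SAX does not tell a handler which element is the parent of the current one, so the handler has to remember it. The sketch below assumes a hypothetical handler named ParentTracker that keeps a stack of open elements and records the four startElement() parameters alongside the parent name. The namespace URI and prefix in main() are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ParentTracker extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>();   // raw names of open elements
    final List<String> report = new ArrayList<>();

    @Override
    public void startElement(String namespaceURI, String localName,
                             String rawName, Attributes atts) {
        String parent = open.isEmpty() ? "(none)" : open.peek();
        report.add(rawName + " uri=" + namespaceURI + " local=" + localName
                + " attrs=" + atts.getLength() + " parent=" + parent);
        open.push(rawName);                                  // this element is now the innermost
    }

    @Override
    public void endElement(String namespaceURI, String localName, String rawName) {
        open.pop();                                          // back to the enclosing element
    }

    static List<String> describe(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            ParentTracker tracker = new ParentTracker();
            reader.setContentHandler(tracker);
            reader.parse(new InputSource(new StringReader(xml)));
            return tracker.report;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<g:list xmlns:g='urn:example:greetings'>"
                   + "<g:item lang='en'>hello</g:item></g:list>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```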
SAX Exercise
• By overriding:
– startElement()
– endElement()
– startDocument()
– endDocument()
• provide a ContentHandler that prints out how many times a greeting element was the child of another greeting element
SAX Filter Pipelines
• In the Count Nodes example, the XMLReader read from an XML document source
• Also possible to read from the output of a ContentHandler
• In this way can plug together modular filters to achieve complex effects
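In SAX 2 the usual building block for such a pipeline is org.xml.sax.helpers.XMLFilterImpl, which acts as a ContentHandler towards its parent and as an XMLReader towards whatever comes next. The sketch below chains two filters; both filter classes and the class FilterPipeline are hypothetical examples, not part of the course code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class FilterPipeline {
    /** Hypothetical filter: drops every element with the given name, plus its subtree. */
    static class DropElement extends XMLFilterImpl {
        private final String name;
        private int skipDepth = 0;            // > 0 while inside a dropped subtree

        DropElement(XMLReader parent, String name) {
            super(parent);
            this.name = name;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (skipDepth > 0 || local.equals(name)) {
                skipDepth++;
            } else {
                super.startElement(uri, local, qName, atts);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (skipDepth > 0) {
                skipDepth--;
            } else {
                super.endElement(uri, local, qName);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (skipDepth == 0) {
                super.characters(ch, start, length);
            }
        }
    }

    /** Hypothetical filter: upper-cases element names on their way through. */
    static class UpperCaseNames extends XMLFilterImpl {
        UpperCaseNames(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            super.startElement(uri, local.toUpperCase(Locale.ROOT),
                    qName.toUpperCase(Locale.ROOT), atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            super.endElement(uri, local.toUpperCase(Locale.ROOT), qName.toUpperCase(Locale.ROOT));
        }
    }

    /** Runs parser -> DropElement -> UpperCaseNames -> collector and returns the names seen. */
    static List<String> elementNames(String xml, String dropped) {
        final List<String> names = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);  // so that local names are populated
            XMLReader parser = factory.newSAXParser().getXMLReader();
            XMLReader pipeline = new UpperCaseNames(new DropElement(parser, dropped));
            pipeline.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    names.add(local);
                }
            });
            pipeline.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("pipeline failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<doc><note><x/></note><item/></doc>", "note"));
    }
}
```

Each filter only knows about its parent reader, so stages can be reordered or reused without changing the others.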
DOM and JDOM
Document Object Model
and
Java Document Object Model
DOM
• A language-independent object model of XML documents
• Memory-based
• The entire document is parsed – read in to memory
• This allows direct access to any part of the document
• But limits the size of document that can be handled
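As a small illustration of direct access, the sketch below uses the W3C DOM API bundled with the JDK to parse a whole document and then jump straight to the second greeting element. The class name DomAccess and the sample document are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomAccess {
    /** Parses the whole document into memory, then reads the index-th element with the given tag. */
    static String textOf(String xml, String tag, int index) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagName(tag);   // searches the in-memory tree
            return matches.item(index).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<greetings><greeting>hello</greeting>"
                   + "<greeting>hola!</greeting></greetings>";
        System.out.println(textOf(xml, "greeting", 1));
    }
}
```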
JDOM
• Because DOM is a language-independent spec., there are features that seem awkward from a Java perspective
• JDOM is a Java-based system, developed by Brett McLaughlin and Jason Hunter
• It aims to offer most of the features of DOM, but make them easier for Java programmers to exploit
Hello JDOM World
• We’ll look at a program that
– creates a document
– adds a few elements to it
– writes it to an output stream
package xml.jdom;

import org.jdom.Element;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;

public class HelloWorld {
  public static void main(String[] args) throws Exception
  {
    Element root = new Element("Greeting");
    root.setText("Hello world!");
    Element child = new Element("Gday");
    child.setText("The kid <bold> is \"cool </bold>");
    child.addAttribute( "color" , "red" );
    root.addContent( child );

    Document doc = new Document(root);

    XMLOutputter output =
      new XMLOutputter( " " , true );
    output.output(
      doc, new java.io.PrintWriter( System.out ) );

    String text = root.getText();
  }
}
Reading XML into JDOM
package xml.jdom;

import org.jdom.Document;
import org.jdom.DocType;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;

public class InputTest {
  public static void main(String[] args) throws Exception
  {
    String filename1 = "xml/slides/slides.xml";
    SAXBuilder builder = new SAXBuilder();
    System.out.println("Building...");
    Document doc = builder.build( filename1 );
    System.out.println( doc );
  }
}
Processing XML with JDOM
• Now we have the document tree in memory
• Processing is typically much simpler than with SAX
• Though for simple programs, this is not always so
• Let’s begin by considering how to write the Count Nodes program with JDOM
Some API
• Commonly used functions:
– getChildren() – gets all the child elements
– getContent() – gets all the content of a node – PIs, Entities, Child elements etc
– addContent() – adds any kind of content to a node
– addChild()
– get/setText() deals with the text of a node
– getParent() – does what you expect!
Count Nodes in JDOM
• Strategy:
– Create a hashtable
– Read in the document
– Walk the tree, keeping count in the hashtable
– We walk the tree by recursively visiting all the children of a node
CountNodes - Structure
– CountTest.java reads in the XML doc as a JDOM Document
– Creates an instance of CountNodes
– Calls the walkTree method of CountNodes on the document root element
– CountNodes defines a constructor and three methods
• Constructor – initialises the Hashtable
• walkTree – recursively walks the document
• count – updates entries in the Hashtable
• printSummary – prints the counts
– Compare this with the SAX implementation
CountTest.java
package xml.jdom;
import org.jdom.*;
import org.jdom.input.SAXBuilder;

public class CountTest {
  public static void main(String[] args) throws Exception {
    String filename1 = "courses/xml/hello.xml";
    SAXBuilder builder = new SAXBuilder();
    Document doc = builder.build( filename1 );

    CountNodes counter = new CountNodes();
    counter.walkTree( doc.getRootElement() );
    counter.printSummary( System.out );
  }
}
CountNodes.java
package xml.jdom;

import java.util.*;
import java.io.*;
import org.jdom.*;

public class CountNodes {
  Hashtable h;

  public CountNodes() {
    h = new Hashtable();
  }
  // … continued
CountNodes – walkTree()
public void walkTree(Element el) {
count( el.getName() );
List children = el.getChildren();
for (Iterator i = children.iterator(); i.hasNext() ; ) {
walkTree( (Element) i.next() );
}
}
CountNodes – count()
public void count(String key) {
Object value = h.get(key);
if (value == null) {
// Add a new entry
h.put(key, new Integer(1));
}
else {
// Get the current count and increment it
int count = ((Integer) value).intValue();
count++;
h.put(key, new Integer(count));
}
}
CountNodes – printSummary()
  // Sketch modelled on endDocument() in the SAX version
  public void printSummary(PrintStream out) {
    Enumeration e = h.keys();
    while (e.hasMoreElements()) {
      String tag = (String) e.nextElement();
      int count = ((Integer) h.get(tag)).intValue();
      out.println("Tag <" + tag + "> occurs " + count + " times");
    }
  }
JDOM Exercise
• Write a JDOM program to print out how many times a greeting element was the child of another greeting element
• (e.g. given a doc like Hello.xml – see above)
• (same task that we previously attempted with SAX)
JDOM Exercise Hints
• Consider the following methods:
– getParent()
– getName()
– getChildren()
Serializing Objects to XML
Homebrew version
JSX
Serialization to XML
• First we’ll consider a home-made version
• This will be a bit simplistic – but will work on a restricted range of object classes
• BUT: the Java code to do this will be easy to understand and to analyse
Home-made Serializer
• Aim: serialize a Java Object to an XML document
• Use the Java Reflection API to navigate an Object Graph
• For each object in the graph
– Create XML elements/attributes to describe it
• Write the XML elements to a stream
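The approach outlined above can be sketched in a few dozen lines. This is an assumption-laden illustration, not the course's serializer: the classes ReflectiveXml, Point and Shape are hypothetical, primitives and Strings become attributes, other objects become child elements, and repeated objects are written as a ref attribute so that cycles terminate. Attribute values are not escaped, and arrays, collections and JDK types are deliberately left out.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.IdentityHashMap;
import java.util.Map;

class ReflectiveXml {
    /** Hypothetical sample classes, used only for this illustration. */
    static class Point { int x = 3; int y = 4; }
    static class Shape { String label = "box"; Point origin = new Point(); Shape next; }

    static String toXml(Object root) {
        StringBuilder out = new StringBuilder();
        write(root, root.getClass().getSimpleName(), out, new IdentityHashMap<Object, Integer>());
        return out.toString();
    }

    private static void write(Object obj, String tag, StringBuilder out,
                              Map<Object, Integer> seen) {
        if (seen.containsKey(obj)) {
            // Already written: emit a reference instead of recursing forever
            out.append("<").append(tag).append(" ref=\"").append(seen.get(obj)).append("\"/>");
            return;
        }
        seen.put(obj, seen.size());
        StringBuilder children = new StringBuilder();
        out.append("<").append(tag);
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            f.setAccessible(true);
            Object value;
            try {
                value = f.get(obj);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (value == null) {
                continue;                                   // nulls are simply omitted here
            }
            if (f.getType().isPrimitive() || value instanceof String) {
                out.append(" ").append(f.getName()).append("=\"").append(value).append("\"");
            } else {
                write(value, f.getName(), children, seen);  // object field -> child element
            }
        }
        if (children.length() == 0) {
            out.append("/>");
        } else {
            out.append(">").append(children).append("</").append(tag).append(">");
        }
    }

    public static void main(String[] args) {
        Shape s = new Shape();
        s.next = s;                                         // a cycle, as in the JSX test later on
        System.out.println(toXml(s));
    }
}
```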
Issues
• Object attributes will be mapped as elements
• What about primitive attributes?
– Can either use elements
– Or attributes
– Attributes lead to shorter documents and are easier to read
• Shadowed attributes – must access these and name them properly
• Arrays – full or sparse?
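On the shadowed-attribute issue: getDeclaredFields() only returns the fields of one class, so a serializer has to walk up the superclass chain, and it needs a naming scheme that keeps same-named fields apart. The sketch below qualifies each field with its declaring class. The classes ShadowedFields, Base and Derived are hypothetical, and the naming convention is one possible choice rather than the one used by any particular tool.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class ShadowedFields {
    /** Hypothetical classes: Derived.id shadows Base.id. */
    static class Base { int id = 1; }
    static class Derived extends Base { String id = "d-1"; }

    /** Lists every instance field up the superclass chain, qualified by its declaring class. */
    static List<String> describe(Object obj) {
        List<String> result = new ArrayList<>();
        for (Class<?> c = obj.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                    continue;
                }
                f.setAccessible(true);
                try {
                    result.add(c.getSimpleName() + "." + f.getName() + "=" + f.get(obj));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : describe(new Derived())) {
            System.out.println(line);
        }
    }
}
```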
De-serializing from XML
• What if the class details have changed?
• What if the class is not on the classpath?
• Fatal error, or graceful degradation with warnings?
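The graceful option can be sketched as follows: problems are collected as warnings and the affected element or field is skipped, instead of aborting the whole read. The class LenientLoader, its method names and the sample class Note are all hypothetical.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class LenientLoader {
    /** Hypothetical target class for the demonstration. */
    static class Note { String message; }

    final List<String> warnings = new ArrayList<>();

    /** Returns the class named by an element, or null (with a warning) if it cannot be loaded. */
    Class<?> resolve(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            warnings.add("Skipping element: class " + className + " is not on the classpath");
            return null;
        }
    }

    /** Sets a String field by name; a field that has gone or changed type yields a warning. */
    boolean setStringField(Object target, String fieldName, String value) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            f.set(target, value);
            return true;
        } catch (NoSuchFieldException | IllegalAccessException | IllegalArgumentException e) {
            warnings.add("Skipping field " + fieldName + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        LenientLoader loader = new LenientLoader();
        Note note = new Note();
        loader.resolve("no.such.Clazz");                 // class missing
        loader.setStringField(note, "message", "hi");    // still present
        loader.setStringField(note, "subject", "old");   // field removed since serialization
        System.out.println(note.message + " / warnings: " + loader.warnings.size());
    }
}
```

Whether a warning or a fatal error is the right response depends on how much of the object graph is still usable without the missing piece.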
JSX – Main Features
http://www.csse.monash.edu.au/~bren/JSX/
• Developed by Brendan Macmillan at Monash University, Melbourne
• Free for non-commercial use, charge for commercial use
• Has evolved rapidly from an early prototype with many limitations
• To the current version that works well and handles most cases
• To use, just add jsx.jar to your classpath
MyClass
• Simple class, with Object, double, String and byte[] fields

package xml.serial;

/** Simple class to play with serialization to XML */

public class MyClass {
  MyClass child;
  String message;
  double x;
  byte[] a;
}
JSX – Test Program
package xml.serial;
import JSX.*;
import java.io.*;

public class SimpleJSXTest {
  public static void main(String[] args) throws Exception {
    MyClass mc = new MyClass();
    mc.a = new byte[]{0 , 1 , 2 , 3};
    mc.child = new MyClass();
    mc.child.message = "Middle one!";
    mc.child.child = mc;
    ObjOut out = new ObjOut( System.out );
    out.writeObject( mc );
    out.flush();
  }
}
JSX – Example Output
>javac xml\serial\SimpleJSXTest.java
>java xml.serial.SimpleJSXTest
<?jsx version="1"?>
<xml.serial.MyClass x="0.0">
<xml.serial.MyClass obj-name="child"
message="Middle one!"
x="0.0">
<alias-ref obj-name="child" alias="0"/>
<null obj-name="a"/>
</xml.serial.MyClass>
<binary-data obj-name="a" valueOf="00 01 02 03"/>
</xml.serial.MyClass>
Java, XML and Relational Databases
Creating Virtual XML Documents from ResultSets
ResultSet -> XML
• For more details see Chapter 17 of Professional Java XML
• Basic idea is this:
• Iterate over a result set
– Either use this as source of SAX events
– OR build an in-memory document model of it (DOM / JDOM)
SAX Version
• Wrapper around a ResultSet that implements the XMLReader interface
• In response to the parse() method, iterates over the ResultSet
• For each Row in the result set, generate a sequence of startElement(), characters() and endElement() events
• Can use the ResultSetMetaData to name the elements – depending on the mapping convention used – which depends on final purpose
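The wrapper idea can be sketched without a database by letting a list of maps stand in for the ResultSet. That substitution is an assumption made so the example runs on its own; a real wrapper would loop with ResultSet.next() and take names from ResultSetMetaData. The class name RowsReader and the mapping (one element per column, named after the column) are likewise just one possible convention.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Extending XMLFilterImpl is a shortcut to get the XMLReader plumbing for free.
class RowsReader extends XMLFilterImpl {
    private final List<Map<String, String>> rows;   // stand-in for a JDBC ResultSet

    RowsReader(List<Map<String, String>> rows) {
        this.rows = rows;
    }

    /** Ignores the InputSource and fires one row element per row, one child per column. */
    @Override
    public void parse(InputSource ignored) throws SAXException {
        ContentHandler h = getContentHandler();
        if (h == null) {
            return;
        }
        AttributesImpl none = new AttributesImpl();
        h.startDocument();
        h.startElement("", "table", "table", none);
        for (Map<String, String> row : rows) {
            h.startElement("", "row", "row", none);
            for (Map.Entry<String, String> col : row.entrySet()) {
                h.startElement("", col.getKey(), col.getKey(), none);
                char[] text = col.getValue().toCharArray();
                h.characters(text, 0, text.length);
                h.endElement("", col.getKey(), col.getKey());
            }
            h.endElement("", "row", "row");
        }
        h.endElement("", "table", "table");
        h.endDocument();
    }

    /** Renders the event stream as text, to show what a downstream handler receives. */
    static String render(List<Map<String, String>> rows) {
        final StringBuilder out = new StringBuilder();
        RowsReader reader = new RowsReader(rows);
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                out.append('<').append(qName).append('>');
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                out.append("</").append(qName).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        });
        try {
            reader.parse(new InputSource());
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "1");
        row.put("name", "Ada");
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row);
        System.out.println(render(rows));
    }
}
```

Because the class is an XMLReader, any SAX consumer, including the CountNodes handler shown earlier, could be attached to it unchanged.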
JDOM Version
• Many ways of doing this – here’s one
• Start with the table root element
• For each row in the result set
– Add a new <row> element
– For each field in the row
• Add a new <field> element to the row element
• OR: could build from the SAX version
• Usual tradeoffs apply between SAX and DOM
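The recipe above can be sketched with the W3C DOM API from the JDK in place of JDOM, so that the example runs without extra libraries. A list of maps again stands in for the ResultSet, and the class name RowsToDom and the name attribute on each field element are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class RowsToDom {
    /** Builds table / row / field elements in memory from the given rows. */
    static Document build(List<Map<String, String>> rows) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element table = doc.createElement("table");          // start with the table root element
        doc.appendChild(table);
        for (Map<String, String> row : rows) {
            Element rowEl = doc.createElement("row");         // one row element per row
            table.appendChild(rowEl);
            for (Map.Entry<String, String> col : row.entrySet()) {
                Element field = doc.createElement("field");   // one field element per column
                field.setAttribute("name", col.getKey());
                field.setTextContent(col.getValue());
                rowEl.appendChild(field);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "1");
        row.put("name", "Ada");
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row);
        Document doc = build(rows);
        System.out.println(doc.getElementsByTagName("field").getLength() + " fields");
    }
}
```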
XMLC
Auto-generation of classes from XML Document Types (Check this!)
XMLC
• XMLC creates Java classes from HTML or XML documents
• See tutorial at
– http://staff.plugged.net.au/dwood/xmlc/
• These notes were derived from the above tutorial
• The Java classes faithfully model the document
XMLC II
• By modifying the instance variables of a class, we can insert dynamic content into the document
• Argued to be more efficient than some dynamic generation methods
A Claim for XMLC:
“The best single advantage of XMLC is the ability to completely separate HTML templates (the pages that an artist creates) from Java code (the controlling logic that programmers create). XMLC allows artists to generate and edit HTML from design tools that support the HTML 4.0 standard”
• Homework: Read the tutorial and Discuss!!!
Concluding Remarks - I
• Rapidly evolving technology
• Can Model objects in Java and convert them to/from XML
• Can write home-made solutions using reflection
• Or use the very good JSX package
Data Modelling
• Can model data using
– Relational modelling
– Object modelling
– XML schemas / DTDs
• BUT: try to stick to once and once only
• Model in the chosen way, and use tools to map between the different representations
SAX and DOM
• Looked at SAX and JDOM for processing XML in Java
• SAX more suitable for massive documents (but where would these come from?)
• DOM + JDOM easier to work with
Concluding Remarks - II
• Java and XML are natural partners
• Used with XSLT, can be used to create well designed web applications with:
– Separation of content from presentation
– Adherence to Once and Once Only principle
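As a brief illustration of separating content from presentation, the sketch below applies an XSLT stylesheet to a greetings document using the javax.xml.transform API in the JDK. The stylesheet, the class name XsltDemo and the choice of an HTML list as the presentation are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltDemo {
    // Presentation lives in the stylesheet; the XML document carries only content.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/greetings'>"
        + "<ul><xsl:for-each select='greeting'>"
        + "<li><xsl:value-of select='normalize-space(.)'/></li>"
        + "</xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String toHtml(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<greetings><greeting> hello </greeting>"
                   + "<greeting> hola! </greeting></greetings>";
        System.out.println(toHtml(xml));
    }
}
```

Changing how the greetings look means editing only the stylesheet; the Java code and the XML content stay as they are.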