5.java parser
TRANSCRIPT
-
8/14/2019 5.Java Parser
1/33
XML Parsers
Parsing
-
Parsing
XML parsing is required so that our application can inspect, retrieve and modify the document contents.
An XML parser is a program that sits between the XML document and our application. In an attempt to standardize the way parsers should work, two specifications have emerged that spell out the interfaces an application can expect from a parser:
SAX: the Simple API for XML. SAX processes the XML document a tag at a time and generates events.
DOM: the Document Object Model. DOM describes the document as a tree data structure. It first loads the entire XML document into a tree; the application can then traverse and edit any node.
-
SAX vs. DOM
When it comes to fast, efficient reading of XML data, SAX is hard to beat. It requires little memory, because it does not construct an internal representation (tree structure) of the XML data. Instead, it simply sends data to the application as it is read; your application can then do whatever it wants with the data it sees. But you can't go back to an earlier position or leap ahead to a different position.
In general, SAX works well when you simply want to read data and have the application act on it.
DOM is less suitable for that kind of processing, since it has to read the entire document before the application can act on it. It also requires more memory.
But when you need to modify an XML structure, especially when you need to modify it interactively, an in-memory structure like the Document Object Model (DOM) makes more sense.
-
JAXP API
The Java API for XML Processing (JAXP) is for processing XML data using applications written in the Java programming language.
JAXP leverages the parser standards SAX (Simple API for XML) and DOM (Document Object Model) so that you can choose to parse your data as a stream of events or to build an object representation of it.
JAXP also supports the XSLT (Extensible Stylesheet Language Transformations) standard, giving you control over the presentation of the data and enabling you to convert the data to other XML documents or to other formats, such as HTML.
JAXP also provides namespace support, allowing you to work with DTDs that might otherwise have naming conflicts.
JAXP comes with the standard Java SDK.
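The XSLT support mentioned above can be sketched roughly as follows. This is only an illustration using the javax.xml.transform classes that ship with the JDK; the class name XsltSketch and the tiny stylesheet are made up for the example and are not part of the original slides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {
    // Hypothetical stylesheet: prints the <to> child of <note> as plain text.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/note'>To: <xsl:value-of select='to'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML text and returns the result as a string.
    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<note><to>you</to><from>God</from></note>"));
    }
}
```

The same Transformer API can write HTML or XML instead of text by changing the output method in the stylesheet.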
-
Steps to write an application
1. Obtain a parser object.
2. Obtain a source of XML data.
3. Give that source to the parser to parse.
JAXP itself has just the interfaces for SAX and DOM, plus abstract classes that provide factory methods for obtaining instances of a parser and an XML data source.
4 packages:
org.xml.sax: SAX distribution
org.xml.sax.helpers: SAX distribution
org.w3c.dom: DOM in Java
javax.xml.parsers: JAXP distribution
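The three steps above can be sketched as follows, once with a DOM parser and once with a SAX parser. The class and method names (ThreeSteps, rootNameViaDom, parsesViaSax) are invented for this illustration, and the XML is read from a string only to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ThreeSteps {
    // DOM flavour: returns the name of the root element.
    static String rootNameViaDom(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();  // 1. obtain a parser
        InputSource source = new InputSource(new StringReader(xml));   // 2. obtain a source
        Document doc = builder.parse(source);                           // 3. parse the source
        return doc.getDocumentElement().getNodeName();
    }

    // SAX flavour: returns true if the document parses without a fatal error.
    static boolean parsesViaSax(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser(); // 1. obtain a parser
        InputSource source = new InputSource(new StringReader(xml));     // 2. obtain a source
        parser.parse(source, new DefaultHandler());                       // 3. parse the source
        return true;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<note><to>you</to></note>";
        System.out.println(rootNameViaDom(xml));
        System.out.println(parsesViaSax(xml));
    }
}
```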
-
SAX Programming model
Not a W3C standard, but widely adopted, including by IBM and Sun.
The standard SAX distribution for Java contains 2 packages:
org.xml.sax
org.xml.sax.helpers
They contain 11 classes and interfaces.
-
Classes
Classes related to the parser:
org.xml.sax.XMLReader is the interface that an XML parser's SAX2 driver must implement. It is an interface for reading an XML document using callbacks.
javax.xml.parsers.SAXParser defines the API that wraps an XMLReader implementation class. An instance of this class can be obtained from the javax.xml.parsers.SAXParserFactory.newSAXParser() method.
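A small sketch of how SAXParser wraps an XMLReader: the reader is fetched with getXMLReader() and a handler is registered on it directly. The class name ReaderSketch and the helper elementNames are hypothetical, chosen only for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ReaderSketch {
    // Collects the qualified names of all elements in document order.
    static List<String> elementNames(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        XMLReader reader = parser.getXMLReader();   // the wrapped SAX2 reader
        List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                names.add(qName);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<note><to>you</to><from>God</from></note>"));
    }
}
```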
-
Classes related to the application that we write:
org.xml.sax.ContentHandler: this is the main interface that most SAX applications implement. It defines the methods that the parser uses as callbacks. The parser expects a handler object to be registered with it, either through XMLReader.setContentHandler() or as the handler argument of SAXParser.parse().
org.xml.sax.helpers.DefaultHandler is a class that implements ContentHandler. It is the default base class for SAX2 event handlers.
Exception classes: SAXException, SAXParseException
Helper classes: SAXParserFactory
When the parser reaches the end of the document, the only data in memory is what your application saved.
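As a rough illustration of the exception classes, the sketch below catches the SAXParseException raised for a document that is not well-formed and reads the line number it carries. The class name ParseErrorSketch and the method firstErrorLine are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ParseErrorSketch {
    // Returns -1 if the document is well-formed,
    // otherwise the line number reported for the first fatal error.
    static int firstErrorLine(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return -1;
        } catch (SAXParseException e) {
            // SAXParseException extends SAXException and adds location information.
            return e.getLineNumber();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstErrorLine("<note><to>you</to></note>"));
        System.out.println(firstErrorLine("<note><to>you</note>"));
    }
}
```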
-
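SAX Programming model
[Figure: SAX programming model. (1) SAXParserFactory creates the SAXParser. (2) The XML source, the optional DTD and the class implementing ContentHandler are input to the SAXParser. The SAXParser reports events by calling the handler methods (startDocument, startElement, characters, endElement, endDocument, etc.), and the handler produces the output.]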
-
org.xml.sax.ContentHandler
This is the interface that declares the event-handling methods of SAX:
void characters(char ch[], int start, int length)
void startDocument()
void endDocument()
void startElement(String uri, String localName, String qName, Attributes attributes)
void endElement(String uri, String localName, String qName)
void processingInstruction(String target, String data)
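To see the order in which these callbacks fire, a handler can simply record each event. The sketch below is one possible way to do that; the class name EventLogSketch is invented here. Because a parser may deliver the text of one element in several characters() calls, the sketch buffers text and logs it when the next tag arrives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventLogSketch extends DefaultHandler {
    final List<String> log = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    // Logs any buffered, non-blank text and clears the buffer.
    private void flushText() {
        String s = text.toString().trim();
        if (!s.isEmpty()) log.add("text:" + s);
        text.setLength(0);
    }

    @Override public void startDocument() { log.add("startDocument"); }
    @Override public void endDocument() { log.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        flushText();
        log.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        log.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);   // may be called more than once per text run
    }

    // Parses the XML text and returns the recorded events in order.
    static List<String> events(String xml) throws Exception {
        EventLogSketch handler = new EventLogSketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(events("<note><to>you</to></note>"));
    }
}
```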
-
DefaultHandler and SAXParser
DefaultHandler: the easiest way to implement the ContentHandler interface is to extend the DefaultHandler class, defined in the org.xml.sax.helpers package.
SAXParserFactory, SAXParser: SAXParser is an abstract class. The static newInstance() method of SAXParserFactory returns a factory, and the factory's newSAXParser() method returns a new concrete implementation of SAXParser. newSAXParser() throws a ParserConfigurationException if it is unable to produce a parser that matches the specified configuration of options.
Xerces parser from Apache: implements the parser and uses the JAXP API (org.apache.xerces.jaxp).
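A minimal sketch of configuring the factory before asking it for a parser. Here the only option set is namespace awareness, and the place where ParserConfigurationException can surface is marked. The names NamespaceSketch and qualifiedNames, and the namespace URI urn:example, are made up for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceSketch {
    // Returns each element as "{namespace-uri}local-name".
    static List<String> qualifiedNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // option the produced parser must honour
        SAXParser parser;
        try {
            parser = factory.newSAXParser();
        } catch (ParserConfigurationException e) {
            // Thrown when no parser matches the requested configuration.
            throw new IllegalStateException("no parser supports the requested options", e);
        }
        List<String> names = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                names.add("{" + uri + "}" + localName);
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(qualifiedNames(
            "<n:note xmlns:n='urn:example'><n:to>you</n:to></n:note>"));
    }
}
```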
-
// Program 1: Counting no. of elements
import java.io.File;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class CountSax extends DefaultHandler {
    private static int ele = 0;

    public static void main(String s[]) throws Exception {
        if (s.length != 1) {
            System.out.println("Usage: cmd filename");
            System.exit(0);
        }
        // Use the default (non-validating) parser
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Creates a new instance of a SAXParser using the currently
        // configured factory parameters.
        SAXParser saxParser = factory.newSAXParser();
        File f = new File(s[0]);
        if (f.exists())
            // Parse the input
            saxParser.parse(f, new CountSax());
        else
            System.out.println("unknown file");
    }

    // Reset the counter at the start of each document.
    public void startDocument() { ele = 0; }

    // Called once per opening tag, so each call is one element.
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        ele++;
    }

    public void endDocument() {
        System.out.println("Number of elements :" + ele);
    }
}
Execution:
java CountSax note.xml
Number of elements :4
-
/* Program 2: Creating HTML document to represent note.xml */
// The HTML tags inside the string literals were lost when this transcript
// was extracted; the literals are shown as they survive.
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.*;
import org.xml.sax.helpers.DefaultHandler;

public class NoteSax extends DefaultHandler {
    PrintWriter out;

    public NoteSax() throws Exception {
        out = new PrintWriter(new BufferedWriter(new FileWriter("note.html")));
    }

    public static void main(String s[]) throws Exception {
        if (s.length != 1) {
            System.out.println("Usage: cmd filename");
            System.exit(0);
        }
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser saxParser = factory.newSAXParser();
        File f = new File(s[0]);
        if (f.exists())
            saxParser.parse(f, new NoteSax());
        else
            System.out.println("unknown file");
    }

    public void startDocument() {}

    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals("note"))
            out.println("Note");
        if (qName.equals("to"))
            out.println(" To, ");
        if (qName.equals("from"))
            out.println("-from ");
        if (qName.equals("body") && (attrs.getLength() > 0)) {
            for (int i = 0; i < attrs.getLength(); i++) {
                String aName = attrs.getQName(i);
                String value = attrs.getValue(i);
                if (aName.equals("type")) {
                    if (value.equals("warm"))
                        out.println("");
                    if (value.equals("cold"))
                        out.println("");
                    if (value.equals("formal"))
                        out.println("");
                }
                if (aName.equals("subject"))
                    out.println("" + value + ":");
            } // end of for
        } // end of if
    }

    // endElement takes no Attributes parameter; with one, the method
    // would not override the callback and would never be called.
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("body"))
            out.println("");
        if (qName.equals("from"))
            out.println("");
    }

    public void endDocument() {
        out.println("");
        out.close();
    }

    public void characters(char buf[], int offs, int l) throws SAXException {
        String s = new String(buf, offs, l);
        out.println(s);
    }
}
-
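note.xml
(The element markup was lost in this transcript; only the text content survives. Judging from NoteSax, the file has a note root element with to, body (attributes type and subject) and from children.)
you
If today was a perfect day then there would be no tomorrow
God
Execution:
java NoteSax note.xml
creates note.html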
-
note.html
Note
To,
you
Contemplation:
If today was a perfect day then there would be no tomorrow
-from
God
-
DOM
Document Object Model. It is a standard produced by the W3C.
All DOM processing assumes that you have read and parsed a complete document into memory so that all parts are equally accessible. The data is represented in the form of a tree.
Disadvantages
1. It is pretty clumsy if you want to pick out only a few elements.
2. Memory requirements can become restrictive for large documents.
-
org.w3c.dom package
Interfaces:
Node
Document (extends Node): the Document interface represents the entire HTML or XML document.
NodeList: provides the abstraction of an ordered collection of nodes.
The Node interface defines static constants for checking the node type, e.g. Node.ELEMENT_NODE and Node.CDATA_SECTION_NODE; compare them with the value returned by getNodeType().
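A short sketch of checking node types against the Node constants. The class name NodeTypeSketch and its helpers are hypothetical, and the sample document is chosen only so that several node kinds appear as children of the root.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeTypeSketch {
    // Maps the numeric node type to the name of the matching Node constant.
    static String typeName(Node n) {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE:       return "ELEMENT_NODE";
            case Node.TEXT_NODE:          return "TEXT_NODE";
            case Node.COMMENT_NODE:       return "COMMENT_NODE";
            case Node.CDATA_SECTION_NODE: return "CDATA_SECTION_NODE";
            default:                      return "OTHER";
        }
    }

    // Lists the node types of the root element's direct children.
    static List<String> childTypes(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList kids = doc.getDocumentElement().getChildNodes();
        List<String> types = new ArrayList<>();
        for (int i = 0; i < kids.getLength(); i++) {
            types.add(typeName(kids.item(i)));
        }
        return types;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childTypes("<a><!--c-->text<b/><![CDATA[x]]></a>"));
    }
}
```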
-
Methods
Document methods:
public NodeList getElementsByTagName(String tagname)
public Element createElement(String tagName) throws DOMException
public Comment createComment(String data)
public Text createTextNode(String data)
NodeList methods:
public int getLength()
public Node item(int index)
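The Document and NodeList methods listed above could be combined as in the following sketch. The class name DocumentMethodsSketch, the helper names and the comment text are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DocumentMethodsSketch {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Uses getElementsByTagName, getLength and item to collect the text of each match.
    static List<String> textOfAll(Document doc, String tag) {
        NodeList hits = doc.getElementsByTagName(tag);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            out.add(hits.item(i).getTextContent());
        }
        return out;
    }

    // Uses createElement, createTextNode and createComment to add a new <to> element.
    static void addRecipient(Document doc, String name) {
        Element to = doc.createElement("to");
        to.appendChild(doc.createTextNode(name));
        Element root = doc.getDocumentElement();
        root.appendChild(doc.createComment("added in memory"));
        root.appendChild(to);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<note><to>you</to><from>God</from></note>");
        addRecipient(doc, "me");
        System.out.println(textOfAll(doc, "to"));
    }
}
```

Nodes created through the Document are not part of the tree until they are attached with appendChild.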
-
Node methods:
Methods to access information about the current node:
public String getNodeName()
public short getNodeType()
public NodeList getChildNodes()
Methods to modify the node's children:
public Node appendChild(Node newChild) throws DOMException
public Node removeChild(Node oldChild) throws DOMException
public Node replaceChild(Node newChild, Node oldChild) throws DOMException
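The child-modifying methods can be tried out on a small in-memory tree, as in this sketch. The class name NodeMethodsSketch and the element names cc and body are invented for the example; each step records the element children of the root so the effect of the call is visible.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeMethodsSketch {
    // Comma-separated names of the element children of the given node.
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.ELEMENT_NODE) {
                if (sb.length() > 0) sb.append(',');
                sb.append(kid.getNodeName());
            }
        }
        return sb.toString();
    }

    // Applies removeChild, replaceChild and appendChild in turn,
    // recording the root's children after each step.
    static List<String> edit(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        List<String> snapshots = new ArrayList<>();
        snapshots.add(childNames(root));
        root.removeChild(root.getFirstChild());
        snapshots.add(childNames(root));
        root.replaceChild(doc.createElement("cc"), root.getFirstChild());
        snapshots.add(childNames(root));
        root.appendChild(doc.createElement("body"));
        snapshots.add(childNames(root));
        return snapshots;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(edit("<note><to>you</to><from>God</from></note>"));
    }
}
```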
-
DOM Programming model
[Figure: DOM programming model. (1) DocumentBuilderFactory creates the DocumentBuilder. (2) The XML source and the optional DTD are input to the DocumentBuilder. (3) The DocumentBuilder parses the input and builds the tree, a Document (DOM) made of Node objects. A search mechanism then recursively searches the nodes to produce the output.]
-
// Program 1: counting no. of elements
import org.w3c.dom.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import java.io.*;

public class CountDom {
    public static void main(String str[]) throws Exception {
        File f = new File(str[0]);
        Node n = readFile(f);
        int ele = getElementCount(n);
        System.out.println(ele);
    }

    // Parses the whole file into an in-memory DOM tree.
    public static Document readFile(File f) throws Exception {
        Document d;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(true); // request DTD validation
        DocumentBuilder db = dbf.newDocumentBuilder();
        d = db.parse(f);
        return d;
    }

    // Counts this node (if it is an element) plus all element descendants.
    public static int getElementCount(Node node) {
        if (node == null)
            return 0;
        int sum = 0;
        boolean isElement = (node.getNodeType() == Node.ELEMENT_NODE);
        if (isElement) sum = 1;
        NodeList children = node.getChildNodes();
        if (children == null) return sum;
        // The transcript breaks off at "for(int i=0;i"; the rest of the
        // loop is the evident recursive completion.
        for (int i = 0; i < children.getLength(); i++)
            sum += getElementCount(children.item(i));
        return sum;
    }
}
-
// Program 2: Adding a comment and a node and displaying
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import java.io.*;
import org.w3c.dom.*;

public class AddNodeDom {
    static Node n1;
    static Comment c;

    public static void main(String str[]) throws Exception {
        File f = new File(str[0]);
        Document n = readFile(f);
        // Assumption: the transcript never shows c being initialised, and
        // appendChild(null) would fail, so a comment node with placeholder
        // text is created here.
        c = n.createComment("added by AddNodeDom");
        setElements(n);
        display(n);
        System.out.println("done");
    }

    public static Document readFile(File f) throws Exception {
        Document d;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        d = db.parse(f);
        return d;
    }

    // Prints element names, text and comments by walking the tree recursively.
    // The loop bodies below are truncated in the transcript after "for(int i=0;i"
    // and are completed with the evident recursive call.
    public static void display(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE)
            System.out.print(node.getNodeName() + ":");
        if (node.getNodeType() == Node.TEXT_NODE || node.getNodeType() == Node.COMMENT_NODE)
            System.out.println(node.getNodeValue().trim());
        NodeList children = node.getChildNodes();
        if (children != null)
            for (int i = 0; i < children.getLength(); i++)
                display(children.item(i));
    }

    // Remembers the display-name element and moves it, together with the
    // comment, under the servlet element.
    public static void setElements(Node node) {
        if (node == null) return;
        boolean isEle = (node.getNodeType() == Node.ELEMENT_NODE);
        if (isEle && node.getNodeName().equals("display-name")) n1 = node;
        if (isEle && node.getNodeName().equals("servlet")) {
            node.appendChild(c);
            if (n1 != null) node.appendChild(n1); // skip if no display-name was seen yet
        }
        NodeList children = node.getChildNodes();
        if (children != null)
            for (int i = 0; i < children.getLength(); i++)
                setElements(children.item(i));
    }
}