SAX
What is SAX
SAX 1.0 was released on May 11, 1998.
SAX is a common, event-based API for parsing XML documents.
It is primarily a Java API, but there are implementations in most languages.
The current version is SAX 2.0.1, and there are versions for several programming language environments other than Java.
How does SAX work
An XML document is seen as a series of “events”.
Unlike DOM, SAX does not store information in an internal tree structure.
SAX is able to parse huge documents (think gigabytes) without having to allocate large amounts of system resources.
If processing is built as a pipeline, it doesn’t have to wait for the data to be converted to an object; it can go to the next process once it clears the preceding callback method.
SAX does not allow random access to the file; it proceeds in a single pass, firing events as it goes
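The event model above can be made concrete with a small trace. The following is a minimal sketch, not part of the original slides: the class name EventTrace and the method trace are made up for illustration, and the input is a tiny in-memory string rather than a large file. It records each callback in the order the parser fires it, without building any tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records SAX events as strings, in firing order.
public class EventTrace {

    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { events.add("startDocument"); }
            @Override public void endDocument() { events.add("endDocument"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                events.add("startElement:" + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                events.add("endElement:" + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                // A parser may deliver text in several chunks; whitespace-only chunks are skipped here.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("characters:" + text);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Single forward pass: each event is printed once, in document order.
        trace("<note><to>Ann</to></note>").forEach(System.out::println);
    }
}
```

The handler only ever sees the current event, which is why memory use stays flat regardless of document size, and also why there is no way to go back to an earlier element.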
SAX Structure(1/4)
SAX Structure(2/4)
SAXParserFactory: A SAXParserFactory object creates an instance of the parser determined by the system property javax.xml.parsers.SAXParserFactory.
SAXParser: The SAXParser class defines several kinds of parse() methods. In general, you pass an XML data source and a DefaultHandler object to the parser, which processes the XML and invokes the appropriate methods in the handler object.
SAXReader: The SAXParser wraps a SAX reader (the org.xml.sax.XMLReader interface). Typically you don't care about that, but every once in a while you need to get hold of it using SAXParser's getXMLReader() so that you can configure it. It is the reader that carries on the conversation with the SAX event handlers you define.
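The factory, parser and reader chain described above might be wired together as follows. This is a sketch under assumptions: the class name ReaderSetup and the method firstElementName are invented for illustration. It uses getXMLReader(), which returns an org.xml.sax.XMLReader, to install a handler directly on the wrapped reader.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: factory -> parser -> wrapped reader -> handler.
public class ReaderSetup {

    /** Returns "namespaceURI|localName" of the root element. */
    static String firstElementName(String xml) {
        final String[] result = new String[1];
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);          // configure before creating the parser
            SAXParser parser = factory.newSAXParser();
            XMLReader reader = parser.getXMLReader(); // the reader wrapped by the parser
            reader.setContentHandler(new DefaultHandler() {
                @Override public void startElement(String uri, String localName,
                                                   String qName, Attributes atts) {
                    if (result[0] == null) {
                        result[0] = uri + "|" + localName;
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(firstElementName("<p:a xmlns:p='urn:x'/>"));
    }
}
```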
SAX Structure(3/4)
DefaultHandler: Not shown in the diagram, a DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces (with null methods), so you can override only the ones you are interested in.
ContentHandler: Methods such as startDocument, endDocument, startElement, and endElement are invoked when an XML tag is recognized. This interface also defines the methods characters and processingInstruction, which are invoked when the parser encounters the text in an XML element or an inline processing instruction, respectively.
EntityResolver: The resolveEntity method is invoked when the parser must identify data identified by a URI.
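A short sketch of the characters and processingInstruction callbacks, overriding only those two DefaultHandler methods. The class name TextCollector and its accessors are assumptions made for this example. Text is appended to a buffer because a parser is allowed to report one run of text through several characters calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: gathers element text and processing instructions.
public class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> instructions = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start + length) of the buffer belongs to this event.
        text.append(ch, start, length);
    }

    @Override
    public void processingInstruction(String target, String data) {
        instructions.add(target + " " + data);
    }

    String text() { return text.toString(); }
    List<String> instructions() { return instructions; }

    static TextCollector collect(String xml) {
        TextCollector handler = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        TextCollector c = collect("<doc><?render fast?>Hello <b>SAX</b></doc>");
        System.out.println(c.text());
        System.out.println(c.instructions());
    }
}
```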
SAX Structure(4/4)
ErrorHandler: Methods error, fatalError, and warning are invoked in response to various parsing errors. The default error handler throws an exception for fatal errors and ignores other errors (including validation errors). That's one reason you need to know something about the SAX parser, even if you are using the DOM. Sometimes, the application may be able to recover from a validation error. Other times, it may need to generate an exception. To ensure the correct handling, you'll need to supply your own error handler to the parser.
DTDHandler: Defines methods you will generally never be called upon to use. Used when processing a DTD to recognize and act on declarations for an unparsed entity.
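To illustrate why a custom ErrorHandler matters, here is a minimal sketch that turns on DTD validation and records validation errors instead of silently ignoring them. The class name StrictErrors, the method validationErrors and the sample documents are assumptions for this example; an application could equally rethrow inside error.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: collect warnings and recoverable (validation) errors.
public class StrictErrors {

    static List<String> validationErrors(String xml) {
        List<String> problems = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            @Override public void error(SAXParseException e) {
                // DefaultHandler.error does nothing; recording makes the problem visible.
                problems.add("error: " + e.getMessage());
            }
            // fatalError is left alone: DefaultHandler rethrows the exception.
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // validate against the document's DTD
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("document is not well-formed", e);
        }
        return problems;
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]>";
        System.out.println(validationErrors(dtd + "<a><b/></a>")); // valid against the DTD
        System.out.println(validationErrors(dtd + "<a><c/></a>")); // <c> is not declared
    }
}
```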
SAX Event
startDocument, endDocument, startElement, endElement, characters
Pull Parsing Versus Push Parsing
Streaming pull parsing refers to a programming model in which a client application calls methods on an XML parsing library when it needs to interact with an XML infoset--that is, the client only gets (pulls) XML data when it explicitly asks for it.
Streaming push parsing refers to a programming model in which an XML parser sends (pushes) XML data to the client as the parser encounters elements in an XML infoset--that is, the parser sends the data whether or not the client is ready to use it at that time.
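For contrast with the SAX callbacks, the pull model can be sketched with the StAX cursor API (javax.xml.stream). The class name PullFirstText and the method firstText are invented for this example. The client drives the loop by calling next() and simply stops asking once it has what it needs.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example: pull events until the first matching element is found.
public class PullFirstText {

    /** Returns the text of the first element with the given local name, or null. */
    static String firstText(String xml, String elementName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next(); // the client asks for the next event
                    if (event == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(elementName)) {
                        return reader.getElementText(); // stop early; the rest is never read
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<lib><book><title>SAX</title></book><book><title>DOM</title></book></lib>";
        System.out.println(firstText(xml, "title"));
    }
}
```

With a push parser the same early exit usually requires throwing an exception from a callback, since the parser, not the client, controls the loop.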
XML Parser API Feature Summary
Feature                         StAX             SAX              DOM
API Type                        Pull, streaming  Push, streaming  In-memory tree
Ease of Use                     High             Medium           High
XPath Capability                No               No               Yes
CPU and Memory Efficiency       Good             Good             Varies
Forward Only                    Yes              Yes              No
Read XML                        Yes              Yes              Yes
Write XML                       Yes              No               Yes
Create, Read, Update, Delete    No               No               Yes
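The "XPath Capability" row can be illustrated from the DOM side: because the whole document sits in memory, an XPath expression can be evaluated against it at any time. This is a minimal sketch; the class name DomXPathCount and the method count are assumptions for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example: build a DOM tree, then query it with XPath.
public class DomXPathCount {

    /** Counts the nodes selected by the given XPath expression. */
    static int count(String xml, String xpath) {
        try {
            // The whole document is loaded into an in-memory tree first.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Double n = (Double) XPathFactory.newInstance().newXPath()
                    .evaluate("count(" + xpath + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("<o><item/><x><item/></x></o>", "//item"));
    }
}
```

The cost is the one the table lists under efficiency: memory grows with document size, which is the trade-off SAX and StAX avoid.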
XML Parser and APIs supporting SAX
Xerces
Xerces is a family of software packages for parsing and manipulating XML, part of the Apache XML project.
MSXML
Microsoft XML Core Services (MSXML) is a set of services that allow applications written in JScript, VBScript and Microsoft Visual Studio 6.0 to build XML-based applications.
Crimson XML
JAXP: Java API for XML Processing
The Java API for XML Processing, or JAXP, is one of the Java XML programming APIs. It provides the capability of validating and parsing XML documents.
SAX Example
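import java.io.FileReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class MySAXApp extends DefaultHandler {

    // Usage: java MySAXApp <file>  (args[0] is the path of the XML file to parse)
    public static void main(String[] args) throws Exception {
        String file = args[0];
        XMLReader xr = XMLReaderFactory.createXMLReader();
        MySAXApp handler = new MySAXApp();
        // The same object serves as content handler and error handler.
        xr.setContentHandler(handler);
        xr.setErrorHandler(handler);
        try (FileReader r = new FileReader(file)) {
            xr.parse(new InputSource(r));
        }
    }

    ////////////////////////////////////////////////////////////////////
    // Event handlers.
    ////////////////////////////////////////////////////////////////////

    @Override
    public void startDocument() {
        // TODO: add customized code here
    }

    @Override
    public void endDocument() {
        // TODO: add customized code here
    }

    @Override
    public void startElement(String uri, String name, String qName, Attributes atts) {
        // TODO: add customized code here
    }

    @Override
    public void endElement(String uri, String name, String qName) {
        // TODO: add customized code here
    }
}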
Applications of XML Stream Processing
content-based XML routing
selective dissemination of information
continuous queries
processing of scientific data stored in large XML files
Selective Dissemination of Information
The use of selective approaches to dissemination in order to avoid overwhelming users with unnecessary information.
Applications: stock and sports tickers, traffic information systems, electronic personalized newspapers, entertainment delivery
Typical SDI Systems
Representation of user profiles: simple keyword matching, “bag of words” Information Retrieval (IR) techniques
Limited ability
Inefficiency of filtering
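As a contrast to keyword matching, a structure-aware profile can be checked while the document streams through a SAX handler. The sketch below is a deliberately simplified illustration and is not the algorithm of any particular SDI system: the class name PathProfileFilter, the profile format (one absolute element path per user) and the sample data are all assumptions.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical filter: a user matches if the document contains the user's element path.
public class PathProfileFilter extends DefaultHandler {
    private final Map<String, String> profiles;           // user -> path such as /news/sports/item
    private final Deque<String> path = new ArrayDeque<>(); // currently open elements
    private final Set<String> matched = new TreeSet<>();

    PathProfileFilter(Map<String, String> profiles) {
        this.profiles = profiles;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast(qName);
        String current = "/" + String.join("/", path);
        // Linear scan over the profiles; real systems index profiles to avoid this.
        for (Map.Entry<String, String> p : profiles.entrySet()) {
            if (p.getValue().equals(current)) {
                matched.add(p.getKey());
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.removeLast();
    }

    /** Returns the users whose profile path occurs in the document. */
    static Set<String> match(Map<String, String> profiles, String xml) {
        PathProfileFilter filter = new PathProfileFilter(profiles);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), filter);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return filter.matched;
    }
}
```

Each document is examined in a single pass and never stored, which fits the dissemination setting where many documents arrive continuously.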
References
M. Altinel, M. J. Franklin. Efficient Filtering of XML Documents for Selective Dissemination of Information. In VLDB Conf., Sep. 2000.
Y. Diao, P. Fischer, M. Franklin, and R. To. YFilter: Efficient and Scalable Filtering of XML Documents. In Proceedings of the International Conference on Data Engineering, San Jose, California, February 2002.