sdpl 20113.2: document object model1 n how access structured documents uniformly in parsers,...
TRANSCRIPT
SDPL 2011 3.2: Document Object Model 1
How access structured documents uniformly How access structured documents uniformly in parsers, browsers, editors, databases,...? in parsers, browsers, editors, databases,...?
Overview of the W3C DOM SpecOverview of the W3C DOM Spec» Level 1, W3C Rec, Oct. 1998Level 1, W3C Rec, Oct. 1998» Level 2Level 2, W3C Rec, Nov. 2000, W3C Rec, Nov. 2000» Level 3 Level 3 ValidationValidation, , CoreCore, and , and Load and SaveLoad and Save
W3C Recs (Spring 2004)W3C Recs (Spring 2004)
W3C DOM Activity has been closedW3C DOM Activity has been closed
3.2 Document Object Model (DOM)3.2 Document Object Model (DOM)
SDPL 2011 3.2: Document Object Model 2
DOM: What is it? DOM: What is it?
An object-based, language-neutral API for An object-based, language-neutral API for XML and HTML documentsXML and HTML documents
– Allows programs and scripts to build, access, and modify Allows programs and scripts to build, access, and modify documentsdocuments
– Supports designing of Supports designing of querying, filtering, querying, filtering, transformation, formatting etc. transformation, formatting etc.
applications on top of DOM implementationsapplications on top of DOM implementations
Instead of “Instead of “SSerial erial AAccess ccess XXML” could think as ML” could think as ““DDirectly irectly OObtainable in btainable in MMemory”emory”
SDPL 2011 3.2: Document Object Model 3
DOM DOM structure model structure model
Based on O-O concepts:Based on O-O concepts:– objectsobjects (encapsulation of data and methods) (encapsulation of data and methods)– methodsmethods (to access or change object’s state) (to access or change object’s state)– interfacesinterfaces (declaration of a set of methods) (declaration of a set of methods)
Somewhat similar to the XPath data model (to be Somewhat similar to the XPath data model (to be discussed with XSLT and XQuery) discussed with XSLT and XQuery) syntax-tree syntax-tree– Tree structure implied by abstract relationships defined Tree structure implied by abstract relationships defined
by the API; Data structures of an implementation may by the API; Data structures of an implementation may differdiffer
SDPL 2011 3.2: Document Object Model 4
<invoice form="00" <invoice form="00" type="estimated">type="estimated"> <addressdata><addressdata> <name>John Doe</name><name>John Doe</name> <address><address> <streetaddress>Pyynpolku 1<streetaddress>Pyynpolku 1 </streetaddress></streetaddress> <postoffice>70460 KUOPIO<postoffice>70460 KUOPIO </postoffice></postoffice> </address></address> </addressdata></addressdata> ......
DOM structure modelDOM structure model
invoiceinvoice
namename
addressdataaddressdata
addressaddress
form="00"form="00"type="estimated"type="estimated"
John DoeJohn Doe streetaddressstreetaddress postofficepostoffice
70460 KUOPIO70460 KUOPIOPyynpolku 1Pyynpolku 1
......
DocumentDocument
ElementElement
NamedNodeMapNamedNodeMap
TextText
SDPL 2011 3.2: Document Object Model 5
Structure of DOM Level 1Structure of DOM Level 1
I: DOM I: DOM Core InterfacesCore Interfaces– Fundamental interfacesFundamental interfaces
» basic interfaces: Document, Element, Attr, Text, ...basic interfaces: Document, Element, Attr, Text, ...– "Extended" (XML specific) interfaces "Extended" (XML specific) interfaces
» CDATASection, DocumentType, Notation, Entity, CDATASection, DocumentType, Notation, Entity, EntityReference, ProcessingInstructionEntityReference, ProcessingInstruction
II: DOM II: DOM HTML InterfacesHTML Interfaces– more convenient access to HTML documentsmore convenient access to HTML documents– we'll ignore thesewe'll ignore these
SDPL 2011 3.2: Document Object Model 6
DOM Level 2DOM Level 2
– Level 1: basic representation and manipulation of Level 1: basic representation and manipulation of document structure and content document structure and content (No access to the contents of a DTD)(No access to the contents of a DTD)
DOM Level 2 adds DOM Level 2 adds – support for namespacessupport for namespaces– Document.getElementById("id_val")Document.getElementById("id_val"), ,
to access elements by ID attr valuesto access elements by ID attr values– optional features (we’ll skip these)optional features (we’ll skip these)
» interfaces to document views and style sheetsinterfaces to document views and style sheets» an event model (for user actions on elements)an event model (for user actions on elements)» methods for traversing the document tree and manipulating methods for traversing the document tree and manipulating
regions of document (e.g., selected in an editor)regions of document (e.g., selected in an editor)
SDPL 2011 3.2: Document Object Model 7
DOM Language BindingsDOM Language Bindings
Language-independence:Language-independence:– DOM interfaces are defined using OMG Interface DOM interfaces are defined using OMG Interface
Definition Language (IDL, defined in Corba Definition Language (IDL, defined in Corba Specification)Specification)
Language bindings (implementations of Language bindings (implementations of interfaces) defined in the Recommendation forinterfaces) defined in the Recommendation for– Java (See the Java API doc) andJava (See the Java API doc) and– ECMAScript (standardised JavaScript)ECMAScript (standardised JavaScript)
SDPL 2011 3.2: Document Object Model 8
Core Interfaces: Core Interfaces: NodeNode & its variants & its variants
NodeNode
CommentComment
DocumentFragmentDocumentFragment AttrAttr
TextText
ElementElement
CDATASectionCDATASection
ProcessingInstructionProcessingInstruction
CharacterDataCharacterData
EntityEntityDocumentTypeDocumentType NotationNotation
EntityReferenceEntityReference
““Extended Extended interfaces”interfaces”
DocumentDocument
SDPL 2011 3.2: Document Object Model 9
DOM interfaces: DOM interfaces: NodeNode
invoiceinvoice
namename
addressdataaddressdata
addressaddress
form="00"form="00"type="estimatedbill"type="estimatedbill"
John DoeJohn Doe streetaddressstreetaddress postofficepostoffice
70460 KUOPIO70460 KUOPIOPyynpolku 1Pyynpolku 1
NodeNodegetNodeType, getNodeName, getNodeType, getNodeName, getNodeValuegetNodeValuegetOwnerDocumentgetOwnerDocumentgetParentNodegetParentNodehasChildNodes, getChildNodeshasChildNodes, getChildNodesgetFirstChild, getLastChildgetFirstChild, getLastChildgetPreviousSibling, getNextSiblinggetPreviousSibling, getNextSiblinghasAttributes, getAttributeshasAttributes, getAttributesappendChild(newChild)appendChild(newChild)insertBefore(newChild,refChild)insertBefore(newChild,refChild)replaceChild(newChild,oldChild)replaceChild(newChild,oldChild)removeChild(oldChild)removeChild(oldChild)
DocumentDocument
ElementElement
NamedNodeMapNamedNodeMap
TextText
...
SDPL 2011 3.2: Document Object Model 10
Type and Name of aType and Name of a NodeNode
node.getNodeType()node.getNodeType():: short intshort int constants 1, 2, …, 12 forconstants 1, 2, …, 12 for Node.ELEMENT_NODENode.ELEMENT_NODE,, Node.ATTRIBUTE_NODENode.ATTRIBUTE_NODE,, Node.TEXT_NODENode.TEXT_NODE, …, …
node.getNodeName()node.getNodeName()– for an for an Element = element.getTagName()Element = element.getTagName()– for an for an AttrAttr:: the name of the attribute the name of the attribute– for anonymous nodesfor anonymous nodes: : "#text""#text", , "#document""#document", , "#comment""#comment" etc etc
SDPL 2011 3.2: Document Object Model 11
The Value of aThe Value of a NodeNode
node.getNodeValuenode.getNodeValue()() – content of a text node, content of a text node,
value of attribute, …; value of attribute, …; nullnull for an for an ElementElement ((NoticeNotice !) !)
– (C.f. XPath, where node’s value is its full textual (C.f. XPath, where node’s value is its full textual content)content)
– DOM DOM 3 3 provides full text content with methodprovides full text content with method node.getTextContentnode.getTextContent()()
SDPL 2011 3.2: Document Object Model 12
Object Creation in DOMObject Creation in DOM
Each DOM Node Each DOM Node nn belongs to a belongs to a DocumentDocument: : nn..getOwnerDocument()getOwnerDocument()
Objects that implement interface Objects that implement interface XX are are created by created by factory methodsfactory methods
Document.createDocument.createXX(…)(…)E.g: when E.g: when docdoc is a is a DocumentDocument objectobject
doc.createElement("A"), doc.createElement("A"), doc.createAttribute("href"), doc.createAttribute("href"), doc.createTextNode("Hello!")doc.createTextNode("Hello!")
Loading & saving specified in DOM3 (or Loading & saving specified in DOM3 (or implementation-specific , or via JAXP)implementation-specific , or via JAXP)
SDPL 2011 3.2: Document Object Model 13
invoiceinvoice
namename
addressdataaddressdata
addressaddress
form="00"form="00"type="estimated"type="estimated"
John DoeJohn Doe streetaddressstreetaddress postofficepostoffice
70460 KUOPIO70460 KUOPIOPyynpolku 1Pyynpolku 1
DocumentDocumentgetDocumentElementgetDocumentElementgetElementById(IdVal)getElementById(IdVal)getElementsByTagName(tagName)getElementsByTagName(tagName)
createElement(tagName)createElement(tagName)createTextNode(data)createTextNode(data)
NodeNode DOM interfaces: DOM interfaces: DocumentDocument
DocumentDocument
ElementElement
NamedNodeMapNamedNodeMap
TextText
......
SDPL 2011 3.2: Document Object Model 14
DOM interfaces: DOM interfaces: ElementElement
invoiceinvoice
invoicepageinvoicepage
namename
addresseeaddressee
addressdataaddressdata
addressaddress
form="00"form="00"type="estimatedbill"type="estimatedbill"
John DoeJohn Doe streetaddressstreetaddress postofficepostoffice
70460 KUOPIO70460 KUOPIOPyynpolku 1Pyynpolku 1
ElementElementgetTagName()getTagName()
hasAttribute(name)hasAttribute(name)getAttribute(name)getAttribute(name)setAttribute(attrName, value)setAttribute(attrName, value)removeAttribute(name)removeAttribute(name)
getElementsByTagName(name)getElementsByTagName(name)
NodeNode
DocumentDocument
ElementElement
NamedNodeMapNamedNodeMap
TextText
SDPL 2011 3.2: Document Object Model 15
Text Content Manipulation in DOMText Content Manipulation in DOM
for objects for objects cc that implement the that implement the CharacterDataCharacterData interface interface (Text, Comments, CDATASections)(Text, Comments, CDATASections)::– c.c.substringData(substringData(offset, countoffset, count)) – c.c.appendData(appendData(stringstring)) – c.c.insertData(insertData(offset, stringoffset, string)) – c.c.deleteData(deleteData(offset, countoffset, count)) – c.c.replaceData(replaceData(offset, count, stringoffset, count, string))
( = ( = c. c.deleteData(offset, count);deleteData(offset, count); c.c.insertData(offset, string) )insertData(offset, string) )
SDPL 2011 3.2: Document Object Model 16
DOMDOM CharacterData CharacterData
DOM strings are 0-based sequences of DOM strings are 0-based sequences of 16-bit characters:16-bit characters:
CC: Hello world, nice to see you!: Hello world, nice to see you! 0 1 2 0 1 2 01234567890123456789012345678 01234567890123456789012345678
C.getLength()C.getLength()-1-1
C.substringData(6, 5) C.substringData(6, 5) = ?= ? C.substringData(0, C.getLength()) C.substringData(0, C.getLength()) = ?= ?
SDPL 2011 3.2: Document Object Model 17
Interfaces to node collections (1)Interfaces to node collections (1)
NodeListNodeList for ordered lists of nodesfor ordered lists of nodes<- Node.getChildNodes()<- Node.getChildNodes() and and Element/DocumentElement/Document.getElementsByTagName("name").getElementsByTagName("name")
» (proper) descendant elements of type (proper) descendant elements of type ""namename" " in in document order (document order (""**" " ~ any element type)~ any element type)
AA
EE
EE EE
EE
11
AA
22 33 44
55 66
.getElementsByTagName(“E")= .getElementsByTagName(“E")=
SDPL 2011 3.2: Document Object Model 18
Typical child-node access patternTypical child-node access pattern
Accessing specific nodes, or iterating over a Accessing specific nodes, or iterating over a NodeListNodeList::– to process all children of to process all children of nodenode::for (i=0;for (i=0;
i<node.i<node.getChildNodesgetChildNodes().().getLengthgetLength(); (); i++) i++) process(node.process(node.getChildNodesgetChildNodes().().itemitem(i));(i));
SDPL 2011 3.2: Document Object Model 19
NamedNodeMapNamedNodeMap for unordered sets of nodes for unordered sets of nodes accessed by their name:accessed by their name:<- Node.getAttributes(), <- Node.getAttributes(), DocumentType.getEntities() DocumentType.getEntities()
DocumentFragmentDocumentFragment – Termporary container of child nodesTermporary container of child nodes– Disappears when inserted in treeDisappears when inserted in tree
NodeListNodeLists and s and NamedNodeMapNamedNodeMaps are "live":s are "live":– reflect updates of the doc tree immediatelyreflect updates of the doc tree immediately– See nextSee next
Interfaces to node collections (2)Interfaces to node collections (2)
SDPL 2011 3.2: Document Object Model 20
NodeListNodeLists are “live”s are “live”
E.g., this would delete E.g., this would delete every other every other child of child of nn::
NodeList cList = n.NodeList cList = n.getChildNodesgetChildNodes();(); for (i=0; i<cList.for (i=0; i<cList.getLengthgetLength(); i++) (); i++)
n.n.removeChildremoveChild(cList.(cList.itemitem(i));(i)); – What happens?What happens? nn
AA BB CC DDi=0i=0i=1i=1
i=2i=2
cListcList
SDPL 2011 3.2: Document Object Model 21
DOM: XML ImplementationsDOM: XML Implementations
JavaJava-based parsers -based parsers e.g. e.g. Apache Xerces, Apache Crimson, … Apache Xerces, Apache Crimson, …
In MS IE browser: COM programming interfaces for In MS IE browser: COM programming interfaces for C/C++C/C++ and and Visual BasicVisual Basic; ActiveX object ; ActiveX object programming interfaces for programming interfaces for script languagesscript languages
PerlPerl: : XML::DOMXML::DOM (Implements DOM Level 1) (Implements DOM Level 1) OthersOthers, say, database APIs?, say, database APIs?
– Vendors of different kinds of systems participated in the Vendors of different kinds of systems participated in the W3C DOM WGW3C DOM WG
SDPL 2011 3.2: Document Object Model 22
A Java-DOM ExampleA Java-DOM Example
Command-line tool Command-line tool RegListMgrRegListMgr for maintaining for maintaining a course registration lista course registration list– with single-letter commands for with single-letter commands for llisting, isting, aadding, dding,
uupdating and pdating and ddeleting student recordseleting student records Example:Example:
$ java RegListMgr reglist.xml$ java RegListMgr reglist.xmlDocument loaded succesfullyDocument loaded succesfully> > ……40: Tero Ulvinen, TKM1, [email protected], 240: Tero Ulvinen, TKM1, [email protected], 241: heli viinikainen, tkt5, [email protected], 141: heli viinikainen, tkt5, [email protected], 1
list the contentslist the contentsll
SDPL 2011 3.2: Document Object Model 23
Registration list: the XML fileRegistration list: the XML file
<?xml version="1.0" ?><?xml version="1.0" ?><!DOCTYPE reglist SYSTEM "reglist.dtd"><!DOCTYPE reglist SYSTEM "reglist.dtd"><reglist lastID="41"><reglist lastID="41"> <student id="RDK1"><student id="RDK1"> <name><given>Juho</given><name><given>Juho</given>
<family>Ahopelto</family></name> <family>Ahopelto</family></name> <branchAndYear>TKT4</branchAndYear><branchAndYear>TKT4</branchAndYear> <email>[email protected]</email><email>[email protected]</email> <group>2</group><group>2</group> </student></student> <!-- … and the other students … --><!-- … and the other students … --></reglist></reglist>
SDPL 2011 3.2: Document Object Model 24
Registration List: the DTDRegistration List: the DTD
<!ELEMENT reglist (student*)><!ELEMENT reglist (student*)>
<!ATTLIST reglist <!ATTLIST reglist lastID CDATA #REQUIRED > lastID CDATA #REQUIRED >
<!ELEMENT student <!ELEMENT student (name, branchAndYear, email, group)> (name, branchAndYear, email, group)>
<!ATTLIST student <!ATTLIST student id ID #REQUIRED > id ID #REQUIRED >
<!ELEMENT name (given, family)><!ELEMENT name (given, family)>
<!ELEMENT given (#PCDATA)><!ELEMENT given (#PCDATA)>
<!-- … and the same for family, <!-- … and the same for family, branchAndYear, email,and group -->branchAndYear, email,and group -->
SDPL 2011 3.2: Document Object Model 25
Loading and Saving the RegList Loading and Saving the RegList
Loading of the registration list into Loading of the registration list into DOMDOM DocumentDocument docdoc implemented with a JAXP implemented with a JAXP DocumentBuilder DocumentBuilder – (to be discussed later)(to be discussed later)– docdoc is a handle to the is a handle to the DocumentDocument
Saving implemented with a Saving implemented with a JAXP JAXP TransformerTransformer– to be discussed laterto be discussed later
SDPL 2011 3.2: Document Object Model 26
Listing student records (1)Listing student records (1)
NodeListNodeList students = students = doc.doc.getElementsByTagNamegetElementsByTagName("student");("student");
for (int i=0; i<students.for (int i=0; i<students.getLengthgetLength(); i++) (); i++)
showStudent((showStudent((ElementElement) students.) students.itemitem(i));(i));
private void showStudent(private void showStudent(ElementElement student) { student) { // Collect relevant sub-elements:// Collect relevant sub-elements: NodeNode given = given =
student. student.getElementsByTagNamegetElementsByTagName("given").("given").itemitem(0);(0); NodeNode family = given. family = given.getNextSiblinggetNextSibling();(); NodeNode bAndY = student. bAndY = student.
getElementsByTagNamegetElementsByTagName("branchAndYear").("branchAndYear").itemitem(0);(0); NodeNode email = bAndY. email = bAndY.getNextSiblinggetNextSibling();(); NodeNode group = email. group = email.getNextSiblinggetNextSibling();();
SDPL 2011 3.2: Document Object Model 27
Listing student records (2)Listing student records (2)
// Method showStudent continues:// Method showStudent continues:
System.out.print(System.out.print( student.student.getAttributegetAttribute("id").substring(3));("id").substring(3));
System.out.print(": " + System.out.print(": " + given.given.getFirstChildgetFirstChild().().getNodeValuegetNodeValue() );() );// or given.// or given.getTextContentgetTextContent() with DOM3() with DOM3
// .. similarly access and display the // .. similarly access and display the // value of family, bAndY, email, and group// value of family, bAndY, email, and group// … // …
} // showStudent} // showStudent
SDPL 2011 3.2: Document Object Model 28
Lessons of accessing DOMLessons of accessing DOM
Access methods for relevant nodesAccess methods for relevant nodes– getElementsByTagname(getElementsByTagname(“tagName”“tagName”))
» robust wrt structure modificationsrobust wrt structure modifications– Also others, if structure known (validated)Also others, if structure known (validated)
» getFirstChild(), getLastChild(), getFirstChild(), getLastChild(), getPreviousSibling(), getNextSibling()getPreviousSibling(), getNextSibling()
Element nodes have no value!Element nodes have no value!– Get the value from child Text nodes, Get the value from child Text nodes,
or use or use getTextContent()getTextContent()
SDPL 2011 3.2: Document Object Model 29
Adding New RecordsAdding New Records
Example:Example:
First name (or <return> to finish):First name (or <return> to finish):> a> a
… …41: heli viinikainen, tkt5, [email protected], 141: heli viinikainen, tkt5, [email protected], 142: Antti Ahkera, tkt3, [email protected], 242: Antti Ahkera, tkt3, [email protected], 2
add studentsadd students
AnttiAnttiLast name:Last name: AhkeraAhkeraBranch&year:Branch&year:
Finished adding recordsFinished adding records> >
email:email:tkt3tkt3
group:group:[email protected]@fake.addr.fi
First name (or <return> to finish):First name (or <return> to finish):22
ll
SDPL 2011 3.2: Document Object Model 30
Implementing addition of records (1)Implementing addition of records (1)
ElementElement rootElem = doc. rootElem = doc.getDocumentElementgetDocumentElement();();
String lastID = rootElem.String lastID = rootElem.getAttributegetAttribute("lastID");("lastID");
int lastIDnum = java.lang.Integer.parseInt(lastID);int lastIDnum = java.lang.Integer.parseInt(lastID);
System.out.print(System.out.print("First name (or <return> to finish): ");"First name (or <return> to finish): ");
String firstName =String firstName =terminalReader.readLine().trim(); terminalReader.readLine().trim();
while (firstName.length() > 0) {while (firstName.length() > 0) {
// Get the next unused ID:// Get the next unused ID:
ID = "RDK" + new Integer(++lastIDnum).toString();ID = "RDK" + new Integer(++lastIDnum).toString();
// … Read values lastName, bAndY, email, // … Read values lastName, bAndY, email,
// and group from the terminal, and then ...// and group from the terminal, and then ...
SDPL 2011 3.2: Document Object Model 31
Implementing addition of records (2)Implementing addition of records (2)
ElementElement newStudent = newStudent = newStudent(doc, ID, firstName, lastName, newStudent(doc, ID, firstName, lastName, bAndY, email, group); bAndY, email, group);
rootElem.rootElem.appendChildappendChild(newStudent);(newStudent);
System.out.print(System.out.print("First name (or <return> to finish): ");"First name (or <return> to finish): ");
firstName = terminalReader.readLine().trim();firstName = terminalReader.readLine().trim();
} // while firstName.length() > 0} // while firstName.length() > 0
// Update the last ID used:// Update the last ID used:
String newLastID =String newLastID =java.lang.Integer.toString(lastIDnum);java.lang.Integer.toString(lastIDnum);
rootElem.rootElem.setAttributesetAttribute("lastID", newLastID);("lastID", newLastID);
System.out.println("Finished adding records");System.out.println("Finished adding records");
SDPL 2011 3.2: Document Object Model 32
Creating new student records (1)Creating new student records (1)
private private ElementElement newStudent( newStudent(DocumentDocument doc, String ID, doc, String ID, String fName, String lName, String bAndY, String fName, String lName, String bAndY, String email, String grp) { String email, String grp) {
ElementElement stu = doc. stu = doc.createElementcreateElement("student");("student"); stu.stu.setAttributesetAttribute("id", ID);("id", ID); ElementElement newName = doc. newName = doc.createElementcreateElement("name");("name"); ElementElement newGiven = doc. newGiven = doc.createElementcreateElement("given");("given");
newGiven. newGiven.appendChildappendChild(doc.(doc.createTextNodecreateTextNode(fName));(fName)); ElementElement newFamily = doc. newFamily = doc.createElementcreateElement("family");("family"); newFamily.newFamily.appendChildappendChild(doc.(doc.createTextNodecreateTextNode(lName));(lName)); newName.newName.appendChildappendChild(newGiven);(newGiven); newName.newName.appendChildappendChild(newFamily);(newFamily);
stu. stu.appendChildappendChild(newName);(newName);
SDPL 2011 3.2: Document Object Model 33
Creating new student records (2)Creating new student records (2)
// method newStudent(…) continues:// method newStudent(…) continues:ElementElement newBr = newBr =
doc.doc.createElementcreateElement("branchAndYear");("branchAndYear");
newBr.newBr.appendChildappendChild(doc.(doc.createTextNodecreateTextNode(bAndY));(bAndY));
stu.appendChild(newBr);stu.appendChild(newBr);
ElementElement newEmail = doc. newEmail = doc.createElementcreateElement("email");("email");
newEmail.newEmail.appendChildappendChild(doc.(doc.createTextNodecreateTextNode(email));(email));
stu.stu.appendChildappendChild(newEmail);(newEmail);
ElementElement newGrp = doc. newGrp = doc.createElementcreateElement("group");("group");
newGrp.newGrp.appendChildappendChild(doc.(doc.createTextNodecreateTextNode(group));(group));
stu.stu.appendChildappendChild(newGrp);(newGrp);
return stu;return stu;
} // newStudent} // newStudent
SDPL 2011 3.2: Document Object Model 34
Lessons of modifying DOMLessons of modifying DOM
Each node must be created withEach node must be created with– Document.create...(Document.create...(“nameOrValue”“nameOrValue”))
– Attributes of an element more easily with Attributes of an element more easily with setAttribute(setAttribute(“name”, “value”“name”, “value”))
... and connected to the structure... and connected to the structure– Normally with Normally with parent.appendChild(newChild)parent.appendChild(newChild)
Updates and deletions in the RegListMgr Updates and deletions in the RegListMgr similarly, by manipulating the DOM structuressimilarly, by manipulating the DOM structures
-> exercises-> exercises
SDPL 2011 3.2: Document Object Model 35
Efficiency of SAX vs DOM?Efficiency of SAX vs DOM?
DOM has reputation of requiring more DOM has reputation of requiring more resources than streaming interfaces like SAXresources than streaming interfaces like SAX
Small experiment of this hypothesis:Small experiment of this hypothesis: Test task: Retrieve the title of the last section Test task: Retrieve the title of the last section
that mentions "XML Schema definition that mentions "XML Schema definition language"language"– Target docs: repeats of fragments from W3C XML Target docs: repeats of fragments from W3C XML
Schema Recommendation (Part 1)Schema Recommendation (Part 1)– Environment: JDK 1.6, Red Hat Linux 6, 3 GHz Environment: JDK 1.6, Red Hat Linux 6, 3 GHz
Pentium with 1 GB RAMPentium with 1 GB RAM
SDPL 2011 3.2: Document Object Model 36
The speed of DOM vs SAXThe speed of DOM vs SAX
On small documents, up to ~ 2 MB, the SAX & On small documents, up to ~ 2 MB, the SAX & DOM based solutions are roughly equal:DOM based solutions are roughly equal:
400
600
800
1000
1200
1400
500 1000 1500 2000 2500 3000 3500 4000
tim
e (
ms
)
document size (KB)
SAX v s DOM processing times
SAXDOM
~ 3.0 MB/s~ 3.0 MB/s
~ 3.9 MB/s~ 3.9 MB/s
SDPL 2011 3.2: Document Object Model 37
Resource needs of DOM vs SAXResource needs of DOM vs SAX
On larger documents, up to ~ 60 MB, the On larger documents, up to ~ 60 MB, the DOM application becomes faster than SAX(!)DOM application becomes faster than SAX(!)– throughput ~ 8 MB/sthroughput ~ 8 MB/s– SAX ~ 4 MB/sSAX ~ 4 MB/s
But DOM takes relatively much of RAMBut DOM takes relatively much of RAM– here ~ 6 x the size of the input XML documenthere ~ 6 x the size of the input XML document
The SAX application runs in fixed space of ~ 6 MB The SAX application runs in fixed space of ~ 6 MB
SDPL 2011 3.2: Document Object Model 38
Summary of XML APIs so farSummary of XML APIs so far
Give applications access to the structure and Give applications access to the structure and contents of XML documentscontents of XML documents
Event-based APIs (e.g. SAX)Event-based APIs (e.g. SAX)– notify application through parsing eventsnotify application through parsing events– efficientefficient
Object-model (or tree) based APIs (e.g. DOM)Object-model (or tree) based APIs (e.g. DOM)– provide a full parse treeprovide a full parse tree– more convenient, but require much resources with more convenient, but require much resources with
large documentslarge documents Major parsers support both SAX and DOMMajor parsers support both SAX and DOM
– used through proprietary methodsused through proprietary methods– used through JAXP used through JAXP ((next)next)