sdpl 20113.2: document object model1 n how access structured documents uniformly in parsers,...

38
SDPL 2011 3.2: Document Object Model 1 How access structured documents How access structured documents uniformly in parsers, browsers, uniformly in parsers, browsers, editors, databases,...? editors, databases,...? Overview of the W3C DOM Spec Overview of the W3C DOM Spec » Level 1, W3C Rec, Oct. 1998 Level 1, W3C Rec, Oct. 1998 » Level 2 Level 2 , W3C Rec, Nov. 2000 , W3C Rec, Nov. 2000 » Level 3 Level 3 Validation Validation , , Core Core , and , and Load and Save Load and Save W3C Recs (Spring 2004) W3C Recs (Spring 2004) W3C DOM Activity has been closed W3C DOM Activity has been closed 3.2 Document Object Model 3.2 Document Object Model (DOM) (DOM)

Upload: katrina-blake

Post on 12-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 1

How access structured documents uniformly How access structured documents uniformly in parsers, browsers, editors, databases,...? in parsers, browsers, editors, databases,...?

Overview of the W3C DOM SpecOverview of the W3C DOM Spec» Level 1, W3C Rec, Oct. 1998Level 1, W3C Rec, Oct. 1998» Level 2Level 2, W3C Rec, Nov. 2000, W3C Rec, Nov. 2000» Level 3 Level 3 ValidationValidation, , CoreCore, and , and Load and SaveLoad and Save

W3C Recs (Spring 2004)W3C Recs (Spring 2004)

W3C DOM Activity has been closedW3C DOM Activity has been closed

3.2 Document Object Model (DOM)3.2 Document Object Model (DOM)

Page 2: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 2

DOM: What is it? DOM: What is it?

An object-based, language-neutral API for An object-based, language-neutral API for XML and HTML documentsXML and HTML documents

– Allows programs and scripts to build, access, and modify Allows programs and scripts to build, access, and modify documentsdocuments

– Supports designing of Supports designing of querying, filtering, querying, filtering, transformation, formatting etc. transformation, formatting etc.

applications on top of DOM implementationsapplications on top of DOM implementations

Instead of “Instead of “SSerial erial AAccess ccess XXML” could think as ML” could think as ““DDirectly irectly OObtainable in btainable in MMemory”emory”

Page 3: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 3

DOM DOM structure model structure model

Based on O-O concepts:Based on O-O concepts:– objectsobjects (encapsulation of data and methods) (encapsulation of data and methods)– methodsmethods (to access or change object’s state) (to access or change object’s state)– interfacesinterfaces (declaration of a set of methods) (declaration of a set of methods)

Somewhat similar to the XPath data model (to be Somewhat similar to the XPath data model (to be discussed with XSLT and XQuery) discussed with XSLT and XQuery) syntax-tree syntax-tree– Tree structure implied by abstract relationships defined Tree structure implied by abstract relationships defined

by the API; Data structures of an implementation may by the API; Data structures of an implementation may differdiffer

Page 4: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 4

<invoice form="00" <invoice form="00" type="estimated">type="estimated"> <addressdata><addressdata> <name>John Doe</name><name>John Doe</name> <address><address> <streetaddress>Pyynpolku 1<streetaddress>Pyynpolku 1 </streetaddress></streetaddress> <postoffice>70460 KUOPIO<postoffice>70460 KUOPIO </postoffice></postoffice> </address></address> </addressdata></addressdata> ......

DOM structure modelDOM structure model

invoiceinvoice

namename

addressdataaddressdata

addressaddress

form="00"form="00"type="estimated"type="estimated"

John DoeJohn Doe streetaddressstreetaddress postofficepostoffice

70460 KUOPIO70460 KUOPIOPyynpolku 1Pyynpolku 1

......

DocumentDocument

ElementElement

NamedNodeMapNamedNodeMap

TextText

Page 5: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 5

Structure of DOM Level 1Structure of DOM Level 1

I: DOM I: DOM Core InterfacesCore Interfaces– Fundamental interfacesFundamental interfaces

» basic interfaces: Document, Element, Attr, Text, ...basic interfaces: Document, Element, Attr, Text, ...– "Extended" (XML specific) interfaces "Extended" (XML specific) interfaces

» CDATASection, DocumentType, Notation, Entity, CDATASection, DocumentType, Notation, Entity, EntityReference, ProcessingInstructionEntityReference, ProcessingInstruction

II: DOM II: DOM HTML InterfacesHTML Interfaces– more convenient access to HTML documentsmore convenient access to HTML documents– we'll ignore thesewe'll ignore these

Page 6: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 6

DOM Level 2DOM Level 2

– Level 1: basic representation and manipulation of Level 1: basic representation and manipulation of document structure and content document structure and content (No access to the contents of a DTD)(No access to the contents of a DTD)

DOM Level 2 adds DOM Level 2 adds – support for namespacessupport for namespaces– Document.getElementById("id_val")Document.getElementById("id_val"), ,

to access elements by ID attr valuesto access elements by ID attr values– optional features (we’ll skip these)optional features (we’ll skip these)

» interfaces to document views and style sheetsinterfaces to document views and style sheets» an event model (for user actions on elements)an event model (for user actions on elements)» methods for traversing the document tree and manipulating methods for traversing the document tree and manipulating

regions of document (e.g., selected in an editor)regions of document (e.g., selected in an editor)

Page 7: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 7

DOM Language BindingsDOM Language Bindings

Language-independence:Language-independence:– DOM interfaces are defined using OMG Interface DOM interfaces are defined using OMG Interface

Definition Language (IDL, defined in Corba Definition Language (IDL, defined in Corba Specification)Specification)

Language bindings (implementations of Language bindings (implementations of interfaces) defined in the Recommendation forinterfaces) defined in the Recommendation for– Java (See the Java API doc) andJava (See the Java API doc) and– ECMAScript (standardised JavaScript)ECMAScript (standardised JavaScript)

Page 8: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 8

Core Interfaces: Core Interfaces: NodeNode & its variants & its variants

NodeNode

CommentComment

DocumentFragmentDocumentFragment AttrAttr

TextText

ElementElement

CDATASectionCDATASection

ProcessingInstructionProcessingInstruction

CharacterDataCharacterData

EntityEntityDocumentTypeDocumentType NotationNotation

EntityReferenceEntityReference

““Extended Extended interfaces”interfaces”

DocumentDocument

Page 9: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 9

DOM interfaces: DOM interfaces: NodeNode

invoiceinvoice

namename

addressdataaddressdata

addressaddress

form="00"form="00"type="estimatedbill"type="estimatedbill"

John DoeJohn Doe streetaddressstreetaddress postofficepostoffice

70460 KUOPIO70460 KUOPIOPyynpolku 1Pyynpolku 1

NodeNodegetNodeType, getNodeName, getNodeType, getNodeName, getNodeValuegetNodeValuegetOwnerDocumentgetOwnerDocumentgetParentNodegetParentNodehasChildNodes, getChildNodeshasChildNodes, getChildNodesgetFirstChild, getLastChildgetFirstChild, getLastChildgetPreviousSibling, getNextSiblinggetPreviousSibling, getNextSiblinghasAttributes, getAttributeshasAttributes, getAttributesappendChild(newChild)appendChild(newChild)insertBefore(newChild,refChild)insertBefore(newChild,refChild)replaceChild(newChild,oldChild)replaceChild(newChild,oldChild)removeChild(oldChild)removeChild(oldChild)

DocumentDocument

ElementElement

NamedNodeMapNamedNodeMap

TextText

...

Page 10: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 10

Type and Name of aType and Name of a NodeNode

node.getNodeType()node.getNodeType():: short intshort int constants 1, 2, …, 12 forconstants 1, 2, …, 12 for Node.ELEMENT_NODENode.ELEMENT_NODE,, Node.ATTRIBUTE_NODENode.ATTRIBUTE_NODE,, Node.TEXT_NODENode.TEXT_NODE, …, …

node.getNodeName()node.getNodeName()– for an for an Element = element.getTagName()Element = element.getTagName()– for an for an AttrAttr:: the name of the attribute the name of the attribute– for anonymous nodesfor anonymous nodes: : "#text""#text", , "#document""#document", , "#comment""#comment" etc etc

Page 11: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 11

The Value of aThe Value of a NodeNode

node.getNodeValuenode.getNodeValue()() – content of a text node, content of a text node,

value of attribute, …; value of attribute, …; nullnull for an for an ElementElement ((NoticeNotice !) !)

– (C.f. XPath, where node’s value is its full textual (C.f. XPath, where node’s value is its full textual content)content)

– DOM DOM 3 3 provides full text content with methodprovides full text content with method node.getTextContentnode.getTextContent()()

Page 12: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 12

Object Creation in DOMObject Creation in DOM

Each DOM Node Each DOM Node nn belongs to a belongs to a DocumentDocument: : nn..getOwnerDocument()getOwnerDocument()

Objects that implement interface Objects that implement interface XX are are created by created by factory methodsfactory methods

Document.createDocument.createXX(…)(…)E.g: when E.g: when docdoc is a is a DocumentDocument objectobject

doc.createElement("A"), doc.createElement("A"), doc.createAttribute("href"), doc.createAttribute("href"), doc.createTextNode("Hello!")doc.createTextNode("Hello!")

Loading & saving specified in DOM3 (or Loading & saving specified in DOM3 (or implementation-specific , or via JAXP)implementation-specific , or via JAXP)

Page 13: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 13

invoiceinvoice

namename

addressdataaddressdata

addressaddress

form="00"form="00"type="estimated"type="estimated"

John DoeJohn Doe streetaddressstreetaddress postofficepostoffice

70460 KUOPIO70460 KUOPIOPyynpolku 1Pyynpolku 1

DocumentDocumentgetDocumentElementgetDocumentElementgetElementById(IdVal)getElementById(IdVal)getElementsByTagName(tagName)getElementsByTagName(tagName)

createElement(tagName)createElement(tagName)createTextNode(data)createTextNode(data)

NodeNode DOM interfaces: DOM interfaces: DocumentDocument

DocumentDocument

ElementElement

NamedNodeMapNamedNodeMap

TextText

......

Page 14: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 14

DOM interfaces: DOM interfaces: ElementElement

invoiceinvoice

invoicepageinvoicepage

namename

addresseeaddressee

addressdataaddressdata

addressaddress

form="00"form="00"type="estimatedbill"type="estimatedbill"

John DoeJohn Doe streetaddressstreetaddress postofficepostoffice

70460 KUOPIO70460 KUOPIOPyynpolku 1Pyynpolku 1

ElementElementgetTagName()getTagName()

hasAttribute(name)hasAttribute(name)getAttribute(name)getAttribute(name)setAttribute(attrName, value)setAttribute(attrName, value)removeAttribute(name)removeAttribute(name)

getElementsByTagName(name)getElementsByTagName(name)

NodeNode

DocumentDocument

ElementElement

NamedNodeMapNamedNodeMap

TextText

Page 15: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 15

Text Content Manipulation in DOMText Content Manipulation in DOM

for objects for objects cc that implement the that implement the CharacterDataCharacterData interface interface (Text, Comments, CDATASections)(Text, Comments, CDATASections)::– c.c.substringData(substringData(offset, countoffset, count)) – c.c.appendData(appendData(stringstring)) – c.c.insertData(insertData(offset, stringoffset, string)) – c.c.deleteData(deleteData(offset, countoffset, count)) – c.c.replaceData(replaceData(offset, count, stringoffset, count, string))

( = ( = c. c.deleteData(offset, count);deleteData(offset, count); c.c.insertData(offset, string) )insertData(offset, string) )

Page 16: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 16

DOMDOM CharacterData CharacterData

DOM strings are 0-based sequences of DOM strings are 0-based sequences of 16-bit characters:16-bit characters:

CC: Hello world, nice to see you!: Hello world, nice to see you! 0 1 2 0 1 2 01234567890123456789012345678 01234567890123456789012345678

C.getLength()C.getLength()-1-1

C.substringData(6, 5) C.substringData(6, 5) = ?= ? C.substringData(0, C.getLength()) C.substringData(0, C.getLength()) = ?= ?

Page 17: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 17

Interfaces to node collections (1)Interfaces to node collections (1)

NodeListNodeList for ordered lists of nodesfor ordered lists of nodes<- Node.getChildNodes()<- Node.getChildNodes() and and Element/DocumentElement/Document.getElementsByTagName("name").getElementsByTagName("name")

» (proper) descendant elements of type (proper) descendant elements of type ""namename" " in in document order (document order (""**" " ~ any element type)~ any element type)

AA

EE

EE EE

EE

11

AA

22 33 44

55 66

.getElementsByTagName(“E")= .getElementsByTagName(“E")=

Page 18: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 18

Typical child-node access patternTypical child-node access pattern

Accessing specific nodes, or iterating over a Accessing specific nodes, or iterating over a NodeListNodeList::– to process all children of to process all children of nodenode::for (i=0;for (i=0;

i<node.i<node.getChildNodesgetChildNodes().().getLengthgetLength(); (); i++) i++) process(node.process(node.getChildNodesgetChildNodes().().itemitem(i));(i));

Page 19: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 19

NamedNodeMapNamedNodeMap for unordered sets of nodes for unordered sets of nodes accessed by their name:accessed by their name:<- Node.getAttributes(), <- Node.getAttributes(), DocumentType.getEntities() DocumentType.getEntities()

DocumentFragmentDocumentFragment – Termporary container of child nodesTermporary container of child nodes– Disappears when inserted in treeDisappears when inserted in tree

NodeListNodeLists and s and NamedNodeMapNamedNodeMaps are "live":s are "live":– reflect updates of the doc tree immediatelyreflect updates of the doc tree immediately– See nextSee next

Interfaces to node collections (2)Interfaces to node collections (2)

Page 20: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 20

NodeListNodeLists are “live”s are “live”

E.g., this would delete E.g., this would delete every other every other child of child of nn::

NodeList cList = n.NodeList cList = n.getChildNodesgetChildNodes();(); for (i=0; i<cList.for (i=0; i<cList.getLengthgetLength(); i++) (); i++)

n.n.removeChildremoveChild(cList.(cList.itemitem(i));(i)); – What happens?What happens? nn

AA BB CC DDi=0i=0i=1i=1

i=2i=2

cListcList

Page 21: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 21

DOM: XML ImplementationsDOM: XML Implementations

JavaJava-based parsers -based parsers e.g. e.g. Apache Xerces, Apache Crimson, … Apache Xerces, Apache Crimson, …

In MS IE browser: COM programming interfaces for In MS IE browser: COM programming interfaces for C/C++C/C++ and and Visual BasicVisual Basic; ActiveX object ; ActiveX object programming interfaces for programming interfaces for script languagesscript languages

PerlPerl: : XML::DOMXML::DOM (Implements DOM Level 1) (Implements DOM Level 1) OthersOthers, say, database APIs?, say, database APIs?

– Vendors of different kinds of systems participated in the Vendors of different kinds of systems participated in the W3C DOM WGW3C DOM WG

Page 22: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 22

A Java-DOM ExampleA Java-DOM Example

Command-line tool Command-line tool RegListMgrRegListMgr for maintaining for maintaining a course registration lista course registration list– with single-letter commands for with single-letter commands for llisting, isting, aadding, dding,

uupdating and pdating and ddeleting student recordseleting student records Example:Example:

$ java RegListMgr reglist.xml$ java RegListMgr reglist.xmlDocument loaded succesfullyDocument loaded succesfully> > ……40: Tero Ulvinen, TKM1, [email protected], 240: Tero Ulvinen, TKM1, [email protected], 241: heli viinikainen, tkt5, [email protected], 141: heli viinikainen, tkt5, [email protected], 1

list the contentslist the contentsll

Page 23: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 23

Registration list: the XML fileRegistration list: the XML file

<?xml version="1.0" ?><?xml version="1.0" ?><!DOCTYPE reglist SYSTEM "reglist.dtd"><!DOCTYPE reglist SYSTEM "reglist.dtd"><reglist lastID="41"><reglist lastID="41"> <student id="RDK1"><student id="RDK1"> <name><given>Juho</given><name><given>Juho</given>

<family>Ahopelto</family></name> <family>Ahopelto</family></name> <branchAndYear>TKT4</branchAndYear><branchAndYear>TKT4</branchAndYear> <email>[email protected]</email><email>[email protected]</email> <group>2</group><group>2</group> </student></student> <!-- … and the other students … --><!-- … and the other students … --></reglist></reglist>

Page 24: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 24

Registration List: the DTDRegistration List: the DTD

<!ELEMENT reglist (student*)><!ELEMENT reglist (student*)>

<!ATTLIST reglist <!ATTLIST reglist lastID CDATA #REQUIRED > lastID CDATA #REQUIRED >

<!ELEMENT student <!ELEMENT student (name, branchAndYear, email, group)> (name, branchAndYear, email, group)>

<!ATTLIST student <!ATTLIST student id ID #REQUIRED > id ID #REQUIRED >

<!ELEMENT name (given, family)><!ELEMENT name (given, family)>

<!ELEMENT given (#PCDATA)><!ELEMENT given (#PCDATA)>

<!-- … and the same for family, <!-- … and the same for family, branchAndYear, email,and group -->branchAndYear, email,and group -->

Page 25: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 25

Loading and Saving the RegList Loading and Saving the RegList

Loading of the registration list into Loading of the registration list into DOMDOM DocumentDocument docdoc implemented with a JAXP implemented with a JAXP DocumentBuilder DocumentBuilder – (to be discussed later)(to be discussed later)– docdoc is a handle to the is a handle to the DocumentDocument

Saving implemented with a Saving implemented with a JAXP JAXP TransformerTransformer– to be discussed laterto be discussed later

Page 26: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 26

Listing student records (1)Listing student records (1)

NodeListNodeList students = students = doc.doc.getElementsByTagNamegetElementsByTagName("student");("student");

for (int i=0; i<students.for (int i=0; i<students.getLengthgetLength(); i++) (); i++)

showStudent((showStudent((ElementElement) students.) students.itemitem(i));(i));

private void showStudent(private void showStudent(ElementElement student) { student) { // Collect relevant sub-elements:// Collect relevant sub-elements: NodeNode given = given =

student. student.getElementsByTagNamegetElementsByTagName("given").("given").itemitem(0);(0); NodeNode family = given. family = given.getNextSiblinggetNextSibling();(); NodeNode bAndY = student. bAndY = student.

getElementsByTagNamegetElementsByTagName("branchAndYear").("branchAndYear").itemitem(0);(0); NodeNode email = bAndY. email = bAndY.getNextSiblinggetNextSibling();(); NodeNode group = email. group = email.getNextSiblinggetNextSibling();();

Page 27: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 27

Listing student records (2)Listing student records (2)

// Method showStudent continues:// Method showStudent continues:

System.out.print(System.out.print( student.student.getAttributegetAttribute("id").substring(3));("id").substring(3));

System.out.print(": " + System.out.print(": " + given.given.getFirstChildgetFirstChild().().getNodeValuegetNodeValue() );() );// or given.// or given.getTextContentgetTextContent() with DOM3() with DOM3

// .. similarly access and display the // .. similarly access and display the // value of family, bAndY, email, and group// value of family, bAndY, email, and group// … // …

} // showStudent} // showStudent

Page 28: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 28

Lessons of accessing DOMLessons of accessing DOM

Access methods for relevant nodesAccess methods for relevant nodes– getElementsByTagname(getElementsByTagname(“tagName”“tagName”))

» robust wrt structure modificationsrobust wrt structure modifications– Also others, if structure known (validated)Also others, if structure known (validated)

» getFirstChild(), getLastChild(), getFirstChild(), getLastChild(), getPreviousSibling(), getNextSibling()getPreviousSibling(), getNextSibling()

Element nodes have no value!Element nodes have no value!– Get the value from child Text nodes, Get the value from child Text nodes,

or use or use getTextContent()getTextContent()

Page 29: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 29

Adding New RecordsAdding New Records

Example:Example:

First name (or <return> to finish):First name (or <return> to finish):> a> a

… …41: heli viinikainen, tkt5, [email protected], 141: heli viinikainen, tkt5, [email protected], 142: Antti Ahkera, tkt3, [email protected], 242: Antti Ahkera, tkt3, [email protected], 2

add studentsadd students

AnttiAnttiLast name:Last name: AhkeraAhkeraBranch&year:Branch&year:

Finished adding recordsFinished adding records> >

email:email:tkt3tkt3

group:group:[email protected]@fake.addr.fi

First name (or <return> to finish):First name (or <return> to finish):22

ll

Page 30: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 30

Implementing addition of records (1)Implementing addition of records (1)

ElementElement rootElem = doc. rootElem = doc.getDocumentElementgetDocumentElement();();

String lastID = rootElem.String lastID = rootElem.getAttributegetAttribute("lastID");("lastID");

int lastIDnum = java.lang.Integer.parseInt(lastID);int lastIDnum = java.lang.Integer.parseInt(lastID);

System.out.print(System.out.print("First name (or <return> to finish): ");"First name (or <return> to finish): ");

String firstName =String firstName =terminalReader.readLine().trim(); terminalReader.readLine().trim();

while (firstName.length() > 0) {while (firstName.length() > 0) {

// Get the next unused ID:// Get the next unused ID:

ID = "RDK" + new Integer(++lastIDnum).toString();ID = "RDK" + new Integer(++lastIDnum).toString();

// … Read values lastName, bAndY, email, // … Read values lastName, bAndY, email,

// and group from the terminal, and then ...// and group from the terminal, and then ...

Page 31: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 31

Implementing addition of records (2)Implementing addition of records (2)

ElementElement newStudent = newStudent = newStudent(doc, ID, firstName, lastName, newStudent(doc, ID, firstName, lastName, bAndY, email, group); bAndY, email, group);

rootElem.rootElem.appendChildappendChild(newStudent);(newStudent);

System.out.print(System.out.print("First name (or <return> to finish): ");"First name (or <return> to finish): ");

firstName = terminalReader.readLine().trim();firstName = terminalReader.readLine().trim();

} // while firstName.length() > 0} // while firstName.length() > 0

// Update the last ID used:// Update the last ID used:

String newLastID =String newLastID =java.lang.Integer.toString(lastIDnum);java.lang.Integer.toString(lastIDnum);

rootElem.rootElem.setAttributesetAttribute("lastID", newLastID);("lastID", newLastID);

System.out.println("Finished adding records");System.out.println("Finished adding records");

Page 32: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 32

Creating new student records (1)Creating new student records (1)

private private ElementElement newStudent( newStudent(DocumentDocument doc, String ID, doc, String ID, String fName, String lName, String bAndY, String fName, String lName, String bAndY, String email, String grp) { String email, String grp) {

ElementElement stu = doc. stu = doc.createElementcreateElement("student");("student"); stu.stu.setAttributesetAttribute("id", ID);("id", ID); ElementElement newName = doc. newName = doc.createElementcreateElement("name");("name"); ElementElement newGiven = doc. newGiven = doc.createElementcreateElement("given");("given");

newGiven. newGiven.appendChildappendChild(doc.(doc.createTextNodecreateTextNode(fName));(fName)); ElementElement newFamily = doc. newFamily = doc.createElementcreateElement("family");("family"); newFamily.newFamily.appendChildappendChild(doc.(doc.createTextNodecreateTextNode(lName));(lName)); newName.newName.appendChildappendChild(newGiven);(newGiven); newName.newName.appendChildappendChild(newFamily);(newFamily);

stu. stu.appendChildappendChild(newName);(newName);

Page 33: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 33

Creating new student records (2)Creating new student records (2)

// method newStudent(…) continues:// method newStudent(…) continues:ElementElement newBr = newBr =

doc.doc.createElementcreateElement("branchAndYear");("branchAndYear");

newBr.newBr.appendChildappendChild(doc.(doc.createTextNodecreateTextNode(bAndY));(bAndY));

stu.appendChild(newBr);stu.appendChild(newBr);

ElementElement newEmail = doc. newEmail = doc.createElementcreateElement("email");("email");

newEmail.newEmail.appendChildappendChild(doc.(doc.createTextNodecreateTextNode(email));(email));

stu.stu.appendChildappendChild(newEmail);(newEmail);

ElementElement newGrp = doc. newGrp = doc.createElementcreateElement("group");("group");

newGrp.newGrp.appendChildappendChild(doc.(doc.createTextNodecreateTextNode(group));(group));

stu.stu.appendChildappendChild(newGrp);(newGrp);

return stu;return stu;

} // newStudent} // newStudent

Page 34: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 34

Lessons of modifying DOMLessons of modifying DOM

Each node must be created withEach node must be created with– Document.create...(Document.create...(“nameOrValue”“nameOrValue”))

– Attributes of an element more easily with Attributes of an element more easily with setAttribute(setAttribute(“name”, “value”“name”, “value”))

... and connected to the structure... and connected to the structure– Normally with Normally with parent.appendChild(newChild)parent.appendChild(newChild)

Updates and deletions in the RegListMgr Updates and deletions in the RegListMgr similarly, by manipulating the DOM structuressimilarly, by manipulating the DOM structures

-> exercises-> exercises

Page 35: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 35

Efficiency of SAX vs DOM?Efficiency of SAX vs DOM?

DOM has reputation of requiring more DOM has reputation of requiring more resources than streaming interfaces like SAXresources than streaming interfaces like SAX

Small experiment of this hypothesis:Small experiment of this hypothesis: Test task: Retrieve the title of the last section Test task: Retrieve the title of the last section

that mentions "XML Schema definition that mentions "XML Schema definition language"language"– Target docs: repeats of fragments from W3C XML Target docs: repeats of fragments from W3C XML

Schema Recommendation (Part 1)Schema Recommendation (Part 1)– Environment: JDK 1.6, Red Hat Linux 6, 3 GHz Environment: JDK 1.6, Red Hat Linux 6, 3 GHz

Pentium with 1 GB RAMPentium with 1 GB RAM

Page 36: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 36

The speed of DOM vs SAXThe speed of DOM vs SAX

On small documents, up to ~ 2 MB, the SAX & On small documents, up to ~ 2 MB, the SAX & DOM based solutions are roughly equal:DOM based solutions are roughly equal:

400

600

800

1000

1200

1400

500 1000 1500 2000 2500 3000 3500 4000

tim

e (

ms

)

document size (KB)

SAX v s DOM processing times

SAXDOM

~ 3.0 MB/s~ 3.0 MB/s

~ 3.9 MB/s~ 3.9 MB/s

Page 37: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 37

Resource needs of DOM vs SAXResource needs of DOM vs SAX

On larger documents, up to ~ 60 MB, the On larger documents, up to ~ 60 MB, the DOM application becomes faster than SAX(!)DOM application becomes faster than SAX(!)– throughput ~ 8 MB/sthroughput ~ 8 MB/s– SAX ~ 4 MB/sSAX ~ 4 MB/s

But DOM takes relatively much of RAMBut DOM takes relatively much of RAM– here ~ 6 x the size of the input XML documenthere ~ 6 x the size of the input XML document

The SAX application runs in fixed space of ~ 6 MB The SAX application runs in fixed space of ~ 6 MB

Page 38: SDPL 20113.2: Document Object Model1 n How access structured documents uniformly in parsers, browsers, editors, databases,...? n Overview of the W3C DOM

SDPL 2011 3.2: Document Object Model 38

Summary of XML APIs so farSummary of XML APIs so far

Give applications access to the structure and Give applications access to the structure and contents of XML documentscontents of XML documents

Event-based APIs (e.g. SAX)Event-based APIs (e.g. SAX)– notify application through parsing eventsnotify application through parsing events– efficientefficient

Object-model (or tree) based APIs (e.g. DOM)Object-model (or tree) based APIs (e.g. DOM)– provide a full parse treeprovide a full parse tree– more convenient, but require much resources with more convenient, but require much resources with

large documentslarge documents Major parsers support both SAX and DOMMajor parsers support both SAX and DOM

– used through proprietary methodsused through proprietary methods– used through JAXP used through JAXP ((next)next)