SDPL 2006 3.2: Document Object Model 1
3.2 Document Object Model (DOM)
How to provide uniform access to structured documents in parsers, browsers, editors, databases, ...?
– DOM, and how to use it
Selective overview of the W3C DOM Spec
– second in the “XML-family” of Recs
» Level 1, W3C Rec, Oct. 1998
» Level 2, W3C Rec, Nov. 2000
» Level 3 (of 21 modules!) work-in-progress; Validation, Core, and Load and Save Recommendations (Spring 2004)
SDPL 2006 3.2: Document Object Model 2
DOM: What is it?
An object-based, language-neutral API for XML and HTML documents
– Allows programs and scripts to build, access, and modify documents
– Supports the development of querying, filtering, transformation, formatting etc. applications on top of DOM implementations
In contrast to SAX read as “Serial Access XML”, DOM could be thought of as “Directly Obtainable in Memory”
SDPL 2006 3.2: Document Object Model 3
DOM structure model
Based on O-O concepts:
– methods (to access or change object’s state)
– interfaces (declaration of a set of methods)
– objects (encapsulation of data and methods)
Roughly similar to the XSLT/XPath data model (to be discussed later): a syntax tree
– Tree structure implied by abstract relationships defined by the API; data structures of an implementation may differ (but hardly do(?))
SDPL 2006 3.2: Document Object Model 4
DOM structure model
  <invoice form="00" type="estimated">
    <addressdata>
      <name>John Doe</name>
      <address>
        <streetaddress>Pyynpolku 1</streetaddress>
        <postoffice>70460 KUOPIO</postoffice>
      </address>
    </addressdata>
    ...
[Figure: the DOM tree of the invoice document. A Document node has the Element invoice as its child; the attributes form="00" and type="estimated" of invoice are held in a NamedNodeMap. Below invoice is the Element addressdata with the child Elements name (Text "John Doe") and address; address has the child Elements streetaddress (Text "Pyynpolku 1") and postoffice (Text "70460 KUOPIO"). Node kinds labelled in the figure: Document, Element, NamedNodeMap, Text.]
SDPL 2006 3.2: Document Object Model 5
Structure of DOM Level 1
I: DOM Core Interfaces
– Fundamental interfaces
» basic interfaces: Document, Element, Attr, Text, ...
– "Extended" (XML specific) interfaces
» CDATASection, DocumentType, Notation, Entity, EntityReference, ProcessingInstruction
II: DOM HTML Interfaces
– more convenient access to HTML documents
– (we'll ignore these)
SDPL 2006 3.2: Document Object Model 6
DOM Level 2
– Level 1: basic representation and manipulation of document structure and content (No access to the contents of a DTD)
DOM Level 2 adds
– support for namespaces
– accessing elements by ID attribute values
– optional features (we’ll skip these)
» interfaces to document views and style sheets
» an event model (for, say, user actions on elements)
» methods for traversing the document tree and manipulating regions of document (e.g., selected by the user of an editor)
– Load/Save of documents not specified (until Level 3)
SDPL 2006 3.2: Document Object Model 7
DOM Language Bindings
Language-independence:
– DOM interfaces are defined using OMG Interface Definition Language (IDL; defined in the CORBA Specification)
Language bindings (implementations of interfaces) defined in the Recommendation for
– Java (See the Java API doc) and
– ECMAScript (standardised JavaScript)
SDPL 2006 3.2: Document Object Model 8
Core Interfaces: Node & its variants
[Figure: inheritance diagram. Node is the base interface of Document, DocumentFragment, Element, Attr, CharacterData, ProcessingInstruction, DocumentType, Entity, EntityReference and Notation; Text and Comment derive from CharacterData, and CDATASection derives from Text. The XML-specific interfaces are marked as “Extended interfaces”.]
SDPL 2006 3.2: Document Object Model 9
DOM interfaces: Node
[Figure: the invoice DOM tree (node kinds Document, Element, NamedNodeMap, Text), shown next to the methods of the Node interface.]
Node
– getNodeType, getNodeName, getNodeValue
– getOwnerDocument
– getParentNode
– hasChildNodes, getChildNodes
– getFirstChild, getLastChild
– getPreviousSibling, getNextSibling
– hasAttributes, getAttributes
– appendChild(newChild)
– insertBefore(newChild,refChild)
– replaceChild(newChild,oldChild)
– removeChild(oldChild)
...
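The navigation methods of Node are enough to walk a whole document. The following is a minimal sketch, not part of the original slides: the class name NodeWalk and its helper methods are hypothetical, and the document is parsed from a string with JAXP only to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeWalk {
    // Parses an XML string into a DOM Document (hypothetical helper).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse example document", e);
        }
    }

    // Appends one line per node, indented by depth, using only Node methods.
    static void outline(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName()).append('\n');
        // Children are visited via getFirstChild/getNextSibling.
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            outline(c, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<invoice form=\"00\"><addressdata><name>John Doe</name></addressdata></invoice>");
        StringBuilder out = new StringBuilder();
        outline(doc, 0, out);
        System.out.print(out);
    }
}
```

Attributes do not show up in the outline: they are reached through getAttributes(), not through the child list.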
SDPL 2006 3.2: Document Object Model 10
Type and Name of a Node
node.getNodeType(): short int constants 1, 2, …, 12 for Node.ELEMENT_NODE, Node.ATTRIBUTE_NODE, Node.TEXT_NODE, …
node.getNodeName()
– for an Element = node.getTagName()
– for an Attr: the name of the attribute
– for anonymous nodes: "#text", "#document", "#comment" etc
SDPL 2006 3.2: Document Object Model 11
The Value of a Node
node.getNodeValue()
– content of a text node, value of attribute, …; null for an Element (!!)
– (in XSLT/XPath the value of a node is its full textual content)
– DOM 3 gives access to full textual content with the method node.getTextContent()
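To make the type, name and value of the different node kinds concrete, here is a small sketch. The class NodeInfo and its methods build() and describe() are made up for this illustration; the tiny invoice document is built in memory.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NodeInfo {
    // Builds <invoice form="00"><name>John Doe</name></invoice> in memory.
    static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element invoice = doc.createElement("invoice");
        invoice.setAttribute("form", "00");
        Element name = doc.createElement("name");
        name.appendChild(doc.createTextNode("John Doe"));
        invoice.appendChild(name);
        doc.appendChild(invoice);
        return doc;
    }

    // Node type code, node name and node value on one line.
    static String describe(Node n) {
        return n.getNodeType() + " " + n.getNodeName() + " " + n.getNodeValue();
    }

    public static void main(String[] args) {
        Document doc = build();
        Element invoice = doc.getDocumentElement();
        Node text = invoice.getFirstChild().getFirstChild();
        System.out.println(describe(doc));                              // 9 #document null
        System.out.println(describe(invoice));                          // 1 invoice null
        System.out.println(describe(invoice.getAttributeNode("form"))); // 2 form 00
        System.out.println(describe(text));                             // 3 #text John Doe
        System.out.println(invoice.getTextContent());                   // John Doe
    }
}
```

The Element reports null as its value, while the DOM Level 3 method getTextContent() returns the concatenated text below it.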
SDPL 2006 3.2: Document Object Model 12
Object Creation in DOM
Each DOM Node n lives in the scope of a Document: n.getOwnerDocument()
Objects implementing interface X are created by factory methods doc.createX(…), where doc is a Document object. E.g.:
– doc.createElement("A"), doc.createAttribute("href"), doc.createTextNode("Hello!")
Loading and saving of Documents is left implementation-specific
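The three factory calls named on the slide can be combined as follows. This is only a sketch: the class LinkFactory, the method buildLink and the target URL are invented for the example, and the empty Document comes from JAXP.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class LinkFactory {
    // Creates an empty Document to act as the factory (hypothetical helper).
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <A href="target">Hello!</A>; every node is created by doc.
    static Element buildLink(Document doc, String target) {
        Element a = doc.createElement("A");
        Attr href = doc.createAttribute("href");
        href.setValue(target);
        a.setAttributeNode(href);
        a.appendChild(doc.createTextNode("Hello!"));
        return a;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element a = buildLink(doc, "http://example.org/"); // placeholder URL
        System.out.println(a.getOwnerDocument() == doc);   // true
        System.out.println(a.getParentNode() == null);     // true: created, not yet inserted
        doc.appendChild(a);                                // now part of the tree
        System.out.println(doc.getDocumentElement().getTagName()); // A
    }
}
```

A created node belongs to its Document from the start, but it becomes part of the tree only when it is inserted with appendChild or a similar method.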
SDPL 2006 3.2: Document Object Model 13
DOM interfaces: Document
[Figure: the invoice DOM tree (node kinds Document, Element, NamedNodeMap, Text), shown next to the methods of the Document interface.]
Document (a Node)
– getDocumentElement
– getElementById(IdVal)
– getElementsByTagName(tagName)
– createElement(tagName)
– createTextNode(data)
...
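A sketch of the three access methods of Document follows. The class DocLookup and its helpers are hypothetical. One assumption is worth flagging: getElementById only finds attributes that the implementation knows to be of type ID. Here no DTD is read, so the example marks the id attributes explicitly with the DOM Level 3 method setIdAttribute; with a DTD that declares the attribute as ID, a parser may do this for you.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DocLookup {
    // Parses an XML string into a DOM Document (hypothetical helper).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse example document", e);
        }
    }

    // Declares the "id" attribute of every student element to be of type ID.
    static void markIds(Document doc) {
        NodeList students = doc.getElementsByTagName("student");
        for (int i = 0; i < students.getLength(); i++) {
            ((Element) students.item(i)).setIdAttribute("id", true);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<reglist lastID=\"2\"><student id=\"RDK1\"/><student id=\"RDK2\"/></reglist>");
        markIds(doc);
        System.out.println(doc.getDocumentElement().getTagName());           // reglist
        System.out.println(doc.getElementsByTagName("student").getLength()); // 2
        Element second = doc.getElementById("RDK2");
        System.out.println(second.getAttribute("id"));                       // RDK2
    }
}
```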
SDPL 2006 3.2: Document Object Model 14
DOM interfaces: Element
[Figure: a variant of the invoice DOM tree (with additional invoicepage and addressee elements), shown next to the methods of the Element interface.]
Element (a Node)
– getTagName()
– hasAttribute(name)
– getAttribute(name)
– setAttribute(attrName, value)
– removeAttribute(name)
– getElementsByTagName(name)
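The attribute methods of Element can be tried out on a single invoice element. The class AttrDemo and the method invoice() are invented for this sketch, and the new value "final" is just an example value.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class AttrDemo {
    // Builds <invoice form="00" type="estimated"/> in memory.
    static Element invoice() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element e = doc.createElement("invoice");
        e.setAttribute("form", "00");
        e.setAttribute("type", "estimated");
        return e;
    }

    public static void main(String[] args) {
        Element e = invoice();
        System.out.println(e.hasAttribute("type"));          // true
        System.out.println(e.getAttribute("type"));          // estimated
        e.setAttribute("type", "final");                     // overwrites the old value
        e.removeAttribute("form");
        System.out.println(e.hasAttribute("form"));          // false
        System.out.println(e.getAttribute("form").isEmpty()); // true: "" for a missing attribute
        System.out.println(e.getAttributes().getLength());   // 1
    }
}
```

In the Java binding, getAttribute returns the empty string rather than null for an attribute that is not present, so hasAttribute is the reliable existence test.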
SDPL 2006 3.2: Document Object Model 15
Text Content Manipulation in DOM
for an object c that implements the CharacterData interface (Text, Comments, CDATASections):
– c.substringData(offset, count)
– c.appendData(string)
– c.insertData(offset, string)
– c.deleteData(offset, count)
– c.replaceData(offset, count, string)
  ( = c.deleteData(offset, count); c.insertData(offset, string) )
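The CharacterData methods above can be traced on a single Text node. In this sketch the class TextEdit, its helpers and the edited strings are made up for illustration; offsets count characters from 0.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Text;

class TextEdit {
    // Creates a free-standing Text node with the given content.
    static Text text(String s) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .newDocument().createTextNode(s);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Applies a fixed sequence of edits and returns the resulting data.
    static String edit(Text c) {
        c.appendData(" FI");          // "70460 KUOPIO FI"
        c.insertData(0, "FIN-");      // "FIN-70460 KUOPIO FI"
        c.deleteData(16, 3);          // "FIN-70460 KUOPIO"
        c.replaceData(4, 5, "70100"); // "FIN-70100 KUOPIO"
        return c.getData();
    }

    public static void main(String[] args) {
        Text c = text("70460 KUOPIO");
        System.out.println(c.substringData(0, 5)); // 70460
        System.out.println(edit(c));               // FIN-70100 KUOPIO
    }
}
```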
SDPL 2006 3.2: Document Object Model 16
Additional Core Interfaces (1)
NodeList for ordered lists of nodes
– e.g. from Node.getChildNodes() or Element.getElementsByTagName("name")
» all descendant elements of type "name" in document order ("*" matches any element type)
Accessing a specific node, or iterating over all nodes of a NodeList:
– E.g., to process all children of node:
    for (i=0; i<node.getChildNodes().getLength(); i++)
      process(node.getChildNodes().item(i));
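A runnable variant of the iteration pattern might look like this. The class ChildList and its methods are hypothetical; the NodeList is fetched once and reused instead of calling getChildNodes() on every round.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ChildList {
    // Builds <address><streetaddress>…</streetaddress><postoffice>…</postoffice></address>.
    static Element address() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element address = doc.createElement("address");
        Element street = doc.createElement("streetaddress");
        street.appendChild(doc.createTextNode("Pyynpolku 1"));
        Element office = doc.createElement("postoffice");
        office.appendChild(doc.createTextNode("70460 KUOPIO"));
        address.appendChild(street);
        address.appendChild(office);
        return address;
    }

    // Collects the names of all children of node, in document order.
    static List<String> childNames(Node node) {
        NodeList kids = node.getChildNodes();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < kids.getLength(); i++) {
            names.add(kids.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        Element address = address();
        System.out.println(childNames(address));                           // [streetaddress, postoffice]
        System.out.println(address.getElementsByTagName("*").getLength()); // 2
    }
}
```

Called on an Element, getElementsByTagName("*") covers the descendants only, so the address element itself is not counted.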
SDPL 2006 3.2: Document Object Model 17
Additional Core Interfaces (2)
NamedNodeMap for unordered sets of nodes accessed by their name:
– e.g. from Node.getAttributes()
NodeLists and NamedNodeMaps are "live":
– updates of the document structure are reflected to their contents
– e.g., this would delete every other child of node n:
    NodeList cList = n.getChildNodes();
    for (i=0; i<cList.getLength(); i++)
      n.removeChild(cList.item(i));
» That’s strange! (What happens?)
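The effect can be reproduced in a few lines. In this sketch (the class LiveList and its methods are made up) the slide's loop is run on a parent with five children, followed by one possible way to really remove all children.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class LiveList {
    // Builds an element <p> with one empty child element per given name.
    static Element parentWith(String... names) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element p = doc.createElement("p");
        for (String name : names) {
            p.appendChild(doc.createElement(name));
        }
        return p;
    }

    // The loop from the slide: each removal shifts the live list to the left,
    // so item(i) skips one node on every round.
    static void removeEveryOther(Node n) {
        NodeList cList = n.getChildNodes();
        for (int i = 0; i < cList.getLength(); i++) {
            n.removeChild(cList.item(i));
        }
    }

    // One way to remove all children: always take the current first child.
    static void removeAll(Node n) {
        while (n.hasChildNodes()) {
            n.removeChild(n.getFirstChild());
        }
    }

    // Space-separated names of the remaining children.
    static String names(Node n) {
        StringBuilder sb = new StringBuilder();
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            sb.append(c.getNodeName()).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Element p = parentWith("a", "b", "c", "d", "e");
        removeEveryOther(p);
        System.out.println(names(p));         // b d
        removeAll(p);
        System.out.println(p.hasChildNodes()); // false
    }
}
```

Iterating backwards from the last index is another common way to delete safely from a live list.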
SDPL 2006 3.2: Document Object Model 18
DOM: XML Implementations
Java-based parsers, e.g. Apache Xerces, Apache Crimson, …
In MS IE browser: COM programming interfaces for C/C++ and Visual Basic; ActiveX object programming interfaces for script languages
Perl: XML::DOM (Implements DOM Level 1)
Others? APIs for other applications than parsers?
– Vendors of different kinds of systems have participated in the W3C DOM WG
SDPL 2006 3.2: Document Object Model 19
A Java-DOM Example
Command-line tool RegListMgr for maintaining a course registration list
– with single-letter commands for (l)isting, (a)dding, (u)pdating and (d)eleting student records
Example:
  $ java RegListMgr reglist.xml
  Document loaded successfully
  > l                                (list the contents)
  …
  40: Tero Ulvinen, TKM1, [email protected], 2
  41: heli viinikainen, tkt5, [email protected], 1
SDPL 2006 3.2: Document Object Model 20
Registration list: the XML file
  <?xml version="1.0" encoding="ISO-8859-1"?>
  <!DOCTYPE reglist SYSTEM "reglist.dtd">
  <reglist lastID="41">
    <student id="RDK1">
      <name><given>Juho</given>
        <family>Ahopelto</family></name>
      <branchAndYear>TKT4</branchAndYear>
      <email>[email protected]</email>
      <group>2</group>
    </student>
    <!-- … and the other students … -->
  </reglist>
SDPL 2006 3.2: Document Object Model 21
Registration List: the DTD
  <!ELEMENT reglist (student*)>
  <!ATTLIST reglist lastID CDATA #REQUIRED >
  <!ELEMENT student (name, branchAndYear, email, group)>
  <!ATTLIST student id ID #REQUIRED >
  <!ELEMENT name (given, family)>
  <!ELEMENT given (#PCDATA)>
  <!-- … and the same for family, branchAndYear, email, and group -->
SDPL 2006 3.2: Document Object Model 22
Loading and Saving the RegList
Loading of the registration list into DOM Document doc implemented with a JAXP DocumentBuilder
– assume this has been done: doc is a handle to the Document
Saving implemented with a JAXP Transformer
… to be discussed later
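For orientation, loading and saving with JAXP could look roughly like the sketch below. This is not the RegListMgr code: the class LoadSave and its methods are assumptions, and strings are used instead of files so that the example runs on its own.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LoadSave {
    // Loading: a DocumentBuilder parses the input into a DOM Document.
    static Document load(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("loading failed", e);
        }
    }

    // Saving: an identity Transformer copies the DOM tree to a character stream.
    static String save(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("saving failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = load("<reglist lastID=\"41\"/>");
        doc.getDocumentElement().setAttribute("lastID", "42");
        System.out.println(save(doc));
    }
}
```

For files, builder.parse(new File(…)) and new StreamResult(new File(…)) take the place of the string-based source and result.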
SDPL 2006 3.2: Document Object Model 23
Listing student records (1)
  NodeList students = doc.getElementsByTagName("student");
  for (int i=0; i<students.getLength(); i++)
    showStudent((Element) students.item(i));

  private void showStudent(Element student) {
    // Collect relevant sub-elements:
    // (getNextSibling assumes that there are no whitespace-only
    // text nodes between the sub-elements)
    Node given = student.getElementsByTagName("given").item(0);
    Node family = given.getNextSibling();
    Node bAndY = student.getElementsByTagName("branchAndYear").item(0);
    Node email = bAndY.getNextSibling();
    Node group = email.getNextSibling();
SDPL 2006 3.2: Document Object Model 24
Listing student records (2)
    // Method showStudent continues:
    System.out.print(student.getAttribute("id").substring(3));
    System.out.print(": " + given.getFirstChild().getNodeValue());
    // .. similarly access and display the
    // value of family, bAndY, email, and group
    // …
  } // showStudent
SDPL 2006 3.2: Document Object Model 25
Adding New Records
Example:
  > a                                (add students)
  First name (or <return> to finish): Antti
  Last name: Ahkera
  Branch&year: tkt3
  email: [email protected]
  group: 2
  First name (or <return> to finish):
  Finished adding records
  > l
  …
  41: heli viinikainen, tkt5, [email protected], 1
  42: Antti Ahkera, tkt3, [email protected], 2
SDPL 2006 3.2: Document Object Model 26
Implementing addition of records (1)
  Element rootElem = doc.getDocumentElement();
  String lastID = rootElem.getAttribute("lastID");
  int lastIDnum = java.lang.Integer.parseInt(lastID);
  System.out.print("First name (or <return> to finish): ");
  String firstName = terminalReader.readLine().trim();
  while (firstName.length() > 0) {
    // Get the next unused ID:
    String ID = "RDK" + java.lang.Integer.toString(++lastIDnum);
    // … Read values lastName, bAndY, email,
    // and group from the terminal, and then ...
SDPL 2006 3.2: Document Object Model 27
Implementing addition of records (2)
    Element newStudent = newStudent(doc, ID, firstName, lastName, bAndY, email, group);
    rootElem.appendChild(newStudent);
    System.out.print("First name (or <return> to finish): ");
    firstName = terminalReader.readLine().trim();
  } // while firstName.length() > 0
  // Update the last ID used:
  String newLastID = java.lang.Integer.toString(lastIDnum);
  rootElem.setAttribute("lastID", newLastID);
  System.out.println("Finished adding records");
SDPL 2006 3.2: Document Object Model 28
Creating new student records (1)
  private Element newStudent(Document doc, String ID,
      String fName, String lName, String bAndY,
      String email, String grp) {
    Element stu = doc.createElement("student");
    stu.setAttribute("id", ID);
    Element newName = doc.createElement("name");
    Element newGiven = doc.createElement("given");
    newGiven.appendChild(doc.createTextNode(fName));
    Element newFamily = doc.createElement("family");
    newFamily.appendChild(doc.createTextNode(lName));
    newName.appendChild(newGiven);
    newName.appendChild(newFamily);
    stu.appendChild(newName);
SDPL 2006 3.2: Document Object Model 29
Creating new student records (2)
    // method newStudent(…) continues:
    Element newBr = doc.createElement("branchAndYear");
    newBr.appendChild(doc.createTextNode(bAndY));
    stu.appendChild(newBr);
    Element newEmail = doc.createElement("email");
    newEmail.appendChild(doc.createTextNode(email));
    stu.appendChild(newEmail);
    Element newGrp = doc.createElement("group");
    // the parameter is named grp, not group
    newGrp.appendChild(doc.createTextNode(grp));
    stu.appendChild(newGrp);
    return stu;
  } // newStudent
SDPL 2006 3.2: Document Object Model 30
Updates and Deletions
Updates and deletions implemented similarly, by manipulating the DOM structures
To be treated in the exercises
SDPL 2006 3.2: Document Object Model 31
Summary of XML APIs so far
Give applications access to the structure and contents of XML documents
Event-based APIs (e.g. SAX)
– notify application through parsing events
– efficient
Object-model (or tree) based APIs (e.g. DOM)
– provide a full parse tree
– more convenient, but require a lot of resources with large documents
Major parsers support both SAX and DOM
– used through proprietary methods
– used through JAXP (next)