xml. dcs – swc 2 data vs. information we often use the terms data and information interchangeably...


Upload: lilian-willis

Post on 17-Jan-2016


Page 1: XML. DCS – SWC 2 Data vs. Information We often use the terms data and information interchangeably More precisely, data is some ”value” of a certain type,

XML


DCS – SWC 2

Data vs. Information

• We often use the terms data and information interchangeably

• More precisely, data is some ”value” of a certain type, like
  – 33
  – ”High Street 7”
  – false

• Data comes without a context


Data vs. Information

• When we provide a context for the data, the data (+ the context) becomes information, like:
  – The age of Alan Wake is 33 years
  – John Peterson lives at High Street 7
  – Is Petra Wilson married? false

• Data + Context = Information


Data vs. Information

• We could also denote the context as ”data about the data”

• This is often referred to as meta-data

• Information is thus composed of:
  – Data
  – Meta-data


Data vs. Information

• This is – more or less – how we structure our communication with each other
  – The age of Alan Wake is 33 years
  – John Peterson lives at High Street 7
  – Is Petra Wilson married? false

• Meta-data and data

• One part is not very useful without the other part…


Data vs. Information

• Of course, we are often somewhat ”implicit” when we communicate:
  – He is 22 years old (who…?)
  – My dog is named Kaya (what kind of dog…?)
  – John is ill (who is John, what illness…?)

• We sometimes assume part of the context implicitly, otherwise it would be very tedious to communicate…


Transmitting information

• When computers transmit information, they can also be more or less implicit

• A method call is a kind of data transmission, which is highly implicit:

CalculateFactorial(int n)

• n is ”the number for which we want to calculate the factorial”
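As a small illustration of how implicit a method call is, the sketch below passes only the bare value 5; what the value means is written down nowhere except in a comment. The class name `FactorialDemo` and the use of `long` for the result are assumptions made for this example.

```java
// Illustrative sketch; class name and result type are assumptions.
final class FactorialDemo {

    /**
     * @param n the number for which we want to calculate the factorial
     *          (this comment is the only place the context is stated)
     */
    static long calculateFactorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must not be negative");
        }
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }

    public static void main(String[] args) {
        // The call transmits the data 5; the meta-data stays implicit.
        System.out.println(calculateFactorial(5));
    }
}
```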


Transmitting information

• Suppose a program needs to receive information about some product

• A product has
  – A name
  – A price
  – A weight

• How can we transmit this information to the program?


Transmitting information

• Perhaps just put the data into a file:

• ”Milk 4.95 1000”

• The meaning being:
  – The name of the product
  – The price of the product (in kroner)
  – The weight of the product (in grams)
  – Each element separated by a single space


Transmitting information

• The program can then just read the file, and ”decode” the data

• However, this assumes that sender and receiver of the data have agreed about how to interpret the file content!
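A minimal sketch of what such a receiving program might look like is given here. The class `FlatRecordDemo`, its nested `Product` type and the method `decode` are hypothetical names; the point is only that the field order, the separator and the units exist solely in the receiver's code.

```java
// Hypothetical receiver for the record "Milk 4.95 1000".
// Field order, separator and units are assumptions hard-coded here.
final class FlatRecordDemo {

    // Illustrative value type; not part of the original slides.
    static final class Product {
        final String name;
        final double priceKroner;
        final int weightGrams;

        Product(String name, double priceKroner, int weightGrams) {
            this.name = name;
            this.priceKroner = priceKroner;
            this.weightGrams = weightGrams;
        }
    }

    // Decodes one line; rejects anything that is not exactly three fields.
    static Product decode(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        return new Product(parts[0],
                Double.parseDouble(parts[1]),
                Integer.parseInt(parts[2]));
    }

    public static void main(String[] args) {
        Product p = decode("Milk 4.95 1000");
        System.out.println(p.name + ": " + p.priceKroner + " kroner, "
                + p.weightGrams + " g");
    }
}
```

Note that a product named "Orange Juice" would already break this decoder, because the name itself contains the separator.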


Transmitting information

• Advantages
  – A compact format, no space wasted
  – Fast to process

• Disadvantages
  – Static, hard to change
  – Receiver and sender tied to each other
  – What about other recipients?
  – Not self-describing: a human reader cannot tell what each value means


Transmitting information

• Main problem: Meta-data is ”encoded” in the receiving program

• Probably better to make meta-data explicit, to overcome disadvantages

• Use a ”markup language” to include meta-data in the transmission


Markup languages

• In a markup language, we can ”mark” data in a way which conveys the context

• We mark the data with meta-data

• An example of a markup language is HTML (HyperText Markup Language):

This is <b>very</b> good


Markup languages

• The markers <b> and </b> are tags, indicating that some meta-data should be applied to the data between the tags – here: write it in bold

• In HTML, tags are used for formatting and structure of ”documents”, not for defining structure of data as such

• Enter XML!


What is XML…?

• eXtensible

• Markup

• Language


XML

• XML can loosely be seen as a generalisation of the idea behind HTML – we define our own tags, so tags can be used for everything!

• All kinds of meta-data can be included as tags in XML

• Important! XML does not define anything about presentation of data


XML

• A product defined in XML:

<product>
  <name>Milk</name>
  <price>4.95</price>
  <weight>1000</weight>
</product>

• <product> starts the product description

• </product> ends the product description


XML

• XML is highly recursive

• Inside a definition, we can have a number of ”child” definitions

• At some point, a definition contains only data, like ”Milk”

• A definition can also have attributes associated with it


XML

<product>
  <name>Milk</name>
  <price currency="DKK">4.95</price>
  <weight unit="gram">1000</weight>
</product>


XML

• When to use attributes vs a child element

• An attribute should not be data in itself; it should be information about some data element

• Not a strict rule…

• When in doubt, use child elements


XML

<product name="Milk" price="4.95" weight="1000"/>

• Tempting, but not in the spirit of XML…

• Harder for the recipient to process


XML

• The general structure of an XML document is then
  – An XML declaration:
      <?xml version="1.0"?>
  – A root element containing the data:
      <products>
      </products>
  – Inside the root element: all the child elements


XML

<?xml version="1.0"?>
<products>
  <product>
    <name>Milk</name>
    <price currency="DKK">4.95</price>
    <weight unit="gram">1000</weight>
  </product>
  <product>
    <name>Orange Juice</name>
    <price currency="DKK">8.95</price>
    <weight unit="gram">500</weight>
  </product>
  ...
</products>


Processing XML documents

• How do we process an XML document, in order to retrieve data from it?

• We apply an XML parser to the document

• The XML parser transforms the XML document into a tree structure

• The tree structure follows the Document Object Model (DOM)


Processing XML documents

[Figure: the DOM tree for the products document]

products
  product
    name: Milk
    price: 4.95
    weight: 1000
  product (children not shown)


Processing XML documents

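import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = fac.newDocumentBuilder();
String fileName = ...;
File xmlFile = new File(fileName);
// Parse the file; afterwards doc contains the DOM tree
Document doc = builder.parse(xmlFile);
...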
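For experimenting without a file on disk, the same parsing step can be sketched against an in-memory string. The class `DomParseDemo` and its helper methods are hypothetical; the sample document is the two-product example used earlier.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative sketch; class and method names are assumptions.
final class DomParseDemo {

    static final String XML =
          "<?xml version=\"1.0\"?>"
        + "<products>"
        + "<product><name>Milk</name><price currency=\"DKK\">4.95</price>"
        + "<weight unit=\"gram\">1000</weight></product>"
        + "<product><name>Orange Juice</name><price currency=\"DKK\">8.95</price>"
        + "<weight unit=\"gram\">500</weight></product>"
        + "</products>";

    // Parses XML text into a DOM tree.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Walks root -> first product -> name and returns the text inside it.
    static String firstProductName(Document doc) {
        Element root = doc.getDocumentElement();
        NodeList products = root.getElementsByTagName("product");
        Element first = (Element) products.item(0);
        return first.getElementsByTagName("name").item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(firstProductName(doc));
    }
}
```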


Processing XML documents

• Given a tree following the DOM standard, we can address various elements in the tree, using the XPath syntax
  – An XPath expression describes a single node in the tree, or a set of nodes
  – Syntax similar to directory paths


Processing XML documents

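[Figure: the DOM tree for the products document, with one node addressed by an XPath expression]

products
  product
    name: Milk
    price: 4.95
    weight: 1000
  product (children not shown)

• /products/product[1]/weight addresses the weight element of the first product (XPath positions start at 1)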


Processing XML documents

• Other XPath constructions:
  – count(/products/product) – get the number of product instances
  – /products/product[1]/weight/@unit – get the value of the attribute unit
  – name(/products/product[1]/*[1]) – get the name of the first child of the first product


Processing XML documents

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

XPathFactory xpfac = XPathFactory.newInstance();
XPath path = xpfac.newXPath();
...
String result = path.evaluate("/products/product[1]/price", doc);
// Now result contains the price of the first product
...
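The other XPath constructions can be tried out in the same way. The following is a self-contained sketch, assuming the two-product sample document from earlier; the class `XPathDemo` and its `evaluate` helper are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative sketch; class and method names are assumptions.
final class XPathDemo {

    static final String XML =
          "<products>"
        + "<product><name>Milk</name><price currency=\"DKK\">4.95</price>"
        + "<weight unit=\"gram\">1000</weight></product>"
        + "<product><name>Orange Juice</name><price currency=\"DKK\">8.95</price>"
        + "<weight unit=\"gram\">500</weight></product>"
        + "</products>";

    // Parses the sample document and evaluates one XPath expression as a string.
    static String evaluate(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath path = XPathFactory.newInstance().newXPath();
            return path.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("count(/products/product)"));
        System.out.println(evaluate("/products/product[1]/weight/@unit"));
        System.out.println(evaluate("name(/products/product[1]/*[1])"));
    }
}
```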


Processing XML documents

• In general, we will convert an XML document into a number of Java objects

• We map XML data to Java classes

• It is up to us to define proper classes to store the data – XML does not know about classes; its data corresponds to ”objects”

• Each element in an XML document is like an instance field, not a class…
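One possible way to do this mapping is sketched below: each product element becomes one object, and each child element fills one instance field. The `Product` class, its field names and the `readProducts` method are assumptions made for illustration, not something the XML format prescribes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative sketch; all names here are assumptions.
final class ProductMapper {

    static final class Product {
        final String name;
        final double price;
        final int weight;

        Product(String name, double price, int weight) {
            this.name = name;
            this.price = price;
            this.weight = weight;
        }
    }

    // Converts every <product> element into a Product object.
    static List<Product> readProducts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("product");
            List<Product> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                result.add(new Product(
                        text(e, "name"),
                        Double.parseDouble(text(e, "price")),
                        Integer.parseInt(text(e, "weight"))));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read products", e);
        }
    }

    // Returns the text of the first child element with the given tag name.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<products><product><name>Milk</name>"
                + "<price>4.95</price><weight>1000</weight></product></products>";
        for (Product p : readProducts(xml)) {
            System.out.println(p.name + " " + p.price + " " + p.weight);
        }
    }
}
```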


Creating XML documents

• In addition to processing given XML documents, we often wish to programmatically produce XML documents

• For this purpose, we again use the DocumentBuilder classes


Creating XML documents

DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = fac.newDocumentBuilder();
Document doc = builder.newDocument();
// Now doc contains an empty DOM tree
...


Creating XML documents

• We must now insert nodes into the tree, corresponding to the structure of the data

• Fundamental methods are:
  – createElement(String name);
  – setAttribute(String name, String value);
  – createTextNode(String text);
  – appendChild(Node child);


Creating XML documents

createElement(String name);

• Creates a new, empty element with the given name

• Is called on the document object

• On a new element, we will
  – Set the values of attributes
  – Add child elements, or
  – Add text nodes


Creating XML documents

appendChild(Node child);

• Is itself called on an element

• Appends the given node (an element or a text node) as the last child of the element it is called on

• This is how we create the structure for the tree!
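The four fundamental methods can be seen working together in this small sketch, which builds a product element holding a single price child with a currency attribute. The class `BuildElementDemo` and the method `buildProduct` are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Illustrative sketch; class and method names are assumptions.
final class BuildElementDemo {

    // Builds <product><price currency="DKK">4.95</price></product>.
    static Element buildProduct() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();

            // createElement is called on the document...
            Element product = doc.createElement("product");
            Element price = doc.createElement("price");

            // ...while setAttribute and appendChild are called on an element.
            price.setAttribute("currency", "DKK");
            price.appendChild(doc.createTextNode("4.95"));
            product.appendChild(price);

            // The root element is appended to the document itself.
            doc.appendChild(product);
            return product;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Element product = buildProduct();
        Element price = (Element) product.getFirstChild();
        System.out.println(price.getTagName() + " in "
                + price.getAttribute("currency") + ": " + price.getTextContent());
    }
}
```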


Creating XML documents

• The previous methods are enough to create a DOM tree

• Usually, we combine the methods into ”helper methods”, designed to insert a certain type of element

• Helper methods will often call other helper methods, depending on tree structure


Creating XML documents

private Element createTextElement(String name, String text)
{
    Text t = doc.createTextNode(text);
    Element e = doc.createElement(name);
    e.appendChild(t);
    return e;
}


Creating XML documents

private Element createProduct(Product p)
{
    Element e = doc.createElement("product");
    e.appendChild(createTextElement("name", p.getName()));
    // String.valueOf also works if the getters return numbers rather than strings
    e.appendChild(createTextElement("price", String.valueOf(p.getPrice())));
    e.appendChild(createTextElement("weight", String.valueOf(p.getWeight())));
    return e;
}


Creating XML documents

private Element createProducts(ArrayList<Product> pList)
{
    Element e = doc.createElement("products");
    for (Product p : pList)
    {
        e.appendChild(createProduct(p));
    }
    return e;
}


Creating XML documents

DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = fac.newDocumentBuilder();
Document doc = builder.newDocument();
// Now doc contains an empty DOM tree
ArrayList<Product> pList = ...;
Element root = createProducts(pList);
doc.appendChild(root);


Creating XML documents

• Final step – convert the completed DOM tree to a string (which could then be displayed on screen or written to a file)

• Requires a bit of ”black magic”…



Creating XML documents

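import org.w3c.dom.DOMImplementation;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

DOMImplementation impl = doc.getImplementation();
DOMImplementationLS implLS = (DOMImplementationLS) impl.getFeature("LS", "3.0");
LSSerializer ser = implLS.createLSSerializer();
// Ask for indented, human-readable output
ser.getDomConfig().setParameter("format-pretty-print", true);
String str = ser.writeToString(doc);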
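As an alternative to the LSSerializer approach, a DOM tree can also be turned into a string with an identity transformation from `javax.xml.transform`. This is a different technique from the one on the slide, sketched here under the assumption that a standard JDK is used; `SerializeDemo` and `serializeSample` are hypothetical names.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Illustrative sketch; class and method names are assumptions.
final class SerializeDemo {

    // Builds a tiny products tree and serializes it to a string.
    static String serializeSample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element products = doc.createElement("products");
            Element product = doc.createElement("product");
            Element name = doc.createElement("name");
            name.appendChild(doc.createTextNode("Milk"));
            product.appendChild(name);
            products.appendChild(product);
            doc.appendChild(products);

            // A Transformer created without a stylesheet copies input to output.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializeSample());
    }
}
```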


Validating XML documents

• It will often be convenient to know if an XML document obeys certain rules about its content

• This can, for example, make processing easier, since the processing code needs less error handling

• Specification of such rules can be done in various ways


Validating XML documents

• Original way – use a DTD

• DTD – Document Type Definition

• A DTD is a sequence of rules describing
  – The valid attributes for each element type
  – The valid child elements for each element type


Validating XML documents

• Examples of DTD rules:
  – <!ELEMENT products (product*)> – a products element must contain zero or more elements of type product
  – <!ELEMENT product (name, price, weight)> – a product element must have the children: one name, one price, one weight, in that order
  – <!ELEMENT name (#PCDATA)> – a name element contains only text (parsed character data), no child elements


Validating XML documents

• In order to validate an XML document against a DTD, the DTD must be specified
  – Can be included in the XML document
  – Can be referenced

• NOTE: Validation is optional, it is up to us to do it…
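A minimal sketch of switching validation on in Java is shown here, assuming the DTD is included in the document itself. The class `DtdValidationDemo` and the method `isValid` are hypothetical names; the DTD rules for price and weight are added by analogy with the rule for name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Illustrative sketch; class and method names are assumptions.
final class DtdValidationDemo {

    // DTD embedded in the document (internal subset).
    static final String DTD =
          "<!DOCTYPE products ["
        + "<!ELEMENT products (product*)>"
        + "<!ELEMENT product (name, price, weight)>"
        + "<!ELEMENT name (#PCDATA)>"
        + "<!ELEMENT price (#PCDATA)>"
        + "<!ELEMENT weight (#PCDATA)>"
        + "]>";

    // Returns true if the given root element text obeys the DTD above.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
            fac.setValidating(true); // validation is off unless we ask for it
            DocumentBuilder builder = fac.newDocumentBuilder();
            // Without an error handler, validity errors would only be reported,
            // not thrown; this handler turns them into exceptions.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(
                    "<?xml version=\"1.0\"?>" + DTD + body)));
            return true;
        } catch (SAXException e) {
            return false; // not valid (or not well-formed)
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser could not be set up", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<products><product><name>Milk</name>"
                + "<price>4.95</price><weight>1000</weight></product></products>"));
    }
}
```

Swapping the price and weight children makes `isValid` return false, because the DTD fixes their order.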


Validating XML documents

• A more modern way of validating XML documents is by using an XSD

• XSD – XML Schema Definition

• Provides a more general framework for specification of the document format

• Is itself written in XML

• Comes closer to actual class definitions


Validating XML documents

<xsd:complexType name="product">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="price" type="xsd:float"/>
    <xsd:element name="weight" type="xsd:integer"/>
  </xsd:sequence>
</xsd:complexType>
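To actually check a document against such a type, the type has to sit inside a complete schema that also declares an element using it. The sketch below wraps the product type in a hypothetical minimal schema and validates a document with the standard `javax.xml.validation` API; `XsdValidationDemo`, `isValid` and the surrounding schema are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Illustrative sketch; class name, method name and schema wrapper are assumptions.
final class XsdValidationDemo {

    static final String XSD =
          "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xsd:element name=\"product\" type=\"product\"/>"
        + "<xsd:complexType name=\"product\">"
        + "<xsd:sequence>"
        + "<xsd:element name=\"name\" type=\"xsd:string\"/>"
        + "<xsd:element name=\"price\" type=\"xsd:float\"/>"
        + "<xsd:element name=\"weight\" type=\"xsd:integer\"/>"
        + "</xsd:sequence>"
        + "</xsd:complexType>"
        + "</xsd:schema>";

    // Returns true if the XML text is valid according to the schema above.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself is broken", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document violates the schema
        } catch (IOException e) {
            throw new IllegalStateException("Could not read the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<product><name>Milk</name>"
                + "<price>4.95</price><weight>1000</weight></product>"));
        System.out.println(isValid("<product><name>Milk</name>"
                + "<price>4.95</price><weight>heavy</weight></product>"));
    }
}
```

Unlike the DTD rules, the schema also checks the data types, so a weight of "heavy" is rejected.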


Transforming XML documents

• A common task is to transform data given in XML format to ”something else”…

• Reading/writing XML in Java transforms the data to an in-memory object model

• This is a ”programmatic” transformation; we can also imagine more static or declarative transformations


Transforming XML documents

• Such a transformation can be specified in XSLT (XSL Transformations)

• Specifies a transformation from the XML document into… anything!
  – A Word document
  – An HTML page
  – Java code (!)
  – …?
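As a small sketch of the HTML case, the following applies a hypothetical stylesheet to the products document and produces an HTML list. The stylesheet, the class `XsltDemo` and the method `toHtml` are all assumptions made for illustration; only the `javax.xml.transform` calls are standard API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative sketch; class name, method name and stylesheet are assumptions.
final class XsltDemo {

    // Declarative description of the transformation: products -> HTML list.
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" indent=\"no\"/>"
        + "<xsl:template match=\"/products\">"
        + "<ul><xsl:for-each select=\"product\">"
        + "<li><xsl:value-of select=\"name\"/>: <xsl:value-of select=\"price\"/></li>"
        + "</xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet above to the given XML text.
    static String toHtml(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(
              "<products>"
            + "<product><name>Milk</name><price>4.95</price></product>"
            + "<product><name>Orange Juice</name><price>8.95</price></product>"
            + "</products>"));
    }
}
```

The Java code only runs the transformation; what the output looks like is decided entirely by the stylesheet.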


Transforming XML documents

• Example: A complex electronic device is described in XML

• We wish to create a software model of the device, with classes, interfaces, etc., to enable software simulation of the device

• The transformation from XML to Java code could be done by an XSLT stylesheet

• Input: XML, Output: Java code…
