Chapter 27

The World Wide Web and XML

Copyright © 2004 Pearson Addison-Wesley. All rights reserved. 27-2

Topics in this Chapter

• The Web and the Internet

• An Overview of XML

• XML Data Definition

• XML Data Manipulation

• XML and Databases

• SQL Facilities

The Web and the Internet

• Though often thought of as synonymous, the Web and the Internet are two different things

• The Web is a gigantic, amorphous database

• The Internet is a giant network

• URLs (Uniform Resource Locators, a kind of Uniform Resource Identifier) are used to locate resources on the network

• Markup languages are used to interact with the database

Hypertext

• Hypertext Markup Language (HTML) is a simple language for creating and displaying documents

• Hypertext Transfer Protocol (HTTP) is used to transfer these documents over the Internet

• At each server data can be served up from system files, or from databases

• The databases on web servers can be SQL databases

XML

• XML provides extensions that permit the markup language to interact with hypertext as well as many other languages, including SQL, and so is useful when implementing web databases

• An XML document normally begins with a header called the XML declaration, followed by an element consisting of a start tag, character data, and an end tag

XML

• An XML declaration followed by an XML element (start tag, character data, end tag):

<?xml version="1.0"?>
<greeting kind="succinct">Hello, World.</greeting>

• The tag is greeting; kind="succinct" is an XML attribute whose name is "kind" and whose value is "succinct"
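The parts of the greeting document can be inspected with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the function name `describe` is hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# The greeting document from the slide, with the declaration closed by "?>".
DOCUMENT = '<?xml version="1.0"?><greeting kind="succinct">Hello, World.</greeting>'


def describe(xml_text):
    """Return the root element's tag name, its attributes, and its character data."""
    root = ET.fromstring(xml_text)
    return root.tag, dict(root.attrib), root.text


if __name__ == "__main__":
    tag, attributes, text = describe(DOCUMENT)
    print(tag)         # the tag name
    print(attributes)  # attribute names mapped to values
    print(text)        # the character data between the tags
```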

XML History

• XML was created in 1996 to overcome limitations in SGML and HTML

• SGML is large and complicated

• HTML fails to separate structural, semantic, and formatting metadata, and is not always “well-formed”

• XML has not supplanted HTML in web browsers, but is used in other areas, especially data interchange

XML History

• SGML is large and complicated. It allows users to define their own tags and give them meaning

<Paragraph>
  <Sentence>
    <Subject>You</Subject>
    <verb>should specify</verb>
    <Object>the <adjective><emph1>first</emph1></adjective> parameter</Object>
  </Sentence>
  <Sentence>……..</Sentence>
</Paragraph>

XML Applications

• Purchase orders, parts catalogues, and inventory records can be expressed in XML

• A database could consist of XML documents only, but it would NOT be relational

• XML can be used to represent relations, which could facilitate interchange between the Internet and relational databases

XML Applications

<?xml version="1.0"?>
<!-- This is an XML representation of the parts relation -->
<!-- Parts attributes: PNUM, PNAME, COLOR, WEIGHT, CITY -->
<Partsrelation>
  <Partstuple>
    <PNUM>P1</PNUM>
    <PNAME>NUT</PNAME>
    <COLOR>RED</COLOR>
    <WEIGHT>12</WEIGHT>
    <CITY>LONDON</CITY>
  </Partstuple>
  <Partstuple>
    <PNUM>P2</PNUM>
    <PNAME>left-wing-part-10th-part-Bolt</PNAME>
    <COLOR>Green</COLOR>
    <WEIGHT>17</WEIGHT>
    <CITY>Paris</CITY>
  </Partstuple>
</Partsrelation>
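To illustrate how such a document can be turned back into tuples on the receiving side, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The function name `xml_to_tuples` is hypothetical, and every value is kept as a character string, since the document itself carries no type information.

```python
import xml.etree.ElementTree as ET

# The parts relation from the slide, with the markup errors repaired.
PARTS_XML = """<?xml version="1.0"?>
<Partsrelation>
  <Partstuple>
    <PNUM>P1</PNUM><PNAME>NUT</PNAME><COLOR>RED</COLOR>
    <WEIGHT>12</WEIGHT><CITY>LONDON</CITY>
  </Partstuple>
  <Partstuple>
    <PNUM>P2</PNUM><PNAME>left-wing-part-10th-part-Bolt</PNAME><COLOR>Green</COLOR>
    <WEIGHT>17</WEIGHT><CITY>Paris</CITY>
  </Partstuple>
</Partsrelation>
"""


def xml_to_tuples(xml_text):
    """Return one dict per Partstuple element, mapping attribute name to value."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in part}
        for part in root.findall("Partstuple")
    ]


if __name__ == "__main__":
    for row in xml_to_tuples(PARTS_XML):
        print(row)
```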

XML Applications

• An XML information set is a document hierarchy

XML Hierarchy

• The root node is at the top, and it has children

• Each child has one parent

• Relations are structured; XML documents are said to be semi-structured, because their rules are looser

• An API to XML’s Document Object Model (DOM) supports retrieval, insertion, deletion, and update (p. 901)
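The four kinds of DOM operation can be sketched with Python's standard `xml.dom.minidom`, which is one DOM implementation among many. The sample document and the function name `demo_dom_operations` are assumptions made for illustration.

```python
from xml.dom.minidom import parseString


def demo_dom_operations():
    """Apply retrieval, update, insertion, and deletion to a small document tree."""
    doc = parseString(
        "<Partsrelation><Partstuple>"
        "<PNUM>P1</PNUM><WEIGHT>12</WEIGHT>"
        "</Partstuple></Partsrelation>"
    )
    root = doc.documentElement
    part = root.getElementsByTagName("Partstuple")[0]

    # Retrieval: read the character data of the WEIGHT element.
    weight = part.getElementsByTagName("WEIGHT")[0]
    retrieved = weight.firstChild.data

    # Update: change the character data in place.
    weight.firstChild.data = "14"

    # Insertion: add a new CITY child to the tuple.
    city = doc.createElement("CITY")
    city.appendChild(doc.createTextNode("London"))
    part.appendChild(city)

    # Deletion: remove the PNUM child from its parent.
    part.removeChild(part.getElementsByTagName("PNUM")[0])

    return retrieved, root.toxml()


if __name__ == "__main__":
    print(demo_dom_operations())
```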

DTDs

• Document Type Definitions (DTDs) are written in the DTD definition language

• DTDs are part of the XML standard

• A DTD can mirror the structure of a relation and then be used to format the output from queries

• In turn, the XML document produced can be used to generate a relation at the other end

• Text objects must be well-formed and valid

XML Applications

<?xml version="1.0"?>
<!-- This is an XML representation of the parts relation -->
<!DOCTYPE . . . .>
<Partsrelation>
  <Note>Revised Version</Note>
  <Partstuple CITY="LONDON">
    <PNUM>P1</PNUM>
    <PNAME>NUT</PNAME>
    <WEIGHT>12</WEIGHT>
    <Note>Part COLOR is Red by Default</Note>
  </Partstuple>
  <Partstuple COLOR="Green" CITY="Paris">
    <PNUM>P2</PNUM>
    <PNAME>left-wing-part-10th-part-Bolt</PNAME>
    <WEIGHT>17</WEIGHT>
  </Partstuple>
</Partsrelation>

XML Applications

<?xml version="1.0"?>
<!-- This is an XML representation of the parts relation -->
<!-- Marker meanings: ? zero or one, * zero or more, + one or more -->
<!DOCTYPE . . . .>
<Partsrelation>
  <Note>Revised Version</Note>
  <Partstuple CITY="LONDON">
    <PNUM>P1</PNUM>
    <PNAME>NUT</PNAME>
    <WEIGHT>12</WEIGHT>
    <Note>Part COLOR is Red by Default</Note>
  </Partstuple>
  <Partstuple COLOR="Green" CITY="Paris">
    <PNUM>P2</PNUM>
    <PNAME>left-wing-part-10th-part-Bolt</PNAME>
    <WEIGHT>17</WEIGHT>
  </Partstuple>
</Partsrelation>

• DTD declarations for this document:

1. <!ELEMENT Partsrelation (Note, Partstuple*)>
2. <!ELEMENT Note (#PCDATA)>
3. <!ELEMENT Partstuple (PNUM, PNAME, WEIGHT, Note?)>
4. <!ATTLIST Partstuple CITY (LONDON|Oslo|Paris) #REQUIRED COLOR (Red|Green|Blue) "Red">
5. <!ELEMENT PNUM (#PCDATA)>
6. <!ELEMENT PNAME (#PCDATA)>
7. <!ELEMENT WEIGHT (#PCDATA)>

Well-Formedness

• A textual object is well-formed if and only if:

  • It conforms to the grammar defined in the XML standard

  • Any textual object it references is well-formed

• Examples of fatal flaws:

  • Start and end tags don’t match, or are missing

  • More than one root element included
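The fatal flaws listed above are exactly what a parser rejects. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` as the parser: the helper name `is_well_formed` is hypothetical, and the check covers only the text passed in, not any external objects the document references.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it reports a syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    samples = {
        "matching tags": "<greeting>Hello, World.</greeting>",
        "mismatched tags": "<greeting>Hello, World.</farewell>",
        "missing end tag": "<greeting>Hello, World.",
        "two root elements": "<greeting/><greeting/>",
    }
    for label, text in samples.items():
        print(label, is_well_formed(text))
```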

Validity

• A textual object is valid if and only if it is well-formed and it conforms to a specified DTD

• DTDs can support uniqueness and referential constraints via ID and IDREF attribute types

• These constraints do not function as keys, but can be used to transmit information from one relvar to another
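Python's standard library parses XML but does not validate against a DTD, so the sketch below hand-codes the rules that the parts DTD from the earlier slide expresses (content model Note, Partstuple*; each Partstuple containing PNUM, PNAME, WEIGHT, and an optional Note; CITY required; COLOR defaulting to Red). It is an illustration of what "valid" adds on top of "well-formed", not a DTD validator, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Enumerated attribute values, copied from the ATTLIST declaration on the slide.
ALLOWED_CITIES = {"LONDON", "Oslo", "Paris"}
ALLOWED_COLORS = {"Red", "Green", "Blue"}

SAMPLE = (
    '<Partsrelation><Note>Revised Version</Note>'
    '<Partstuple CITY="LONDON"><PNUM>P1</PNUM><PNAME>NUT</PNAME>'
    '<WEIGHT>12</WEIGHT></Partstuple></Partsrelation>'
)


def tuple_is_valid(element):
    """Check one Partstuple element against the hand-coded rules."""
    child_tags = [child.tag for child in element]
    # Content model: PNUM, PNAME, WEIGHT in that order, then an optional Note.
    required = ["PNUM", "PNAME", "WEIGHT"]
    if child_tags not in (required, required + ["Note"]):
        return False
    # CITY is required and must be one of the enumerated values.
    if element.get("CITY") not in ALLOWED_CITIES:
        return False
    # COLOR is optional; when absent it takes the default value "Red".
    return element.get("COLOR", "Red") in ALLOWED_COLORS


def is_valid(xml_text):
    """Return True if the text is well-formed and satisfies the hand-coded rules."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not well-formed, therefore not valid
    if root.tag != "Partsrelation":
        return False
    children = list(root)
    # Content model: exactly one Note, followed by zero or more Partstuple elements.
    if not children or children[0].tag != "Note":
        return False
    return all(c.tag == "Partstuple" and tuple_is_valid(c) for c in children[1:])
```

A document can therefore be well-formed yet invalid: dropping the CITY attribute from SAMPLE still parses, but fails the required-attribute rule.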

Limitations of DTDs

• DTDs are not written in XML syntax, so they cannot be processed as XML documents by XML parsers

• Since everything in this arena is a character string, data type support is lacking

• They enforce an ordering of elements that is contra-relational

• They are still beneficial because they enforce a standard that is widely used

XML Schema

• XML Schema is an XML derivative, and can be interpreted by XML parsers

• Schemas are written using a collection of names from a namespace (http://www.w3.org/2001/XMLSchema)

• The namespace specification: xmlns:xsd="http://www.w3.org/2001/XMLSchema"

• It is considerably more prolix than the DTD language

• XML Schema can enforce primitive types and some derived types

• XML Schema types have essentially no operators because “types” are still character strings
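Because a schema is itself an XML document, an ordinary XML parser can read it. The sketch below assumes a small hypothetical schema for three of the parts elements and uses the W3C XML Schema namespace, `http://www.w3.org/2001/XMLSchema`; it only reads the declarations and does not validate anything (the Python standard library has no XML Schema validator).

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# A hypothetical schema fragment, written for illustration only.
SCHEMA = """<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="PNUM" type="xsd:string"/>
  <xsd:element name="PNAME" type="xsd:string"/>
  <xsd:element name="WEIGHT" type="xsd:decimal"/>
</xsd:schema>
"""


def declared_elements(schema_text):
    """Map each top-level element declaration's name to its declared type."""
    root = ET.fromstring(schema_text)
    return {
        declaration.get("name"): declaration.get("type")
        for declaration in root.findall(f"{{{XSD_NS}}}element")
    }


if __name__ == "__main__":
    print(declared_elements(SCHEMA))
```

Declaring WEIGHT as `xsd:decimal` is the kind of primitive-type constraint a DTD cannot express.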

XML Data Manipulation

• XQuery is based on XPath, which means that it is a read-only facility for traversing XML’s hierarchical paths

• Because XQuery can report horizontal and vertical subsets, and combine the results, it is said to support “select, project, and join”

• XUpdate is in the early planning stages, but presumably will support updates

• For now, updates are available only through proprietary solutions
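The flavor of path-based retrieval can be sketched with the limited XPath subset that Python's standard `xml.etree.ElementTree` supports. This is not XQuery; it only illustrates a restriction (tuples for one city) followed by a projection (part names). The function name `part_names_in` is hypothetical.

```python
import xml.etree.ElementTree as ET

PARTS_XML = """<Partsrelation>
  <Partstuple CITY="LONDON">
    <PNUM>P1</PNUM><PNAME>NUT</PNAME><WEIGHT>12</WEIGHT>
  </Partstuple>
  <Partstuple CITY="Paris">
    <PNUM>P2</PNUM><PNAME>left-wing-part-10th-part-Bolt</PNAME><WEIGHT>17</WEIGHT>
  </Partstuple>
</Partsrelation>"""


def part_names_in(xml_text, city):
    """Return the PNAME of every Partstuple whose CITY attribute equals city."""
    root = ET.fromstring(xml_text)
    # Restriction: a path expression with an attribute predicate.
    # (The city value is interpolated directly, so it must not contain quotes.)
    selected = root.findall(f"./Partstuple[@CITY='{city}']")
    # Projection: keep only the PNAME character data.
    return [part.findtext("PNAME") for part in selected]


if __name__ == "__main__":
    print(part_names_in(PARTS_XML, "Paris"))
```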

XML and Databases

• Three approaches:

  • Store XML documents as attributes

  • Shred documents into attributes

  • Store XML documents in “XML databases”

XML Documents as Attributes

• Define a new type, XMLDOC

• As a new type, XMLDOC should have operators defined that can retrieve data in the manner of XQuery and that can check for well-formedness and validity

XML Documents Shred and Publish

• An XML document may be shredded into its components, which are then stored as attributes

• Attributes can be recombined and published as XML Documents

• This is an effective way for SQL databases to interact with the web

• Relational databases do not store hierarchies, nor are they intrinsically ordered, so shred and publish may not be “nonloss”
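A minimal sketch of shred and publish, assuming Python's standard `sqlite3` and `xml.etree.ElementTree` modules and a hypothetical `parts` table; it is not any vendor's facility. The published document drops the comment and the original whitespace, a small instance of the round trip not being "nonloss".

```python
import sqlite3
import xml.etree.ElementTree as ET

COLUMNS = ("PNUM", "PNAME", "COLOR", "WEIGHT", "CITY")

SAMPLE = """<Partsrelation>
  <!-- this comment will not survive the round trip -->
  <Partstuple>
    <PNUM>P1</PNUM><PNAME>NUT</PNAME><COLOR>RED</COLOR>
    <WEIGHT>12</WEIGHT><CITY>LONDON</CITY>
  </Partstuple>
</Partsrelation>"""


def shred(xml_text, connection):
    """Store each Partstuple element as one row of a parts table."""
    connection.execute(
        "CREATE TABLE parts (PNUM TEXT PRIMARY KEY, PNAME TEXT, "
        "COLOR TEXT, WEIGHT TEXT, CITY TEXT)"
    )
    root = ET.fromstring(xml_text)
    for part in root.findall("Partstuple"):
        row = [(part.findtext(column) or "").strip() for column in COLUMNS]
        connection.execute("INSERT INTO parts VALUES (?, ?, ?, ?, ?)", row)


def publish(connection):
    """Rebuild an XML document from the rows of the parts table."""
    root = ET.Element("Partsrelation")
    query = "SELECT PNUM, PNAME, COLOR, WEIGHT, CITY FROM parts ORDER BY PNUM"
    for row in connection.execute(query):
        part = ET.SubElement(root, "Partstuple")
        for column, value in zip(COLUMNS, row):
            ET.SubElement(part, column).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    with sqlite3.connect(":memory:") as connection:
        shred(SAMPLE, connection)
        print(publish(connection))
```

The ORDER BY clause is needed because rows have no intrinsic order; the original document order is recoverable only if it happens to coincide with some stored value.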

SQL Facilities

• XML Collection will offer support for shred and publish, where the publish feature supports publishing both the XML data and its schema

• XML Column will offer a new built-in type, XML, that will come with an XMLGEN operator to publish XML documents

• Database vendors offer built-in functions that can read and write elements within XML attribute values, e.g., XMLFILETOCLOB