2/6/05 Salman Azhar: Database Systems

XML
Salman Azhar

Semi-structured Data
XML (Extensible Markup Language)
Well-formed and Valid XML
Document Type Definitions
IDs and IDREFs

These slides use some figures, definitions, and explanations from Elmasri-Navathe’s Fundamentals of Database Systems and Molina-Ullman-Widom’s Database Systems

Framework

1. Information Integration: making databases from various places work as one.
2. Semi-structured Data: a new data model designed to cope with problems of information integration.
3. XML: a standard language for describing semi-structured data schemas and representing data.

1. Information Integration

Databases in an enterprise generally have:
  Several underlying database management systems
    Oracle, MS SQL Server, DB2, Informix, Sybase (SQL Server), MS Access, etc.
  Several underlying database schemas
    Information in an employee table can contain
      Employee Name, SSN, DOB, title, hrsPerWeek, modifiedTime, modifiedBy
      Employee Name, SSN, DOB, title, degree, createTime, createBy
      Employee Name, SSN, DOB, title, salary, modifiedTime, modifiedBy, createTime, createBy

2. Semi-structured Data

A new data model designed to cope with problems of information integration
  Accommodates different DBMSs
    Oracle, MS SQL Server, DB2, Informix, Sybase (SQL Server), MS Access, etc.
  Integrates different schemas
    Employee Name, SSN, DOB, title, hrsPerWeek, modifiedTime, modifiedBy
    Employee Name, SSN, DOB, title, degree, createTime, createBy
    Employee Name, SSN, DOB, title, salary, createTime, createBy, modifiedTime, modifiedBy
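A minimal Python sketch of the three employee schemas held side by side, assuming each record simply carries whatever labels its source provides. All values and the helper names all_labels and shared_labels are invented for illustration.

```python
# One record per source schema listed on the slide; every value is invented.
employees = [
    {"name": "Ann Lee", "ssn": "000-00-0001", "dob": "1970-01-01", "title": "Analyst",
     "hrsPerWeek": 40, "modifiedTime": "2005-02-06", "modifiedBy": "hr1"},
    {"name": "Bob Ray", "ssn": "000-00-0002", "dob": "1975-05-05", "title": "Engineer",
     "degree": "MS", "createTime": "2004-11-30", "createBy": "hr2"},
    {"name": "Cy Diaz", "ssn": "000-00-0003", "dob": "1980-09-09", "title": "Manager",
     "salary": 90000, "modifiedTime": "2005-01-15", "modifiedBy": "hr3",
     "createTime": "2003-07-01", "createBy": "hr3"},
]


def all_labels(records):
    """Every label that appears on at least one record."""
    labels = set()
    for record in records:
        labels.update(record)  # updating a set with a dict adds the dict's keys
    return labels


def shared_labels(records):
    """Labels that every record carries, whatever its source."""
    return set.intersection(*(set(record) for record in records))
```

No single table schema is imposed: a record that lacks a label (for example salary) just omits it, which is the flexibility the semi-structured model aims for.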

3. XML

A standard language for describing semi-structured data schemas and representing data.

The Information-Integration Problem

Major bottleneck in enterprise application integration
For example…
  Hewlett Packard split into HP and Agilent
    Need to separate data into different destinations
  HP bought Compaq
    Need to integrate data from different sources

The Information-Integration Problem

Related data exists in many places and could, in principle, work together.
But different databases differ in:
1. Model
     relational, object-oriented?
2. Schema
     normalized/denormalized?
3. Terminology
     are consultants employees? Retirees? Subcontractors?
4. Conventions
     meters versus feet?

Example

Consider the merger of two stores in a mall
  there may be some overlap in the products sold
  but the databases are different

Example

Each company has a database
  One may use a relational DBMS; the other keeps its data in an MS-Word document
  One stores the phone numbers of distributors; the other does not
  One distinguishes products by department; the other doesn’t
  One counts inventory by number of items; the other by cases

Two Approaches to Integration

1. Warehousing
     Makes a copy of the data
     The more developed of the two
2. Mediation
     Creates a view of the data
     Newer and less developed

Warehouse Diagram

[Figure: a user query and its result at the top, the Warehouse below them, and two Wrappers, one over Source 1 and one over Source 2.]

A Mediator

[Figure: a user query enters the Mediator, which passes queries through two Wrappers to Source 1 and Source 2; results flow back through the Wrappers and the Mediator to the user.]

Warehousing

Make copies of the data sources at a central site and transform them to a common schema
Reconstruct the data daily or weekly
  Do not try to keep it more up to date than that
Pro:
  very well developed
  several commercial tools are available
Con:
  data can be old, since updates are expensive
  24-hour availability is threatened by large data updates

Mediation

Create a view of all sources, as if they were integrated
Answer a query on the view by translating it into the terminology of the sources and querying them
Pro:
  Current data
Con:
  Can be slow, as it requires merging different data sources in real time
  Few tools are available
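The contrast between the two approaches can be sketched in a few lines of Python. This is an illustration only: the two in-memory sources, the conversion factor ITEMS_PER_CASE, and every function name are assumptions, not part of the slides.

```python
# Two sources with different conventions; all data here is invented.
SOURCE_1 = [{"product": "Pepsi", "items": 48}]        # counts single items
SOURCE_2 = [{"name": "Pepsi", "cases": 3},
            {"name": "Sobe", "cases": 1}]             # counts cases
ITEMS_PER_CASE = 24  # assumed conversion factor


def wrapper_1(product):
    """Answer in the common convention (items) from source 1."""
    return sum(row["items"] for row in SOURCE_1 if row["product"] == product)


def wrapper_2(product):
    """Translate source 2's terminology and units into the common convention."""
    return sum(row["cases"] * ITEMS_PER_CASE
               for row in SOURCE_2 if row["name"] == product)


def mediator_inventory(product):
    """Mediation: query every source at the time the user asks."""
    return wrapper_1(product) + wrapper_2(product)


def build_warehouse():
    """Warehousing: copy everything once into a common schema."""
    products = ({row["product"] for row in SOURCE_1}
                | {row["name"] for row in SOURCE_2})
    return {product: mediator_inventory(product) for product in sorted(products)}
```

A warehouse built with build_warehouse() keeps answering from its copy after a source changes, while mediator_inventory sees the change immediately. That is the "data can be old" versus "current data" trade-off listed above.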


Semi-structured: Motivation

Most effective approach to information integration:
  the Semi-structured Data Model, or Semi-structured Objects

Semi-structured: Motivation

Main limitation of object-oriented models:
  Object models are strongly typed
    Objects of a class have one structure only
The semi-structured approach solves this problem

Semi-structured Data

Purpose:
  Represent data from independent sources more flexibly than either relational or object-oriented models

Semi-structured Data

Each object has a class of its own, and its properties are defined by whatever labels are attached to that object
  Properties mean attributes, relationships, methods, etc.

Semi-structured Data

Think of objects, but with the type of each object being that object’s own business, not that of its “class”
Labels indicate the meaning of substructures

Semi-structured Graphs

It is easy to think of semi-structured data as graphs
  Nodes = objects
  Labels on arcs =
    attributes, leading to a leaf node
    relationships, leading to another node

Semi-structured Graphs

Atomic values sit at leaf nodes
  nodes with no arcs out
Flexibility: no restriction on…
  the labels out of a node
  the number of successors with a given label

Example: Data Graph

[Figure: a data graph. Arcs out of the root are labeled soda, soda, and rest. Other arc labels: name, manf, sellsAt, addr, prize, year, award. Leaf values: Pepsi, Sobe, PepsiCo, KFC, Main St, BestSeller, 2003.]

Annotations on the figure:
  The soda object for Pepsi (arc in labeled soda; arc out labeled name, leading to Pepsi)
  The restaurant object for KFC (arc in labeled rest; arc out labeled name, leading to KFC)
  Notice a new kind of data.
  The root object represents the entire DB.
  Data graphs often look like trees, but are not.
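A small Python sketch of such a data graph. The figure itself did not survive, so the wiring below (which arc leaves which node) is an assumption pieced together from the labels and annotations; the helper follow is likewise invented.

```python
# A node is a list of (label, target) arcs; a plain string is a leaf
# holding an atomic value. The exact wiring is assumed, not recovered.
pepsico = [("name", "PepsiCo")]
kfc = [("name", "KFC"), ("addr", "Main St")]
pepsi = [("name", "Pepsi"), ("manf", pepsico), ("sellsAt", kfc),
         ("prize", [("year", "2003"), ("award", "BestSeller")])]
sobe = [("name", "Sobe"), ("manf", pepsico)]
root = [("soda", pepsi), ("soda", sobe), ("rest", kfc)]


def follow(node, label):
    """Return every successor of node reached by an arc with this label."""
    if isinstance(node, str):  # leaves have no arcs out
        return []
    return [target for arc_label, target in node if arc_label == label]
```

Both sodas point at the same pepsico node, so the structure is a graph rather than a tree, and sobe has no prize arc at all: nothing forces two soda objects to carry the same labels.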

Stage is Now Set for XML

A technology has applications to different situations
  the foundations remain the same
  the applications change

Extensible Markup Language (XML)

XML
  uses tags for semantics (e.g., “this is an address”)
HTML
  uses tags for formatting (e.g., “italic”)
Key idea:
  create tag sets for a domain (e.g., genomics)
  translate all data into properly tagged XML docs

Well-Formed and Valid XML

Well-Formed XML
  allows you to invent your own tags
  similar to labels in a semi-structured data graph
Valid XML
  involves a DTD (Document Type Definition)
  the DTD gives a grammar for the use of labels
    it limits the set of labels out of a node
    it limits the order and number of times a label occurs

Well-Formed XML

All XML documents have
  a header
  a body
The header
  defines the version
  specifies that the document is in well-formed XML
The body can include
  a root tag
  several properly matching tags

Well-Formed XML: Header

Start the document with a declaration surrounded by <? … ?>.
The normal declaration for well-formed XML is:

<?xml version="1.0" standalone="yes"?>

version indicates the version number
standalone="yes" is used here to mean no DTD
  no DTD means well-formed XML
  (strictly, it declares that the document does not depend on external markup declarations)

Well-Formed XML: Body

The body of the document is a root tag surrounding nested tags. The body can include:
  several properly matching tags (as in HTML structure)
  a special tag called the root tag
    it can have a special meaning, such as the document type, or it can be generic

Tags

Tags, as in HTML,
  are normally matched pairs, as in <BLAH> … </BLAH>
  may be nested arbitrarily
An element with no content may be written as a single self-closing tag, <BLAH/>
  unlike <P> in HTML, an opening tag with no matching end is not well-formed XML
  however, we will not use self-closing tags in examples

Example: Well-Formed XML

<?xml version="1.0" standalone="yes"?>
<RESTS>
  <REST>
    <NAME>Taco Bell</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
    <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
  </REST>
  <REST> … </REST>
  …
</RESTS>

The root tag RESTS surrounds the entire document
Each of the several nested REST tags represents information about a single REST
  its <NAME> tag specifies the REST name
<SODA> tags hold the name and price of each soda, nested in <NAME> and <PRICE> tags
Literal data items are contained at the atomic level
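Well-formedness can be checked mechanically by any XML parser. A minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the BAD sample are invented for illustration, and the elided REST entries are left out of the sample.

```python
import xml.etree.ElementTree as ET

GOOD = """<?xml version="1.0" standalone="yes"?>
<RESTS>
  <REST>
    <NAME>Taco Bell</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
    <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
  </REST>
</RESTS>"""

# Tags that cross instead of nesting are not well-formed.
BAD = "<RESTS><REST><NAME>Taco Bell</REST></NAME></RESTS>"


def is_well_formed(text):
    """Return True when the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

The parser needs no DTD for this check: it only verifies the header syntax, the single root tag, and that every tag is properly matched and nested.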

XML and Semi-structured Data

Consider this…
Is a well-formed XML document with nested tags exactly the same idea as a tree of semi-structured data?
  Tags are the labels on edges
  Nodes represent the data between matching tags
  The parent-child relationship is immediate nesting in XML

XML and Semi-structured Data

The semi-structured approach allows for non-tree structures
We shall see that XML also enables non-tree structures
  it mimics the semi-structured data model

Group Exercise

Convert the following into a semi-structured representation:

<?xml version="1.0" standalone="yes"?>
<RESTS>
  <REST>
    <NAME>Taco Bell</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
    <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
  </REST>
  <REST> … </REST>
  …
</RESTS>

Note: Do not turn over to the next page before attempting this exercise yourself!

Solution: The semi-structured representation

RESTS
  REST
    NAME: Taco Bell
    SODA
      NAME: Pepsi
      PRICE: 1.00
    SODA
      NAME: Sobe
      PRICE: 2.00
  REST
    . . .
  REST
    . . .

Note: Data is stored in leaf nodes and structure (tags) in internal nodes
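The same conversion can be sketched in Python with the standard xml.etree.ElementTree module. The function name to_tree and the (label, children) tuple layout are choices made for this illustration, and the elided REST entries are left out of the sample document.

```python
import xml.etree.ElementTree as ET

DOC = """<RESTS>
  <REST>
    <NAME>Taco Bell</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
    <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
  </REST>
</RESTS>"""


def to_tree(element):
    """Turn an element into (label, children) or, for a leaf, (label, text)."""
    children = [to_tree(child) for child in element]
    if children:
        return (element.tag, children)            # internal node: structure only
    return (element.tag, (element.text or "").strip())  # leaf: the data


tree = to_tree(ET.fromstring(DOC))
```

Tags end up as labels on internal nodes and the literal data ends up at the leaves, matching the note on the slide.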

Valid XML

Switching gears: from well-formed to valid XML
Valid XML is the most interesting use of XML
  A DTD is essentially a context-free grammar for describing XML tags and their nesting
  Validity is specified by a DTD
Each domain of interest creates one DTD that describes all the documents its group will share
  For example, electronic components, the travel industry, etc., will each have their own DTDs

DTD Structure

<!DOCTYPE <root tag> [
  <!ELEMENT <name> ( <components> )>
  <more elements>
]>

Note:
  !DOCTYPE is a keyword, with <root tag> being the name of the DOCTYPE
  Between [ … ] is a list of ELEMENT definitions
  Each !ELEMENT has a <name> and the allowed list of <components>, usually in the order listed

DTD Elements

An element definition consists of its name (tag) and a parenthesized description of any nested tags
  the description includes the order of subtags and their multiplicity (0, 1, or many times)
Leaves (text elements) have #PCDATA in place of nested tags

Example: DTD

<!DOCTYPE RESTS [
  <!ELEMENT RESTS (REST*)>
  <!ELEMENT REST (NAME, SODA+)>
  <!ELEMENT NAME (#PCDATA)>
]>

RESTS can have * (0 or more) REST
REST has NAME and then + (1 or more) SODA… Order matters!
NAME and PRICE are data (#PCDATA): no more tags, just text
SODA has NAME followed by PRICE
  SODA’s NAME and PRICE are data (#PCDATA)

GROUP EXERCISE: COMPLETE THE DTD
Note: Do not turn over to the next page before attempting this exercise yourself!

Example: DTD

<!DOCTYPE RESTS [
  <!ELEMENT RESTS (REST*)>
  <!ELEMENT REST (NAME, SODA+)>
  <!ELEMENT SODA (NAME, PRICE)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT PRICE (#PCDATA)>
]>

RESTS can have * (0 or more) REST
REST has NAME and then + (1 or more) SODA… Order matters!
NAME and PRICE are data (#PCDATA): no more tags, just text
SODA has NAME followed by PRICE
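Python's standard library does not validate against a DTD, so the sketch below hand-translates the content models above into regular expressions over child tag names. It checks only nesting, order, and multiplicity; the names MODELS and conforms are invented for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the DTD, written as regular expressions over the
# sequence of child tag names (each name followed by one space).
# None marks a #PCDATA leaf: text only, no child tags.
MODELS = {
    "RESTS": r"(?:REST )*",       # (REST*)
    "REST": r"NAME (?:SODA )+",   # (NAME, SODA+)
    "SODA": r"NAME PRICE ",       # (NAME, PRICE)
    "NAME": None,
    "PRICE": None,
}


def conforms(element):
    """Check an element and all its descendants against MODELS."""
    if element.tag not in MODELS:
        return False
    children = "".join(child.tag + " " for child in element)
    model = MODELS[element.tag]
    if model is None:
        ok = children == ""
    else:
        ok = re.fullmatch(model, children) is not None
    return ok and all(conforms(child) for child in element)
```

For example, a REST whose SODA comes before its NAME is well-formed XML but fails conforms, because the content model fixes the order.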

Element Description Rules

Subtags must appear in the order shown
A tag may be followed by a symbol to indicate its multiplicity, with the same meaning as in UNIX regular expressions
  * = zero or more
  + = one or more
  ? = zero or one
Alternative sequences of tags can be connected by the symbol |

Example: Element Description

A name is either
  an optional title (e.g., “Dr.”), a first name, and a last name, in that order,
  or it is an IP address

<!ELEMENT NAME (
  (TITLE?, FIRST, LAST) | IPADDR
)>

The symbol | separates the alternatives
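Because the multiplicity symbols follow regular-expression conventions, the NAME content model can be written almost verbatim as a Python regular expression over the sequence of child tag names. This is a sketch for illustration; NAME_MODEL and matches_name_model are invented names.

```python
import re

# (TITLE?, FIRST, LAST) | IPADDR as a regular expression over child tag
# names, each name followed by one space.
NAME_MODEL = re.compile(r"(?:(?:TITLE )?FIRST LAST |IPADDR )")


def matches_name_model(child_tags):
    """Return True when the list of child tag names fits the content model."""
    sequence = "".join(tag + " " for tag in child_tags)
    return NAME_MODEL.fullmatch(sequence) is not None
```

For instance, ["FIRST", "LAST"] and ["IPADDR"] are accepted, while ["LAST", "FIRST"] is rejected because the order is fixed.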

Use of DTDs

To specify that a document follows a particular DTD:
1. Set standalone="no", and then either
   a) include the DTD as a preamble of the XML document, or
   b) follow DOCTYPE and the <root tag> by SYSTEM and a path to the file where the DTD is stored

Example (a)

<?xml version="1.0" standalone="no"?>
<!DOCTYPE RESTS [
  <!ELEMENT RESTS (REST*)>
  <!ELEMENT REST (NAME, SODA+)>
  <!ELEMENT SODA (NAME, PRICE)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT PRICE (#PCDATA)>
]>
<RESTS>
  <REST>
    <NAME>Taco Bell</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
    <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
  </REST>
  <REST> … </REST>
  …
</RESTS>

The DOCTYPE block is the DTD; the RESTS element is the document
The document is the same as earlier, but this time it conforms to the DTD above it

Example (b)

Assume the RESTS DTD is in the file rest.dtd

<?xml version="1.0" standalone="no"?>
<!DOCTYPE RESTS SYSTEM "rest.dtd">
<RESTS>
  <REST>
    <NAME>Taco Bell</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
    <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
  </REST>
  <REST> … </REST>
  …
</RESTS>

The DOCTYPE line gets the DTD from the file rest.dtd
The document is the same as earlier, but this time it conforms to the DTD in rest.dtd

Attributes

Attributes are another important component of DTDs and XML docs
Opening tags in XML can have attributes, like <A HREF = “…”> in HTML
In a DTD, <!ATTLIST <element name> … > gives a list of attributes and their data types for this element

Example: Attributes

Rests can have an attribute kind, which is either qsr, family, or other
  The element definition is unchanged
  However, we add an ATTLIST

<!ELEMENT REST (NAME, SODA+)>
<!ATTLIST REST kind (qsr | family | other) #IMPLIED>

#IMPLIED makes the attribute optional, matching “can have”; the slide itself gives no default

Example: Attribute Use

In a document that allows REST tags, we might see:

<REST kind="qsr">
  <NAME>KFC</NAME>
  <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
  ...
</REST>

New info: kind="qsr"
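Reading the attribute back is straightforward with Python's standard xml.etree.ElementTree module. The sketch below assumes the kind attribute is optional; the names ALLOWED_KINDS and kind_is_allowed are invented, and the elided part of the REST element is left out.

```python
import xml.etree.ElementTree as ET

ALLOWED_KINDS = {"qsr", "family", "other"}

REST = ET.fromstring("""<REST kind="qsr">
  <NAME>KFC</NAME>
  <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
</REST>""")


def kind_is_allowed(rest):
    """Accept a missing kind (assumed optional) or one of the listed values."""
    kind = rest.get("kind")  # None when the attribute is absent
    return kind is None or kind in ALLOWED_KINDS
```

Here REST.get("kind") returns the string "qsr", while the NAME and SODA data still come from the nested elements as before.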

IDs and IDREFs

Introduce links from one object to another
Allow the structure of an XML document to be a general graph rather than just a tree
These are pointers from one object to another, in analogy to HTML’s NAME = “blah” and HREF = “#blah”

Creating IDs

We give an element Elephant an attribute Attention of type ID in the DTD
When using the tag <Elephant> in an XML document, give its attribute Attention a unique value. For example,

<Elephant Attention="e213">

(an ID value must be an XML name, so it cannot begin with a digit)

Creating IDREFs

IDREFs are similar to IDs:
  To allow objects of type Fig to refer to another object with an ID attribute, give Fig an attribute of type IDREF (a single string of type ID)
  Or, let the attribute have type IDREFS, so the Fig object can refer to any number of other objects (any number of strings of type ID, separated by spaces)

Example: IDs and IDREFs

Let us redesign our RESTS DTD to include both REST and SODA sub-elements
Both rests and sodas will have ID attributes called name
Rests have PRICE sub-objects, consisting of a number (the price of one soda) and an IDREF theSoda leading to that soda
Sodas have an attribute soldBy, which is an IDREFS leading to all the rests that sell it

The DTD

<!DOCTYPE RESTS [
  <!ELEMENT RESTS (REST*, SODA*)>
  <!ELEMENT REST (PRICE+)>
    <!ATTLIST REST name ID #REQUIRED>
  <!ELEMENT PRICE (#PCDATA)>
    <!ATTLIST PRICE theSoda IDREF #REQUIRED>
  <!ELEMENT SODA EMPTY>
    <!ATTLIST SODA name ID #REQUIRED
                   soldBy IDREFS #IMPLIED>
]>

RESTS has 0 or more REST and 0 or more SODA
REST objects have name as an ID attribute and have one or more PRICE sub-objects
PRICE objects have a number (the price) and one reference to a soda
SODA objects have an ID attribute called name and a soldBy attribute that is a set of REST names
The #REQUIRED and #IMPLIED defaults are assumptions: DTD syntax needs a default on each attribute, and the slide does not say which

Example Document

<RESTS>
  <REST name="TacoBell">
    <PRICE theSoda="Pepsi">1.00</PRICE>
    <PRICE theSoda="Sobe">2.00</PRICE>
  </REST>
  …
  <SODA name="Pepsi" soldBy="KFC TacoBell …"></SODA>
  …
</RESTS>

This document follows the DTD on the previous slide. ID and IDREFS values are XML names separated by spaces, so the restaurant is written TacoBell and the soldBy list has no commas.
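Following the ID and IDREF links turns the document tree into a graph. A minimal Python sketch with the standard xml.etree.ElementTree module, which does not know about DTD types, so the code simply treats name as the ID attribute. The KFC entry, the Sobe SODA entry, and the helper names are invented here so that every reference resolves.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""
<RESTS>
  <REST name="TacoBell">
    <PRICE theSoda="Pepsi">1.00</PRICE>
    <PRICE theSoda="Sobe">2.00</PRICE>
  </REST>
  <REST name="KFC">
    <PRICE theSoda="Pepsi">1.00</PRICE>
  </REST>
  <SODA name="Pepsi" soldBy="KFC TacoBell"></SODA>
  <SODA name="Sobe" soldBy="TacoBell"></SODA>
</RESTS>
""")

# Index every element that carries the ID attribute "name".
by_id = {el.get("name"): el for el in DOC.iter() if el.get("name") is not None}


def sellers(soda_name):
    """Follow the IDREFS in soldBy to the REST elements they name."""
    soda = by_id[soda_name]
    return [by_id[ref] for ref in soda.get("soldBy", "").split()]


def price_at(rest_name, soda_name):
    """Find the PRICE under a REST whose IDREF theSoda names the soda."""
    for price in by_id[rest_name].findall("PRICE"):
        if price.get("theSoda") == soda_name:
            return price.text
    return None
```

sellers("Pepsi") returns the very same element objects that sit under RESTS, so a SODA and a REST point at each other without either being nested inside the other.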

Recap

Semi-structured Data
XML (Extensible Markup Language)
Well-formed and Valid XML
Document Type Definitions
IDs and IDREFs

Perspective

Here XML is used as an EDI medium
  EDI = electronic data interchange
There are many other uses for XML
  Each has its own way of using it


Questions?

Questions???

Doesn’t mean you will get all the answers!