design issues in xml databases

26
1 Design Issues in XML Databases Ref: Designing XML Databases by Mark Graves

Upload: winona

Post on 25-Feb-2016

39 views

Category:

Documents


2 download

DESCRIPTION

Design Issues in XML Databases. Ref: Designing XML Databases by Mark Graves. Storing data in XML. XML can be stored as a flat file , in an object-oriented database or in a relational database . - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Design Issues in XML Databases

1

Design Issues in XML Databases

Ref: Designing XML Databases by Mark Graves

Page 2: Design Issues in XML Databases

2

Storing data in XML• XML can be stored as a flat file, in an object-oriented

database or in a relational database.• Flat File Database: A flat file is a simple storage

mechanism and does not provide indexed queries or easy modification of a document.

• The easiest way to store XML data is a single flat file that stores the entire XML document.

• The data in the file is accessible by a variety of text editors and xml tools (eg: parsers).

• Flat file databases (consisting of many files) are a useful alternative.

• In cases where the functionality of a traditional DBMS may not be necessary and when special-purpose operations like text search, or temporal or spatial operations are needed, flat file databases may be a better choice.

Page 3: Design Issues in XML Databases

3

Flat-file Databases• Usually flat file databases will store large

quantities of static data in a flat file database organized with one document per file.

• Flat file databases have the following limitations: – quick access and indexing are difficult.– Long updates and recovery are not efficient.

Page 4: Design Issues in XML Databases

4

Flat-file Databases• Splitting a document into several document fragments may be more efficient

than storing an entire document in a file.• For example, in applications involving many transaction, each transaction

can be stored in a file.• Another approach is to slice a document with an identifier for each slice.• A simple mechanism to link documents is to have a special type name, for

example, “include” with attributes that provide the necessary information to link to another document.

• Example:<include element=“transaction” id=“12345”/>

These places where the document can be sliced into multiple document fragments are called “slice points”.

When the slice points are already determined by the application, the appropriate element type may used directly, such as:

<transaction id=“12345”

In practice, this approach is too inefficient because of time needed to access each file individually.

Page 5: Design Issues in XML Databases

5

Using a Relational Database to store XML data

• Developing a relational schema for XML may be the most practical approach to integrating XML into a high-throughput enterprise.

• There are three approaches that are used with the choice of a Relational Schema:

• Fine-grained• Coarse-grained and• Medium-grained

Page 6: Design Issues in XML Databases

6

Using a Relational Database to store XML data

• Fine-grained approach: every construct in the document is given a unique identity in the relational database.

• Every element, attribute and character data region can be individually accessed, modified or deleted with minimal effect on other document constructs.

• This provides the most flexibility and ease of access both with the XML DBMS-specific operations as well as the traditional relational ones.

• However, regenerating the entire document can be time-consuming when large.

Page 7: Design Issues in XML Databases

7

Using a Relational Database to store XML dataFine-grained Relational Schema-- Create fine-grained storage tables and constraints

create table xdb_doc (doc_id NUMBER(8) NOT NULL,name VARCHAR2(128) NOT NULL,root NUMBER(8)

);alter table xdb_doc add primary key (doc_id);

create table xdb_ele (doc_id NUMBER(8) NOT NULL,ele_id NUMBER(8) NOT NULL,parent_id NUMBER(8),tag VARCHAR2(32) NOT NULL

);alter table xdb_ele add primary key (doc_id, ele_id);

create table xdb_attr (doc_id NUMBER(8) NOT NULL,attr_id NUMBER(8) NOT NULL,ele_id NUMBER(8) NOT NULL,name VARCHAR2(32) NOT NULL,value VARCHAR2(255)

);

Page 8: Design Issues in XML Databases

8

Using a Relational Database to store XML data

create table xdb_child (doc_id NUMBER(8) NOT NULL,ele_id NUMBER(8) NOT NULL,indx NUMBER(6) NOT NULL,child_class VARCHAR2(4) NOT NULL, -- ELE, STR, or TEXTchild_id NUMBER(8) NOT NULL

);alter table xdb_child add primary key (doc_id, ele_id, indx);

create table xdb_str (doc_id NUMBER(8) NOT NULL,cdata_id NUMBER(8) NOT NULL,ele_id NUMBER(8) NOT NULL,value VARCHAR2(255) NOT NULL

);

Page 9: Design Issues in XML Databases

9

Using a Relational Database to store XML data

create table xdb_text (doc_id NUMBER(8) NOT NULL,cdata_id NUMBER(8) NOT NULL,ele_id NUMBER(8) NOT NULL,value LONG NOT NULL

);alter table xdb_text add primary key (doc_id, cdata_id);

-- Foreign Keysalter table xdb_doc add constraint fk_xdb_doc_root foreign key (doc_id, root) references xdb_ele (doc_id, ele_id);alter table xdb_ele add constraint fk_xdb_ele_doc_id foreign key (doc_id) references xdb_doc (doc_id);alter table xdb_attr add constraint fk_xdb_attr_doc_id foreign key (doc_id) references xdb_doc (doc_id);alter table xdb_attr add constraint fk_xdb_attr_ele_id foreign key (doc_id, ele_id) references xdb_ele (doc_id, ele_id);alter table xdb_child add constraint fk_xdb_child_doc_id foreign key (doc_id) references xdb_doc (doc_id);…

Page 10: Design Issues in XML Databases

10

Using a Relational Database to store XML data

-- Search for all elements with a given tag name.select doc_id, ele_idfrom xdb_elewhere doc_id = 1 and tag = 'TagName';

-- Search for all elements with a given attribute name.select doc_id, attr_idfrom xdb_attrwhere doc_id = 1 and name = 'AttrName' and value = 'AttrValue';

-- Search for all elements with a given tag name that has a-- character data child consisting of a given string.select ele.doc_id, ele.ele_idfrom xdb_ele ele, xdb_str strwhere ele.doc_id = 1 and str.doc_id = ele.doc_id and ele.tag = 'TagName' and ele.ele_id = str.ele_id and str.value = 'SearchedForValue';…

Page 11: Design Issues in XML Databases

11

Using a Relational Database to store XML data

Coarse-grained approach: In this approach, a document is stored in its entirety.

• Even though it is similar to storing the documents in flat files, this approach has the advantage of allowing the documents to be referred to within other structures in the database.

• In addition, it provides the security, recovery and other features of a DBMS in which it is stored.

• Its particular usefulness is part of a hybrid representation.

Page 12: Design Issues in XML Databases

12

Using a Relational Database to store XML data

Coarse-grained approach: create table cgrel_doc (doc_id NUMBER(8) NOT NULL,name VARCHAR2(128),body LONG

);alter table cgrel_doc add primary key (doc_id);

Page 13: Design Issues in XML Databases

13

Using a Relational Database to store XML data

Medium-grained approach: The fine-grained approach works well to

perform tasks that access elements, whereas the tasks to store and retrieve an entire document are difficult.

The coarse-grained approach works well in manipulating entire documents but has difficulty with the element manipulation.

The medium-grained approach is a compromise between the fine and coarse-grained approaches.

Page 14: Design Issues in XML Databases

14

Using a Relational Database to store XML data

Medium-grained approach: The document tree can be sliced into sections where the sub-sections are stored with a coarse-grained approach. This is particularly useful if the sections are accessed individually: for example, in reference books such as dictionaries, a medium grained approach would be to store each entry separately.

Page 15: Design Issues in XML Databases

15

Using a Relational Database to store XML data

Medium-grained approach: Determining the slice points is a complex

issue.• How many slice-points are created?• How many levels of slicing are created?• Does the slicing depend upon the element

type name or the depth in the tree or the size of the document section?

• Are some sections of the document sliced more finely than the other sections?

Page 16: Design Issues in XML Databases

16

Using a Relational Database to store XML dataMedium-grained approach:One way to approach the slicing granularity is to view slicing as a

method to index the database.• An index speeds up access for a particular database request by

creating an index table that provides quick navigation of the indexed information.

• Slices ca be created on a element type name (s) for which frequent access is anticipated.

• A combination of element type names and attribute values can also be used to drive the index slice method.

• Choosing slice points based on desired indexing will reduce the data access time over the coarse-grained approach for queries or other accesses that involve the indexed tags.

• Indexes can be created on a few highly requested element types. • If most of the directly accessed element type names are indexed

then the access time for those queries approaches the access time under the fine-grained approach while also reducing the document regeneration time.

Page 17: Design Issues in XML Databases

17

Using a Relational Database to store XML dataMedium-grained approach:• Another way to approach the slicing granularity is

to view the slicing method as a buffering mechanism.

• Slices may be determined by physical characteristics, such as size of the slice.

• The slice size can be chosen to efficiently use network communication protocols to reduce the time needed to retrieve a portion of the document when the network response is a critical factor.

• Combinations of approaches may be used for particular applications or documents.

Page 18: Design Issues in XML Databases

18

Using a Relational Database to store XML dataMedium-grained approach:One point to be addressed is how to represent the

slice points in the document.One mechanism is to create a specific element

type to represent the necessary information, ensuring that the element type name is unique in the document.

For example, an element type called “slice” or “proxy” could be created with attributes that contain sufficient information to reconstruct the document, namely “document-id” and “element-id”.

Page 19: Design Issues in XML Databases

19

Using a Relational Database to store XML dataMedium-grained approach: tables and constraints for medium-grained storage

create table xdb_frag (doc_id NUMBER(8) NOT NULL,frag_id NUMBER(8) NOT NULL,ele_id NUMBER(8) NOT NULL,body LONG NOT NULL

);alter table xdb_frag add primary key (doc_id, frag_id);

create table xdb_frag_ref (doc_id NUMBER(8) NOT NULL,frag_id NUMBER(8) NOT NULL,ele_ref NUMBER(8) NOT NULL

);alter table xdb_frag_ref add primary key (doc_id, frag_id, ele_ref);

alter table xdb_frag add constraint fk_xdb_frag_doc_id foreign key (doc_id) references xdb_doc (doc_id);alter table xdb_frag add constraint fk_xdb_frag_ele_id foreign key (doc_id, ele_id) references xdb_ele (doc_id, ele_id);alter table xdb_frag_ref add constraint fk_xdb_frag_ref_doc_id foreign key (doc_id) references xdb_doc (doc_id);alter table xdb_frag_ref add constraint fk_xdb_frag_ref_ele_ref foreign key (doc_id, ele_ref) references xdb_ele (doc_id, ele_id);alter table xdb_frag_ref add constraint fk_xdb_frag_ref_frag_id foreign key (doc_id, frag_id) references xdb_frag (doc_id, frag_id);

Page 20: Design Issues in XML Databases

20

Relational Data Server• A relational data server (combined with a

web server) makes data from a relational DBMS available in XML.

• It works by querying the database and formatting the report from the RDBMS as XML.

• The basic process consists of the following steps:– The user specifies a relational query to the

web browser as a URL.

Page 21: Design Issues in XML Databases

21

Relational Data Server• The basic process consists of the following steps:

– The user specifies a relational query to the web browser as a URL.

– The web browser sends the URL to the data server.– The data server parses the URL request and creates a SQL

query.– The data server passes the SQL query to the database server.– The database server executes the query.– The database server returns the relational report to the data

server.– The data server formats the report as XML.– The data server returns the XML report to the Web browser.– The web browser parses the XML report and displays it to the

user.When a stylesheet is used, it must be retrieved and parsed by the

web browser.

Page 22: Design Issues in XML Databases

22

Some Issues To be addressed• Can relational views be displayed in addition to

tables?• Can data be updated? Or, is data read-only?• Can data be retrieved from multiple tables?• How are joins handled?• How complex can the mapping from relations to

XML be?• Can helper tables or joining tables be recognized

and handled differently?• Can foreign key constraints be followed? To what

depth?• How are links between tables handled?• How expressive is the query facility?

Page 23: Design Issues in XML Databases

23

• We will briefly examine a relational data server (implemented in java) called rServe.

• Ref: http://www.xwdb.org/docs/XWDB_RServe_User_Guide.pdf

• Some examples of usage:• http://127.0.0.1/servlets/com.xweave.xmldb.rserve.XMLServlet?tablename=

customer&stylesheet=/ss/generic1.xsl – Which retrieves all the records from the customer table, then

formats the data using the stylesheet http://127.0.0.1/ss/generic.xsl.

Page 24: Design Issues in XML Databases

24

Creating a SQL Query• The relational data server uses the parameters

of the URL to build a SQL statement.• Example:• Select * from tablename;• Much of the functionality of a SQL statement can

be encloded as a URL.• The where condition is given by pre-pending the

condition to the value in a name-value pair.• Example:• …&pno=‘p1’&quantity=<=20&

Page 25: Design Issues in XML Databases

25

Formatting a Report as XML• One way to format data as XML is to use

the table names and column names to create the element types.

• Information on primary keys and foreign keys can be used to create a document hierarchy.

• An issue to address about element naming is:– Special characters may be used in table and

column names that are not valid as xml element names.

Page 26: Design Issues in XML Databases

26

Extracting Dictionary Data• A commercial relational DBMS provides

information in its catalog (Data Dictionary) consisting of system tables.

• The system tables contain information about the “metadata” – primary keys, foreign keys and other integrity constraints that are defined on the data.

• The data in a RDBMS can be mapped to XML as a tree of parent/child relationships (represented using primary and foreign keys).