efficiently publishing relational data as xml documents

Post on 30-Dec-2015

39 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

Efficiently Publishing Relational Data as XML Documents. Jayavel Shanmugasundaram et al. Proceedings -VLDB 2000, Cairo. What drove them?. No…it wasn’t the chaffeur… XML rapidly emerging as a global standard Large amount of data stored in RDBMS and needs to be exchanged. Primary Issues. - PowerPoint PPT Presentation

TRANSCRIPT

Efficiently Publishing Relational Data as XML Documents

Jayavel Shanmugasundaram et al.

Proceedings -VLDB 2000, Cairo.

What drove them? No…it wasn’t the chaffeur… XML rapidly emerging as a global

standard Large amount of data stored in

RDBMS and needs to be exchanged

Primary Issues Language specification Implementation – what method

works best? Adding TAG and STRUCTURE -

when do you do these operations?

Roadmap Language specification Implementation

Early tagging, structuring Late tagging, structuring Early structure, late tagging

Performance Evaluation

Our little sample…<customer id=C1> <name> Dilys Thomas </name> <accounts> <account id=A1>34552211233</account> <account id=A2>98653412223</account> <! Dilys – we know where your money is…> </accounts> <porders> <porder id=PO1 acct=A1> <items> <item id=I1>Gift for Consulate VISA Woman</item> <item id=I1>Traveller’s cheques</item> </items> <payments> <payment id=P1> due Feb 12 </payment> <!that’s today!!> </payments> </porder> </porders></customer>

Note the•Elements•Names/Tags•ID Refs

Underlying tablesCustomer(id int, name varchar)

Account(id varchar, custID int, acctnum int)

Item(id int, poID int, desc varchar)

PurchOrder(id int, custID int, acctID varchar, date varchar)

Payment(id int, poID int, desc varchar)

SQL-based language spec.

Sqlfunctions: Define XMLConstruct ITEM(id int, desc varchar) AS {

<item id=$id>$desc </item>

}

Sqlaggregates: Select XMLAGG(ITEM(id, desc))

From Item

// returns an XML aggregation of items

Sample querySelect cust.name, CUST(cust.id, cust.name, (Select XMLAGG(ACCT(acct.id, acct.acctnum)) From Account acct Where acct.custId=cust.id), (Select XMLAGG(PORDER(porder.id, porder.acct, porder.date, (Select XMLAGG(ITEM(item.id, item.desc)) From Item item Where item.poid=porder.id) (Select XMLAGG(PAYMENT(pay.id,pay.desc)) From Payment pay, Where pay.poid=porder.id))) From PurchOrder porder Where porder.custID=cust.id))From Customer cust

Constructs XML from the relational tables.

Roadmap Language specification Implementation

Early tagging, structuring Late tagging, structuring Early structure, late tagging

Performance Evaluation

Implementation alternatives

Late/early tagging Late/early structuring (No late structuring+early tagging)

DB Result

TAG STRUCTURE

Early tagging and structuring Stored Procedure

Explicitly issue nested queries Get corresponding nested data using

other queries Done outside relational engine. Tag/str

as soon as results are available. Too many queries per tuple. Fixed order (nested loop join)

Contd… Correlated CLOB

Push queries into the engine Plug in XMLAGG, XMLCONSTRUCT support into

engine Have to handle huge CLOBS in the engine Fixed join order

Decorrelated CLOB Decorrelate and use Outer Joins – no longer

fixed order Still carry around CLOBs (due to early

tagging!)

Roadmap Language specification Implementation

Early tagging, structuring Late tagging, structuring Early structure, late tagging

Performance Evaluation

Late tagging and Structuring2 phases -> Content creation +

Tagging/Structuring Redundant Relation:

Blindly join all constituent tables ‘Parent’ data repeated

Unsorted Outer Union: Decorrelate query, compute common

subexpressions and use Outer Joins Take an Outer Union of result tables Columns grow with width/depth of XML

doc. - Path Outer Union

Contd… Alternatively, don’t repeat Parent

node at every child. Feed parent into Outer Union and

only keep parent Ids with children. – Node Outer Union

Greatly increases no of tuples generated

Outer UnionNote: Outer Joins to retain parents.

OU• Separate column in result for each column of input• Unused cols set to NULL• Type column added for each row.

Contd…2 phases -> Content creation +

Tagging/Structuring Inside the Engine

XMLAGG, XMLCons support required Final step after content generated CLOBs not carried around

Outside the Engine GROUP Siblings Eliminate Duplicates Extract info and TAG

Grouping data

HASH!Every row in the final table has a column with name of

element with all parents (a.b.c.d.e) Check if <a.b.c.d> hashes true, TAG accordingly and

add as another child at that level Else check if <a.b.c> hashes true, add <a.b.c.d> and

then <a.b.c.d.e> And so on…till you either find hash or hit root element. Tuples can come in any order. Sufficient mem

required!

Roadmap Language specification Implementation

Early tagging, structuring Late tagging, structuring Early structure, late tagging

Performance Evaluation

Late tagging, Early Structuring As before…only, now SORT the

outer union Ensure

Parent info comes before child Info about node and desc. completed

before any other node info starts Ordering follows user-def condns

Sort and TagSort on Pkeys Define an order on Pkeys (CustID, AcId, POId,

ItemId, PaymentID) – based on structure of XML Doc.

Parent tuples will have filled values for first few cols and null for the later ones

Nulls sort lowTag in constant memory Maximum amount of info to be stored is

proportional only to the depth of the XML Doc.

Roadmap Language specification Implementation

Early tagging, structuring Late tagging, structuring Early structure, late tagging

Performance Evaluation

Modeling transformations Query fanout Query nesting depth

# Root nodes (tuples in the root table)

# Leaf tuples (tuples corresp to all leaf nodes)

Structure

Result size

Results (graph time!!)

• “Inside the engine” versions are about 3 times as fast as “outside” counterparts

Inside –vs- Outside the engine

• Query Execution• Bind out • Tag/Structure• Write XML to file

Bind-out time not required for Inside Engine approaches

Query Fan out Increasing QFO -> Greater Joins ->

More time CorrCLOB has to use Nested Loop Join

Order – bad performance Unsorted OU better than Sorted OU.

Sorting cost > Cost of complex tagging DecorrCLOB – optimized by DB2

engine. CLOBs retained in memory (low fanout)

Query DepthDecorrCLOB – huge increase!• Complexity of queries increases. • Engine makes bad choices (sorting after XMLAGG etc)

Number of Roots• Outer Union approaches not affected• CorrCLOB at #root=1 equiv to just 2 queries!

Number of tuples, Memory If sufficient memory, no great

changes! If not, Unsorted OU which requires

large space for tagging, fails. Overflow!

Sorted OU – based on scalable sorting. Adapt to large size and less mem better.

Roadmap Language specification Implementation

Early tagging, structuring Late tagging, structuring Early structure, late tagging

Performance Evaluation Quo vadis?

top related