![Page 1: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/1.jpg)
XML: Data Driving Business?
Laks V.S.Lakshmanan,
IIT Bombay and Concordia University
![Page 2: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/2.jpg)
XML: Data Model
• What is an XML document?
– Linearization of a tree structure
– Every node of the tree can have several character strings associated with it
– The information content of the document is the tree structure together with the character strings
Is XML just a syntax for data interchange and serialization?
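The tree view of a document can be sketched in a few lines of Python. This is only an illustration using the standard library's `xml.etree.ElementTree`; the sample document and the helper name `outline` are assumptions made for the example, not part of the original slides.

```python
import xml.etree.ElementTree as ET

def outline(node, depth=0):
    """Return one line per element; indentation shows depth in the tree,
    and the text after the colon is the node's own character data."""
    text = (node.text or "").strip()
    lines = ["  " * depth + node.tag + (": " + text if text else "")]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines

# Hypothetical sample document: the linear text is parsed back into a tree.
doc = ("<book><title>Data on the Web</title>"
       "<author>Abiteboul</author><author>Buneman</author></book>")
tree_lines = outline(ET.fromstring(doc))
print("\n".join(tree_lines))
```

The printed outline makes the point of the slide visible: the information content is the nesting plus the character strings attached to the nodes.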
![Page 3: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/3.jpg)
XML: Data Model
Types of nodes
• Element, e.g. <p a1="A1" . . . an="An">c1 . . . cm</p>
• Document, e.g. <!DOCTYPE name [markup declarations]>
• Processing instruction, e.g. <?xml version="1.0"?>
• Comment, e.g. <!--This is a comment-->
• Atomic data, e.g. <Data>
![Page 4: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/4.jpg)
What is a DTD?
• A Document Type Definition (DTD) serves as a grammar
• A document type definition specifies:
– the elements that are permissible in a document of this type
– for each element, the possible attributes, their range of values and defaults
– for each element, the structure of its contents, including:
• which element can occur and in what order
• whether text characters can occur
![Page 5: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/5.jpg)
Example of a DTD
E.g.:
<!DOCTYPE Bookslist [
<!ELEMENT Bookslist (book)*>
<!ELEMENT book (title, author*, publisher)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
]>
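To make the "DTD as grammar" idea concrete, the content models of the Bookslist DTD can be read as regular expressions over the sequence of child element names. The sketch below is an assumption-laden illustration, not a DTD validator: the regular expressions are written by hand, and `CONTENT_MODELS` and `conforms` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the Bookslist DTD, rewritten by hand as regular
# expressions over a space-terminated sequence of child tag names.
CONTENT_MODELS = {
    "Bookslist": r"(book )*",
    "book": r"title (author )*publisher ",
}

def conforms(element):
    """Check an element and all its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        # title, author, publisher are #PCDATA: no child elements allowed.
        return len(element) == 0
    children = "".join(child.tag + " " for child in element)
    return (re.fullmatch(model, children) is not None
            and all(conforms(child) for child in element))

valid = ET.fromstring(
    "<Bookslist><book><title>T</title><author>A</author>"
    "<publisher>P</publisher></book></Bookslist>")
# Same elements, but author appears before title, which the grammar forbids.
invalid = ET.fromstring(
    "<Bookslist><book><author>A</author><title>T</title>"
    "<publisher>P</publisher></book></Bookslist>")
print(conforms(valid), conforms(invalid))
```

The second document is well formed but not valid, because the order of children matters in a DTD content model.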
![Page 6: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/6.jpg)
XML and DTD
• Well-formed documents
– Tags should be nested properly and attributes should be unique.
• Valid documents
– Well-formed documents that conform to a Document Type Definition (DTD)
• DTDs are used to
– Constrain structure
– Declare entities
– Provide some default values for attributes
![Page 7: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/7.jpg)
DTD Limitations
• Too document-oriented
• Too simple and too complicated at the same time
• Too limited to represent complex structures
• IDREFs are not typed
• No notion of inheritance or sub-typing
• Too many ways to represent the same thing
• Names are global, not local
![Page 8: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/8.jpg)
DTD vs. Database Schema
• Order is significant in a DTD but not in a DB
• DTD does not provide for data types
• DTD cannot specify keys
![Page 9: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/9.jpg)
XMLSchema
• Why XMLSchema?
– Based on XML syntax
– Can be parsed and manipulated like any XML document
– Supports a variety of data types
– Allows extension of vocabularies and inheritance from elements
– Provides namespace integration
– Provides logical grouping of attributes
![Page 10: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/10.jpg)
XMLSchema: An example
<datatype name="PriceType">
  <basetype name="decimal"/>
  <minExclusive>0.00</minExclusive>
  <scale>2</scale>
</datatype>
<element name="price" type="PriceType"></element>
<element name='Person'> ... </element>
<element name='Employee'>
<refines name='Person'/> ...
</element>
![Page 11: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/11.jpg)
XMLSchema vs. DTD

| | DTD | XMLSchema |
| --- | --- | --- |
| Syntax | Specialized | Same as XML |
| Compactness | Compact | Verbose |
| Data types | Strings | Variety of types |
| Data model | Closed | Open |
| Namespace integration | Primitive | Full-fledged |
| Attribute grouping | Not supported | Supported |
![Page 12: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/12.jpg)
XML Data
• Superset of XMLSchema
• Can express database relationships too
• E.g.:
<elementType id="booktable">
  <element id="titleID" type="#title"/>
  <element type="#author"/>
  <element type="#pages"/>
  <key id="bookkey"> <keyPart href="#titleID"/> </key>
</elementType>
![Page 13: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/13.jpg)
Semistructured data
• Data that is neither raw nor as strictly typed as in databases
• Examples of semistructured data
– HTML file with one entry per restaurant that provides info on prices, addresses, styles
– BibTeX files
– Genome and scientific databases
– Online documentation
![Page 14: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/14.jpg)
Semistructured data: Main aspects
• Structure
– Irregular
– Implicit
– Partial
• Schema
– Very large
– Rapidly evolving
– Distinction between data and schema is blurred
![Page 15: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/15.jpg)
Semistructured data: Data model
• Object Exchange Model (OEM)
– Lightweight and flexible
– Data representation
• As a graph with objects as vertices and labels on edges
• Each object has a unique object identifier
• Some objects are atomic, e.g., integer, real,…
• Complex objects have value as set of object references
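The OEM description above maps naturally onto a small data structure. The following is a minimal sketch under stated assumptions: the class `OEMObject`, the dictionary `db`, the helper `follow` and the sample objects are hypothetical and chosen only to illustrate the four bullets.

```python
from dataclasses import dataclass, field
from typing import Union

@dataclass
class OEMObject:
    oid: str                                     # unique object identifier
    value: Union[int, float, str, None] = None   # set only for atomic objects
    refs: list = field(default_factory=list)     # (label, oid) pairs for complex objects

    @property
    def is_atomic(self):
        return not self.refs

# Hypothetical database: a graph whose vertices are objects and whose
# edges carry labels.
db = {o.oid: o for o in [
    OEMObject("&o1", refs=[("paper", "&o2")]),
    OEMObject("&o2", refs=[("title", "&o3"), ("year", "&o6")]),
    OEMObject("&o3", value="The Calculus"),
    OEMObject("&o6", value=1986),
]}

def follow(oid, label):
    """Return the objects reached from `oid` along edges labelled `label`."""
    return [db[target] for (lbl, target) in db[oid].refs if lbl == label]

print(follow("&o2", "title")[0].value)
```

Because labels live on edges rather than in a separate schema, two objects of the "same kind" are free to have different sets of outgoing labels.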
![Page 16: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/16.jpg)
OEM: An example
![Page 17: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/17.jpg)
Semistructured data: Query Languages
• Lorel
– Based on OQL
– E.g.:
select author: X
from biblio.book.author X
• Computes the set of book authors
• Forms a new node and connects it with edges labelled author to nodes resulting from evaluation of the path expression
![Page 18: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/18.jpg)
Lorel: Salient features
• Coercion
– Forces comparison operators to handle comparisons between objects of different types, such as a string and an integer
• E.g.:
select row: X
from biblio.paper X
where X.year = 1998
==> year could be stored as a string or as an integer
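The effect of coercion can be imitated in Python. This is a rough sketch, not Lorel's actual coercion rules: the function `coerced_equal` and the sample `papers` list are assumptions made for illustration.

```python
def coerced_equal(left, right):
    """Compare two atomic values, trying a numeric reading of both
    before falling back to a plain string comparison."""
    try:
        return float(left) == float(right)
    except (TypeError, ValueError):
        return str(left) == str(right)

# Hypothetical data: year is stored as an integer, a string, or free text.
papers = [
    {"title": "A", "year": 1998},
    {"title": "B", "year": "1998"},
    {"title": "C", "year": "unknown"},
]
rows = [p["title"] for p in papers if coerced_equal(p["year"], 1998)]
print(rows)
```

Without coercion the second paper would be missed, even though a human reader would consider it a match.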
![Page 19: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/19.jpg)
Lorel: Salient Features
• Path expressions
• Data model allows arbitrary nesting
• Queries should hence be able to probe arbitrary depth
• Provided by path expressions
• Eg.
select title:t
from chapter(.section)* s, s.title t
where t like "*XML*"
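The path expression `chapter(.section)*` reaches sections at any nesting depth. A minimal Python sketch of the same traversal, assuming a hypothetical sample chapter and using a plain substring test in place of `like "*XML*"`:

```python
import xml.etree.ElementTree as ET

def sections_star(node):
    """Yield `node` and everything reachable from it through zero or
    more `section` edges, mirroring (.section)*."""
    yield node
    for child in node.findall("section"):
        yield from sections_star(child)

# Hypothetical chapter with sections nested two levels deep.
chapter = ET.fromstring(
    "<chapter><title>Intro</title>"
    "<section><title>Why XML</title>"
    "<section><title>XML vs. HTML</title></section></section>"
    "<section><title>History</title></section></chapter>"
)
titles = [t.text
          for s in sections_star(chapter)
          for t in s.findall("title")
          if "XML" in t.text]
print(titles)
```

The recursion is what a fixed-depth path such as `chapter.section.title` cannot express.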
![Page 20: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/20.jpg)
UnQL
• Based on an edge-labeled graph model
• Coercion not supported
• More precise knowledge of data needed
• Pattern usage
– E.g.:
select title: X
where {biblio: {paper: {title: X, year: Y}}} in db, Y > 1998
![Page 21: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/21.jpg)
UnQL
• Path variables
– Paths can also be used as data
– E.g.:
select @P
from db1 @P.X
where matches(".*(U|u)biquitin.*", X)
==> Determines where the string "ubiquitin" appears in db1
![Page 22: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/22.jpg)
Semistructured vs. XML
• Both are schema-less, self-describing
• XML is ordered; semistructured data is not
• XML can mix text and elements
• XML has lots of other stuff: entities, processing instructions, comments
![Page 23: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/23.jpg)
Requirements of an XML Query Language
• XML output
• Server-side processing
• Query operations
– Selection, extraction, reduction, restructuring, combination
• No schema required
• Exploit available schema
• Preserve order and association
• Programmatic manipulation
![Page 24: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/24.jpg)
Requirements of an XML Query Language
• XML representation
• Mutual embedding with XML
• XLink and XPointer cognizant
• Support for new data types
• Suitable for metadata
![Page 25: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/25.jpg)
XML Query Languages
• XQL
• XML-QL
• Quilt
![Page 26: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/26.jpg)
XQL
• Simple expressions
– //product[@maker='BSA']: all products whose attribute maker is 'BSA'
• Filters
– author/address[@type='email']: address nodes whose attribute type is 'email'
• Subscripts
– section[1,3 to 5]: section nodes at positions 1, 3, 4, 5
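Python's `xml.etree.ElementTree` supports a small XPath subset that is close to the simple XQL expressions above, so they can be tried out directly. The documents below are hypothetical, and the subscript list `[1,3 to 5]` is emulated with ordinary list indexing since ElementTree has no equivalent syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; one product is nested more deeply than the others.
doc = ET.fromstring(
    "<catalog>"
    "<product maker='BSA'><name>Bike</name></product>"
    "<product maker='Acme'><name>Anvil</name></product>"
    "<group><product maker='BSA'><name>Bell</name></product></group>"
    "</catalog>"
)
# Counterpart of //product[@maker='BSA']
bsa = [p.findtext("name") for p in doc.findall(".//product[@maker='BSA']")]

# Counterpart of section[1,3 to 5]: XQL positions are 1-based.
book = ET.fromstring(
    "<book>" + "".join(f"<section n='{i}'/>" for i in range(1, 7)) + "</book>")
sections = book.findall("section")
picked = [sections[i - 1].get("n") for i in (1, 3, 4, 5)]
print(bsa, picked)
```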
![Page 27: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/27.jpg)
XQL
• Supports boolean and set operators
– q1 and q2
– q1 union q2
• Grouping
– //invoice{q1}: groups the results of q1 by invoice
• Sequence
– a before b
• Others: node(), text(), ...
![Page 28: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/28.jpg)
XQL: Limitations
• Flattening
– Not possible, as the results of patterns and filters are not modeled by an intermediate relation
• Restructuring
– Not possible, as flattening is not permitted
• Tag variables
– Not supported
• Sorting
![Page 29: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/29.jpg)
XML Query Languages
• XQL
• XML-QL
• Quilt
![Page 30: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/30.jpg)
XML-QL
• Simple examples
WHERE <book>
        <publisher> <name>Addison-Wesley</name> </publisher>
        <title> $t </title>
        <author> $a </author>
      </book> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <author> $a </author>
            <title> $t </title>
          </result>
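The WHERE/CONSTRUCT pair can be read as "bind variables by pattern matching, then build one result per binding". A rough Python counterpart of the query above, with a hypothetical in-memory bibliography standing in for "www.a.b.c/bib.xml":

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliography (the URL in the query is not fetched here).
bib = ET.fromstring("""
<bib>
  <book><publisher><name>Addison-Wesley</name></publisher>
        <title>T1</title><author>A1</author><author>A2</author></book>
  <book><publisher><name>Other Press</name></publisher>
        <title>T2</title><author>A3</author></book>
</bib>""")

results = ET.Element("results")
for book in bib.findall("book"):
    # WHERE: the pattern fixes the publisher name ...
    if book.findtext("publisher/name") != "Addison-Wesley":
        continue
    for title in book.findall("title"):          # ... and binds $t
        for author in book.findall("author"):    # ... and $a
            # CONSTRUCT: one <result> per ($t, $a) binding
            result = ET.SubElement(results, "result")
            ET.SubElement(result, "author").text = author.text
            ET.SubElement(result, "title").text = title.text

pairs = [(r.findtext("author"), r.findtext("title")) for r in results]
print(pairs)
```

Note that a book with two authors yields two results, because each binding of ($t, $a) is a separate tuple.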
![Page 31: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/31.jpg)
XML-QL
• Grouping
WHERE <book> $p </> IN "www.a.b.c/bib.xml",
      <title> $t </>,
      <publisher> <name>Addison-Wesley</> </publisher> IN $p
CONSTRUCT <result>
            <title> $t </>
            WHERE <author> $a </> IN $p
            CONSTRUCT <author> $a </>
          </>
==> Groups by title.
![Page 32: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/32.jpg)
XML-QL
• Tag variables
WHERE <$p>
        <title> $t </title>
        <year> 1995 </>
        <$e> Smith </>
      </> IN "www.a.b.c/bib.xml",
      $e IN {author, editor}
CONSTRUCT <$p>
            <title> $t </title>
            <$e> Smith </>
          </>
==> List of books where Smith is either author or editor
![Page 33: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/33.jpg)
XML-QL
• Regular path expressions
WHERE <part*>
        <name> $r </>
        <brand> Ford </>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result> $r </>
==> Gets the names of parts regardless of how deeply the parts are nested in the document.
![Page 34: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/34.jpg)
XML-QL
• Skolem functions
WHERE <$>
        <author> <firstname> $fn </> <lastname> $ln </> </>
        <title> $t </>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT <person ID=PersonID($fn, $ln)>
            <firstname> $fn </>
            <lastname> $ln </>
            <publicationtitle> $t </>
          </>
==> PersonID is a Skolem function.
It generates a new ID for each distinct value of ($fn, $ln); for a value seen before, the constructed content is appended to the existing node.
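The behaviour of a Skolem function can be imitated with a dictionary keyed on its arguments. This is only a sketch: the input `records`, the ID format `p1`, `p2`, ... and the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical (firstname, lastname, title) bindings from the WHERE clause.
records = [
    ("Ada", "Smith", "Paper 1"),
    ("Bob", "Jones", "Paper 2"),
    ("Ada", "Smith", "Paper 3"),
]

people = ET.Element("people")
by_key = {}   # plays the role of PersonID: one node per distinct ($fn, $ln)

for fn, ln, title in records:
    person = by_key.get((fn, ln))
    if person is None:
        # First time this (fn, ln) pair is seen: create a new node and ID.
        person = ET.SubElement(people, "person", ID=f"p{len(by_key) + 1}")
        ET.SubElement(person, "firstname").text = fn
        ET.SubElement(person, "lastname").text = ln
        by_key[(fn, ln)] = person
    # In every case the publication is appended to that person's node.
    ET.SubElement(person, "publicationtitle").text = title

print(ET.tostring(people, encoding="unicode"))
```

The result has one person element per distinct name, each collecting all of that person's publication titles.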
![Page 35: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/35.jpg)
XML-QL
• Allows integrating data from multiple sources
• Can query order as well
• Provides for embedding query within data
• Allows function definitions
• Is relationally complete
![Page 36: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/36.jpg)
XML-QL
• Is everything fine?
– Pattern specifications are too verbose
– Result of the WHERE clause is a relation composed of scalar values
• So it cannot preserve information about hierarchy and sequence
• Hence it cannot handle hierarchy- and sequence-related queries
![Page 37: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/37.jpg)
XML Query Languages
• XQL
• XML-QL
• Quilt
![Page 38: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/38.jpg)
Quilt
• Combines strengths of XML-QL and XQL
• Derives ability to navigate and select nodes based on sequence from XQL
• Binding of variables done like in XML-QL
![Page 39: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/39.jpg)
Quilt
• An example
FOR $b IN //book
WHERE exists($b/title) AND
NOT exists($b/author)
RETURN $b/title
==> Lists the titles of books that have no author information
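For comparison, the same FOR / WHERE / RETURN shape can be written as a Python comprehension over `xml.etree.ElementTree`. The sample document is hypothetical; `doc.iter("book")` plays the role of `//book`.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one book with an author, one without,
# and one with an author but no title.
doc = ET.fromstring(
    "<bib>"
    "<book><title>With author</title><author>X</author></book>"
    "<shelf><book><title>Anonymous</title></book></shelf>"
    "<book><author>Y</author></book>"
    "</bib>"
)

titles = [
    t.text
    for b in doc.iter("book")                      # FOR $b IN //book
    if b.find("title") is not None                 # WHERE exists($b/title)
    and b.find("author") is None                   #   AND NOT exists($b/author)
    for t in b.findall("title")                    # RETURN $b/title
]
print(titles)
```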
![Page 40: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/40.jpg)
Quilt
Flow of data in a Quilt expression:
XML input -> FOR/LET -> tuples of bound variables -> WHERE -> selected tuples -> RETURN -> XML output
![Page 41: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/41.jpg)
Quilt: Filtering Documents
• Need to preserve the relationships among selected elements
• E.g.: [Figure: a tree with nodes labelled A, B and C, and the tree that results from applying filter = A|B]
![Page 42: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/42.jpg)
Quilt
• Can perform sorting
• Aggregation provided
• Allows recursive functions
![Page 43: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/43.jpg)
Quilt: The real power of it
• Sample document
<section>
<section.title>Procedure</section.title>
The patient was taken to the operating room where she was placed in a supine position and
<Anesthesia>induced under general anesthesia. </Anesthesia>
<Prep> <action>Foley catheter was placed to decompress the bladder</action> and the abdomen was then prepped and draped in sterile fashion. </Prep>
<Incision> A curvilinear incision was made <Geography>in the midline immediately infraumbilical</Geography> and the subcutaneous tissue was divided <Instrument>using electrocautery.</Instrument> </Incision>
The fascia was identified and <action>#2 0 Maxon stay sutures were placed on each side of the midline.</action>
<Incision> The fascia was divided using <Instrument>electrocautery</Instrument> and the peritoneum was entered. </Incision>
<Observation>The small bowel was identified</Observation> and
<action> the <Instrument>Hasson trocar</Instrument> </action>
:
</section>
![Page 44: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/44.jpg)
Quilt: The real power of it
• In each section with title "Procedure", what Instruments were used in the second Incision?
FOR $s IN //section[section.title="Procedure"]
RETURN ($s//Incision)[2]/Instrument
• In each section with title "Procedure", what are the first two instruments to be used?
FOR $s IN //section[section.title="Procedure"]
RETURN ($s//Instrument)[1-2]
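Both queries depend on document order, which a tree API preserves. The sketch below evaluates them in Python over a shortened, hypothetical version of the sample section; the variable names are assumptions, and the Quilt subscripts are emulated with list indexing.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the sample section.
root = ET.fromstring(
    "<section><section.title>Procedure</section.title>"
    "<Incision>A curvilinear incision was made "
    "<Instrument>using electrocautery.</Instrument></Incision>"
    "<Incision>The fascia was divided using "
    "<Instrument>electrocautery</Instrument></Incision>"
    "<action>the <Instrument>Hasson trocar</Instrument></action>"
    "</section>"
)

second_incision, first_two = [], []
for s in root.iter("section"):
    # Keep only sections whose section.title child reads "Procedure".
    if not any(c.tag == "section.title" and c.text == "Procedure" for c in s):
        continue
    incisions = list(s.iter("Incision"))        # $s//Incision in document order
    if len(incisions) >= 2:
        # ($s//Incision)[2]/Instrument
        second_incision += [i.text for i in incisions[1].findall("Instrument")]
    # First two instruments anywhere below the section, in document order.
    first_two += [i.text for i in list(s.iter("Instrument"))[:2]]

print(second_incision, first_two)
```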
![Page 45: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/45.jpg)
Quilt: The real power of it
• In the first procedure, what happened between the first incision and the second incision?
FOR $proc IN //section[section.title="Procedure"][1],
$bet IN $proc//((* AFTER ($proc//Incision)[1]) BEFORE ($proc//Incision)[2])
RETURN $bet
![Page 46: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/46.jpg)
XML Storage
• Text files
– Simple
– Would require a special-purpose query processor
• Relational databases
– Ternary relations [Florescu et al]
– Inlining methods [Shanmugasundaram et al]
– STORED [Mary Fernandez]
![Page 47: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/47.jpg)
XML Storage
• Object-oriented databases [Sophie Cluet et al]
• Native storage
![Page 48: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/48.jpg)
XML Storage
• Using ternary relations
– Edge labels are maintained in a table together with the object ids that the edge connects
– Values of leaf nodes are stored in another table
![Page 49: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/49.jpg)
Store XML in Ternary Relation
[Figure: data graph with objects &o1 to &o6; edge labels paper, title, author, author, year; leaf values "The Calculus", "…", "…", "1986"]

Ref

| Source | Label | Dest |
| --- | --- | --- |
| &o1 | paper | &o2 |
| &o2 | title | &o3 |
| &o2 | author | &o4 |
| &o2 | author | &o5 |
| &o2 | year | &o6 |

Val

| Node | Value |
| --- | --- |
| &o3 | The Calculus |
| &o4 | … |
| &o5 | … |
| &o6 | 1986 |
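The two tables can be loaded into any relational engine, and a path query then becomes a chain of self-joins on Ref. Below is a minimal sketch using Python's built-in `sqlite3`; the column names follow the tables, the author rows are left out because their values are elided in the slides, and the SQL itself is an illustration rather than the translation used by Florescu et al.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ref (source TEXT, label TEXT, dest TEXT);
CREATE TABLE Val (node TEXT, value TEXT);
INSERT INTO Ref VALUES ('&o1', 'paper', '&o2'),
                       ('&o2', 'title', '&o3'),
                       ('&o2', 'year',  '&o6');
INSERT INTO Val VALUES ('&o3', 'The Calculus'),
                       ('&o6', '1986');
""")

# The path paper.title needs one copy of Ref per edge in the path,
# plus a join with Val to fetch the leaf value.
titles = conn.execute("""
SELECT v.value
FROM Ref p
JOIN Ref t ON t.source = p.dest
JOIN Val v ON v.node = t.dest
WHERE p.label = 'paper' AND t.label = 'title'
""").fetchall()
print(titles)
```

The example also shows the cost of this scheme: a path of length n needs n joins, which is one motivation for the inlining methods.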
![Page 50: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/50.jpg)
XML Storage
• DTDs converted into DTD graph
• Inlining methods
– Basic inlining
– Shared inlining
– Hybrid inlining
![Page 51: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/51.jpg)
Corresponding DTD graph
![Page 52: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/52.jpg)
Element graph for Editor Element
![Page 53: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/53.jpg)
XML Storage
• Basic inlining
– For each node in the DTD graph a relation is created
– Creates a large number of relations
![Page 54: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/54.jpg)
Relations created using Basic inlining
![Page 55: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/55.jpg)
XML Storage
• Shared inlining
– Create relations for elements with in-degree > 1
– An element node is represented in exactly one relation
– For mutually recursive elements, make one of them a separate relation
![Page 56: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/56.jpg)
Relations created using shared inlining
![Page 57: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/57.jpg)
XML Storage
• Hybrid inlining
– Like shared inlining, but also inlines elements with in-degree > 1 that are not recursive or reached through a "*" node
![Page 58: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/58.jpg)
Relations created using hybrid inlining
![Page 59: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/59.jpg)
XML Storage
• STORED
• Uses a query language to specify mappings.
• Mappings are generated using mining algorithms
• Nonconforming data is stored in overflow graphs.
![Page 60: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/60.jpg)
XML Storage
• STORED (contd.)
• Given a data instance D, a STORED query is generated automatically.
FROM Audit.taxpayer:$X{name:$N, phone:$P1,
optional{phone:$P2}}
STORE R1($X,$N,$P1,$P2)
• Given relational mappings, generate explicit overflow mappings so that the query is lossless.
![Page 61: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/61.jpg)
XML Storage
• Object-oriented method
• Using DTD a hierarchy of the elements is obtained
• Each element is now modeled as a class
• For handling “*” of DTD a list of objects is maintained
• To handle union types(Eg., phone|email) new class can be introduced
![Page 62: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/62.jpg)
XML Storage
• eXcelon way
– eXcelon XML Data Engine is a high performance XML data management engine
– Based on ObjectStore DBMS
– When XML data gets parsed in eXcelon, it is represented in XMLStore as discrete XML elements.
– The hierarchical structure of XML is therefore preserved in its persistent representation
![Page 63: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/63.jpg)
XML Algebra
Why yet another algebra?
– Structure of data
• Deeply structured
• Exact structure not specific
– Recursion
• Structurally recursive
Proposed Algebra: Too much stress on type conformance
![Page 64: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/64.jpg)
XML Algebra
• Sample data
<bib>
<book>
<title>Data on the Web</title>
<year>1999</year>
<author>Abiteboul</author>
<author>Buneman</author>
</book>
<book>
<title> XML Query</title>
<year>2000</year>
<author>Mary</author>
</book>
</bib>
![Page 65: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/65.jpg)
XML Algebra
type Bib = bib [ Book{0,*} ]
type Book = book [
title [String ],
year [Integer],
author[ String]{1,*}
]
let bib0: Bib = bib [
book [
title [“Data on the Web”], year [1999],
author[“Abiteboul”], author[“Buneman”]
],
book[
title[“XML Query”],year[2000],
author[“Mary”]
]
]
![Page 66: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/66.jpg)
XML Algebra
• Projection
– E.g.: project book(children(bib0))
– Allows a more convenient notation as well (similar to XPath notation)
– E.g.: bib0/book/author
==> author ["Abiteboul"],
author ["Buneman"],
author ["Mary"]
: author [String]{0,*}
![Page 67: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/67.jpg)
XML Algebra
• Selection
E.g.: for b <- bib0/book in
where value(b/year) < 2000 then b
==> book [
title ["Data on the Web"],
year [1999],
author ["Abiteboul"],
author ["Buneman"]
]
: Book{0,*}
![Page 68: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/68.jpg)
XML Algebra
• Join:
type Reviews =
reviews [
book [
title [String],
review [ String]
]{0,*}
]
let review0: Reviews =
reviews [
book [ title ["Data on the Web"],
review ["A fine book"]
],
book [ title ["XML Query"],
review ["This is great"]
]
]
![Page 69: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/69.jpg)
XML Algebra
• Join
for b <- bib0/book in
for r <- review0/book in
where value(b/title) = value(r/title) then
book [ b/title, b/author, r/review ]
==> book [
title ["Data on the Web"],
author ["Abiteboul"], author ["Buneman"],
review ["A fine book"]
],
![Page 70: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/70.jpg)
XML Algebra
• Join (contd.)
book [
title[“XML Query”],
author[“Mary”],
review[“This is great”]
]
: book[
title[String ],
author[String]{1,*},
review[String]
]{0,*}
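The join is a pair of nested iterations with an equality test on titles. A Python sketch of the same computation follows; the review data here is an assumption, chosen so that titles match the bibliography exactly and the output agrees with the result shown on the slides.

```python
import xml.etree.ElementTree as ET

bib0 = ET.fromstring(
    "<bib>"
    "<book><title>Data on the Web</title><year>1999</year>"
    "<author>Abiteboul</author><author>Buneman</author></book>"
    "<book><title>XML Query</title><year>2000</year>"
    "<author>Mary</author></book>"
    "</bib>"
)
# Assumed review data (titles spelled exactly as in bib0).
review0 = ET.fromstring(
    "<reviews>"
    "<book><title>Data on the Web</title><review>A fine book</review></book>"
    "<book><title>XML Query</title><review>This is great</review></book>"
    "</reviews>"
)

joined = []
for b in bib0.findall("book"):              # for b <- bib0/book
    for r in review0.findall("book"):       # for r <- review0/book
        if b.findtext("title") == r.findtext("title"):
            # book [ b/title, b/author, r/review ]
            joined.append((b.findtext("title"),
                           [a.text for a in b.findall("author")],
                           r.findtext("review")))
print(joined)
```

Since the join compares title strings for exact equality, even a small spelling difference between the two sources would silently drop a book from the result.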
![Page 71: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/71.jpg)
XML Algebra
• Querying order
– Index function pairs an integer index with each element in a forest
– Eg: index(book0/author)
==> pair[fst[1],snd[author[“Abiteboul”]]],
pair[fst[2],snd [author[“Buneman”]]],
pair[fst[3],snd [author[“Suciu”]]]
:pair[fst[Integer],snd[author[String]]]{1,*}
![Page 72: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/72.jpg)
XML Algebra
• Aggregation
– Has five built-in aggregation functions: avg, count, max, min and sum
– E.g.:
for b <- bib0/book in
where count(b/author) >= 2 then b/title
==> title ["Data on the Web"]
: title{0,*}
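The aggregation query has a direct counterpart in Python, where `len` of a list of matches plays the role of `count`. A small sketch over the sample bibliography (the variable names are assumptions):

```python
import xml.etree.ElementTree as ET

bib0 = ET.fromstring(
    "<bib>"
    "<book><title>Data on the Web</title><year>1999</year>"
    "<author>Abiteboul</author><author>Buneman</author></book>"
    "<book><title>XML Query</title><year>2000</year>"
    "<author>Mary</author></book>"
    "</bib>"
)

# for b <- bib0/book in where count(b/author) >= 2 then b/title
titles = [b.findtext("title")
          for b in bib0.findall("book")
          if len(b.findall("author")) >= 2]
print(titles)
```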
![Page 73: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/73.jpg)
XML Algebra
• Additional features
– Structural recursion
• Recursive types are used to define documents with recursive structure
– Sorting
• sort(pairs)
– Grouping
• Group(pairs)
![Page 74: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/74.jpg)
Kweelt
• Is a framework to query XML data
• An implementation of Quilt
• Architecture :
![Page 75: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/75.jpg)
XML Indexing
[Figure: semistructured data graph with nodes numbered 1 to 13 and edges labelled t, a, b, c, d]
![Page 76: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/76.jpg)
XML Indexing
• Data guides (used in Lore)
• A data guide is a concise and accurate summary of the data graph
[Figure: data guide for the data graph; edges labelled t, a, b, c, d]
![Page 77: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/77.jpg)
XML Indexing
• T-Index
[Figure: T-index for the data graph; edges labelled t, a, b, c, d]
![Page 78: XML: Data Driving Business?](https://reader038.vdocuments.net/reader038/viewer/2022103006/568131f1550346895d985210/html5/thumbnails/78.jpg)
Challenges
• Storage issues
– Relational or native?
• Query optimization
– Query plan?
• Other than queries… say triggers?
• Updates to data
• Mining of XML data