Querying XML streams in DB2

Upload: nadine

Post on 10-Jan-2016

DESCRIPTION

Querying XML streams in DB2. Vanja Josifovski, Marcus Fontoura. Knowledge Management Dept., IBM Almaden Research Center. Agenda: motivation and background (SQL/XML, XPath, XQuery, XML streams); TurboXPath (TXP); TXP role in DB2; design; evaluation results; conclusions and future work.

TRANSCRIPT

  • Querying XML streams in DB2. Vanja Josifovski, Marcus Fontoura. Knowledge Management Dept., IBM Almaden Research Center.

  • Agenda: motivation and background (SQL/XML, XPath, XQuery, XML streams); TurboXPath (TXP); TXP role in DB2; design; evaluation results; conclusions and future work; other research areas.

  • Motivation. Current trends in DBMS: a new XML data type and a set of new XML-related operators; XML-enabled integration system; queries over locally stored XML data and XML data streamed from external sources; Web services and business-to-business applications. Querying XML (streams) is essential.

  • SQL/XML. SQL Part 14: XML-related specifications (SQL/XML), http://www.sqlx.org. New XML data type. Publishing functions: XMLElement, XMLAttribute, XMLAgg. Querying functions: XMLContains, XMLExtract, XMLTable (shred).

  • XPath. XML query language defined by a W3C working group. Operates over a single document (no joins). Single extraction point, returning a node set. XPath examples: //customer; //customer/@id; //customer[birthdate='07/25/1970']/name; //customer[address[state='CA']].
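
The four XPath examples can be tried out with Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. This is a minimal sketch: the sample document, its values and the `customers` root element are assumptions made for illustration, and the attribute step and the nested predicate are emulated in Python because ElementTree does not support them directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the element names come from the slide.
DOC = """
<customers>
  <customer id="c1">
    <name>Ann</name>
    <birthdate>07/25/1970</birthdate>
    <address><state>CA</state></address>
  </customer>
  <customer id="c2">
    <name>Bob</name>
    <birthdate>01/02/1980</birthdate>
    <address><state>NY</state></address>
  </customer>
</customers>
"""
root = ET.fromstring(DOC)

# //customer
customers = root.findall(".//customer")

# //customer/@id (no attribute axis in ElementTree, so read the attribute)
ids = [c.get("id") for c in customers]

# //customer[birthdate='07/25/1970']/name
names = [n.text for n in root.findall(".//customer[birthdate='07/25/1970']/name")]

# //customer[address[state='CA']] (nested predicate applied in Python)
in_ca = [c.get("id") for c in customers
         if c.find("address[state='CA']") is not None]

print(ids, names, in_ca)
```

Each expression returns a single node set from a single document, which is the limitation that XQuery lifts.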

  • XQuery (1/2). Also defined by a W3C working group. Extends XPath for processing several XML documents (joins) and constructing XML results. Can return multiple node sets. FLWR ("flower") is the most common type of expression.

  • XQuery (2/2): XQuery example

    FOR $c IN document("doc1.xml")//customer
    FOR $p IN document("doc2.xml")//profiles[cid=$c/cid()]
    LET $o := $c/order
    WHERE $o/date = '12/12/01'
    RETURN {$c/name} {$p/status} {$o/amount}
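
One way to read the FLWR expression is as two nested loops with a filter. The sketch below spells that reading out in Python. The sample documents are invented, `$c/cid()` is assumed to mean the customer's `cid` child, and the result is returned as plain tuples because the element constructors of the RETURN clause did not survive in the transcript.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for doc1.xml and doc2.xml.
DOC1 = ET.fromstring(
    "<customers>"
    "<customer><cid>1</cid><name>Ann</name>"
    "<order><date>12/12/01</date><amount>30</amount></order></customer>"
    "<customer><cid>2</cid><name>Bob</name>"
    "<order><date>11/11/01</date><amount>5</amount></order></customer>"
    "</customers>"
)
DOC2 = ET.fromstring(
    "<all>"
    "<profiles><cid>1</cid><status>gold</status></profiles>"
    "<profiles><cid>2</cid><status>new</status></profiles>"
    "</all>"
)

def flwr(doc1, doc2):
    results = []
    for c in doc1.iter("customer"):                    # FOR $c
        for p in doc2.iter("profiles"):                # FOR $p
            if p.findtext("cid") != c.findtext("cid"):  # [cid = $c/cid]
                continue
            orders = c.findall("order")                # LET $o (whole sequence)
            # WHERE $o/date = '12/12/01' is true if any date matches.
            if not any(d.text == "12/12/01"
                       for o in orders for d in o.findall("date")):
                continue
            amounts = [o.findtext("amount") for o in orders]
            results.append((c.findtext("name"), p.findtext("status"), amounts))
    return results

print(flwr(DOC1, DOC2))
```

The inner loop is a join between two documents, which plain XPath cannot express.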

  • XML Streams. Applications need to store XML documents in relational databases, either as XML or as relational data. Example: Web services.

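  • TXP role in DB2 (1/3)

    [Diagram: TXP sits between three XML sources and the XML-enabled runtime. Sources: XML storage (XPath-based interface, XML indexing), textual XML, and XML streams (Web services). The runtime passes XPath/XQuery and a context to TXP; TXP returns XML fragments/column values.]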

  • TXP role in DB2 (2/3). Table accesses in traditional query evaluation pipelines. Returns virtual tables of XML columns. Example:

    FOR $c IN document("doc1.xml")//customer
    FOR $p IN document("doc2.xml")//profiles[cid=$c/cid()]
    LET $o := $c/order
    WHERE $o/date = '12/12/01'
    RETURN {$c/name} {$p/status} {$o/amount}

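  • TXP role in DB2 (3/3)

    [Diagram: query plan for the example. Labels: columns cid, name, amount; join condition cid = cid; columns status, name, amount; XML generation operators over status, name, amount.]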
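
The plan on this slide can be imitated with ordinary relational operators once the XML has been turned into virtual tables. The following is only a sketch of that idea: the two input tables stand in for what the TXP table accesses might return, the rows are invented, and the `result` element name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical virtual tables produced by two TXP table accesses.
customers = [(1, "Ann", "30"), (2, "Bob", "5")]   # (cid, name, amount)
profiles = [(1, "gold"), (3, "new")]              # (cid, status)

def join_and_publish(customers, profiles):
    # Build side of a hash join on cid.
    by_cid = {}
    for cid, status in profiles:
        by_cid.setdefault(cid, []).append(status)

    published = []
    for cid, name, amount in customers:           # probe side: cid = cid
        for status in by_cid.get(cid, []):
            row = ET.Element("result")            # XML generation step
            for tag, value in (("name", name), ("status", status),
                               ("amount", amount)):
                ET.SubElement(row, tag).text = value
            published.append(ET.tostring(row, encoding="unicode"))
    return published

print(join_and_publish(customers, profiles))
```

The point of the design is that the join itself is a standard relational operator; only the leaves (table accesses) and the top (XML generation) are XML-specific.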

  • TurboXPath (TXP). Processing of multiple XPath expressions: one pass over the XML document; document-order (pre-order) traversal; no need to build a DOM tree in memory; results emitted as found in the document. Efficient over XML streams and pre-parsed XML documents.
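
To make the one-pass idea concrete, here is a small SAX handler that evaluates several path expressions during a single document-order traversal without building a tree. It is not TXP: it only handles absolute child-only paths, it assumes that matched elements contain text only, and the class name, function name and sample data are made up for this sketch.

```python
import xml.sax

class MultiPathHandler(xml.sax.ContentHandler):
    """Collect (path, text) for every element whose absolute path is requested."""

    def __init__(self, paths):
        super().__init__()
        self.paths = set(paths)
        self.stack = []        # tags of the currently open elements
        self.text = None       # text pieces of the matched element, if any
        self.results = []      # emitted in document order

    def startElement(self, name, attrs):
        self.stack.append(name)
        if "/" + "/".join(self.stack) in self.paths:
            self.text = []

    def characters(self, content):
        if self.text is not None:
            self.text.append(content)

    def endElement(self, name):
        path = "/" + "/".join(self.stack)
        if self.text is not None and path in self.paths:
            self.results.append((path, "".join(self.text)))
            self.text = None
        self.stack.pop()

def extract(xml_bytes, paths):
    handler = MultiPathHandler(paths)
    xml.sax.parseString(xml_bytes, handler)
    return handler.results

DOC = (b"<lib><book><title>T1</title><author>A1</author></book>"
       b"<book><title>T2</title></book></lib>")
print(extract(DOC, ["/lib/book/title", "/lib/book/author"]))
```

Memory use depends on the depth of the document and the size of one matched value, not on the size of the whole input.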

  • TXP Features (1/2). Forward axes (child /, descendant //). Backward axes (parent .. and ancestor): query rewrites over streams. Predicates (Boolean and positional): /a/b[c + d > 5 or .//e]; //a[5] (currently being implemented). Any node test: //contributors/*/name.
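
The Boolean predicate example combines arithmetic over child values with an existence test on descendants. The sketch below evaluates /a/b[c + d > 5 or .//e] over an in-memory tree, purely to show what the predicate means; it is not a streaming evaluation. It follows the XPath 1.0 rule that a node set used as a number takes its first node and that a missing operand yields NaN, and it simplifies the string value of an element to its direct text. The helper names and sample data are assumptions.

```python
import math
import xml.etree.ElementTree as ET

def number(elem, tag):
    """Numeric value of the first child named tag, or NaN if absent or non-numeric."""
    child = elem.find(tag)
    try:
        return float(child.text) if child is not None else math.nan
    except (TypeError, ValueError):
        return math.nan

def select_b(xml_text):
    root = ET.fromstring(xml_text)
    if root.tag != "a":          # /a must be the document element
        return []
    return [b for b in root.findall("b")
            if number(b, "c") + number(b, "d") > 5   # NaN > 5 is False
            or b.find(".//e") is not None]           # any descendant e

DOC = ("<a>"
       "<b id='1'><c>2</c><d>4</d></b>"
       "<b id='2'><c>1</c><d>1</d></b>"
       "<b id='3'><x><e/></x></b>"
       "</a>")
print([b.get("id") for b in select_b(DOC)])
```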

  • TXP Features (2/2). Multiple extraction points (tuples): //customer[name and address and phone] returns tuples. Subset of FOR-LET-WHERE over a single document: a very common case in the XQuery use cases document. Currently supports most of XPath 1.0. Recursive XML input documents.
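
With several extraction points, one expression yields a row of values instead of a node set. A rough illustration using `iterparse` from the standard library follows. It assumes that `name`, `address` and `phone` are the extraction points, that each occurs at most once per customer, and that customers do not nest; none of this is stated on the slide.

```python
import io
import xml.etree.ElementTree as ET

def customer_tuples(xml_bytes):
    """Yield (name, address, phone) for each customer that has all three children."""
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "customer":
            fields = [elem.findtext(tag) for tag in ("name", "address", "phone")]
            if all(field is not None for field in fields):
                yield tuple(fields)
            elem.clear()   # drop the subtree once its tuple has been produced

DOC = (b"<db>"
       b"<customer><name>Ann</name><address>X</address><phone>1</phone></customer>"
       b"<customer><name>Bob</name></customer>"
       b"</db>")
print(list(customer_tuples(DOC)))
```

The second customer lacks `address` and `phone`, so the predicate fails and no tuple is produced for it.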

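  • TXP Architecture

    [Diagram: input path expressions enter the expression parser. An XML stream feeds the SAX event handlers, and pre-parsed (stored) XML feeds the document walker. Both drive the TXP evaluator, and the tuple constructor / buffer management produces the output tuples.]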

  • TXP internals: evaluator. Parse tree (static): structural tree; predicate trees. Work array (dynamic): state of the evaluator; in-lined tree document. Buffers: results (copy or reference); predicate evaluation (copy); discard when not needed. Query: /a/b[$c + d > 5 or .//$e]
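
The split between a static parse tree and a dynamic work array could be modelled roughly as below. The execution example slides that follow write work array entries as `rF0` or `aF*` and label their parts as parse tree pointer, status flag and document level; reading `*` as "any level" is an assumption of this sketch, as are all class, field and method names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ParseNode:
    """Static part: one node of the structural parse tree."""
    name: str

@dataclass
class WorkEntry:
    """Dynamic part: one work array entry."""
    node: ParseNode          # pointer into the parse tree
    matched: bool            # status flag, shown as T or F
    level: Optional[int]     # document level; None is printed as '*'

    def accepts(self, tag, depth):
        # Assumed rule: the tag must match, and the level must match unless it is '*'.
        return self.node.name == tag and self.level in (None, depth)

    def __str__(self):
        flag = "T" if self.matched else "F"
        level = "*" if self.level is None else self.level
        return f"{self.node.name}{flag}{level}"

# Initial work array with one entry, then the array after the root element opens.
work = [WorkEntry(ParseNode("r"), False, 0)]
work.append(WorkEntry(ParseNode("a"), False, None))
print(" ".join(str(entry) for entry in work))
```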

  • Execution example (1)

    Query: //a[c]//b
    Parse tree: r, a (c and b), c, b
    Input XML: <r><a><c>c1</c><b>b1</b></a>...
    Work array entries: parse tree pointer, status flag, document level
    Initial work array with one entry: [rF0]
    Event r: [rF0, aF*]
    b buffers: none

  • Execution example (2)

    Event a: [rF0, aF*, cF2, bF*]
    b buffers: none

  • Execution example (3)

    Event c: [rF0, aF*, cT2, bF*]
    b buffers: none

  • Execution example (4)

    Event /c: no new work array snapshot
    b buffers: none

  • Execution example (4)

    Event b: no new work array snapshot
    b buffers: 1.

  • Execution example (5)

    Event /b: [rF0, aF*, cT2, bT*]
    b buffers: 1. b1

  • Execution example (6)

    Event /a: [rT0, aT*]
    b buffers: 1.

  • Recursive execution example (1)

    Query: //a[c]//b
    Parse tree: r, a (c and b), c, b
    Input XML: <r><a><a><c>c1</c><b>b1</b></a><b>b2</b></a><a>...
    Initial work array: [rF0]
    Event r: [rF0, aF*]
    b buffers: none

  • Recursive execution example (2)

    Event a: [rF0, aF*, cF2, bF*]
    b buffers: none

  • Recursive execution example (3)

    Event a (nested): [rF0, aF*, cF2, bF*, cF3, bF*]
    b buffers: none

  • Recursive execution example (4)

    Event c: [rF0, aF*, cF2, bF*, cT3, bF*]
    b buffers: none

  • Recursive execution example (5)

    Event /c: no new work array snapshot
    b buffers: none

  • Recursive execution example (6)

    Event b: [rF0, aF*, cF2, bF*, cT3, bF*]
    Note: b1 buffer open
    b buffers: 1.

  • Recursive execution example (7)

    Event /b: [rF0, aF*, cF2, bT*, cT3, bT*]
    b buffers: 1. b1

  • Recursive execution example (8)

    Event /a (inner): [rT0, aT*, cF2, bT*]
    Note: b1 buffer close
    b buffers: 1. b1

  • Recursive execution example (9)

    Event b: [rT0, aT*, cF2, bT*]
    Note: b2 buffer open
    b buffers: 1. b1; 2.

  • Recursive execution example (10)

    Event /b: no new work array snapshot
    Note: b2 buffer open/close
    b buffers: 1. b1; 2. b2

  • Recursive execution example (11)

    Event /a (outer): [rT0, aT*]
    Note: b2 removed; b1 emitted, removed
    b buffers: none

  • Recursive execution example (12)

    Event a: [rT0, aT*, cF2, bF*]
    b buffers: none
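
The buffering behaviour traced in these slides can be approximated for the single query //a[c]//b with a short event loop. This is a sketch under stated assumptions, not the TXP algorithm: it is hard-wired to one query, it replaces the work array with a stack of open `a` frames, it assumes that `b` elements hold only text and do not nest, and it emits a buffered `b` as soon as its fate is known, which can be earlier than on the slides. The function name and sample inputs are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def eval_a_c_b(xml_bytes):
    """Return the text of every b matching //a[c]//b, in document order."""
    path = []      # tags of the open elements
    open_a = []    # one frame per open <a>: {"has_c": bool}
    pending = []   # buffered b candidates, in document order
    out = []
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            if elem.tag == "c" and path and path[-1] == "a":
                open_a[-1]["has_c"] = True          # predicate [c] holds for the parent a
            if elem.tag == "a":
                open_a.append({"has_c": False})
            path.append(elem.tag)
            continue
        path.pop()
        if elem.tag == "b" and open_a:
            # Buffer b: it matches if any of its open a ancestors ends up with a c child.
            pending.append({"text": elem.text or "",
                            "waiting": list(open_a), "ok": False})
        elif elem.tag == "a":
            frame = open_a.pop()
            for buf in pending:
                if any(f is frame for f in buf["waiting"]):
                    buf["waiting"] = [f for f in buf["waiting"] if f is not frame]
                    buf["ok"] = buf["ok"] or frame["has_c"]
            # Flush decided buffers from the front to keep document order.
            while pending and (pending[0]["ok"] or not pending[0]["waiting"]):
                buf = pending.pop(0)
                if buf["ok"]:
                    out.append(buf["text"])         # emitted
                # otherwise the buffer is discarded
        elem.clear()
    return out

RECURSIVE = b"<r><a><a><c>c1</c><b>b1</b></a><b>b2</b></a></r>"
print(eval_a_c_b(RECURSIVE))
```

On the recursive input, `b1` is kept because the inner `a` has a `c` child, while `b2` is dropped when the outer `a` closes without one. Buffering is needed because `c` may arrive after `b`, as in `<a><b>b1</b><c/></a>`.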

  • Predicate evaluation. Separate parse tree for the predicates, attached at an anchor node in the structure tree. Evaluated when the anchor node is closed. Predicate parse tree leaves point into the structure parse tree. Predicate tree is traversed and evaluated.

  • Predicate pushdown. Single-value predicates can be evaluated before the anchor node is closed. Example: /x[a>b and c = 5]

    [Diagram: structure tree r, x, a, b, c shown twice, with predicate nodes "a > b", "= 5" and "and".]
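
For the example /x[a>b and c = 5], the conjunct `c = 5` involves a single value and can be decided as soon as a `c` element closes, while `a > b` needs both operands and is checked once the anchor `x` has closed. The sketch below shows that split. It assumes numeric text in `a`, `b` and `c`, uses the XPath 1.0 existential comparison (some `a` greater than some `b`), and is an illustration rather than TXP's implementation; the function name is made up.

```python
import io
import xml.etree.ElementTree as ET

def x_matches(xml_bytes):
    """Evaluate /x[a > b and c = 5] in one pass over the document."""
    max_a = min_b = None
    c_is_5 = False
    depth = 0
    root_is_x = False
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            depth += 1
            if depth == 1:
                root_is_x = elem.tag == "x"
            continue
        if root_is_x and depth == 2 and elem.tag in ("a", "b", "c"):
            value = float(elem.text)
            if elem.tag == "c":
                # Pushed down: decided here, nothing has to be buffered.
                c_is_5 = c_is_5 or value == 5
            elif elem.tag == "a":
                max_a = value if max_a is None else max(max_a, value)
            else:
                min_b = value if min_b is None else min(min_b, value)
        depth -= 1
    # a > b is evaluated after the anchor node x has closed.
    a_gt_b = max_a is not None and min_b is not None and max_a > min_b
    return root_is_x and a_gt_b and c_is_5

print(x_matches(b"<x><a>3</a><b>1</b><c>5</c></x>"))
```

Keeping only the largest `a` and the smallest `b` is enough here because "some a > some b" holds exactly when max(a) > min(b).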

  • Tuple construction using buffer annotations

    [Diagram: input XML with nodes numbered 1 to 12, parse tree r, t, g, a, b, c, and three sets of output buffers annotated with ancestor sets.]

    g output buffers
    Fragment   Ancestor sets
    2          ASt={1}
    12         ASt={11}

    b/text() output buffers
    Fragment   Ancestor sets
    4          ASt={1}; ASa={3}
    8          ASt={1}; ASa={6,7}

    c/text() output buffers
    Fragment   Ancestor sets
    5          ASt={1}; ASa={3}
    9          ASt={1}; ASa={7}
    9          ASt={1}; ASa={6}

    Result
    c/text()   b/text()   g
    5          4          2
    9          8          2
    10         8          2
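
One plausible reading of the ancestor-set annotations is that fragments from different buffers belong to the same tuple when, for every ancestor variable they carry (ASt, ASa), their ancestor sets share a node. The sketch below implements that reading on the fragments that are clearly legible on the slide (g: 2, 12; b/text(): 4, 8; c/text(): 5, 9). Both the matching rule and the function names are assumptions, not the documented TXP procedure.

```python
from itertools import product

# (fragment id, {ancestor variable: set of ancestor node ids})
G_BUFS = [(2, {"t": {1}}), (12, {"t": {11}})]
B_BUFS = [(4, {"t": {1}, "a": {3}}), (8, {"t": {1}, "a": {6, 7}})]
C_BUFS = [(5, {"t": {1}, "a": {3}}), (9, {"t": {1}, "a": {7}})]

def consistent(*annotations):
    """True if, for each variable, all fragments carrying it share an ancestor."""
    variables = set().union(*annotations)
    return all(
        set.intersection(*(a[v] for a in annotations if v in a))
        for v in variables
    )

def build_tuples(g_bufs, b_bufs, c_bufs):
    tuples = []
    for (g, g_as), (b, b_as), (c, c_as) in product(g_bufs, b_bufs, c_bufs):
        if consistent(g_as, b_as, c_as):
            tuples.append((g, b, c))
    return tuples

print(build_tuples(G_BUFS, B_BUFS, C_BUFS))
```

Fragment 12 produces no tuple because its only `t` ancestor (node 11) has no `b` or `c` fragments beneath it.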

  • Evaluation (i): XMLContains (Boolean query)

    [Charts: (a) XMLContains (DBLP) and (b) XMLContains (Random). Series: Xalan, TXP, Parsing. X axis: File Size (KB). Y axis: Time (s).]

    (a) XMLContains (DBLP), time in seconds

    File Size (KB)   Xalan    TXP      Parsing
    911              0.716    0.065    0.19
    1798             1.372    0.0605   0.451
    2646             1.968    0.04     0.5805
    3533             2.609    0.045    0.781
    7065             5.012    0.05     1.678

    (b) XMLContains (Random), time in seconds

    File Size (KB)   Xalan    TXP      Parsing
    407              0.841    0.26     0.18
    813              1.543    0.491    0.311
    1219             2.373    0.761    0.531
    1625             3.365    1.031    0.62
    2032             4.206    1.202    0.742
    2438             4.637    1.492    0.981
    2844             5.518    1.703    1.031
    3250             6.078    1.943    1.352
    3657             6.99     2.193    1.402
    4063             8.052    2.413    1.473
    4875             11.036   2.103    1.762
    5688             13.069   3.375    2.093
    6500             14.841   3.856    2.434
    7313             15.182   4.366    2.633
    8125             16.844   4.707    2.874

  • Evaluation (ii): XMLExtract (single column extraction)

    [Charts: (c) XMLExtract (DBLP) and (d) XMLExtract (Random). Series: Xalan, TXP, Parsing. X axis: File Size (KB). Y axis: Time (s).]

    (c) XMLExtract (DBLP), time in seconds

    File Size (KB)   Xalan    TXP      Parsing
    911              0.716    0.311    0.19
    1798             1.377    0.5805   0.451
    2646             2.018    0.9365   0.5805
    3533             2.609    1.1265   0.781
    7065             4.967    2.3785   1.678

    (d) XMLExtract (Random), time in seconds

    File Size (KB)   Xalan    TXP      Parsing
    407              1.112    0.26     0.18
    813              1.963    0.451    0.311
    1219             2.854    0.751    0.531
    1625             4.216    1.041    0.62
    2032             4.667    1.192    0.742
    2438             5.738    1.502    0.981
    2844             6.99     1.723    1.031
    3250             7.691    1.933    1.352
    3657             8.843    2.173    1.402
    4063             9.994    2.433    1.473
    4875             13.71    2.974    1.762
    5688             16.003   3.394    2.093
    6500             18.687   3.906    2.434
    7313             19.168   4.336    2.633
    8125             20.449   4.667    2.874
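
The chart data for (c) XMLExtract (DBLP) pairs each file size (911 to 7065 KB) with a Xalan time and a TXP time in seconds. A quick way to read it is as a ratio of the two. The snippet below computes that ratio from the five data points recovered from the chart; the function name is made up, and the ratio is only a reading aid, not a figure reported on the slides.

```python
# (file size in KB, Xalan time in s, TXP time in s) for XMLExtract (DBLP)
ROWS = [
    (911, 0.716, 0.311),
    (1798, 1.377, 0.5805),
    (2646, 2.018, 0.9365),
    (3533, 2.609, 1.1265),
    (7065, 4.967, 2.3785),
]

def speedups(rows):
    """Return (file size, Xalan time / TXP time) for each measurement."""
    return [(size, xalan / txp) for size, xalan, txp in rows]

for size, ratio in speedups(ROWS):
    print(f"{size:>5} KB: Xalan/TXP = {ratio:.2f}")
```

On these five points the ratio stays between 2.0 and 2.4, so for this query TXP takes a little under half of Xalan's time regardless of file size.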

  • Evaluation (iii): XMLExtract (over large files, outside DB2)

    [Chart: (e) XMLExtract (Review). Series: Xalan, TXP. X axis: File Size (MB). Y axis: Time (s).]

    (e) XMLExtract (Review), time in seconds

    File Size (MB)   Xalan    TXP
    4                4        3
    8                9        7
    20               28       27
    40               42       37
    80               140      56
    200              870      82
    280              1485     97
    400              2098     124

    1101.7

    1401.7

    3101.7

    Xalan

    TXP

    File Size (MB)

    Memory (MB)

    (f) Memory Usage (Review)

    Chart10

    161.7

    251.7

    531.7

    1101.7

    1401.7

    3101.7

    Xalan

    TXP

    File Size (MB)

    Memory (MB)

    (f) Memory Usage (Review)

    Sheet1

    File size (KB)XalanXalanTXPTXPFile size (KB)XalanTXPParsingParsingParsing

    9110.7310.7010.040.099110.7160.0650.190.190.19

    17981.3821.3620.0810.0417981.3720.06050.4510.4410.461

    26462.0031.9330.050.0326461.9680.040.58050.6410.52

    35332.6442.5740.050.0435332.6090.0450.7810.7710.791

    70655.0974.9270.070.0370655.0120.051.6781.8031.553

    File size (KB)XalanXalanTXPTXPFile size (KB)XalanTXPParsingParsingParsing

    9110.7210.7110.2810.3419110.7160.3110.190.190.19

    17981.3821.3720.610.55117981.3770.58050.4510.4410.461

    26462.0831.9531.0420.83126462.0180.93650.58050.6410.52

    35332.6342.5841.0721.18135332.6091.12650.7810.7710.791

    70655.0374.8972.5832.17470654.9672.37851.6781.8031.553

    File size (KB)1 Output2 Outputs3 Outputs

    9110.270.260.26

    17980.4910.4910.511

    26460.7110.8310.791

    35331.1321.1821.021

    70651.7521.7231.753

    File size (KB)1 Output2 Outputs3 Outputs1 Output2 Outputs3 Outputs

    9110.3610.390.351111

    17980.520.5210.51141664

    26460.7820.7010.921749343

    35330.9711.0221.102101001000

    70652.0431.8922.403204008000

    File size (KB)TXPDB2-ADB2-B

    9110.3310.5710.551

    17980.5811.0710.882

    26460.8711.6631.091

    35331.1022.0831.452

    70652.6335.3173.605

    File size (KB)TXPDB2

    9110.270.281

    17980.580.551

    26460.8120.941

    35331.0911.232

    70652.2932.634

    Sheet2

    File Size (KB)XalanTXPParsingFile Size (KB)1 Output2 Outputs3 Outputs

    4070.840.260.184070.2610.3210.261

    8131.540.490.3118130.440.4410.52

    12192.370.760.53112190.8010.7110.802

    16253.371.030.6216250.9711.0311.001

    20324.211.200.74220321.2621.2521.282

    24384.641.490.98124381.5421.4921.472

    28445.521.701.03128441.7831.6921.752

    32506.081.941.35232501.9221.9431.883

    36576.992.191.40236572.1632.1532.243

    40638.052.411.47340632.4242.4142.454

    487511.042.101.76256883.3853.4955.157

    568813.073.382.09365003.8663.9565.338

    650014.843.862.43473134.2864.4266.209

    731315.184.372.63381254.6664.8474.897

    812516.844.712.874

    File Size (KB)XalanTXPParsing

    4071.110.260.18

    8131.960.4510.311

    12192.850.7510.531

    16254.221.0410.62

    20324.671.1920.742

    24385.741.5020.981

    28446.991.7231.031

    32507.691.9331.352

    36578.842.1731.402

    40639.992.4331.473

    487513.712.9741.762

    568816.003.3942.093

    650018.693.9062.434

    731319.174.3362.633

    812520.4494.6672.874

    File Size (MB)XalanTXP

    443

    897

    202827

    404237

    8014056

    20087082

    280148597

    4002098124

    File Size (MB)XalanTXP

    4161.7

    8251.7

    20531.7

    401101.7

    2001401.7

    4003101.7

    Sheet3

    Xalan

    Txp1

    Txp2

    Sheet3

    0.7160.0650.19

    1.3720.06050.451

    1.9680.040.5805

    2.6090.0450.781

    5.0120.051.678

    Xalan

    TXP

    Parsing

    File Size (KB)

    Time (s)

    (a) XMLContains (DBLP)

    0.7160.3110.19

    1.3770.58050.451

    2.0180.93650.5805

    2.6091.12650.781

    4.9672.37851.678

    Xalan

    TXP

    Parsing

    File Size (KB)

    Time (s)

    (c) XMLExtract (DBLP)

    0.8410.260.18

    1.5430.4910.311

    2.3730.7610.531

    3.3651.0310.62

    4.2061.2020.742

    4.6371.4920.981

    5.5181.7031.031

    6.0781.9431.352

    6.992.1931.402

    8.0522.4131.473

    11.0362.1031.762

    13.0693.3752.093

    14.8413.8562.434

    15.1824.3662.633

    16.8444.7072.874

    Xalan

    TXP

    Parsing

    File Size (KB)

    Time (s)

    (b) XMLContains (Random)

    1.1120.260.18

    1.9630.4510.311

    2.8540.7510.531

    4.2161.0410.62

    4.6671.1920.742

    5.7381.5020.981

    6.991.7231.031

    7.6911.9331.352

    8.8432.1731.402

    9.9942.4331.473

    13.712.9741.762

    16.0033.3942.093

    18.6873.9062.434

    19.1684.3362.633

    20.4494.6672.874

    Xalan

    TXP

    Parsing

    File Size (KB)

    Time (s)

    (d) XMLExtract (Random)

    0.270.260.26

    0.4910.4910.511

    0.7110.8310.791

    1.1321.1821.021

    1.7521.7231.753

    1 Output

    2 Outputs

    3 Outputs

    File Size (KB)

    Time (s)

    (g) Number of Outputs (DBLP Query 1)

    0.3610.390.351

    0.520.5210.511

    0.7820.7010.921

    0.9711.0221.102

    2.0431.8922.403

    1 Output

    2 Outputs

    3 Outputs

    File Size (KB)

    Time (s)

    (h) Number of Outputs (DBLP Query 2)

    43

    97

    2827

    4237

    14056

    87082

    148597

    2098124

    Xalan

    TXP

    File Size(MB)

    Time (s)

    (e) XMLExtract (Review)

    161.7

    251.7

    531.7

    1101.7

    1401.7

    3101.7

    Xalan

    TXP

    File Size (MB)

    Memory (MB)

    (f) Memory Usage (Review)

  • Evaluation (iv)
    XMLTable (varying the number of columns)
    Optimizer should generate plans that benefit from that

    [Figure (g): Number of Outputs (DBLP Query 1). Series: 1 Output, 2 Outputs, 3 Outputs. x-axis: File Size (KB); y-axis: Time (s). In the embedded chart data, the times for 1, 2, and 3 outputs are nearly identical at each file size: about 0.26 to 0.27 s at 911 KB and about 1.72 to 1.75 s at 7065 KB.]
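    The XMLTable experiment with a varying number of columns can be illustrated with a minimal sketch. It is not TXP or DB2 code: the function name `shred`, the sample document, and the use of Python's `xml.etree.ElementTree.iterparse` are assumptions made for illustration. It shows that all requested columns of a row can be filled in the same single pass over the input, so asking for more columns does not require re-reading the document.

```python
import io
import xml.etree.ElementTree as ET


def shred(stream, row_tag, columns):
    """Yield one tuple per <row_tag> element, one value per requested column.

    `columns` lists child element names (hypothetical, for illustration);
    a missing child yields None for that column.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == row_tag:
            yield tuple(elem.findtext(col) for col in columns)
            elem.clear()  # release the children and text of the finished row


SAMPLE = b"""<customers>
  <customer><name>Ann</name><state>CA</state><cid>1</cid></customer>
  <customer><name>Bob</name><state>NY</state><cid>2</cid></customer>
</customers>"""

# Same single pass over the input, whether one or three columns are requested.
one_col = list(shred(io.BytesIO(SAMPLE), "customer", ["name"]))
three_cols = list(shred(io.BytesIO(SAMPLE), "customer", ["name", "state", "cid"]))
print(one_col)
print(three_cols)
```

    In this sketch the parse dominates the work and each extra column only adds one child lookup per row, which is the kind of behaviour an optimizer could exploit by pushing several extractions into one scan.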

  • Conclusions and Future Work
    TXP efficiently evaluates an XPath/XQuery subset over XML streams and pre-parsed XML
    Low memory consumption
    Fast response time when compared to Xalan
    Tuple construction mechanism is useful for efficiently evaluating predicates and FLWR expressions
    Returns values (copy) or references (XID)
    Works both over indexed (stored) XML and streamed XML using the same control structure
    Deliverables for DB2: XMLWrapper, XML Storage, XML Loader/Shredder
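    The low-memory claim for stream evaluation can be made concrete with a small sketch. This is not the TXP algorithm: the class name `PathMatcher` and the sample document are hypothetical, and the sketch handles only simple paths of the form `//a/b` with non-nested matches. It uses Python's standard `xml.sax` parser and keeps only the stack of open element names in memory, never the whole document tree.

```python
import xml.sax


class PathMatcher(xml.sax.ContentHandler):
    """Collect the text of elements whose trailing path equals `steps`.

    Approximates an XPath of the form //a/b; state is limited to the
    stack of open element names plus the text of the current match.
    """

    def __init__(self, steps):
        super().__init__()
        self.steps = list(steps)
        self.stack = []      # names of currently open elements
        self.buffer = None   # text pieces of the element being captured
        self.results = []

    def _at_match(self):
        return self.stack[-len(self.steps):] == self.steps

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self._at_match():
            self.buffer = []

    def characters(self, content):
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if self.buffer is not None and self._at_match():
            self.results.append("".join(self.buffer))
            self.buffer = None
        self.stack.pop()


DOC = (b"<db><customer><name>Ann</name></customer>"
       b"<order><name>x</name></order>"
       b"<customer><name>Bob</name></customer></db>")

matcher = PathMatcher(["customer", "name"])  # stands in for //customer/name
xml.sax.parseString(DOC, matcher)
print(matcher.results)
```

    Because nothing but the open-element stack is retained, memory use in this sketch depends on document depth rather than document size, which is the property the memory-usage comparison with Xalan points at.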

  • Other research areas
    SQL/XML
    Automatic generation of taxonomies
    Lotus Discovery Server
    Text indexing
    Intranet Search

  • Automatic Taxonomy Generation (1/2)
    Unified model for taxonomy
    Each node (including intermediate nodes) models the features that are common to the tree below it
    All features (including stopwords) are modeled in the taxonomy
    Hybrid bottom-up and top-down scheme
    Algorithm
    Start with an initial feasible solution (one-level taxonomy)
    Merge nodes as needed to discover more abstract topics
    Split nodes as needed to find more refined topics
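    The merge half of the algorithm can be sketched as follows. The slides do not say how "as appropriate" is decided, so everything concrete here is an assumption: the Jaccard similarity measure, the 0.5 threshold, the greedy choice of the best pair, and the names `jaccard` and `merge_step`. The only idea taken from the slide is that a parent node models the features common to the nodes below it. The split step is not shown.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two feature sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_step(nodes, threshold=0.5):
    """Merge the most similar pair of top-level nodes under a new parent.

    Each node is a dict {"features": set, "children": list}. The parent
    keeps only the features shared by both children. Returns True if a
    merge happened, False if no pair reaches the threshold.
    """
    best = None
    for i, j in combinations(range(len(nodes)), 2):
        score = jaccard(nodes[i]["features"], nodes[j]["features"])
        if score >= threshold and (best is None or score > best[0]):
            best = (score, i, j)
    if best is None:
        return False
    _, i, j = best
    left, right = nodes[i], nodes[j]
    parent = {"features": left["features"] & right["features"],
              "children": [left, right]}
    del nodes[j]  # j > i, so delete the higher index first
    del nodes[i]
    nodes.append(parent)
    return True


# Initial feasible solution: a one-level taxonomy, one node per document.
docs = [{"xml", "query", "stream"}, {"xml", "query", "index"}, {"soccer", "goal"}]
taxonomy = [{"features": set(d), "children": []} for d in docs]

while merge_step(taxonomy):
    pass

print(len(taxonomy), taxonomy[-1]["features"])
```

    In this toy run the two XML documents are merged under a more abstract node whose features are `xml` and `query`, while the unrelated document stays a separate top-level node.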

  • Automatic Taxonomy Generation (2/2)