Approximate XML Query Answers
Presenter: Hongyu Guo
Authors: N. Polyzotis, M. Garofalakis, Y. Ioannidis

Upload: myles-pearson

Post on 21-Jan-2016


Page 1: Approximate XML Query Answers

Page 2: Outline of this talk

Motivation
TreeSketch Approach
Experimental Results
Contributions and Limitations

Page 3: Outline

Motivation
TreeSketch Approach
Experimental Results
Contributions and Limitations

Page 4: Motivations

XML is the de facto standard for data exchange
We need to explore large XML data sets and get fast feedback from complex XML queries
There is a conflict between fast ‘on-line’ response and query execution cost

--Need fast feedback

Page 5: XML Query Challenges

XML queries involve complex traversals of the XML data hierarchy
Complex queries over massive tree-structured data are very expensive
Approaches: optimize the query, or optimize the data structure
When exact results are not required, we can instead return approximate query answers

Page 6: Approximate Query Answers

Obtain an approximation to the true result
Already employed successfully in relational systems
Use the approximate result to get timely feedback

[Figure: a Query over the XML Data returns the true result R; the same Query over a Synopsis of the data returns the approximate result R'.]

Page 7: Outline

Motivation
TreeSketch Approach
Experimental Results
Contributions and Limitations

--TreeSketch: a technique for returning fast, approximate results

Page 8: Data and Query Model

[Figure: example XML document. Tag legend: a: author, n: name, b: book, p: paper, y: year, k: keyword, t: title]

--Some background: the XML document

Page 9: Data and Query Process

--Twig query, query tree, and nested result tree

Page 10: Basic Query Scenario

[Figure: evaluating a query over the XML Data gives the True Nesting Tree; evaluating it over the Synopsis gives the Approximate Nesting Tree.]

Key idea: return fast, accurate feedback

Page 11: Approximate Query Answers

How to construct concise XML synopses that capture the statistical traits of the true data
How to produce approximate query answers over the synopsis efficiently

--Two key problems

Page 12: TreeSketch Construction

Step 1: Given an XML tree T, build a graph synopsis in which each node represents a set of same-tag elements; this synopsis is perfect (zero error) but large.
Step 2: Compress the synopsis by merging nodes with similar substructures (i.e., clustering the XML elements).
Step 3: Repeat Step 2 until the predefined space-budget constraint is met.
Step 4: Return the TreeSketch synopsis.

[Diagram: synopsis size ranging from Perfect down to the Space Budget]

--Construction Algorithm
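Steps 2 to 4 can be sketched as a greedy loop. This is a minimal sketch under loudly stated assumptions, not the authors' algorithm: the synopsis is stored as node sizes plus a mean child count per edge, the space used is assumed to be simply "nodes plus edges", and the pair to merge is chosen by a hypothetical dissimilarity (squared difference of the two nodes' outgoing counts) standing in for the paper's clustering-error metric. The starting synopsis mirrors the construction example on Pages 14 and 15, with the two F(2) nodes named F1 and F2 and their edges reconstructed from the slides' counts. All names (`SIZE`, `COUNT`, `space`, `dissimilarity`, `merge`, `build_treesketch`, `budget`) are hypothetical.

```python
from itertools import combinations

# Hypothetical zero-error synopsis modeled on the construction example:
# node -> number of elements, node -> tag, (parent, child) -> mean #children.
SIZE = {"R": 1, "P": 1, "S": 2, "F1": 2, "F2": 2, "E": 2, "C": 4}
TAG = {"R": "R", "P": "P", "S": "S", "F1": "F", "F2": "F", "E": "E", "C": "C"}
COUNT = {("R", "P"): 1.0, ("P", "S"): 2.0, ("S", "F1"): 1.0, ("S", "F2"): 1.0,
         ("F1", "E"): 1.0, ("F1", "C"): 1.0, ("F2", "C"): 1.0}


def space(size, count):
    """Assumed space measure: number of synopsis nodes plus edges."""
    return len(size) + len(count)


def dissimilarity(count, a, b):
    """Hypothetical merge cost: squared difference of outgoing edge counts."""
    targets = {v for (u, v) in count if u in (a, b)}
    return sum((count.get((a, v), 0.0) - count.get((b, v), 0.0)) ** 2
               for v in targets)


def merge(size, tag, count, a, b):
    """Merge same-tag nodes a and b, keeping every edge count a mean."""
    m = f"{a}+{b}"

    def rename(x):
        return m if x in (a, b) else x

    new_size = {rename(n): 0 for n in size}
    for n, k in size.items():
        new_size[rename(n)] += k
    new_tag = {rename(n): t for n, t in tag.items()}
    totals = {}
    for (u, v), c in count.items():
        key = (rename(u), rename(v))
        # c * size[u] is the total number of u-to-v document edges.
        totals[key] = totals.get(key, 0.0) + c * size[u]
    new_count = {(u, v): t / new_size[u] for (u, v), t in totals.items()}
    return new_size, new_tag, new_count


def build_treesketch(size, tag, count, budget):
    """Greedily merge the most similar same-tag pair until within budget."""
    while space(size, count) > budget:
        pairs = [(a, b) for a, b in combinations(sorted(size), 2)
                 if tag[a] == tag[b]]
        if not pairs:
            break  # nothing left to merge; the budget cannot be met
        a, b = min(pairs, key=lambda p: dissimilarity(count, *p))
        size, tag, count = merge(size, tag, count, a, b)
    return size, tag, count


size, tag, count = build_treesketch(SIZE, TAG, COUNT, budget=12)
print(size["F1+F2"], count[("S", "F1+F2")], count[("F1+F2", "E")])  # prints: 4 2.0 0.5
```

Counts are merged as totals and re-divided by the new node size, so no pass over the original document is needed; with a budget of 12 the loop performs exactly one merge (F1 with F2), reproducing the 2, 1, and 0.5 edge counts shown on Page 16.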

Page 13: More Discussions

Graph synopsis construction
  Use a node to represent a set of same-tag elements
  Queries can be answered with zero error
  The size can become very large: it can easily be on the order of the original document size
TreeSketch synopsis construction
  Compress the synopsis by merging nodes
  Bottom-up merging (clustering) algorithm
Key technique for compression: clustering based on structure
  Model accuracy depends on the quality of the clustering
  Tight clusters: accurate synopsis, but a large model
  Loose clusters: less accuracy, but a small model

--of the construction procedure

Page 14: Construction Example

[Figure: an XML document with elements r, p1, s2, s3, f4 to f7, e8, e10, c9, c11, c12, c13, and its graph synopsis with nodes R(1), P(1), S(2), F(2), F(2), E(2), C(4); the number in parentheses is the count of elements in the node.]

Synopsis node: a set of elements of the same tag
Synopsis edge: the document edge(s) between those elements

--Count same-tag elements
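Step 1 (building the graph synopsis) can be sketched as follows. The slide shows two separate F(2) nodes, so grouping by tag alone is not enough; one way to obtain a zero-error synopsis, assumed here, is to start with one group per tag and keep splitting a group until all of its elements have the same number of children in every group (a count-stable partition). The parent/child edges below are hypothetical: the slide lists the element names but its layout does not preserve the exact edges, so they were chosen to match the slide's node counts and the query arithmetic on Page 18. `TAG`, `CHILDREN`, and `graph_synopsis` are illustrative names.

```python
from collections import Counter

# Hypothetical document consistent with the slide's counts: r -> p1 -> s2, s3;
# each section has one f with an equation and a caption, one f with a caption.
TAG = {"r": "R", "p1": "P", "s2": "S", "s3": "S",
       "f4": "F", "f5": "F", "f6": "F", "f7": "F",
       "e8": "E", "e10": "E", "c9": "C", "c11": "C", "c12": "C", "c13": "C"}
CHILDREN = {"r": ["p1"], "p1": ["s2", "s3"],
            "s2": ["f4", "f5"], "s3": ["f6", "f7"],
            "f4": ["e8", "c9"], "f5": ["c11"],
            "f6": ["c12"], "f7": ["e10", "c13"]}


def graph_synopsis(tag, children):
    """Group same-tag elements, splitting groups until they are count-stable."""
    label = dict(tag)  # element -> group label; start with one group per tag
    while True:
        # Signature: own group plus how many children fall in each group.
        sig = {}
        for e in tag:
            kids = Counter(label[c] for c in children.get(e, []))
            sig[e] = (label[e], tuple(sorted(kids.items())))
        if len(set(sig.values())) == len(set(label.values())):
            break  # no group was split, so the partition is stable
        names = {}
        for e in sorted(tag):
            names.setdefault(sig[e], f"{tag[e]}{len(names)}")
        label = {e: names[sig[e]] for e in tag}
    groups = {}
    for e, g in label.items():
        groups.setdefault(g, []).append(e)
    return groups


for name, members in sorted(graph_synopsis(TAG, CHILDREN).items()):
    print(f"{name}({len(members)}): {sorted(members)}")
```

Run on this document, the F elements end up in two groups, {f4, f7} and {f5, f6}, matching the slide's two F(2) nodes; all other tags stay in a single group each.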

Page 15: Construction Example

Calculate the number of children for each edge.
Count[r, p]: the mean number of children in p per element in r

[Figure: the graph synopsis R(1), P(1), S(2), F(2), F(2), E(2), C(4), annotated with edge counts 1, 2 (= 2 / 1), and 1 on each of the remaining five edges.]

--Calculate the number of children per element
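Written as a formula, with ext(u) denoting the set of document elements grouped in synopsis node u (notation assumed here, not taken from the slides), the edge count is the mean number of children per element. The first two worked values follow directly from the slide; the value for the F edges names the two F(2) nodes F_1 and F_2 (hypothetical names) and assumes each of the two sections contributes one element to each of them, which is what the Page 18 arithmetic implies.

```latex
\[
\mathrm{count}(u, v) = \frac{\bigl|\{(x, y) : x \in \mathrm{ext}(u),\; y \in \mathrm{ext}(v),\; y \text{ is a child of } x\}\bigr|}{|\mathrm{ext}(u)|}
\]
\[
\mathrm{count}(R, P) = \tfrac{1}{1} = 1, \qquad
\mathrm{count}(P, S) = \tfrac{2}{1} = 2, \qquad
\mathrm{count}(S, F_1) = \mathrm{count}(S, F_2) = \tfrac{2}{2} = 1
\]
```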

Page 16: Merging Nodes

[Figure: the graph synopsis (R(1), P(1), S(2), F(2), F(2), E(2), C(4); edge counts 1, 2, 1, 1, 1, 1, 1) is compressed by merging the two F(2) nodes into a single F(4) node, giving a more concise TreeSketch synopsis (R(1), P(1), S(2), F(4), E(2), C(4); edge counts 1, 2, 2, 1, 0.5).]

--Less space budget
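The merged counts can be checked by hand. Writing F_1 and F_2 for the two F(2) nodes (names assumed here), the incoming count of the merged node F is the sum of the old incoming counts, and each outgoing count is a size-weighted average, so it remains a mean per element. The E value assumes, as the Page 18 arithmetic indicates, that only one of the two F nodes has E children.

```latex
\[
\mathrm{count}(S, F) = \mathrm{count}(S, F_1) + \mathrm{count}(S, F_2) = 1 + 1 = 2
\]
\[
\mathrm{count}(F, v) = \frac{|F_1|\,\mathrm{count}(F_1, v) + |F_2|\,\mathrm{count}(F_2, v)}{|F_1| + |F_2|}
\]
\[
\mathrm{count}(F, C) = \frac{2 \cdot 1 + 2 \cdot 1}{4} = 1, \qquad
\mathrm{count}(F, E) = \frac{2 \cdot 1 + 2 \cdot 0}{4} = 0.5
\]
```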

Page 17: Compute Approximate Answers

Travel down the tree, match a pattern in the structure, and return a subtree
TreeSketch gives a fast response:
  a concise synopsis
  that keeps statistical information
    Node: number of same-tag elements
    Edge: number of children per element

--Much like the traditional way

Page 18: Compute Approximate Answers

[Figure: the twig query q0 (root) with child q1 = //section, which has children q2 = .//equation and q3 = .//caption, evaluated over the TreeSketch synopsis of the construction example (R(1), P(1), S(2), F(2), F(2), E(2), C(4); edge counts 1, 2, 1, 1, 1, 1, 1).]

Approximate nesting tree:
R
  S: 1 x 2 = 2
    E: 1 x 1 = 1
    C: 1 x 1 + 1 x 1 = 2

Approximate results with structure:
1) take advantage of the concise structure
2) and of the statistical data

--Example
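The arithmetic in the approximate nesting tree can be sketched as a walk over the synopsis: for each query step, multiply the mean child counts along every synopsis path from the current node to a node with the target tag, then add up the paths. This is a simplified sketch, not the paper's full evaluation procedure: it assumes an acyclic synopsis with a single S node, and the F1/F2 edges are reconstructed from the slide's numbers. `expected_descendants` and the dictionaries are hypothetical names.

```python
# Hypothetical synopsis: node -> tag and (parent, child) -> mean #children.
TAG = {"R": "R", "P": "P", "S": "S", "F1": "F", "F2": "F", "E": "E", "C": "C"}
COUNT = {("R", "P"): 1.0, ("P", "S"): 2.0, ("S", "F1"): 1.0, ("S", "F2"): 1.0,
         ("F1", "E"): 1.0, ("F1", "C"): 1.0, ("F2", "C"): 1.0}


def expected_descendants(node, target, tag, count):
    """Expected number of `target`-tagged descendants per element of `node`.

    Sums, over all synopsis paths from `node` down to a node with the target
    tag, the product of the edge counts along the path (descendant axis, //).
    Assumes the synopsis is acyclic.
    """
    total = 0.0
    for (u, v), c in count.items():
        if u != node:
            continue
        if tag[v] == target:
            total += c  # the children in v match directly
        # Deeper matches below v, scaled by the number of v-children per element.
        total += c * expected_descendants(v, target, tag, count)
    return total


# q1 = //section from the root; q2 = .//equation and q3 = .//caption per section.
sections = expected_descendants("R", "S", TAG, COUNT)
equations = expected_descendants("S", "E", TAG, COUNT)
captions = expected_descendants("S", "C", TAG, COUNT)
print(sections, equations, captions)  # prints: 2.0 1.0 2.0
```

With the merged synopsis of Page 16 (S to F: 2, F to E: 0.5, F to C: 1), the same function gives 2 x 0.5 = 1 equation and 2 x 1 = 2 captions per section, so in this small example the merge loses no accuracy for this particular query.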

Page 19: Outline

Motivation
TreeSketch Approach
Experimental Results
Contributions and Limitations

Page 20: Experimental Setup

Focus on:
  the quality of the generated approximate answers
  the efficiency of the construction process
Data sets: XMark, DBLP, IMDB, SwissProt
Workload: 1000 random twig queries

Page 21: Evaluation Methods

Error: the distance between R' and R
  Popular metric: tree-edit distance, the minimum-cost sequence of operations that transforms R' into R
  Argument: it does not capture structural similarity
New evaluation metric: ESD (Element Simulation Distance)
  Uses the number of children along each edge of the tree to capture the complete structure of the tree
  Models how well the structures of the two trees match each other, i.e., the "degree" of simulation between two trees
  The average ESD is used for evaluation

Page 22: Experimental Results

--Approximate answers, compared with TwigXSketches

Page 23: Experimental Results

--Relative errors

Relative error below 5%, i.e., at least 95% accuracy

Page 24: Outline

Motivation
TreeSketch Approach
Experimental Results
Contributions and Limitations

-Strengths and Weaknesses

Page 25: TreeSketch Approach

Proposes an effective XML-summarization mechanism
  Captures the complete tree structure of large XML data
  Experimental results: produces fast and accurate approximate query answers
  Authors' claim: the first work to address the timely problem of producing approximate tree-structured answers for complex XML queries
Comparison with related work: two options
  Either compute the exact answer to a path query, which is expensive
  Or use an approach such as TwigXSketch, which does not capture the complete tree structure of the underlying XML database

-In this paper

Page 26: Limitations

Difficult to optimize some predefined parameters, such as the space budget, which directly affects the accuracy of the approximate query answers
  Too large: hurts efficiency; too small: hurts the quality of the answers
  The right value depends on the query, the data set, and the computing resources
An incremental model-construction process is desirable
  XML data usually grow incrementally, so the synopsis model should also be constructed incrementally
More experiments or real applications are needed to justify the scalability of this technique

-Nice research; next steps for further investigation

Page 27: Thank You / Merci