Post on 20-Dec-2015

Containment and Equivalence for an XPath Fragment

By Gerome Miklau and Dan Suciu

Presented by Roy Ionas

SEMINAR OBJECTIVES

• PRESENTING THE PROBLEM OF NON-POLYNOMIAL COMPLEXITY FOR CONTAINMENT AND EQUIVALENCE OF XPath FRAGMENTS.

• PRESENTING TWO ALGORITHMS THAT IMPROVE THE COST OF THE XPath CONTAINMENT AND EQUIVALENCE PROBLEM.

• PRESENTING TREE PATTERNS AS AN EFFECTIVE TOOL FOR PROOFS ABOUT XPath FRAGMENTS.

SO WHAT IS XPath?

• A simple language for navigating XML documents and selecting a set of nodes.

• With XPath we can query XML data, describe key constraints, express transformations and reference elements in remote documents.

• We can find XPath's influence in other XML query languages and features such as XQuery, XSLT, XML Schema, XLink, XPointer and more...

DEFINITIONS

• Simple XPath fragment.

• Containment between two XPath fragments.

• Equivalence between two XPath fragments.

• Computability definitions.

• Tree patterns as a proving tool for XPath fragments.

Simple XPath fragment

• An XPath statement.

• Contains the three most important features for navigating:
– Child and descendant axes: “/” and “//”
– Wildcards: “*”
– Qualifiers: “[]”

• We disregard attributes, conditions...

• We identify and compare nodes only by their label.

• We disregard order completely.

• Example: a//*[b//d][c]
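The fragment above is small enough that its expressions can be read into a tree with a few lines of code. The following is a minimal sketch, not a full XPath parser: the function name `parse`, the `TOKEN` pattern and the tuple encoding `(label, [(axis, child), ...])` are assumptions made for illustration, qualifiers are attached with a child edge, and there is no error handling for malformed input.

```python
import re

# Tokens of the fragment: "//", "/", "[", "]", "*" and element names.
TOKEN = re.compile(r"//|/|\[|\]|\*|[A-Za-z_][\w.-]*")

def parse(expr):
    """Parse an expression of the simple fragment into (label, edges),
    where edges is a list of (axis, subtree) pairs (hypothetical encoding)."""
    tokens = TOKEN.findall(expr)
    pos = 0

    def path():
        nonlocal pos
        root = node = step()
        # Follow the main path: each "/" or "//" adds one edge downwards.
        while pos < len(tokens) and tokens[pos] in ("/", "//"):
            axis = tokens[pos]
            pos += 1
            nxt = step()
            node[1].append((axis, nxt))
            node = nxt
        return root

    def step():
        nonlocal pos
        node = (tokens[pos], [])      # element name or "*"
        pos += 1
        # Each qualifier [q] becomes a branch hanging off this node.
        while pos < len(tokens) and tokens[pos] == "[":
            pos += 1                  # skip "["
            node[1].append(("/", path()))
            pos += 1                  # skip "]"
        return node

    return path()

pattern = parse("a//*[b//d][c]")
print(pattern)
```

For the example expression, the wildcard node ends up with two child branches (b and c), and b carries a descendant edge to d.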

Simple XPath fragment

• Are these all the features we have in XPath?

NO!

• Are these all the features we need for representing navigation in XML documents?

YES!

At least these are the ones needed for the proofs in this article.

Containment

• Containment between two XPath fragments A and B means that, for every XML document, the result of applying A is contained in the result of applying B.

• A result is a set of nodes; order is not considered.

• Can we check this containment against the entire world of XML documents?

• Is there another way to determine containment between two XPath fragments?

Equivalence

• Equivalence between two XPath fragments A and B means that, for every XML document, the result of applying A equals the result of applying B.

• The problem of equivalence can be reduced to the problem of containment:
– Equivalence = containment in both directions between the patterns.
– Conversely, containment can be decided with an algorithm that decides equivalence, at only polynomial extra cost.

• From now on we discuss only the problem of containment; the results are valid for equivalence as well.
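The two definitions can be written compactly. In this sketch, p(t) denotes the set of nodes that pattern p selects on an XML document t; that notation is an assumption introduced here, not taken from the slides.

```latex
\begin{align*}
p \subseteq p' &\iff \forall t:\ p(t) \subseteq p'(t) \\
p \equiv p'    &\iff p \subseteq p' \ \text{and}\ p' \subseteq p
\end{align*}
```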

Computability Definitions

• NP stands for “Nondeterministic Polynomial”.

• P class: the class of problems for which an efficient algorithm is known, i.e. problems solvable in polynomial time.

• NP class: the class of problems whose solutions can be verified in polynomial time. For the hardest of them no polynomial-time algorithm has been found (yet), and the best known algorithms take exponential time.

• NP-hard problem: a problem to which every NP problem can be reduced in polynomial time (it is at least as hard as any problem in NP, possibly harder).

• NP-complete problem: a problem that belongs to the NP class and is itself NP-hard.

Tree Patterns

• An unordered tree over the alphabet of the XPath fragment.

• XPath nodes are drawn as nodes in the tree pattern.

• Child axes are drawn as edges.

• Descendant axes are drawn as edges with double lines.

• A k-tuple of nodes is called the result tuple.

• For a tree pattern P, the arity of the result tuple is called the arity of P.

• A tree pattern P is Boolean iff its arity is 0.

Tree Patterns

• Tree patterns are more elegant and general than XPath fragments.

• We can translate from XPath to tree patterns and vice versa quite easily.

Now we can prove properties using graph theory.

Tree Pattern - example

• For the XPath expression a//*[b//d][c], the tree pattern is the following:

[Figure: tree pattern for a//*[b//d][c]. The root a has a descendant edge to the wildcard node *; * has child edges to b and c; b has a descendant edge to d. Legend: root, wildcard, descendant, child.]

Usage of Tree Patterns for navigating in XML trees

• An embedding maps a tree pattern into an XML tree.

• Imagine it as a function that must:
– Preserve the root.
– Respect node labels.
– Respect edge relationships.

• After embedding, return the subtrees rooted at the nodes marked as result nodes.

• For Boolean patterns, return true if such an embedding exists.
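The three conditions on an embedding translate directly into a recursive check for Boolean patterns. This is a minimal sketch under assumed encodings: a pattern node is `(label, [(axis, child), ...])` with axis "/" or "//", a document node is `(label, [children])`, and the names `embeds_at` and `descendants` are hypothetical. Embeddings need not be injective, so each branch of the pattern is checked independently.

```python
def descendants(t):
    """Yield every proper descendant of document node t."""
    for child in t[1]:
        yield child
        yield from descendants(child)

def embeds_at(p, t):
    """True if pattern node p can be mapped onto document node t."""
    label, edges = p
    # Respect node labels: "*" matches anything.
    if label != "*" and label != t[0]:
        return False
    # Respect edges: "/" needs a child, "//" needs a proper descendant.
    for axis, sub in edges:
        candidates = t[1] if axis == "/" else descendants(t)
        if not any(embeds_at(sub, c) for c in candidates):
            return False
    return True

# Pattern a//*[b//d][c], written by hand in the assumed encoding.
pattern = ("a", [("//", ("*", [("/", ("b", [("//", ("d", []))])),
                               ("/", ("c", []))]))])

# An assumed document: a > s > t > {b > d, c}.
doc = ("a", [("s", [("t", [("b", [("d", [])]), ("c", [])])])])
# The same document without the c element.
doc_without_c = ("a", [("s", [("t", [("b", [("d", [])])])])])

# Preserve the root: start at both roots.
print(embeds_at(pattern, doc))            # True
print(embeds_at(pattern, doc_without_c))  # False
```

In the first document the wildcard is mapped to t, which has a child b with a descendant d and a child c.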

Example for embedding

[Figure: a tree pattern with nodes a, *, b, c, d embedded into an XML tree with nodes a, s, t, b, c, d.]

PROBLEM...

• Testing containment between two XPath fragments is a co-NP-complete problem.

• This can be proven by a reduction from 3-CNF satisfiability, which yields co-NP-hardness.

Do we really care about it?

When do we need to test for containment or equivalence between fragments?

• In almost all the applications we described so far.

• Inference of keys.

• Optimization of XPath queries.

Solving the problem

• Finding an algorithm that is both efficient and complete for this problem is quite difficult (like proving P = NP).

• Finding an algorithm that is efficient but not complete.

• Finding an algorithm that is complete but not always efficient.

First solution: Pattern homomorphism

Pattern Homomorphisms - definition

• A homomorphism h between two tree patterns p, p’ is a function h: Nodes(p) -> Nodes(p’) that satisfies the following conditions:
– Root preserving.
– For each x in p, h(x) in p’ is x or *.
– Child and descendant relations preserving.

• Whether a homomorphism between two patterns exists can be decided by many efficient algorithms.

• The algorithm is sound: whenever a homomorphism exists between tree patterns p and p’, the corresponding containment between p and p’ holds.

• The existence of a homomorphism is always a sufficient condition for containment.

• But is it a necessary condition?
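A homomorphism test can be sketched as a recursive search. The sketch below uses the direction in which the condition is commonly stated, from the more general pattern to the more specific one: a wildcard in the general pattern may map to any node, a labelled node must map to a node with the same label, child edges map to child edges, and descendant edges map to proper descendants. That direction, the names `hom_at` and `descendants`, and the encoding `(label, [(axis, child), ...])` are assumptions and may not match the p, p’ notation used above. Only Boolean patterns are handled, so result nodes are ignored.

```python
def descendants(n):
    """Yield every proper descendant of pattern node n, over any edge type."""
    for _, child in n[1]:
        yield child
        yield from descendants(child)

def hom_at(x, y):
    """Can node x of the general pattern be mapped to node y of the specific one?"""
    if x[0] != "*" and x[0] != y[0]:
        return False
    for axis, sub in x[1]:
        if axis == "/":
            # A child edge must map onto a child edge.
            targets = [c for a, c in y[1] if a == "/"]
        else:
            # A descendant edge may map onto any downward path.
            targets = descendants(y)
        if not any(hom_at(sub, t) for t in targets):
            return False
    return True

a_desc_b = ("a", [("//", ("b", []))])                    # a//b
a_c_b = ("a", [("/", ("c", [("/", ("b", []))]))])        # a/c/b
print(hom_at(a_desc_b, a_c_b))   # True: a//b maps into a/c/b
print(hom_at(a_c_b, a_desc_b))   # False

# a/*//b and a//*/b select the same documents (b at depth >= 2 below a),
# yet this test fails in both directions.
p1 = ("a", [("/", ("*", [("//", ("b", []))]))])
p2 = ("a", [("//", ("*", [("/", ("b", []))]))])
print(hom_at(p1, p2), hom_at(p2, p1))   # False False
```

The last pair is a hand-picked illustration of why a homomorphism test alone can miss a containment that actually holds.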

Example for homomorphism

[Figure: two tree patterns over the nodes a, b, c and *, with the homomorphism h(a) = a, h(b) = *.]

Homomorphism is not a complete solution for containment

• A homomorphism between the two tree patterns does not exist even though they are equivalent.

[Figure: two equivalent tree patterns, each with nodes a, b and *.]

Cases where homomorphism applies

• Fragments that contain only * and [].

• Fragments that contain only // and [].

• Fragments that contain all three but can be translated into an expression that belongs to one of the above without changing the semantics.

Conclusion for homomorphism

• Sound.

• Efficient.

• Incomplete.

Next we look for an algorithm that is sound and complete, and that may be efficient in some cases.

ALGORITHM FOR CONTAINMENT

Containment between regular languages

• Reducing the problem of containment between two XPath fragments to containment between two regular languages, by translating tree patterns into automata.

• The algorithm is complete: with defined rules we can translate completely from automata to tree patterns and vice versa.

Automata for XPath fragment

• Defined on ranked trees.

• Bottom-up structure.

• Only the root is an accepting state.

• The initial states are the leaves of the tree.

• The transitions are of the form: (q1,q2,…,qn;a) -> q
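How transitions of this form drive a bottom-up run can be shown on a small generic automaton. This is a sketch, not the automaton built from a tree pattern: the dictionary encoding `{((q1, ..., qn), a): {q, ...}}`, the function names `reachable` and `accepts`, and the Boolean-formula example are all assumptions chosen for illustration. Leaves use rules with an empty tuple of child states.

```python
from itertools import product

def reachable(tree, transitions):
    """Set of states the automaton can reach at the root of `tree`.
    A tree is (label, [children]); the run proceeds from the leaves upwards."""
    label, children = tree
    child_sets = [reachable(c, transitions) for c in children]
    states = set()
    # Try every combination of states reached at the children.
    for combo in product(*child_sets):
        states |= transitions.get((combo, label), set())
    return states

def accepts(tree, transitions, accepting):
    """The tree is accepted if some accepting state is reached at the root."""
    return bool(reachable(tree, transitions) & accepting)

# Example automaton: evaluates and/or formulas over the leaves "0" and "1".
transitions = {((), "0"): {"F"}, ((), "1"): {"T"}}
for x, y in product("TF", repeat=2):
    transitions[((x, y), "and")] = {"T" if x == y == "T" else "F"}
    transitions[((x, y), "or")] = {"T" if "T" in (x, y) else "F"}

formula = ("or", [("and", [("1", []), ("0", [])]), ("1", [])])
print(accepts(formula, transitions, {"T"}))                               # True
print(accepts(("and", [("1", []), ("0", [])]), transitions, {"T"}))       # False
```

Because `reachable` returns a set of states, the same function also runs nondeterministic automata; a deterministic one simply yields at most one state per node.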

Definitions

• FTA: finite tree automaton, an automaton that consists of a set of states and transitions of the form described.

• An FTA can be deterministic: DFTA.

• Each FTA A with Q states can be translated into a DFTA B with at most 2^Q states.

• AFTA: alternating finite tree automaton, which extends the definition of FTA by adding “AND transitions” of the form (q1,q2,…,qm) -> qi.

• A DFTA can also be built for an AFTA, without increasing the cost of determinizing the automaton.

The entire algorithm

• Construct the DFTA A accepting the “regular expressions of P”.

• Construct the AFTA A’ accepting the “regular expressions of P’”.

• Compute the AFTA B = A x A’.

• Compute the DFTA C = Det(B).

• If lang(A) ⊆ lang(C) then return true, else return false.

[Figure: two example tree patterns with root r, built from nodes labelled a, b and *, to be tested for containment.]

Step 1: Building FTA A from tree pattern p

• States(A) = Nodes(p).

• For each node x with children x1,…,xk we add a transition (x1,x2,…,xk;x) -> x.

• For each descendant edge e from node x to node y we add (y;e) -> x, and we add a self-loop (y;*) -> y.

• The only terminal state is the root.

Example for building FTA

[Figure: the example tree pattern with root r and nodes labelled a, b and *, next to the FTA built from it.]

Step 2: Building an AFTA A’ from pattern p’

• States(A’) = Nodes(p’) ∪ Edges(p’)

• (q,a) -> for every symbol a that has an outgoing edge e. If e is a descendant relationship, then we also add a self-loop on the source node.

• (e1,e2,e3,…) -> a for every a that has incoming edges.

Example for building AFTA for pattern p’

[Figure: the example pattern p’ with root r and nodes labelled a, b and *, next to the AFTA built from it.]

Conclusion for the containment algorithm

• Sound.

• Complete.

• Not always efficient.