25 Nov 2001, SDBI 2001. Sigalit Sima Sigalit Batiashvili Kdoshim. Representing & Querying XML With Incomplete Information. Serge Abiteboul, Luc Segoufin, Victor Vianu.


Page 1:

25 Nov 2001, SDBI 2001

Sigalit Sima Sigalit

Batiashvili Kdoshim

Representing & Querying XML With Incomplete Information

Serge Abiteboul

Luc Segoufin

Victor Vianu

Page 2:

Goals

Motivation
- Data on the WEB
- Incompleteness problem

Representation System

Refine Algorithm

Querying The Incomplete Information
- CWA approach
- Answer using sub-queries (CWA+OWA)

Page 3:

Introduction & Motivation

Page 4:

Data On The WEB

• Partial information is known
  - expiration of data
  - unavailable sites
  - modification of data, etc.

• Irregular structure
  - self-describing

Semistructured Data

Page 5:

Data On The WEB

Problem
Data no longer fits into tables (no rigid structure).

We Want..
To apply database-like functionality to access data on the WEB.

Focus: the XML-ized portion of the WEB

Page 6:

XML

• eXtensible Markup Language
• The lingua franca of the WEB
• Facilitates the use of database techniques to manage WEB data
• Brings order
  - nested tags (similar to record structure)
  - ordered sub-elements
  - structure (DTD, XML-Schema)

DTD (Document Type Definition)

Defines constraints on the XML document structure

Page 7:

XML Example

<person>
  <name> John Smith </name>
  <addr> Green Field Park N.Y. </addr>
  <email> [email protected] </email>
</person>

Tree view:

person
  name: John Smith
  addr: Green..
  email: John.smith@..

Page 8:

View XML As Trees

<person>
  <name> John Smith </name>
  <addr> Green Field Park N.Y. </addr>
  <email> john.smith@infineon.com </email>
</person>

Tree:

person
  name: John Smith
  addr: Green Field Park N.Y.
  email: john.smith@infineon.com

DTD:

person → name addr email*
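The tree view of the person document can be sketched in a few lines of Python. This is only an illustration using the standard library; the helper name to_tree and the (label, value, children) tuple layout are assumptions made for the example, not part of the slides.

```python
import xml.etree.ElementTree as ET

# The person document from the slide, as a string.
DOC = """
<person>
  <name> John Smith </name>
  <addr> Green Field Park N.Y. </addr>
  <email> john.smith@infineon.com </email>
</person>
"""

def to_tree(elem):
    """Return (label, value, children); value is the stripped text or None."""
    text = (elem.text or "").strip() or None
    return (elem.tag, text, [to_tree(child) for child in elem])

root = ET.fromstring(DOC)
tree = to_tree(root)

# Print the tree with one level of indentation per depth.
def show(node, depth=0):
    label, value, children = node
    print("  " * depth + label + (": " + value if value else ""))
    for child in children:
        show(child, depth + 1)

show(tree)
```

Inner nodes such as person carry no value of their own, so only the leaves print a value.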

Page 9:

Webhouse

Webhouse
- A collection of website sources
- Context: XML
- Holds a DTD that describes the structure of the sources

Warehouse
A collection of information from many sources

Page 10:

Webhouse Maintenance

The webhouse is continuously enriched by exploring web sites.

Technique: crawling the web.

Page 11:

Webhouse

Information held in the webhouse is never complete.

Why?

• Dynamic nature of WEB data
• Limited storage capacity
• Expiration of data
• Modification of data
• Etc.

Page 12:

The Problem

Posing a query against the webhouse may yield an incomplete answer:

- Documents satisfying the query are missing from the webhouse
- The relevant data is missing from a document

Page 13:

Solution

Two main approaches:

• Closed World Assumption (CWA)
  If some information does not appear explicitly, it does not hold.
  - possible method: Best Effort
  - possible method: Fetch Data

• Open World Assumption (OWA)
  Anything not ruled out is possible.

Page 15:

Fetch Data

We use the Fetch Data approach.

We would like to be able to define what additional resources we are looking for.

How?

• Define the missing portion of the data using the available information.
• Thus, determine what additional exploration of WEB sources is needed.

Page 16:

Example

Given the DTD

catalog → product+
product → name price cat picture*
cat → subcat

Query 1
Find the name, price & subcategories of electronics products with price < $200

catalog
  product
    name
    price < 200
    cat = elec
      subcat

Page 17:

Answer to query 1

catalog
  product
    Canon
    120
    elec
      camera
  product
    Nikton
    199
    elec
      camera
  product
    Sony
    175
    elec
      cdplayer
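A minimal sketch of Query 1 as a filter, assuming each product is flattened into a Python dict. The first three products are the ones shown in the answer; the last two, including their prices, are hypothetical entries added only so that the filter has something to reject.

```python
catalog = [
    {"name": "Canon", "price": 120, "cat": "elec", "subcat": "camera"},
    {"name": "Nikton", "price": 199, "cat": "elec", "subcat": "camera"},
    {"name": "Sony", "price": 175, "cat": "elec", "subcat": "cdplayer"},
    # Hypothetical products (not from the slides) that Query 1 must reject.
    {"name": "Olympus", "price": 250, "cat": "elec", "subcat": "camera"},
    {"name": "Desk", "price": 80, "cat": "furniture", "subcat": "office"},
]

def query1(products):
    """Name, price and subcategory of electronics products with price < 200."""
    return [
        {key: p[key] for key in ("name", "price", "subcat")}
        for p in products
        if p["cat"] == "elec" and p["price"] < 200
    ]

answer1 = query1(catalog)
for row in answer1:
    print(row)
```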

Page 18:

Given the DTD

catalog → product+
product → name price cat picture*
cat → subcat

Query 2
Find the name & pictures of all cameras that have a picture

catalog
  product
    name
    cat = elec
      subcat = camera
    picture

Page 19:

Answer Strategy

We Already Have..

[Diagram: two overlapping regions. Query 1: elec, price < $200. Query 2: camera (with) picture. Overlap: camera (with) picture, price < $200.]

Page 20:

Answer Strategy (cont)

We Need ..

[Diagram: the same two regions. Query 1: elec, price < $200. Query 2: camera (with) picture. The part of Query 2 not covered by Query 1: camera (with) picture, price >= $200.]

• No need to query the WEB for the whole query
• Define the missing information
• Reduce the search space
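The residual part of Query 2 can be written as "in Query 2 and not in Query 1". The sketch below is an illustration only: the dict layout and the sample products are hypothetical, and the two predicates simply restate the queries from the example.

```python
def in_query1(p):
    """Query 1: electronics products with price < 200."""
    return p["cat"] == "elec" and p["price"] < 200

def in_query2(p):
    """Query 2: cameras that have at least one picture."""
    return p["cat"] == "elec" and p["subcat"] == "camera" and len(p["pictures"]) > 0

def still_missing(p):
    """The part of Query 2 that lies outside Query 1."""
    return in_query2(p) and not in_query1(p)

# Hypothetical products used only to exercise the predicates.
samples = [
    {"name": "cheap camera", "cat": "elec", "subcat": "camera",
     "price": 120, "pictures": ["c.jpg"]},
    {"name": "pricey camera", "cat": "elec", "subcat": "camera",
     "price": 450, "pictures": ["p.jpg"]},
    {"name": "pricey camera without picture", "cat": "elec", "subcat": "camera",
     "price": 450, "pictures": []},
]

to_fetch = [p["name"] for p in samples if still_missing(p)]
print(to_fetch)
```

On these samples the residual condition coincides with "camera with picture and price >= 200", which is the region labeled as needed on the slide.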

Page 21:

Representation System

Page 22:

Framework

• Define the data model
  - for the webhouse repository (XML data)
• Define the constraint model
  - simplified DTD
• Define the query language
• Define the representation system for the incomplete information

Page 23:

Data Model

T_data = <t, λ, ν>

t: rooted tree with set of nodes N
λ: N → Σ, the labeling function (Σ = node labels)
ν: N → Q, the value mapping (Q = data values)

catalog
  product
    name = Canon
    price = 120
    cat = elec
      subcat = camera
  product
    name = Nikton
    price = 199
    cat = elec
      subcat = camera

Page 24:

Data Tree Prefix

T_data = <t, λ, ν>

<t', λ', ν'> is a prefix of T_data: a part of the data tree that includes the root.

Page 25:

Tree Type

T_type = <Σ, r, μ>

Σ: element names
r: root label (e.g. catalog)
μ: the DTD as regular expressions

μ(a) = a1^w1 a2^w2 … an^wn

wi = 1: exactly one child labeled ai
wi = ?: at most one child labeled ai
wi = +: at least one child labeled ai
wi = *: 0 or more children labeled ai

Page 26:

Tree Type Satisfaction

T_type (each child shown with its multiplicity):

catalog
  product +
    name 1
    price 1
    cat 1
      subcat 1
    picture *

T_data:

catalog
  product
    name = nikon
    price = 199
    cat = elec
      subcat = camera
    picture = c.jpeg

T_data satisfies T_type.

rep(T_type) = {T_data : T_data satisfies T_type}
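Tree type satisfaction can be sketched as a recursive multiplicity check. The Node class, the MU dictionary encoding of the tree type and the function name satisfies are assumptions made for this illustration; the multiplicities are the ones listed on the Tree Type slide.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    value: object = None
    children: list = field(default_factory=list)

# Tree type from the slide: label -> {child label: multiplicity}.
MU = {
    "catalog": {"product": "+"},
    "product": {"name": "1", "price": "1", "cat": "1", "picture": "*"},
    "cat": {"subcat": "1"},
}
# (minimum, maximum) number of children for each multiplicity; None = unbounded.
BOUNDS = {"1": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}

def satisfies(node, mu, root_label=None):
    """True if the subtree at node respects the multiplicities in mu."""
    if root_label is not None and node.label != root_label:
        return False
    allowed = mu.get(node.label, {})  # labels without an entry must be leaves
    counts = {}
    for child in node.children:
        if child.label not in allowed:
            return False
        counts[child.label] = counts.get(child.label, 0) + 1
    for label, mult in allowed.items():
        low, high = BOUNDS[mult]
        n = counts.get(label, 0)
        if n < low or (high is not None and n > high):
            return False
    return all(satisfies(child, mu) for child in node.children)

data = Node("catalog", None, [
    Node("product", None, [
        Node("name", "nikon"),
        Node("price", 199),
        Node("cat", "elec", [Node("subcat", "camera")]),
        Node("picture", "c.jpeg"),
    ]),
])
print(satisfies(data, MU, root_label="catalog"))
```

Removing the price child from the product, or leaving the catalog without any product, makes the check fail.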

Page 27:

Prefix-Selection Query

• We defined the structure of webhouse data using Tree Types.

• It is natural to define a pattern based query (tree format).

• The matching will thus be done by browsing the input tree.

• Such a query is called a PS query.

Page 28:

PS-Query Example

T_type:

catalog
  product +
    name 1
    price 1
    cat 1
      subcat 1
    picture *

T_query:

catalog
  product
    name
    price < 200
    cat = elec
      subcat

T_query = q = <t, λ, cond>

t: rooted tree
λ: labeling function
cond: constraints on node values (comparisons with constants, e.g. = elec, < 200)

Page 29:

PS-Query Answer

• Denoted q(t’) where t’ is the data tree.

• Consists of a prefix of this tree matching the corresponding query tree nodes.

Page 30:

Answer Example

Query:

catalog
  product
    name
    price < 200
    cat = elec
      subcat

Answer:

catalog
  product
    Canon
    120
    elec
      camera
  product
    Nikton
    199
    elec
      camera

Note: q1(t'), q2(t') share tree data prefixes (the root and maybe more).
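A PS-query answer can be sketched as a recursive match that keeps only the matched prefix of the data tree. The classes Node and QNode, the function answer, and the rule that every query child needs at least one matching data child are assumptions of this sketch. The Olympus product and its price are hypothetical, added so that one product is filtered out.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Node:
    label: str
    value: object = None
    children: list = field(default_factory=list)

@dataclass
class QNode:
    label: str
    cond: Optional[Callable] = None   # condition on the node value, if any
    children: list = field(default_factory=list)

def answer(q, n):
    """Prefix of the data subtree at n matched by query node q, or None."""
    if n.label != q.label:
        return None
    if q.cond is not None and not q.cond(n.value):
        return None
    kept = []
    for qc in q.children:
        matches = [m for m in (answer(qc, c) for c in n.children) if m is not None]
        if not matches:               # a query child with no match: n fails
            return None
        kept.extend(matches)
    return Node(n.label, n.value, kept)

def product(name, price, cat, subcat):
    return Node("product", None, [
        Node("name", name), Node("price", price),
        Node("cat", cat, [Node("subcat", subcat)]),
    ])

data = Node("catalog", None, [
    product("Canon", 120, "elec", "camera"),
    product("Nikton", 199, "elec", "camera"),
    product("Olympus", 250, "elec", "camera"),   # hypothetical price
])

q1 = QNode("catalog", None, [QNode("product", None, [
    QNode("name"),
    QNode("price", lambda v: v < 200),
    QNode("cat", lambda v: v == "elec", [QNode("subcat")]),
])])

result = answer(q1, data)
names = [p.children[0].value for p in result.children]
print(names)
```

The result keeps the catalog root and the two matching products, that is, a prefix of the data tree.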

Page 31:

Incomplete Information

Incomplete Tree

Data available
Prefixes of the data tree, enriched by previous queries

Missing portion
Simply define the missing information, using the initial T_type and the queries

Page 32:

Conditional Tree Type

Provides extensions to Tree Types:

1. A Tree Type with a condition function on the tree nodes.
2. Allows context-dependent structure definition.

HOW?

dealer
  UsedCars
    ad *
      model
      year
  NewCars
    ad *
      model

Corresponding DTD

dealer → UsedCars | NewCars
UsedCars → ad*
NewCars → ad*
ad → model year | model

Page 33:

Specialization

Tree type over Σ:

dealer
  UsedCars
    ad *
      model
      year
  NewCars
    ad *
      model

Specialized tree type:

dealer
  UsedCars
    adUsed *
      model
      year
  NewCars
    adNew *
      model

Σ = {dealer, UsedCars, NewCars, ad, model, year}
Σ' = {adNew, adUsed}
σ: Σ' → Σ

Page 34:

Specialization

σ: Σ' → Σ
σ(adNew) = σ(adUsed) = ad

CT_type = <tree type over Σ', cond, σ>

Page 35:

Incomplete Tree

A tree representing the incomplete information.

Query 1: Find the name, price & subcategories of electronics products with price < $200

catalog
  product
    name
    price < 200
    cat = elec
      subcat

Page 36:

Incomplete Tree (cont)

Query 1: Find the name, price & subcategories of electronics products with price < $200

catalog
  product
    Canon
    120
    elec
      camera
  product
    Nikton
    199
    elec
      camera
  product
    Sony
    175
    elec
      cdplayer

What Is Missing?

Page 37:

What Is Missing?

product1: all products whose category is not electronics

product1
  name
  price
  cat != elec
    subcat
  picture *

product2: all electronics products with price >= 200

product2
  name
  price >= 200
  cat = elec
    subcat
  picture *

Page 38:

Incomplete Tree T

catalog
  product
    Canon
    120
    elec
      camera
  product
    Nikton
    199
    elec
      camera
  product
    Sony
    175
    elec
      cdplayer
  product1 *
  product2 *

product1
  name
  price
  cat != elec
    subcat
  picture *

product2
  name
  price >= 200
  cat = elec
    subcat
  picture *

Available information: a prefix of a full data tree (T_data)

Missing information: a conditional tree type (CT_type)

Page 39:

Query 2: Find the name & pictures of all cameras with picture

catalog
  product
    Canon
    120
    elec
      camera
    c.jpg
  product
    Nikton
    199
    elec
      camera
  product
    Sony
    175
    elec
      cdplayer
  product
    Olympus
    elec
      camera
    o.jpg
  product1 *
  product2 *

Node type labels shown in the figure: 3, 3, 2a

What Is Missing?

Page 40:

What Is Missing?

product1: all products whose category is not electronics

product1
  name
  price
  cat != elec
    subcat
  picture *

Page 41:

What Is Missing?

product1: all products whose category is not electronics

product1
  name
  price
  cat != elec
    subcat
  picture *

product2b: all electronics products with price >= 200 whose subcategory is not camera

product2b
  name
  price >= 200
  cat = elec
    subcat != camera
  picture *

Page 42:

What Is Missing?

product1
  name
  price
  cat != elec
    subcat
  picture *

product2b
  name
  price >= 200
  cat = elec
    subcat != camera
  picture *

product2c: all cameras with price >= 200 & no picture

product2c
  name
  price >= 200
  cat = elec
    subcat = camera

Page 43:

Incomplete Tree Definition

A tree T which consists of the following:

• A data tree T_data = <t, λ, ν>
  – Represents the known data
  – Uses labels from Σ
• A conditional tree type, CT_type
  – Represents the missing portion of the data
  – Uses the specialized alphabet Σ'
• A data labeling mapping λ' from T_data nodes to elements of Σ'
  – E.g. λ'(n ∈ N | λ(n) = product) = {product, product3, …}

Page 44:

Rep(T) Definition

• Rep(T) is the set of trees represented by an incomplete tree T.
• T_data ∈ Rep(T): a possible completion of the prefix of the available data tree given by T.

Page 45:

Rep(T) Definition (cont)

Given a T_type:

student
  name
  addr
  id

Query q:

student
  name = shlomo

T_data, a member of Rep(T):

student
  shlomo
  addr
  id

Page 46:

Acquiring Incomplete Information

• Refine Algorithm

Page 47:

Acquiring Incomplete Info.

• How is this done via the WEB?
  - simply by using answers to queries
• We now show how this can be done against the representation system

Assumption
The input tree is a single document described by a tree type. Several documents can be merged into a single one.

Page 48:

Refine Motivation

• Each query posed against the webhouse defines additional constraints.
• Answers to these queries help us refine the partial information.
• We describe this partial information using an incomplete tree.
• As we query the webhouse for more information, we want to be able to define the current incomplete information.

Page 49:

Refine Motivation (cont)

Missing: all electronics products with price >= 200

product2
  name
  price >= 200
  cat = elec
    subcat
  picture *

Page 50:

Refine Motivation (cont)

product2
  name
  price >= 200
  cat = elec
    subcat
  picture *

Stronger constraints give the refinement of product2:

product2b
  name
  price >= 200
  cat = elec
    subcat != camera
  picture *

product2c
  name
  price >= 200
  cat = elec
    subcat = camera
  (no picture)

Page 51:

Refine Algorithm

• Refine the incomplete information

Input
T: incomplete tree
q: PS-query
A = q(T): answer to q

Output
T': incomplete tree compatible with the answer A to q

Page 52:

Refine Algorithm

The query q is posed against the webhouse: A = q(T)

q^-1(A): the set of trees compatible with the answer to q

But we only need the trees that also match the incomplete tree so far: Rep(T) ∩ q^-1(A)

Rep(T') = Rep(T) ∩ q^-1(A)

Page 53:

Refine Output

Defines a new incomplete tree T'

In order to do so we need to define:
1. CT_type, to represent the missing portion
2. T_data, to represent the available data

Step 1

Page 54:

Refine Algorithm – step 1

1. Compute the conditional tree type of the negation of q.

I.e. Conditional tree for trees which return an empty answer to q.

Page 55:

Refine – step 1

1. Compute the conditional tree type of the negation of q.

For each node a of the query tree t_q, with children a1, a2, …, an, define three specialized types:

t_a: cond'(t_a) = true
\bar{t}_a: cond'(\bar{t}_a) = ¬cond_q(a)
\hat{t}_a: cond'(\hat{t}_a) = cond_q(a)

Define σ'
The labels for the new types are defined as specializations of the label a,
i.e. σ'(t_a) = σ'(\bar{t}_a) = σ'(\hat{t}_a) = a

Page 56:

Refine – step 1 (cont)

We defined the cond' mapping of CT'.
We defined the specialization mapping σ'.
Let's define the rules..

root' has type (\bar{t}_r ∨ \hat{t}_r), where r is the root of the query tree.

t_a → t*_{a1} … t*_{an}
  accept everything

\bar{t}_a → t*_{a1} … t*_{an}
  accept everything below a, because there the condition of q is not satisfied

\hat{t}_a → ∨_i t*_{a1} … (\bar{t}_{ai} ∨ \hat{t}_{ai}) … t*_{an}
  one of the children must not satisfy a condition of q

Page 57:

Refine - step 1 Example

Query tree t_q (node types t_r, t_a1, t_b, t_a2):

product
  cat = elec
    subcat = camera
  picture

The negation of t_q (written q^-1), viewed simply as a disjunction of three types:

product
  cat != elec
    subcat
  picture

product
  cat = elec
    subcat != camera
  picture

product
  cat = elec
    subcat = camera
  (no picture)

Negation computation complexity: O(|q| * max number of children), where |q| is the size of the query tree.
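The three-way disjunction in this example follows a simple pattern: the i-th disjunct keeps the first i-1 conditions of the query and negates the i-th. The sketch below illustrates only that pattern on a flattened product record; it is not the general construction over tree types, and the names CONDITIONS, negation_disjuncts and holds are hypothetical.

```python
from itertools import product as cartesian

# Conditions of the query in the example, in top-down order.
CONDITIONS = ["cat = elec", "subcat = camera", "has picture"]

def negation_disjuncts(names):
    """Disjunct i: conditions before i hold, condition i fails."""
    return [
        [(n, True) for n in names[:i]] + [(names[i], False)]
        for i in range(len(names))
    ]

def holds(disjunct, facts):
    """facts maps a condition name to True/False for one product."""
    return all(facts[name] == wanted for name, wanted in disjunct)

disjuncts = negation_disjuncts(CONDITIONS)
for d in disjuncts:
    print(" and ".join(n if ok else "not (" + n + ")" for n, ok in d))

# Check over all truth assignments: a product fails the query exactly
# when one (and only one) disjunct describes it.
for values in cartesian([True, False], repeat=len(CONDITIONS)):
    facts = dict(zip(CONDITIONS, values))
    matching = sum(holds(d, facts) for d in disjuncts)
    assert matching == (0 if all(values) else 1)
```

Because the disjuncts are mutually exclusive, each missing product falls under exactly one of the three types.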

Page 58:

Refine – step 1 Example

CT' = CT ∩ q^-1

CT:
- product1: name, price, cat != elec (with subcat), picture*
- product2: name, price >= 200, cat = elec (with subcat), picture*

q^-1:
- product: cat != elec (with subcat), picture
- product: cat = elec, subcat != camera, picture
- product: cat = elec, subcat = camera, no picture

Note
This intersection yields exactly the missing types product1, product2b and product2c.

We next show it..

Page 59:

Refine – step 1 Example

CT: product1, product2

q^-1 disjunct: product with cat != elec (with subcat), picture

The intersection with product1 gives, in CT':

product1
  name
  price
  cat != elec
    subcat
  picture *

Page 60:

Refine – step 1 Example

CT: product1, product2

q^-1 disjunct: product with cat = elec, subcat != camera, picture

The intersection with product2 gives, in CT':

product2b
  name
  price >= 200
  cat = elec
    subcat != camera
  picture *

Page 61:

Refine – step 1 Example

CT: product1, product2

q^-1 disjunct: product with cat = elec, subcat = camera, no picture

The intersection with product2 gives, in CT':

product2c
  name
  price >= 200
  cat = elec
    subcat = camera
  (no picture)

Page 62:

Node ids Assumption

• Persistent node ids
Distinct queries against an XML document return nodes with the same id iff the nodes are identical.

[Example: two answers each contain a product node with id &231. One carries canon, 120, elec, camera; the other carries canon, 120, elec, camera and the picture c.jpg. Because the ids are equal, they are the same node: product &231 with canon, 120, elec, camera, c.jpg.]

Page 63:

Node ids Assumption (cont)

• Makes it possible to enrich the information about a given node through consecutive queries
• Otherwise, the representation system would become too large to handle:
  - it would need to be extended in order to keep track of the various possible ways of matching nodes returned by different queries
• A crucial assumption

Page 64:

Refine Output

Defines a new incomplete tree T'

In order to do so we need to define:
1. CT_type, to represent the missing portion
2. T_data, to represent the available data

Step 2

Page 65:

Refine – step 2

T'_data is the join between T_data and A. To compute it:

• Nodes in both A and T_data
  Compute the intersection.
  E.g. product
• Nodes in T_data but not in A
  The node type is specialized using the CT' we just computed.
  E.g. product3
• Nodes in A but not in T_data
  Refinement of an existing type.
  E.g. product2a

Page 66:

Drawback – The Blowup Problem

Given a tree type:

root
  a
  b

and n queries q_i (1 <= i <= n) with empty answers:

root
  a = i
  b = i

Let's follow the construction of CT_qi, where CT_qi belongs to the incomplete tree based on queries q_1 … q_i.

Page 67:

The Blowup Problem

Query q1:

root
  a = 1
  b = 1

Incomplete tree T_q1: T_data is empty.

CT_q1 (two alternatives):

root
  a != 1
  b

root
  a
  b != 1

Page 68:

The Blowup Problem

Query q2:

root
  a = 2
  b = 2

CT_q2

1. Compute q2^-1, the negation of q2:

root
  a != 2
  b

root
  a
  b != 2

Page 69:

The Blowup Problem

Query q2:

root
  a = 2
  b = 2

CT_q2

2. Compute the intersection q2^-1 ∩ CT_q1

Page 70:

The Blowup Problem

2. Compute the intersection q2^-1 ∩ CT_q1

q2^-1: (root: a != 2, b) or (root: a, b != 2)
CT_q1: (root: a != 1, b) or (root: a, b != 1)

CT_q2 = CT_q1 ∩ q2^-1 (four alternatives):

root: a != 1 and a != 2, b
root: a, b != 1 and b != 2
root: a != 2, b != 1
root: a != 1, b != 2

Continuing the computation yields:

|CT_q3| = 4*2 = 2^3 = 8
|CT_qn| = 2^n

The Refine algorithm yields a disjunction of 2^n multiplicity statements.

Exponential blowup of the representation system
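The count of 2^n can be checked by enumerating the alternatives directly: for each query q_i the negation excludes the value i either from a or from b. The function name disjuncts and the pair-of-sets encoding are assumptions made for this sketch.

```python
from itertools import product as cartesian

def disjuncts(n):
    """All alternatives after n empty answers.

    Each alternative is (values excluded for a, values excluded for b):
    for every i in 1..n, either a != i or b != i is chosen.
    """
    result = []
    for choice in cartesian("ab", repeat=n):
        a_excluded = frozenset(i for i, side in enumerate(choice, 1) if side == "a")
        b_excluded = frozenset(i for i, side in enumerate(choice, 1) if side == "b")
        result.append((a_excluded, b_excluded))
    return result

for n in range(1, 6):
    print(n, len(disjuncts(n)))
```

For n = 2 the four alternatives are the ones listed on the slide, and every further query doubles the count.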

Page 71:

Avoiding The Blowup

We consider two ways of avoiding the exponential blowup of incomplete trees:

• Provide an extension to the incomplete tree: conjunctive incomplete trees.
• Put some restrictions on the tree type and the queries.

Page 72:

Conjunctive Incomplete Tree

Types defined only as disjunctions, i.e.

root → a1 b ∨ a b1

Define a type as a conjunction of disjunctions:

root → (a1 b ∨ a b1)
root → (a1 b ∨ a b1) ∧ … ∧ (an b ∨ a bn)

• ai and bi are specializations of a and b, respectively
• cond(ai) = (!= i), 1 <= i <= n
• cond(bi) = (!= i), 1 <= i <= n

Page 73:

Conjunctive Incomplete Tree

With conjunction
The incomplete information can be represented using a conjunction of only n disjunctions.

Without conjunction
Algorithm Refine yields a disjunction of 2^n multiplicity statements.
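That the two forms describe the same set of documents can be checked by brute force on small cases. In the sketch below, a document is reduced to the pair of values (a, b); this reduction and the function names are assumptions made for illustration.

```python
from itertools import product as cartesian

def conjunctive(a, b, n):
    """n conjuncts: for every i, a != i or b != i."""
    return all(a != i or b != i for i in range(1, n + 1))

def disjunctive(a, b, n):
    """2^n disjuncts: each picks, for every i, either a != i or b != i."""
    for choice in cartesian("ab", repeat=n):
        if all((a != i) if side == "a" else (b != i)
               for i, side in enumerate(choice, 1)):
            return True
    return False

n = 4
values = range(0, n + 2)   # includes values outside 1..n
agree = all(
    conjunctive(a, b, n) == disjunctive(a, b, n)
    for a in values for b in values
)
print(agree, "conjuncts:", n, "disjuncts:", 2 ** n)
```

Both predicates accept the same pairs, but the conjunctive form is written with n clauses where the disjunctive one needs 2^n.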

Page 74:

Heuristics

To deal with the case when the incomplete tree is already too large to be practical:

• Shrink the incomplete tree by asking critical additional queries that help to eliminate the missing portion.
• Lose some information: trade accuracy for a smaller incomplete tree.

Page 75:

Acquiring Partial Information Summary

• Webhouse is acquired using answers to queries

• Each answer refines our partial information

• Partial information is described using incomplete trees

• We compute the new incomplete tree at each stage using Refine algorithm

Page 76:

Querying Incomplete Trees

Page 77:

Answering Queries

Remember.. the known data is of the following formats:

product
  name
  price
  cat
    subcat
  picture *

product2a: cameras with price >= 200
  name
  price >= 200
  cat = elec
    subcat = camera
  picture

product3: elec products (not cameras) with price < 200
  name
  price
  cat = elec
    subcat != camera

Page 78:

Answering Queries

Given query 3:
Find the name, price & pictures of all cameras with price < $100 that have at least one picture.

product
  name
  price < 100
  cat = elec
    subcat = camera
  picture +

We can provide a complete answer to query 3 using the available information.

Page 79:

Answering Queries

Given query 4:
Find all cameras

product
  name
  price
  cat = elec
    subcat = camera
  picture *

No complete answer is available from the known information. We can do the following:

1. Provide the complete list of cameras with price < 200
2. Provide the complete list of cameras with a picture
3. Tell the user there may be more cameras (that are expensive and have no pictures)
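The three steps above can be sketched as a function that returns the known cameras together with a description of what may still be missing. The record layout and the function name answer_query4 are hypothetical; the products are the ones that appear in the answers to Query 1 and Query 2, with Olympus carrying no known price.

```python
# Known data after Query 1 and Query 2 (price is None when it was never returned).
known = [
    {"name": "Canon", "price": 120, "subcat": "camera", "pictures": ["c.jpg"]},
    {"name": "Nikton", "price": 199, "subcat": "camera", "pictures": []},
    {"name": "Sony", "price": 175, "subcat": "cdplayer", "pictures": []},
    {"name": "Olympus", "price": None, "subcat": "camera", "pictures": ["o.jpg"]},
]

def answer_query4(products):
    """Partial answer to 'find all cameras' plus a note on the missing part."""
    cameras = [p["name"] for p in products if p["subcat"] == "camera"]
    # Query 1 covered price < 200, Query 2 covered cameras with a picture,
    # so only cameras outside both can still be unknown.
    possibly_missing = "cameras with price >= 200 and no picture"
    return cameras, possibly_missing

cameras, possibly_missing = answer_query4(known)
print("known cameras:", cameras)
print("there may be more:", possibly_missing)
```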

Page 80:

Answering Queries

• Provides an incomplete answer to the query given the knowledge available
• No data source is accessed for further information

Next..

Mediator Approach: provide a complete answer, but seek only the missing information. The incomplete tree is used as a guide for the mediator.

Page 81:

Mediator Approach

Additional queries may have to be generated against the input document to obtain the information needed to fully answer the query.

product
  name
  price >= 200
  cat = elec
    subcat = camera
  picture (multiplicity 0)

Seek the web only for cameras with price >= 200 and no picture.

Page 82:

Mediator Approach (cont)

Assumption: the generated queries are local.

Local Queries
Queries that explore the input document starting from the nodes already available.

[Figure: incomplete tree T with its root; data tree T_data containing a node n; PS-query q.]

Page 83:

Local Query

Local ps-query: p@n

p: ps-query
n: node in T_data

[Figure: incomplete tree T with its root; data tree T_data containing a node n; PS-query q.]

Page 84:

Local Query

L: { p@n | p a local query }

We want the set of queries to collect the additional information needed to fully answer a given ps-query.

T' is obtained by extending each node n of T_data for which p@n ∈ L with p@n(T), T ∈ rep(T).

L completes T if q(T) = q(T').

[Figure: data tree T_data with root and nodes n1 … nk; local queries p@n1 … p@nk.]

Page 85:

Local Query

Using local queries helps us avoid redoing the work already done by previous queries.

We want the set of queries L to be non-redundant:

1. No node already in T is returned by a query in L.
2. No new node is returned by two distinct queries of L.
3. Queries in L should always return a non-empty answer.

Page 86:

Mediator Approach Conclusion

The mediator approach defines a combination of the CWA and OWA semantics.

CWA – describes the missing information, i.e. some facts are not known.
OWA – some data, still ignored, may exist.

Page 87:

Assumptions

Page 88:

Order

1. Original XML documents define an order on elements. Moving to the tree representation loses the original ordering.
2. The source DTD may describe the order of children at each node type.
3. Queries may use ordering in their selection patterns.

Assumption
No order is required in our representation system.

Page 89:

Branching

Assumption
A PS-query tree pattern allows just one child with a given label.

Branching
Allows multiple children with the same label, e.g.

root
  product
    camera
  product
    cdplayer

Page 90:

Branching

T_data:

root
  a1  a2  …  an

q: branching ps-query

root
  a (b = 1)
  a (b = 2)
  …
  a (b = n)

q(T) requires the description of the n! possibilities of assigning the n values of b to a1 … an.
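The n! figure is the number of ways to pair the n data nodes with the n values of b. A short enumeration makes this concrete; the function name assignments and the node names a1 … an as strings are assumptions of the sketch.

```python
from itertools import permutations
from math import factorial

def assignments(n):
    """Every way of assigning the values 1..n of b to the nodes a1..an."""
    nodes = ["a" + str(i) for i in range(1, n + 1)]
    return [dict(zip(nodes, perm)) for perm in permutations(range(1, n + 1))]

for n in range(1, 6):
    print(n, len(assignments(n)), factorial(n))
```

Already for n = 10 there are 3,628,800 assignments, which is why the slides restrict PS-queries to one child per label.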

Page 91:

References

• Representing and Querying XML with Incomplete Information. Serge Abiteboul, Luc Segoufin, Victor Vianu.
• Incomplete Information and XML, presentation. http://www-rocq.inria.fr/~abiteboul
• A WEB Odyssey: from Codd to XML. Victor Vianu.
• Incomplete Information in Relational Databases. Tomasz Imielinski and Witold Lipski, Jr.