Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration
Fangjiao Jiang, Renmin University of China
Joint work with Weiyi Meng & Xiaofeng Meng

Post on 17-Jan-2016

Selectivity Estimation for Exclusive Query Translation

in Deep Web Data Integration

Fangjiao Jiang

Renmin University of China

Joint work with Weiyi Meng & Xiaofeng Meng

Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration (DASFAA2009)

The previous Web: things are just on the surface

The current Web: Getting “deeper”

A great deal of information is hidden behind query forms

Deep = not accessible through search engines

Why is it important?

More than 10 million distinct forms

Why is it important?

Up to 5,000 billion dynamic result pages

A Key Component: Query translation

Challenges: large scale, heterogeneity, autonomy

Integrated query interface

Web database query interfaces

Query translation

The Problem

Selectivity Estimation for Exclusive Query Translation

Example

[Figure: example; not preserved in the transcript]

Related work & the Challenge

A prominent solution for selectivity estimation: histograms [Piatetsky+, Poosala+, Ioannidis+]

Categorical attribute

Infinite-value attribute

Another solution: random sampling [Goodman+, Haas+, Olken+, Vitter+, Dasgupta+]

Random sampling

Challenge: selectivity estimation for infinite-value attributes

Selectivity Estimation for Exclusive Query Translation

Two Observations

There exist different correlations between different attribute pairs

The word frequency of the values of an infinite-value attribute usually follows a Zipf-like distribution

[Figure: attribute pairs labeled by correlation strength (weakest, weaker, weaker, strongest)]
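
The second observation can be checked on any sample of attribute values by ranking words by frequency. The following is a minimal sketch, not the authors' procedure: the tokenization rule, the choice to count a word once per value, the function name `word_ranks`, and the sample titles are all assumptions made for illustration.

```python
from collections import Counter
import re


def word_ranks(values):
    """Rank words by the number of attribute values that contain them.

    Rank 1 is the most frequent word. Tokenization (lowercase runs of
    letters and digits) is an assumption of this sketch.
    """
    counts = Counter()
    for value in values:
        # Count each word once per value, so the count is a record frequency.
        counts.update(set(re.findall(r"[a-z0-9]+", value.lower())))
    # Sort by descending frequency; break ties alphabetically for determinism.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ordered, start=1)]


# Hypothetical sample of values from a text attribute such as a title.
titles = [
    "Data Integration on the Web",
    "Query Translation for Web Data",
    "Web Data Management",
    "Sampling the Deep Web",
]
ranking = word_ranks(titles)
```

On a large enough sample, a Zipf-like attribute shows frequencies that fall off roughly as an inverse power of the rank; four titles are far too few to show that, so the data here only demonstrates the ranking step.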

Selectivity Estimation for Exclusive Query Translation

Attribute correlation calculation for a domain

Selectivity estimation for a Web database

Correlation-based sampling

Word frequency probing

Zipf equation calculation

Selectivity estimation

Selectivity Estimation Challenges

1. Attribute correlation calculation

Find the least correlated attribute

Discover the word rank

2. Zipf equation calculation

Calculate the parameters of the Zipf equation

Estimate selectivity

Attribute Correlation Calculation

Goal: random sample, word rank

Attribute Correlation calculation

[Equations (1) and (2) are not preserved in the transcript]
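
The correlation measure used on this slide is not recoverable from the transcript. As a rough sketch of the idea of picking the least correlated attribute, the code below scores each candidate attribute against a target attribute with mutual information, which is a stand-in measure chosen for this example and not necessarily the one the authors use. The attribute names (`subject`, `publisher`, `format`), the records, and both function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long value lists."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


def least_correlated_attribute(records, target, candidates):
    """Return the candidate with the lowest score against the target, plus all scores."""
    target_values = [record[target] for record in records]
    scores = {
        attr: mutual_information(target_values, [record[attr] for record in records])
        for attr in candidates
    }
    return min(scores, key=scores.get), scores


# Hypothetical records: publisher is fully determined by subject, format is independent.
records = [
    {"subject": "db", "publisher": "P1", "format": "hardcover"},
    {"subject": "db", "publisher": "P1", "format": "paperback"},
    {"subject": "ai", "publisher": "P2", "format": "hardcover"},
    {"subject": "ai", "publisher": "P2", "format": "paperback"},
]
best, scores = least_correlated_attribute(records, "subject", ["publisher", "format"])
```

Here `best` is `format`: querying on an attribute that tells little about the target attribute should return records that look closer to a random sample with respect to the target.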

Discussion on Word rank

Word rank should be computed for each attribute

Zipf Equation Calculation

Zipf equation calculation

Zipf equation

[Figure: Zipf distribution. Horizontal axis: rank, with marks Rwi, Rwx, Rwj, Rwy. Vertical axis: frequency, with marks Fwi, Fwx, Fwj, Fwy.]
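
The equation itself did not survive in the transcript. The deck later names three parameters, P, p and E, which is consistent with the Zipf-Mandelbrot form; assuming that form (an assumption, not something the transcript confirms), the relation between the rank R_w and frequency F_w of a word w, and its log-linear version, would read:

```latex
F_w = \frac{P}{(R_w + p)^{E}}
\qquad\Longleftrightarrow\qquad
\ln F_w = \ln P - E \,\ln (R_w + p)
```

Under the same assumption, and with p held fixed, two words w_i and w_j whose ranks and frequencies are known determine the other two parameters, after which the frequency of a third word w_x follows from its rank alone:

```latex
E = \frac{\ln\left(F_{w_i} / F_{w_j}\right)}{\ln\left((R_{w_j} + p) / (R_{w_i} + p)\right)},
\qquad
P = F_{w_i}\,(R_{w_i} + p)^{E},
\qquad
\hat{F}_{w_x} = \frac{P}{(R_{w_x} + p)^{E}}
```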

The parameters of the Zipf equation
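
One way to obtain such parameters from a handful of probed words is a least-squares fit in log-log space. The sketch below assumes the three-parameter form F = P / (R + p)^E, treats the shift p as given rather than fitted, and defines selectivity as estimated frequency divided by the number of records; these choices, the function names, and the probed values are assumptions for illustration and may differ from the authors' calculation.

```python
import math


def fit_zipf(points, p=0.0):
    """Fit ln F = ln P - E * ln(R + p) by least squares for a fixed shift p.

    `points` is a list of (rank, frequency) pairs with at least two distinct ranks.
    Returns the pair (P, E).
    """
    xs = [math.log(rank + p) for rank, _ in points]
    ys = [math.log(freq) for _, freq in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line is ln F = intercept + slope * ln(R + p), so E = -slope.
    return math.exp(mean_y - slope * mean_x), -slope


def estimate_frequency(rank, P, E, p=0.0):
    """Estimated number of records containing the word at the given rank."""
    return P / (rank + p) ** E


def estimate_selectivity(rank, P, E, total_records, p=0.0):
    """Estimated fraction of records containing the word at the given rank."""
    return estimate_frequency(rank, P, E, p) / total_records


# Hypothetical probed (rank, frequency) pairs that lie exactly on F = 1000 / R.
probed = [(1, 1000.0), (10, 100.0), (100, 10.0)]
P, E = fit_zipf(probed)
selectivity = estimate_selectivity(50, P, E, total_records=10000)
```

With real probes the points will not lie exactly on a line, and the fit then averages out the noise; fitting p as well would need a search over candidate shifts.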

Discussion on P, p and E

Experiments

Data Sets & Evaluation Method

Data sets

Evaluation method

Experimental Results

The average precision of selectivity estimations is high.

Summary

Contributions

Identify the selectivity estimation problem of infinite-value attributes for exclusive query translation

Propose a correlation-based sampling approach to obtain a sample that is as random as possible

Propose a Zipf-based selectivity estimation method

Verify the accuracy of our approach

Thanks (Q&A)