TRANSCRIPT
1
Selectivity Estimation for Exclusive Query Translation
in Deep Web Data Integration
Fangjiao Jiang, Renmin University of China
Joint work with Weiyi Meng & Xiaofeng Meng
Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration (DASFAA2009)
2
The previous Web: things are just on the surface
3
The current Web: Getting “deeper”
A great deal of information is hidden behind query forms
Deep = not accessible through search engines
4
Why is it important?
More than 10 million distinct forms
5
Why is it important?
Up to 5,000 billion dynamic result pages
6
A Key Component: Query translation
Challenges: large scale, heterogeneity, autonomy
Integrated query interface
Web database query interfaces
Query translation
7
The Problem
Selectivity Estimation for Exclusive Query Translation
8
Example
9
Related work & the Challenge
A prominent solution for selectivity estimation: histograms [Piatetsky+, Poosala+, Ioannidis+]
Categorical attribute
Infinite-value attribute
Another solution: random sampling [Goodman+, Haas+, Olken+, Vitter+, Dasgupta+]
Random sampling
Challenge: selectivity estimation of infinite-value attributes
10
Selectivity Estimation for Exclusive Query Translation
11
Two Observations
There exist different correlations between different attribute pairs
The word frequency of the values on an infinite-value attribute usually has a Zipf-like distribution
[Figure: correlations between attribute pairs, labeled Weakest, Weaker, Weaker, Strongest]
12
Selectivity Estimation for Exclusive Query Translation
Attribute correlation calculation for a domain
Selectivity estimation for a Web database
Correlation-based sampling
Word frequency probing
Zipf equation calculation
Selectivity estimation
13
Selectivity Estimation Challenges
1. Attribute correlation calculation
   Find the least correlated attribute
   Discover the word rank
2. Zipf equation calculation
   Calculate the parameters of the Zipf equation
   Estimate selectivity
14
Attribute Correlation Calculation
15
Goal: random sample, word rank
Attribute Correlation calculation
[Equations (1) and (2)]
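The correlation measure used on this slide is not reproduced in the transcript. As a stand-in only, the sketch below uses mutual information (an assumption, not necessarily the authors' measure) to score attribute pairs and pick the attribute least correlated with a target attribute. The record layout, the attribute names, and the helper names `mutual_information` and `least_correlated_attribute` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(records, attr_a, attr_b):
    """Mutual information (in bits) between two attributes of a list of dict records."""
    n = len(records)
    count_a = Counter(r[attr_a] for r in records)
    count_b = Counter(r[attr_b] for r in records)
    count_ab = Counter((r[attr_a], r[attr_b]) for r in records)
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


def least_correlated_attribute(records, target, candidates):
    """Return the candidate attribute with the lowest mutual information to `target`."""
    return min(candidates, key=lambda attr: mutual_information(records, target, attr))


# Hypothetical toy records: publisher is fully determined by subject, format is independent.
records = [
    {"subject": "db", "publisher": "A", "format": "hardcover"},
    {"subject": "db", "publisher": "A", "format": "paperback"},
    {"subject": "ai", "publisher": "B", "format": "hardcover"},
    {"subject": "ai", "publisher": "B", "format": "paperback"},
]
chosen = least_correlated_attribute(records, "subject", ["publisher", "format"])
```

Querying on the least correlated attribute is what makes the returned records behave roughly like a random sample with respect to the target attribute.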
16
Discussion on Word rank
Word rank should be computed for each attribute
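A minimal sketch of per-attribute word ranking, assuming the sample is a list of records with free-text fields. The field names and the helper `word_ranks` are hypothetical. A word is counted once per record, so its count is the number of sampled records that would match a keyword query on that attribute.

```python
from collections import Counter


def word_ranks(sample, attribute):
    """Rank the words of one attribute by the number of sampled records containing them."""
    counts = Counter()
    for record in sample:
        # set() counts each word once per record
        counts.update(set(record[attribute].lower().split()))
    # Highest count first; ties broken alphabetically so the ranking is deterministic
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: rank for rank, (word, _) in enumerate(ordered, start=1)}


# Hypothetical sample records
sample = [
    {"title": "deep web data", "author": "li wang"},
    {"title": "web data integration", "author": "li chen"},
    {"title": "web search", "author": "wang"},
]
title_ranks = word_ranks(sample, "title")
author_ranks = word_ranks(sample, "author")
```

The two rankings are independent: a word that ranks high in `title` may not occur in `author` at all, which is why one ranking cannot be reused across attributes.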
17
Zipf Equation Calculation
18
Zipf equation calculation
Zipf equation
[Figure: Zipf distribution. Axes: Rank (Rwi, Rwx, Rwj, Rwy) and Frequency (Fwi, Fwj, Fwy, Fwx)]
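The equation itself is missing from the transcript. The parameter names P, p and E discussed on a later slide suggest the Zipf-Mandelbrot form, so under that assumption the relation between the rank R_w and the frequency F_w of a word w can be written as follows; the log form shows that, for a fixed p, log F_w is linear in log(R_w + p).

```latex
F_w = P \, (R_w + p)^{-E}
\qquad\Longleftrightarrow\qquad
\log F_w = \log P - E \, \log (R_w + p)
```

With this form, a few probed words with known rank and frequency (such as w_i and w_j in the figure) are enough to pin down the parameters, after which the frequency of any other ranked word (such as w_x or w_y) can be read off the curve.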
19
The parameters of Zipf equation
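One possible way to compute the parameters, sketched under the assumption that the Zipf equation has the Zipf-Mandelbrot form F = P * (R + p)^(-E). The fitting procedure (a grid search over p combined with log-log least squares) is an illustrative choice, not necessarily the authors' method, and `fit_zipf`, `estimate_frequency`, and the synthetic points are hypothetical.

```python
import math


def fit_zipf(points, p_grid=None):
    """Fit F = P * (R + p) ** -E to (rank, frequency) pairs.

    For each candidate p, log F is linear in log(R + p), so P and E follow
    from ordinary least squares; the p with the smallest squared error wins.
    """
    if p_grid is None:
        p_grid = [i / 10 for i in range(0, 101)]  # candidate p values 0.0 .. 10.0
    n = len(points)
    ys = [math.log(f) for _, f in points]
    my = sum(ys) / n
    best = None
    for p in p_grid:
        xs = [math.log(r + p) for r, _ in points]
        mx = sum(xs) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = my - slope * mx
        err = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, math.exp(intercept), p, -slope)
    _, P, p, E = best
    return P, p, E


def estimate_frequency(rank, P, p, E):
    """Estimated frequency of the word at `rank` under the fitted curve."""
    return P * (rank + p) ** -E


# Synthetic probe results generated from known parameters (P=1000, p=2, E=1.2)
probed = [(r, 1000 * (r + 2) ** -1.2) for r in (1, 5, 20, 100)]
P, p, E = fit_zipf(probed)
freq_rank_10 = estimate_frequency(10, P, p, E)
```

Dividing an estimated frequency by the database size would give a selectivity; the database size is not part of this sketch.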
20
Discussion on P, p and E
21
Experiments
22
Data Sets & Evaluation Method
Data sets
Evaluation method
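The details of the evaluation method are not in the transcript. A common way to score a selectivity estimate is one minus its relative error against the true result count; the sketch below assumes that metric, and the function names and numbers are hypothetical.

```python
def relative_error(estimated, actual):
    """Absolute estimation error as a fraction of the true result count."""
    return abs(estimated - actual) / actual


def mean_estimation_precision(pairs):
    """Mean of (1 - relative error) over (estimated, actual) pairs, floored at 0."""
    scores = [max(0.0, 1.0 - relative_error(est, act)) for est, act in pairs]
    return sum(scores) / len(scores)


# Hypothetical (estimated, actual) result counts for three queries
pairs = [(90, 100), (220, 200), (50, 50)]
score = mean_estimation_precision(pairs)
```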
23
Experimental Results
The average precision of selectivity estimations is high.
24
Summary
25
Contributions
Identify the selectivity estimation problem of infinite-value attributes for exclusive query translation
Propose a correlation-based sampling approach to obtain a sample that is as random as possible
Propose a Zipf-based selectivity estimation method
Verify the accuracy of our approach
26
Thanks (Q&A)