Multiple-Site Distributed Spatial Query Optimization using Spatial Semijoins Wendy Osborn and Saad Zaamout

Post on 26-Feb-2016


Page 1: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Multiple-Site Distributed Spatial Query Optimization using Spatial Semijoins

Wendy Osborn and Saad Zaamout

Page 2: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Outline
• Introduction
• Related Work
• Algorithm
• Performance Evaluation
• Conclusion and Future Work

Page 3: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Spatial Data

Canadian Cow Country...

*borrowed from www.mapquest.ca

Page 4: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Distributed Database

[Figure: distributed database with sites at Calgary, Toronto, and Montreal]

*borrowed from docs.google.com

Page 5: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Research Problem
• Efficient processing of a distributed spatial query
• Cost considerations:
  • data transmission
  • CPU
  • I/O

Page 6: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Related Work
• Spatial join
  • Kang et al. (2002)
• Spatial semijoins
  • Tan, Ooi, Abel (1995, 2000)
  • Karam and Petry (2006)
• Limitations
  • Two-site distributed spatial queries

Page 7: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

The Algorithm - Assumptions
• Each site has one participating spatial relation
• Each spatial relation has one spatial attribute
• All MBRs in a relation are unique
  • relation cardinality = number of MBRs in relation
• Each spatial relation is indexed by an R-tree

Page 8: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Spatial Semijoin Implementation
1. “Project” spatial attribute from relation R
  • obtain (MBR, ID) pairs from the leaf nodes of the R-tree
2. Transmit spatial attribute to relation S
3. Perform semijoin
  • R_SA ⋉ S
4. Transmit identifiers from R_SA whose MBR qualifies in the query back to relation R
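The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the join predicate is taken to be MBR intersection, relations are plain lists of dictionaries, and a linear scan stands in for the R-tree. The names `MBR`, `intersects`, `project_spatial_attribute`, and `spatial_semijoin` are hypothetical and do not come from the slides.

```python
from typing import NamedTuple


class MBR(NamedTuple):
    """Minimum bounding rectangle (assumed axis-aligned)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: MBR, b: MBR) -> bool:
    # Assumed join predicate: two MBRs qualify when they overlap or touch.
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def project_spatial_attribute(relation):
    # Step 1: keep only (MBR, ID) pairs; a real system would read these
    # from the R-tree leaf nodes instead of scanning the relation.
    return [(t["mbr"], t["id"]) for t in relation]


def spatial_semijoin(spatial_attribute, s_relation):
    # Step 3: run at the site holding S, after step 2 shipped the
    # spatial attribute there. Returns the qualifying S tuples and the
    # identifiers from R's spatial attribute whose MBR qualifies.
    s_hits = [s for s in s_relation
              if any(intersects(m, s["mbr"]) for m, _ in spatial_attribute)]
    r_ids = [rid for m, rid in spatial_attribute
             if any(intersects(m, s["mbr"]) for s in s_relation)]
    return s_hits, r_ids


# Made-up data: only tuple 1 of R overlaps anything in S.
R = [{"id": 1, "mbr": MBR(0, 0, 2, 2)}, {"id": 2, "mbr": MBR(10, 10, 12, 12)}]
S = [{"id": 7, "mbr": MBR(1, 1, 3, 3)}, {"id": 8, "mbr": MBR(20, 20, 21, 21)}]

s_hits, r_ids = spatial_semijoin(project_spatial_attribute(R), S)
# Step 4: r_ids is what would be transmitted back to the site holding R.
```

Only the identifiers in `r_ids` travel back to R, which is where the saving over shipping whole tuples comes from.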

Page 9: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Algorithm - Example

[Figure: query site QS and four sites holding relations R1, R2, R3, R4; cardinalities shown: 100, 200, 600, 800]

Page 10: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Algorithm - Overview
1. Sort and group by spatial attribute cardinality
2. Transmit spatial attributes
3. Execute spatial semijoins
4. Transmit qualifying tuples to query site

Page 11: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Algorithm - Stage 1
• All sites (i.e. relations) are sorted in ascending order of spatial attribute cardinality
• Divided into two groups
  • P: the first n/2 sites
  • Q: the remaining n/2 sites
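As a rough sketch of Stage 1, the sort-and-split can be written as below. The function name `split_sites` and the site labels are hypothetical. The slides do not say what happens when n is odd; this sketch simply places the extra site in Q.

```python
def split_sites(cardinalities):
    """Split sites into groups P and Q by spatial attribute cardinality.

    cardinalities: dict mapping a site name to its number of MBRs.
    Returns (P, Q), each a list of site names in ascending order.
    """
    # Sort site names by their cardinality, smallest first.
    ordered = sorted(cardinalities, key=cardinalities.get)
    half = len(ordered) // 2
    # P gets the first n/2 sites; Q gets the rest (assumption for odd n).
    return ordered[:half], ordered[half:]


# Hypothetical sites with the cardinalities used in the example slide.
P, Q = split_sites({"A": 600, "B": 100, "C": 800, "D": 200})
```

Here `P` holds the two smallest relations and `Q` the two largest, so the cheaper spatial attributes are the ones that get transmitted.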

Page 12: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Algorithm - Stage 2
• Transmit spatial attribute from sites in P to sites in Q in the following manner:
  • Spatial attribute with smallest cardinality in P sent to site with smallest cardinality in Q
  • Spatial attribute with next smallest cardinality in P sent to site with next smallest cardinality in Q
  • and so on…
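The pairing rule above amounts to matching the two sorted groups position by position. The sketch below assumes an even number of sites and prices each transmitted (MBR, ID) pair at 4 doubles of 8 bytes plus one 2-byte integer, the sizes given in the Cost Calculations slide. `transmission_plan` and the site labels are hypothetical names.

```python
# Bytes per transmitted (MBR, ID) pair: 4 * sizeof(double) + sizeof(int).
COST_MBR = 4 * 8 + 2


def transmission_plan(cardinalities):
    """Return a list of (sender in P, receiver in Q, bytes sent)."""
    # Stage 1: sort ascending by cardinality and split in half.
    ordered = sorted(cardinalities.items(), key=lambda kv: kv[1])
    half = len(ordered) // 2
    p_sites, q_sites = ordered[:half], ordered[half:]
    # Stage 2: i-th smallest in P sends to i-th smallest in Q.
    return [(p_name, q_name, p_card * COST_MBR)
            for (p_name, p_card), (q_name, _) in zip(p_sites, q_sites)]


plan = transmission_plan({"A": 100, "B": 200, "C": 600, "D": 800})
```

With these made-up sites, A ships 100 pairs to C and B ships 200 pairs to D, so only the two smallest spatial attributes cross the network.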

Page 13: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Algorithm Example

[Figure: sites in group P send their spatial attribute (SA) to sites in group Q; relations shown: R1, R3, R4, R2]

SA = MBR + ID

Page 14: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Algorithm - Stage 3
• Spatial semijoin performed between spatial attribute and relation at each site in Q
• Result:
  • set of tuples from relation that qualify in the semijoin
  • set of identifiers from spatial attribute whose MBRs qualify in the semijoin
• Identifiers shipped back to originating site in P

Page 15: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

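Algorithm Example

[Figure: sites in group Q ship identifiers (ID) back to the originating sites in group P; relations shown: R1, R3, R4, R2]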

Page 16: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Algorithm - Stage 4

[Figure: each site (R1, R2, R3, R4) transmits its qualifying tuples (QT) to the query site (QS)]

Page 17: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Performance Evaluation
• comparison vs. naïve approach
• six-site distributed spatial query
  • 100, 200, 400, 600, 800, 1000 tuples
• each tuple has the following structure:
  • MBR, identifier, region name, population, line slope indicator

Page 18: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Cost Calculations
• Data sizes:
  • Character: 1 byte
  • Integer: 2 bytes
  • long integer and double float: 8 bytes
• Cost of transmitting an identifier
  • cost(ID) = sizeof(int)
• Cost of transmitting a spatial attribute value (MBR)
  • cost(MBR) = 4 * sizeof(double) + sizeof(int)
• Cost of transmitting a tuple
  • cost(tuple) = cost(MBR) + 20 * sizeof(char) + sizeof(long int) + sizeof(int)
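Plugging the stated data sizes into these formulas gives concrete byte counts. The sketch below reads the tuple cost as the MBR and identifier, a 20-character region name, a long-integer population, and an integer slope indicator, which matches the tuple structure listed in the Performance Evaluation slide. It also assumes the naïve approach ships every tuple of every relation to the query site; the slides do not spell this out, but the assumption reproduces the reported naïve figures for the 100/400 two-site query and the six-site query. All names are hypothetical.

```python
# Data sizes in bytes, as listed in the slide.
SIZEOF_CHAR = 1
SIZEOF_INT = 2
SIZEOF_LONG = 8
SIZEOF_DOUBLE = 8

COST_ID = SIZEOF_INT                        # identifier only
COST_MBR = 4 * SIZEOF_DOUBLE + SIZEOF_INT   # four coordinates plus ID
# Assumed reading: 20-char region name + long population + int indicator.
COST_TUPLE = COST_MBR + 20 * SIZEOF_CHAR + SIZEOF_LONG + SIZEOF_INT


def naive_cost(cardinalities):
    # Assumption: the naive strategy transmits all tuples to the query site.
    return sum(cardinalities) * COST_TUPLE


two_site = naive_cost([100, 400])
six_site = naive_cost([100, 200, 400, 600, 800, 1000])
```

Under these assumptions a full tuple costs 64 bytes, against 34 bytes for an (MBR, ID) pair and 2 bytes for an identifier alone.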

Page 19: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Cost Calculations
• Cost of performing a semijoin and transmitting tuples to query site:

  cost(X, Y, Z) = number_of_tuples(Y) * cost(MBR)
                + number_of_qualifiers(X) * cost(ID) + cost(tuple)
                + number_of_qualifiers(Z) * cost(tuple)

• Calculated for all n/2 semijoins
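A small sketch of this cost function follows. The slides do not define X, Y, and Z, and the grouping of the middle term is ambiguous in the transcript. This sketch assumes Y is the relation in P whose spatial attribute is shipped, X is the set of its qualifying identifiers (each costing one identifier sent back plus one tuple sent to the query site), and Z is the set of qualifying tuples at the site in Q. The byte costs assume a 64-byte tuple. The function names and the qualifier counts in the example are hypothetical; they are not reported in the slides.

```python
COST_ID = 2      # sizeof(int)
COST_MBR = 34    # 4 * sizeof(double) + sizeof(int)
COST_TUPLE = 64  # assumed: MBR + ID + 20 chars + long int + int


def semijoin_cost(tuples_in_y, qualifiers_x, qualifiers_z):
    """Bytes transmitted for one P-to-Q semijoin under the assumed reading."""
    return (tuples_in_y * COST_MBR                      # ship spatial attribute
            + qualifiers_x * (COST_ID + COST_TUPLE)     # IDs back, then tuples out
            + qualifiers_z * COST_TUPLE)                # Q-side tuples out


def total_cost(semijoins):
    # Sum over all n/2 semijoins; each item is (|Y|, |X|, |Z|).
    return sum(semijoin_cost(y, x, z) for y, x, z in semijoins)


# Hypothetical qualifier counts for a 100-tuple relation sent to one site.
example = semijoin_cost(100, 97, 97)
```

With 97 qualifiers on each side, this reading yields 16010 bytes, the optimized figure reported for the 100/400 two-site query, which lends some support to the assumed grouping.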

Page 20: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Two-site Query Test

Site 1   Site 2   Optimized (bytes)   Naïve (bytes)   %Improvement
100      400      16010               32000           50
100      600      16270               44800           64
100      800      15750               57600           73
100      1000     14580               70400           79
200      400      32150               38400           17
200      600      31760               51200           38
200      800      32020               64000           50
200      1000     31890               76800           59

Page 21: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Four- and Six-site Query Test

Site 1   Site 2   Site 3   Site 4   Optimized (bytes)   Naïve (bytes)   %Improvement
100      200      400      600      52264               83200           37
100      200      800      1000     53410               134400          60
400      600      800      1000     162604              172900          6

For the six-site query (100, 200, 400, 600, 800, 1000):

• Optimized = 127,456 bytes
• Naïve = 198,400 bytes
• %improvement = 36%
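The %improvement column appears to be the relative reduction in transmitted bytes. A minimal sketch, assuming that definition and rounding to the nearest whole percent (`percent_improvement` is a hypothetical name):

```python
def percent_improvement(optimized, naive):
    """Relative reduction in bytes transmitted, as a percentage."""
    return 100.0 * (naive - optimized) / naive


# Values taken from the four-site rows and the six-site query above.
four_site_a = round(percent_improvement(52264, 83200))
four_site_b = round(percent_improvement(53410, 134400))
four_site_c = round(percent_improvement(162604, 172900))
six_site = round(percent_improvement(127456, 198400))
```

Under this definition the four-site and six-site figures come out at 37, 60, 6, and 36 percent.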

Page 22: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Conclusions
• For multiple-site queries, our algorithm outperforms the naïve approach in all cases
• The greater the difference in relation sizes, the greater the reduction in data transmission

Page 23: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

Future Work
• CPU and I/O costs
• Evaluate two-site queries vs. existing strategies
• A real distributed database
• Development of more multi-site distributed spatial query processing strategies

Page 24: Multiple-Site Distributed Spatial Query Optimization  using Spatial  Semijoins

THANK YOU!

?