Exploiting Type and Space in a Main Memory Query Engine

Thomas Schwarz, Matthias Grossmann, Daniela Nicklas, Bernhard Mitschang

Universität Stuttgart, Institute of Parallel and Distributed Systems



University of Stuttgart, Center of Excellence 627


Outline

Motivation and Scenarios

Index Structures

Related Work

Experiments

Conclusion


City-Guide Scenario

Query: How do I get to the closest hotel?

[Figure: city map with a hotel, a youth hostel, and a museum]


Typical Data and Type Hierarchy

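[Figure: typical data, i.e., map objects that each carry a name, an ID, and a type; and a typical type hierarchy with the types Root (0), Building (1), Hotel (2), Museum (3), Restaurant (4), Road (5), Local Road (6), Main Road (7), and Youth Hostel (8)]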


Type Hierarchy of TIGER/Line Data Sets

CFCC | Type ID | Description
(none) | 0 | the root type
A | 1 | Road
A1 | 4 | Primary Highway With Limited Access
A11 | 5 | Primary ..., unseparated
A15 | 9 | Primary ..., separated
A19 | 13 | Prim.., bridge
A4 | 34 | Local, Neighborhood, and Rural Road
B | 63 | Railroad
B1 | 66 | Railroad Main Line
B12 | 68 | ..., in tunnel
D | 97 | Landmark
D4 | 122 | Educational or religious Institution
D5 | 128 | Transportation Terminal
D51 | 130 | Airport or airfield
D52 | 131 | Train station
D53 | 132 | Bus terminal
H | 213 | Hydrography

258 types


Typical Queries

Typical queries ask for:

Gas stations next to the planned route

Nearest base stations for wireless internet

Sights / landmarks / buildings in a given area

All roads / only major roads in a given area

Disjunctive queries

Restrict type of queried objects

Restrict location of queried objects

Exploit these characteristics for speedup:

Leverage a dedicated index structure

Combine both primary access paths
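The queries above combine a disjunctive type restriction (any of several types, including their subtypes) with a location restriction. A minimal sketch of such a predicate follows; it is not the engine's query interface. The names `Obj`, `Query`, `is_subtype`, and `matches`, the parent-pointer hierarchy, and the axis-aligned query window are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical type hierarchy as parent pointers (child type ID -> parent type ID).
# Assumed example: Root 0; Building 1 with Hotel 2, Museum 3, Restaurant 4;
# Road 5 with Local Road 6 and Main Road 7.
PARENT = {1: 0, 2: 1, 3: 1, 4: 1, 5: 0, 6: 5, 7: 5}


@dataclass(frozen=True)
class Obj:
    name: str
    type_id: int
    x: float
    y: float


@dataclass(frozen=True)
class Query:
    type_ids: frozenset  # disjunction: any of these types or one of their subtypes
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def is_subtype(t: int, ancestor: int) -> bool:
    """True if type t equals ancestor or lies below it in the hierarchy."""
    while True:
        if t == ancestor:
            return True
        if t not in PARENT:  # reached the root without finding the ancestor
            return False
        t = PARENT[t]


def matches(obj: Obj, q: Query) -> bool:
    """Type restriction AND location restriction."""
    type_ok = any(is_subtype(obj.type_id, t) for t in q.type_ids)
    space_ok = q.x_min <= obj.x <= q.x_max and q.y_min <= obj.y <= q.y_max
    return type_ok and space_ok


objects = [Obj("Grand Hotel", 2, 1.0, 1.0),
           Obj("Main St", 7, 2.0, 2.0),
           Obj("Art Museum", 3, 9.0, 9.0)]
# "Buildings (and all their subtypes) in the window [0, 5] x [0, 5]"
q = Query(frozenset({1}), 0.0, 0.0, 5.0, 5.0)
print([o.name for o in objects if matches(o, q)])  # ['Grand Hotel']
```

Evaluating the predicate object by object like this is exactly the full scan that a dedicated index structure is meant to avoid.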


System Architecture

[Figure: system architecture comprising several data providers, a discovery service, the integration middleware, and mobile devices running applications, together with the main memory query engine]


Summary of the Requirements

Simple query capabilities suffice

Combine Type and Space

Cope with different workloads

Fast response times




Separate Indexes

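[Figure: an array indexed by type ID (Root 0, Building 1, Hotel 2, Museum 3, Restaurant 4, Road 5, Local Road 6, Main Road 7) referring to separate object lists, and a spatial index (quadtree). A cost-based optimizer chooses an access path; the resulting candidates are checked against the type predicate and the spatial predicate to produce the final result.]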
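The separate-indexes approach can be sketched as follows: one object list per type ID, a spatial index, and an optimizer that picks the access path expected to yield fewer candidates and then filters those candidates with the remaining predicate. This is an illustrative sketch under simplifying assumptions, not the engine's code: a uniform grid (`GridIndex`) stands in for the quadtree, the cost model merely counts candidates, objects are plain `(name, type_id, x, y)` tuples, and all class and method names are hypothetical.

```python
from collections import defaultdict

# Objects are (name, type_id, x, y) tuples in this sketch.


class GridIndex:
    """Uniform grid over the plane: a simple stand-in for the quadtree (assumption)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (cx, cy) -> objects in that cell

    def _cell(self, x, y):
        return int(x // self.cell_size), int(y // self.cell_size)

    def insert(self, obj):
        self.cells[self._cell(obj[2], obj[3])].append(obj)

    def cells_in(self, x_min, y_min, x_max, y_max):
        """Object lists of all cells overlapping the window (empty cells give [])."""
        cx0, cy0 = self._cell(x_min, y_min)
        cx1, cy1 = self._cell(x_max, y_max)
        return [self.cells.get((cx, cy), [])
                for cx in range(cx0, cx1 + 1) for cy in range(cy0, cy1 + 1)]


class SeparateIndexes:
    """One object list per type ID plus a spatial index."""

    def __init__(self, cell_size):
        self.by_type = defaultdict(list)  # type ID -> objects of exactly that type
        self.grid = GridIndex(cell_size)

    def insert(self, obj):
        self.by_type[obj[1]].append(obj)
        self.grid.insert(obj)

    def query(self, type_ids, window):
        """type_ids must already include all requested subtypes."""
        x_min, y_min, x_max, y_max = window

        def in_window(o):
            return x_min <= o[2] <= x_max and y_min <= o[3] <= y_max

        # Naive cost estimate: number of candidates each access path would produce.
        type_cost = sum(len(self.by_type.get(t, [])) for t in type_ids)
        cells = self.grid.cells_in(*window)
        spatial_cost = sum(len(c) for c in cells)

        if type_cost <= spatial_cost:
            # Type path: take the per-type lists, then apply the spatial predicate.
            candidates = [o for t in type_ids for o in self.by_type.get(t, [])]
            return "type", [o for o in candidates if in_window(o)]
        # Spatial path: take the grid candidates, then apply the type predicate.
        candidates = [o for c in cells for o in c]
        return "spatial", [o for o in candidates if o[1] in type_ids and in_window(o)]


idx = SeparateIndexes(cell_size=10.0)
for i in range(100):
    idx.insert((f"road{i}", 6, float(i), float(i)))  # many local roads
idx.insert(("hotel", 2, 3.0, 4.0))

print(idx.query({2}, (0.0, 0.0, 50.0, 50.0)))      # few hotels: type path wins
path, hits = idx.query({6}, (0.0, 0.0, 5.0, 5.0))  # small window: spatial path wins
print(path, len(hits))
```

With many local roads and a single hotel, the hotel query takes the type path, while the query for local roads in a small window takes the spatial path. A realistic cost model would also weigh the cost of traversing each structure, not just the candidate count.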


Real 3D Index

[Figure: a three-dimensional index spanning the spatial dimensions and a type dimension; a query for Building + all subtypes covers the type range 1..4]


Traversing the Index

[Figure: step-by-step traversal of the index for a query box spanning the spatial dimension and the type dimension]
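A range query on such a three-dimensional index can be sketched with a k-d tree over points (x, y, t), where t is the object's position on the type dimension; "Building + all subtypes in a window" then becomes a box whose type side is the interval 1..4. The slides do not specify which 3D structure is used, so the k-d tree here (with the hypothetical names `build` and `range_query`) is an assumed stand-in chosen for brevity. It illustrates how a traversal skips every subtree whose region cannot intersect the query box.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    point: tuple  # (x, y, t): two spatial coordinates plus the type coordinate
    axis: int  # splitting dimension: 0 = x, 1 = y, 2 = type
    left: Optional["Node"]
    right: Optional["Node"]


def build(points, depth=0):
    """Build a 3D k-d tree, cycling through x, y, and type and splitting at the median."""
    if not points:
        return None
    axis = depth % 3
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))


def range_query(node, lo, hi, out=None):
    """Return all points p with lo[d] <= p[d] <= hi[d] in every dimension d."""
    if out is None:
        out = []
    if node is None:
        return out
    p, axis = node.point, node.axis
    if all(lo[d] <= p[d] <= hi[d] for d in range(3)):
        out.append(p)
    # The left subtree holds values <= p[axis] and the right subtree values >= p[axis]:
    # skip a subtree when its half-space cannot intersect the query box.
    if lo[axis] <= p[axis]:
        range_query(node.left, lo, hi, out)
    if p[axis] <= hi[axis]:
        range_query(node.right, lo, hi, out)
    return out


# Objects as (x, y, type ID); IDs assumed preorder-numbered, so Building (1) and
# its subtypes Hotel (2), Museum (3), Restaurant (4) form the interval 1..4.
objects = [(1.0, 1.0, 2), (2.0, 3.0, 3), (8.0, 8.0, 2),
           (1.5, 2.5, 6), (3.0, 1.0, 4), (4.0, 4.0, 1)]
tree = build(objects)
# Query box: spatial window [0, 5] x [0, 5] and type range 1..4.
hits = range_query(tree, (0.0, 0.0, 1), (5.0, 5.0, 4))
print(sorted(hits))
```

The museum-typed object at (8, 8) is excluded by the spatial side of the box and the local road (type 6) by the type side, so a single box query applies both predicates at once.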


Type Hierarchy Linearization

Treat type information like a spatial dimension

[Figure: the example type hierarchy with Root (0), Building (1), Hotel (2), Museum (3), Restaurant (4), Road (5), Local Road (6), and Main Road (7)]



[Figure: the type dimension, with the range covering Building + all subtypes marked]
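One way to treat type information like a spatial dimension is to number the types in preorder (depth first), so that every type and all of its subtypes receive a contiguous block of IDs; "Building + all subtypes" then becomes a single interval on the type dimension. The sketch below assumes the hierarchy Root with subtypes Building (Hotel, Museum, Restaurant) and Road (Local Road, Main Road), which is consistent with the IDs shown on the slide; the function name `linearize` is hypothetical.

```python
def linearize(tree, root):
    """Assign preorder IDs; return {type: id} and {type: (first_id, last_id)} of its subtree."""
    ids, ranges = {}, {}

    def visit(node):
        first = len(ids)
        ids[node] = first
        for child in tree.get(node, []):
            visit(child)
        ranges[node] = (first, len(ids) - 1)  # last ID assigned inside this subtree

    visit(root)
    return ids, ranges


# Assumed example hierarchy (parent -> ordered list of direct subtypes).
HIERARCHY = {
    "Root": ["Building", "Road"],
    "Building": ["Hotel", "Museum", "Restaurant"],
    "Road": ["Local Road", "Main Road"],
}

ids, ranges = linearize(HIERARCHY, "Root")
print(ids["Museum"], ranges["Building"])  # 3 (1, 4)
```

A query for Building + all subtypes then needs only the interval `ranges["Building"]`, i.e., 1..4, which is the type range shown for the Real 3D Index.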


Effects of the Spacing in the Type Dimension

[Figure: objects and inner nodes of the index tree plotted over a spatial dimension and the type dimension. With wide spacing between mapped type values, objects are primarily grouped by their type; with narrow spacing, objects are primarily grouped by their position.]

The spacing affects the clustering of objects

Determine the best type mapping range


Type Mapping Variant: Equal Spread (ES)

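[Figure: type IDs 0 to 9 of an example hierarchy (root 0; direct subtypes 1, 5, and 8; leaves 2, 3, 4, 6, 7, and 9) mapped onto the type mapping range 0 to 1000 at 0, 100, 200, 300, 400, 500, 600, 700, 800, and 900; a type and all its subtypes occupy one contiguous range]

Same gap between all mapped values

The simplest variant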
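Equal Spread can be sketched in a few lines, assuming type IDs are assigned in preorder (as in the example, where the subtypes of a type follow it directly) so that a type and its subtypes occupy a contiguous interval. The function names are hypothetical; the mapping range of 1000 and the 10 types match the slide's example.

```python
def equal_spread(num_types: int, mapping_range: float) -> list:
    """ES: type ID i (preorder) is mapped to i * mapping_range / num_types."""
    gap = mapping_range / num_types  # same gap between all mapped values
    return [type_id * gap for type_id in range(num_types)]


def subtree_interval(mapped, first_id, last_id):
    """Interval on the type dimension covering a type and all its subtypes."""
    return (mapped[first_id], mapped[last_id])


mapped = equal_spread(10, 1000.0)
print(mapped)
# Type 1 and its (assumed) subtypes 2 to 4 map to one interval:
print(subtree_interval(mapped, 1, 4))  # (100.0, 400.0)
```

The variant ignores both the hierarchy's shape and the data, which is what makes it the simplest choice.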


Type Mapping Variant: Type Hierarchy (TH)

[Figure: the same example hierarchy mapped onto the type mapping range 0 to 1000; type ID: mapped value = 0: 0, 1: 250, 2: 313, 3: 375, 4: 438, 5: 500, 6: 583, 7: 666, 8: 750, 9: 875; each type's subtypes lie within its range]

Same gap between a type and its direct subtypes

Cluster objects with the same supertype
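The Type Hierarchy variant can be sketched recursively: each type sits at the start of its interval, and the interval is split into equal parts for the type itself and each of its direct subtypes. The splitting rule is inferred from the slide's numbers rather than stated in the source, and the example hierarchy (root 0 with direct subtypes 1, 5, and 8; then 2 to 4 under 1, 6 and 7 under 5, and 9 under 8) is read off the figure; `type_hierarchy_mapping` is a hypothetical name.

```python
def type_hierarchy_mapping(children, root, lo, hi):
    """TH: place each type at the start of its interval [lo, hi) and split the
    interval into equal parts for the type itself and each direct subtype."""
    mapped = {}

    def assign(node, lo, hi):
        mapped[node] = lo
        kids = children.get(node, [])
        step = (hi - lo) / (len(kids) + 1)  # same gap between a type and its direct subtypes
        for i, kid in enumerate(kids, start=1):
            assign(kid, lo + i * step, lo + (i + 1) * step)

    assign(root, lo, hi)
    return mapped


# Example hierarchy, assumed from the slide's figure.
CHILDREN = {0: [1, 5, 8], 1: [2, 3, 4], 5: [6, 7], 8: [9]}

mapped = type_hierarchy_mapping(CHILDREN, 0, 0.0, 1000.0)
print({t: round(v, 1) for t, v in sorted(mapped.items())})
```

Under these assumptions, the computed values (250, 312.5, 375, 437.5, 500, 583.3, 666.7, 750, 875 for types 1 to 9) closely match the values on the slide.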


Type Mapping Variant: Object Distribution (OD)

[Figure: the same example hierarchy mapped onto the type mapping range 0 to 1000; mapped values for type IDs 0 to 9: 0, 36, 71, 250, 304, 429, 571, 643, 750, 839]

Size of gap corresponds to the number of instances of a type

Cluster infrequent objects by location, cluster frequent objects by type

Requires additional histogram information
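Object Distribution can be sketched as a cumulative histogram over the preorder type IDs: the gap after each type is proportional to its number of instances, so frequent types are spread out (clustered by type) and rare types are packed together (clustered by location). The instance counts below are hypothetical, and `object_distribution` is an illustrative name, not the engine's API.

```python
from itertools import accumulate


def object_distribution(counts, mapping_range):
    """OD: the gap after type i (preorder ID) is proportional to its instance count.

    counts[i] is the number of objects of type i (the required histogram).
    Type i is mapped to mapping_range * (objects of types 0..i-1) / total.
    """
    total = sum(counts)
    before = [0] + list(accumulate(counts))[:-1]  # objects preceding each type
    return [mapping_range * c / total for c in before]


# Hypothetical instance counts for type IDs 0 to 9.
counts = [2, 2, 10, 3, 7, 8, 4, 6, 5, 9]
mapped = object_distribution(counts, 1000.0)
print([round(v) for v in mapped])
```

In this sketch a type with no instances gets a zero-width gap and shares its mapped value with the next type; whether the original approach enforces a minimum gap is not stated on the slide.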


Related Work

Spatial Indexes: we use them, but don't build one

Object-oriented Databases: use only point access methods

Object-relational Databases:

Separate table for each type: query many tables for all subtypes

Single global table: use point access methods




Experimental Setup

Data sets from 9 counties in California (TIGER/Line 2003)

Universe: width 15 to 100 km, height 26 to 115 km

12k to 203k objects

258 types


Comparing the Type Mapping Ranges

[Figure: relative response time (100% to 160%) of the type mapping variants ES, TH, and OD for the Data Provider, Discovery Service, Integration Middleware, and Mobile Device workloads, at type mapping ranges A = 15 km, B = 150 km, C = 1500 km, D = 15000 km, and E = 60000 km]

An almost-best type mapping range is sufficient


Comparing the Approaches

[Figure: relative response time (100% to 700%) of the indexing approaches SEP, R3D.1:1, R3D.ES, R3D.TH, and R3D.OD for the Data Provider, Discovery Service, Integration Middleware, and Mobile Device workloads; annotated values: 481%, 518%, 690%, 446%]

Type mapping does matter!

Object Distribution (OD) is the best variant

More impact with low type selectivity


Resource Consumption

[Figure: insertion time per object (nanoseconds) and memory per object (bytes) for the indexing approaches SEP, R3D.OD.B, R3D.OD.C, R3D.OD.D, and R3D.OD.E, for data set sizes of 12k, 17k, 23k, 46k, 54k, 72k, 151k, 175k, and 203k objects]

Scales well with larger data sets

Speed costs resources


Conclusion

Location-conscious main memory query engine

Exploits characteristics of typical queries

Deployable to many components

Real 3D Index: best performance

Type mapping range: Larger than expected

Type mapping variant: Object Distribution (OD)

Separate Indexes: best resource consumption


Outlook

Virtualize query processing

Dynamically distribute query capabilities according to load

Integrate other dimensions: valid time, measurement time