MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware Services

S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara)

Presenter: Zhuo Liu

Page 1: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware Services

S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara)

Presenter: Zhuo Liu

Page 2: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Overview

▐ A Motivating Story

▐ Existing Technologies

▐ Our proposal

▐ Evaluation

▐ Conclusion

Page 3: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Motivating Scenario: Mobile Coupon Distribution

[Figure: users send their current location to the Mobile Coupon Distributor, which returns coupons according to a distribution policy.]

Distribution Policy
• Area
• # of coupons

Page 4: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Motivating Scenario: Mobile Coupon Distribution

[Figure: many users report their current location at the same time; coupons are issued according to the distribution policy (area, # of coupons).]

125,000,000 subscribers in Japan

▐ System Scalability
• Large amounts of data
• High throughput

▐ Efficient Complex Queries
• Multi-dimensional query
• Nearest neighbors query

Page 5: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Existing Technologies

[Table: technologies compared on two criteria, Multi-dimensional Queries and Scalability.]

• Relational DBs
• Spatial DBs
  • Commercial products, but expensive
  • Open source products
• Key-Value Stores
• What We Want, at a reasonable price

Page 6: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

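Ordered Key-Value Stores

ex. BigTable, HBase

[Figure: an index with entries key00, key11, …, keynn points to buckets that are sorted by key. The bucket for key00 holds key00, key01, …, key0X with value00, value01, …, value0X; the bucket for key11 holds key11, key12, …, key1Y with value11, value12, …, value1Y; the last bucket holds keynn with valuenn.]

Good at 1-D Range Query

But, our target is multi-dimensional… (longitude, latitude, time)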
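To make the "good at 1-D range query" point concrete, here is a minimal Python sketch of an ordered key-value store. It is an illustration only: the `OrderedKV` class and its methods are hypothetical names, not the BigTable or HBase API. Because keys are kept sorted, a range query is two binary searches followed by a sequential read.

```python
import bisect


class OrderedKV:
    """Toy ordered key-value store (hypothetical, for illustration)."""

    def __init__(self):
        self._keys = []    # keys, always kept in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert while keeping order
        self._values[key] = value

    def scan(self, start, stop):
        """Return (key, value) pairs with start <= key <= stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = OrderedKV()
for k in ["key11", "key00", "key12", "key01"]:
    store.put(k, k.replace("key", "value"))

print(store.scan("key01", "key11"))
# [('key01', 'value01'), ('key11', 'value11')]
```

The scan only works along the single sort order of the keys, which is why a second or third dimension needs extra machinery.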

Page 7: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Naïve Solution: Linearization

Projects n-D space to 1-D space, then stores the points in an ordered key-value store (index and buckets sorted by the linearized key).

Apply a Z-ordering curve…

Z-values on a 4 × 4 grid (origin at the bottom left):

 5  7 13 15
 4  6 12 14
 1  3  9 11
 0  2  8 10

Simple, but problematic…
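The Z-values in the grid can be reproduced with a few lines of Python. This is a sketch under one assumption taken from the grid itself: the x bit comes before the y bit at each level. The function name `z_value` is made up for the example.

```python
def z_value(x, y, bits):
    """Interleave the bits of x and y (x bit first, most significant first)."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z


# Build the 4 x 4 grid with y increasing upwards, as drawn on the slide.
grid = [[z_value(x, y, 2) for x in range(4)] for y in reversed(range(4))]
for row in grid:
    print(" ".join(f"{z:2d}" for z in row))
```

Each point (x, y) now has a single integer key, so it can be stored in an ordered key-value store without any change to the store.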

Page 8: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Problem: False positive scans

▐ MD-query on linearized space
• Translate the MD-query to a linearized range query.
  ex. Query from 2 to 9.
• Scan the queried linearized range.
• Filter out the points outside the queried area.
  ex. blue-hatched area (4 to 7)
• Requires the boundary information of the original space.

[Figure: the Z-ordering grid with the query corners 2 and 9 marked.]

 5  7 13 15
 4  6 12 14
 1  3  9 11
 0  2  8 10
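The query from 2 to 9 can be replayed in Python to see where the false positives come from. This is a minimal sketch that assumes the 4 × 4 grid and the x-bit-first interleaving shown on the slides; `z_decode` is a helper name invented for the example.

```python
def z_decode(z, bits=2):
    """Split an interleaved Z-value back into its (x, y) coordinates."""
    x = y = 0
    for i in reversed(range(bits)):
        x = (x << 1) | ((z >> (2 * i + 1)) & 1)
        y = (y << 1) | ((z >> (2 * i)) & 1)
    return x, y


# Queried area: the rectangle whose corners have Z-values 2 and 9.
(x_min, y_min), (x_max, y_max) = z_decode(2), z_decode(9)

scanned = list(range(2, 9 + 1))  # the linearized range that gets scanned
hits = [
    z for z in scanned
    if x_min <= z_decode(z)[0] <= x_max and y_min <= z_decode(z)[1] <= y_max
]
false_positives = [z for z in scanned if z not in hits]

print(hits)             # [2, 3, 8, 9]
print(false_positives)  # [4, 5, 6, 7]
```

Half of the scanned range lies outside the queried rectangle, and the filter can only discard those cells after they have been read.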

Page 9: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Our Approach: MD-HBase

Build a Multi-dimensional Index Layer on top of an Ordered Key-Value Store

[Figure: layered architecture. MD-HBase places a multi-dimensional index on top of the single-dimensional index of an ordered key-value store (ex. BigTable, HBase, …).]

Page 10: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Introduce Multi-dimensional Index

▐ Multi-dimensional index (ex. the K-d tree, the Quad tree)
• Divide a space into subspaces containing almost the same # of points
• Organize the subspaces as a tree

Efficient subspace pruning → avoids false positive scans

[Figure: a space is divided into subspaces, and the subspaces are organized as a tree.]

Page 11: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Space Partition by the K-d Tree

Binary Z-ordering space (columns are x, rows are y). Keys are built by bitwise interleaving, ex. x=00, y=11 → 0101.

        00    01    10    11
  11   0101  0111  1101  1111
  10   0100  0110  1100  1110
  01   0001  0011  1001  1011
  00   0000  0010  1000  1010

[Figure: the same space partitioned by the K-d tree.]

How do we represent these subspaces?

Page 12: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Key Idea: The longest common prefix naming scheme

[Figure: the binary Z-ordering grid with the subspaces 000* and 1*** marked.]

Subspaces are represented as the longest common prefix of their keys!

Remarkable Property
• Preserves the boundary information of the original space

ex. subspace 1***
• Left-bottom corner: * → 0 gives 1000, i.e. (x, y) = (10, 00)
• Right-top corner: * → 1 gives 1111, i.e. (x, y) = (11, 11)
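The boundary reconstruction for 1*** can be written as a short Python function. This is a sketch, assuming 2 bits per dimension and the x-bit-first interleaving used on the slides; the name `prefix_bounds` is hypothetical.

```python
def prefix_bounds(prefix, bits=2):
    """Return ((x_min, y_min), (x_max, y_max)) for the subspace named by a
    binary key prefix, e.g. "1" for the subspace written 1*** on the slide."""
    width = 2 * bits
    low = prefix.ljust(width, "0")   # replace every * with 0
    high = prefix.ljust(width, "1")  # replace every * with 1

    def split(key):
        # Even positions hold x bits, odd positions hold y bits.
        return int(key[0::2], 2), int(key[1::2], 2)

    return split(low), split(high)


print(prefix_bounds("1"))    # ((2, 0), (3, 3))
print(prefix_bounds("000"))  # ((0, 0), (0, 1))
```

The first result is the slide's example in decimal: (10, 00) is (2, 0) and (11, 11) is (3, 3). No extra metadata is needed, because the prefix alone determines the rectangle.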

Page 13: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Build an index with the longest common prefix of keys

[Figure: the binary Z-ordering grid partitioned into the subspaces 000*, 001*, 01**, and 1***.]

Index: 000*, 001*, 01**, 1***

Buckets: allocate one per subspace (000*, 001*, 01**, 1***)
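One way to picture the index is as a sorted list of the smallest key of each subspace, so that finding the bucket for a point is a single binary search. The Python sketch below is an assumption about how such a lookup could work, not the MD-HBase implementation; `bucket_for` and the in-memory lists are invented for the example.

```python
import bisect

BITS = 2          # bits per dimension
WIDTH = 2 * BITS  # length of an interleaved key

# Subspace prefixes from the slide, indexed by their smallest key (* -> 0).
prefixes = ["000", "001", "01", "1"]
names = {int(p.ljust(WIDTH, "0"), 2): p.ljust(WIDTH, "*") for p in prefixes}
index = sorted(names)  # [0, 2, 4, 8]


def bucket_for(key):
    """Find the subspace containing a key: the last index entry <= key."""
    pos = bisect.bisect_right(index, key) - 1
    return names[index[pos]]


print(bucket_for(0b0110))  # 01**
print(bucket_for(0b1011))  # 1***
print(bucket_for(0b0011))  # 001*
```

Because the index entries are ordinary sorted keys, they fit into the single-dimensional index that an ordered key-value store already provides.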

Page 14: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Multi-dimensional Range Query

[Figure: the binary Z-ordering grid partitioned into the subspaces 000*, 001*, 01**, 10**, and 11**, with the queried area marked.]

Index: 000*, 001*, 01**, 10**, 11**

• Scan 0010 - 1001 on the index
• Subspace pruning (filter): reconstruct the boundary info. and check whether each subspace intersects the queried area
• Scan the buckets of the subspaces that remain
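The three steps of the range query can be traced with a small Python sketch. It assumes the five subspaces listed on this slide, 2 bits per dimension, and a query rectangle whose corners have the keys 0010 and 1001; all function names are hypothetical and the index is a plain list rather than an HBase table.

```python
BITS = 2
WIDTH = 2 * BITS

# Index entries from the slide, in key order.
index = ["000", "001", "01", "10", "11"]


def split(key):
    """Split an interleaved binary key string into (x, y)."""
    return int(key[0::2], 2), int(key[1::2], 2)


def bounds(prefix):
    """Subspace boundary: * -> 0 gives one corner, * -> 1 the other."""
    return split(prefix.ljust(WIDTH, "0")), split(prefix.ljust(WIDTH, "1"))


def range_query(low_key, high_key):
    """Return the subspaces whose buckets must be scanned."""
    (qx0, qy0), (qx1, qy1) = split(low_key), split(high_key)
    to_scan = []
    for prefix in index:
        # Index scan: skip subspaces entirely outside [low_key, high_key].
        if prefix.ljust(WIDTH, "1") < low_key or prefix.ljust(WIDTH, "0") > high_key:
            continue
        # Subspace pruning: keep only rectangles that intersect the query.
        (x0, y0), (x1, y1) = bounds(prefix)
        if x0 <= qx1 and qx0 <= x1 and y0 <= qy1 and qy0 <= y1:
            to_scan.append(prefix.ljust(WIDTH, "*"))
    return to_scan


print(range_query("0010", "1001"))  # ['001*', '10**']
```

In this run the index scan returns 001*, 01**, and 10**, and the pruning step drops 01** because its rectangle lies above the queried area, so its bucket is never read.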

Page 15: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

K Nearest Neighbors Query

▐ The best-first algorithm can be applied.
• The most efficient technique in practical cases

▐ See our paper for the details.

[Figure: best-first search example with subspaces numbered 1 to 5.]
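For readers unfamiliar with best-first search, the following Python sketch shows the generic technique over a flat list of buckets. It is not the algorithm from the paper, whose details the slide defers to; the bucket layout, the sample points, and the names `min_dist` and `knn` are all assumptions made for the example.

```python
import heapq
import math


def min_dist(q, box):
    """Smallest distance from point q to the box ((x0, y0), (x1, y1))."""
    (x0, y0), (x1, y1) = box
    dx = max(x0 - q[0], 0, q[0] - x1)
    dy = max(y0 - q[1], 0, q[1] - y1)
    return math.hypot(dx, dy)


def knn(q, buckets, k):
    """buckets: list of (box, points). Returns the k points nearest to q."""
    # One queue holds subspaces (kind 0, keyed by min_dist) and
    # points (kind 1, keyed by their actual distance).
    heap = [(min_dist(q, box), 0, i, None) for i, (box, _) in enumerate(buckets)]
    heapq.heapify(heap)
    result = []
    while heap and len(result) < k:
        _, kind, i, point = heapq.heappop(heap)
        if kind == 1:
            result.append(point)     # nothing left in the queue is closer
        else:
            for p in buckets[i][1]:  # expand the closest subspace
                heapq.heappush(heap, (math.dist(q, p), 1, i, p))
    return result


# Made-up buckets: bounding box of each subspace plus the points stored in it.
buckets = [
    (((0, 0), (0, 1)), [(0, 0), (0, 1)]),
    (((1, 0), (1, 1)), [(1, 1)]),
    (((0, 2), (1, 3)), [(0, 3), (1, 2)]),
    (((2, 0), (3, 3)), [(2, 0), (3, 3)]),
]
nearest = knn((1, 1), buckets, 3)
print(knn((1, 1), buckets, 1))  # [(1, 1)]
```

Subspaces are opened in order of their distance to the query point, so buckets that are too far away to contribute are never read.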

Page 16: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Variations of Storage Layer

▐ Table Share Model
• Uses a single table and maintains bucket boundaries
• Most space efficient
• Bucket co-location may cause disk access congestion

▐ Table per Bucket Model
• Allocates a table per bucket
• Most flexible mapping: one-to-one, one-to-many, many-to-one
• Bucket split is expensive: all points are copied to the new buckets

▐ Region per Bucket Model
• Allocates a region per bucket
• Most efficient bucket split: asynchronous bucket split
• Requires modification of HBase

Page 17: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Experimental Results: Multi-dimensional Range Query

• Dataset: 400,000,000 points
• Queries: select objects within MD ranges while varying the selectivity
• Cluster size: 4 nodes
• MD-HBase responds 10~100 times faster than the others, and its response time is proportional to the selectivity.

[Figure: response time (sec, log scale from 1 to 1000) versus selectivity (%, 0.01 to 10) for MD-HBase, HBase (ZOrder), and MapReduce.]

Page 18: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Experimental Results: k Nearest Neighbors Query

• Dataset: 400,000,000 points
• Queries: choose a point and vary the number of neighbors
• Cluster size: 4 nodes
• MD-HBase responds in 1.5 sec where k ≦ 100, and in 11 sec even if k = 10,000.

[Figure: response time (sec, 0 to 12) versus k, the number of neighbors (1 to 10000).]

Page 19: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Experimental Results: Insert

• Dataset: spatially skewed data generated by a Zipfian distribution
• MD-HBase shows good scalability without significant overhead.

[Figure: throughput (records/sec, 0 to 250,000) versus number of nodes (0 to 20) for MD-HBase and HBase (ZOrder).]

Page 20: MD- HBase : A Scalable  Multi-dimensional  Data Infrastructure for Location Aware Services

Conclusions

▐ Designed a scalable multi-dimensional data store.
• Scalability & efficient multi-dimensional queries
• Key idea: indexing the longest common prefix of keys
• Easily extends general ordered key-value stores.

▐ Demonstrated scalable insert throughput and excellent query performance.
• Range query: 10-100 times faster than existing technologies.
• kNN query: 1.5 s when k ≦ 100.
• Insert: 220K inserts/sec on a 16-node cluster without overhead

Thank you. Any Questions?