Scaling up analytical queries with column-stores

Ioannis Alagiannis, Manos Athanassoulis, Anastasia Ailamaki
École Polytechnique Fédérale de Lausanne


Post on 23-Feb-2016


TRANSCRIPT

Page 1: Scaling up analytical queries with column-stores

Scaling up analytical queries with column-stores

Ioannis Alagiannis Manos Athanassoulis Anastasia Ailamaki

École Polytechnique Fédérale de Lausanne

Page 2: Scaling up analytical queries with column-stores

Drinking from a data firehose

Fast and high-quality data analysis for smart business decisions

Data warehouses
1/3 of the database market ($$$)
Column-stores are here to stay!

Need for multiple concurrent users
100s to 1000s of queries*

Many concurrent queries + column-stores = ???

* "High-performance data warehousing", TDWI best practices report

Page 3: Scaling up analytical queries with column-stores

Multiple concurrent queries

[Figure: a DBMS running on two groups of eight cores (CORE 1 to CORE 8) with MEM and HDD, serving concurrent queries of the form "Find all restaurants with rating over 3.5 and close to East Village" (steak? pasta? indian? vegan?)]

High contention for resources

Page 4: Scaling up analytical queries with column-stores

Throughput, response time

Page 5: Scaling up analytical queries with column-stores

Throughput (memory-resident workload)

[Figure: ideal vs. real throughput (kQ/h) against # clients, with the total #HW contexts and the saturation point marked; TPCH (sf:30)]

Concurrency can hurt performance

Page 6: Scaling up analytical queries with column-stores

Experimental setup

Column stores
System-A and System-B (commercial)
System-C (open-source)

Hardware
Dual-socket Intel(R) Xeon(R) CPU E5-2660
• 2 sockets x 8 cores x 2 threads (32 HW contexts)
128 GB RAM, 1600 MHz DIMMs
L1: 64KB and L2: 256KB (per core), L3: 20MB (shared)

Page 7: Scaling up analytical queries with column-stores

Workloads

TPC-H
Scale factor: 30 (32GB on disk)
Qtpch = {10 query templates}

SSB (Star Schema Benchmark)
Scale factor: 30 (18GB on disk)
Qssb = {all 13 query templates}

Throughput experiments with 25 query instances

Memory-resident
Hot runs
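The throughput experiments report completed queries per unit of time for a given number of concurrent clients. The slides do not describe the measurement harness, so the following is only a minimal sketch of how such a number could be obtained. The names `measure_throughput` and `fake_query` are hypothetical, and `fake_query` merely sleeps in place of a real database call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(run_query, num_clients, queries_per_client):
    """Run queries from concurrent clients; return (completed, queries per hour)."""

    def client(client_id):
        # Each client issues its queries back to back (closed-loop client).
        for i in range(queries_per_client):
            run_query(client_id, i)
        return queries_per_client

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        completed = sum(pool.map(client, range(num_clients)))
    elapsed_s = time.perf_counter() - start
    return completed, completed / elapsed_s * 3600.0


def fake_query(client_id, i):
    # Hypothetical stand-in for a real query: sleep for 1 ms.
    time.sleep(0.001)


completed, qph = measure_throughput(fake_query, num_clients=4, queries_per_client=5)
```

Repeating the call for increasing values of `num_clients` would give one throughput point per concurrency level, which is the shape of the curves shown in the throughput slides.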

Page 8: Scaling up analytical queries with column-stores

Experiment 1: How does increased concurrency affect response time?

Page 9: Scaling up analytical queries with column-stores

Scaling up TPCH Q1

[Figure: average response time (sec, 0 to 500) against # concurrent queries (0 to 250) for System-A, System-B and System-C]

Linear increase in response time
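One way to check a claim of linear growth is to fit a straight line to (concurrency, response time) pairs and inspect the slope and the coefficient of determination. The sketch below uses ordinary least squares in plain Python. The data points are hypothetical and are not read off the plot; `linear_fit` is an illustrative helper, not something from the slides.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical measurements (NOT taken from the slides):
# number of concurrent queries vs. average response time in seconds.
clients = [1, 32, 64, 128, 256]
resp_time = [2.0, 64.0, 128.0, 256.0, 512.0]
slope, intercept, r2 = linear_fit(clients, resp_time)
```

An R^2 close to 1 with a positive slope would be consistent with a linear increase; the slope then gives the extra seconds of response time per additional concurrent query.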

Page 10: Scaling up analytical queries with column-stores

Scaling up SSB Q3.1

[Figure: average response time (sec, 0 to 250) against # concurrent queries (0 to 250) for System-A, System-B and System-C]

Similar behavior in SSB

Page 11: Scaling up analytical queries with column-stores

Experiment 2: What is the variability of query response time?

Page 12: Scaling up analytical queries with column-stores

Variability of System-A

Groups of short, medium and long running queries

TPCH (64 clients)

Page 13: Scaling up analytical queries with column-stores

Variability of System-B

Balanced resource allocation, lower variation

TPCH (64 clients)
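The slides do not say which statistic they use for variability. One common, scale-free choice is the coefficient of variation (standard deviation divided by the mean). The sketch below computes it for two hypothetical sets of response times, which are not taken from the experiments.

```python
import statistics


def coefficient_of_variation(times):
    """Population standard deviation divided by the mean (dimensionless)."""
    return statistics.pstdev(times) / statistics.mean(times)


# Hypothetical response times in seconds (NOT from the slides).
balanced = [10.0, 10.0, 10.0, 10.0]            # every query takes the same time
grouped = [2.0, 2.0, 10.0, 10.0, 30.0, 30.0]   # short, medium and long groups

cv_balanced = coefficient_of_variation(balanced)
cv_grouped = coefficient_of_variation(grouped)
```

Under this measure, response times that cluster into short, medium and long groups yield a clearly higher value than evenly served queries.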

Page 14: Scaling up analytical queries with column-stores

Variability of System-C

System-C uses an admission control mechanism

TPCH (64 clients)
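The slide states only that System-C uses admission control and does not describe the mechanism. A common way to build admission control is to cap the number of queries executing at once and make the rest wait. The following is a minimal sketch of that general idea, not System-C's implementation; `AdmissionController`, `fake_query` and the limit of 4 are all assumptions made for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class AdmissionController:
    """Allow at most max_active queries to execute at the same time."""

    def __init__(self, max_active):
        self._slots = threading.BoundedSemaphore(max_active)
        self._lock = threading.Lock()
        self._active = 0
        self.peak_active = 0  # highest concurrency observed, for inspection

    def run(self, query_fn, *args):
        with self._slots:  # blocks while max_active queries are already running
            with self._lock:
                self._active += 1
                self.peak_active = max(self.peak_active, self._active)
            try:
                return query_fn(*args)
            finally:
                with self._lock:
                    self._active -= 1


def fake_query(query_id):
    # Hypothetical stand-in for real query execution.
    time.sleep(0.005)
    return query_id


# 16 client threads submit 32 queries, but at most 4 execute concurrently.
controller = AdmissionController(max_active=4)
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(lambda q: controller.run(fake_query, q), range(32)))
```

Capping active queries trades queueing delay for less contention among the queries that are running, which is one reason such a mechanism can change the observed response-time distribution.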

Page 15: Scaling up analytical queries with column-stores

Experiment 3: How does increasing concurrency affect throughput?

Page 16: Scaling up analytical queries with column-stores

Throughput - TPCH

[Figure: throughput (kQueries/h, 0 to 16000) against # concurrent clients (0 to 250) for System-A, System-B and System-C; annotations on the curves: 48%, 32% drop, 35% drop]

Throughput decreases after the saturation point
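The drop annotations express how far throughput falls below its peak once the saturation point is passed. A minimal sketch of that arithmetic, assuming the drop is measured from the peak to the last measured concurrency level (the slides do not state the exact definition). The series below is hypothetical and is not read off the plot.

```python
def drop_after_peak(throughputs):
    """Return (peak_index, percent drop from the peak to the last measurement)."""
    peak_index = max(range(len(throughputs)), key=lambda i: throughputs[i])
    peak = throughputs[peak_index]
    last = throughputs[-1]
    return peak_index, (peak - last) / peak * 100.0


# Hypothetical throughput series in queries/hour (NOT read off the plots).
clients = [1, 16, 32, 64, 128, 256]
throughput = [1000.0, 9000.0, 12000.0, 11000.0, 9500.0, 9000.0]

peak_index, drop_pct = drop_after_peak(throughput)
saturation_clients = clients[peak_index]  # concurrency level at peak throughput
```

In this made-up series the peak sits at 32 clients and throughput at 256 clients is a quarter lower than the peak.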

Page 17: Scaling up analytical queries with column-stores

Throughput - SSB

[Figure: throughput (kQueries/h, 0 to 20000) against # concurrent clients (0 to 250) for System-A, System-B and System-C; annotations on the curves: 29% drop, 39% drop, throughput plateaus]

Exploiting sharing can sustain peak performance

Page 18: Scaling up analytical queries with column-stores

When concurrency in column-stores is increased:

Response time increases linearly

… with high variability

After saturation, peak performance is not sustained

Except for System-B on SSB

Page 19: Scaling up analytical queries with column-stores

Where do we go from here?

QPipe, Datapath, CJoin, ShareDB, Blink, Recycler (MonetDB), cooperative scans, CCM (cracking)

[Figure: ideal vs. real throughput against # clients, with the saturation point marked]

Adaptive resource (re)allocation
Work sharing techniques
Contention-aware scheduling

Thank you!