pg columnstore index · 2020. 8. 20. · accelerating postgresql query performance columnar vs. row...

Post on 04-Oct-2020


Page 1: PG Columnstore Index · 2020. 8. 20. · Accelerating PostgreSQL query performance Columnar vs. Row oriented storage Columnstore indexing Q&A 0 100 200 300 400 500 600 700 800 900

PG Columnstore Index (& why you should care)

August 2020


© 2020 Swarm64, Inc.

The PostgreSQL high performance innovators

Developers of Swarm64 Data Accelerator for PostgreSQL

Deep PostgreSQL & hardware-level engineering expertise

Berlin ■ Boston ■ Palo Alto


Agenda

Accelerating PostgreSQL query performance

Columnar vs. Row oriented storage

Columnstore indexing

Q&A

[Chart: TPC-H Query Response Times (seconds, lower is better, timeout at 900s), queries Q1 to Q22. Series: Postgres 12 vs. Postgres 12 + Swarm64 DA]


Faster querying = do more with PostgreSQL

Migrating SQL Server and Oracle to PostgreSQL
  o Especially if using SQL Server columnstore index or Oracle in-memory column store

Mixed workloads, large-scale reporting & charting
  o High concurrency, query complexity, data volume

Open source data warehousing – cut DWH costs 90%
  o Alternative to Oracle, SQL Server, Netezza, Redshift


Storage in RDBMS (well, in the wider sense)


Database storage typology: are you OLTP or OLAP?

Columnstore: typical for OLAP

Rowstore: typical for OLTP
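
To make the rowstore/columnstore distinction concrete, here is a minimal Python sketch. The toy table, its column names, and the "fields touched" counter are all made up for illustration; this is not how PostgreSQL or Swarm64 DA lay out data on disk.

```python
# Illustrative only: the same toy table in row-oriented and column-oriented form.
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 20.0},
    {"id": 3, "region": "EU", "amount": 5.0},
]

# Column-oriented form: one contiguous list per column.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def sum_rowstore(rows, col):
    """Aggregate one column; every row is fetched whole, so all fields count as touched."""
    touched = sum(len(r) for r in rows)
    return sum(r[col] for r in rows), touched


def sum_columnstore(columns, col):
    """Aggregate one column; only that column's values are read."""
    values = columns[col]
    return sum(values), len(values)


row_total, row_touched = sum_rowstore(rows, "amount")
col_total, col_touched = sum_columnstore(columns, "amount")
print("rowstore:   ", row_total, "fields touched:", row_touched)
print("columnstore:", col_total, "fields touched:", col_touched)
```

Both layouts give the same answer; the analytic (OLAP-style) aggregate simply touches fewer fields in the columnar form, while fetching one complete row is cheaper in the row form.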


A row-storage table meets a columnar index at a database.

Index (column storage)

Table (row storage)

Perfectly indexed part: read columnar

Not-yet-indexed rows: read row-wise (if needed)

Columnstore Index (hybrid)
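
The hybrid read path on this slide can be sketched in a few lines of Python. Everything here is hypothetical (the `index` dictionary, the `covered_rows` counter, the sample data); it only illustrates the idea of reading the indexed part column-wise and the not-yet-indexed tail row-wise, not the actual Swarm64 DA implementation.

```python
# Row-storage table: (id, amount) tuples. The last row is not yet indexed.
table = [(1, 10.0), (2, 20.0), (3, 5.0), (4, 7.5)]

# Hypothetical index state: a columnar copy covering only the first 3 rows.
index = {"covered_rows": 3, "amount": [10.0, 20.0, 5.0]}


def hybrid_sum_amount(table, index):
    """Sum `amount` using the columnar index where it exists, the table otherwise."""
    # Indexed part: read the single column straight from the index.
    columnar_part = sum(index["amount"])
    # Not-yet-indexed rows: fall back to reading whole rows from the table.
    tail = table[index["covered_rows"]:]
    row_part = sum(amount for _id, amount in tail)
    return columnar_part + row_part


total = hybrid_sum_amount(table, index)
print(total)
```

The result matches a plain scan of the table, which is the point: the table stays the single source of truth and the index only changes how most of it is read.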


Why care about an index? (because it's minimal change, great performance)

Keep your data in its native format

Compression on columns, faster data reads

Adds a decoupling mechanism, keeps single source of truth


What are the downsides? (bit more overhead, here and there)

Writing/reading might be more expensive

I/O advantage reduces the more columns are selected

Index has to be maintained, can be somewhat out of sync
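
A back-of-the-envelope sketch of why the I/O advantage shrinks as more columns are selected. It assumes, purely for illustration, that all columns are equally wide and ignores compression; the helper name `columnar_read_fraction` is made up. The 16-column figure is the width of the TPC-H lineitem table.

```python
def columnar_read_fraction(selected_columns, total_columns):
    """Fraction of the table's data a columnar scan must read.

    Assumes equal-width columns and no compression (illustrative only).
    """
    if not 0 < selected_columns <= total_columns:
        raise ValueError("selected_columns must be between 1 and total_columns")
    return selected_columns / total_columns


# TPC-H lineitem has 16 columns; a row-wise scan always reads all of them.
for k in (1, 4, 8, 16):
    print(f"{k:2d} of 16 columns -> {columnar_read_fraction(k, 16):.0%} of the data")
```

With every column selected the fraction reaches 100% and the columnar layout no longer saves any reads.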


Wait, doesn't PostgreSQL have this already? (it depends)

Zedstore
  o Introduces different table format
  o PG advantage of row-based storage gone?

Fujitsu VCI
  o Seems only available for Fujitsu Enterprise PostgreSQL
  o Hybrid implementation, but targets in-memory


The big picture, as of now


Swarm64 DA Columnstore Index


The Swarm64 DA Columnstore Index: benefits (compared to Swarm64 DA FDW-based acceleration)

Direct I/O: no page cache, more RAM for operations

Make use of WAL replication (again)

Backup & restore just work

autovacuum all the way


Accelerated Postgres for mixed workloads

Better query planning

Columnstore indexing

Faster query execution

~20x faster responses to complex queries

4x more simultaneous databases, users per server

100% drop-in to existing Postgres databases

Swarm64 Data Accelerator (DA) 5.0


Swarm64 DA 5.0: boosting the Postgres engine

Data access & optimization
• Compressed columnstore indexes
• Smart skipping of irrelevant data
• 10x-100x lower I/O load

Query planning
• Query rewriting for speed, resource efficiency, & parallelism
• Optimized cost functions for more efficient OLAP & HTAP
• Adaptive resource management system
• Automatic up-to-date statistics

Query execution
• Faster JOINs
• More parallelism & better workload distribution
• Faster data movement
• Full ACID consistency

Interface: 100% Postgres


Anatomy of a query – faster & more efficient resource utilization

Direct access into compressed column indexes
Smart skipping of irrelevant data sections

3x faster JOINs
2x lower RAM consumption

Adding a “shuffle node” between JOINs and faster data movement

Keep query execution parallel for as long as possible

Scan & filter

Merge (Aggregate, Distinct …)

JOIN 2

SORT & aggregate

JOIN 1

Minutes

Seconds

More parallel threads (scales as you add vCores)
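
The slides do not say how the "smart skipping of irrelevant data sections" works. One common technique for this kind of skipping is to keep per-block minimum/maximum values and skip any block whose range cannot satisfy the filter. The Python sketch below shows that general idea under that assumption; `Block`, `make_blocks`, and `scan_less_than` are hypothetical names, and none of this is taken from Swarm64 DA itself.

```python
from dataclasses import dataclass


@dataclass
class Block:
    values: list  # the column values stored in this block
    lo: int       # smallest value in the block
    hi: int       # largest value in the block


def make_blocks(values, size):
    """Split a column into fixed-size blocks and record min/max per block."""
    blocks = []
    for i in range(0, len(values), size):
        chunk = values[i:i + size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_less_than(blocks, bound):
    """Return values < bound plus the number of blocks that had to be read."""
    hits, blocks_read = [], 0
    for block in blocks:
        if block.lo >= bound:
            continue  # even the smallest value fails the filter: skip the block
        blocks_read += 1
        hits.extend(v for v in block.values if v < bound)
    return hits, blocks_read


quantities = [1, 3, 5, 30, 40, 50, 2, 45, 60, 70, 80, 90]
blocks = make_blocks(quantities, 3)
hits, blocks_read = scan_less_than(blocks, 24)
print(hits, f"({blocks_read} of {len(blocks)} blocks read)")
```

Skipping of this kind pays off most when similar values are stored close together, so that many blocks fall entirely outside the filter range.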


Example: TPC-H Query №6


Detail: the inner workings of accelerated TPC-H query 6

Columns: l_discount, l_extendedprice, l_quantity, l_shipdate

Filter:
l_shipdate >= date '1993-01-01'
AND l_shipdate < date '1993-01-01' + INTERVAL '1' YEAR
AND l_discount BETWEEN 0.05 - 0.01 AND 0.05 + 0.01
AND l_quantity < 24

Aggregate: SUM(l_extendedprice * l_discount)
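
To show what this query computes, here is a small Python sketch that applies the same filter and aggregate to four made-up lineitem rows held in columnar form. The sample values and the function name `q6_revenue` are assumptions for illustration; only the predicate and the SUM expression come from the slide.

```python
from datetime import date
from decimal import Decimal

# Made-up sample data, one list per column (only the four columns Q6 needs).
lineitem = {
    "l_shipdate": [date(1993, 3, 1), date(1994, 1, 1), date(1993, 7, 15), date(1993, 12, 31)],
    "l_discount": [Decimal("0.05"), Decimal("0.05"), Decimal("0.09"), Decimal("0.06")],
    "l_quantity": [10, 10, 5, 23],
    "l_extendedprice": [Decimal("100.00"), Decimal("200.00"), Decimal("300.00"), Decimal("50.00")],
}


def q6_revenue(cols):
    """Evaluate the Q6 filter row by row and sum l_extendedprice * l_discount."""
    start = date(1993, 1, 1)
    end = date(1994, 1, 1)  # start + 1 year, exclusive upper bound
    lo = Decimal("0.05") - Decimal("0.01")
    hi = Decimal("0.05") + Decimal("0.01")
    total = Decimal("0")
    for shipdate, discount, quantity, price in zip(
        cols["l_shipdate"], cols["l_discount"], cols["l_quantity"], cols["l_extendedprice"]
    ):
        # BETWEEN is inclusive on both ends.
        if start <= shipdate < end and lo <= discount <= hi and quantity < 24:
            total += price * discount
    return total


revenue = q6_revenue(lineitem)
print(revenue)
```

Only the first and last sample rows pass all three conditions. The query touches four columns of lineitem, which is why it suits a columnar read.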


Show & Tell


Example: TPC-H schema + query №14


Native PostgreSQL

Query Time: 15.60s

Scan Time: 13.62s


CREATE EXTENSION swarm64da

Query Time: 7.34s

Scan Time: 6.02s


PG + Swarm64 DA extension + columnstore index applied

Query Time: 4.81s

Scan Time: 3.25s
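
The three sets of timings from the demo can be compared with a few lines of Python. The numbers are copied from the slides; the labels and the `speedup` helper are made up for this sketch.

```python
# Query and scan times in seconds, as shown on the three demo slides.
timings = {
    "native PostgreSQL": {"query": 15.60, "scan": 13.62},
    "with Swarm64 DA extension": {"query": 7.34, "scan": 6.02},
    "extension + columnstore index": {"query": 4.81, "scan": 3.25},
}


def speedup(baseline, other, key):
    """How many times faster `other` is than `baseline` for the given timing."""
    return baseline[key] / other[key]


base = timings["native PostgreSQL"]
for name, t in timings.items():
    print(
        f"{name}: query {speedup(base, t, 'query'):.2f}x, "
        f"scan {speedup(base, t, 'scan'):.2f}x, "
        f"scan share of query time {t['scan'] / t['query']:.0%}"
    )
```

For this single query, the extension plus columnstore index comes out at roughly 3.2x faster end to end and roughly 4.2x faster on the scan than native PostgreSQL.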


20x faster TPC-H

PG 12.3 as basis, SF1000

Swarm64 finishes all 22 queries in 80 secs or less

Commodity 2U server 144 vCores, SSD array

[Chart: TPC-H Query Response Times (seconds, lower is better, timeout at 900s), queries Q1 to Q22. Series: Postgres 12 vs. Postgres 12 + Swarm64 DA]

A complete benchmark


Increased resource efficiency for mixed workloads

[Chart: CPU Cycles, RAM, I/O, Concurrent Users or Queries, on a 0% to 100% scale. Series: Postgres 12 vs. Postgres 12 with Swarm64 DA 5.0]

Better mixed workload density: more queries per hour/concurrent users on the same hardware

4x increase


Getting Started with Swarm64


Pricing & availability

Price: Swarm64 DA $33 / vCore / month

PostgreSQL compatibility: PostgreSQL 11 & up, EnterpriseDB EPAS

Platforms: Linux – on premises or cloud (any)


Proven acceleration timeline

Proof of concept Design/Plan Build Deploy

• Project plan & timeline

• Little-to-no code or SQL changes

• 2 weeks

• Show and prove performance gains

• Validate system requirements & costs

• Provision system

• Migrate data

• Testing

• Run anywhere

• Compatible with entire Postgres tools ecosystem

• Scale elastically

• Apply new acceleration upgrades over time


Try it for free...

Works with free PostgreSQL (v. 11 +), EDB Postgres

Run it in your data center

Run it on the cloud
• Start a Swarm64-accelerated PG instance on AWS in 5 minutes
• Runs on all the other clouds too


Thank you!

[email protected]@swarm64.com