postgresql 9.6 performance-scalability improvements

© 2013 EDB All rights reserved.

Scalability And Performance Improvements In PostgreSQL 9.6


Dilip Kumar | 2016.03.17

PgDay Asia Singapore

Who am I?

Dilip Kumar

Currently working at EnterpriseDB

Have worked on developing various features for PostgreSQL (for internal projects) as well as on another in-house DB at Huawei.

I have also contributed patches to the community.

Hold around 14 patents in my name in various DB technologies.

I presented a paper at PGCon 2015.


Contents

Journey From 9.1 to 9.5
What's In 9.6?
MVCC Scalability Improvement
Clog Scalability
Bulk Load Scalability
Parallel Query
Sorting Improvement
Hash Lock Improvement
Index Only Scan with Partial Index
Partial Sort
Cache the Snapshot
Buffer Header Spin Lock

Journey From 9.1 to 9.5

What's In 9.6?

ProcArray Lock Contention Solved: Committed
Parallel Sequential Scan: Committed
Parallel NLJ and HJ: Committed
Clog Control Lock Issue: In Progress
Bulk Load Scalability: In Progress
Buffer Header Lock Issue: In Progress
Hash Header Lock Contention: In Progress
Sorting Improvement using Quick Sort: In Progress
Checkpoint Continuous Flushing: In Progress
Index Only Scan with Partial Index: In Progress
Caching the Snapshot: In Progress
Partial Sort: In Progress

MVCC Scalability Improvement


ProcArrayLock:

This was the major contention point reported in 9.5; it blocked read-write workloads from scaling beyond 30 cores in TPC-C.

With pgbench, too, scalability was not linear beyond 30 cores.

[Figure: pgbench -M prepared, median of 30-minute runs, synchronous_commit = on. X axis: Clients (1, 8, 16, 32, 64, 128). Y axis: TPS (0 to 30000). Series: Head.]

MVCC Scalability Improvement

Many solutions were tried to overcome this problem, such as CSN snapshots and incremental snapshots.

Finally, group clearing of XIDs in ProcArrayEndTransaction was committed for 9.6, and scaling is now almost linear up to 64 clients on a 64-thread machine.
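The group-clear idea can be illustrated with a small, single-process model. This is only a sketch under assumptions: the class `ProcArrayModel`, its methods and the `lock_acquisitions` counter are hypothetical names, and the real implementation builds the pending list with atomic operations and puts followers to sleep until the leader is done. The point shown is that several committing backends cost one exclusive lock acquisition instead of one each.

```python
import threading


class ProcArrayModel:
    """Toy stand-in for the shared array of per-backend XIDs (hypothetical)."""

    def __init__(self, n_backends):
        # Every backend starts with a running transaction ID.
        self.xids = list(range(100, 100 + n_backends))
        self.array_lock = threading.Lock()      # stand-in for ProcArrayLock
        self.lock_acquisitions = 0              # how often the lock was taken
        self._pending = []                      # backends waiting to be cleared
        self._pending_guard = threading.Lock()  # protects the pending list

    def clear_one(self, backend_id):
        """Baseline: every committing backend takes the lock itself."""
        with self.array_lock:
            self.lock_acquisitions += 1
            self.xids[backend_id] = None

    def join_group(self, backend_id):
        """Queue a clear request; return True if the caller became the leader."""
        with self._pending_guard:
            self._pending.append(backend_id)
            return len(self._pending) == 1

    def leader_clear_group(self):
        """Leader: take the lock once and clear every queued XID."""
        with self._pending_guard:
            group, self._pending = self._pending, []
        with self.array_lock:
            self.lock_acquisitions += 1
            for backend_id in group:
                self.xids[backend_id] = None
        return group


# Four backends commit at about the same time; only the first becomes leader.
model = ProcArrayModel(4)
leaders = [model.join_group(b) for b in range(4)]
cleared = model.leader_clear_group()
```

In this model the four commits are served by a single acquisition of `array_lock`, whereas calling `clear_one` four times would take it four times.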

[Figure: pgbench -M prepared, median of 30-minute runs, synchronous_commit = on. X axis: Clients (1, 8, 16, 32, 64, 128). Y axis: TPS (0 to 35000). Series: Head, Patch.]

Clog Control Improvement


Clog Control Lock:

After reducing ProcArrayLock contention, CLogControlLock becomes the next visible contention point.

Contention is mainly due to two reasons. First, while writing the transaction status in CLOG, a backend acquires CLogControlLock in exclusive mode, which contends with every other transaction that tries to access the CLOG to check transaction status. Second, when a CLOG page is not found in the CLOG buffers, CLogControlLock must again be acquired in exclusive mode, which contends with the shared lockers that are trying to read transaction status.

The solution used for CLogControlLock is similar to the ProcArray group clear XID.


Bulk Load Scalability

Relation Extension Lock:

Currently the relation extension lock becomes a bottleneck when a relation is extended in parallel.

Both COPY and INSERT suffer from the same problem.

We have recently been working on this; various solutions have been tried, and the work is still in progress (lock-free extension, extending by multiple blocks with a user knob, group-extending the relation, extending in multiples of the number of lock waiters).

Bulk Load Scalability

[Figure: COPY 10000 records (4 bytes), data fits in shared buffers. X axis: Clients (1, 2, 4, 8, 16, 32, 64). Y axis: TPS (0 to 1200). Series: Base, Patch.]

[Figure: INSERT 1000 records (1K), data does not fit in shared buffers. X axis: Clients (1, 2, 4, 8, 16, 32, 64). Y axis: TPS (0 to 300). Series: Base, Patch.]

Parallel Query


Parallel query is a great win for 9.6: it allows a single query to run in parallel across multiple workers.

Parallel sequential scan and parallel join are already committed; others, such as parallel index scan and parallel aggregate, are in progress.

Parallel sequential scan and parallel join have shown great improvement for single queries.

Parallel Query

[Figure: TPC-H queries with scale factor = 5. X axis: Query (Q3, Q4, Q5, Q7, Q8). Y axis: Time in ms (0 to 14000).]

Sorting Improvement

Using quicksort for every external sort:

This uses quicksort instead of replacement selection.

Quicksort is a cache-conscious algorithm, so switching to it improves performance significantly.
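As a rough illustration of what "quicksort for external sort" means, the sketch below sorts each memory-sized batch with quicksort to form a run and then merges the runs. It is a toy under assumptions: `work_mem_items` is a hypothetical stand-in for the memory budget, and the runs are kept as in-memory lists rather than written to temporary files.

```python
import heapq


def quicksort(items):
    """Sort a list in place with quicksort (partition around the middle element)."""

    def sort(lo, hi):
        while lo < hi:
            pivot = items[(lo + hi) // 2]
            i, j = lo, hi
            while i <= j:
                while items[i] < pivot:
                    i += 1
                while items[j] > pivot:
                    j -= 1
                if i <= j:
                    items[i], items[j] = items[j], items[i]
                    i += 1
                    j -= 1
            sort(lo, j)  # recurse into the left part
            lo = i       # loop on the right part

    sort(0, len(items) - 1)
    return items


def external_sort(values, work_mem_items):
    """Build quicksorted runs of at most work_mem_items, then k-way merge them."""
    runs = []
    for start in range(0, len(values), work_mem_items):
        run = list(values[start:start + work_mem_items])
        runs.append(quicksort(run))
    return list(heapq.merge(*runs)), len(runs)


data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
result, n_runs = external_sort(data, work_mem_items=4)
```

Each run is sorted over a contiguous array, which is the cache-friendly access pattern the slide refers to; replacement selection would instead maintain a heap while reading the input.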

Sorting Improvement

[Figure: Time taken when using quicksort for all external sorts, work_mem = default (4MB). X axis: Data Size in GB (1.7, 3.5, 7, 14). Y axis: Time in ms (0 to 400000). Series: Head, Patch.]

Hash Header Lock Contention

Hash header mutex contention: PostgreSQL's internal hash tables are used for many key operations.

Heavyweight locks are managed in hash tables. The buffer pool is managed by hash tables.

But whenever a hash table needs to allocate an element from the free list or release one back to it, there is one common lock, and that becomes the main contention point.

As part of this work, the single free list is converted into per-partition free lists: each partition now has its own free list and its own lock.

Free-list elements can be shared across partitions.
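The partitioned free list described above can be sketched as follows. This is a minimal model, not the actual hash table code: the class name `PartitionedFreeList`, the modulo partitioning and the tuple elements are assumptions made for illustration. It shows one lock per free list, and a partition borrowing an element from another partition when its own list is empty.

```python
import threading


class PartitionedFreeList:
    """Toy free list split into partitions, each with its own lock (hypothetical)."""

    def __init__(self, n_partitions, elements_per_partition):
        self.locks = [threading.Lock() for _ in range(n_partitions)]
        # Elements are labelled (home partition, index) so borrowing is visible.
        self.free = [
            [(p, i) for i in range(elements_per_partition)]
            for p in range(n_partitions)
        ]

    def _partition(self, hash_value):
        return hash_value % len(self.free)

    def allocate(self, hash_value):
        """Take an element from the home partition, else borrow from another."""
        home = self._partition(hash_value)
        n = len(self.free)
        for step in range(n):
            p = (home + step) % n
            with self.locks[p]:  # only this partition's lock is held
                if self.free[p]:
                    return self.free[p].pop()
        return None  # every free list is empty

    def release(self, hash_value, element):
        """Return an element to the free list of the caller's home partition."""
        p = self._partition(hash_value)
        with self.locks[p]:
            self.free[p].append(element)


fl = PartitionedFreeList(n_partitions=2, elements_per_partition=1)
first = fl.allocate(0)    # served from partition 0
second = fl.allocate(0)   # partition 0 is empty, so this borrows from partition 1
third = fl.allocate(0)    # nothing left anywhere
fl.release(0, second)     # the borrowed element now lives in partition 0
```

Two backends whose hash values map to different partitions never touch the same lock unless one of them has to borrow.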

Hash Header Lock Contention

[Figure: pgbench, scale factor 300, shared buffers 512MB. X axis: Clients (64, 128). Y axis: TPS (0 to 350000). Series: Head, Patch.]

Index only scan with partial index

What is a partial index?

A partial index is an index built over a subset of a table; the subset is defined by a conditional expression (called the predicate of the partial index). The index contains entries for only those table rows that satisfy the predicate.

A major motivation for partial indexes is to avoid indexing common values. Since a query searching for a common value (one that accounts for more than a few percent of all the table rows) will not use the index anyway, there is no point in keeping those rows in the index at all. This reduces the size of the index, which will speed up queries that do use the index.
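A small runnable example may make the definition concrete. To keep it self-contained it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the `CREATE INDEX ... WHERE` form is the same in both systems, but the table `orders`, its columns and the index name are hypothetical, and SQLite's planner behaviour says nothing about PostgreSQL's.

```python
import sqlite3

# In-memory database used only as a stand-in for a PostgreSQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, amount INTEGER)"
)

# 1000 rows, of which only every 100th is still 'pending' (the rare value).
rows = [(i, "pending" if i % 100 == 0 else "done", i * 10) for i in range(1, 1001)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Partial index: only rows satisfying the predicate get an index entry,
# so the common value 'done' is never indexed.
conn.execute(
    "CREATE INDEX orders_pending_idx ON orders (amount) WHERE status = 'pending'"
)

# A query whose WHERE clause matches the index predicate can use the index.
pending = conn.execute(
    "SELECT amount FROM orders WHERE status = 'pending' ORDER BY amount"
).fetchall()
index_names = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"
).fetchall()
```

Here the index holds entries for 10 rows instead of 1000, which is the size reduction the text describes.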

Index only scan with partial index

Problem:

Currently, partial indexes end up not using index-only scans in most cases.

Unless you include all the columns from the index predicate in the index, the planner will decide that an index-only scan is not possible.

Adding columns that are not needed at runtime only increases the index size.

Solution:

Select the index-only scan when the keys missing from the index are not really required at runtime, for example because they appear only in clauses already guaranteed by the index predicate.

Partial Sort

In PostgreSQL, if all the sort keys are part of an index, an index scan will be selected for sorting.

Otherwise, all tuples are scanned from the heap and sorted completely.

This patch mixes both methods:

get results from the index in an order that partially meets our requirements

do the rest of the work from the heap.
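The mixed approach can be sketched in a few lines. The assumption here is that rows arrive already ordered by the first key `a` (as an index on `a` would deliver them) and that the query wants them ordered by `(a, b)`; the function name `partial_sort` and the tuple layout are illustrative only. Only the rows within each group of equal `a` need sorting.

```python
from itertools import groupby
from operator import itemgetter


def partial_sort(rows_sorted_by_a):
    """Yield rows in (a, b) order, given input that is already ordered by a.

    Each group of equal a is small, so it is sorted on b by itself instead
    of sorting the whole input at once.
    """
    for _, group in groupby(rows_sorted_by_a, key=itemgetter(0)):
        yield from sorted(group, key=itemgetter(1))


# Rows as an index on the first column would return them: ordered by a only.
rows = [(1, "c"), (1, "a"), (2, "b"), (2, "a"), (3, "z")]
ordered = list(partial_sort(rows))
```

Besides sorting smaller batches, this form can start returning rows as soon as the first group is complete, which matters for queries with a LIMIT.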

Cache The Snapshot

With this improvement, whenever a backend takes a snapshot, it is stored in a shared cache.

Whenever any transaction commits, the saved snapshot is marked invalid.

Under a mixed read and write load, if no write transaction commits between multiple reads, the snapshot can be reused.

When a write transaction invalidates the snapshot, the first transaction that takes a snapshot afterwards recalculates it and updates the cached snapshot.
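The caching rule above can be modelled in a few lines. This is a single-threaded toy under assumptions: the class `SnapshotCache`, the representation of a snapshot as `(next_xid, running_xids)` and the `rebuilds` counter are all hypothetical, and a real shared cache would need synchronization between backends.

```python
class SnapshotCache:
    """Toy model: reuse one computed snapshot until some transaction commits."""

    def __init__(self):
        self.next_xid = 1     # next transaction ID to hand out
        self.running = set()  # XIDs of write transactions still in progress
        self.cached = None    # last computed snapshot, or None if invalid
        self.rebuilds = 0     # how many times a snapshot had to be computed

    def begin_write(self):
        xid = self.next_xid
        self.next_xid += 1
        self.running.add(xid)
        return xid

    def commit(self, xid):
        self.running.discard(xid)
        self.cached = None    # any commit invalidates the saved snapshot

    def get_snapshot(self):
        if self.cached is None:
            # First caller after an invalidation recomputes and stores it.
            self.rebuilds += 1
            self.cached = (self.next_xid, frozenset(self.running))
        return self.cached


cache = SnapshotCache()
writer = cache.begin_write()
snap_a = cache.get_snapshot()   # computed
snap_b = cache.get_snapshot()   # reused, no commit in between
cache.commit(writer)            # invalidates the cache
snap_c = cache.get_snapshot()   # recomputed by the first reader afterwards
```

In this model a transaction that starts after the snapshot was taken has an XID at or above the snapshot's first field, so it is treated as still running and only commits need to invalidate the cache.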

Cache The Snapshot

[Figure: pgbench, scale factor 300, shared buffers = 8GB. X axis: Clients (64, 88, 128, 256). Y axis: TPS (0 to 40000). Series: Base, Only CLOG changes, CLOG + save snapshot.]

Buffer Header lock Improvement

Convert Pin/Unpin spinlock to atomic operation: The spinlock taken inside buffer pin and unpin is also one of the bottlenecks when testing scalability on big NUMA machines.

With this patch, the spinlock is converted to an atomic operation, and increasing or decreasing the reference count is done with just one atomic operation.

Other operations, which need to do more work than just updating the state variable, take a lock, and that lock is itself implemented with an atomic operation.
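The idea of folding the reference count and a lock flag into one atomically updated word can be sketched as below. Python has no hardware atomics, so `AtomicU32` emulates one with an internal lock; that class, the bit layout (`REFCOUNT_MASK`, `LOCKED_FLAG`) and the function names are assumptions for illustration and are not taken from the patch.

```python
import threading

REFCOUNT_MASK = (1 << 18) - 1  # assumed layout: low 18 bits hold the pin count
LOCKED_FLAG = 1 << 22          # assumed flag bit marking the header as locked


class AtomicU32:
    """Emulated atomic word; the internal lock stands in for the CPU."""

    def __init__(self, value=0):
        self._value = value
        self._cpu = threading.Lock()

    def load(self):
        with self._cpu:
            return self._value

    def fetch_add(self, delta):
        with self._cpu:
            old = self._value
            self._value = (old + delta) & 0xFFFFFFFF
            return old

    def compare_exchange(self, expected, new):
        with self._cpu:
            if self._value != expected:
                return False
            self._value = new
            return True


def pin(state):
    """Increase the reference count with a single atomic add."""
    return (state.fetch_add(1) + 1) & REFCOUNT_MASK


def unpin(state):
    """Decrease the reference count with a single atomic add."""
    return (state.fetch_add(-1) - 1) & REFCOUNT_MASK


def lock_header(state):
    """Slow path for bigger updates: set the lock bit with compare-and-swap."""
    while True:
        old = state.load()
        if not old & LOCKED_FLAG and state.compare_exchange(old, old | LOCKED_FLAG):
            return


def unlock_header(state):
    """Clear the lock bit; only valid while the header is locked."""
    state.fetch_add(-LOCKED_FLAG)


def worker(state):
    for _ in range(1000):
        pin(state)
    for _ in range(999):
        unpin(state)


state = AtomicU32()
threads = [threading.Thread(target=worker, args=(state,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
pins_left = state.load() & REFCOUNT_MASK

lock_header(state)
locked = bool(state.load() & LOCKED_FLAG)
unlock_header(state)
```

The pin and unpin paths never take a separate lock in this model; only operations that go through `lock_header` serialize on the flag bit.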

Buffer Header lock Improvement

[Figure: pgbench read-only test, -M prepared, scale factor 1000, shared buffers = 8GB. X axis: Clients (1, 8, 16, 32, 64). Y axis: TPS (0 to 800000). Series: Head, Patched.]

Questions?


Thanks!

