SECON'2016. Fedor Sigaev, PG in a Cluster. Scandals, intrigues,...
PostgreSQL
PostgreSQL CORE
Locale support
PostgreSQL extendability:
GiST(KNN), GIN, SP-GiST
Full Text Search (FTS)
NoSQL (hstore, jsonb)
Indexed regexp search
VODKA access method (WIP)
Extensions:
Intarray
Pg_trgm
Ltree
Hstore
plantuner
JsQuery
PGCon, PGConf: 20+
GSoC
PostgreSQL (1+1 in progress)
50+ PostgreSQL: , ,
Novartis, Raining Data, Heroku, Engine Yard, WarGaming, Rambler, Avito, 1c
Write or Read or Both scalability
HA
Postgres Cluster Matrix
WS | RS | Parallel Read | M-M | Synchr | Recov. | HA | Consistency (ACID / BASE)
Postgres-R: ++++++
XC/XL/X2: +?+++++
PGCluster: ++
PgPool: +++
Pl/proxy: +++
pg_shard/CitusDB: ++++
Greenplum: +++
Bucardo: +++++
BDR: +++++
SR: +++++!
FDW: ++++
PostgreSQL XC/XL/X2
2009-2012 Postgres-XC Development Group
2012-2014 TransLattice, Inc
2014-2016 Postgres-XL Development Group
2015-2016 2ndQuadrant Ltd
PostgreSQL XC/XL/X2
GTM (SPOF!)
ProxyGTM
N+1 commit logs (sync?)
CREATE TABLE ...
DISTRIBUTE BY { REPLICATION | ROUNDROBIN | { [HASH | MODULO ] ( column_name ) } }
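To make the four DISTRIBUTE BY options concrete, here is a minimal Python sketch of how a coordinator might pick the data nodes for a row. It is only an illustration: the node names, the function `target_nodes`, and the use of CRC32 as the hash are assumptions of this sketch, not Postgres-XC/XL internals.

```python
import itertools
import zlib

NODES = ["dn1", "dn2", "dn3"]  # hypothetical data node names

_round_robin = itertools.cycle(range(len(NODES)))


def target_nodes(strategy, key=None):
    """Return the data nodes that should store a row under the given strategy."""
    if strategy == "REPLICATION":
        # Every node keeps a full copy of the table.
        return list(NODES)
    if strategy == "ROUNDROBIN":
        # Rows are dealt out in turn, regardless of their content.
        return [NODES[next(_round_robin)]]
    if strategy == "MODULO":
        # Integer key modulo the number of nodes.
        return [NODES[key % len(NODES)]]
    if strategy == "HASH":
        # Hash of the key modulo the number of nodes (CRC32 is a stand-in).
        return [NODES[zlib.crc32(str(key).encode()) % len(NODES)]]
    raise ValueError(f"unknown strategy: {strategy}")
```

With REPLICATION a write touches every node but any node can serve a read; with the other three options each row lives on exactly one node, so only HASH and MODULO let the coordinator route a lookup by key to a single node.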
pg_shard/FDW
Shared nothing
No ACID, No BASE
BDR 2ndQuadrant Ltd
BASE only
(last win)
Node 1, Node 2
?
(-BDR)
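The "last win" rule can be sketched in a few lines of Python. This is an illustration of the general last-update-wins idea, not BDR's code: the `RowVersion` type, its fields, and the tie-break by node name are assumptions of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowVersion:
    value: int
    commit_ts: float  # commit timestamp on the originating node
    node: str         # originating node name (hypothetical)


def resolve_last_win(local, remote):
    """Keep the version with the later commit timestamp.

    Ties are broken by node name so that every node picks the same winner
    (the tie-break rule is an assumption of this sketch).
    """
    return max(local, remote, key=lambda v: (v.commit_ts, v.node))
```

Both nodes converge on the same row, but the losing update is discarded without any error being reported to the client that made it.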
HA
DB3, DB1, DB2, xTM
PgSQL
Session 1:
# begin;
# select ctid, xmin, xmax, * from foo;
 ctid  | xmin | xmax | id | val
-------+------+------+----+-----
 (0,1) |  693 |    0 |  1 | 111
# update foo set val=222;
# select ctid, xmin, xmax, * from foo;
 ctid  | xmin | xmax | id | val
-------+------+------+----+-----
 (0,2) |  694 |    0 |  1 | 222
# select txid_current();
 txid_current
--------------
          694

Session 2:
# begin;
# select ctid, xmin, xmax, * from foo;
 ctid  | xmin | xmax | id | val
-------+------+------+----+-----
 (0,1) |  693 |    0 |  1 | 111
# select ctid, xmin, xmax, * from foo;
 ctid  | xmin | xmax | id | val
-------+------+------+----+-----
 (0,1) |  693 |  694 |  1 | 111
PgSQL
Tuple is visible (Read Committed) if:
Xmin committed
Xmax == 0
Tuple is visible (Repeatable Read) if:
Xmin committed
Xmin < Current XID
Xmax == 0 or Xmax > Current XID
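The rules above can be written as a small Python sketch. It is deliberately simplified and rests on assumptions that are not on the slide: `committed` is the set of XIDs known to be committed at the moment the check (or the snapshot) is taken, an uncommitted Xmax is treated like Xmax == 0, and a transaction's view of its own changes is ignored. PostgreSQL's real visibility checks use full snapshots and are more involved.

```python
from dataclasses import dataclass


@dataclass
class TupleVersion:
    xmin: int  # XID that created this version
    xmax: int  # XID that deleted or updated it, 0 if none


def visible_read_committed(t, committed):
    """Visible if the creator committed and no committed deleter exists."""
    if t.xmin not in committed:
        return False
    return t.xmax == 0 or t.xmax not in committed


def visible_repeatable_read(t, committed, current_xid):
    """Same idea, but judged relative to the XID at snapshot time."""
    if t.xmin not in committed or not t.xmin < current_xid:
        return False
    return t.xmax == 0 or t.xmax > current_xid or t.xmax not in committed
```

Applied to the sessions shown earlier: while transaction 694 is still open, `visible_read_committed(TupleVersion(693, 694), {693})` is true, which is why the second session keeps seeing val = 111 even though Xmax is already 694.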
PgSQL
Session 1:
# begin;
# select ctid, xmin, xmax, * from foo;
 ctid  | xmin | xmax | id | val
-------+------+------+----+-----
 (0,1) |  697 |    0 |  1 | 111
 (0,2) |  697 |    0 |  2 | 222
# update foo set val = 333 where id = 1;
# select ctid, xmin, xmax, * from foo;
 ctid  | xmin | xmax | id | val
-------+------+------+----+-----
 (0,2) |  697 |  699 |  2 | 222
 (0,3) |  698 |    0 |  1 | 333
# update foo set val = 555 where id = 2; -- waits for transaction 2

Session 2:
# begin;
# select ctid, xmin, xmax, * from foo;
 ctid  | xmin | xmax | id | val
-------+------+------+----+-----
 (0,1) |  697 |    0 |  1 | 111
 (0,2) |  697 |    0 |  2 | 222
# update foo set val = 444 where id = 2;
# select ctid, xmin, xmax, * from foo;
 ctid  | xmin | xmax | id | val
-------+------+------+----+-----
 (0,1) |  697 |  698 |  1 | 111
 (0,4) |  699 |    0 |  2 | 444
# update foo set val = 666 where id = 1; -- DEADLOCK!!
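The situation above is a cycle in the wait-for graph: transaction 698 waits for 699 and 699 waits for 698. A minimal Python sketch of detecting such a cycle follows. The function name and the dictionary representation are assumptions of this sketch, and it covers only the simple case where each transaction waits for at most one other; it is not how PostgreSQL's deadlock detector is implemented.

```python
def find_deadlock(waits_for):
    """Return a cycle in the wait-for graph as a list of XIDs, or None.

    waits_for maps a blocked transaction to the transaction holding the lock.
    Each transaction waits for at most one other, so following the edges
    from any start point either ends or revisits a node.
    """
    for start in waits_for:
        path = []
        xid = start
        while xid in waits_for and xid not in path:
            path.append(xid)
            xid = waits_for[xid]
        if xid in path:
            # Everything from the first visit of xid onward forms the cycle.
            return path[path.index(xid):]
    return None
```

For the two sessions shown, `find_deadlock({698: 699, 699: 698})` reports both transactions. On a single server all lock waits are visible in one place; once the two updates land on different nodes, each node sees only one edge of the graph.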
Count(*) !
Transaction Manager before patch
Transaction Manager after patch
Distributed Transaction Manager
Pluggable transaction API
UDT, CSP, FDW, AM, TM?, Core
eXtensible Transaction API
XidStatus (*GetTransactionStatus)(TransactionId xid, XLogRecPtr *lsn);
void (*SetTransactionStatus)(TransactionId xid, int nsubxids, TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
Snapshot (*GetSnapshot)(Snapshot snapshot);
TransactionId (*GetNewTransactionId)(bool isSubXact);
TransactionId (*GetOldestXmin)(Relation rel, bool ignoreVacuum);
bool (*IsInProgress)(TransactionId xid);
TransactionId (*GetGlobalTransactionId)(void);
bool (*IsInSnapshot)(TransactionId xid, Snapshot snapshot);
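The point of exposing the transaction manager as a set of function pointers is that an extension can replace individual functions and leave the rest alone. The Python analogue below illustrates only that idea. Everything in it is hypothetical: it models three of the eight callbacks, the toy `LocalState` counter stands in for the built-in manager, and `GLOBAL_XID` stands in for a value obtained from an external arbiter.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class TransactionManager:
    """Python analogue of a table of function pointers (subset of the API)."""
    get_new_transaction_id: Callable[[bool], int]
    is_in_progress: Callable[[int], bool]
    get_global_transaction_id: Callable[[], int]


class LocalState:
    """Toy local XID counter standing in for the built-in manager."""

    def __init__(self, next_xid=700):
        self.next_xid = next_xid
        self.running = set()

    def new_xid(self, is_sub_xact):
        xid = self.next_xid
        self.next_xid += 1
        self.running.add(xid)
        return xid


local = LocalState()
builtin_tm = TransactionManager(
    get_new_transaction_id=local.new_xid,
    is_in_progress=lambda xid: xid in local.running,
    get_global_transaction_id=lambda: 0,  # 0 = "no global transaction" (assumption)
)

# An extension swaps in only the hooks it needs; the rest stay built-in.
GLOBAL_XID = 1_000_001  # hypothetical value handed out by an external arbiter
distributed_tm = replace(builtin_tm, get_global_transaction_id=lambda: GLOBAL_XID)
```

Callers go through whichever table is installed, so code that asks for a new XID or checks whether one is in progress does not need to know if a distributed manager is loaded.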
With pg_shard/FDW!
SPOF
DTM architecture
PostgreSQL Instance 1, PostgreSQL Instance 2, PostgreSQL Instance 3
Arbiter master, Arbiter slave 1, Arbiter slave 2
synchronous replication
asynchronous replication
Coordinator
Multiplexing
Unix domain sockets
Arbiter
Unix domain sockets
TCP sockets
sockhub
backends
backends
Node 1
Node 2
DTM from client's point of view
Primary server:
create extension pg_dtm;
select dtm_begin_transaction();
begin transaction;
update...;
commit;

Secondary server:
create extension pg_dtm;
select dtm_join_transaction(xid);
begin transaction;
update...;
commit;
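Seen from application code, the two columns above combine into one sequence. The Python sketch below spells out a possible ordering. It makes several assumptions not stated on the slide: that dtm_begin_transaction() returns the global XID, that the statements on the two servers are interleaved as shown, and that `primary` and `secondary` are caller-supplied functions that execute one SQL statement and return its scalar result.

```python
def run_distributed_update(primary, secondary, primary_sql, secondary_sql):
    """Run one global transaction across two nodes, in the slide's order.

    primary and secondary are callables that execute one SQL statement and
    return its scalar result (or None); in real code they would wrap
    database connections.
    """
    # Assumption: dtm_begin_transaction() returns the global XID.
    xid = primary("select dtm_begin_transaction()")
    secondary(f"select dtm_join_transaction({int(xid)})")

    primary("begin transaction")
    secondary("begin transaction")
    primary(primary_sql)
    secondary(secondary_sql)
    primary("commit")
    secondary("commit")
    return xid
```

The only DTM-specific steps are the first two calls; everything after them is ordinary SQL on each server.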
tsDTM architecture
PostgreSQL Instance 1, PostgreSQL Instance 2, PostgreSQL Instance 3, Coordinator
Lightweight two-phase commit
XactLogCommitRecord (flush changes in WAL)
ProcArrayEndTransaction (mark transaction as completed)
ResourceOwnerRelease (release transaction locks)
TransactionTreeSetCommitTsData (set transaction status in CLOG)
Agreement
Transaction status
Different DTM implementations
Local transactions, 2PC, Arbiter, Examples
Snapshot sharing: XL, DTM
Timestamp: Spanner, Cockroach, tsDTM
Incremental: SAP HANA
Postgres Cluster Matrix
WS | RS | Parallel Read | M-M | Synchr | Recov. | HA | Consistency (ACID / BASE)
Postgres-R: ++++++
XC/XL/X2: +?+++++
PGCluster: ++
PgPool: +++
Pl/proxy: +++
pg_shard/CitusDB: ++++++
Greenplum: +++
Bucardo: +++++
BDR: +++++
SR: ++?+++
FDW: +++++
Postgres Professional
xTM
DTM
tsDTM
*TM
Postgres Professional
raft-
MM-HA-NM !
Postgres Professional
PostgreSQL
, , ,
2011 .
(, , )
PgConf.Russia 2015 2016 - PostgreSQL
, , PostgreSQL *
* , , ,
PostgreSQL
4 : 3 PostgreSQL .
.
Postgres Professional NY 2016
Postgres Professional
+79166718198
www.postgrespro.ru