Asim Praveen, Ashwin Agrawal - PostgreSQL · 2016. 2. 8.
Deciphering 2phase commit
Ashwin Agrawal, Asim Praveen
{aagrawal, apraveen}@pivotal.io
Scale out
● Single instance is limited
● Manual attempts at sharding PostgreSQL
● FDW based sharding
● MPP → distributed databases
Challenge with atomicity
begin;
insert into account values (id = 1 ...);
insert into account values (id = 2 ...);
commit;
shard 1
shard 2
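The hazard on this slide can be sketched in a few lines. This is a toy Python model, not PostgreSQL code: `Shard`, `naive_commit`, and the `fail_on_commit` switch are hypothetical names invented for illustration. It assumes id = 1 is routed to shard 1 and id = 2 to shard 2, and shows that committing each shard independently can leave one insert durable while the other is lost.

```python
class Shard:
    """Toy shard: pending rows become visible only on commit."""

    def __init__(self, name, fail_on_commit=False):
        self.name = name
        self.fail_on_commit = fail_on_commit  # hypothetical failure switch
        self.pending = []
        self.committed = []

    def insert(self, row_id):
        self.pending.append(row_id)

    def commit(self):
        if self.fail_on_commit:
            self.pending.clear()  # the crash loses uncommitted work
            raise RuntimeError(f"{self.name} crashed during commit")
        self.committed.extend(self.pending)
        self.pending.clear()


def naive_commit(shards):
    """Commit shard by shard with no coordination (one phase commit)."""
    for shard in shards:
        try:
            shard.commit()
        except RuntimeError:
            return False  # too late: earlier shards already committed
    return True


shard1 = Shard("shard 1")
shard2 = Shard("shard 2", fail_on_commit=True)
shard1.insert(1)
shard2.insert(2)
ok = naive_commit([shard1, shard2])
print(ok, shard1.committed, shard2.committed)
```

The transaction as a whole failed, yet row 1 is committed on shard 1 and cannot be taken back: atomicity is broken.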
Two phase commit
● Phase 1
○ DTM sends prepare to shard 1 and shard 2
○ each shard votes yes
○ DTM state: all prepared
● Phase 2
○ DTM sends commit to shard 1 and shard 2
○ each shard commits and sends ack
○ DTM state: done
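The two phases can be sketched as a small coordinator loop. This is a minimal in-memory model under stated assumptions, not the DTM of any real system: `Participant`, `two_phase_commit`, and the `can_prepare` switch are hypothetical names, and network messages and disk flushes are replaced by plain method calls.

```python
class Participant:
    """Toy shard taking part in two phase commit."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # hypothetical switch to force a "no" vote
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the work can be made durable.
        self.state = "prepared" if self.can_prepare else "aborted"
        return self.can_prepare

    def commit(self):
        self.state = "committed"
        return "ack"

    def abort(self):
        self.state = "aborted"
        return "ack"


def two_phase_commit(participants):
    """Return the coordinator's decision after running both phases."""
    # Phase 1: send prepare to every shard and collect the votes.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: send the decision to every shard and wait for acks.
    for p in participants:
        ack = p.commit() if decision == "commit" else p.abort()
        assert ack == "ack"
    return decision


shards = [Participant("shard 1"), Participant("shard 2")]
result = two_phase_commit(shards)
print(result, [p.state for p in shards])

failing = [Participant("shard 1"), Participant("shard 2", can_prepare=False)]
result_failing = two_phase_commit(failing)
print(result_failing, [p.state for p in failing])
```

A single missing yes vote turns the decision into abort for every participant, which is what keeps the outcome uniform across shards.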
2PC: shard failure in phase one
● Phase 1
○ DTM sends prepare to shard 1 and shard 2
○ one shard votes yes; the other shard fails and sends no vote
○ DTM state: rollback
● Phase 2
○ DTM sends abort
○ the shard that voted yes aborts and sends ack
○ DTM state: done
2PC: shard failure in phase 2
● Phase 1
○ DTM sends prepare to shard 1 and shard 2
○ each shard votes yes
○ DTM state: all prepared
● Phase 2
○ DTM sends commit to shard 1 and shard 2
○ one shard commits and sends ack; no ack arrives from the failed shard
2PC: recovery of shard 2
● shard 2 (timeline): checkpoint → prepared: xid → wait for commit/abort message
● DTM: ack not received from shard 2; retry sending commit to shard 2
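The interplay between shard recovery and the DTM's retry can be sketched as below. This is an illustrative model only: `RecoverableShard`, `commit_with_retry`, and the `before_retry` hook are hypothetical names, and the `durable_prepared` set stands in for prepared state that survives a crash on disk.

```python
class RecoverableShard:
    """Toy shard whose prepared transactions survive a crash."""

    def __init__(self):
        self.durable_prepared = set()  # stands in for state flushed to disk
        self.committed = set()
        self.up = True

    def prepare(self, xid):
        self.durable_prepared.add(xid)

    def crash(self):
        self.up = False

    def recover(self):
        # Prepared xids are found again after restart and wait for a verdict.
        self.up = True
        return sorted(self.durable_prepared)

    def commit(self, xid):
        if not self.up:
            return None  # no ack: the message is lost
        self.durable_prepared.discard(xid)
        self.committed.add(xid)
        return "ack"


def commit_with_retry(shard, xid, max_attempts, before_retry=None):
    """DTM side: resend commit until the shard acknowledges it."""
    for attempt in range(1, max_attempts + 1):
        if shard.commit(xid) == "ack":
            return attempt
        if before_retry:
            before_retry()  # hook used here to simulate the shard restarting
    raise TimeoutError("shard never acknowledged commit")


shard2 = RecoverableShard()
shard2.prepare(42)
shard2.crash()
attempts = commit_with_retry(shard2, 42, max_attempts=3,
                             before_retry=shard2.recover)
print(attempts, shard2.committed)
```

Because the prepared xid is durable, the retried commit succeeds after restart. Note that `commit` is written to be safe to repeat, which a retrying coordinator needs.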
2PC: DTM crashed in phase 1
● Phase 1
○ DTM sends prepare to shard 1 and shard 2
○ shards vote; DTM crashes
● DTM Recovery
○ DTM collects in doubt xacts (xid, state) from shard 1 and shard 2
2PC: DTM crashed after phase 1
● Phase 1
○ DTM sends prepare to shard 1 and shard 2
○ each shard votes yes
○ DTM state: all prepared; DTM crashes
● DTM Recovery
○ DTM sends commit to shard 1 and shard 2
○ each shard commits and sends ack
○ DTM state: done
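The two DTM crash cases differ only in whether "all prepared" was recorded before the crash. A possible recovery rule is sketched below. The function `recover_dtm` and its arguments are hypothetical, and the abort branch is an assumption (the usual presumed abort convention); the slides only show the commit branch and the collection of in doubt transactions.

```python
def recover_dtm(dtm_log, in_doubt_by_shard):
    """Decide the fate of each in doubt transaction after a DTM restart.

    dtm_log: set of xids for which "all prepared" was durably recorded.
    in_doubt_by_shard: {shard name: [(xid, state), ...]} as reported by shards.
    Returns {xid: {shard name: "commit" or "abort"}}.
    """
    actions = {}
    for shard, xacts in in_doubt_by_shard.items():
        for xid, _state in xacts:
            # Assumption: no "all prepared" record means the decision was
            # never made, so aborting is safe.
            verdict = "commit" if xid in dtm_log else "abort"
            actions.setdefault(xid, {})[shard] = verdict
    return actions


reports = {
    "shard 1": [(7, "prepared"), (8, "prepared")],
    "shard 2": [(7, "prepared"), (8, "prepared")],
}
plan = recover_dtm(dtm_log={7}, in_doubt_by_shard=reports)
print(plan)
```

Here xid 7 reached "all prepared" before the crash and is committed everywhere, while xid 8 did not and is aborted everywhere.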
2PC vs 1PC
● Prepare phase
○ 1 network round trip
○ 1 disk flush
● Commit phase
○ 1 network round trip
○ 1 disk flush
● 2PC guarantees A and D of ACID
Single node snapshot isolation
Tuple headers contain:
• xmin: transaction ID of inserting transaction
• xmax: transaction ID of replacing/deleting transaction (initially NULL)
Basic idea: tuple is visible if xmin is valid and xmax is not. "Valid" means "either committed or the current transaction".
“Snapshot” filters away active transactions
Rules ensuring that no transaction committing after the current transaction’s start is considered committed:
● Currently running transaction IDs are never considered valid, even if shown committed in pg_clog.
● A transaction ID higher than the current transaction's is not valid (future transaction).
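The visibility rule and the two snapshot rules can be combined into one check. The sketch below follows the simplified model on these slides, not PostgreSQL's actual visibility code: `Snapshot`, `xid_valid`, and `tuple_visible` are hypothetical names, and the `committed` set stands in for a pg_clog lookup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    current_xid: int
    in_progress: frozenset  # xids running when the snapshot was taken


def xid_valid(xid, snapshot, committed):
    """'Valid' = the current transaction, or committed as of the snapshot."""
    if xid == snapshot.current_xid:
        return True
    if xid in snapshot.in_progress:  # running at snapshot time
        return False
    if xid > snapshot.current_xid:   # future transaction
        return False
    return xid in committed          # stands in for a pg_clog lookup


def tuple_visible(xmin, xmax: Optional[int], snapshot, committed):
    """Visible if xmin is valid and xmax is not."""
    if not xid_valid(xmin, snapshot, committed):
        return False
    return xmax is None or not xid_valid(xmax, snapshot, committed)


snap = Snapshot(current_xid=100, in_progress=frozenset({90}))
committed = {80, 90, 110}
print(tuple_visible(80, None, snap, committed))   # inserted by a past commit
print(tuple_visible(90, None, snap, committed))   # inserter was still running
print(tuple_visible(110, None, snap, committed))  # inserter is in the future
print(tuple_visible(80, 100, snap, committed))    # deleted by ourselves
```

Transactions 90 and 110 are both marked committed, yet neither is treated as valid: the snapshot, not the commit log alone, decides what the reader sees.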
Challenge with isolation
Local xids:
shard1: A: 10, B: 15
shard2: B: 20, A: 25

A: begin;
B: begin;
A: insert into acc values(id=1, ...);
B: insert into acc values(id=3, ...);
B: insert into acc values(id=4, ...);
A: insert into acc values(id=2, ...);
B: commit;
A: select * from acc; 1, 2, 4

shard1: B is in future for A
shard2: B visible to A !!!
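The anomaly can be reproduced with the xids shown on the slide. The sketch below applies a deliberately simplified per-shard rule (own rows, or committed rows whose local xid is lower than the reader's); `visible_to` and the data layout are hypothetical, chosen only to mirror the slide.

```python
# Local xids from the slide: each shard numbers transactions independently.
local_xid = {
    "shard1": {"A": 10, "B": 15},
    "shard2": {"B": 20, "A": 25},
}
# (shard, inserting transaction, row id)
rows = [("shard1", "A", 1), ("shard1", "B", 3),
        ("shard2", "B", 4), ("shard2", "A", 2)]
committed = {"B"}  # B has committed; A is still running


def visible_to(reader, shard, writer):
    """Per-shard rule: own rows, or committed rows with a lower local xid."""
    if writer == reader:
        return True
    xids = local_xid[shard]
    return writer in committed and xids[writer] < xids[reader]


seen_by_a = sorted(rid for shard, writer, rid in rows
                   if visible_to("A", shard, writer))
print(seen_by_a)  # [1, 2, 4]
```

A sees B's row 4 on shard2 but not B's row 3 on shard1, so it observes half of transaction B.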
Global snapshot
● Global xid and global snapshot provided by DTM
● Gxmin, Gxmax, gInProgress [ ]
● tuples contain local xmin/xmax

if (!XidInSnapshot(GS, xid))
{
    XidInSnapshot(LS, xid)
}

[Diagram: a shard with global transactions T1G, T2G (global snapshot) and local transactions T1L, T2L, T3L (local snapshot)]
Global snapshot in action
GXID to local xid mapping:
shard1: GXID 1 (A) → 10, GXID 2 (B) → 15
shard2: GXID 1 (A) → 25, GXID 2 (B) → 20

A: begin; GXID: 1
B: begin; GXID: 2
A: insert into acc values(id=1, ...);
B: insert into acc values(id=3, ...);
B: insert into acc values(id=4, ...);
A: insert into acc values(id=2, ...);
B: commit;
A: select * from acc; 1, 2
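The same workload can be rerun with a global check placed in front of the local one, in the spirit of the `XidInSnapshot(GS, xid)` then `XidInSnapshot(LS, xid)` test. This is a sketch under assumptions: the DTM is assumed to have taken A's global snapshot while B was still in progress, so GXID 2 sits in the in-progress list; `xid_in_global_snapshot`, `visible_to_a`, and the dictionary layout are hypothetical.

```python
# GXID to local xid mapping per shard, as on the slide.
gxid_of = {"A": 1, "B": 2}
local_xid = {
    "shard1": {"A": 10, "B": 15},
    "shard2": {"A": 25, "B": 20},
}
rows = [("shard1", "A", 1), ("shard1", "B", 3),
        ("shard2", "B", 4), ("shard2", "A", 2)]
committed = {"B"}

# A's global snapshot, assumed taken while B was still in progress.
global_snapshot_a = {"g_in_progress": {gxid_of["B"]}}


def xid_in_global_snapshot(snapshot, gxid):
    """True if the global transaction was still running for this snapshot."""
    return gxid in snapshot["g_in_progress"]


def visible_to_a(shard, writer):
    if writer == "A":
        return True
    if xid_in_global_snapshot(global_snapshot_a, gxid_of[writer]):
        return False  # globally in progress: hidden on every shard
    # Otherwise fall back to the shard-local check.
    xids = local_xid[shard]
    return writer in committed and xids[writer] < xids["A"]


seen_by_a = sorted(rid for shard, writer, rid in rows
                   if visible_to_a(shard, writer))
print(seen_by_a)  # [1, 2]
```

The local xid ordering on shard2 no longer matters: B is filtered out by the global snapshot on both shards, so A sees only its own rows.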
Global snapshot with local transaction
GXID to local xid mapping:
shard1: GXID 1 (A) → 10, GXID 2 (B) → 15
shard2: GXID 1 (A) → 25, GXID 2 (B) → 20
L: local transaction on shard2

A: begin; GXID: 1
B: begin; GXID: 2
A: insert into acc values(id=1, ...);
B: insert into acc values(id=3, ...);
B: insert into acc values(id=4, ...);
A: insert into acc values(id=2, ...);
L: select * from acc; /* 0 rows */
B: commit;
L: select * from acc; 4
Implementation options
● Model
○ Pull
■ DTM as a service
■ participants join a transaction
■ transaction can be initiated by any participant
○ Push
■ DTM initiates transaction and decides participants
■ MPP databases
Implementation options
● Transactions
○ Global and local
■ global and local transaction IDs, snapshots
■ mapping between global and local transaction IDs
○ Global only
■ only one xid and snapshot across the cluster
ACID distributed system
● Two phase commit ⇒ A and D
● Global snapshot ⇒ I
● Global lock manager ⇒ C

?? Spot the Problem ??
A: begin;
A: update acc set ... where id = 1
B: begin;
B: update acc set ... where id = 2
B: update acc set ... where id = 1
A: update acc set ... where id = 2
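One reading of the puzzle, not stated on the slide, is that the interleaving deadlocks across shards: A holds id = 1 and waits for id = 2, while B holds id = 2 and waits for id = 1. The sketch below assumes id = 1 lives on shard 1 and id = 2 on shard 2, and uses a hypothetical `find_cycle` helper over a wait-for graph in which each transaction waits on at most one other.

```python
def find_cycle(waits_for):
    """Return one cycle in a wait-for graph, or None.

    waits_for maps each transaction to the transaction it is blocked on.
    """
    for start in waits_for:
        path = [start]
        node = waits_for.get(start)
        while node is not None and node not in path:
            path.append(node)
            node = waits_for.get(node)
        if node is not None:
            return path[path.index(node):]
    return None


# Assumption: id = 1 lives on shard 1 and id = 2 on shard 2.
shard1_waits = {"B": "A"}  # B wants id = 1, locked by A
shard2_waits = {"A": "B"}  # A wants id = 2, locked by B

# Each shard alone sees a simple wait, not a cycle.
print(find_cycle(shard1_waits), find_cycle(shard2_waits))

# Only the merged, cluster-wide graph exposes the deadlock.
global_waits = {**shard1_waits, **shard2_waits}
cycle = find_cycle(global_waits)
print(cycle)
```

Under this reading, neither shard's local view contains the cycle, which is why lock information has to be combined globally to detect it.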