PostgreSQL + Kafka: The Delight of Change Data Capture
Jeff Klukas - Data Engineer at Simple
Overview
Commit logs: what are they?
Write-ahead logging (WAL)
Commit logs as a data store
Demo: change data capture
Use cases
Commit Logs
https://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/
Ordered, Immutable, Durable
In practice, old logs can be deleted or archived
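The properties above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how Kafka or PostgreSQL implement their logs: the class name `CommitLog` and its methods are hypothetical, and durability (flushing to disk, replication) is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class CommitLog:
    """Append-only log; each record gets the next sequential offset."""

    base_offset: int = 0  # offset of the oldest record still retained
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        # Ordered and immutable: records are only ever added at the end.
        self.records.append(record)
        return self.base_offset + len(self.records) - 1

    def read(self, offset: int) -> Any:
        if offset < self.base_offset:
            raise KeyError(f"offset {offset} was already truncated")
        return self.records[offset - self.base_offset]

    def truncate_before(self, offset: int) -> None:
        # Retention: old records may be dropped, but the remaining
        # records keep their original offsets.
        drop = max(0, min(offset - self.base_offset, len(self.records)))
        del self.records[:drop]
        self.base_offset += drop
```

Readers track their position as an offset, so deleting or archiving old records does not disturb anyone who has already read past them.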
Write-Ahead Logging (WAL)
“WAL's central concept is that changes to data files (where tables and indexes reside) must be written only after those changes have been logged, that is, after log records describing the changes have been flushed to permanent storage”
– https://www.postgresql.org/docs/current/static/wal-intro.html
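The log-before-data rule from the quote can be illustrated with a toy key-value store. This is only a sketch under simplifying assumptions: `TinyStore` is a hypothetical name, the "data file" is an in-memory dict, and PostgreSQL's actual WAL format and recovery are far more involved.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store that logs every change before applying it."""

    def __init__(self, wal_path: str) -> None:
        self.wal_path = wal_path
        self.data = {}
        self._replay()  # recover state from the log on startup
        self.wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                self.data[rec["key"]] = rec["value"]

    def put(self, key: str, value) -> None:
        # 1. Describe the change in the log and force it to disk.
        self.wal.write(json.dumps({"key": key, "value": value}) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())
        # 2. Only then apply the change to the "data file" (a dict here).
        self.data[key] = value

    def close(self) -> None:
        self.wal.close()


# Write two changes, then simulate a restart by reopening the same log.
path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = TinyStore(path)
store.put("amount", 20.0)
store.put("amount", 25.0)
store.close()

recovered = TinyStore(path)
recovered.close()
```

Because every change reaches the log first, the state after a restart can be rebuilt by replaying the log alone.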
“Logical decoding is the process of extracting all persistent changes to a database's tables into a coherent, easy to understand format which can be interpreted without detailed knowledge of the database's internal state.”
– https://www.postgresql.org/docs/9.4/static/logicaldecoding-explanation.html
Topic Partitions
Topics
Compacted Topics
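In a compacted topic, Kafka retains at least the latest message for each key, and a message with a null value marks that key as deleted. The sketch below illustrates the idea on a plain Python list. The function names are hypothetical, and unlike Kafka (which compacts in the background and eventually drops delete markers) it keeps tombstones forever.

```python
from typing import Dict, Hashable, List, Optional, Tuple

# (key, value); a value of None is a delete marker ("tombstone").
Message = Tuple[Hashable, Optional[dict]]


def compact(log: List[Message]) -> List[Message]:
    """Keep only the most recent message for each key, preserving order."""
    last_index: Dict[Hashable, int] = {}
    for i, (key, _value) in enumerate(log):
        last_index[key] = i
    return [msg for i, msg in enumerate(log) if last_index[msg[0]] == i]


def materialize(log: List[Message]) -> Dict[Hashable, dict]:
    """Replay a log into a key -> latest value table, applying tombstones."""
    table: Dict[Hashable, dict] = {}
    for key, value in log:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table
```

Replaying the compacted log yields the same table as replaying the full log, which is what lets a compacted topic serve as a snapshot of current state.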
https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/
INSERT INTO transactions VALUES (56789, 20.00);
Bottled Water - Message Key
{ "transaction_id": { "int": 56789 } }
Bottled Water - Message Value
{ "transaction_id": {"int": 56789}, "amount": {"double": 20.00} }
UPDATE transactions SET amount = 25.00 WHERE transaction_id = 56789;
Bottled Water - Message Key
{ "transaction_id": { "int": 56789 } }
Bottled Water - Message Value
{ "transaction_id": {"int": 56789}, "amount": {"double": 25.00} }
DELETE FROM transactions WHERE transaction_id = 56789;
Bottled Water - Message Key
{ "transaction_id": { "int": 56789 } }
Bottled Water - Message Value
null
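The three slides above follow one pattern: the key holds the primary key columns, the value holds the full row, and a delete produces a null value. The Python below sketches that mapping as JSON text for readability. It is not Bottled Water's code (which emits Avro-encoded messages), and `COLUMN_TYPES`, `PRIMARY_KEY`, `encode`, and `to_message` are hypothetical names chosen for this example.

```python
import json
from typing import Iterable, Optional, Tuple

# Hypothetical column -> type mapping for the transactions table.
COLUMN_TYPES = {"transaction_id": "int", "amount": "double"}
PRIMARY_KEY = ("transaction_id",)


def encode(row: dict, columns: Iterable[str]) -> str:
    # Wrap each value as {"<type>": value}, matching the slide examples.
    return json.dumps({c: {COLUMN_TYPES[c]: row[c]} for c in columns})


def to_message(operation: str, row: dict) -> Tuple[str, Optional[str]]:
    """Return (key, value) for one row change; deletes get a null value."""
    key = encode(row, PRIMARY_KEY)
    if operation == "delete":
        return key, None
    return key, encode(row, COLUMN_TYPES)
```

Because inserts, updates, and deletes for a row all share the same key, a compacted topic ends up holding at most the latest version of each row.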
Use Cases
tx-service
tx-postgres
tx-pgkafka
demux-service
activity-service
activity-postgres
activity-pgkafka
Amazon Redshift (Data Warehouse)
Amazon S3 (Data Lake)
analytics-service
Kafka topic: tx-pgkafka
Kafka topic: customers-table
Kafka topic: transactions-table
Kafka topic: activity-pgkafka
Change Data Capture
Messaging
Analytics
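The demux-service in the diagram reads the combined tx-pgkafka topic and writes to per-table topics such as customers-table and transactions-table. A rough sketch of that routing step follows. It assumes each message carries the name of its source table in a `table` field, which the slides do not show, and it uses plain lists in place of Kafka topics.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demux(messages: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group change messages from one combined stream into per-table topics."""
    topics: Dict[str, List[dict]] = defaultdict(list)
    for msg in messages:
        # Assumption: every message names its source table in a "table" field.
        topics[f"{msg['table']}-table"].append(msg)
    return dict(topics)
```

Messages keep their relative order within each output topic, so consumers of a single table still see that table's changes in commit order.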
Recap
Commit logs: what are they?
Write-ahead logging (WAL)
Commit logs as a data store
Demo: change data capture
Use cases
Also See…
• Blog post on Simple’s CDC pipeline
• https://www.simple.com/engineering
• Bottled Water: https://github.com/confluentinc/bottledwater-pg
• Debezium (CDC to Kafka from Postgres, MySQL, or MongoDB)
• http://debezium.io/
• https://wecode.wepay.com/posts/streaming-databases-in-realtime-with-mysql-debezium-kafka
• https://www.confluent.io/kafka-summit-sf17/
• Martin Kleppmann, Making Sense of Stream Processing eBook
Thank You
Extras
The Dual Write Problem
https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/
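The dual-write problem arises when an application writes each change to two systems itself, because nothing guarantees that both systems see concurrent changes in the same order. The toy example below shows two clients whose writes arrive in opposite orders at the two destinations. The variable names and write strings are illustrative only.

```python
def apply_in_order(writes):
    """Return the final value after applying writes in the given order."""
    value = None
    for w in writes:
        value = w
    return value


# Two clients update the same record; each sends its write to both systems.
# The two systems happen to receive the writes in opposite orders.
postgres_order = ["client-1: amount=20", "client-2: amount=25"]
kafka_order = ["client-2: amount=25", "client-1: amount=20"]

postgres_value = apply_in_order(postgres_order)
kafka_value = apply_in_order(kafka_order)  # now permanently disagrees

# With change data capture, the second system is fed from the first
# system's commit log, so it applies the same writes in the same order.
cdc_value = apply_in_order(postgres_order)
```

Here the two destinations end with different final values, while the log-derived copy matches the database.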
Redshift Architecture
Amazon Redshift
Replicating to Redshift
Table Schema
-- One row per captured change; the pg_* columns carry change metadata.
CREATE TABLE pgkafka_txservice_transactions (
  pg_lsn              NUMERIC(20,0) ENCODE raw,
  pg_txn_id           BIGINT        ENCODE lzo,
  pg_operation        CHAR(6)       ENCODE bytedict,
  pg_txn_timestamp    TIMESTAMP     ENCODE lzo,
  ingestion_timestamp TIMESTAMP     ENCODE lzo,
  transaction_id      INT           ENCODE lzo,
  amount              NUMERIC(18,2) ENCODE lzo
)
DISTKEY (transaction_id)
SORTKEY (transaction_id, pg_lsn, pg_operation);
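Deduplication
CREATE TABLE deduped (LIKE pgkafka_txservice_transactions);

-- Keep one row per pg_lsn: the most recently ingested copy.
-- Columns are listed explicitly so the helper column rn is not inserted.
INSERT INTO deduped
SELECT pg_lsn, pg_txn_id, pg_operation, pg_txn_timestamp,
       ingestion_timestamp, transaction_id, amount
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY pg_lsn
                            ORDER BY ingestion_timestamp DESC) AS rn
  FROM pgkafka_txservice_transactions
) AS ranked
WHERE rn = 1;

DROP TABLE pgkafka_txservice_transactions;
ALTER TABLE deduped RENAME TO pgkafka_txservice_transactions;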
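The same deduplication rule can be stated in plain Python for anyone who wants to check the logic without a Redshift cluster. This is only an illustration of the rule (one row per `pg_lsn`, keeping the latest `ingestion_timestamp`); the function name `dedupe` is hypothetical, and the reason duplicates appear, presumably the same change being delivered more than once, is an assumption the slides do not spell out.

```python
from typing import Dict, Iterable, List


def dedupe(rows: Iterable[dict]) -> List[dict]:
    """Keep one row per pg_lsn: the one with the latest ingestion_timestamp."""
    newest: Dict[int, dict] = {}
    for row in rows:
        kept = newest.get(row["pg_lsn"])
        if kept is None or row["ingestion_timestamp"] > kept["ingestion_timestamp"]:
            newest[row["pg_lsn"]] = row
    return sorted(newest.values(), key=lambda r: r["pg_lsn"])
```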
View of Current State
-- For each transaction_id, keep only its last change (n = c),
-- and hide the row entirely if that last change was a delete.
CREATE VIEW current_txservice_transactions AS
SELECT transaction_id, amount
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY transaction_id
                            ORDER BY pg_lsn, pg_operation) AS n,
         COUNT(*) OVER (PARTITION BY transaction_id
                        ROWS BETWEEN UNBOUNDED PRECEDING
                                 AND UNBOUNDED FOLLOWING) AS c
  FROM pgkafka_txservice_transactions
) AS ranked
WHERE n = c AND pg_operation <> 'delete';
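The view's logic, latest change per `transaction_id` with deleted rows hidden, can be sketched in Python as follows. The function name `current_state` is hypothetical, and the input is assumed to be a list of dicts with the same columns as the change table.

```python
from typing import Dict, Iterable


def current_state(rows: Iterable[dict]) -> Dict[int, float]:
    """Map each live transaction_id to the amount from its last change."""
    latest: Dict[int, dict] = {}
    # Same ordering as the view: by pg_lsn, then pg_operation.
    for row in sorted(rows, key=lambda r: (r["pg_lsn"], r["pg_operation"])):
        latest[row["transaction_id"]] = row
    return {
        tid: row["amount"]
        for tid, row in latest.items()
        if row["pg_operation"] != "delete"
    }
```

A transaction that was inserted, updated, and never deleted shows its updated amount; one whose last change is a delete does not appear at all.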