Transcript
Page 1: PostgreSQL + Kafka: The Delight of Change Data Capture

PostgreSQL + Kafka: The Delight of Change Data Capture

Jeff Klukas - Data Engineer at Simple

1

Page 2: PostgreSQL + Kafka: The Delight of Change Data Capture

2

Overview

Commit logs: what are they?

Write-ahead logging (WAL)

Commit logs as a data store

Demo: change data capture

Use cases

Page 3: PostgreSQL + Kafka: The Delight of Change Data Capture

3

https://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/

Commit Logs

Page 4: PostgreSQL + Kafka: The Delight of Change Data Capture

4

Commit Logs

• Ordered

• Immutable

• Durable

Page 5: PostgreSQL + Kafka: The Delight of Change Data Capture

5

Commit Logs

• Ordered

• Immutable

• Durable

In practice, old logs can be deleted or archived
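The properties on this slide can be sketched in a few lines of Python. This is a toy, in-memory illustration only: the `CommitLog` class and its method names are hypothetical, and durability (writing to disk) is deliberately left out.

```python
from typing import Any, List, Tuple


class CommitLog:
    """Toy append-only log (hypothetical names, in-memory only)."""

    def __init__(self) -> None:
        self._records: List[Any] = []
        self._base_offset = 0  # offset of the oldest record still retained

    def append(self, record: Any) -> int:
        """Ordered: each record gets the next offset. Existing records are never modified."""
        self._records.append(record)
        return self._base_offset + len(self._records) - 1

    def read_from(self, offset: int) -> List[Tuple[int, Any]]:
        """Return (offset, record) pairs from `offset` onward, in log order."""
        start = max(offset, self._base_offset) - self._base_offset
        return [
            (self._base_offset + start + i, record)
            for i, record in enumerate(self._records[start:])
        ]

    def truncate_before(self, offset: int) -> None:
        """Drop records older than `offset`, as when old log segments are deleted or archived."""
        drop = max(0, min(offset - self._base_offset, len(self._records)))
        del self._records[:drop]
        self._base_offset += drop


log = CommitLog()
offsets = [log.append(r) for r in ("a", "b", "c")]
log.truncate_before(1)          # the oldest record is gone
remaining = log.read_from(0)    # offsets of surviving records are unchanged
```

Note that truncation removes whole records from the front of the log; it never rewrites a record in place, so the offsets that consumers hold remain valid.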

Page 6: PostgreSQL + Kafka: The Delight of Change Data Capture

6

Write-Ahead Logging (WAL)

Page 7: PostgreSQL + Kafka: The Delight of Change Data Capture

7

“WAL's central concept is that changes to data files (where tables and indexes reside) must be written only after those changes have been logged, that is, after log records describing the changes have been flushed to permanent storage.”

– https://www.postgresql.org/docs/current/static/wal-intro.html
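The "log first, then apply" rule can be illustrated with a toy key-value store. This is a minimal sketch of the idea, not how PostgreSQL implements WAL: the `TinyStore` class, its JSON-lines log format, and the file name are all assumptions made for the example.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store with a write-ahead log (hypothetical, for illustration)."""

    def __init__(self, wal_path: str) -> None:
        self.wal_path = wal_path
        self.data = {}  # stands in for the data files
        self._replay()

    def _replay(self) -> None:
        """Rebuild state by replaying the log, as crash recovery would."""
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, "r", encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def put(self, key: str, value) -> None:
        # 1. Describe the change in the log and flush it to permanent storage.
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. Only then apply the change to the data itself.
        self.data[key] = value


wal_path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = TinyStore(wal_path)
store.put("amount", 20.0)
store.put("amount", 25.0)

# Simulate a restart: a fresh instance recovers its state from the log alone.
recovered = TinyStore(wal_path)
```

Because every change reaches the log before it reaches the data, the log by itself is enough to reconstruct the data after a crash.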

Page 8: PostgreSQL + Kafka: The Delight of Change Data Capture

8

“Logical decoding is the process of extracting all persistent changes to a database's tables into a coherent, easy to understand format which can be interpreted without detailed knowledge of the database's internal state.”

– https://www.postgresql.org/docs/9.4/static/logicaldecoding-explanation.html

Page 9: PostgreSQL + Kafka: The Delight of Change Data Capture

9

Page 10: PostgreSQL + Kafka: The Delight of Change Data Capture

10

Topic Partitions

Page 11: PostgreSQL + Kafka: The Delight of Change Data Capture

11

Topics

Page 12: PostgreSQL + Kafka: The Delight of Change Data Capture

12

Compacted Topics

Page 13: PostgreSQL + Kafka: The Delight of Change Data Capture

13

https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/

Page 14: PostgreSQL + Kafka: The Delight of Change Data Capture

14

INSERT INTO transactions VALUES (56789, 20.00);

Bottled Water - Message Key

{ "transaction_id": { "int": 56789 } }

Bottled Water - Message Value

{ "transaction_id": {"int": 56789}, "amount": {"double": 20.00} }

Page 15: PostgreSQL + Kafka: The Delight of Change Data Capture

15

UPDATE transactions SET amount = 25.00 WHERE transaction_id = 56789;

Bottled Water - Message Key

{ "transaction_id": { "int": 56789 } }

Bottled Water - Message Value

{ "transaction_id": {"int": 56789}, "amount": {"double": 25.00} }

Page 16: PostgreSQL + Kafka: The Delight of Change Data Capture

16

DELETE FROM transactions WHERE transaction_id = 56789;

Bottled Water - Message Key

{ "transaction_id": { "int": 56789 } }

Bottled Water - Message Value

null
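The three messages above share one key, which is what makes them a fit for a compacted topic: only the latest value per key needs to survive, and a null value marks the key as deleted. A rough sketch of that rule, assuming messages are plain (key, value) pairs of JSON strings; the `compact` function is hypothetical and much simpler than Kafka's actual log cleaner, which among other things retains delete markers for a configurable period before removing them.

```python
from typing import Dict, List, Optional, Tuple

# (key JSON, value JSON); a value of None represents the null written on DELETE
Message = Tuple[str, Optional[str]]


def compact(messages: List[Message]) -> List[Message]:
    """Keep only the latest message per key; drop keys whose latest value is None."""
    latest: Dict[str, Optional[str]] = {}
    for key, value in messages:
        latest.pop(key, None)  # re-insert so the result is ordered by latest write
        latest[key] = value
    return [(k, v) for k, v in latest.items() if v is not None]


key = '{"transaction_id": {"int": 56789}}'
messages: List[Message] = [
    (key, '{"transaction_id": {"int": 56789}, "amount": {"double": 20.00}}'),  # INSERT
    (key, '{"transaction_id": {"int": 56789}, "amount": {"double": 25.00}}'),  # UPDATE
    (key, None),                                                               # DELETE
]

after_update = compact(messages[:2])  # only the 25.00 row remains
after_delete = compact(messages)      # the key disappears entirely
```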

Page 17: PostgreSQL + Kafka: The Delight of Change Data Capture

17

tx-service

tx-postgres

Use Cases

Page 18: PostgreSQL + Kafka: The Delight of Change Data Capture

18

tx-service

tx-postgres

tx-pgkafka

Kafka topic: tx-pgkafka

Page 19: PostgreSQL + Kafka: The Delight of Change Data Capture

19

tx-service

tx-postgres

tx-pgkafka

demux-service

Kafka topic: tx-pgkafka

Page 20: PostgreSQL + Kafka: The Delight of Change Data Capture

20

tx-service

tx-postgres

tx-pgkafka

demux-service

Kafka topic: tx-pgkafka

Kafka topic: customers-table

Kafka topic: transactions-table
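The slides do not show how demux-service splits the combined stream, so the following is only a guess at the shape of the logic. It assumes each message on tx-pgkafka carries a `table` field (a hypothetical name) and models topics as in-memory lists rather than using a Kafka client.

```python
from collections import defaultdict
from typing import Dict, List


def demux(messages: List[dict]) -> Dict[str, List[dict]]:
    """Route each change to a per-table topic named '<table>-table', preserving order."""
    topics: Dict[str, List[dict]] = defaultdict(list)
    for message in messages:
        # 'table' is an assumed field identifying the source table of the change
        topics[f"{message['table']}-table"].append(message)
    return dict(topics)


tx_pgkafka = [
    {"table": "customers", "customer_id": 1},
    {"table": "transactions", "transaction_id": 56789, "amount": 20.00},
    {"table": "transactions", "transaction_id": 56789, "amount": 25.00},
]
routed = demux(tx_pgkafka)
```

Within each output topic, changes keep the order they had in the source topic, which matters for any consumer that replays them.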

Page 21: PostgreSQL + Kafka: The Delight of Change Data Capture

21

tx-service

tx-postgres

tx-pgkafka

demux-service

activity-service

activity-postgres

activity-pgkafka

Kafka topic: tx-pgkafka

Kafka topic: customers-table

Kafka topic: transactions-table

Kafka topic: activity-pgkafka

Page 22: PostgreSQL + Kafka: The Delight of Change Data Capture

22

tx-service

tx-postgres

tx-pgkafka

demux-service

activity-service

activity-postgres

activity-pgkafka

Amazon Redshift (Data Warehouse)

Amazon S3 (Data Lake)

analytics-service

Kafka topic: tx-pgkafka

Kafka topic: customers-table

Kafka topic: transactions-table

Kafka topic: activity-pgkafka

Page 23: PostgreSQL + Kafka: The Delight of Change Data Capture

23

tx-service

tx-postgres

tx-pgkafka

demux-service

activity-service

activity-postgres

activity-pgkafka

Amazon Redshift (Data Warehouse)

Amazon S3 (Data Lake)

analytics-service

Kafka topic: tx-pgkafka

Kafka topic: customers-table

Kafka topic: transactions-table

Kafka topic: activity-pgkafka

Change Data Capture

Page 24: PostgreSQL + Kafka: The Delight of Change Data Capture

24

tx-service

tx-postgres

tx-pgkafka

demux-service

activity-service

activity-postgres

activity-pgkafka

Amazon Redshift (Data Warehouse)

Amazon S3 (Data Lake)

analytics-service

Kafka topic: tx-pgkafka

Kafka topic: customers-table

Kafka topic: transactions-table

Kafka topic: activity-pgkafka

Messaging

Page 25: PostgreSQL + Kafka: The Delight of Change Data Capture

25

tx-service

tx-postgres

tx-pgkafka

demux-service

activity-service

activity-postgres

activity-pgkafka

Amazon Redshift (Data Warehouse)

Amazon S3 (Data Lake)

analytics-service

Kafka topic: tx-pgkafka

Kafka topic: customers-table

Kafka topic: transactions-table

Kafka topic: activity-pgkafka

Analytics

Page 26: PostgreSQL + Kafka: The Delight of Change Data Capture

26

Recap

Commit logs: what are they?

Write-ahead logging (WAL)

Commit logs as a data store

Demo: change data capture

Use cases

Page 27: PostgreSQL + Kafka: The Delight of Change Data Capture

27

Also See…

• Blog post on Simple’s CDC pipeline

• https://www.simple.com/engineering

• Bottled Water: https://github.com/confluentinc/bottledwater-pg

• Debezium (CDC to Kafka from Postgres, MySQL, or MongoDB)

• http://debezium.io/

• https://wecode.wepay.com/posts/streaming-databases-in-realtime-with-mysql-debezium-kafka

• https://www.confluent.io/kafka-summit-sf17/

• Martin Kleppmann, Making Sense of Stream Processing eBook

Page 28: PostgreSQL + Kafka: The Delight of Change Data Capture

Thank You

28

Page 29: PostgreSQL + Kafka: The Delight of Change Data Capture

Extras

29

Page 30: PostgreSQL + Kafka: The Delight of Change Data Capture

30

The Dual Write Problem

https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/

Page 31: PostgreSQL + Kafka: The Delight of Change Data Capture

31

Redshift Architecture

Amazon Redshift

Page 32: PostgreSQL + Kafka: The Delight of Change Data Capture

Replicating to Redshift

32

Page 33: PostgreSQL + Kafka: The Delight of Change Data Capture

33

Table Schema

CREATE TABLE pgkafka_txservice_transactions (
  -- Metadata about the change, taken from PostgreSQL
  pg_lsn NUMERIC(20,0) ENCODE raw,
  pg_txn_id BIGINT ENCODE lzo,
  pg_operation CHAR(6) ENCODE bytedict,
  pg_txn_timestamp TIMESTAMP ENCODE lzo,
  ingestion_timestamp TIMESTAMP ENCODE lzo,
  -- Columns of the source table
  transaction_id INT ENCODE lzo,
  amount NUMERIC(18,2) ENCODE lzo
)
DISTKEY (transaction_id)
SORTKEY (transaction_id, pg_lsn, pg_operation);

Amazon Redshift

Page 34: PostgreSQL + Kafka: The Delight of Change Data Capture

34

Deduplication

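-- Create an empty table with the same column definitions
CREATE TABLE deduped (LIKE pgkafka_txservice_transactions);

-- Keep one row per pg_lsn, preferring the most recently ingested copy
INSERT INTO deduped
SELECT pg_lsn, pg_txn_id, pg_operation, pg_txn_timestamp, ingestion_timestamp, transaction_id, amount
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY pg_lsn ORDER BY ingestion_timestamp DESC) AS n
  FROM pgkafka_txservice_transactions
) AS ranked
WHERE n = 1;

-- Swap the deduplicated table into place
DROP TABLE pgkafka_txservice_transactions;

ALTER TABLE deduped RENAME TO pgkafka_txservice_transactions;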

Amazon Redshift
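The same deduplication rule, one row per `pg_lsn` with the most recently ingested copy winning, can be sketched outside the warehouse. This is an illustrative Python equivalent, assuming rows are dicts with the column names from the table schema and that timestamps are ISO-8601 strings (so string comparison matches time order).

```python
from typing import Dict, List


def dedupe(rows: List[dict]) -> List[dict]:
    """Keep one row per pg_lsn, preferring the latest ingestion_timestamp."""
    best: Dict[int, dict] = {}
    for row in rows:
        kept = best.get(row["pg_lsn"])
        if kept is None or row["ingestion_timestamp"] > kept["ingestion_timestamp"]:
            best[row["pg_lsn"]] = row
    return sorted(best.values(), key=lambda r: r["pg_lsn"])


rows = [
    {"pg_lsn": 100, "ingestion_timestamp": "2017-05-01T10:00:00", "amount": 20.00},
    {"pg_lsn": 100, "ingestion_timestamp": "2017-05-01T10:05:00", "amount": 20.00},  # redelivered
    {"pg_lsn": 200, "ingestion_timestamp": "2017-05-01T10:01:00", "amount": 25.00},
]
deduped = dedupe(rows)
```

Keying on `pg_lsn` works because a redelivered message describes the same change at the same log position, so duplicates share that value.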

Page 35: PostgreSQL + Kafka: The Delight of Change Data Capture

35

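View of Current State

CREATE VIEW current_txservice_transactions AS
SELECT transaction_id, amount
FROM (
  SELECT *,
    -- n: position of each change within its transaction_id, oldest first
    ROW_NUMBER() OVER (PARTITION BY transaction_id ORDER BY pg_lsn, pg_operation) AS n,
    -- c: total number of changes recorded for that transaction_id
    COUNT(*) OVER (PARTITION BY transaction_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS c
  FROM pgkafka_txservice_transactions
) AS ranked
-- n = c selects the latest change; rows whose latest change is a delete are hidden
WHERE n = c AND pg_operation <> 'delete';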

Amazon Redshift
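The logic of the view can be traced by hand with a small Python equivalent: take the last change per `transaction_id` in `(pg_lsn, pg_operation)` order and hide ids whose last change is a delete. This is a sketch that assumes rows are dicts with the view's column names and that `pg_operation` takes the values `'insert'`, `'update'`, and `'delete'` (only `'delete'` appears in the slides).

```python
from typing import Dict, List


def current_state(rows: List[dict]) -> Dict[int, float]:
    """Return {transaction_id: amount} for ids whose latest change is not a delete."""
    last: Dict[int, dict] = {}
    ordered = sorted(
        rows,
        key=lambda r: (r["transaction_id"], r["pg_lsn"], r["pg_operation"]),
    )
    for row in ordered:
        last[row["transaction_id"]] = row  # later changes overwrite earlier ones
    return {
        tid: row["amount"]
        for tid, row in last.items()
        if row["pg_operation"] != "delete"
    }


changes = [
    {"transaction_id": 56789, "pg_lsn": 100, "pg_operation": "insert", "amount": 20.00},
    {"transaction_id": 56789, "pg_lsn": 200, "pg_operation": "update", "amount": 25.00},
    {"transaction_id": 12345, "pg_lsn": 150, "pg_operation": "insert", "amount": 5.00},
]
before_delete = current_state(changes)

changes.append(
    {"transaction_id": 56789, "pg_lsn": 300, "pg_operation": "delete", "amount": None}
)
after_delete = current_state(changes)
```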

