rtfm - fosdem 2020 · real time replica from mysql with pg chameleon replay chunk 100k rows...

43
RTFM Federico Campoli FOSDEM PGDay, Brussels, 02 February 2020 Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 1 / 43

Upload: others

Post on 05-Aug-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

RTFM

Federico Campoli

FOSDEM PGDay, Brussels, 02 February 2020

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 1 / 43

Page 2: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Few words about the speaker

Born in 1972

Passionate about IT since 1982

Joined the Oracle DBA secret society in 2004

In love with PostgreSQL since 2006

PostgreSQL tattoo on the right shoulder

Freelance devops and data engineer

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 2 / 43

Page 3: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Getting in touch

Blog: https://pgdba.org

Twitter: @4thdoctor scarf

Github: https://github.com/the4thdoctor

Linkedin: https://www.linkedin.com/in/federicocampoli/

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 3 / 43

Page 4: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

RTFM

Acronym for Read The F... ManualWhat the F stands for?

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 4 / 43

Page 5: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

RTFM

Fantastic?

Fabulous?

Funny?

Fancy?

Well...

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 5 / 43

Page 6: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

RTFM

F******

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 6 / 43

Page 7: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Defcon levels

Defcon 5: Startling noise, DBA vaguely impressed

Defcon 4: Tripping over feet, DBA alarmed

Defcon 3: Earthquake, DBA jumping on the seat

Defcon 2: Asteroid dropping from the sky, DBA freaking out

Defcon 1: DALEKS!, DBA going berserk

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 7 / 43

Page 8: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Dramatis personae

Who didn’t RTFM?

Myself

Image source wargames Wikia https://war-games.fandom.com/

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 8 / 43

Page 9: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Dramatis personae

Who didn’t RTFM?

Others

Image source wargames Wikia https://war-games.fandom.com/

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 9 / 43

Page 10: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Table of contents

1 The Butler Did It

2 Emergency shutdown

3 Cast a spell

4 Wrap up

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 10 / 43

Page 11: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

The Butler Did It

Defcon 4

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 11 / 43

Page 12: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

The Butler Did It

Context

System with one expensive FusionIO card

Working queue organised on a single table

Table with about 100 million rows

Two timestamp fields

Updated twice at the start and the end of the processing

For each row

Which had an average width of 160 bytes

On a table with three indices

Plus the primary key...

Spanning on two integer fields

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 12 / 43

Page 13: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

The Butler Did It

On a SSD writes are limited

The design caused 1.5 GB/s block writes on the pg xlog(on rotating disk thanks goodness)

The data files were hit even harder

The table was completely rewritten every day

Autovacuum starting every 6 hours flushing even more data to disk

In just 8 months the available writes dropped from 80% to 44%

At that rate only 8 months left before the doomsday

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 13 / 43

Page 14: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

The Butler Did It

DISK IN READ ONLY MODE

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 14 / 43

Page 15: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

The Butler Did It

How it was fixed:

The primary key was aggregated in a separate table

The first field had a common value shared with the second field

The first field was used as grouping key

The second field was aggregated into an integer[]

The queue was implemented using the physical position inside the array

After the deploy the WAL generation rate dropped to 40 MB/s

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 15 / 43

Page 16: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

The Butler Did It

Figure: The doomsday countdown

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 16 / 43

Page 17: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

The Butler Did It

RTFM

MVCC Unmasked (slides): http://momjian.us/main/writings/pgsql/mvcc.pdf

MVCC Unmasked (video): https://www.youtube.com/watch?v=sq aO34SWZc

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 17 / 43

Page 18: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Emergency shutdown

Defcon 1

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 18 / 43

Page 19: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Emergency shutdown

Context

Virtual machine for business intelligence

Fairly big database about 1.4 TB

Real time replica from MySQL with pg chameleon

Replay chunk 100k rows

pl/pgSQL function to replay data and manage errors

Monitoring yet to be implemented

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 19 / 43

Page 20: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Emergency shutdown

The day started normally

Suddenly people got errors on their queries

The database was up as usual

However the nightly maintenance failed!

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 20 / 43

Page 21: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Emergency shutdown

ERROR: database is not accepting commands

to avoid wraparound data loss in database "analytics"

HINT: Stop the postmaster and vacuum

that database in single -user mode.

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 21 / 43

Page 22: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Emergency shutdown

Caused by

Insufficient autovacuum processes

An (apparently) undocumented behaviour of the pl/pgSQL functions

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 22 / 43

Page 23: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Emergency shutdown

A PostgreSQL function is single transaction

Normally consumes just one XID

However DML in blocks with EXCEPTION consume an extra XID every time the DML isexecuted

The pl/pgSQL function in pg chameleon

Replays the DML in a FOR LOOP with an EXCEPTION

Consuming an XID for each statement replayed

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 23 / 43

Page 24: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Emergency shutdown

Consider a table with one field, integer.

CREATE TABLE t_test

(

id integer

);

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 24 / 43

Page 25: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Emergency shutdown

A function inserts into the table with a FOR LOOP.

CREATE OR REPLACE FUNCTION fn_loop_noexception ()

RETURNS VOID AS

$BODY$

DECLARE

v_loop integer;

BEGIN

FOR v_loop IN 1..1000

LOOP

INSERT INTO t_test(id) VALUES (v_loop);

END LOOP;

END;

$BODY$

LANGUAGE plpgsql;

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 25 / 43

Page 26: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Emergency shutdown

Another function does the same but with an EXCEPTION block.

CREATE OR REPLACE FUNCTION fn_loop_withexception ()

RETURNS VOID AS

$BODY$

DECLARE

v_loop integer;

BEGIN

FOR v_loop IN 1..1000

LOOP

BEGIN

INSERT INTO t_test(id) VALUES (v_loop);

EXCEPTION

WHEN OTHERS

THEN

NULL;

END;

END LOOP;

END;

$BODY$

LANGUAGE plpgsql;

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 26 / 43

Page 27: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Emergency shutdown

Function without the exception block

test=# SELECT datname ,age(datfrozenxid) FROM pg_database WHERE datname=’test’;

datname | age

---------+-----

test | 3

(1 row)

test=# SELECT fn_loop_noexception ();

fn_loop_noexception

---------------------

(1 row)

test=# SELECT datname ,age(datfrozenxid) FROM pg_database WHERE datname=’test’;

datname | age

---------+-----

test | 4

(1 row)

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 27 / 43

Page 28: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Emergency shutdown

Function with the exception block

test=# SELECT datname ,age(datfrozenxid) FROM pg_database WHERE datname=’test’;

datname | age

---------+-----

test | 5

(1 row)

test=# SELECT fn_loop_withexception ();

fn_loop_withexception

-----------------------

(1 row)

test=# SELECT datname ,age(datfrozenxid) FROM pg_database WHERE datname=’test’;

datname | age

---------+------

test | 1006

(1 row)

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 28 / 43

Page 29: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Emergency shutdown

How it was fixed:

Silence slack

Move in an empty meeting room

Put the message I KNOW!!! on the entrance door

Start the cluster in single user mode

Get the ageing tables

Vacuum the ageing tables

Do a postmortem analysis

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 29 / 43

Page 30: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Emergency shutdown

Handy query to get the ageing tables

SELECT

sch.nspname as schema_name ,

tab.relname as table_name ,

greatest(age(tab.relfrozenxid),age(toa.relfrozenxid)) as age

FROM

pg_class toa

LEFT JOIN pg_class tab

ON tab.reltoastrelid = toa.oid

INNER JOIN pg_namespace sch

ON tab.relnamespace=sch.oid

WHERE

tab.relkind IN (’r’, ’m’)

ORDER BY age DESC

;

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 30 / 43

Page 31: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Emergency shutdown

RTFM

I wasn’t able to find this case on the documentation

The warning box just says that functions with EXCEPTION are more expensive.

It may be useful to add a warning explaining that the XID are consumed by DML withinan EXCEPTION block.

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 31 / 43

Page 32: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Cast a spell

Defcon 5

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 32 / 43

Page 33: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Cast a spell

Context

Fairly large database, 2 TB

Horrible design

Table with metrics mediated from java straight into an hstore field

Query runtime to retrieve 150 rows about 6 minutes

Despite the super expensive storage (FusionIO)

And the super expensive cpu and memory (state of the art in 2013)

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 33 / 43

Page 34: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Cast a spell

Execution plan showed nothing wrong

Fixed a subquery with the wrong join criteria with no success

When using the form SELECT * FROM the query performed much faster

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 34 / 43

Page 35: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Cast a spell

What went wrong

Addressing an HSTORE key returns a TEXT data type

The developers, instead of doing a cast

Decided to write a pl/pgsql function for each type they wanted to cast

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 35 / 43

Page 36: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Cast a spell

The cast function was something like that

CREATE OR REPLACE FUNCTION cast_to_integer(text ,hstore)

RETURNS integer AS

$BODY$

DECLARE

keyvalue ALIAS FOR $1;

metastore ALIAS FOR $2;

BEGIN

RETURN (metastore ->keyvalue):: integer;

END;

$BODY$

LANGUAGE plpgsql;

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 36 / 43

Page 37: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Cast a spell

And the select list was something like thatSELECT

url ,

id,

rtype ,

page_title ,

cast_to_integer(’tw_cnt ’,metastore) AS tw_cnt ,

cast_to_integer(’fb_lk’,metastore) AS fb_lk ,

cast_to_integer(’yt_vw’,metastore) AS yt_vw ,

cast_to_integer(’tw_fl’,metastore) AS tw_fl ,

cast_to_integer(’snp_sb ’,metastore) AS snp_sb ,

cast_to_integer(’avg_vs ’,metastore) AS avg_vs ,

cast_to_integer(’trk_lnk ’,metastore) AS trk_lnk ,

cast_to_integer(’avg_pg ’,metastore) AS avg_pg ,

cast_to_integer(’impr’,metastore) AS impr ,

cast_to_integer(’frm_pst ’,metastore) AS frm_pst ,

cast_to_integer(’outr’,metastore) AS outr ,

.

.

FROM ............

;

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 37 / 43

Page 38: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Cast a spell

How it was fixed:

Have an argument with the developers

Rewrite the join to perform better

Get rid of all of the functions in the select list

The query duration dropped to 10 seconds

Live long and prosper

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 38 / 43

Page 39: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Cast a spell

RTFM

https://www.postgresql.org/docs/9.1/plpgsql.html

Ask the community for advice before implementing stuff

Mailing lists: https://www.postgresql.org/list/IRC on freenode: channel #postgresqlSlack: https://postgres-slack.herokuapp.com/Telegram: https://t.me/pg sql

Hire a DBA

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 39 / 43

Page 40: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

Wrap up

Image source imgur https://imgur.com/gallery/mUeRW0e

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 40 / 43

Page 41: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

License

This work is licensed under a Creative Commons “Attribution-NonCommercial-ShareAlike 4.0 International” license.

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 41 / 43

Page 42: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

That’s all folks!

Thank you for listening!

Any questions?Copyright by dan232323 http://dan232323.deviantart.com/art/Pinkie-Pie-Thats-All-Folks-454693000Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 42 / 43

Page 43: RTFM - FOSDEM 2020 · Real time replica from MySQL with pg chameleon Replay chunk 100k rows pl/pgSQL function to replay data and manage errors Monitoring yet to be implemented Federico

RTFM

Federico Campoli

FOSDEM PGDay, Brussels, 02 February 2020

Federico Campoli RTFM FOSDEM PGDay, Brussels, 02 February 2020 43 / 43