High Availability, Disaster Recovery and Extreme Read Scaling using Binlog Servers

Jean-François Gagné

jeanfrancois DOT gagne AT booking.com

Presented at Percona Live London 2014

Booking.com

Booking.com’

● Based in Amsterdam since 1996
● Online Hotel and Accommodation Agent:
  ● 135 offices worldwide
  ● +540.000 properties in 207 countries
  ● 42 languages (website and customer service)
● Part of the Priceline Group
● And we are using MySQL:
  ● >3000 servers, ~90% replicating
  ● ~100 masters: ~10 with >50 slaves and ~4 with >100 slaves

Binlog Server: Session Summary

1. Replication

2. What is the Binlog Server

3. Extreme Read Scaling

4. Remote Site Replication and Easy Disaster Recovery

5. Easy High Availability

6. Other Use-Cases

7. Impacts on the Ecosystem

Binlog Server: Replication

● One master / one or more slaves
● The master records all its writes in a journal: the binary logs
● Each slave:
  ● Downloads the journal from the master and saves it locally (IO thread): the relay logs
  ● Executes the relay logs on the local database (SQL thread)
  ● Could produce binary logs to be a master to other slaves
● Replication is:
  ● Asynchronous: slaves can lag and are only eventually consistent
  ● Single-threaded: slower than the master
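The replication flow described on this slide can be sketched as a toy model. This is only an illustration under simplifying assumptions, not how MySQL is implemented: `Master`, `Slave`, `io_thread_step` and `sql_thread_step` are hypothetical names, and the binary and relay logs are plain Python lists.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Toy master: applies writes locally and appends them to its binary log."""
    data: dict = field(default_factory=dict)
    binlog: list = field(default_factory=list)

    def write(self, key, value):
        self.data[key] = value
        self.binlog.append((key, value))  # the journal of all writes


@dataclass
class Slave:
    """Toy slave: relay log filled by the IO thread, applied by the SQL thread."""
    data: dict = field(default_factory=dict)
    relay_log: list = field(default_factory=list)
    applied: int = 0  # number of relay log events already executed

    def io_thread_step(self, master):
        # Download the events not fetched yet and save them locally.
        self.relay_log.extend(master.binlog[len(self.relay_log):])

    def sql_thread_step(self):
        # Execute one relay log event (single-threaded apply).
        if self.applied < len(self.relay_log):
            key, value = self.relay_log[self.applied]
            self.data[key] = value
            self.applied += 1

    def lag_in_events(self, master):
        return len(master.binlog) - self.applied


master = Master()
slave = Slave()
for i in range(3):
    master.write(f"k{i}", i)

slave.io_thread_step(master)
slave.sql_thread_step()
print(slave.lag_in_events(master))  # events still to apply: the slave lags
while slave.lag_in_events(master):
    slave.sql_thread_step()
print(slave.data == master.data)    # the slave has caught up
```

The slave is behind the master until its SQL thread has worked through the relay log, which is the "eventually consistent" behaviour named above.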

Binlog Server: Booking.com’’

● Typical replication deployment:

                 -----
                 | M |
                 -----
                   |
  +------+-- ... --+---------------+-------- ...
  |      |         |               |
-----  -----     -----           -----
| S1|  | S2|     | Sn|           | M1|
-----  -----     -----           -----
                                   |
                              +-- ... --+
                              |         |
                            -----     -----
                            | T1|     | Tm|
                            -----     -----

Binlog Server: What

● Binlog Server (BLS) is a daemon that:
  ● Downloads binary logs from the master
  ● Saves them in the same structure as the master
  ● Serves the binary logs to slaves

-----        / \
| A |  ---> / X \
-----       -----
  |           |
-----       -----
| B |       | B |
-----       -----

● A and X are the same from the point of view of B:
  ● By design, the binlogs served by A and X are the same
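Because the binlogs are saved "in the same structure", a binlog position means the same thing on A and on X. A minimal sketch of that property, assuming binlogs modelled as byte strings keyed by file name; `mirror`, `read_from` and the file names are hypothetical, and a real Binlog Server speaks the MySQL replication protocol rather than copying dictionaries:

```python
def mirror(master_binlogs):
    """Copy binlogs byte for byte, keeping the master's file names."""
    return {name: bytes(content) for name, content in master_binlogs.items()}


def read_from(binlogs, log_file, log_pos):
    """Return what a slave would receive when resuming at (log_file, log_pos)."""
    return binlogs[log_file][log_pos:]


# Hypothetical binlogs on master A.
binlogs_on_a = {
    "binlog.000001": b"event1event2event3",
    "binlog.000002": b"event4event5",
}
binlogs_on_x = mirror(binlogs_on_a)  # what the Binlog Server X holds

# B stopped after event2: the same coordinates are valid on A and on X.
coords = ("binlog.000001", 12)
print(read_from(binlogs_on_a, *coords) == read_from(binlogs_on_x, *coords))
```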

Binlog Server: Read Scaling

● Typical replication topology for read scaling:

              -----
              | M |
              -----
                |
  +------+------+--- ... ---+
  |      |      |           |
-----  -----  -----       -----
| S1|  | S2|  | S3|       | Sn|
-----  -----  -----       -----

● With too many slaves, the NIC of M is overloaded:
  ● 100 slaves x 1 MByte/s = 800 Mbit/s: very close to 1 Gbit/s
  ● OSC (online schema change) or purging data in RBR becomes hard
  ● Slaves lag, or worse: the master becomes unreachable for writes
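The network arithmetic behind this slide can be checked in a few lines. This is a back-of-the-envelope sketch that assumes each slave pulls about 1 MByte/s of binary logs and that the master has a 1 Gbit/s card; both numbers are illustrative, and `master_nic_load_mbit` is a hypothetical helper.

```python
def master_nic_load_mbit(slaves, binlog_mbyte_per_s):
    """Outbound replication traffic on the master, in Mbit/s."""
    return slaves * binlog_mbyte_per_s * 8  # 8 bits per byte


NIC_CAPACITY_MBIT = 1000  # 1 Gbit/s network card

load = master_nic_load_mbit(slaves=100, binlog_mbyte_per_s=1)
print(f"{load} Mbit/s, {load / NIC_CAPACITY_MBIT:.0%} of a 1 Gbit/s NIC")
```

Any burst of writes, such as a schema change or a large purge in row-based replication, multiplies the per-slave rate and pushes the total over the capacity of the card.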

Binlog Server: Read Scaling’

● Typical solution: Intermediary Masters (IM):

                    -----
                    | M |
                    -----
     +----------------+------ ... ------+
     |                |                 |
   -----            -----             -----
   | M1|            | M2|             | Mm|
   -----            -----             -----
  +------+ ...        +--- ...    +--- ... ---+
  |      |            |           |           |
-----  -----        -----       -----       -----
| S1|  | S2|        | T1|       |...|       | Zi|
-----  -----        -----       -----       -----

● But IMs bring new problems:
  ● Lag of an IM: all its slaves are lagging
  ● Failure of an IM: all its slaves stop replicating
  ● Rogue transaction on an IM: corruption of all its slaves
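To see why an IM is a single point of failure for everything below it, the set of affected servers can be computed from the topology. A small sketch, assuming the topology is given as a dictionary from each server to its direct slaves; the server names and the `downstream` helper are made up for the example.

```python
def downstream(topology, node):
    """All servers replicating directly or indirectly from `node`."""
    affected = []
    for child in topology.get(node, []):
        affected.append(child)
        affected.extend(downstream(topology, child))
    return affected


# Hypothetical topology: master M with intermediate masters M1 and M2.
topology = {
    "M": ["M1", "M2"],
    "M1": ["S1", "S2"],
    "M2": ["T1", "T2", "T3"],
}

print(downstream(topology, "M1"))  # slaves that lag or stop if M1 lags or fails
```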

Binlog Server: Read Scaling’’

● Can the IM problems be fixed?
● Shared disk for HA:
  ● Filers or doubling the number of servers
  ● HA needs sync_binlog=1 + innodb_flush_log_at_trx_commit=1
  ● After a crash of an IM:
    ● it needs InnoDB recovery: slaves mostly useless
    ● and its cache is cold: replication will lag
● GTIDs to the rescue:
  ● They allow slave repointing :-)
  ● But do not completely solve the lag problem :-(
  ● And we cannot migrate online :-( :-(

Binlog Server: Read Scaling’’’

● New Solution: BLS

                    -----
                    | M |
                    -----
     +----------------+------ ... ------+
     |                |                 |
    / \              / \               / \
   / I1\            / I2\             / Im\
   -----            -----             -----
  +------+ ...        +--- ...    +--- ... ---+
  |      |            |           |           |
-----  -----        -----       -----       -----
| S1|  | S2|        | Si|       | Sj|       | Sn|
-----  -----        -----       -----       -----

● If a BLS fails, repoint its slaves to other BLSs:
  ● This is easy: the binlogs on all BLSs are the same by design, the same as the ones from the master
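Repointing a slave to another BLS then comes down to a `CHANGE MASTER TO` with the coordinates the slave has already executed, since those coordinates are valid on every BLS. A sketch that only builds the SQL text, assuming classic file/position replication (no GTIDs); `repoint_statements`, the host name and the coordinates are hypothetical, and no quoting or escaping is done.

```python
def repoint_statements(new_host, new_port, log_file, log_pos):
    """SQL to run on a slave to attach it to another Binlog Server.

    log_file and log_pos are the coordinates the slave has executed up to
    (Relay_Master_Log_File and Exec_Master_Log_Pos in SHOW SLAVE STATUS).
    """
    return [
        "STOP SLAVE",
        "CHANGE MASTER TO "
        f"MASTER_HOST='{new_host}', MASTER_PORT={new_port}, "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos}",
        "START SLAVE",
    ]


# Made-up target and coordinates, for illustration only.
for statement in repoint_statements("bls2.example.com", 3306, "binlog.000042", 1337):
    print(statement + ";")
```

With an intermediate master in place of the BLS, the same coordinates would not be usable on another server, because each IM writes its own binary logs.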

Binlog Server: Remote Site

● Typical deployment for remote site:

              -----
              | A |
              -----
  +------+------+---------------+
  |      |      |               |
-----  -----  -----           -----
| B |  | C |  | D |           | E |
-----  -----  -----           -----
                         +------+------+
                         |      |      |
                       -----  -----  -----
                       | F |  | G |  | H |
                       -----  -----  -----

● E is an IM: same problems as with slave scaling.

Binlog Server: Remote Site’

● Ideally, we would like this:

              -----
              | A |
              -----
  +------+------+---------------+------+------+------+
  |      |      |               |      |      |      |
-----  -----  -----           -----  -----  -----  -----
| B |  | C |  | D |           | E |  | F |  | G |  | H |
-----  -----  -----           -----  -----  -----  -----

● No lag and no Single Point of Failure (SPOF)

● But no master on remote site for writes (solved problem)

● And expensive in WAN bandwidth

Binlog Server: Remote Site’’

● New solution: a BLS on the remote site:

              -----
              | A |
              -----
  +------+------+---------------+
  |      |      |               |
-----  -----  -----            / \
| B |  | C |  | D |           / X \
-----  -----  -----           -----
                         +------+------+------+
                         |      |      |      |
                       -----  -----  -----  -----
                       | E |  | F |  | G |  | H |
                       -----  -----  -----  -----

Binlog Server: Remote Site’’’

● Or deploy 2 BLSs to get better resilience:

              -----
              | A |
              -----
  +------+------+---------------+
  |      |      |               |
-----  -----  -----            / \             / \
| B |  | C |  | D |           / X \  ------>  / Y \
-----  -----  -----           -----           -----
                            +------+        +------+
                            |      |        |      |
                          -----  -----    -----  -----
                          | E |  | F |    | G |  | H |
                          -----  -----    -----  -----

● If Y fails, repoint G and H to X.

● If X fails, repoint Y to A and E and F to Y.
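The two repointing rules above can be written down as a small function over the topology. This is a sketch under the assumptions of this slide only (two BLSs, one master); `fail_over` and its parameters are hypothetical names, and the dictionary maps each server to the server it replicates from.

```python
def fail_over(replicates_from, failed, binlog_servers, master, surviving_bls):
    """Return the topology after `failed` (a Binlog Server) is lost."""
    new_topology = {}
    for server, source in replicates_from.items():
        if server == failed:
            continue  # the failed server leaves the topology
        if source != failed:
            new_topology[server] = source  # unaffected by the failure
        elif server in binlog_servers:
            new_topology[server] = master  # a BLS goes back to the master
        else:
            new_topology[server] = surviving_bls  # slaves go to the other BLS
    return new_topology


topology = {"X": "A", "Y": "X", "E": "X", "F": "X", "G": "Y", "H": "Y"}
bls = {"X", "Y"}

after_y_fails = fail_over(topology, "Y", bls, master="A", surviving_bls="X")
after_x_fails = fail_over(topology, "X", bls, master="A", surviving_bls="Y")
print(after_y_fails)
print(after_x_fails)
```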

Binlog Server: Remote Site’’’’

● Interesting property: if A fails, E, F, G and H converge to a common state.

              -----
              | A |
              -----
  +------+------+---------------+
  |      |      |               |
-----  -----  -----            / \
| B |  | C |  | D |           / X \
-----  -----  -----           -----
                         +------+------+------+
                         |      |      |      |
                       -----  -----  -----  -----
                       | E |  | F |  | G |  | H |
                       -----  -----  -----  -----

● New master election is easy on remote site.

Binlog Server: High Availability

● This property can be used for HA:

                        -----
                        | A |
                        -----
                          |
                         / \
                        / X \
                        -----
  +------+------+------+------+------+------+------+
  |      |      |      |      |      |      |      |
-----  -----  -----  -----  -----  -----  -----  -----
| B |  | C |  | D |  | E |  | F |  | G |  | H |  | I |
-----  -----  -----  -----  -----  -----  -----  -----

Binlog Server: HA, DR, and RS

● With this deployment spanning many data centers:

                        -----
                        | M |
                        -----
                          |
    +--- ... ---+--- ... ------------+------------ ...
    |           |                    |
   / \         / \                  / \         / \
  / I1\       / Ix\                / J1\ ----> / Jy\
  -----       -----                -----       -----
    |           |                    |           |
    +-- ...     +-- ...              +-- ...     +-- ... --+
    |           |                    |           |         |
  -----       -----                -----       -----     -----
  | S1|       | Si|                | T1|       | Ti|     | Tn|
  -----       -----                -----       -----     -----

Binlog Server: HA, DR, and RS’

● If M fails:

                        -----
                        | M | <--- Failed master
                        -----
                   /--- Most up to date BLS
                  /
                 /
   / \         / \                  / \         / \
  / I1\ <---- / Ix\ -------------> / J1\ ----> / Jy\
  -----       -----                -----       -----
    |           |                    |           |
    +-- ...     +-- ...              +-- ...     +-- ... --+
    |           |                    |           |         |
  -----       -----                -----       -----     -----
  | S1|       | Si|                | T1|       | Ti|     | Tn|
  -----       -----                -----       -----     -----
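Finding the most up-to-date BLS after a master failure amounts to comparing binlog coordinates. A minimal sketch, assuming each BLS reports the (file, position) it has downloaded up to and that file names are zero-padded so that they sort in order; the coordinates are made up and `most_up_to_date` is a hypothetical helper.

```python
def most_up_to_date(positions):
    """Pick the Binlog Server that has received the most binary logs.

    positions maps a server name to its (log_file, log_pos) coordinates.
    """
    return max(positions, key=positions.get)


# Hypothetical coordinates collected after the failure of M.
positions = {
    "I1": ("binlog.000041", 9000),
    "Ix": ("binlog.000042", 512),
    "J1": ("binlog.000042", 128),
    "Jy": ("binlog.000042", 128),
}

best = most_up_to_date(positions)
behind = sorted(name for name, pos in positions.items() if pos < positions[best])
print(best, behind)  # the BLSs in `behind` must download the missing binlogs from `best`
```

Once every BLS has the same binlogs and the slaves have executed them, all slaves are in the same state and any of them can be promoted.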

Binlog Server: HA, DR, and RS’’

● A primary BLS on all sites might simplify things:

                        -----
                        | M |
                        -----
                          |
    +--------------- ... ------------+------------ ...
    |                                |
   / \         / \                  / \         / \
  / I1\ ----> / Ix\                / J1\ ----> / Jy\
  -----       -----                -----       -----
    |           |                    |           |
    +-- ...     +-- ...              +-- ...     +-- ... --+
    |           |                    |           |         |
  -----       -----                -----       -----     -----
  | S1|       | Si|                | T1|       | Ti|     | Tn|
  -----       -----                -----       -----     -----

Binlog Server: Other Use-Cases

● Better Crash-Safe Replication
  ● http://blog.booking.com/better_crash_safe_replication_for_mysql.html
● And Better Parallel Replication

Binlog Server: Better // Replication

● What is parallel (//) replication:
  ● transactions committing together on the master are executed in parallel on slaves
● In other words: transactions finishing at the same time on the master are started at the same time on the slave
● No guarantee that these transactions will complete at the same time on the slave
● Impact when we have Intermediate Masters?

Binlog Server: Better // Replication’

● Four transactions on X, Y and Z:

-----
| X |
-----
  |
-----
| Y |
-----
  |
-----
| Z |
-----

On X:
----Time---->
T1     B---C
T2     B---C
T3 B-------C
T4 B-------C

On Y:
----Time---->
T1 B---C
T2 B---C
T3 B-------C
T4 B-------C

On Z:
----Time--------->
T1 B---C
T2 B---C
T3     B-------C
T4     B-------C

● IM might stall the // replication pipeline
● To fully benefit from // replication, IM must disappear
● The Binlog Server allows exactly that
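The stall can be reproduced with a toy model of commit groups. This is a simplified sketch, not the actual scheduler of any MySQL or MariaDB version: it assumes the four transactions commit together on X, take 4, 4, 8 and 8 time units to execute (illustrative values), that a slave runs one commit group at a time, and that transactions finishing at the same time on a server form one commit group in that server's own binary logs. `replay` and `commit_groups_from` are hypothetical helpers.

```python
from itertools import groupby


def replay(commit_groups, durations):
    """Apply commit groups on a slave.

    Groups run one after another; transactions inside a group start together.
    Returns (end time of each transaction, total elapsed time).
    """
    start = 0
    end_times = {}
    for group in commit_groups:
        for trx in group:
            end_times[trx] = start + durations[trx]
        start = max(end_times[trx] for trx in group)
    return end_times, start


def commit_groups_from(end_times):
    """Group the transactions that finished at the same time on a server."""
    ordered = sorted(end_times, key=lambda trx: (end_times[trx], trx))
    return [list(g) for _, g in groupby(ordered, key=lambda trx: end_times[trx])]


durations = {"T1": 4, "T2": 4, "T3": 8, "T4": 8}
groups_from_x = [["T1", "T2", "T3", "T4"]]  # all four commit together on X

ends_on_y, time_on_y = replay(groups_from_x, durations)
groups_from_y = commit_groups_from(ends_on_y)  # Y's own binlogs: two groups
_, time_on_z = replay(groups_from_y, durations)
_, time_on_z_via_bls = replay(groups_from_x, durations)  # Z fed with X's binlogs

print(time_on_y, time_on_z, time_on_z_via_bls)
```

In this model Z needs more time behind the intermediate master Y than when it receives the unchanged binlogs of X, which is what a Binlog Server in place of Y would serve.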

Binlog Server: Impact on Ecosystem

● MHA (and other HA tools):
  ● Parsing relay logs is not needed anymore
  ● Promoting a new master is always needed
● GTIDs:
  ● Less useful in replication topologies
  ● Still useful in Group Communication solutions
● Binlog Tailers + Semi-Sync Replication:
  ● s/Binlog Tailers/Binlog Servers/

Binlog Server: Impact on Ecosystem’

● Intermediary Master:
  ● GTIDs only solve one of its problems
  ● Even with GTIDs, there is still lag/delay
  ● Without GTIDs:
    ● SPOF for all its slaves
    ● Or performance killer if we deploy HA with shared disk
● Replicating through an IM looks wrong
  ● log-slave-updates might become less useful
● The Binlog Servers should work with any version of MySQL (5.7, 5.6, 5.5 and 5.1) or MariaDB (10.1, 10.0, 5.5, 5.3, 5.2 and 5.1).

Binlog Server: Links

● http://blog.booking.com/

● http://blog.booking.com/mysql_slave_scaling_and_more.html

● http://blog.booking.com/better_crash_safe_replication_for_mysql.html

● http://blog.booking.com/better_parrallel_replication_for_mysql.html

● https://workingatbooking.com/

● https://mariadb.com/blog/mariadb-replication-maxscale-and-need-binlog-server

● https://mariadb.com/blog/maxscale-proxy-mysql-replication-relay

● https://mariadb.com/blog/maxscale-proxy-replication-relay-part-2-slave-side

Soon: the MaxScale Binlog Server Plugin.

Questions

Jean-François Gagné

jeanfrancois DOT gagne AT booking.com
