Webinar: Replication and Replica Sets

Summer 2013. William Zola, Member of Technical Staff, 10gen

Upload: mongodb

Posted on 16-Nov-2014

1,790 views

Category: Technology

DESCRIPTION

MongoDB supports replication for failover and redundancy. In this session we will introduce the basic concepts around replica sets, which provide automated failover and recovery of nodes. We'll cover how to set up, configure, and initiate a replica set; methods for using replication to scale reads; and proper architecture for durability.

TRANSCRIPT

Page 1: Webinar: Replication and Replica Sets

Summer 2013

Replication and Replica Sets

William Zola, Member of Technical Staff, 10gen

Page 2

Why Replication?

To keep your data safe

Page 3

Why Replication?

To keep your data available

Page 4

Why Replication?

Because bad things happen to good data centers

Page 5

What is replication and why do we need it?

(Diagram: replication produces copies of important data)

Page 6

Agenda

• Using replica sets for high availability
  – PRIMARY, SECONDARY, and ARBITER nodes
  – PRIMARY elections
• Using replica sets for disaster recovery
• Configuring a replica set so there is no single point of failure
• No-downtime maintenance
• Durability in a networked environment

Page 7

Audience

• Not new to DBA work or system administration
• New to MongoDB or MongoDB replication

Page 8

Use Cases

Page 9

Stakeholders

Pages 10-14

Use Cases

• High Availability (automatic failover)
• Disaster Recovery
• No downtime for maintenance
  – Backups
  – Maintenance (index rebuilds, compaction)
• Replica Set is "transparent" to the application
• Read Scaling (extra copies to read from)

Page 15

MongoDB Replication Basics

Page 16

Replica Set Features

• A cluster of N servers
• Any (one) node can be primary
• All writes go to the primary
• Reads go to the primary by default, optionally to a secondary
• Consensus election of the primary
• Automatic failover
• Automatic recovery

(Diagram: three nodes; the primary receives the writes and default reads, and secondaries can serve optional reads)
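The routing rules on this slide can be sketched as a toy model. This is illustrative Python, not driver code; names like `ToyReplicaSet` and `prefer_secondary` are invented for the example:

```python
# Toy model of replica-set read/write routing (illustrative only;
# real MongoDB drivers do this via read preferences).
class ToyReplicaSet:
    def __init__(self, nodes):
        self.primary = nodes[0]        # assume nodes[0] won the election
        self.secondaries = nodes[1:]

    def write(self, doc):
        # All writes go to the primary.
        return (self.primary, doc)

    def read(self, prefer_secondary=False):
        # Reads go to the primary by default, optionally to a secondary.
        if prefer_secondary and self.secondaries:
            return self.secondaries[0]
        return self.primary

rs = ToyReplicaSet(["node1", "node2", "node3"])
assert rs.write({"a": 1})[0] == "node1"
assert rs.read() == "node1"
assert rs.read(prefer_secondary=True) == "node2"
```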

Page 17

How MongoDB Replication Works

• A replica set is two or more nodes

(Diagram: Node 1, Node 2 [primary], Node 3)

Page 18

How MongoDB Replication Works

• An election establishes the PRIMARY
• Data replicates from the PRIMARY to the SECONDARIES

(Diagram: Node 2, the primary, replicates data to Nodes 1 and 3)
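In MongoDB, secondaries replicate by tailing the primary's operation log (the oplog) and replaying it in order. A minimal sketch of that idea, with a made-up operation format rather than the real oplog entries:

```python
# Minimal sketch of oplog-style replication: the primary logs each
# operation, and a secondary replays the log in order to catch up.
primary_oplog = []
primary_data, secondary_data = {}, {}

def primary_write(key, value):
    primary_data[key] = value
    primary_oplog.append(("set", key, value))   # record the operation

def secondary_sync():
    for op, key, value in primary_oplog:        # replay ops in order
        if op == "set":
            secondary_data[key] = value

primary_write("x", 1)
primary_write("y", 2)
secondary_sync()
assert secondary_data == primary_data           # secondary caught up
```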

Page 19

Types of Outage

Planned (handled with no-downtime maintenance):
– Hardware upgrade
– O/S or file-system tuning
– Relocation of data to a new file-system / storage
– Software upgrade

Unplanned (handled by automatic failover):
– Hardware failure
– Data center failure
– Region outage
– Human error
– Application corruption

Page 20

Mechanics of Automatic Failover

Page 21

Mechanics of Automatic Failover

• Data replicates from the PRIMARY to the SECONDARIES

(Diagram: Node 2, the primary, replicates data to Nodes 1 and 3)

Page 22

Mechanics of Automatic Failover

• An election establishes the PRIMARY
• Data replicates from the PRIMARY to the SECONDARIES
• The PRIMARY might FAIL

(Diagram: Node 2, the primary, fails mid-replication)

Page 23

Mechanics of Automatic Failover

• Automatic election of a new PRIMARY if a majority exists

(Diagram: Node 2 is DOWN; Nodes 1 and 3 negotiate a new primary)

Page 24

Mechanics of Automatic Failover

• A new PRIMARY is elected

(Diagram: Node 2 is DOWN; the negotiation finishes and one of the surviving nodes becomes the new primary)

Page 25

Mechanics of Automatic Failover

• Automatic recovery of the failed node
• Node 2 is RECOVERING; it can perform a full resync from a secondary if necessary

(Diagram: Node 2 rejoins while another node serves as primary)

Page 26

Mechanics of Automatic Failover

• Once caught up, the recovered node resumes syncing from the primary
• The original replica set configuration is re-established

(Diagram: Nodes 1, 2, and 3 are all back in the set)

Page 27

Cluster Size and Rules of Failover

Page 28

Primary Election

As long as a partition can see a majority (>50%) of the cluster, it will elect a primary.
A node must have a STRICT majority to be elected primary!

(Diagram: one primary and two secondaries)
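The strict-majority rule on this slide, and the failure scenarios on the next few, can be captured in a few lines of Python (a toy check, not MongoDB code):

```python
def can_elect_primary(visible, total):
    # A partition can elect a primary only if it sees a STRICT
    # majority (> 50%) of the replica set.
    return visible > total / 2

assert can_elect_primary(2, 3)        # 66% visible: a primary is elected
assert not can_elect_primary(1, 3)    # 33% visible: read-only mode
assert not can_elect_primary(2, 4)    # 50% is NOT a strict majority
```

This is why the later slides recommend an odd number of votes: with an even cluster, a clean 50/50 split leaves neither side able to elect.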

Page 29

Simple Failure

66% of the cluster is visible, so a primary is elected.

(Diagram: one failed node; the two surviving nodes are a primary and a secondary)

Page 30

Simple Failure

33% of the cluster is visible, so the surviving node runs in read-only mode.

(Diagram: two failed nodes; the remaining secondary cannot become primary)

Page 31

Network Partition

(Diagram: one primary and two secondaries, about to be partitioned)

Page 32

Network Partition

66% of the cluster is visible on the majority side, so a primary is elected there.

(Diagram: the old primary is cut off; a secondary on the majority side becomes the new primary)

Page 33

Network Partition

33% is visible on the minority side, so it runs in read-only mode.

(Diagram: the minority side cannot elect a primary)

Page 34

No "Split Brain" Problem

A node must be elected by a strict majority of the set in order to be a primary:
• Only the primary node can accept writes
• A replica set never has two primary nodes

(Diagram: one primary and two secondaries)

Page 35

Even Cluster Size

(Diagram: one primary and three secondaries)

Pages 36-37

Even Cluster Size ✗ (ODD = good)

Only 50% of the cluster is visible after the partition. That is not a strict majority, so no primary can be elected and the set is read-only.

(Diagram: a four-node set split 2-2; both halves are read-only)

Page 38

Types of Nodes

• Regular: a data node that holds a copy of your data
  – Primary: the data node that won the election
  – Secondary: any other data node
  – Secondaries can have different priorities and other configuration options
• Arbiter: holds no data, but it can vote; use it to break ties

Page 39

Add an Arbiter!

Add an arbiter node to break ties:
• Odd number of votes in the set
• The arbiter is lightweight; it does not store data

(Diagram: one primary, three secondaries, and an arbiter)
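The vote arithmetic behind "add an arbiter" can be sketched directly. This assumes the default of one vote per member, which is how replica sets are configured out of the box:

```python
def total_votes(data_nodes, arbiters):
    # By default, every data node and every arbiter casts one vote.
    return data_nodes + arbiters

def side_can_elect(side_votes, total):
    # A side of a partition needs a strict majority of all votes.
    return side_votes > total / 2

# Four data nodes split 2-2 by a partition: neither side has a majority.
assert not side_can_elect(2, total_votes(4, 0))
# Add an arbiter (5 votes total): whichever side sees the arbiter
# holds 3 of 5 votes and can elect a primary.
assert side_can_elect(2 + 1, total_votes(4, 1))
```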

Page 40

High Availability


Page 42

No Downtime Maintenance

1. Take a secondary out of the set
2. Perform maintenance
3. Return the secondary to the set
4. Wait for it to catch up ✓

(Diagram: one primary and three secondaries)

Page 43

No Downtime Maintenance

1. Take a secondary out of the set
2. Perform maintenance
3. Return the secondary to the set
4. Wait for it to catch up
5. Step down the primary (wait for a new primary to be elected)
6. Repeat steps 1-4 on the former primary ✓
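The rolling-maintenance procedure above can be walked through as a toy simulation (node names and the `maintain` helper are invented for the example):

```python
# Toy walk-through of rolling maintenance: service the secondaries
# first, then step down the primary and service it last.
nodes = {"node1": "PRIMARY", "node2": "SECONDARY", "node3": "SECONDARY"}
maintained = []

def maintain(name):
    # Stands in for steps 1-4: take out, service, rejoin, catch up.
    maintained.append(name)

for name, role in list(nodes.items()):
    if role == "SECONDARY":
        maintain(name)

# Step 5: step down the primary; an already-maintained secondary takes over.
nodes["node1"], nodes["node2"] = "SECONDARY", "PRIMARY"
maintain("node1")                      # step 6: repeat steps 1-4 on it

assert maintained == ["node2", "node3", "node1"]
assert list(nodes.values()).count("PRIMARY") == 1   # never two primaries
```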

Page 44

2 Replicas + Arbiter??

Is this a good configuration?

(Diagram: a primary, a secondary, and an arbiter)

Page 45

2 Replicas + Arbiter??

1. Take the secondary out of the set
2. Perform maintenance
3. The primary node crashes
   – Uh-oh!
   – The replica set is down
   – Data from the primary hasn't been replicated

Page 46

Use Three Data Nodes!

Use a minimum of three data nodes to assure high availability.

(Diagram: a primary and two secondaries)

Page 47

Avoid Single Points of Failure


Page 49

Avoid Single Points of Failure

(Diagram: a primary and two secondaries all behind one top-of-rack switch; the switch can fail, or the rack can fall over)

Page 50

Better

(Diagram: the nodes still share one data center; the remaining risks are loss of internet connectivity, or the DC burning down)

Page 51

Even Better

(Diagram: nodes spread across San Francisco and Dallas)

Page 52

Priorities

(Diagram: priority-1 nodes in San Francisco; a priority-0 secondary in Dallas)

The Dallas node sits in the disaster-recovery data center: with priority 0, it will never become primary automatically.
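The effect of priorities can be sketched as a simplified election (real MongoDB elections also weigh data freshness and votes; this toy model only shows that priority-0 nodes never win):

```python
def elect(candidates):
    # candidates: (name, priority) pairs for visible, up-to-date nodes.
    # Priority-0 nodes can never become primary; otherwise the
    # highest-priority candidate wins in this simplified model.
    eligible = [(p, n) for n, p in candidates if p > 0]
    return max(eligible)[1] if eligible else None

sf = [("sf1", 1), ("sf2", 1)]          # normal data-center nodes
dallas = [("dr1", 0)]                  # disaster-recovery node, priority 0

assert elect(sf + dallas) in ("sf1", "sf2")   # a priority-1 node wins
assert elect(dallas) is None                  # DR node alone never takes over
```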

Page 53

Even Better

(Diagram: five nodes spread across San Francisco, Dallas, and New York)

Page 54

Node Priority

(Diagram: five nodes with priorities 10, 10, 5, 5, and 0 across San Francisco, New York, and Dallas)

Page 55

Node Sizing

Nodes that can become primary should be sized equally:
• RAM
• Disk
• IOPS

(Diagram: the same five-node, three-data-center deployment as the previous slide)

Page 56

Recap

Page 57

Replica Set Review

A replica set contains N nodes:
• At most one node is the PRIMARY
• All writes go to the PRIMARY
• SECONDARY nodes contain up-to-date copies of the data
• SECONDARY nodes continually copy data from the PRIMARY

(Diagram: writes flow to the primary; two secondaries replicate from it)

Page 58

Failover Review

If the PRIMARY fails, the replica set can elect a new PRIMARY:
• A strict (>50%) majority is required for election
• The former PRIMARY rejoins the set as a SECONDARY when it recovers

(Diagram: writes move to the newly elected primary)

Page 59

Partition Review

A network partition prevents the nodes from communicating:
• The replica set treats a partition as a "down node"
• A node must get a strict majority of the votes to be elected PRIMARY
• Even numbers of votes reduce availability
• Use arbiters to break ties
• Spread your nodes across multiple data centers

Page 60

Using Applications with Replica Sets

Page 61

Application View

(Diagram: your application code talks to the MongoDB driver)

Page 62

Under the Covers

(Diagram: application code → MongoDB driver → replica set of one primary and two secondaries)

Replica set connection string:
my-set/host1:27017,host2:27017,host3:27017
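That connection string is just a set name plus a seed list of host:port pairs. A hypothetical parser (drivers do this for you; `parse_seed_list` is invented for illustration) makes the structure explicit:

```python
def parse_seed_list(uri):
    # Split "set-name/host:port,host:port,..." into its parts.
    # (Hypothetical helper for illustration; real drivers parse this.)
    set_name, _, hosts = uri.partition("/")
    return set_name, [tuple(h.split(":")) for h in hosts.split(",")]

name, members = parse_seed_list("my-set/host1:27017,host2:27017,host3:27017")
assert name == "my-set"
assert members[0] == ("host1", "27017")
assert len(members) == 3
```

The seed list does not have to name every member; the driver discovers the rest of the set (and which node is primary) from any reachable seed.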

Page 63

Secondary Reads

(Diagram: the driver reads from a secondary in the replica set)

Secondary reads are potentially stale!

Page 64

Failover

(Diagram: the primary fails ✗; the driver gets a connection exception)

Page 65

New Election

(Diagram: a secondary is elected as the new primary; the driver reconnects to it)
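From the application's point of view, failover shows up as a connection exception followed by a successful retry once the new primary is elected. A sketch of that retry pattern (toy code, not a driver API; `ConnectionException` and `with_retry` are invented names):

```python
# Sketch of retrying an operation across a failover: the first attempt
# hits a connection exception, the retry reaches the new primary.
class ConnectionException(Exception):
    pass

def with_retry(op, attempts=3):
    for _ in range(attempts):
        try:
            return op()
        except ConnectionException:
            continue               # driver rediscovers the new primary
    raise ConnectionException("no primary found")

calls = []
def flaky_write():
    calls.append(1)
    if len(calls) == 1:
        raise ConnectionException()   # old primary just went down
    return "ok"                       # new primary elected; write succeeds

assert with_retry(flaky_write) == "ok"
assert len(calls) == 2
```

Note that a blind retry can double-apply a write that actually succeeded before the crash; real applications should make retried operations idempotent.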

Page 66

Durability and Replica Sets

Page 67

Durability

Wikipedia: "In database systems, durability is the ACID property which guarantees that transactions that have committed will survive permanently."

Page 68

The Lifetime of a Write Operation (single node)

(Diagram: application code → MongoDB driver → network write → validate data → update data in RAM → update journal)

Page 69

Get Last Error

(Diagram: after the network write, the driver sends a getLastError command; the server validates the data and returns the getLastError result)

Page 70

Write Concern

(Diagram: what the driver waits for at each level)
• {w:0}: network acknowledgement only
• {w:1}: check for error on the primary
• {j:1}: wait for a journal sync
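The write-concern levels on this slide form a ladder of guarantees. A toy model of the acknowledgement decision (the `state` flags are invented stand-ins for what actually happened to the write, not server fields):

```python
def is_acknowledged(w, j, state):
    # Toy model of the getLastError ladder on this slide.
    # state flags: "sent" (network write went out), "applied" (the
    # primary applied it without error), "journaled" (journal synced).
    if w == 0:
        return state["sent"]           # {w:0}: fire-and-forget
    if not state["applied"]:
        return False                   # {w:1}: surface the error
    if j:
        return state["journaled"]      # {j:1}: also wait for the journal
    return True

sent_only = {"sent": True, "applied": False, "journaled": False}
applied = {"sent": True, "applied": True, "journaled": False}

assert is_acknowledged(0, False, sent_only)       # {w:0} reports success
assert not is_acknowledged(1, False, sent_only)   # {w:1} catches the error
assert is_acknowledged(1, False, applied)         # {w:1} succeeds
assert not is_acknowledged(1, True, applied)      # {j:1} still waiting
```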

Page 71

Replica Sets and Durability

A write that has replicated to a majority of the nodes is durable:
• The most up-to-date node will be elected primary
• The write will be present on that node

There is no guarantee of which nodes will have the write:
• Use "tag sets" for finer-grained control

(Diagram: a five-node set; a write that has reached three of the five nodes is durable)
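Why does majority replication make a write durable? Because any electable majority must overlap the majority that acknowledged the write, and the most up-to-date visible node wins the election. A toy model of that argument (node names and integer "optimes" are invented for the example):

```python
def elect_most_up_to_date(optimes, visible):
    # The visible candidate with the highest replicated optime wins.
    return max(visible, key=lambda n: optimes[n])

# A write at optime 10 reached a majority (3 of 5 nodes, including the
# primary n1) before n1 died; n2 and n3 carry it, n4 and n5 do not.
optimes = {"n2": 10, "n3": 10, "n4": 9, "n5": 9}   # n1 is down
new_primary = elect_most_up_to_date(optimes, ["n2", "n3", "n4", "n5"])
assert optimes[new_primary] == 10                  # the write survives
```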

Page 72

Network Write Concern

• Specific number of nodes: {w: 2}
• Majority of data nodes: {w: 'majority'}
• Tag set: {w: 'my tag set'}
• Wait with a timeout: {w: 2, wtimeout: 2000}

(Diagram: the driver's getLastError waits until enough replica set members acknowledge the write)

Page 73

Wrapping it Up

Page 74

Why Replication?

To keep your data safe and available

Page 75

Features

• High Availability (auto-failover)
• Disaster Recovery
• No downtime for maintenance
• Replica Set is "transparent" to the application
• Writes are durable with the appropriate Write Concern

Page 76

Just Use It!

• Easy to set up
  – Try it on a single machine
  – Run multiple nodes on different ports on a single host
• Check the online documentation for replica set tutorials
  – http://docs.mongodb.org/manual/replication/#tutorials

Page 77

Questions?

Page 78

Thank You!