![Page 1: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/1.jpg)
DRBD is a friend!
![Page 2: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/2.jpg)
What is DRBD?
• DRBD is a block device designed as a building block to form HA clusters.
• This is done by mirroring a whole block device via an assigned network.
• DRBD can be understood as network-based RAID 1.
• T uses DRBD-8.2, S uses DRBD-8.4 (may change in the future).
![Page 3: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/3.jpg)
Block device (Kernel component)
File system
Buffer cache
Block device
Disk sched
Disk driver
![Page 4: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/4.jpg)
DRBD sends I/O to the other node
File system
Buffer cache
DRBD
Disk sched
Disk driver
WRITE ops are sent to secondary over network
![Page 5: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/5.jpg)
Data flow in kernel land
![Page 6: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/6.jpg)
How to set up DRBD
• Prepare DRBD partitions
• Create setup files
  /etc/drbd.conf (DRBD-8.2)
  /etc/drbd.d/global_common.conf (DRBD-8.4)
  /etc/drbd.d/r0.res, r1.res (DRBD-8.4)
• Start DRBD sync
![Page 7: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/7.jpg)
DRBD settings
• In DRBD-8.2, all the settings are in /etc/drbd.conf
• In DRBD-8.4,
  global settings are in /etc/drbd.d/global_common.conf
  resource-level settings are in /etc/drbd.d/r<N>.res
• Sample: http://www.drbd.org/users-guide/re-drbdconf.html
• HA1 and HA2 have identical DRBD config files
• Usage-count (always no)
• Protocol (C: a WRITE completes only when it has reached the other node as well)
• Sync rate (100MB/sec for sync; no need for a 10Gb NIC)
• Partition name (device minor # for /dev/drbdN)
• Node name / IP address / port number
![Page 8: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/8.jpg)
Sample drbd.conf (1)
    global {
        usage-count no;
    }
    common {
        net {
            protocol C;
        }
        syncer {
            rate 100M;
        }
    }
![Page 9: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/9.jpg)
Sample drbd.conf (2)
    resource r0 {
        protocol C;
        on Machine-HA1 {    # must match what "uname -n" says on HA1
            device /dev/drbd1;
            disk /dev/disk/by-label/XX;
            address 10.0.128.17:7788;
        }
        on Machine-HA2 {    # must match what "uname -n" says on HA2
            device /dev/drbd1;
            disk /dev/disk/by-label/XX;
            address 10.0.128.18:7788;
        }
    }

    [root@Machine-HA2 ~]# uname -n
    Machine-HA2
    [root@Machine-HA2 ~]#
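The `on` section names must match what `uname -n` prints on each node. A minimal sketch of a pre-flight check for that, assuming each `on <host>` starts its own line in the config file. `has_on_section` is a hypothetical helper written for illustration, not a DRBD command.

```shell
#!/bin/sh
# has_on_section FILE HOST
# Succeeds if FILE has a line whose first two words are "on" and HOST.
# Hypothetical helper for illustration; not part of DRBD.
has_on_section() {
    awk -v host="$2" '
        $1 == "on" && $2 == host { found = 1 }
        END { exit !found }
    ' "$1"
}

# Example (the path is the DRBD-8.2 location named in the slides):
#   has_on_section /etc/drbd.conf "$(uname -n)" || echo "no on-section for this host"
```

The comparison is an exact word match, so `Machine-HA1` does not match `Machine-HA10`. A config that writes `on host{` without a space would need a looser pattern.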
![Page 10: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/10.jpg)
Resource and Role
• In DRBD, every resource (partition) has a role, which may be primary or secondary.
• A primary DRBD device can be used for any read/write operations.
• A DRBD secondary device can NOT be used for any read/write operations.
• Secondary only receives WRITEs from primary.
![Page 11: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/11.jpg)
Connection state
• DRBD always uses bond1
  HA1: 10.0.128.17 (ping drbd1)
  HA2: 10.0.128.18 (ping drbd2)
![Page 12: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/12.jpg)
Monitor DRBD (1)
Healthy state
Shutdown bond1
![Page 13: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/13.jpg)
Monitor DRBD (2)
Enabled bond1 again
DRBD became WFC status (Waiting For Connection)
![Page 14: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/14.jpg)
Nothing can separate DRBD
![Page 15: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/15.jpg)
Nothing can separate DRBD
![Page 16: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/16.jpg)
What causes DRBD problems
There are 3 types of problems.
1. Network error (bond1)
   → Outdated
2. Disk error (disk error or filesystem error)
   → Diskless
3. Role change without sync (typically caused by multiple host reboots)
   → Inconsistent
![Page 17: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/17.jpg)
1. Network problem
• When bond1 stops working between HA1 and HA2, DRBD devices on the standby node become Outdated.
How to fix?
• Fix the network issue first.
• Then DRBD will recover automatically.
• Without heartbeat, you may need manual intervention.
![Page 18: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/18.jpg)
Healthy State
![Page 19: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/19.jpg)
Bond1 stopped (ifdown bond1)
CS (Connection State) becomes WFConnection (Waiting For Connection).
ST (State, i.e. the role) becomes Unknown on the peer side.
DS (Disk State) becomes Outdated on secondary devices.
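The cs, st and ds fields can be pulled out of a status line with standard tools. A minimal sketch, assuming a status line made of space-separated `key:value` tokens. `drbd_field` is a hypothetical helper, and the sample line in the usage comment is illustrative, not captured output.

```shell
#!/bin/sh
# drbd_field KEY LINE
# Print the value of KEY (for example cs, st or ds) from one status line
# made of space-separated key:value tokens.
# Hypothetical helper for illustration; not part of DRBD.
drbd_field() {
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1://p"
}

# Example with an illustrative line for a secondary that lost its peer:
#   line=' 1: cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown C r---'
#   drbd_field cs "$line"    # connection state
#   drbd_field ds "$line"    # disk state, local/peer
```

A monitoring script could compare the `cs` value against `Connected` and raise an alert on anything else.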
![Page 20: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/20.jpg)
How to fix
• Find where the problem is. It can be bond1 on HA1, bond1 on HA2, or the network cable.
• Fix the network issue.
• Then the DRBD problem will be fixed automatically.
• If heartbeat is NOT running, DRBD may not be fixed automatically.
![Page 21: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/21.jpg)
Disk I/O error on secondary
• DRBD device will be Detached automatically upon disk error.
• drbd.conf
    resource r0 {
        disk {
            on-io-error detach;
        }
    }
![Page 22: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/22.jpg)
Disk I/O error on secondary
• Upon disk error, drbdadm detach <res> will run.
Secondary devices enter the Diskless state.
After fixing the disk issue, you need to attach:
    drbdadm attach all
If the internal data on the disk is broken, sync will run from the UpToDate device to the peer.
![Page 23: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/23.jpg)
Disk I/O error on secondary
• Fix the disk issue first.
• Then run drbdadm attach all
• Sync may run.
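Before running `drbdadm attach all`, it can help to confirm that the local side really is Diskless. A minimal sketch, assuming the status line carries a `ds:<local>/<peer>` token. Both helpers are hypothetical and the sample line is illustrative.

```shell
#!/bin/sh
# local_disk_state LINE
# Print the local half of the ds:<local>/<peer> token in a status line.
# Hypothetical helper for illustration; not part of DRBD.
local_disk_state() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^ds:\([^/]*\)\/.*/\1/p'
}

# needs_attach LINE
# Succeeds when the local disk state is Diskless.
needs_attach() {
    [ "$(local_disk_state "$1")" = "Diskless" ]
}

# Example with an illustrative line:
#   line=' 1: cs:Connected st:Secondary/Primary ds:Diskless/UpToDate C r---'
#   needs_attach "$line" && echo "fix the disk, then run: drbdadm attach all"
```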
![Page 24: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/24.jpg)
Disk I/O error on primary
• If a disk I/O error happens on the primary, the primary DRBD devices become Diskless.
![Page 25: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/25.jpg)
Disk I/O error on primary
• Fix the disk issue first. Then run drbdadm attach all on the bad node.
• Sync will run from UpToDate (secondary) to Inconsistent (primary).
![Page 26: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/26.jpg)
• Attach/Detach: attaches/detaches lower disks
• Connect/Disconnect: connects to / disconnects from the peer node
• Primary/Secondary: defines the role of the resource
• Invalidate: invalidates the data
• Discard data on the resource:
  Pre-DRBD-8.4: drbdadm -- --discard-my-data connect <res>
  DRBD-8.4: drbdadm connect --discard-my-data <res>
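Because the discard syntax differs between versions, a wrapper can pick the right form. A minimal sketch that only prints the command. `discard_connect_cmd` is a hypothetical helper, and it deliberately refuses any version outside the 8.x range covered by the slides.

```shell
#!/bin/sh
# discard_connect_cmd VERSION RESOURCE
# Print (do not run) the discard-my-data connect command for a DRBD version.
# Hypothetical helper for illustration; covers only the versions named above.
discard_connect_cmd() {
    case $1 in
        8.4*)     echo "drbdadm connect --discard-my-data $2" ;;
        8.[0-3]*) echo "drbdadm -- --discard-my-data connect $2" ;;
        *)        echo "unhandled DRBD version: $1" >&2; return 1 ;;
    esac
}

# Example:
#   discard_connect_cmd 8.2 r0
#   discard_connect_cmd 8.4 r0
```

Printing instead of executing keeps the destructive step under operator control, which matters for a command that throws data away.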
![Page 27: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/27.jpg)
How to check if split-brain happens
• Once split-brain (SB) happens, you see
  Split-Brain detected, dropping connection!
  in /var/log/messages
• When SB happens, at least one node becomes StandAlone. The peer can be WFConnection or StandAlone too.
• If SB happens, you need to discard data on one node.
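The log check can be scripted. A minimal sketch that counts matching lines in a log file; `split_brain_count` is a hypothetical helper, and the log path in the usage comment is the one named in the slides.

```shell
#!/bin/sh
# split_brain_count LOGFILE
# Print how many lines in LOGFILE contain the split-brain message.
# Hypothetical helper for illustration; not part of DRBD.
split_brain_count() {
    grep -cF "Split-Brain detected" "$1"
}

# Example:
#   if [ "$(split_brain_count /var/log/messages)" -gt 0 ]; then
#       echo "split-brain was logged; check the connection state on both nodes"
#   fi
```

A non-zero count only shows that split-brain was logged at some point, so the current connection state still needs to be checked on both nodes.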
![Page 28: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/28.jpg)
Sample plan to fix SB (1)
1. Take hostbackup
2. Identify the bad host
3. Identify which are primary and secondary (DRBD)
4. Stop DB
   service heartbeat stop (HA1/HA2)
   make sure DRBD partitions are not mounted
![Page 29: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/29.jpg)
Sample plan to fix SB (2)
• drbdadm disconnect all (HA1 / HA2)
• drbdadm secondary all (HA1 / HA2)
• drbdadm disconnect all (HA1 / HA2)
• drbdadm -- --discard-my-data connect all (only on bad host)
• drbdadm connect all (good host)
• drbdadm connect all (bad host)
![Page 30: DRBD é um amigo!](https://reader036.vdocuments.net/reader036/viewer/2022081720/54c901484a795961428b4574/html5/thumbnails/30.jpg)
Sample plan to fix SB (3)
5. Start heartbeat on the good host to make it Primary.
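The drbdadm steps of the sample plan can be written down per host. A minimal sketch that only prints the commands, so nothing is changed until an operator runs them. `sb_plan` is a hypothetical helper, and it uses the pre-8.4 discard syntax; on DRBD-8.4 the discard step is `drbdadm connect --discard-my-data all`.

```shell
#!/bin/sh
# sb_plan ROLE
# Print (do not run) the drbdadm steps of the sample plan for one host.
# ROLE is "good" (its data is kept) or "bad" (its data is discarded).
# Hypothetical helper for illustration; not part of DRBD.
sb_plan() {
    case $1 in
        good|bad) ;;
        *) echo "usage: sb_plan good|bad" >&2; return 2 ;;
    esac
    echo "drbdadm disconnect all"
    echo "drbdadm secondary all"
    echo "drbdadm disconnect all"
    if [ "$1" = bad ]; then
        echo "drbdadm -- --discard-my-data connect all"
    fi
    echo "drbdadm connect all"
}

# Example:
#   sb_plan bad     # review the output, then run it on the bad host
#   sb_plan good    # review the output, then run it on the good host
```

As in the plan, the bad host's discard step comes before the good host's connect, and heartbeat is started on the good host only after both sides have reconnected.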