Page 1: Red Hat Global File System (GFS)

1

Red Hat Storage Cluster

(GFS, CLVM, GNBD)

GFS allows multiple nodes to share storage at a block level as if the storage were connected locally to each cluster node.

Schubert Zhang, Guangxian Liao

Jul, 2008

Page 2: Red Hat Global File System (GFS)

2

Deployment

[Deployment diagram: three GFS nodes, each running Applications on a GFS file system mounted at /mnt/gfs, on top of a CLVM/LVM logical volume (vg-lv), backed by a GNBD client that imports /dev/gnbd/x. The GFS nodes connect over a TCP/IP network to two GNBD server nodes at the storage level, each exporting its local block devices (disks).]

“Economy and Performance” deployment.

Page 3: Red Hat Global File System (GFS)

3

Architecture

• Pay attention to the relationships among the cluster software components. – Cluster Infrastructure (Common): CMAN, DLM, CCS, Fencing

– GNBD: Server and Client

– CLVM

– GFS

Page 4: Red Hat Global File System (GFS)

4

Cluster Infrastructure • We are only concerned with the storage type of cluster.

• Necessary infrastructure components: – Cluster management (CMAN)

• It’s a distributed cluster manager and runs in each cluster node.

• Keeps track of cluster quorum and avoids "split-brain". (Like Chubby’s Paxos protocol?)

• Keeps track of membership.

• libcman.so library, cman_tool (see the inspection sketch at the end of this slide)

• dlm_controld: started by cman init script to manage dlm in kernel

• gfs_controld : started by cman init script to manage gfs in kernel

• groupd: started by cman init script to interface between openais/cman and dlm_controld/gfs_controld/fenced; group_tool

– Lock management (DLM) • To synchronize access to shared resources (shared storage, etc.).

• Runs in each cluster node.

• libdlm.so library.

• GFS and CLVM use locks from DLM.

• GFS uses locks from the lock manager to synchronize access to file system metadata (on shared storage).

• CLVM uses locks from the lock manager to synchronize updates to LVM volumes and volume groups (also on shared storage).

• Like Chubby’s lock service?
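The cman_tool and group_tool utilities mentioned above can be used to check quorum, membership, and group state on a running node. A minimal inspection sketch (output details vary by version):

# cman_tool status

# cman_tool nodes

# group_tool ls

cman_tool status reports the cluster name, quorum state, and votes; cman_tool nodes lists the members and their states; group_tool ls lists the fence/dlm/gfs groups managed by groupd.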

Page 5: Red Hat Global File System (GFS)

5

Cluster Infrastructure (cont.)

– Cluster configuration management (CCS) • Runs in each cluster node. (ccsd)

• Keeps the cluster configuration file synchronized and up to date; propagates modifications across the cluster.

• Other components (e.g., CMAN) access configuration information via CCS.

• /etc/cluster/cluster.conf (Cluster Name, Cluster nodes, Fence, Resources, etc.)

• ccs_tool: to make online updates of CCS configuration files

• ccs_test: to retrieve information from configuration files through ccsd.

– Fencing • Fencing is the disconnection of a node from the cluster's shared storage.

Fencing cuts off I/O from shared storage, thus ensuring data integrity. The cluster infrastructure performs fencing through the fence daemon, fenced.

• GNBD fencing: fence_gnbd?

• fence_tool

• The cluster configuration file specifies the fencing method, fencing agent, and fencing device for each node in the cluster (see the sketch below).
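As a rough illustration only, the fencing-related entries in cluster.conf have the following general shape. The <clusternode>/<fence>/<method>/<device> and <fencedevices>/<fencedevice agent=... name=...> structure is standard, but the fence_gnbd attributes shown here (servers, nodename) are assumptions that should be checked against the fence_gnbd man page:

<clusternode name="test2" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <device name="gnbdfence" nodename="test2"/>
    </method>
  </fence>
</clusternode>
<fencedevices>
  <fencedevice agent="fence_gnbd" name="gnbdfence" servers="test1 test4"/>
</fencedevices>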

Page 6: Red Hat Global File System (GFS)

6

GNBD

• An ancillary component of GFS that exports block-level storage over Ethernet.

• Global Network Block Device

– GNBD provides block-device access for Red Hat GFS over TCP/IP. GNBD is similar in concept to NBD;

– GNBD is GFS-specific and tuned solely for use with GFS.

• Two major components

– GNBD Server

– GNBD Client

Page 7: Red Hat Global File System (GFS)

7

GNBD (server)

• Exports block-level storage from its locally attached devices.

• gnbd_serv process

– The GNBD server need not join the cluster manager (it is not a cluster member).

• gnbd_export

– Export block devices.

Page 8: Red Hat Global File System (GFS)

8

GNBD (client)

• A GNBD client runs in a node with GFS.

• Imports block devices exported by a GNBD server.

• Multiple GNBD clients can access a device exported by a GNBD server, thus making GNBD suitable for use by a group of nodes running GFS.

• gnbd.ko – A kernel module

• gnbd_import – Imports remote block devices from a GNBD server.

Page 9: Red Hat Global File System (GFS)

9

CLVM

• Provides volume management of cluster storage.

• A cluster-wide version of LVM2

• CLVM provides the same capabilities as LVM2 on a single node, but makes the logical volumes created with CLVM available to all nodes in a cluster.

• CLVM uses the lock-management service provided by the cluster infrastructure.

• Using CLVM requires a minor change to /etc/lvm/lvm.conf to enable cluster-wide locking (see the sketch at the end of this slide).

• clvmd: – A daemon that provides clustering extensions to the standard LVM2 tool set and allows LVM2 commands to manage shared storage.

– Runs on each cluster node.

– Distributes LVM metadata updates in a cluster, thereby presenting each cluster node with the same view of the logical volumes.
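A minimal sketch of the lvm.conf change, assuming the stock RHEL 5 file: setting locking_type = 3 selects the built-in clustered locking used with clvmd (the lvmconf --enable-cluster helper makes the same change, where available).

# grep locking_type /etc/lvm/lvm.conf

locking_type = 3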

Page 10: Red Hat Global File System (GFS)

10

CLVM (cont.)

• lvm: LVM2 command line tools.

• /etc/lvm/lvm.conf

• pvcreate

– block devices/partitions -> PV

• vgcreate

– PV(s) -> VG

• lvcreate

– VG -> LV(s)

• Ready to make a file system on the LV.

Page 11: Red Hat Global File System (GFS)

11

GFS

• Allows the nodes to simultaneously access a block device that is shared among them.

• Single, consistent view of the FS name space across GFS nodes in a cluster.

• Native FS under VFS, POSIX interface to applications.

• Distributed metadata and multiple journals.

• Uses lock manager to coordinate I/O.

• When one node changes data on a GFS file system, that change is immediately visible to the other cluster nodes using that file system.

• Scale the cluster seamlessly by adding servers or storage on the fly.

• We use an “Economy and Performance” deployment.

Page 12: Red Hat Global File System (GFS)

12

GFS (cont.)

• gfs.ko: kernel module, loaded on each GFS cluster node.

• gfs_mkfs: create a GFS on a storage device.

• gfs_tool: configures or tunes a GFS.

• gfs_grow: grows a mounted GFS.

• gfs_jadd: adds journals to a mounted GFS.

• gfs_quota: manages quotas on a mounted GFS.

• gfs_fsck: repairs an unmounted GFS.

• mount.gfs: mount helper called by mount.
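A minimal usage sketch for two of these utilities, assuming the /mnt/gfs mount point used later in this deck:

# gfs_tool df /mnt/gfs

# gfs_jadd -j 1 /mnt/gfs

gfs_tool df reports block usage of the mounted file system; gfs_jadd -j 1 adds one journal so that one more node can mount it.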

Page 13: Red Hat Global File System (GFS)

13

Fencing

• We must configure each GFS node in the cluster for at least one form of fencing.

Page 14: Red Hat Global File System (GFS)

14

Setup a Cluster

(prepare) • Software Installation

– Install the default packages of “Clustering” and “Storage Clustering” on each node.

– The RPMs are on the CD-ROM under /ClusterStorage and /Cluster

– cman’s RPM is on the CD-ROM under /Server

– Major RPMs (if they have dependencies, install the depended-on RPMs as well) • cman-2.0.60-1.el5.i386.rpm

• modcluster-0.8-27.el5.i386.rpm

• gnbd-1.1.5-1.el5.i386.rpm

• kmod-gnbd-0.1.3-4.2.6.18_8.el5.i686.rpm

• kmod-gfs-0.1.16-5.2.6.18_8.el5.i686.rpm

• lvm2-cluster-2.02.16-3.el5.i386.rpm

• Global_File_System-en-US-5.0.0-4.noarch.rpm

• gfs-utils-0.1.11-1.el5.i386.rpm

• gfs2-utils-0.1.25-1.el5.i386.rpm

• etc.

• Network – Disable firewall and SELinux

– Enable multicast and IGMP.

– Configure /etc/hosts or DNS for the hostnames (important!); a matching /etc/hosts sketch is at the end of this slide.

• Machine hostnames – 192.168.1.251 test1 (gnbd server)

– 192.168.1.252 test2 (gfs node)

– 192.168.1.253 test3 (gfs node)

– 192.168.1.254 test4 (gnbd server)
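The corresponding /etc/hosts entries, identical on every node, simply map the addresses above:

192.168.1.251 test1

192.168.1.252 test2

192.168.1.253 test3

192.168.1.254 test4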

Page 15: Red Hat Global File System (GFS)

15

Setup a Cluster

(GNBD server) • On the GNBD server node, there is no need to start cman, i.e., the GNBD server is not a member of the cluster.

(1) Start GNBD server process (man gnbd_serv) # gnbd_serv -n

(2) Export block device (man gnbd_export) # gnbd_export -v -d /dev/sda3 -e gnbdnode1 -c

Note 1: caching (-c) must be enabled since cman is not running.

Note 2: the block device should be a disk partition (what about an LV? The documentation says an LV is not supported).

(3) Check the export

# gnbd_export -l

(4) Add to /etc/rc.local: gnbd_serv -n

gnbd_export -v -d /dev/sda3 -e gnbdnode1 -c

Page 16: Red Hat Global File System (GFS)

16

Setup a Cluster

(cluster infrastructure) • Initially configure a cluster

– /etc/cluster/cluster.conf: generated by “system-config-cluster” or manually. (only configures the cluster name and node members)

– On one GFS node, use “system-config-cluster” to create a new cluster (name: cluster1) and add a node (name: test2)

– The cluster.conf <?xml version="1.0" ?>

<cluster alias="cluster1" config_version="5" name="cluster1">

<fence_daemon post_fail_delay="0" post_join_delay="3"/>

<clusternodes>

<clusternode name="test2" nodeid="1" votes="1">

<fence/>

</clusternode>

</clusternodes>

<cman expected_votes="1" two_node="1"/>

<fencedevices/>

<rm>

<failoverdomains/>

<resources/>

</rm>

</cluster>

• Start infrastructure components – service cman start (refer to /etc/init.d/cman)

• Load kernel modules (configfs, dlm, lock_dlm)

• Mount configfs (I think it is like Chubby’s file system namespace)

• Start ccsd daemon

• Start cman (no daemon; use cman_tool join to join this node to the cluster)

• Start daemons (start groupd, fenced, dlm_controld, gfs_controld)

• Start fencing (start fenced daemon, and use fence_tool to join this node to fence domain)

Page 17: Red Hat Global File System (GFS)

17

Setup a Cluster

(GNBD client) • Load kernel module gnbd.ko

# echo "modprobe gnbd" > /etc/sysconfig/modules/gnbd.modules

# chmod 755 /etc/sysconfig/modules/gnbd.modules

# modprobe gnbd

Then, gnbd.ko will be loaded when the node boots up.

• Import GNBD # gnbd_import -i test1

Then, we can find a block device /dev/gnbd/gnbdnode1, and it is the same as /dev/gnbd0.

Create /etc/init.d/gnbd-client as a service script (a minimal sketch follows at the end of this slide) # chmod 755 /etc/init.d/gnbd-client

# chkconfig --add gnbd-client

Since gnbd_import -i must run earlier than clvmd and gnbd_import -R must run later than clvmd, we assign a special start number (23 < 24) and stop number (77 > 76) in the /etc/init.d/gnbd-client script.

Thus, the GNBD devices will be imported automatically when the node boots up.
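A minimal sketch of such an init script, assuming the test1 server imported above; the chkconfig header carries the start number 23 and stop number 77 discussed above, and gnbd_import -R is used at stop to release the imported devices:

#!/bin/bash
# gnbd-client: import GNBD devices before clvmd starts, release them after clvmd stops
# chkconfig: 35 23 77
# description: GNBD import/release service
case "$1" in
  start)
    gnbd_import -i test1
    ;;
  stop)
    gnbd_import -R
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac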

Page 18: Red Hat Global File System (GFS)

18

Setup a Cluster

(CLVM) • Start clvmd

# service clvmd start

# chkconfig --level 35 clvmd on

• pvcreate # pvcreate /dev/gnbd0, or

# pvcreate /dev/gnbd/gnbdnode1

Use lvmdiskscan or pvdisplay or pvscan to display status.

• vgcreate # vgcreate vg1 /dev/gnbd0

Use vgdisplay or vgscan to display status.

• lvcreate # lvcreate -l 100%FREE -n lv1 vg1

Use lvdisplay and lvscan to display status.

• Restart clvmd # service clvmd restart

clvmd is responsible for syncing the LVM configuration among the cluster nodes.

Now, we can find a new logical volume block device @ /dev/vg1/lv1, and we can make a file system on it.

Page 19: Red Hat Global File System (GFS)

19

Setup a Cluster

(GFS) • Make sure that the clocks on the GFS nodes are synchronized. (Use NTP)

• Make GFS file system # gfs_mkfs -p lock_dlm -t cluster1:testgfs -j 4 /dev/vg1/lv1

Note: the -j number should match the number of nodes; one journal is required for each node that mounts a GFS file system. Make sure to account for additional journals needed for future expansion.

• Mount GFS # mkdir /mnt/gfs

(1) # mount -t gfs -o acl /dev/vg1/lv1 /mnt/gfs

(2) add to /etc/fstab

/dev/vg1/lv1 /mnt/gfs gfs defaults,acl 0 0

# mount -a -t gfs

# chkconfig --level 35 gfs on

(refer to /etc/init.d/gfs)

Now, the GFS file system is accessible @ /mnt/gfs/
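A quick cross-node check of the shared namespace (node names from the earlier host list); a file written on one node should be immediately visible on the others:

On test2: # echo hello > /mnt/gfs/testfile

On test3: # cat /mnt/gfs/testfile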

Page 20: Red Hat Global File System (GFS)

20

Setup a Cluster

(add a new GFS node) • Install the packages on the new node

• Add the new node's information to cluster.conf on an existing node. – Use ccs_tool addnode (see the sketch at the end of this slide), or

– Use system-config-cluster

• Copy (scp) /etc/cluster/cluster.conf to the new node.

• Import GNBD on the new node

• Stop components on all running nodes (if there are already more than 2 nodes, this step is not needed) # service gfs stop

# service clvmd stop

# service cman stop

• Then start components on all running nodes and the newly added node # service cman start

# service clvmd start

# service gfs start

clvmd will sync the logical volume metadata to the newly added node, so when clvmd is started, /dev/vg1/lv1 will be visible on the new node.
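A rough sketch of the ccs_tool route, assuming node name test3 from the earlier host list; the option letters for node ID and votes are assumptions and should be checked against the ccs_tool man page:

# ccs_tool addnode test3 -n 2 -v 1

The updated cluster.conf is then copied (or propagated by ccsd) to the other nodes before starting cman on the new node.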

Page 21: Red Hat Global File System (GFS)

21

Setup a Cluster

(add a new GNBD node) • Set up a new GNBD server (machine test4) and export a new block device. # gnbd_serv -n

# gnbd_export -v -d /dev/sda3 -e gnbdnode2 -c

# gnbd_export -l

• Import the new GNBD (on all of the cluster nodes) # gnbd_import -i test4

# gnbd_import -l

• Make a new PV (on one of the cluster nodes) # pvcreate -v /dev/gnbd1

Then, we can find the new PV on all nodes in the cluster, by pvdisplay or lvmdiskscan or pvscan or pvs.

• Extend the VG (on one of the cluster nodes) # vgextend -v vg1 /dev/gnbd1

Then, we can find the extended VG and changed PV on all nodes in the cluster, by vgdisplay or vgscan or vgs.

• Extend the LV (on one of the cluster nodes) # lvextend -v -l +100%FREE /dev/vg1/lv1

Then, we can find the extended LV on all nodes in the cluster, by lvdisplay or lvscan or lvs.

• Grow the GFS (on one of the cluster nodes) (next page)

The listing below shows both imported devices: # gnbd_import -l

Device name : gnbdnode1

----------------------

Minor # : 0

sysfs name : /block/gnbd0

Server : test1

Port : 14567

State : Open Connected Clear

Readonly : No

Sectors : 3984120

Device name : gnbdnode2

----------------------

Minor # : 1

sysfs name : /block/gnbd1

Server : test4

Port : 14567

State : Close Connected Clear

Readonly : No

Sectors : 4273290

Page 22: Red Hat Global File System (GFS)

22

Setup a Cluster

(grow the GFS)

• The gfs_grow command must be run on a mounted file system. It only needs to be run on one node in the cluster.

• Grow

# gfs_grow /mnt/gfs

• Sometimes after gfs_grow, df and lvdisplay hang, and the system needs to be rebooted.
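When gfs_grow completes normally, the new size can be confirmed with ordinary tools, for example:

# df -h /mnt/gfs

# gfs_tool df /mnt/gfs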

Page 23: Red Hat Global File System (GFS)

23

Evaluation

• Size – GFS is based on a 64-bit architecture, which can theoretically accommodate an 8 EB file system. However, the currently supported maximum size of a GFS file system is 25 TB. (But there is a note in the Red Hat documentation: “If your system requires GFS file systems larger than 25 TB, contact your Red Hat service representative.”)

• Essential benchmarks – Refer to GlusterFS’s benchmarks:

http://www.gluster.org/docs/index.php/GlusterFS#GlusterFS_Benchmarks

• Iozone benchmarks – Refer to http://www.iozone.org

• Lock test – A process always blocks when it sets a new lock that conflicts with a lock held by another process.

Page 24: Red Hat Global File System (GFS)

24

Conclusions

• The cluster infrastructure is too complex and fragile; it sometimes fails (e.g., cman, fencing).

• GNBD is simple and robust, but lacks flexibility.

• CLVM is OK, but too complex to use.

• GFS is very fragile and sometimes fails. (mount is OK, but umount often fails)

• The “two level” (GNBD storage + GFS cluster) deployment does not meet the “cloud” goal.

• Not easy to add a new GFS cluster node or a GNBD storage node.

• No data replicas for safety.

• Risk when a GNBD node fails: when one GNBD node fails, the data on GFS is not accessible.

The Red Hat Cluster solution is not based on assumptions like those of GoogleFS and GlusterFS (i.e., “the system is built from many computers that often fail”), and it is neither easy nor safe to use in a moderate-scale cluster. So, I think the Red Hat Storage Cluster is not a well-designed solution, and has no good future.