This article will help you learn how to set up and configure a High-Availability (HA) cluster on Linux/Unix-based systems. A cluster is simply a group of computers (called nodes or members) that work together to execute a task. There are basically four types of clusters: Storage Clusters, High-Availability Clusters, Load-Balancing Clusters, and High-Performance Computing Clusters. In production, HA (High-Availability) and LB (Load-Balancing) clusters are the most commonly deployed types, as they offer uninterrupted availability of services and data (for example, web services) to the end-user community. HA cluster configurations are usually grouped into two subsets: Active-active and Active-passive.
Active-active: You typically need a minimum of two nodes, and both nodes actively run the same service/application. This model is mainly used to build a Load-Balancing (LB) cluster that distributes the workload across the nodes.
Active-passive: This also needs a minimum of two nodes to provide a fully redundant system. Here the service/application runs on only one node at a time, and it is mainly used to build a High-Availability (HA) cluster: one node is active while the other remains on standby (passive).
In our setup, we will focus only on the High-Availability (Active-passive) configuration, also known as a failover cluster. One of the biggest benefits of an HA cluster is that the nodes monitor each other and migrate the service/application to the surviving node if a node fails. The faulty node is not visible to clients from outside, although there is a short service disruption during the migration. The cluster also maintains the data integrity of the service it protects.
The High-Availability Cluster stack in RedHat/CentOS 7 is completely different from previous versions. From RedHat 7 onwards, Pacemaker is the default Cluster Resource Manager (RM), while Corosync is responsible for exchanging and regularly updating cluster membership information among the cluster nodes. Pacemaker and Corosync are powerful open-source technologies that completely replace CMAN and RGManager from previous versions of RedHat clustering.
This step-by-step guide will show you how to configure a High-Availability (HA)/failover cluster with common iSCSI shared storage on RHEL/CentOS 7.6. You can use the same guide for other versions of RHEL/CentOS/Fedora with a few minimal changes.
Prerequisites:
Operating System : CentOS Linux 7
Shared Storage : iSCSI SAN
Floating IP address : For Cluster nodes
Packages : pcs, fence-agents-all and targetcli
My Lab Setup :
For the lab setup, I am using three CentOS machines: two as cluster nodes and one as the iSCSI/Target server.
Node-1:
Operating System:- CentOS Linux 7 (Core)
hostname:- node1.lteck.local
IP Address:- 192.168.3.100
Node-2:
Operating System:- CentOS Linux 7 (Core)
hostname:- node2.lteck.local
IP Address: -192.168.3.101
iSCSI Server:
Operating System:- CentOS Linux 7 (Core)
hostname:- iscsi-server.local
IP Address:- 192.168.3.102
Block device :- /dev/sdb
Other Info:
Cluster Name :- linuxteck_cluster
Virtual IP:- 192.168.3.105
Step 1: Set up the Storage Server (iSCSI)
Use the following command to check the available block devices on the storage server.
# lsblk
The above command lists all the block devices (/dev/sda and /dev/sdb) in a tree format. In this demo, I will use the 1 GB disk "/dev/sdb" as the shared storage for the cluster nodes.
Note:
Add the following entry to the /etc/hosts file in the format "IP-Address Domain-name [Domain-aliases]". This helps resolve hostnames locally, so the cluster nodes can reach the storage server by name without relying on DNS.
# vi /etc/hosts
192.168.3.102 storage.lteck.local storage
Note:
To learn more about DNS, see: How to set up Domain Name Services (DNS) on Linux
First, let's update the system to the latest packages and then install the target utility package.
# yum update -y
# yum install -y targetcli
Now run the following command to enter the interactive shell of the iSCSI server.
# targetcli
targetcli shell version 2.1.fb49
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.
/>
(a) Create a backstore block device:
/> /backstores/block create ltecklun1 /dev/sdb
(b) Create iSCSI for IQN target:
/> /iscsi create iqn.2020-01.local.server-iscsi:server
(c) Create ACLs:
/> /iscsi/iqn.2020-01.local.server-iscsi:server/tpg1/acls create iqn.2020-01.local.client-iscsi:client1
(d) Create LUNs under the iSCSI target:
/> /iscsi/iqn.2020-01.local.server-iscsi:server/tpg1/luns create /backstores/block/ltecklun1
(e) Enable CHAP Authentication
/> cd /iscsi/iqn.2020-01.local.server-iscsi:server/tpg1
/iscsi/iqn.20...i:server/tpg1> set attribute authentication=1
/iscsi/iqn.20...i:server/tpg1> cd acls/iqn.2020-01.local.client-iscsi:client1
/iscsi/iqn.20...iscsi:client1> set auth userid=linuxteck
/iscsi/iqn.20...iscsi:client1> set auth password=password@123
/iscsi/iqn.20...iscsi:client1> cd /
/> ls
/> saveconfig
/> exit
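If you prefer to script the target configuration rather than type it into the interactive shell, targetcli also accepts the same commands as arguments. A rough non-interactive equivalent of the steps above (using the same backstore name, IQNs and CHAP credentials as in this demo) would be:
# targetcli /backstores/block create ltecklun1 /dev/sdb
# targetcli /iscsi create iqn.2020-01.local.server-iscsi:server
# targetcli /iscsi/iqn.2020-01.local.server-iscsi:server/tpg1/acls create iqn.2020-01.local.client-iscsi:client1
# targetcli /iscsi/iqn.2020-01.local.server-iscsi:server/tpg1/luns create /backstores/block/ltecklun1
# targetcli /iscsi/iqn.2020-01.local.server-iscsi:server/tpg1 set attribute authentication=1
# targetcli /iscsi/iqn.2020-01.local.server-iscsi:server/tpg1/acls/iqn.2020-01.local.client-iscsi:client1 set auth userid=linuxteck password=password@123
# targetcli saveconfig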
(f) Add a firewall rule to permit the iSCSI port 3260, or disable the firewall entirely
# firewall-cmd --permanent --add-port=3260/tcp
# firewall-cmd --reload
# firewall-cmd --list-all
OR
# systemctl disable firewalld.service
# systemctl stop firewalld.service
(g) Disable SELinux (or switch it to permissive mode)
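For this lab, a quick way to take SELinux out of the picture (assuming the stock /etc/selinux/config layout) is to switch it to permissive mode immediately and make the change persistent across reboots; in production, a proper SELinux policy for the target service is the better option:
# setenforce 0
# sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
# getenforce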
(h) Finally, enable and start the iSCSI target.
# systemctl enable target.service
# systemctl restart target.service
# systemctl status target.service
Step 2: Set up the High-Availability (HA) Cluster
Add the following host entries on all the nodes in the cluster and on the shared storage server. This helps the systems communicate with each other using hostnames.
Node:1
# vi /etc/hosts
192.168.3.100 node1.lteck.local node1
192.168.3.101 node2.lteck.local node2
192.168.3.102 storage.lteck.local storage
Node:2
# vi /etc/hosts
192.168.3.100 node1.lteck.local node1
192.168.3.101 node2.lteck.local node2
192.168.3.102 storage.lteck.local storage
(a) Import the LUNs on all the nodes across the cluster (Node1 and Node2)
(i) Before importing the LUN from the shared storage, update both nodes (Node1 and Node2) to the latest CentOS 7.x packages
# yum update -y
(ii) Install the iscsi-initiator package on both nodes (Node1 and Node2)
# yum install -y iscsi-initiator-utils
(iii) Use the following command to set the initiator name on both nodes (Node1 and Node2). Use the initiator name that was already added to the ACL on the target server; in our case it is "iqn.2020-01.local.client-iscsi:client1".
# vi /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2020-01.local.client-iscsi:client1
(iv) Save the file, then restart and enable the iscsid service on both nodes
# systemctl restart iscsid.service
# systemctl enable iscsid.service
# systemctl status iscsid.service
(v) Next, configure CHAP authentication on both nodes (Node1 and Node2)
# vi /etc/iscsi/iscsid.conf
node.session.auth.authmethod = CHAP
node.session.auth.username = linuxteck
node.session.auth.password = password@123
Save the file:
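Restarting iscsid after editing the file does no harm and ensures the new CHAP values are used as the defaults for the node records created during discovery (same service command as in the previous step):
# systemctl restart iscsid.service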
(vi) Now it is time to discover the iSCSI shared storage (LUNs) on both nodes (Node1 and Node2)
# iscsiadm --mode discoverydb --type sendtargets --portal 192.168.3.102 --discover
Output:
192.168.3.102:3260,1 iqn.2020-01.local.server-iscsi:server
(vii) Use the following command to log in to the Target Server:
# iscsiadm -m node --login
Output:
Logging in to [iface: default, target: iqn.2020-01.local.server-iscsi:server, portal: 192.168.3.102,3260] (multiple)
Login to [iface: default, target: iqn.2020-01.local.server-iscsi:server, portal: 192.168.3.102,3260] successful.
(viii) Use the following command to verify the newly added disk on both nodes
# lsblk
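Besides lsblk, you can also confirm that the iSCSI session to the target is established on each node (standard iscsiadm usage; add -P 3 for a detailed view):
# iscsiadm -m session
# iscsiadm -m session -P 3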
(ix) Use the following command to create a filesystem on the newly added block device (/dev/sdb). Run it on only one of your nodes, either Node1 or Node2; in this demo I will run it on Node1.
# mkfs.xfs /dev/sdb
Note:
For testing purposes, use the following steps on Node1 to temporarily mount the newly added disk on the /mnt directory, create three files named "1", "2" and "3", verify with the 'ls' command that the files are in /mnt, and finally unmount /mnt.
# mount /dev/sdb /mnt
# cd /mnt
[root@node1 mnt]# touch 1 2 3
[root@node1 mnt]# ls
1 2 3
[root@node1 mnt]# cd
[root@node1 ~]# umount /mnt/
Now move on to Node2 and run the following commands to check whether the files created on Node1 are available on Node2.
[root@node2 ~]# mount /dev/sdb /mnt/
[root@node2 ~]# cd /mnt/
[root@node2 mnt]# ls
1 2 3
[root@node2 mnt]# cd
[root@node2 ~]# umount /mnt/
Note: The files created on Node1 are visible from Node2, which confirms that both nodes can access the shared iSCSI LUN. Do not keep this plain XFS filesystem mounted on both nodes at the same time; in the cluster it will be mounted on only one node at a time.
(b) Install and configure the cluster
(i) Use the following command to install the cluster packages (pcs and the fence agents; pacemaker and corosync are pulled in as dependencies) on both nodes (Node1 and Node2)
# yum install pcs fence-agents-all -y
Note: If firewalld is running, allow the high-availability service on both nodes:
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload
# firewall-cmd --list-all
(ii) Now start the pcsd service and enable it to start at every reboot on both nodes (Node1 and Node2).
# systemctl start pcsd
# systemctl enable pcsd
# systemctl status pcsd
(iii) Cluster configuration: use the following command to set the password for the "hacluster" user on both nodes (Node1 and Node2).
# echo <EnterYourPassword> | passwd --stdin hacluster
Note: The "hacluster" user is created automatically when the pcs package is installed. Set the same password on both nodes.
(iv) Use the following command to authorize the nodes. Run it on only one of the nodes in the cluster; in our case, I will run it on Node1.
# pcs cluster auth node1.lteck.local node2.lteck.local
Output:
Username: hacluster
Password:
node2.lteck.local: Authorized
node1.lteck.local: Authorized
Note: When prompted, enter the "hacluster" username and the password you set in the previous step.
(v) Start and configure the cluster nodes. Run the following command on only one of the nodes; in our case, Node1.
# pcs cluster setup --start --enable --name linuxteck_cluster node1.lteck.local node2.lteck.local
Note: This command generates the corosync configuration, synchronizes it to both nodes, and starts the cluster on them.
(vi) Enable the cluster service to start automatically at every reboot
# pcs cluster enable --all
Output:
node1.lteck.local: Cluster Enabled
node2.lteck.local: Cluster Enabled
(vii) Use the following commands to get a simple or a detailed cluster status
# pcs cluster status
Output:
Cluster Status:
Stack: corosync
Current DC: node1.lteck.local (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Wed Mar 11 19:46:41 2020
Last change: Wed Mar 11 18:58:35 2020 by hacluster via crmd on node1.lteck.local
2 nodes configured
0 resources configured
PCSD Status:
node1.lteck.local: Online
node2.lteck.local: Online
Note: For a more detailed status, including resources and daemon status, run:
# pcs status
Output:
Cluster name: linuxteck_cluster
WARNINGS:
No stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: node1.lteck.local (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Wed Mar 11 19:47:06 2020
Last change: Wed Mar 11 18:58:35 2020 by hacluster via crmd on node1.lteck.local
2 nodes configured
0 resources configured
Online: [ node1.lteck.local node2.lteck.local ]
No resources
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Note: You can also validate the cluster configuration with crm_verify:
# crm_verify -L -V
At this stage crm_verify complains because STONITH is enabled by default but no STONITH devices have been defined. We will address this in the next step.
(viii) Setup Fencing
Fencing, also known as STONITH ("Shoot The Other Node In The Head"), is one of the most important tools in a cluster; it safeguards the shared storage against data corruption. Fencing plays a vital role when the nodes are unable to talk to each other, because it cuts the faulty node off from the shared storage. There are two types of fencing: resource-level fencing and node-level fencing.
For this demo, I am not going to configure fencing (STONITH), as our machines are running in a VMware lab environment that doesn't support it. If you are implementing this in a production environment, see our separate guide for the complete fencing setup.
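For reference only, on physical hardware with IPMI-capable management boards, a node-level fence device could be defined roughly as follows (the fence agent, device address, credentials and resource name below are placeholders for illustration, not part of this lab):
# pcs stonith create fence_node1 fence_ipmilan pcmk_host_list="node1.lteck.local" ipaddr="192.168.3.200" login="admin" passwd="ipmi-password" lanplus=1 op monitor interval=60s
# pcs property set stonith-enabled=true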
Use the following commands to disable STONITH, set the no-quorum policy to ignore, and then list the cluster properties to confirm both settings:
# pcs property set stonith-enabled=false
# pcs property set no-quorum-policy=ignore
# pcs property list
Output:
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: linuxteck_cluster
dc-version: 1.1.20-5.el7_7.2-3c4c782f70
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false
(ix) Resources / Cluster Services
For clustered services, a resource can be either a physical hardware unit such as a disk drive, or a logical unit such as an IP address, a filesystem or an application. In a cluster, a resource runs on only one node at a time. In our demo we will use the following resources:
Httpd Service
IP Address
Filesystem
First, let us install and configure the Apache server on both nodes (Node1 and Node2). Follow these steps:
# yum install -y httpd
Add the entries below at the end of the Apache configuration file ('/etc/httpd/conf/httpd.conf') on both nodes
# vi /etc/httpd/conf/httpd.conf
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from 127.0.0.1
</Location>
Save the file.
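Optionally, you can check that the Apache configuration is still syntactically valid after the change (standard Apache tooling, independent of the cluster):
# apachectl configtest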
Note: On one node (Node1), mount the shared disk temporarily on /var/www, create the document root and a test page, and then unmount it:
# mount /dev/sdb /var/www/
# mkdir /var/www/html
# echo "Red Hat Hight Availability Cluster on LinuxTeck" > /var/www/html/index.html
# umount /var/www
Note: Allow HTTP/HTTPS traffic through the firewall on both nodes, or disable firewalld:
# firewall-cmd --permanent --add-port=80/tcp
# firewall-cmd --permanent --add-port=443/tcp
# firewall-cmd --reload
# firewall-cmd --list-all
OR
# systemctl disable firewalld.service
# systemctl stop firewalld.service
Disable SELinux, or see our separate guide on configuring SELinux for Apache.
(x) Create resources. In this section we will add three cluster resources: a Filesystem resource named APACHE_FS, a floating IP address resource named APACHE_VIP, and a webserver resource named APACHE_SERV. Use the following commands to add all three resources to the same group.
(i) Add the first resource: a Filesystem resource backed by the shared storage (iSCSI server)
# pcs resource create APACHE_FS Filesystem device="/dev/sdb" directory="/var/www" fstype="xfs" --group apache
Output:
Assumed agent name 'ocf:heartbeat:Filesystem' (deduced from 'Filesystem')
(ii) Add a second resource: Floating IP address
# pcs resource create APACHE_VIP IPaddr2 ip=192.168.3.105 cidr_netmask=24 --group apache
Output:
Assumed agent name 'ocf:heartbeat:IPaddr2' (deduced from 'IPaddr2')
(iii) Add the third resource: APACHE_SERV
# pcs resource create APACHE_SERV apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status" --group apache
Output:
Assumed agent name 'ocf:heartbeat:apache' (deduced from 'apache')
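If you want to review the resources you have just created and their parameters, pcs can display them individually or as a group (pcs 0.9 syntax, as shipped with RHEL/CentOS 7):
# pcs resource show
# pcs resource show apache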
Note: Make sure the cluster is running on all nodes:
# pcs cluster start --all
Output:
node1.lteck.local: Starting Cluster (corosync)...
node2.lteck.local: Starting Cluster (corosync)...
node2.lteck.local: Starting Cluster (pacemaker)...
node1.lteck.local: Starting Cluster (pacemaker)...
Note: Check the cluster status and confirm that all three resources have started; in this demo they all start on Node1:
# pcs status
Output:
Cluster name: linuxteck_cluster
Stack: corosync
Current DC: node1.lteck.local (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Thu Mar 12 19:09:13 2020
Last change: Thu Mar 12 19:09:00 2020 by root via cibadmin on node1.lteck.local
2 nodes configured
3 resources configured
Online: [ node1.lteck.local node2.lteck.local ]
Full list of resources:
Resource Group: apache
APACHE_FS (ocf::heartbeat:Filesystem): Started node1.lteck.local
APACHE_VIP (ocf::heartbeat:IPaddr2): Started node1.lteck.local
APACHE_SERV (ocf::heartbeat:apache): Started node1.lteck.local
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Note: All three resources are running on node1.lteck.local. You should now be able to reach the test page through the Virtual IP (192.168.3.105).
(xi) Test High-Availability (HA)/Failover Cluster
The final step in our High-Availability cluster setup is the failover test: we manually stop the cluster on the active node (Node1), check the status from Node2, and try to access our webpage using the Virtual IP.
# pcs cluster stop node1.lteck.local
Output:
node1.lteck.local: Stopping Cluster (pacemaker)...
node1.lteck.local: Stopping Cluster (corosync)...
Note: Now check the cluster status from Node2:
[root@node2 ~]# pcs status
Output:
Cluster name: linuxteck_cluster
Stack: corosync
Current DC: node2.lteck.local (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Thu Mar 12 21:59:04 2020
Last change: Thu Mar 12 19:09:00 2020 by root via cibadmin on node1.lteck.local
2 nodes configured
3 resources configured
Online: [ node2.lteck.local ]
OFFLINE: [ node1.lteck.local ]
Full list of resources:
Resource Group: apache
APACHE_FS (ocf::heartbeat:Filesystem): Started node2.lteck.local
APACHE_VIP (ocf::heartbeat:IPaddr2): Started node2.lteck.local
APACHE_SERV (ocf::heartbeat:apache): Started node2.lteck.local
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Note: All resources have failed over to node2.lteck.local, and the web page is still reachable through the Virtual IP.
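Optionally, confirm from any machine on the network that the test page is still served through the Virtual IP, and then bring Node1 back into the cluster once the failover test is finished:
# curl http://192.168.3.105
# pcs cluster start node1.lteck.local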
Additionally, here are some more cluster commands that may help you manage your cluster.
Start or stop the cluster (adding the '--all' option will start/stop the cluster on all nodes)
# pcs cluster start
# pcs cluster stop
To stop the cluster service on a particular node
# pcs cluster stop node2.lteck.local
Add a new node to the cluster
# pcs cluster node add newnode.lteck.local
Remove a node from the cluster
# pcs cluster node remove newnode.lteck.local
Remove the fence device that was configured for the removed node (if any)
# pcs stonith remove fence_newnode
How to put a node on standby
# pcs cluster standby newnode.lteck.local
How to put the entire cluster on standby
# pcs cluster standby --all
How to revoke/unset standby
# pcs cluster unstandby --all
How to find quorum status
# corosync-quorumtool
Cluster configuration file
# /etc/corosync/corosync.conf
How to find the status of the resource
# pcs resource show
How to find the fencing status
# pcs stonith show
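One more command that can be handy: manually move a resource group to a specific node. Note that this works by creating a location constraint behind the scenes, which you may want to remove afterwards using the pcs constraint commands (the group name 'apache' is the one created earlier in this guide):
# pcs resource move apache node2.lteck.local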
Congratulations, you have successfully configured a two-node High-Availability cluster on RHEL/CentOS 7.6. If you face any difficulties while configuring it, let us know through the comment box.
I hope this article helps you understand a few things about HA/failover clusters. Drop me your feedback or comments, and if you like this article, kindly share it; it may help others as well.
Thank you!