Red Hat High Availability clusters use Pacemaker (the resource manager) and Corosync (the messaging layer) to keep critical services running when servers fail. Clustering is a core RHCA topic.
Cluster Components
- Corosync: cluster communication layer; tracks node membership and quorum
- Pacemaker: cluster resource manager; decides where resources run and recovers them after failures
- STONITH/Fencing: powers off or otherwise isolates a failed or unresponsive node so it cannot corrupt shared data ("Shoot The Other Node In The Head")
- CIB (Cluster Information Base): XML store of the cluster configuration and status, replicated to every node
Cluster Packages
# yum install pacemaker corosync pcs fence-agents-all -y
# pcs: Pacemaker/Corosync CLI management tool (installed above)
# crm: alternative CLI from the crmsh package (not shipped with RHEL; pcs is the supported tool)
Set Up a Two-Node Cluster
# On ALL nodes:
# systemctl start pcsd
# systemctl enable pcsd
# passwd hacluster # set the same password for the hacluster user on every node
# On NODE 1 only (RHEL 7 / pcs 0.9 syntax; RHEL 8 and later use "pcs host auth" and drop "--name"):
# pcs cluster auth node1 node2 -u hacluster -p password
# Create cluster:
# pcs cluster setup --name mycluster node1 node2
# Start cluster:
# pcs cluster start --all
# pcs cluster enable --all
# Check status:
# pcs status
# crm_mon # real-time monitor
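The authentication and setup commands above changed between pcs releases, so it can help to see both forms side by side. The following is a minimal sketch, not an official tool: pcs_setup_commands is a hypothetical helper name, and it assumes the version string has the usual dotted form such as 0.9.169 (RHEL 7) or 0.10.8 (RHEL 8). It only prints the commands; it does not run them.

```shell
#!/bin/sh
# Hypothetical helper: print the auth/setup commands that match a pcs version.
# pcs 0.9.x uses "pcs cluster auth" and "--name"; pcs 0.10 and later use
# "pcs host auth" and take the cluster name as a positional argument.
pcs_setup_commands() (
    # Split "major.minor.patch" on dots; the subshell keeps IFS changes local.
    IFS=. read -r major minor rest <<EOF
$1
EOF
    if [ "$major" -eq 0 ] && [ "$minor" -lt 10 ]; then
        echo "pcs cluster auth node1 node2 -u hacluster"
        echo "pcs cluster setup --name mycluster node1 node2"
    else
        echo "pcs host auth node1 node2 -u hacluster"
        echo "pcs cluster setup mycluster node1 node2"
    fi
)

pcs_setup_commands "0.9.169"
pcs_setup_commands "0.10.8"
```

On a real node the argument would typically come from the installed tool, for example pcs_setup_commands "$(pcs --version)", assuming that command prints a bare version number. Leaving out -p makes pcs prompt for the hacluster password instead of exposing it in the shell history.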
Configure STONITH (Fencing)
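# Fencing is REQUIRED for production clusters: a node that cannot be confirmed down is powered off before its resources are recovered elsewhere, which prevents data corruption during split-brain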
# Disable for testing (NOT for production):
# pcs property set stonith-enabled=false
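# Configure IPMI fencing (one device per node; newer fence agents name these options ip, username, password):
# pcs stonith create fence-node1 fence_ipmilan \
    ipaddr=192.168.1.201 login=admin passwd=pass pcmk_host_list=node1
# Test fencing (this really power-cycles node1):
# pcs stonith fence node1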
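The example above defines a fence device for node1 only, while a two-node cluster normally needs one device per node. The sketch below prints one pcs stonith create command per node. It is a dry run by default and rests on assumptions: the helper names run and create_fence_devices are hypothetical, 192.168.1.202 is an invented BMC address for node2, and the credentials are the placeholders already used above.

```shell
#!/bin/sh
# Dry-run sketch: print (or, with DRY_RUN=0, execute) one fence device per node.
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Each argument is a node=bmc_address pair.
create_fence_devices() {
    for pair in "$@"; do
        node="${pair%%=*}"   # text before the first "="
        bmc="${pair#*=}"     # text after the first "="
        run pcs stonith create "fence-$node" fence_ipmilan \
            ipaddr="$bmc" login=admin passwd=pass pcmk_host_list="$node"
    done
}

# node2's address is an assumption for illustration only.
create_fence_devices node1=192.168.1.201 node2=192.168.1.202
```

Reviewing the printed commands before setting DRY_RUN=0 is a cheap safeguard, since a fence device pointed at the wrong BMC can power off the wrong machine.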
Create Cluster Resources
# Virtual IP resource:
# pcs resource create virtual_ip ocf:heartbeat:IPaddr2 \
ip=192.168.1.100 cidr_netmask=24 op monitor interval=30s
# Apache resource:
# pcs resource create webserver ocf:heartbeat:apache \
configfile=/etc/httpd/conf/httpd.conf \
op monitor interval=30s
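# Group resources so they run on the same node and start in the listed order:
# pcs resource group add webgroup virtual_ip webserver
# Set ordering constraint (VIP before Apache); a group already enforces this, so it is only needed for ungrouped resources:
# pcs constraint order virtual_ip then webserver
# Set location constraint (prefer node1); for grouped resources, apply it to the group:
# pcs constraint location webgroup prefers node1=100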
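# List resources (pcs resource show is deprecated in pcs 0.10+; use pcs resource status or pcs resource config there):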
# pcs resource show
# pcs resource status
Cluster Management
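# Move resource to other node (implemented as a location constraint; if it is left behind, remove it with pcs resource clear webserver):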
# pcs resource move webserver node2
# Put node in standby (maintenance):
# pcs node standby node1
# Bring node back:
# pcs node unstandby node1
# Clear a resource's failure history so the cluster retries it:
# pcs resource cleanup webserver
# Stop cluster:
# pcs cluster stop --all
# Cluster config:
# pcs config show
# cat /etc/corosync/corosync.conf
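The standby and unstandby commands above are usually used as a pair around some maintenance task. The sketch below wraps a task between the two so the node is brought back even if the task fails. It is a dry run by default; the helper names run and with_standby are hypothetical, and yum update -y is only an example task, not something the steps above prescribe.

```shell
#!/bin/sh
# Dry-run sketch: put a node in standby, run a task, then bring the node back.
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: with_standby NODE COMMAND [ARGS...]
with_standby() {
    node="$1"
    shift
    run pcs node standby "$node" || return 1
    run "$@"
    status=$?                          # remember the task's exit code
    run pcs node unstandby "$node"     # always bring the node back
    return "$status"
}

# Example task (an assumption for illustration).
with_standby node1 yum update -y
```

In a real run, pcs node standby can return before resources have finished moving off the node, so checking pcs status before starting disruptive work is advisable.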
Quorum
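# Quorum: a partition may run resources only while it holds a majority of votes, which limits split-brain
# 2-node cluster: a lone survivor can never hold a majority. With Corosync 2 and later, pcs cluster setup enables the votequorum two_node option for this case, so working fencing is essential
# Legacy workaround (generally discouraged where two_node is available):
# pcs property set no-quorum-policy=ignore # 2-node clusters only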
# Check quorum:
# corosync-quorumtool -s
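The majority rule can be made concrete with a little arithmetic. Assuming one vote per node (the default), quorum needs floor(n / 2) + 1 votes. The sketch below prints that figure for a few cluster sizes; quorum_votes is a hypothetical helper name, not a cluster command.

```shell
#!/bin/sh
# Votes needed for quorum with one vote per node: floor(n / 2) + 1.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

for n in 2 3 4 5; do
    need=$(quorum_votes "$n")
    echo "$n nodes: quorum needs $need votes, tolerates $(( n - need )) node failure(s)"
done
```

With two nodes the result is 2 votes and zero tolerated failures, which is why two-node clusters need special quorum handling. It also shows that a fourth node tolerates no more failures than a third.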