Red Hat Cluster Configuration: High Availability with Pacemaker and Corosync

Red Hat High Availability clusters use Pacemaker (the resource manager) and Corosync (the messaging and membership layer) to keep critical services running when a server fails, by restarting those services on a surviving node. Clustering is a core RHCA topic.

Cluster Components

  • Corosync: cluster communication layer; tracks node membership and quorum
  • Pacemaker: cluster resource manager; decides where resources run and recovers them after failures
  • STONITH/fencing: powers off or isolates an unresponsive node so it cannot corrupt shared data ("Shoot The Other Node In The Head")
  • CIB (Cluster Information Base): XML database holding the cluster configuration and status, replicated to every node

Cluster Packages

# On ALL nodes (requires the High Availability Add-On repository; dnf works the same on RHEL 8/9):
# yum install pacemaker corosync pcs fence-agents-all -y

# pcs: Pacemaker/Corosync command-line management tool (the standard tool on RHEL)
# crm: alternative CLI from the crmsh package (not shipped with RHEL; common on SUSE)

Set Up a Two-Node Cluster

# On ALL nodes:
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload
# systemctl start pcsd
# systemctl enable pcsd
# passwd hacluster                    # set the hacluster password (same on every node)

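# On NODE 1 only (RHEL 7, pcs 0.9):
# pcs cluster auth node1 node2 -u hacluster -p password
# RHEL 8/9 (pcs 0.10+) equivalent:
# pcs host auth node1 node2 -u hacluster -p password

# Create cluster (RHEL 7):
# pcs cluster setup --name mycluster node1 node2
# RHEL 8/9 (the --name option was dropped):
# pcs cluster setup mycluster node1 node2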

# Start cluster:
# pcs cluster start --all
# pcs cluster enable --all            # start corosync and pacemaker automatically at boot

# Check status:
# pcs status
# crm_mon                             # real-time monitor
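The node-1 setup steps differ between pcs versions, which is easy to trip over when a RHEL 7 guide is used on RHEL 8/9. A minimal sketch, assuming bash, that wraps those steps in one function and picks the syntax from the installed pcs version. `bootstrap_cluster`, `pcs_is_legacy`, `run`, and the `DRY_RUN`/`PCS_VERSION` variables are hypothetical names for illustration, not part of pcs.

```shell
# Hypothetical helper library: source it in bash, then call bootstrap_cluster on node 1 only.
# Assumes pcsd is running and the hacluster password is set on every node.

# Execute a command, or just print it when DRY_RUN=1 (hypothetical convenience flag).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Succeeds when pcs is older than 0.10, i.e. uses the RHEL 7 syntax.
# PCS_VERSION (hypothetical) overrides the value reported by "pcs --version".
pcs_is_legacy() {
    local ver major rest minor
    ver="${PCS_VERSION:-$(pcs --version)}"
    major="${ver%%.*}"      # "0" from "0.10.8"
    rest="${ver#*.}"        # "10.8"
    minor="${rest%%.*}"     # "10"
    [ "$major" -eq 0 ] && [ "$minor" -lt 10 ]
}

# Usage: bootstrap_cluster CLUSTER_NAME HACLUSTER_PASSWORD NODE...
bootstrap_cluster() {
    local name="$1" pass="$2"
    shift 2
    if pcs_is_legacy; then
        run pcs cluster auth "$@" -u hacluster -p "$pass" || return
        run pcs cluster setup --name "$name" "$@" || return
    else
        run pcs host auth "$@" -u hacluster -p "$pass" || return
        run pcs cluster setup "$name" "$@" || return
    fi
    run pcs cluster start --all || return
    run pcs cluster enable --all
}
```

For example, `DRY_RUN=1 bootstrap_cluster mycluster password node1 node2` prints the commands for review before anything runs. Passing the password with `-p` exposes it in the process list and shell history, so for real use you may prefer to run the auth step by hand without `-u`/`-p` and let pcs prompt for the credentials.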

Configure STONITH (Fencing)

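# Fencing is REQUIRED for production clusters: a node that stops responding is powered off
# before its resources are started elsewhere, so two nodes never write to the same data
# (Red Hat does not support clusters with fencing disabled)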

# Disable for testing (NOT for production):
# pcs property set stonith-enabled=false

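# Configure IPMI fencing (one device per node; repeat for node2 with its own BMC address):
# pcs stonith create fence-node1 fence_ipmilan \
    ipaddr=192.168.1.201 login=admin passwd=pass pcmk_host_list=node1
# (newer fence-agents releases name these options ip=, username=, password=; the old names are deprecated)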

# Test fencing (this really power-cycles node1, so run it from node2):
# pcs stonith fence node1
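Every node needs its own fence device, and the command above only covers node1. A minimal sketch, assuming bash and IPMI-capable BMCs, that creates one fence_ipmilan device per node from NODE=BMC_ADDRESS pairs, using the same option names as the command above. `create_ipmi_fencing`, `run`, and the `DRY_RUN` flag are hypothetical, and the BMC addresses in the usage line are placeholders.

```shell
# Hypothetical helper: one fence_ipmilan STONITH device per node. Source it in bash.

# Execute a command, or just print it when DRY_RUN=1 (hypothetical convenience flag).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: create_ipmi_fencing BMC_USER BMC_PASSWORD NODE=BMC_ADDRESS...
create_ipmi_fencing() {
    local user="$1" pass="$2" pair node bmc
    shift 2
    for pair in "$@"; do
        node="${pair%%=*}"    # text before the first "="
        bmc="${pair#*=}"      # text after the first "="
        # Reject arguments without "=" or with an empty node name.
        if [ -z "$node" ] || [ "$node" = "$pair" ]; then
            echo "bad NODE=BMC_ADDRESS pair: $pair" >&2
            return 1
        fi
        # pcmk_host_list ties each device to the one node it is allowed to fence.
        run pcs stonith create "fence-$node" fence_ipmilan \
            ipaddr="$bmc" login="$user" passwd="$pass" pcmk_host_list="$node" || return
    done
}
```

For example, `DRY_RUN=1 create_ipmi_fencing admin pass node1=192.168.1.201 node2=192.168.1.202` prints one `pcs stonith create` command per node; drop `DRY_RUN=1` to create the devices.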

Create Cluster Resources

# Virtual IP resource:
# pcs resource create virtual_ip ocf:heartbeat:IPaddr2 \
    ip=192.168.1.100 cidr_netmask=24 op monitor interval=30s

# Apache resource:
# pcs resource create webserver ocf:heartbeat:apache \
    configfile=/etc/httpd/conf/httpd.conf \
    op monitor interval=30s

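# Group resources to run together (members start in the listed order on the same node
# and stop in reverse order, so the VIP always comes up before Apache):
# pcs resource group add webgroup virtual_ip webserver

# Ordering constraint (VIP before Apache): only needed when the resources are NOT grouped,
# because a group already enforces this order:
# pcs constraint order virtual_ip then webserver

# Set location constraint (prefer node1); apply it to the group, not to a member:
# pcs constraint location webgroup prefers node1=100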

# List resources:
# pcs resource status
# pcs resource config                 # full resource settings (RHEL 8/9; "pcs resource show --full" on RHEL 7)
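The resource commands in this section can be collected into one function so the same web service can be recreated with a different VIP or preferred node. A minimal sketch, assuming bash, the resource names used above, and a /24 network; `create_web_group`, `run`, and `DRY_RUN` are hypothetical. It relies on the group for ordering and colocation and puts the single location preference on the group.

```shell
# Hypothetical helper: VIP + Apache as one group. Source it in bash.

# Execute a command, or just print it when DRY_RUN=1 (hypothetical convenience flag).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: create_web_group VIP PREFERRED_NODE
create_web_group() {
    local vip="$1" preferred="$2"
    # cidr_netmask=24 is an assumption carried over from the example above.
    run pcs resource create virtual_ip ocf:heartbeat:IPaddr2 \
        ip="$vip" cidr_netmask=24 op monitor interval=30s || return
    run pcs resource create webserver ocf:heartbeat:apache \
        configfile=/etc/httpd/conf/httpd.conf op monitor interval=30s || return
    # Group order is start order: VIP first, then Apache, always on the same node.
    run pcs resource group add webgroup virtual_ip webserver || return
    # Score 100 is a preference, not a requirement: the group still fails over.
    run pcs constraint location webgroup prefers "$preferred=100"
}
```

For example, `DRY_RUN=1 create_web_group 192.168.1.100 node1` prints the four pcs commands. A finite score such as 100 was chosen over INFINITY so the group can still run on node2 when node1 is down.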

Cluster Management

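# Move resource to other node (move the whole group, not a single member):
# pcs resource move webgroup node2
# The move works by adding a location constraint, which some pcs versions leave in place;
# remove it when the group should no longer be pinned to node2:
# pcs resource clear webgroup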

# Put node in standby (maintenance):
# pcs node standby node1

# Bring node back:
# pcs node unstandby node1

# Clear a resource's failed actions and fail count (after fixing the cause):
# pcs resource cleanup webserver

# Stop cluster:
# pcs cluster stop --all

# Cluster config:
# pcs config show
# cat /etc/corosync/corosync.conf
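Standby and unstandby are easy to pair by hand but also easy to forget after a failed maintenance run. A minimal sketch, assuming bash, of a wrapper that puts a node in standby, waits for resources to move off, runs a maintenance command, and always brings the node back, returning the command's exit status. `with_node_in_standby`, `run`, and `DRY_RUN` are hypothetical names; the maintenance command runs on the local machine, so call the wrapper on the node being serviced and only for commands that do not reboot it.

```shell
# Hypothetical helper: standby -> maintenance -> unstandby. Source it in bash.

# Execute a command, or just print it when DRY_RUN=1 (hypothetical convenience flag).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: with_node_in_standby NODE COMMAND [ARG...]
with_node_in_standby() {
    local node="$1" rc
    shift
    run pcs node standby "$node" || return
    # Block until the cluster settles, i.e. resources have moved off the node.
    run crm_resource --wait || return
    "$@"          # the maintenance command itself (never affected by DRY_RUN)
    rc=$?
    # Always bring the node back, even if the maintenance command failed.
    run pcs node unstandby "$node" || return
    return "$rc"
}
```

For example, on node1: `with_node_in_standby node1 yum -y update httpd` (the yum command is only a placeholder). With `DRY_RUN=1` the pcs and crm_resource steps are printed while the maintenance command still runs, which makes it easy to rehearse the sequence.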

Quorum

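# Quorum prevents split-brain: a partition may run resources only while it holds a majority
# of the votes (3 nodes need 2 votes, 5 nodes need 3)
# 2-node cluster: a lone survivor has only 1 of 2 votes, so pcs cluster setup automatically
# adds "two_node: 1" to the quorum section of corosync.conf; the survivor keeps quorum and
# fencing decides which node wins. Older guides use the setting below instead; it turns off
# quorum protection entirely and should not be needed when two_node is set:
# pcs property set no-quorum-policy=ignore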

# Check quorum:
# corosync-quorumtool -s
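The majority rule is simple arithmetic, and it shows why two nodes are a special case: a 2-node cluster needs 2 votes, so without two_node a lone survivor would lose quorum. A minimal sketch, assuming bash and one vote per node (the corosync default). `quorum_needed` and `has_two_node_mode` are hypothetical helper names, and the grep only matches the plain `two_node: 1` line that pcs writes.

```shell
# Hypothetical quorum helpers. Source them in bash.

# Votes needed for quorum with one vote per node: a strict majority, floor(n/2) + 1.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds if the corosync config enables two-node mode ("two_node: 1").
# The path argument defaults to the standard location.
has_two_node_mode() {
    local conf="${1:-/etc/corosync/corosync.conf}"
    grep -Eq '^[[:space:]]*two_node:[[:space:]]*1[[:space:]]*$' "$conf"
}
```

For example, `quorum_needed 3` prints 2 and `quorum_needed 2` prints 2, while `has_two_node_mode && echo "two-node mode enabled"` checks the live configuration.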