Existing Pacemaker how-to guides, like Highly Available iSCSI Target, use Heartbeat, which is an alternative to Pacemaker. Unfortunately, such documentation does not mention that Heartbeat is no longer the preferred option. The Ubuntu ClusterStack docs and others use Pacemaker, but are complicated, incomplete, or application-specific. ClusterLabs' Ubuntu Quickstart is clean and simple, but like the other guides it lacks a critical bit of information, which is:
Corosync uses the hostname to bind to an interface. If the local machine’s hostname resolves to 127.0.1.1 (as it should), Corosync will only bind to the loopback interface, and other nodes will be unreachable (OFFLINE).
This may not be a problem in later versions of Corosync, but it affects the version installed on Ubuntu 12.04.
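One quick way to see which address your hostname currently resolves to (and therefore which interface Corosync will bind to) is to query the resolver directly; the output shown in the comments assumes the example hostname and address used later in this guide:

getent hosts "$(hostname)"
# If this prints 127.0.1.1, Corosync will bind to loopback and the
# other nodes will appear OFFLINE. After fixing /etc/hosts (Step 2),
# it should print the node's static address, e.g.:
#   10.0.0.1   node1.test.local node1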
Step 1: Setup Network
Perform this step on all nodes in the cluster.
Typically nodes in clusters don’t rely on DHCP or DNS, so configure the network interface of each machine to use a static IP by editing /etc/network/interfaces to look something like this:
# Replace ethN with the interface name, e.g. eth0 on machines with a single interface
auto ethN
iface ethN inet static
    # change the address for each node
    address 10.0.0.1
    netmask 255.255.255.0
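To apply the change without rebooting, restarting the interface is usually enough; the command below assumes Ubuntu 12.04's ifupdown tooling and that ethN is the interface you edited:

sudo ifdown ethN && sudo ifup ethN   # replace ethN with the real interface, e.g. eth0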
In a two-node cluster you can connect both nodes directly using a cross-over cable. If using a switch, make sure it allows multicast.
Step 2: Setup Hosts
Perform this step on all nodes in the cluster.
Since DNS is typically not used, and Corosync relies on host names to communicate between nodes and bind to network interfaces, you’ll need to edit /etc/hosts. For a server with the hostname ‘node1.test.local’ and IP address 10.0.0.1, /etc/hosts will look like this:
127.0.0.1       localhost
#127.0.1.1      node1.test.local node1
10.0.0.1        node1.test.local node1
10.0.0.2        node2.test.local node2
10.0.0.N        nodeN.test.local nodeN    # add a line for each additional node
You'll need to comment out the 127.0.1.1 line, then add the IP address and hostname of every other node in the cluster. Make the changes to /etc/hosts on all nodes in the cluster. If everything has gone well, all nodes will be 'pingable' at their IP addresses. Note that you should see this:
node1:$ ping node1
PING node1.test.local (10.0.0.1) 56(84) bytes of data
64 bytes from node1.test.local (10.0.0.1): icmp_req=1 ttl=64 time=0.060 ms
...
and not this:
node1:$ ping node1
PING node1.test.local (127.0.1.1) 56(84) bytes of data
64 bytes from node1.test.local (127.0.1.1): icmp_req=1 ttl=64 time=0.060 ms
...
Step 3: Configure Corosync Binding
Perform this step on all nodes in the cluster.
Install some packages and ensure they start at boot:
sudo apt-get install pacemaker cman fence-agents
sudo update-rc.d -f pacemaker remove
sudo update-rc.d pacemaker start 50 1 2 3 4 5 . stop 01 0 6 .
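If you want to confirm that the boot-time links were created, you can list the runlevel directory (paths assume Ubuntu 12.04's SysV-style init; the S50 name follows from the start priority given above):

ls /etc/rc2.d/ | grep pacemaker   # expect something like S50pacemaker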
Then edit /etc/corosync/corosync.conf:
#bindnetaddr: 127.0.0.1
bindnetaddr: 10.0.0.0
Where 10.0.0.0 is the network address of the cluster's subnet (it will vary depending on your IP addresses and netmask).
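If you're unsure of the subnet's network address, the kernel routing table shows it directly; ethN below is a placeholder for your interface:

ip route show dev ethN
# The directly connected route, e.g. "10.0.0.0/24 ...", gives the value to use for bindnetaddr.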
Step 4: Configure and Start the Cluster
Perform this step on all nodes in the cluster.
Copy the following into /etc/cluster/cluster.conf:
<?xml version="1.0"?>
<cluster config_version="1" name="pacemaker1">
  <logging debug="off"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
</cluster>
Replace all occurrences of "node1" and "node2" if using other hostnames, or add additional clusternode entries if required.
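Since a malformed cluster.conf will stop cman from starting, it can be worth checking that the XML is at least well-formed before continuing; xmllint (from the libxml2-utils package, which may need installing) is one way to do this:

xmllint --noout /etc/cluster/cluster.conf   # prints nothing if the XML is well-formed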
Next, if using a 2-node cluster, modify /etc/default/cman:
echo "CMAN_QUORUM_TIMEOUT=0" >> /etc/default/cman
Finally, start everything up:
sudo service cman start
sudo service pacemaker start
Step 5: Verify the Cluster
After you have started the cluster on all nodes, run some basic checks to make sure everything is working. On one of the nodes:
nodeN:$ sudo crm status
============
Last updated: Fri Sep 20 11:48:34 2013
Last change: Thu Sep 19 16:26:46 2013 via crmd on node1
Stack: cman
Current DC: nodeN - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, unknown expected votes
0 Resources configured.
============

Online: [ node1 node2 ]

nodeN:$ sudo corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 10.0.0.N
        status  = ring 0 active with no faults
If corosync-cfgtool reports 127.0.1.1, or only one of the nodes is online, repeat this step on the other nodes and/or review Steps 1 and 2 to ensure the network is correctly configured.
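For a one-shot, non-interactive view of cluster status (handy when checking several nodes in a row), crm_mon can be used instead of crm status:

sudo crm_mon -1    # print the cluster status once and exit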
Step 6: Configure and Use the Cluster
This is where things get complicated and implementation-specific, so the following is only a simple example. If using a two-node cluster, consider adding:
sudo crm configure property no-quorum-policy=ignore
sudo crm configure property stonith-enabled=false
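You can review these properties (and the rest of the cluster configuration) at any time with:

sudo crm configure show    # dumps the current cluster configuration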
Then add a Dummy resource:
sudo crm configure primitive DummyService ocf:pacemaker:Dummy op monitor interval=60s
Check that DummyService is running on one node and visible from all the others using the "sudo crm status" command. To move the service to another node (nodeN) in the cluster:
sudo crm_resource --resource DummyService --move --node nodeN
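The move works by adding a location constraint that pins DummyService to nodeN; once you're done testing you'll probably want to remove that constraint so Pacemaker is free to place the resource itself again. With the crm shell, that looks something like:

sudo crm resource unmove DummyService    # clears the constraint created by the move above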
At this point, you should have a working, if somewhat pointless, high-availability cluster running a useless service! For more information about configuring Pacemaker, please see Clusters From Scratch or ClusterLabs' Example Configurations.