Existing Pacemaker how-to guides, like Highly Available iSCSI Target, use Heartbeat, which is an alternative to Pacemaker. Unfortunately, such documentation does not mention that Heartbeat is no longer the preferred option. The Ubuntu ClusterStack docs and others use Pacemaker, but are complicated, incomplete, or application-specific. ClusterLabs' Ubuntu Quickstart is clean and simple, but like the other guides it lacks a critical bit of information, which is:
Corosync uses the hostname to bind to an interface. If the local machine’s hostname resolves to 127.0.1.1 (as it should), Corosync will only bind to the loopback interface, and other nodes will be unreachable (OFFLINE).
This may not be a problem in later versions of Corosync, but it affects the version installed on Ubuntu 12.04.
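One quick way to see which address your hostname currently resolves to (and therefore which interface Corosync will bind to) is to query the resolver directly; the output shown in the comments assumes the example hostname and address used later in this guide:

getent hosts "$(hostname)"
# If this prints 127.0.1.1, Corosync will bind to loopback and the
# other nodes will appear OFFLINE. After fixing /etc/hosts (Step 2),
# it should print the node's static address, e.g.:
#   10.0.0.1   node1.test.local node1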
Step 1: Setup Network
Perform this step on all nodes in the cluster.
Typically nodes in clusters don’t rely on DHCP or DNS, so configure the network interface of each machine to use a static IP by editing /etc/network/interfaces to look something like this:
# Replace ethN with the interface name, e.g. eth0 on machines with a single interface
auto ethN
iface ethN inet static
    # change the address for each node
    address 10.0.0.1
    netmask 255.255.255.0
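To apply the change without rebooting, restarting the interface is usually enough; the command below assumes Ubuntu 12.04's ifupdown tooling and that ethN is the interface you edited:

sudo ifdown ethN && sudo ifup ethN   # replace ethN with the real interface, e.g. eth0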
In a two-node cluster you can connect both nodes directly using a cross-over cable. If using a switch, make sure it allows multicast.
Step 2: Setup Hosts
Perform this step on all nodes in the cluster.
Since DNS is typically not used, and Corosync relies on host names to communicate between nodes and bind to network interfaces, you’ll need to edit /etc/hosts. For a server with the hostname ‘node1.test.local’ and IP address 10.0.0.1, /etc/hosts will look like this:
127.0.0.1       localhost
#127.0.1.1      node1.test.local node1
10.0.0.1        node1.test.local node1
10.0.0.2        node2.test.local node2
10.0.0.N        nodeN.test.local nodeN    # add a line for each additional node
You'll need to comment out the 127.0.1.1 line, then add the IP address and hostname of every other node in the cluster. Make the changes to /etc/hosts on all nodes in the cluster. If everything has gone well, all nodes will be 'pingable' at their IP addresses. Note that you should see this:
node1:$ ping node1
PING node1.test.local (10.0.0.1) 56(84) bytes of data
64 bytes from node1.test.local (10.0.0.1): icmp_req=1 ttl=64 time=0.060 ms
...
and not this:
node1:$ ping node1
PING node1.test.local (127.0.1.1) 56(84) bytes of data
64 bytes from node1.test.local (127.0.1.1): icmp_req=1 ttl=64 time=0.060 ms
...
Step 3: Configure Corosync Binding
Perform this step on all nodes in the cluster.
Install some packages and ensure they start at boot:
sudo apt-get install pacemaker cman fence-agents
sudo update-rc.d -f pacemaker remove
sudo update-rc.d pacemaker start 50 1 2 3 4 5 . stop 01 0 6 .
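If you want to confirm that the boot-time links were created, you can list the runlevel directory (paths assume Ubuntu 12.04's SysV-style init; the S50 name follows from the start priority given above):

ls /etc/rc2.d/ | grep pacemaker   # expect something like S50pacemaker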
Then edit /etc/corosync/corosync.conf:
#bindnetaddr: 127.0.0.1
bindnetaddr: 10.0.0.0
Where 10.0.0.0 is the network address of the cluster's subnet (it will vary depending on your IP addresses and netmask).
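If you're unsure of the subnet's network address, the kernel routing table shows it directly; ethN below is a placeholder for your interface:

ip route show dev ethN
# The directly connected route, e.g. "10.0.0.0/24 ...", gives the value to use for bindnetaddr.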
Step 4: Configure and Start the Cluster
Perform this step on all nodes in the cluster.
Copy the following into /etc/cluster/cluster.conf:
<?xml version="1.0"?>
<cluster config_version="1" name="pacemaker1">
  <logging debug="off"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
</cluster>
Replace all occurrences of "node1" and "node2" if using other hostnames, or add additional clusternode entries if required.
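Since a malformed cluster.conf will stop cman from starting, it can be worth checking that the XML is at least well-formed before continuing; xmllint (from the libxml2-utils package, which may need installing) is one way to do this:

xmllint --noout /etc/cluster/cluster.conf   # prints nothing if the XML is well-formed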
Next, if using a 2-node cluster, modify /etc/default/cman:
echo "CMAN_QUORUM_TIMEOUT=0" >> /etc/default/cman
Finally, start everything up:
sudo service cman start
sudo service pacemaker start
Step 5: Verify the Cluster
After you have started the cluster on all nodes, run some basic checks to make sure everything is working. On one of the nodes:
nodeN:$ sudo crm status
============
Last updated: Fri Sep 20 11:48:34 2013
Last change: Thu Sep 19 16:26:46 2013 via crmd on node1
Stack: cman
Current DC: nodeN - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, unknown expected votes
0 Resources configured.
============

Online: [ node1 node2 ]

nodeN:$ sudo corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 10.0.0.N
        status  = ring 0 active with no faults
If corosync-cfgtool reports 127.0.1.1, or only one of the nodes is online, repeat this step on the other nodes and/or review Steps 1 and 2 to ensure the network is correctly configured.
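For a one-shot, non-interactive view of cluster status (handy when checking several nodes in a row), crm_mon can be used instead of crm status:

sudo crm_mon -1    # print the cluster status once and exit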
Step 6: Configure and Use the Cluster
This is where things get complicated and implementation-specific, so the following is only a simple example. If using a two-node cluster, consider adding:
sudo crm configure property no-quorum-policy=ignore
sudo crm configure property stonith-enabled=false
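You can review these properties (and the rest of the cluster configuration) at any time with:

sudo crm configure show    # dumps the current cluster configuration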
Then add a Dummy resource:
sudo crm configure primitive DummyService ocf:pacemaker:Dummy op monitor interval=60s
Check that DummyService is running on one node and visible from all the others using the "sudo crm status" command. To move the service to another node (nodeN) in the cluster:
sudo crm_resource --resource DummyService --move --node nodeN
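The move works by adding a location constraint that pins DummyService to nodeN; once you're done testing you'll probably want to remove that constraint so Pacemaker is free to place the resource itself again. With the crm shell, that looks something like:

sudo crm resource unmove DummyService    # clears the constraint created by the move above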
At this point, you should have a working, if somewhat pointless, high-availability cluster running a useless service! For more information about configuring Pacemaker, please see Clusters From Scratch or ClusterLabs' Example Configurations.