
Intro to pacemaker on heartbeat

With squeeze around the corner, it's time to reconsider your everything with debian – once again ;) Of course I am aware that pacemaker has been available since 2008, but sticking to debian stable, my first contact with pacemaker came with squeeze. Also, I was not really aware of what I was missing, because my simple setup just worked. With my new database server, already squeezing, and my virtual army of test installations, I have spent about 50 hours or so with 6.0 and can say: nicely done, again. However, sometimes there is work before pleasure, and that is the case with heartbeat 3.0.2 in squeeze (lenny: 2.1.3). This article will not cover pacemaker's abilities in depth; it aims to help with the very first steps into the cold water and/or give you an impression of what you can expect, before you decide to change everything.


Pre-Prelude: What is heartbeat 2 ?^

This article targets heartbeat 2 users who are already familiar with the matter. However, here is a short, rough, strongly oversimplified summary on clusters in general and heartbeat 2 in particular for readers totally new to clustering:

  • Clusters are “a group of linked computers, working together closely thus in many respects forming a single computer” (wikipedia)
  • The simplest implementation in a real-world scenario is probably an active-passive composite of two computers, in which the active one is called master and the passive one slave. If the master goes down, the slave can take over all services transparently (for the clients using the master) and thus keep the services online. There are more sophisticated cluster structures than active-passive, but not in heartbeat (2).
  • Heartbeat 2 is a simple clustering solution, with which you can build active-passive clusters of two nodes. The nodes watch each other via unicast or multicast pings and thus determine whether the “other” is available or not (and act accordingly).
  • Heartbeat can handle two kinds of “resources”: OCF and LSB. This means it comes with a bunch of scripts (OCF) which provide such things as setting up / shutting down virtual IP addresses, mounting DRBD or iSCSI volumes and so on. LSB resources are roughly the scripts found in /etc/init.d/, as long as they can start, stop and status.
  • Heartbeat 2 is growing old. The next generation of new and far more capable (and complex) cluster managers use heartbeat merely as a messaging layer and add their logic on top of it. One such cluster resource manager (CRM) is pacemaker .. and this is where this article starts.


Before heartbeat 3.0 (or 2.9.9), heartbeat was “standalone”, self-sufficient and could be installed and configured in a matter of minutes. For many installations it was enough, and if you wanted a little bit more (eg monitoring services on a higher OSI layer), you could easily add mon and you were fine. Starting with 2.9.9, the linux-ha team strongly discourages using heartbeat without pacemaker. So, that's what I will do. Naive as I am, and based on my previous experiences with heartbeat, I went right into it and straight against the next wall. Do not confuse pacemaker with your good ol' heartbeat. It is not, it is far more (especially far more complex), and also far better (imho). But once you are into it, you will like it.

What is the difference ?^

Using heartbeat, you might be under the false assumption that you have a cluster. That, so I learned, is not the case. Heartbeat is now degraded to a simple messaging layer (or a cluster engine) and is insofar rather a part of the whole cluster stack. So how is this any useful to you? Well, you can (correctly) assume that heartbeat is substitutable, namely with corosync (a fork of OpenAIS, stripped of everything not needed by pacemaker). The first time you encounter this diametrical difference is when you try to install heartbeat on a squeeze machine:

aptitude install heartbeat -s
The following NEW packages will be installed:
 ca-certificates{a} cluster-agents{a} cluster-glue{a} fancontrol{a}
 file{a} gawk{a} heartbeat libcluster-glue{a} libcorosync4{a} libcurl3{a}
 libesmtp5{a} libheartbeat2{a} libidn11{a} libltdl7{a} libmagic1{a}
 libnspr4-0d{a} libnss3-1d{a} libopenhpi2{a} libopenipmi0{a}
 libperl5.10{a} libsensors4{a} libsnmp-base{a} libsnmp15{a} libssh2-1{a}
 libtimedate-perl{a} libxml2-utils{a} libxslt1.1{a} lm-sensors{a}
 mime-support{a} openhpid{a} openssl{a} pacemaker{a} python{a}
 python-central{a} python-minimal{a} python2.6{a} python2.6-minimal{a}
0 packages upgraded, 37 newly installed, 0 to remove and 19 not upgraded.
Need to get 16.1MB of archives. After unpacking 48.4MB will be used.

That's the point where you have to decide whether you stick with your old and haresources files (and rebuild the package with fewer dependencies) or dare to welcome the future ;) My native play instinct compelled me to do the latter.

Ok, back to the topic. What is the difference? Well, at first glance not very much. You end up with a cluster of your nodes, where one is the master and one the slave. That is, if you want to keep it that simple. With pacemaker, you can build clusters of more than two nodes. There is no limit (that I am aware of, apart from common sense). Furthermore, pacemaker also allows you to build N+1 and N-to-N clusters. A good example for the former is a 3-node cluster, in which the third node is the failover node for the other two. You assume that not both (active) nodes will fail at the same time and therefore keep only one passive node which can take over the services, which is of course much more economic and realistic than two active-passive clusters. The N-to-N cluster you can imagine by thinking of four nodes, which all have the same resources and (actively) export the same services. Your assumption would be that three of the four nodes can bear the load of all – at least for the time you require to bring the failed node up again. You can probably carry the thought further by yourself, eg a cluster of 10 nodes where 2 are passive and 8 active and so on (think: Raid1, Raid5, Raid6, ..).

A very first step^

Before I go on, here are some conventions and assumptions:

  • I’ll build an active-passive cluster of two nodes
  • Node 1 is called nfs4test1 and has the IP
  • Node 2 is called nfs4test2 and has the IP
  • Router (for ping) is
  • Failover / virtual IP will be
  • Do not do this on a live system, it will go offline

Assuming you have installed heartbeat 3.0.2 + pacemaker (1.0.9) with the simple one-liner from above, we can begin directly with the first required changes. (If you only have lenny available, use backports.) If you are upgrading from an existing heartbeat 2 cluster, shut it down beforehand!

Open the /etc/ha.d/ file with your favorite editor and change it so that it looks something like this (replace with your IPs and your node names):

# enable pacemaker, without stonith
crm             yes

# log where ?
logfacility     local0

# warning of soon be dead
warntime        10

# declare a host (the other node) dead after:
deadtime        20

# dead time on boot (could take some time until net is up)
initdead        120

# time between heartbeats
keepalive       2

# the nodes
node            nfs4test1
node            nfs4test2

# heartbeats, over dedicated replication interface!
ucast           eth0 # ignored by node1 (owner of ip)
ucast           eth0 # ignored by node2 (owner of ip) 

# ping the switch to assure we are online

Then ensure that you have each host in each node's /etc/hosts file:

nfs4test1
nfs4test2

Now you should set up the secret key in /etc/ha.d/authkeys. It should look something like this (change the mode to 0600 afterwards, owner access only!):

auth 1
1 sha1 your-secret-password

Everything on both nodes, of course!
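
The permission change mentioned above is a one-liner (path as used above):

#> chmod 0600 /etc/ha.d/authkeys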

Old haresources: If you have an old /etc/ha.d/haresources file, do not remove it, because it will come in handy later.

So far, nothing new, besides the “crm yes” in the second line. This is where you tell heartbeat: you are controlled by pacemaker. So do not miss it.

Now let’s start heartbeat on both nodes:

#> /etc/init.d/heartbeat start

Exploring the wild^

As soon as it finishes successfully, go to any of the two nodes and let's examine the monster – I meant the pacemaker config:

#> cibadmin -Q

This should output a large XML file. You can see the contents of my initial start here. The first and most important task for you is: don't get scared and do not bother about all the funny long id attributes. You will not have to deal with the pure XML files – or let's say: only very seldom (eg backup and recovery).

What you should see are the following four lines:

  <node id="8d0882b9-1d12-4551-b19f-d785407a58e3" uname="nfs4test1" type="normal"/>
  <node id="e90c2152-3e47-4858-9977-37c7ed3c325b" uname="nfs4test2" type="normal"/>

Those are in a configuration tag, which is under the root cib tag. They are generated from your file and declare the two nodes of our new cluster. For now, they don't do very much. Now let us have a look at one of the most important pacemaker components: the cluster resource manager monitor, or in short: crm_mon. If you create larger clusters later on by yourself, you might want to know where a particular resource (eg a virtual IP address) currently resides.

#> crm_mon -1r
Last updated: Wed Nov 11 04:57:19 2009
Stack: Heartbeat
Current DC: nfs4test1 (8d0882b9-1d12-4551-b19f-d785407a58e3) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, unknown expected votes
0 Resources configured.

Online: [ nfs4test1 nfs4test2 ]

Full list of resources:

The “-1” means: show once (without it, crm_mon acts like top) and the “-r” stands for “show also inactive resources”. What you can learn from the output is that both of the nodes are online. You can amuse yourself by killing one node and seeing what happens. Then again, maybe you postpone this for later.

Before we add resources, I will introduce another very important command line tool: crm, which is the actual pacemaker shell.

#> crm status

This will show the same as the crm_mon command above, but crm_mon can do more, so stick with it for monitoring.

#> crm configure show
  node $id="8d0882b9-1d12-4551-b19f-d785407a58e3" nfs4test1
  node $id="e90c2152-3e47-4858-9977-37c7ed3c325b" nfs4test2
  property $id="cib-bootstrap-options" \
      dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \

This shows your configuration, much the same way as “cibadmin -Q” did, but readable (no XML). Here, again, are your two nodes. The rest of the output can be ignored for now.

You do not always have to run the whole line from your linux shell; you can enter the crm shell directly by typing crm, without args:

#> crm
crm(live)# configure
crm(live)configure# show
  node $id="8d0882b9-1d12-4551-b19f-d785407a58e3" nfs4test1
  node $id="e90c2152-3e47-4858-9977-37c7ed3c325b" nfs4test2
  property $id="cib-bootstrap-options" \
      dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
crm(live)configure# exit

You will use the crm shell whenever you have to add a complex resource or the like, because you can make all changes and then execute a “commit” (by typing just that). Otherwise, you might produce unhealthy intermediate states in the process.
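
Here is a minimal sketch of that workflow; the DUMMY resource is purely hypothetical (it uses the Dummy OCF agent, which does nothing and is handy for experiments), so you do not need to execute this:

#> crm
crm(live)# configure
crm(live)configure# primitive DUMMY ocf:heartbeat:Dummy op monitor interval="30s"
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# exit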

First resource^

Before adding the resource, better disable stonith, because if you have no such device it will produce a lot of errors (if you do not know what stonith is, even more so):

#> crm configure property stonith-enabled=false

Now we will add the first actual resource and give the whole cluster manager a purpose in life. We use the heartbeat OCF scripts for this.

#> crm configure primitive FAILOVER-IP ocf:heartbeat:IPaddr params ip="" op monitor interval="10s"

The resource is added and should be online immediately. The “op monitor interval=10s” part is very important, because without it pacemaker will not check whether the resource is online (on the nodes). Check the resource by asking crm_mon:

#> crm_mon -1
Last updated: Wed Nov 11 05:20:23 2009
Stack: Heartbeat
Current DC: nfs4test1 (8d0882b9-1d12-4551-b19f-d785407a58e3) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, unknown expected votes
1 Resources configured.

Online: [ nfs4test1 nfs4test2 ]

 FAILOVER-IP    (ocf::heartbeat:IPaddr):    Started nfs4test1

You can see that it is on the host nfs4test1, in my case. Also very interesting is the crm_resource command:

#> crm_resource -r FAILOVER-IP -W
resource FAILOVER-IP is running on: nfs4test1

Now let us look at the configuration again. But not on the node you have been working on: on the other one.

#nfs4test2> crm configure show
  node $id="8d0882b9-1d12-4551-b19f-d785407a58e3" nfs4test1
  node $id="e90c2152-3e47-4858-9977-37c7ed3c325b" nfs4test2
  primitive FAILOVER-IP ocf:heartbeat:IPaddr \
      params ip="" \
      op monitor interval="10s"
  property $id="cib-bootstrap-options" \
      dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
      cluster-infrastructure="Heartbeat" \
      stonith-enabled="false" \

As you can see, the second node is totally aware of everything we did. It is no longer necessary, as it was with heartbeat < 3, to deploy the haresources file on each node.

Practical application^

In this section, I will highlight some useful practices which you will probably want to know within the first 15 minutes of most real-world deployments.

Common resource operations^

One thing you might want to do is remove a resource. This can be achieved by running:

#> crm configure delete FAILOVER-IP

You will probably receive an error message saying: “WARNING: resource FAILOVER-IP is running, can’t delete it”. Ok then. Let us disable the resource beforehand:

#> crm resource stop FAILOVER-IP
#> crm configure delete FAILOVER-IP

And now it works! This behavior might save your a** eventually.. Ok, add the resource again (see the command above).

If you ever need to stop / (re)start a resource, just do as follows:

#> crm resource stop FAILOVER-IP
#> # wait some time..
#> crm resource start FAILOVER-IP

Also very common is moving a single resource to another node:

#nfs4test1> crm resource move FAILOVER-IP nfs4test2

As always, check with “crm_mon -1”..
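
As far as I know, the move command works by inserting a location constraint which stays around and pins the resource to the target node. To get rid of it again (and let the cluster decide freely), use:

#> crm resource unmove FAILOVER-IP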

Resources are not only virtual IP addresses or single services (like apache, nfs-kernel-server and so on), but also resource groups. I will show a usage example later on.

Node maintenance^

Before we proceed to special cases, let's cover the basic admin tasks. The most important one is to force one node to take over all resources (eg because you want to restart the other). Here is the command (execute it on the node which does not currently hold the virtual IP):

#> crm node standby nfs4test1

That's all. The other node takes over the resources (the virtual IP) and you can restart (or do whatever with) the standby node. Also have a look at the monitor output:

#> crm_mon -1r
Last updated: Tue Nov 10 11:16:57 2009
Stack: Heartbeat
Current DC: nfs4test1 (8d0882b9-1d12-4551-b19f-d785407a58e3) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, unknown expected votes
1 Resources configured.

Node nfs4test1 (8d0882b9-1d12-4551-b19f-d785407a58e3): standby
Online: [ nfs4test2 ]

Full list of resources:

 FAILOVER-IP    (ocf::heartbeat:IPaddr):    Started nfs4test2

As you can see, nfs4test1 is in standby and all resources are hosted by nfs4test2. For the sake of the example, you might want to reboot the standby node now. While rebooting, the node will be listed as OFFLINE:

Node nfs4test1 (8d0882b9-1d12-4551-b19f-d785407a58e3): OFFLINE (standby)

It might take some seconds until the node is recognized as online again (but still standby), but it should show up eventually. You can now activate it again with:

#> crm node online nfs4test1

After this, the resource will migrate back to nfs4test1.

Resource groups, dependencies and startup-order^

Above, we worked with a single resource. Normally, you will probably have multiple resources which only work combined (eg: the virtual IP belongs to the host which exports the NFS storage).

So let's add an NFS storage resource:

#> crm configure primitive NFS-EXPORT lsb:nfs-kernel-server op monitor interval="60s"

LSB stands for Linux Standard Base and means that the service you use has to comply with some standards. For debian this reads: most services in /etc/init.d/ can be used, as long as they can “start”, “stop” and “status”. Eg nginx under lenny can not. However, the nfs-kernel-server can.
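
A quick and dirty way to check whether an init script supports the status action (an exit code of 0 should mean “running”):

#> /etc/init.d/nfs-kernel-server status; echo $?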

#> crm configure show
  node $id="8d0882b9-1d12-4551-b19f-d785407a58e3" nfs4test1
  node $id="e90c2152-3e47-4858-9977-37c7ed3c325b" nfs4test2
  primitive FAILOVER-IP ocf:heartbeat:IPaddr \
      params ip="" \
      op monitor interval="10s"
  primitive NFS-EXPORT lsb:nfs-kernel-server \
      op monitor interval="60s"
  property $id="cib-bootstrap-options" \
      dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
      cluster-infrastructure="Heartbeat" \
      stonith-enabled="false" \

Ok, now you will see why resource groups are mandatory. After adding the resource, it came up (for me) on nfs4test2, whereas the IP came up on nfs4test1. This, of course, is totally rubbish.

Let's put them in a group.

#> crm configure group NFS-AND-IP FAILOVER-IP NFS-EXPORT

This declares a resource group named NFS-AND-IP, containing the two created resources. Unfortunately, we are not done yet. Being in a group only helps us move the resources easily (we do not move FAILOVER-IP, but NFS-AND-IP). We still have to declare a dependency between the two.

#> crm configure colocation NFS-WITH-IP inf: FAILOVER-IP NFS-EXPORT

As soon as this is executed, the resources will relocate themselves onto a single server. In the above example, NFS-WITH-IP is the name of the colocation (also a resource, so it has to have a name) and inf is short for infinity, which assures the precedence of this dependency. You can work with multiple relations and numeric scores; I leave this to the reader to figure out.
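
Just to illustrate the score syntax (RSC-A and RSC-B are hypothetical resources): a finite score expresses a preference instead of a hard requirement, eg:

#> crm configure colocation PREFER-TOGETHER 500: RSC-A RSC-B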

The colocation alone is not enough for us yet; let us also set the correct order in which the resources have to come up:

#> crm configure order IP-BEFORE-NFS inf: FAILOVER-IP NFS-EXPORT

This creates another resource named IP-BEFORE-NFS .. yes, everything is somehow a resource. Now we are done.

You can build dependencies (colocations), orders and groups of any kind of resource. Try not to break it, but play around a bit. Review the resource commands above; they also apply to groups (eg stop a whole group, migrate it and so on..).
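
For example, with the group from above, the same commands operate on all members at once:

#> crm resource stop NFS-AND-IP
#> crm resource start NFS-AND-IP
#> crm resource move NFS-AND-IP nfs4test2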


How can I use the /etc/init.d services as I could in heartbeat 2 ?^

Very simple, just add the resource like so:

#> crm configure primitive NAME lsb:INIT_D_NAME op monitor interval="123s"

Where INIT_D_NAME is the script name in /etc/init.d/ .. eg apache2 for /etc/init.d/apache2
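
For instance, to manage an Apache via its init script (WEBSERVER is just an arbitrary name I made up; this assumes apache2 is installed on both nodes):

#> crm configure primitive WEBSERVER lsb:apache2 op monitor interval="30s"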

What OCF resources are there ?^

Execute the following:

#> crm ra list ocf heartbeat

You can get a description of a resource and all its parameters with

#> crm ra meta NAME
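
For example, for the IPaddr agent used above:

#> crm ra meta ocf:heartbeat:IPaddr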

How can I add my custom resource ?^

Have a look in /usr/lib/ocf/resource.d/heartbeat/ (debian) for starters. Look at the scripts and how they are written. Then add yours.
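
To give you a rough idea, here is a heavily stripped-down sketch of what such a script can look like (illustrative only; “my-service” is a hypothetical init script, and a real agent must also print proper XML meta-data):

#!/bin/sh
# illustrative skeleton of a custom OCF resource agent, not a complete one
# OCF return codes: 0 = success, 1 = generic error, 3 = unimplemented, 7 = not running

case "$1" in
    start)
        /etc/init.d/my-service start && exit 0
        exit 1
        ;;
    stop)
        /etc/init.d/my-service stop && exit 0
        exit 1
        ;;
    monitor|status)
        /etc/init.d/my-service status && exit 0
        exit 7
        ;;
    meta-data)
        # a real agent prints its XML meta-data here (see the shipped scripts)
        exit 0
        ;;
    *)
        exit 3
        ;;
esac

Once it is executable in /usr/lib/ocf/resource.d/<yourprovider>/, it should show up with “crm ra list ocf <yourprovider>” (afaik).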

My resource does not come up again^

Try resetting the failcounter:

#> crm resource failcount RESOURCE-NAME set NODE-NAME 0

and bring it online afterwards:

#> crm resource start RESOURCE-NAME
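
If you want to inspect the counter before resetting it, the failcount command can also show the current value:

#> crm resource failcount RESOURCE-NAME show NODE-NAME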

I want to maintain a resource, but don't want to disable it / shut it down^

Set the resource to unmanaged:

#> crm resource meta RESOURCE-NAME set is-managed false

do what you have to do, and set it managed again:

#> crm resource meta RESOURCE-NAME set is-managed true

Pacemaker will not migrate it or try to bring it online somewhere else until you enable managing again (afaik).

How do I backup/restore my config ?^

You can either save/reload your configuration through the crm shell (its own configuration syntax) or as raw XML (via cibadmin).


# Save backup
#> crm configure save /path/to/backup
# Restore backup
#> crm configure load replace /path/to/backup
# .. or, if you have only slight "update" modifications
#> crm configure load update /path/to/backup


# Save backup
#> cibadmin -Q > /path/to/backup
# Restore backup
#> cibadmin --replace --xml-file /path/to/backup

Do I still require mon ?^

Well.. probably not. That's where the “op monitor ..” resource parameters come in. On the other hand, pacemaker only performs the checks on the host itself. With mon, you can test whether a particular service is available from any other point of your network. It depends on your structure/setup, I guess.

How can I use the good old “auto_failback off” ?^

Yes you can.

#> crm configure property default-resource-stickiness=1
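
As far as I know, you can also make just a single resource sticky by setting the corresponding meta attribute on the resource itself (analogous to the is-managed example above), eg:

#> crm resource meta FAILOVER-IP set resource-stickiness 100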


As always: Do not blame me if this destroys your servers or makes you drink three more beers. No warranties or guarantees of any kind.

Further Reading^
