

Intro to pacemaker part 2: Advanced topics

Not everything fit into the first article, and I felt a too long “first steps” article might alienate beginners. This time, I will go into some more advanced matters: stickiness vs location, N+1 clusters, advanced resource locations, clones and master/slave setups.

Before reading this article, you should at least get your feet wet playing around with pacemaker. I read the pacemaker configuration A-Z guide before getting the basics straight and very soon became very lost among the many concepts it describes. Shortly after I actually began using pacemaker (even if only on my testing nodes), I discovered for myself which features seemed to be missing and what would be nice to have – guess what: the guide suddenly began to make sense. But maybe it’s just me.

Interlude^

Each concept is introduced by a short use case, showing you a practical application before explaining the how-to. All of the examples will show a cluster using at least a virtual IP and one additional service (some more). The services were chosen because they somehow made sense or were easy to install – it is not about the services, but about pacemaker.

Unless otherwise noted, I will use the following names, definitions and conventions:

  • Nodes are named node1, node2, node3, ..
  • Node IPs are 10.0.0.1, 10.0.0.2, 10.0.0.3, ..
  • Virtual IPs are 10.0.0.100, 10.0.0.101, ..
  • “#>” means: a command on any node
  • “#node1>” means a command on a certain node
  • Inside a pre block, a line beginning with “# ” before some commands is a comment, not output
  • A pre block without a preceding “#>” is output
  • My heartbeat is 3.0.3
  • My pacemaker is 1.0.9
  • My nodes are either lenny (+backports) or squeeze

Assumptions to consider (please read carefully):

  • You know the basics about pacemaker (and heartbeat), maybe you have read the previous part.
  • If you want to replay what I did: you have set up the required number of nodes and installed the services (e.g. apache) and pacemaker + heartbeat3.
  • Please read each topic to its full end, because I have the bad habit of implementing mistakes on purpose to make a point (and there are most probably some unintended flaws, which hopefully can be understood on the whole).
  • I try to use only crm on the command line instead of the crm_* tools (crm_resource, crm_node and so on .. besides crm_mon), because it will help you to understand your cib configuration better (same syntax). In addition, I think working with crm’s coherent structure decreases the learning effort. I guess it’s personal preference, eventually.
  • Service configurations, such as ProFTPd, OpenLDAP, NFS, .. are not explained, but assumed – this would go beyond the scope of this article.
  • Most topics presume an empty configuration (unless otherwise stated at the beginning of the “Implementation” part, and mostly besides configured nodes). So you will probably have to clean up at the beginning of each subject..
  • When I tell you to “reboot” I mean either (as you prefer)
    • Really reboot
    • use “crm node standby NODENAME” and then “crm node online NODENAME” again (see the short example after this list)
  • You pledge not to do this in a live environment, because it would most certainly break something and in the best case interrupt running services
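
What the standby/online alternative looks like on the shell (node names as per the conventions above):

# put node1 in standby – its resources fail over as if it had died
#> crm node standby node1
# .. watch the takeover with crm_mon, then bring the node back
#> crm node online node1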

Anyway, let’s get into it..

Stickiness vs location: Where is the resource and where will it stay?^

Use case: 2-node proftpd cluster^

We have the nodes node1 and node2. node1 is active and at the beginning your resources will run there. However, if node1 fails, node2 takes over and begins serving. If node1 comes back again, it shall NOT take the resources back! They should stay where they are, on node2! More so: you plan to deploy this configuration on multiple active-passive pairs (for whatever reason) and you want to ensure that node1 (in each deployed cluster) always has the resources from the start (until the first failover occurs).

Implementation^

First off, here is the initial config:

#> crm configure show
node $id="8d0882b9-1d12-4551-b19f-d785407a58e3" node1
node $id="e90c2152-3e47-4858-9977-37c7ed3c325b" node2
property $id="cib-bootstrap-options" \
    dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
    cluster-infrastructure="Heartbeat" \
    stonith-enabled="false" \
    last-lrm-refresh="1257846495"

Add resources^

Let us quickly create the resources, group them and make them dependent.

#> crm configure primitive VIRTUAL-IP ocf:heartbeat:IPaddr params ip="10.0.0.100" op monitor interval="10s"
#> crm configure primitive PROFTPD-SERVER lsb:proftpd op monitor interval="60s"
#> crm configure group FTP-AND-IP VIRTUAL-IP PROFTPD-SERVER
#> crm configure colocation FTP-WITH-IP inf: VIRTUAL-IP PROFTPD-SERVER

Make it sticky^

The next thing is to make the resources sticky, which means: stay on the server you are on, even if it was formerly the passive one and the active one is online again. Or for heartbeat2 users: “auto_failover no”.

#> crm configure property default-resource-stickiness=1

Setup the initial location^

We would like to have the resources on node1 for starters.

#> crm configure location PREFER-NODE1 FTP-AND-IP 100: node1

Pause: where we are^

Have a look with crm_mon at where we are:

#> crm_mon -1r
============
Last updated: Wed Nov 11 09:40:37 2009
Stack: Heartbeat
Current DC: node2 (e90c2152-3e47-4858-9977-37c7ed3c325b) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, unknown expected votes
1 Resources configured.
============

Online: [ node1 node2 ]

Full list of resources:

 Resource Group: FTP-AND-IP
    VIRTUAL-IP    (ocf::heartbeat:IPaddr):    Started node1
    PROFTPD-SERVER    (lsb:proftpd):    Started node1

Alright, looks good so far. Now let’s reboot node1 and see how node2 takes over. You can use “crm_mon” (without “-1”) to see a top-like status view.

And what happens? node2 takes over the resources, as expected.. but as soon as node1 is online again, all resources migrate back. That’s not what we wanted.

Math: Resource-stickiness vs location preference^

Ok, what went wrong is that the location preference (score) of 100 outweighs the default-resource-stickiness of 1. So what can we do to ensure that the stickiness outweighs the location preference? In short: set it higher. How high, do you think? Well, something like 1000 would probably do it. But this article is not about guessing, but understanding. Let’s try to determine the lowest possible (integer) value you can set while STILL migrating back to the preferred location. Incrementing this value by 1 will then give us the lowest possible score with which it will NOT re-migrate the resources.

Let’s start with 99, which is lower than 100, so it should do it. No, it doesn’t. You have more than one resource, and you have to compare the summarized stickiness of all resources. Then again: how many resources are there? One virtual IP and the proftpd service, so 2? Nope, wrong again: you have three – you missed the group, for which you set the location score. Those scores inherit upwards: total group score = group score + sum of the scores of all members. So the lowest possible value with which it will STILL MIGRATE back is 33, because a stickiness of 33 * 3 = 99 (group + virtual IP + proftpd) is smaller than a preference of 100. And the smallest default stickiness to achieve NO MIGRATION is 34 (34 * 3 = 102 > 100).
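
If you want to verify the boundary yourself, set the two candidate values just like the default stickiness above and reboot node1 in between:

# 33 * 3 = 99 < 100 – the resources still migrate back to node1
#> crm configure property default-resource-stickiness=33
# 34 * 3 = 102 > 100 – the resources stay on node2 after a failover
#> crm configure property default-resource-stickiness=34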

It will certainly work if you set the default property to >= 34, but then again: not quite elegant, is it? What we really want is to drop the default stickiness and set a stickiness on the actual group for which we’ve already set the location preference. We get much easier math and will not run into problems as soon as we begin using more than one resource group and more than 2 nodes (with up to n-1 location preferences). E.g., imagine the use case: we have 4 nodes, node1..node4. If a node fails, the resource will be handed down to the next (node1 -> node2 -> node3 -> node4). However, if one of the higher nodes comes back up, we want the resource to stay if it is on node1 to node3, which are the primary nodes, but to move back if it is on node4, which is the failover node. There are probably countless scenarios you can construct, so let’s get to the matter.

We will keep the default stickiness of 1, because we still want a non-re-migrate behavior for all services for which we do not set a location preference. 1 is small enough to be compensated in our math.

So let’s set a stickiness for the group:

#> crm resource meta FTP-AND-IP set resource-stickiness 101

As mentioned above, we still have a default-resource-stickiness of 1, so the actual stickiness of FTP-AND-IP is 103 (group + 2 members at 1 each). I think it is still more readable than the required 99 (+2), because your config dump will make sense to you and, if you disable the default stickiness, it will still be enough. Here we go:

#> crm configure show
INFO: building help index
node $id="8d0882b9-1d12-4551-b19f-d785407a58e3" node1
node $id="e90c2152-3e47-4858-9977-37c7ed3c325b" node2
primitive PROFTPD-SERVER lsb:proftpd \
    op monitor interval="60s"
primitive VIRTUAL-IP ocf:heartbeat:IPaddr \
    params ip="10.0.0.100" \
    op monitor interval="10s"
group FTP-AND-IP VIRTUAL-IP PROFTPD-SERVER \
    meta resource-stickiness="101"
location PREFER-NODE1 FTP-AND-IP 100: node1
colocation FTP-WITH-IP inf: VIRTUAL-IP PROFTPD-SERVER
property $id="cib-bootstrap-options" \
    dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
    cluster-infrastructure="Heartbeat" \
    stonith-enabled="false" \
    last-lrm-refresh="1257846495" \
    default-resource-stickiness="1"

To convince yourself that everything works as expected, you might want to set a stickiness of 97 and reboot, to see how it DOES migrate back (97 + 1 + 1 = 99 < 100), and then try our 101 and reboot again (it will not migrate back).

Do not set an equal stickiness (in this example 98 + 1 + 1 = 100 == 100), because pacemaker can then decide itself, which might look random to you!

Caution after manual migrate!^

If you manually move a resource using “crm resource move“, “crm resource migrate” or “crm_resource ..“, you might override your stickiness settings. Pacemaker will create a location-rule entry, which looks something like the following:

location cli-prefer-FTP-AND-IP FTP-AND-IP \
    rule $id="cli-prefer-rule-FTP-AND-IP" inf: #uname eq node1

This entry assigns a score of infinity to the preference on node1. So you have to remove this preference again to ensure your stickiness outweighs the location preference, as you expect.
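
The simplest way to get rid of it is to delete the generated constraint by its id (taken from the dump above); the crm shell also offers “crm resource unmigrate” for this, though I stick to the explicit delete here:

# remove the auto-generated location preference
#> crm configure delete cli-prefer-FTP-AND-IP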

Summary^

  • If you just want the good old “auto_failover off” back and neither care about nor will set locations, just set the default-resource-stickiness to 1.
  • If you want to use locations ..
    • .. for all resources, set default-resource-stickiness to 0 and define the meta attribute resource-stickiness for each primitive for which you set a location preference
    • .. for most resources, set default-resource-stickiness to 1 (if you want “auto_failover no”) or 0 (if you don’t) and define the meta attribute resource-stickiness for each primitive for which you have set a location preference

N+1 clusters: Going beyond active-passive^

Use case: 3-node cluster, where one node serves NFS, one serves FTP and one is the failover for both^

You have three nodes, two of which shall be active, one serving NFS and one serving FTP. If one of the active nodes fails, the failover node shall take over its resources, which means: start NFS/FTP and bring up the virtual IP. However, you don’t know which node will fail, you just want the services online.

Implementation^

First off, there will be no SAN storage in the example. We just work with NFS exporting a local dir and a virtual IP, because I have tried all those configurations myself and did not want to waste time on setting up 2 more nodes (providing SAN). So, you require three nodes. If you already have them, you can skip the first topic, where I’ll explain how to add a third node.

Adding a third node^

Skip this if you already have 3. If not, do as follows:

Assure that all nodes have the other nodes in their /etc/hosts files:

# ..
10.0.0.1 node1
10.0.0.2 node2
10.0.0.3 node3
# ..

Assure that all nodes have all other nodes in their /etc/ha.d/ha.cf file:

# ..
# the nodes
node            node1
node            node2
node            node3

# heartbeats, over dedicated replication interface!
ucast           eth0 10.0.0.1
ucast           eth0 10.0.0.2
ucast           eth0 10.0.0.3
# ..

Copy the /etc/ha.d/authkeys file to the new node (chmod 0600).

Now shut down your cluster on all nodes! This is very important – and unpleasant, I know. (There is supposed to be a way with autojoin in ha.cf and later hb_addnode, but I read this too late and am too lazy to try it.)

Restart your cluster on all nodes. Have a look at “crm_mon -1r” and see the third node.
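
If you use the Debian init scripts (as I do), the full stop/start cycle might look like this:

# stop the cluster on every node ..
#all> /etc/init.d/heartbeat stop
# .. then, after editing ha.cf everywhere, start it again on every node
#all> /etc/init.d/heartbeat start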

Setup resources, groups and dependencies (colocations)^

Add your virtual IPs

#> crm configure primitive VIRTUAL-IP1 ocf:heartbeat:IPaddr params ip="10.0.0.100" op monitor interval="10s"
#> crm configure primitive VIRTUAL-IP2 ocf:heartbeat:IPaddr params ip="10.0.0.101" op monitor interval="10s"

Create the NFS and FTP services

#> crm configure primitive NFS-SERVER lsb:nfs-kernel-server op monitor interval="60s"
#> crm configure primitive PROFTPD-SERVER lsb:proftpd op monitor interval="60s"

Group and relate the first virtual IP with NFS and the second with FTP

#> crm configure group NFS-AND-IP VIRTUAL-IP1 NFS-SERVER
#> crm configure colocation NFS-WITH-IP inf: VIRTUAL-IP1 NFS-SERVER
#> crm configure group FTP-AND-IP VIRTUAL-IP2 PROFTPD-SERVER
#> crm configure colocation FTP-WITH-IP inf: VIRTUAL-IP2 PROFTPD-SERVER

Have a look at crm_mon, it should look something like this:

#> crm_mon -1r
============
Last updated: Wed Oct  6 21:23:40 2010
Stack: Heartbeat
Current DC: node3 (055bb37b-6437-4bd7-8817-0aff3ab5549b) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
3 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ node1 node2 node3 ]

Full list of resources:

 Resource Group: NFS-AND-IP
     VIRTUAL-IP1    (ocf::heartbeat:IPaddr):    Started node1
     NFS-SERVER    (lsb:nfs-kernel-server):    Started node1
 Resource Group: FTP-AND-IP
     VIRTUAL-IP2    (ocf::heartbeat:IPaddr):    Started node1
     PROFTPD-SERVER    (lsb:proftpd):    Started node1

Preferred locations^

Ok, here are the requirements, formalized:

  • node1 shall serve NFS, as long as it is online
  • node2 shall serve FTP, as long as it is online
  • If node1 fails, node3 shall take over. If node3 is offline, node2 shall.
  • If node2 fails, node3 shall take over. If node3 is offline, node1 shall.
  • If node1 or node2 come online again, the resources shall be migrated back to them.

That’s easy to say, and easy to implement. Here we go:

“node1 shall serve NFS, as long as it is online”

#> crm configure location NFS-PREFER-1 NFS-AND-IP 100: node1

“node2 shall serve FTP, as long as it is online”

#> crm configure location FTP-PREFER-1 FTP-AND-IP 100: node2

“If node1 fails, node3 shall take over. If node3 is offline, node2 shall.”

#> crm configure location NFS-PREFER-2 NFS-AND-IP 50: node3
#> crm configure location NFS-PREFER-3 NFS-AND-IP 25: node2

“If node2 fails, node3 shall take over. If node3 is offline, node1 shall.”

#> crm configure location FTP-PREFER-2 FTP-AND-IP 50: node3
#> crm configure location FTP-PREFER-3 FTP-AND-IP 25: node1

The last requirement – “If node1 or node2 come online again, the resources shall be migrated back to them.” – is implicit, because we are not working with stickiness.

Well, that was easy, wasn’t it? Here is what your final config should look like:

#> crm configure show
node $id="055bb37b-6437-4bd7-8817-0aff3ab5549b" node3
node $id="8d0882b9-1d12-4551-b19f-d785407a58e3" node1
node $id="e90c2152-3e47-4858-9977-37c7ed3c325b" node2
primitive NFS-SERVER lsb:nfs-kernel-server \
    op monitor interval="60s"
primitive PROFTPD-SERVER lsb:proftpd \
    op monitor interval="60s"
primitive VIRTUAL-IP1 ocf:heartbeat:IPaddr \
    params ip="10.0.0.100" \
    op monitor interval="10s"
primitive VIRTUAL-IP2 ocf:heartbeat:IPaddr \
    params ip="10.0.0.101" \
    op monitor interval="10s"
group FTP-AND-IP VIRTUAL-IP2 PROFTPD-SERVER
group NFS-AND-IP VIRTUAL-IP1 NFS-SERVER
location FTP-PREFER-1 FTP-AND-IP 100: node2
location FTP-PREFER-2 FTP-AND-IP 50: node3
location FTP-PREFER-3 FTP-AND-IP 25: node1
location NFS-PREFER-1 NFS-AND-IP 100: node1
location NFS-PREFER-2 NFS-AND-IP 50: node3
location NFS-PREFER-3 NFS-AND-IP 25: node2
colocation FTP-WITH-IP inf: VIRTUAL-IP2 PROFTPD-SERVER
colocation NFS-WITH-IP inf: VIRTUAL-IP1 NFS-SERVER
property $id="cib-bootstrap-options" \
    dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
    cluster-infrastructure="Heartbeat" \
    stonith-enabled="false" \
    last-lrm-refresh="1286393008"

You can now begin wildly rebooting the nodes in any order and watch the resources move according to plan. As long as any of the 3 servers is online, all services are available.

Conclusion and a look ahead^

You can probably see how this will very much stabilize your infrastructure. Of course, it depends on how easily the resources can be moved (e.g.: a MySQL server with its local database). But what I wanted to show is how simply you can lay the base for a stable multi-node deployment. It’s really not much more complex than your good old heartbeat2 2-node cluster (as soon as you begin to like the crm configuration syntax).

Of course you can increase the number of nodes and define more complex rules (resource X shall be on one of those three nodes, whereas resource Y can be served from any of the 5, ..) and also combine this with stickiness (let the resource stay on the last node it was on, as long as it is one of XYZ – see the sketch below). Especially in the last scenario “crm_resource -r NFS-AND-IP -W” is very helpful, because it shows you where your resource is right now.
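
A minimal sketch of that last combination, assuming a group named NFS-AND-IP that may run on node1 to node3 but not on a hypothetical node4 (constraint names and scores are only examples):

# equal preference on the three allowed nodes
#> crm configure location NFS-ON-1 NFS-AND-IP 100: node1
#> crm configure location NFS-ON-2 NFS-AND-IP 100: node2
#> crm configure location NFS-ON-3 NFS-AND-IP 100: node3
# keep it off all other nodes
#> crm configure location NFS-NOT-ON-4 NFS-AND-IP -inf: node4
# a stickiness above the location score – the group stays wherever it last ran
#> crm resource meta NFS-AND-IP set resource-stickiness 200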

Variation N+1 cluster: reckless resources^

Use case: a 3-node cluster with NFS and FTP and a preference for NFS^

The same as for the N+1 cluster, but with a difference: we know that each node can only serve one resource (FTP or NFS) at a time (e.g. due to load). We have to make a choice whether NFS or FTP is more important. We go with NFS. So the difference is that node2’s resources can only move to node3 if it is empty. node1’s resources, on the other hand, go to node3 whenever it is available – if node2’s resources are already there, they will be shut down. If node1 and node3 are down, NFS will migrate to node2 and force FTP to shut down (that’s the reckless part).

Implementation^

Keep the N+1 cluster config if you still have it, you only need to modify it. What we want is “mutual exclusion” of resources combined with one high-priority resource.

Initial configuration^

This can be skipped if you still have the N+1 config..

# ** Add your virtual IPs
#> crm configure primitive VIRTUAL-IP1 ocf:heartbeat:IPaddr params ip="10.0.0.100" op monitor interval="10s"
#> crm configure primitive VIRTUAL-IP2 ocf:heartbeat:IPaddr params ip="10.0.0.101" op monitor interval="10s"

# ** Create the NFS and FTP services
#> crm configure primitive NFS-SERVER lsb:nfs-kernel-server op monitor interval="60s"
#> crm configure primitive PROFTPD-SERVER lsb:proftpd op monitor interval="60s"

# ** group and relate
#> crm configure group NFS-AND-IP VIRTUAL-IP1 NFS-SERVER
#> crm configure colocation NFS-WITH-IP inf: VIRTUAL-IP1 NFS-SERVER
#> crm configure group FTP-AND-IP VIRTUAL-IP2 PROFTPD-SERVER
#> crm configure colocation FTP-WITH-IP inf: VIRTUAL-IP2 PROFTPD-SERVER

Set location preferences^

Here comes the difference. First let’s begin with the preferences for NFS-AND-IP, they haven’t changed:

#> crm configure location NFS-PREFER-1 NFS-AND-IP 100: node1
#> crm configure location NFS-PREFER-2 NFS-AND-IP 50: node3
#> crm configure location NFS-PREFER-3 NFS-AND-IP 25: node2

Now the preferences for FTP-AND-IP, which have changed – but only the last one (if you still have your config, remove FTP-PREFER-3 beforehand via “crm configure delete FTP-PREFER-3“)

#> crm configure location FTP-PREFER-1 FTP-AND-IP 100: node2
#> crm configure location FTP-PREFER-2 FTP-AND-IP 50: node3
#> crm configure location FTP-PREFER-3 FTP-AND-IP -inf: node1

Setting “-inf” as the score for FTP-PREFER-3 on node1 enforces that FTP-AND-IP will never migrate to node1. Which is of course a good thing, because as long as node1 is online it serves NFS, and as defined in the requirements: one server cannot serve both. So if node2 and node3 are down, FTP will be down!

Now comes the real trick, which assures that NFS can use not only node3 for failover but also node2, if node1 and node3 are down:

#> crm configure colocation NFS-NOTWITH-FTP -inf: FTP-AND-IP NFS-AND-IP

So how does this work? Defining the negative (-infinity) colocation in exactly this order states: FTP-AND-IP can never be served on the same node where NFS-AND-IP runs. The order is important.

If you reverse it, you will have achieved something quite different: node1 can fail over to node3, but if node2 also goes offline, FTP will be served from node3 – not NFS. Also: if node1 and node3 fail, NFS will stay offline and not migrate to node2.

So try to read the part behind the colon as: “FTP-AND-IP will never exist on the same node as NFS-AND-IP and will rather shut itself down if NFS-AND-IP migrates to the node where it is.” It took me half an hour to identify the source of the reversed behavior I described and figure out how it works (my first attempt was to achieve the required scenario via rules and priorities)..
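
For contrast, this is roughly what the reversed variant from the paragraph above would look like – do NOT add it, it is only shown to illustrate the difference (the constraint name is made up):

# reversed order: now NFS-AND-IP gives way to FTP-AND-IP instead
#> crm configure colocation FTP-NOTWITH-NFS -inf: NFS-AND-IP FTP-AND-IP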

And here again is the full configuration, for your convenience:

node $id="055bb37b-6437-4bd7-8817-0aff3ab5549b" node3
node $id="8d0882b9-1d12-4551-b19f-d785407a58e3" node1
node $id="e90c2152-3e47-4858-9977-37c7ed3c325b" node2
primitive NFS-SERVER lsb:nfs-kernel-server \
    op monitor interval="60s"
primitive PROFTPD-SERVER lsb:proftpd \
    op monitor interval="60s"
primitive VIRTUAL-IP1 ocf:heartbeat:IPaddr \
    params ip="10.0.0.100" \
    op monitor interval="10s"
primitive VIRTUAL-IP2 ocf:heartbeat:IPaddr \
    params ip="10.0.0.101" \
    op monitor interval="10s"
group FTP-AND-IP VIRTUAL-IP2 PROFTPD-SERVER
group NFS-AND-IP VIRTUAL-IP1 NFS-SERVER
location FTP-PREFER-1 FTP-AND-IP 100: node2
location FTP-PREFER-2 FTP-AND-IP 50: node3
location FTP-PREFER-3 FTP-AND-IP -inf: node1
location NFS-PREFER-1 NFS-AND-IP 100: node1
location NFS-PREFER-2 NFS-AND-IP 50: node3
location NFS-PREFER-3 NFS-AND-IP 25: node2
colocation FTP-WITH-IP inf: VIRTUAL-IP2 PROFTPD-SERVER
colocation NFS-NOTWITH-FTP -inf: FTP-AND-IP NFS-AND-IP
colocation NFS-WITH-IP inf: VIRTUAL-IP1 NFS-SERVER
property $id="cib-bootstrap-options" \
    dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
    cluster-infrastructure="Heartbeat" \
    stonith-enabled="false" \
    last-lrm-refresh="1286393008" \
    default-resource-stickiness="0" \
    no-quorum-policy="ignore"

Finish^

As you can see, with only location and colocation you can cope with quite complex scenarios. As mentioned, I initially meant to write this topic about rules and priorities, but in the process I noticed that it was possible with a far less complex configuration. It seems that you can create endlessly complex mutual exclusions with this technique – as long as you care about the order. An interesting article with further information about colocation can be found here.

Clones: all the same^

Use case: 3-node cluster with two nodes actively serving FTP and one in “hot standby”^

Your cluster consists of 3 nodes: two are serving FTP, one does not, but has the FTP server up and running. The two active nodes have different virtual IPs (e.g. which you load balance from somewhere), the third has none – it is waiting to take over the load of any failing node. All your servers have mounted the same storage (howsoever they did). Whenever one of the first two nodes fails, the switch is as fast as possible: proftpd is already up and running (and the storage is mounted) on the idle node – all that is missing is the VIP. If two nodes fail, the remaining node shall end up with two IPs – but only one proftpd instance (so that your silly load balancer still thinks it does good, but actually one node serves all).

Implementation^

It might look like another variation of the N+1 cluster, but it is not. It is more like an N-to-(N-1) cluster, because all nodes are actually kind of active with respect to the proftpd resource, but only one or two are active with respect to the virtual IPs. What I’d like to introduce is the clone concept. If you understand it, you will probably be able to convert this example into a real N-to-N cluster and more complex scenarios. However, I will use the hot-standby node to make a point.

Setup required resources^

I’ll keep it short: we need one FTP resource and two IP resources:

#> crm configure primitive PROFTPD-SERVER lsb:proftpd op monitor interval="60s"
#> crm configure primitive VIRTUAL-IP1 ocf:heartbeat:IPaddr params ip="10.0.0.100" op monitor interval="10s"
#> crm configure primitive VIRTUAL-IP2 ocf:heartbeat:IPaddr params ip="10.0.0.101" op monitor interval="10s"

Create the clone^

This is also very easy.. first the command, then the explanation:

#> crm configure clone CLONE-FTP-SERVER PROFTPD-SERVER meta clone-max=3 clone-node-max=1

Ok, first off: we have a new resource called CLONE-FTP-SERVER. Interesting are the meta attributes which we have set. The first, clone-max, determines how many ACTIVE clones we will run on all nodes together. Setting it to three in our cluster of three nodes means: one proftpd server will run on each node. Actually we would not have to set it, because it defaults to the number of nodes. The other param is clone-node-max, which states: run at most one instance of this resource per node. If we set it to 2, pacemaker would try to start 2 instances on one node, which would neither work (with this particular service) nor would we want that. 1 is also the default value, so it is redundant to set it (but descriptive).

The next thing we will do is set preferences for the clone. Because we have 3 servers and want to run 3 instances, it will not matter. But if you had 4 nodes and ran at most 3 instances, it certainly would!

#> crm configure location FTP-PREFER-1 CLONE-FTP-SERVER 100: node1
#> crm configure location FTP-PREFER-2 CLONE-FTP-SERVER 50: node2
#> crm configure location FTP-PREFER-3 CLONE-FTP-SERVER 25: node3

Again: imagine a fourth node for which you set the preference to 10. This would assure that whenever one of the other nodes fails, another instance is started on node4 (we want 3 running altogether).
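
A sketch of that hypothetical fourth preference (node4 and the constraint name are assumptions, not part of the setup above):

# lowest preference: node4 only runs an instance when another node fails
#> crm configure location FTP-PREFER-4 CLONE-FTP-SERVER 10: node4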

And some more location preferences, this time for the IPs. They assure that node1 (as long as it is online) has the first virtual IP and node2 (if online) the second. If node1 fails, node3 will get the IP and straight away begin serving, because the FTP server is already up and running. If node2 fails, its IP will also migrate to node3. If node1 and node2 fail, both virtual IPs end up on node3. The same goes if node2 and node3 fail, but ending on node1 of course. And the last possible scenario: if node1 and node3 fail, all IPs are on node2. Here we go:

#> crm configure location IP1-PREFER-1 VIRTUAL-IP1 100: node1
#> crm configure location IP1-PREFER-2 VIRTUAL-IP1 50: node3
#> crm configure location IP1-PREFER-3 VIRTUAL-IP1 25: node2
#> crm configure location IP2-PREFER-1 VIRTUAL-IP2 100: node2
#> crm configure location IP2-PREFER-2 VIRTUAL-IP2 50: node3
#> crm configure location IP2-PREFER-3 VIRTUAL-IP2 25: node1

Result^

And that’s all. We have 3 (hot and running) instances of the FTP server. The IPs are moved within our cluster with preferences for their “home” nodes. And what is the difference to the N+1 cluster again? Again: we have “hot” nodes, so the failover will be much faster. In technical terms, the major difference is that we use no groups/colocations of IP+FTP, but clones, which makes it less complex.

But there are still some scenarios which you cannot implement with just clones. Think about the following: you have two MySQL nodes. One is the master, the other the replicated slave. When the master fails, you not only want to move the VIP, you also need to promote the slave to master. Or the good old DRBD example: you have a storage server which serves the storage, e.g. via NFS, to all clients. The storage itself is on a DRBD device, for which the active node is the master and the passive node the slave. So if the master fails, the slave shall become the new master (not starting a new service, but changing its “role”) and start NFS + bring up the IP. This will be covered in the next and last chapter.

Master-slave: primus inter pares^

Use case: 2-node OpenLDAP master + replicated slave^

One node is the master, the other a replicated slave. You have one virtual IP which stays with the master. If the master goes down, you want to move the VIP to the slave and promote it to master. Also: after the original master comes up again, you want it to become slave and replicate.

Implementation^

Let me begin with a theoretical summary of what multi-state is, how and under which circumstances it works and then proceed to the actual implementation.

Grasp the idea^

Multi-state is in itself a simple concept, which expresses: we have a resource and it is in exactly one of multiple possible states. Those states are master and slave, thus a node is either master or slave. Furthermore, we define additional requirements, such as: a resource can be master on only one node in the cluster, and one node can only hold one master of a kind (yes, something like clones.. I will go into that later).

So what is it good for? The classic (because best documented) scenario is a cluster of two nodes running DRBD. In short, if you are not familiar with DRBD, this is what the service does (very roughly and superficially):

With DRBD you can sync a storage between two servers via your network. It sits below the actual filesystem (e.g. ext3) but above the physical device (e.g. /dev/sda). This means writing a file will hand the write request (simplified) to your filesystem (ext3), then to DRBD (which syncs it to the other DRBD server) and then to the actual disk. With this you can achieve identical copies of one storage on two nodes. The downside is that, unless you use a cluster filesystem (such as GFS2, OCFS2, …), only one node can mount the filesystem (ext3) at a time, otherwise it would corrupt very (very!) soon. Which leads us to the multi-state scenario: only one node is primary (master) and has mounted the fs, the other is secondary (slave) and has not – but both run the DRBD resource. Here is the exact description.

Ok, so having no cluster fs, you have to assure that one node in your pacemaker cluster is master and the other slave. As mentioned, both nodes still run the same resource (DRBD). What does this bring to mind? Clones, of course. They also implement running the same resource on multiple nodes. Actually, a multi-state resource is a variation of a clone (in programmer terms: multi-state inherits from clone). I hope you get an impression of where this might lead. I will now stop using DRBD as an example, because it is already described far better than I could.

Setting up the environment^

To bring something new to the table I have chosen OpenLDAP, because it is dead simple to set up a master-slave replication. Since you might not have used OpenLDAP before, and I do not want to make it unnecessarily difficult for you to replay my example, I will deviate from my rule of not providing any resource configurations and give you the slapd.conf you’ll require (or set up yourself).

First install OpenLDAP

#all> aptitude install slapd

Here are the minimal configurations you need to deploy on your nodes (have a look at them, the slave configs only vary in the provider URL)

Those will overwrite your /etc/ldap/slapd.conf automatically via the OCF resource script provided below. For now, you should only stop the slapd servers and remove them from the init runlevels (both nodes):

#> /etc/init.d/slapd stop
#> update-rc.d -f slapd remove

Setting up the resources^

We will need two nodes, one virtual IP and our multi-state resource. If you still have 3 nodes from the above example: stop heartbeat on all nodes, remove node3 from ha.cf on node1 and node2, start heartbeat, and run the following:

#> crm node delete node3

Now setup the virtual IP:

#> crm configure primitive VIRTUAL-IP1 ocf:heartbeat:IPaddr params ip="10.0.0.100" op monitor interval="10s"

Now it is getting tricky. You cannot simply use slapd as an LSB resource, because LSB resources are not stateful. Furthermore, there is no stateful OpenLDAP resource in the heartbeat package. But don’t panic, I wrote one. Before you download and install it, please note: it is an example implementation meant to run in a testing-only environment. I have not tested it anywhere besides this tutorial and I strongly suggest not using it for real. Even in your testing it is at your own risk. However, I hope it works at least.

So here it is: slapd-ms

Download, untar and copy it on both nodes to /usr/lib/ocf/resource.d/custom/slapd-ms, then make it executable.
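
On a Debian-ish system that might look like this (the tarball and file names are assumptions – adjust them to what you actually downloaded):

#> tar xzf slapd-ms.tar.gz
#> mkdir -p /usr/lib/ocf/resource.d/custom
#> cp slapd-ms /usr/lib/ocf/resource.d/custom/slapd-ms
#> chmod +x /usr/lib/ocf/resource.d/custom/slapd-ms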

We still need to add the resource in heartbeat. Therefore we first create a primitive and then encapsulate it within an ms (multi-state) resource:

#> crm configure primitive LDAP-SERVER ocf:custom:slapd-ms op monitor interval="30s"
#> crm configure ms MS-LDAP-SERVER LDAP-SERVER params config="/etc/ldap/slapd.conf" \
    lsb_script="/etc/init.d/slapd" meta clone-max="2" clone-node-max="1" master-max="1" \
    master-node-max="1" notify="false"

The two params tell the resource where the init script (/etc/init.d/slapd) and the config file (/etc/ldap/slapd.conf) are. The clone meta attributes you already know; the master attributes are new. The first, master-max, defines that in the whole cluster only one instance can have the master state, and master-node-max defines the same for a single node. Both are the default settings.

The next thing we want is to colocate the IP with the master OpenLDAP server. We will also assure that the IP is up before slapd is started.

#> crm configure order LDAP-AFTER-IP inf: VIRTUAL-IP1 MS-LDAP-SERVER
#> crm configure colocation LDAP-WITH-IP inf: VIRTUAL-IP1 MS-LDAP-SERVER

One last thing: I set the default-resource-stickiness to 1 in this example, because I would like the LDAP server to stay on the last online node and not migrate back.
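
It is the same property as in the first chapter:

#> crm configure property default-resource-stickiness=1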

Here is the full config:

#> crm configure show
INFO: building help index
node $id="8d0882b9-1d12-4551-b19f-d785407a58e3" node1
node $id="e90c2152-3e47-4858-9977-37c7ed3c325b" node2
primitive LDAP-SERVER ocf:custom:slapd-ms \
    op monitor interval="30s"
primitive VIRTUAL-IP1 ocf:heartbeat:IPaddr \
    params ip="10.0.0.100" \
    op monitor interval="10s"
ms MS-LDAP-SERVER LDAP-SERVER \
    params config="/etc/ldap/slapd.conf" lsb_script="/etc/init.d/slapd" \
    meta clone-max="2" clone-node-max="1" master-max="1" master-node-max="1" notify="false"
colocation LDAP-WITH-IP inf: VIRTUAL-IP1 MS-LDAP-SERVER
order LDAP-AFTER-IP inf: VIRTUAL-IP1 MS-LDAP-SERVER
property $id="cib-bootstrap-options" \
    dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
    cluster-infrastructure="Heartbeat" \
    stonith-enabled="false" \
    last-lrm-refresh="1258039093" \
    default-resource-stickiness="1" \
    no-quorum-policy="ignore"

Conclusion^

Ok, where does this leave us? We can now handle resources with multiple states. Think MySQL replication, or most databases for that matter. You can reboot a little and see how the slave gets the IP and becomes master, while the old master, after boot, will stay the new slave. If you fill your LDAP with real data, it should also replicate fine.

FAQ^

My LSB resource “flickers” whenever an offline node comes online. Is this normal?^

Well, this is the “fault” of your init runlevels. E.g., if you install proftpd in debian, it will install itself in all runlevels with start or stop links (have a look at /etc/rc[0-6].d). So, if your offline node boots again, it will start those services according to the runlevel settings. However, pacemaker will probably take them down again as soon as it becomes aware. That’s why you might see a short flicker. Depending on the service, this might do no harm or be a real problem. In any case, you can remove the services from the runlevels, which I prefer to do, like so:

# check in which runlevels they are, save output maybe for later resetting
#> update-rc.d -n -f proftpd remove
# remove them
#> update-rc.d -f proftpd remove

You might check after an upgrade of the Debian package whether they have been added again (which they shouldn’t be, but..).

Update (from Erik): in Ubuntu 10.04 and later you can just disable the services instead of removing them permanently:

update-rc.d proftpd disable

My resources won’t come up on any node – after node failure .. or playing around^

  • Did you set migration-threshold > 0 ? If so, a resource will stop migrating once its failcount exceeds this threshold. Set it to 0 as long as you are playing around..
  • Also check whether you have a negative infinite location preference for the nodes. Especially after a manual move/migrate, pacemaker will add an inf location preference automatically!
  • If neither is the case, check the failcounts and other errors like “not installed” (“crm_mon -1r“, at the bottom of the output) and clear them – see the example after this list
    • failcount: “crm resource failcount RESOURCE set NODE 0”
    • other: “crm resource cleanup RESOURCE”
  • Did you set the resource unmanaged? Set it back with “crm resource manage RESOURCE”
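
Putting that together, a typical cleanup session might look like this (the resource and node names are just the ones from this article):

# check for failcounts and errors at the bottom of the output
#> crm_mon -1r
# reset the failcount of a resource on a node ..
#> crm resource failcount PROFTPD-SERVER set node1 0
# .. and clean up remaining errors
#> crm resource cleanup PROFTPD-SERVER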

Final words^

This will be my last blog entry about pacemaker, but there is still far more to know. There are things called rules, with which you can add a time component to your cluster’s behaviour. Very complex colocation and location settings are also possible with them. Priorities, too, can help you to prefer single resources with finer granularity and more dynamically than you could with colocations… just to give you a hint of what is possible.

If you find any mistakes in my configuration or interpretations, please share.

Of course: no warranties or guarantees of any kind are given. Messed-up system configurations are likely. Headaches probable.
