
Monday, March 31, 2014

High Availability Stand Alone Zabbix (Failover Zabbix, MySQL, Apache and Postfix with DRBD-Pacemaker)

This article describes building a fault-tolerant, stand-alone Zabbix server using the Distributed Replicated Block Device (DRBD), which is analogous to disk mirroring over the network.

DRBD-Pacemaker Failover HA Cluster Diagram



High-Availability Clustering Technologies


High-availability clustering is designed to increase system availability.  For this article, we will be using:

  • DRBD (Distributed Replicated Block Device) -- block-level data replication between the nodes
  • Corosync -- cluster membership and messaging
  • Pacemaker -- cluster resource management
  • LCMC (Linux Cluster Management Console) -- installation and configuration of the above

These technologies present two servers as one to the network; one server is active and the other is waiting to take over if the first fails or is taken off line.

The cluster is composed of elements that differ from a single-server deployment.  

  • Resources unique to each node
  • Data shared between nodes on the cluster
  • Services shared between the nodes on the cluster

Resources Unique to Each Node

Each node is a server with its own operating system and hardware.  The processor, memory, disk and IO subsystems (including network interfaces) are controlled by the operating system installed on the boot partition.

Data Shared Between Nodes on the Cluster

Zabbix server includes several components:
  • Apache2 Web Server
  • MySQL Database Server
  • Postfix Mail Server
  • Zabbix Server
  • Zabbix Agent
  • Zabbix PHP Frontend
Once configured, Apache and Postfix do not require additional modifications and may remain unique to each server; their configuration files do not need to be shared.

For a MySQL cluster, there are two types of shared data:  configuration files and databases.  The configuration files are those located in the /etc/mysql/ directory.  When shared between the two nodes, the MySQL server will have an identical configuration regardless of the node that is active.  However, there are circumstances in which the MySQL configuration files may be unique to each server.  The databases are kept in the /var/lib/mysql/ directory and include the log files.
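
For reference, these are the stock Debian locations that MySQL itself points at in its configuration; an illustrative excerpt (this is the default setting, not something this procedure changes):

    # /etc/mysql/my.cnf (excerpt) -- default Debian data directory
    [mysqld]
    datadir = /var/lib/mysql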

The Zabbix configuration files (for the server, agent and web front-end) are stored in the /etc/zabbix directory.  The PHP Frontend files (used by Apache) are stored in the /usr/share/zabbix/ directory; these may remain on each server.

MySQL Clustering Caveats

Although the two nodes share the same MySQL databases, UNDER NO CIRCUMSTANCES SHALL THE TWO NODES SIMULTANEOUSLY ACCESS THE DATABASES.  That is, only one node may run the mysqld daemon at any given time.  If two MySQL daemons access the same database, there will eventually be corruption.  The clustering software controls which node accesses the data.
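
Under the hood, Pacemaker enforces this with colocation and ordering constraints; a sketch in crm configure syntax (the resource names here are assumptions, not the exact names LCMC generates):

    # Keep mysqld on the node that has the DRBD file system mounted,
    # and only start it after that file system is available
    colocation mysql_with_fs inf: res_mysqld res_fs_drbd0
    order fs_before_mysql inf: res_fs_drbd0 res_mysqld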

DRBD - Corosync - Pacemaker Overview

The illustration below depicts a high-availability cluster design.  Each server has four network interfaces:

  • eth0 -- the publicly addressable interfaces
  • eth1 -- the DRBD data replication and control interfaces
  • eth2 and eth3 -- the Corosync - Pacemaker control interfaces

The first interface -- eth0 -- is the publicly addressable interface that provides access to the MySQL database and the Apache web server (which serves the Zabbix PHP frontend).  Two IP addresses (one unique to each server) are assigned at boot time, and a third, shared address is assigned by the Corosync - Pacemaker portion of the clustering software.

The second interface -- eth1 -- is controlled by the DRBD daemon.  This daemon is configured with two or more files: /etc/drbd.d/global_common.conf plus one resource file (r0.res, r1.res, r2.res, ...) for each shared block device.  For this example, only r0.res is installed.  System-wide settings are defined in the global_common.conf file.  Settings specific to each pair of shared block devices are defined in the r0.res (and other) resource files.  DRBD defines an entire block device (or hard drive) as shared and replicated between two nodes.  In this example, block device /dev/sdb (a SCSI drive in each server) is shared between the two nodes as /dev/drbd0.  Once configured, both servers see /dev/sdb as a new block device /dev/drbd0, and only ONE may mount it at any given time.  The server with the mounted partition replicates any changes to the failover node.  If the first server fails or is taken off line, the other server immediately mounts its /dev/drbd0 block device and assumes control of replication.
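
A minimal r0.res might look roughly like the following; the host names and the eth1 replication addresses are assumptions for illustration, not values taken from this installation:

    # /etc/drbd.d/r0.res -- minimal sketch (host names and eth1 addresses assumed)
    resource r0 {
      protocol C;                      # synchronous replication
      on zabbix-node1 {
        device    /dev/drbd0;          # the replicated device both nodes will see
        disk      /dev/sdb;            # the local backing disk
        address   10.195.1.1:7788;     # this node's eth1 replication address
        meta-disk internal;
      }
      on zabbix-node2 {
        device    /dev/drbd0;
        disk      /dev/sdb;
        address   10.195.1.2:7788;
        meta-disk internal;
      }
    }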

The third and fourth interfaces -- eth2 and eth3 -- are controlled by the Corosync - Pacemaker software.  These interfaces provide communication links defining the status of each defined resource and control how and where shared services (such as shared DRBD block devices, IP addresses, and the MySQL / Apache2 daemons) run.  In failover clustering, only one node may actively be in control at any time.
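
Corosync's use of the two dedicated links is defined with redundant rings in /etc/corosync/corosync.conf; a sketch of the totem section (the subnets are assumptions for illustration):

    # /etc/corosync/corosync.conf (totem section, sketch) -- eth2/eth3 subnets assumed
    totem {
      version: 2
      rrp_mode: passive            # use both rings, one active at a time
      interface {
        ringnumber: 0
        bindnetaddr: 10.195.2.0    # network reachable via eth2
        mcastaddr: 226.94.1.1
        mcastport: 5405
      }
      interface {
        ringnumber: 1
        bindnetaddr: 10.195.3.0    # network reachable via eth3
        mcastaddr: 226.94.1.2
        mcastport: 5407
      }
    }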


DRBD-Pacemaker Failover HA Cluster Diagram

Installing and Configuring the MySQL - Apache - Postfix - Zabbix Failover Cluster

It is assumed the reader knows how to set up a Zabbix server.  If not, please read and understand this article:  Installing and Configuring Basic Zabbix Functionality on Debian Wheezy

Leave MySQL listening on the default loopback interface 127.0.0.1.  Then, start the Linux Cluster Management Console (LCMC) -- a Java application that will install and configure everything required for clustering.  Select the two nodes by name or IP address, install Pacemaker (NOT Heartbeat) and DRBD.

Once both nodes have the required software, configure the cluster.  LCMC will prompt you for two interfaces to use in the cluster (select eth2 and eth3) and the two-node system will be recognized as a Cluster.  Configure global options and then select device /dev/sdb on one node and mirror it to /dev/sdb on the other to configure DRBD device /dev/drbd0.  Format it with an ext4 file system and make sure to perform an initial full synchronization.
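
LCMC performs these steps through its GUI; the roughly equivalent command-line steps would be (a sketch, to be run only against a new, empty /dev/sdb):

    # On both nodes: create DRBD metadata and bring the resource up
    drbdadm create-md r0
    drbdadm up r0

    # On ONE node only: declare it primary and start the initial full sync
    drbdadm -- --overwrite-data-of-peer primary r0

    # On the primary node: create the ext4 file system on the replicated device
    mkfs.ext4 /dev/drbd0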

When the DRBD device finishes synchronization, create a shared mount point -- /mnt/sdb -- and a shared IP address (10.195.0.100, assigned to eth0 on whichever node is active).  The cluster will now recognize the DRBD device as a file system on the Active node.
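
In Pacemaker terms, LCMC creates resources roughly like these (a sketch; the resource names are assumptions):

    # crm configure syntax (sketch)
    primitive res_ip_cluster ocf:heartbeat:IPaddr2 \
        params ip=10.195.0.100 cidr_netmask=24 nic=eth0 \
        op monitor interval=30s
    primitive res_fs_drbd0 ocf:heartbeat:Filesystem \
        params device=/dev/drbd0 directory=/mnt/sdb fstype=ext4 \
        op monitor interval=30s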

The shared data must then be moved to the DRBD device.  Stop MySQL on each node.  On the Active node, move the directories /etc/mysql, /etc/zabbix and /var/lib/mysql to /mnt/sdb/etc/mysql, /mnt/sdb/etc/zabbix and /mnt/sdb/var/lib/mysql, respectively, and create symlinks back to their original locations.  On the Inactive node, delete the mysql and zabbix directories and replace them with symlinks to the same locations -- even though those locations are not yet visible.

On the Active Node:
  • mkdir -p /mnt/sdb/etc /mnt/sdb/var/lib
  • mv /etc/mysql /mnt/sdb/etc/mysql
  • mv /etc/zabbix /mnt/sdb/etc/zabbix
  • mv /var/lib/mysql /mnt/sdb/var/lib/mysql
  • ln -s /mnt/sdb/etc/mysql /etc/mysql
  • ln -s /mnt/sdb/etc/zabbix /etc/zabbix
  • ln -s /mnt/sdb/var/lib/mysql /var/lib/mysql
On the Inactive Node:
  • rm -r /etc/mysql
  • rm -r /etc/zabbix
  • rm -r /var/lib/mysql
  • ln -s /mnt/sdb/etc/mysql /etc/mysql
  • ln -s /mnt/sdb/etc/zabbix /etc/zabbix
  • ln -s /mnt/sdb/var/lib/mysql /var/lib/mysql
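
A quick sanity check on either node (illustrative) -- each of the three paths should show up as a symlink pointing into /mnt/sdb:
  • ls -ld /etc/mysql /etc/zabbix /var/lib/mysql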

The LCMC console is then used to finalize the shared services.  Add Primitive LSB init-script resources (that is, services that run on only one server at a time) for MySQL and Apache2.  Once the services are installed, change the listener address in /etc/mysql/my.cnf to the shared IP address of the cluster (10.195.0.100 for this installation).
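
The LSB resources LCMC creates correspond roughly to the following (a sketch; the resource names are assumptions):

    primitive res_mysqld lsb:mysql op monitor interval=30s
    primitive res_apache2 lsb:apache2 op monitor interval=30s

The my.cnf change itself amounts to a single line, using this installation's shared address:

    # /etc/mysql/my.cnf (excerpt)
    [mysqld]
    bind-address = 10.195.0.100    # the cluster's shared address instead of 127.0.0.1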

Fail the servers back and forth several times to check that the system performs as expected.  Finally, install a database.  For this article, I install the Zabbix Monitoring database.
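
One simple way to force a failover from the command line is to put the active node into standby and then bring it back (LCMC offers the same actions in its GUI; the host name below is an assumption):

    crm node standby node1    # resources migrate to the other node
    crm node online node1     # allow the node to host resources again
    crm_mon -1                # show where the resources are running now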

The video below illustrates the entire installation and configuration process.


