HA Linux Mail Server – Part 2 (RHCS Postfix/Dovecot/MySQL)

Now it’s time to configure our services.

On both nodes, install these services and disable them at boot (the cluster services will manage starting and stopping them):


yum install postfix mysql-server dovecot

chkconfig postfix off

chkconfig mysqld off

chkconfig dovecot off

Copy the required data to our shared glusterfs storage.


cp -rp /var/lib/mysql /mysql/data/

cp -p /etc/my.cnf /mysql/data

cp -rp /etc/postfix /mail/data

cp -rp /etc/dovecot /mail/data

cp -p /etc/init.d/postfix /mail/data/postfix.sh

cp -p /etc/init.d/dovecot /mail/data/dovecot.sh

mkdir /mail/data/vmail

/mail/data/vmail will hold our users' mail. The other copies make sure our configuration files live on the shared storage, so both nodes see a consistent environment.

Update /mysql/data/my.cnf with:


datadir=/mysql/data/mysql

Create /mail/data/mail.sh since our cluster service needs to call both postfix and dovecot.


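#!/bin/bash
# Wrapper so a single cluster "Script" resource can manage both dovecot and postfix
if [ "$1" == "status" ]; then
    # Healthy only if the postfix master process is running (dovecot is not checked)
    ps -ef | grep -v grep | grep "/usr/libexec/postfix/master"
    exit $?
else
    # Pass start/stop/restart through to both init scripts
    /mail/data/dovecot.sh "$1"; /mail/data/postfix.sh "$1"
    exit 0
fi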

Please note this is a quick and dirty hack. You should have more checks than just master running since we care about dovecot as well.
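
As a rough idea of what a fuller status check could look like, here is a minimal Python sketch that looks for both daemons instead of only the postfix master. The dovecot path (/usr/sbin/dovecot) and the function names are my assumptions, not something taken from the setup above, so adjust them to your install.

```python
import subprocess

# Process command lines to look for. The dovecot path is an assumption
# about a stock install; the postfix one comes from mail.sh above.
REQUIRED = ("/usr/libexec/postfix/master", "/usr/sbin/dovecot")


def missing_services(ps_output, required=REQUIRED):
    """Return the required patterns that appear on no line of the ps output."""
    lines = ps_output.splitlines()
    return [name for name in required
            if not any(name in line for line in lines)]


def status():
    """Return 0 if every required process is running, 1 otherwise."""
    proc = subprocess.Popen(["ps", "-eo", "args"], stdout=subprocess.PIPE)
    ps_output = proc.communicate()[0].decode()
    return 1 if missing_services(ps_output) else 0
```

A status branch could then exit with the value returned by status(), so the cluster marks the service as failed when either daemon is gone.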

Now again on both nodes, make some symbolic links to the shared storage for our services.


mv /etc/postfix /etc/postfix.bak

ln -s /mail/data/postfix /etc/postfix

mv /etc/dovecot /etc/dovecot.bak

ln -s /mail/data/dovecot /etc/dovecot
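
If you would rather script the move-and-link steps above so they can be re-run safely on each node, something like the following Python sketch would do it. The function name link_to_shared is hypothetical, and it assumes the shared directory is already mounted and populated.

```python
import os
import shutil


def link_to_shared(local_dir, shared_dir):
    """Move local_dir aside to local_dir.bak and symlink it to shared_dir."""
    if os.path.islink(local_dir):
        # Already converted on an earlier run: report where it points.
        return os.readlink(local_dir)
    if not os.path.isdir(shared_dir):
        raise IOError("shared directory missing: " + shared_dir)
    backup = local_dir + ".bak"
    if os.path.exists(backup):
        raise IOError("backup already exists: " + backup)
    if os.path.exists(local_dir):
        shutil.move(local_dir, backup)
    os.symlink(shared_dir, local_dir)
    return shared_dir


# Intended use on each node (not executed here):
# link_to_shared("/etc/postfix", "/mail/data/postfix")
# link_to_shared("/etc/dovecot", "/mail/data/dovecot")
```

Refusing to continue when the shared directory is missing avoids leaving a dangling /etc/postfix link if glusterfs is not mounted yet.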

You should now be able to start your services. If you run into any errors, check /var/log/messages or /var/log/cluster/cluster.log.


clusvcadm -d postfix-svc

clusvcadm -d mysql-svc

clusvcadm -e postfix-svc

clusvcadm -e mysql-svc

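To store users' mail in /mail/data/vmail, make the following changes to /etc/postfix/main.cf. In this example we are using LDAP. Note that the example paths below use /var/vmail, so either symlink /var/vmail to /mail/data/vmail or change the paths to match.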

Both nodes in this case are replicating LDAP information from another server, so both the main LDAP server and one node could go down and users could still authenticate to the cluster services.

accounts_server_host = localhost
accounts_search_base = dc=example,dc=com
#Assumes users have a mail: attribute, if not use something else
accounts_query_filter = (mail=%u)
#accounts_result_attribute = homeDirectory
accounts_result_attribute = mail
#accounts_result_format  =  %u/Mailbox
accounts_result_format  = /var/vmail/%u/
accounts_scope = sub
accounts_cache = yes
accounts_bind = yes
accounts_bind_dn = cn=admin,dc=example,dc=com
accounts_bind_pw = PASSWORD
accounts_version = 3

virtual_transport = virtual
virtual_uid_maps = static:5000
virtual_gid_maps = static:5000
virtual_mailbox_base = /
virtual_mailbox_maps = ldap:accounts
virtual_mailbox_domains = example.com
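
To make the %u placeholders above a little more concrete, here is a simplified Python sketch of how I understand the expansion described in ldap_table(5): in the query filter %u is the local part of the lookup key, and in the result format %u is the local part of the returned attribute value. The helper names are hypothetical, and real Postfix also applies LDAP quoting, which this sketch leaves out.

```python
def split_address(value):
    """Split user@domain into (local part, domain); domain is None if absent."""
    if "@" in value:
        local, domain = value.rsplit("@", 1)
        return local, domain
    return value, None


def expand(template, value):
    """Expand %s, %u and %d in a template (simplified, no LDAP quoting)."""
    local, domain = split_address(value)
    out = template.replace("%s", value).replace("%u", local)
    if domain is not None:
        out = out.replace("%d", domain)
    return out


# The address alice@example.com is made up for illustration.
print(expand("(mail=%u)", "alice@example.com"))
print(expand("/var/vmail/%u/", "alice@example.com"))
```

The trailing slash in the result format is what tells the virtual delivery agent to deliver in maildir format.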

For Dovecot, configure LDAP normally and then make the following changes:

conf.d/auth-ldap.conf.ext:  args = uid=vmail gid=vmail home=/var/vmail/%u/
conf.d/10-mail.conf:mail_location = maildir:/var/vmail/%u
dovecot.conf:mail_location = maildir:/var/vmail/%u

Once this is done, restart your services (clusvcadm -R servicename) and send some test e-mails to yourself.

HA Linux Mail Server – Part 1 (RHCS CentOS 6/Glusterfs)

Things needed for RHCS on CentOS 6

  • Two CentOS 6 nodes
  • Shared storage (glusterfs)
  • Fencing mechanism (in our case, a custom fence_esxi found on this website)

I have two physical ESXi 5.1 servers, so the below will assume ESXi for fencing.

Install and configure glusterfs

I assume you have added additional disks to your VM for this (sdb, sdc).

Add the EPEL/EL repos for your distribution to /etc/yum.repos.d


yum install glusterfs glusterfs-server

Partition and format /dev/sdb1 and /dev/sdb2 as ext4, then add the following to /etc/fstab ON BOTH NODES.

In the example below, I am using LVM:

/dev/vg01/mysql         /mysql                  ext4            defaults        0 0
localhost:/gv0          /mysql/data             glusterfs       defaults        0 0
/dev/vg02/mail          /mail                   ext4            defaults        0 0
localhost:/gv1          /mail/data              glusterfs       defaults        0 0

Mount /mysql and /mail (not /*/data yet)

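Now, on only one node, run the following commands. This assumes glusterd is running on both nodes and that the nodes are already peered (gluster peer probe centos-cluster2, run from centos-cluster1).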


gluster volume create gv0 replica 2 transport tcp centos-cluster1:/mysql/brick centos-cluster2:/mysql/brick

gluster volume start gv0

gluster volume create gv1 replica 2 transport tcp centos-cluster1:/mail/brick centos-cluster2:/mail/brick

gluster volume start gv1
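
If you end up with more than two volumes, it can be handy to generate these commands rather than type them. Below is a small Python sketch that only builds and prints the same command lines as above; the function name gluster_commands is hypothetical, and nothing is executed against gluster.

```python
def gluster_commands(volume, brick_path, nodes, replica=2):
    """Build the create and start commands for one replicated volume."""
    bricks = ["%s:%s" % (node, brick_path) for node in nodes]
    create = ["gluster", "volume", "create", volume,
              "replica", str(replica), "transport", "tcp"] + bricks
    start = ["gluster", "volume", "start", volume]
    return [create, start]


NODES = ["centos-cluster1", "centos-cluster2"]

# Print the commands for both volumes used in this post.
for volume, brick in (("gv0", "/mysql/brick"), ("gv1", "/mail/brick")):
    for argv in gluster_commands(volume, brick, NODES):
        print(" ".join(argv))
```

Each argv list could be handed to subprocess.call on one node once you are happy with the printed output.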

You should now be able to mount /mail/data and /mysql/data on both nodes.

 

Install RHCS and tools on both nodes


yum install luci ricci rgmanager cman fence-agents corosync

Once done with this step, make sure each node has the other node in its /etc/hosts file and configure your nodes to use static IPs.
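
A quick way to double check the /etc/hosts part on each node is a few lines of Python like the sketch below. The function names are hypothetical and the 192.0.2.x addresses in the sample are placeholders from the documentation range, not my real addresses.

```python
def hosts_entries(hosts_text):
    """Map every hostname and alias in /etc/hosts style text to its IP."""
    entries = {}
    for line in hosts_text.splitlines():
        # Drop comments, then split into the IP and its names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        for name in fields[1:]:
            entries.setdefault(name, fields[0])
    return entries


def missing_nodes(hosts_text, nodes):
    """Return the cluster node names that have no entry in the hosts text."""
    entries = hosts_entries(hosts_text)
    return [node for node in nodes if node not in entries]


SAMPLE = """127.0.0.1   localhost
192.0.2.11  centos-cluster1   # placeholder address
"""

print(missing_nodes(SAMPLE, ["centos-cluster1", "centos-cluster2"]))
```

On a real node you would pass in open("/etc/hosts").read() instead of the sample text.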

Use luci to create the basic configuration of your cluster.  (https://NODE1IP:8084)

Manage Cluster – Add

Once created, add your nodes under “Nodes”.

Create a fence device for each ESXi server (use fence_vmware or something similar for now; we will most likely change it later).

Create a failover domain (Prioritized=No, Restricted=No)

Create two IP address resources (one for mail services, one for mysql)

Finally, create a service group for each service.

Order should be: IP Resource – Service

For postfix-svc, use the “Script” type and define the script file as /mail/data/mail.sh

For mysql-svc, use the “Mysql” type and set the Config File to /mysql/data/my.cnf

You should now be able to run “clustat” on either node, although your services will be failed or disabled for now.

Finally, let’s finish up fencing.

See http://linuxadministration.us/?p=256 for ESXi 5.1. If you’re using something else, you’ll have to do this part yourself, I’m afraid.

Test fencing with the “fence_node” command. Do not skip this step! Make sure fencing works before moving forward.

HA Linux Mail Server – Part 0

I’ve been working on a project to learn more about RHCS in general and decided to build an HA mail server.

Technologies Used:
– CentOS 6.6
– RHCS
– Postfix
– Dovecot (IMAP)
– OpenLDAP
– GlusterFS
– MySQL
– Roundcube

Over the coming weeks I’ll be posting the steps required to set this up.

The general idea is to have two RHCS services:

mail-svc

  • VIP for postfix/dovecot
  • glusterfs for delivered mail, postfix and dovecot configuration files

mysql-svc

  • VIP for mysql
  • glusterfs for mysql config and data

 

Each node runs the following locally

  • httpd + roundcube
  • OpenLDAP replicating from another LDAP server

While the above two services could certainly be cluster services, this is not required.

We NAT our public IP to the VIP for mail-svc.

To avoid SSH host key warnings after a failover, copy the SSH host keys from /etc/ssh on one node to the other, so that SSH to the public IP presents the same host key no matter which node answers.

 

 

RHCS ESXi 5.1 Fencing

It turns out that the “free” version of ESXi 5.1 does not work with RHCS fencing due to SOAP limitations. The script below works around this by fencing over SSH instead.

Place the following file in /usr/sbin/fence_esxi and chmod a+x /usr/sbin/fence_esxi

Make sure to install paramiko (yum install python-paramiko)

#!/usr/bin/python
# fence_esxi: fence a VM on an ESXi host over SSH (written for Python 2)

import paramiko
import sys
import time
import datetime
import re
sys.path.append("/usr/share/fence")
from fencing import *

device_opt = [  "help", "version", "agent", "quiet", "verbose", "debug", "action", "ipaddr", "login", "passwd", "passwd_script", "ssl", "port", "uuid", "separator", "ipport", "power_timeout", "shell_timeout", "login_timeout", "power_wait" ]

options = check_input(device_opt, process_input(device_opt))

f = open("/var/log/cluster/fence_esxi.log","w+")
ts = time.time()
st = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')
f.write(st + " starting fencing.\n")
f.write("-n " + options["-n"] + "\n")
f.write("-a " + options["-a"] + "\n")
f.write("-l " + options["-l"] + "\n")
#The password (options["-p"]) is deliberately not written to the log

client = paramiko.SSHClient()
#Requires the ESXi host key to already be in the system known_hosts file
client.load_system_host_keys()
client.connect(options["-a"],username=options["-l"],password=options["-p"])

def run_remote(command):
    #Log and run a command on the ESXi host, wait for it to exit, return its output
    f.write("Cmd: " + command + "\n")
    stdin, stdout, stderr = client.exec_command(command)
    while not stdout.channel.exit_status_ready():
        f.write("Waiting for command to finish... \n")
        time.sleep(2)
    return stdout.read().strip()

#Find the world ID of the running VM
wwid = run_remote("esxcli vm process list | grep ^" + options["-n"] + " -A 1 | tail -n 1 | sed 's/  */ /g' | cut -d \" \" -f 4")
f.write("wwid: " + wwid + "\n")

if not wwid:
    f.write("VM not found or already offline \n")
    f.close()
    client.close()
    sys.exit(1)

f.write("VM found \n")
run_remote("esxcli vm process kill --type=soft --world-id=" + wwid)

#Give the VM some time to shut down gracefully
time.sleep(30)
f.write("Waited 30 seconds \n")

#NOTE: "centos" is hard-coded here; adjust it to match your VM names
status = run_remote("vm-support -V | grep centos | cut -d \"(\" -f 2 | cut -d \")\" -f 1")
f.write("VM Status: " + status + "\n")

if re.search('Running', status):
    f.write("VM still running, hard kill required \n")
    run_remote("esxcli vm process kill --type=hard --world-id=" + wwid)
    time.sleep(30)
else:
    f.write("VM successfully soft killed \n")

#Get VM info while powered off
vmid = run_remote("vim-cmd vmsvc/getallvms | grep " + options["-n"] + " | sed 's/  */ /g' | cut -d \" \" -f 1")

#Start VM back up
run_remote("vim-cmd vmsvc/power.on " + vmid)

f.write("fence_esxi exiting...")
f.close()
client.close()
sys.exit(0)
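
The grep/tail/sed/cut pipeline used to find the world ID is a bit fragile, so here is a hedged sketch of doing the same extraction in Python on the raw output of "esxcli vm process list". The sample output is illustrative only (the IDs are made up and I have trimmed the fields), and world_id is a hypothetical helper, not part of the script above.

```python
def world_id(process_list, vm_name):
    """Return the World ID of the first VM whose name line starts with vm_name."""
    lines = process_list.splitlines()
    for i, line in enumerate(lines[:-1]):
        # Name lines start at column 0; detail lines are indented.
        if line.startswith(vm_name):
            fields = lines[i + 1].split()
            if len(fields) == 3 and fields[:2] == ["World", "ID:"]:
                return fields[2]
    return None


# Illustrative output only; the numbers are invented.
SAMPLE = """centos-cluster1
   World ID: 4512
   Process ID: 0
   Display Name: centos-cluster1
"""

print(world_id(SAMPLE, "centos-cluster1"))
print(world_id(SAMPLE, "some-other-vm"))
```

Unlike the pipeline, which keeps the last match, this returns the first matching VM, and it returns None instead of an empty string when nothing matches.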

In your cluster.conf you would then have something like the following. Make sure to enable SSH on your ESXi hosts.


<fence>
  <method name="fence-cluster1">
    <device name="esx1" port="centos-cluster1" ssl="on" />
  </method>
</fence>

<fencedevices>
  <fencedevice agent="fence_esxi" ipaddr="esx2.FQDN.com" login="root" name="esx2" passwd="YOURPASSWORDHERE" delay="60" />
</fencedevices>
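
Since the device name in each fence method has to match a fencedevice entry, a small sanity check can save some head scratching. The Python sketch below is only an illustration: the surrounding cluster and clusternode elements, and the cluster name "mailcluster", are my assumptions about how the excerpt sits in a full cluster.conf.

```python
import xml.etree.ElementTree as ET


def undefined_fence_devices(conf_xml):
    """Return device names used in fence methods but not defined as fencedevices."""
    root = ET.fromstring(conf_xml)
    defined = set(d.get("name") for d in root.iter("fencedevice"))
    used = set(d.get("name") for d in root.iter("device"))
    return sorted(used - defined)


# Hypothetical minimal cluster.conf built around the excerpt above.
SAMPLE = """<cluster name="mailcluster" config_version="1">
  <clusternodes>
    <clusternode name="centos-cluster1" nodeid="1">
      <fence>
        <method name="fence-cluster1">
          <device name="esx1" port="centos-cluster1" ssl="on" />
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_esxi" ipaddr="esx2.FQDN.com" login="root" name="esx2" passwd="YOURPASSWORDHERE" delay="60" />
  </fencedevices>
</cluster>
"""

print(undefined_fence_devices(SAMPLE))
```

With only the esx2 device defined, the check reports esx1, which is a reminder that each ESXi host referenced by a method needs its own fencedevice entry.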
</fencedevices>