HA Linux Mail Server – Part 1 (RHCS CentOS 6/Glusterfs)

Things needed for RHCS on CentOS 6

  • Two CentOS 6 nodes
  • Shared storage (glusterfs)
  • Fencing mechanism (in our case, a custom fence_esxi found on this website)

I have two physical ESXi 5.1 servers, so the below will assume ESXi for fencing.

Install and configure glusterfs

I assume you have added additional disks to your VM for this (sdb, sdc).

Add the EPEL/EL repos for your distribution to /etc/yum.repos.d


yum install glusterfs glusterfs-server

Partition and format /dev/sdb1 and /dev/sdb2 as EXT4, adding the following to /etc/fstab ON BOTH NODES

In the below example, I am using LVM

/dev/vg01/mysql         /mysql                   ext4    defaults        0 0
localhost:/gv0          /mysql/data             glusterfs       defaults 0 0
/dev/vg02/mail          /mail                   ext4    defaults        0 0
localhost:/gv1          /mail/data              glusterfs       defaults 0 0

Mount /mysql and /mail (but not /mysql/data or /mail/data yet).

Now, on one node only, run the following commands:


gluster volume create gv0 replica 2 transport tcp centos-cluster1:/mysql/brick centos-cluster2:/mysql/brick

gluster volume start gv0

gluster volume create gv1 replica 2 transport tcp centos-cluster1:/mail/brick centos-cluster2:/mail/brick

gluster volume start gv1

You should now be able to mount /mail/data and /mysql/data on both nodes.
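
If you end up creating more volumes than these two, the create/start pair can be generated rather than typed. The following is only a sketch in Python 3: the helper name gluster_volume_commands is my own invention, and it simply prints the same commands shown above so you can review them before running anything.

```python
def gluster_volume_commands(volume, hosts, brick_path, replica=2):
    """Return the create and start commands for one replicated volume.

    hosts and brick_path are combined into host:/path bricks, in order.
    """
    if not hosts or len(hosts) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    bricks = " ".join("%s:%s" % (host, brick_path) for host in hosts)
    create = "gluster volume create %s replica %d transport tcp %s" % (
        volume, replica, bricks)
    start = "gluster volume start %s" % volume
    return [create, start]


# Node names and brick paths are the ones used in this post.
NODES = ["centos-cluster1", "centos-cluster2"]
for name, brick in (("gv0", "/mysql/brick"), ("gv1", "/mail/brick")):
    for command in gluster_volume_commands(name, NODES, brick):
        print(command)
```

The helper only builds strings; it does not call gluster itself, so a typo in a host name shows up on screen instead of in a half-created volume.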

Install RHCS and tools on both nodes


yum install luci ricci rgmanager cman fence-agents corosync

Once done with this step, make sure each node has the other node in its /etc/hosts file, and configure both nodes to use static IPs.
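
A quick way to confirm the /etc/hosts part is to parse the file and look for both node names. This is a rough Python 3 sketch, not part of RHCS; the function name missing_hosts and the 10.10.0.x addresses in the sample are placeholders I made up for illustration.

```python
def missing_hosts(hosts_text, required_names):
    """Return the required names that have no entry in the hosts file text."""
    known = set()
    for line in hosts_text.splitlines():
        # Drop comments, then skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # The first field is the address, the rest are names and aliases.
        known.update(fields[1:])
    return [name for name in required_names if name not in known]


# Hypothetical hosts file contents; on a real node, read /etc/hosts instead.
SAMPLE_HOSTS = """127.0.0.1   localhost
10.10.0.11  centos-cluster1
# 10.10.0.12  centos-cluster2
"""

print(missing_hosts(SAMPLE_HOSTS, ["centos-cluster1", "centos-cluster2"]))
```

On a node you would pass open("/etc/hosts").read() instead of the sample text and expect an empty list back.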

Use luci to create the basic configuration of your cluster.  (https://NODE1IP:8084)

Manage Cluster – Add

Once created add your nodes under “Nodes”

Create your fence devices for each ESXi server. (Use fence_vmware or something similar for now; we will most likely change it later.)

Create a failover domain (Prioritized=No, Restricted=No)

Create two IP address resources (one for mail services, one for mysql)

Finally create your service groups for each service.

Order should be: IP Resource – Service

For your postfix-svc, use the “Script” type and define the script file as /mail/data/mail.sh

For mysql-svc, use the “Mysql” type and use for the Config File: /mysql/data/my.cnf

You should now be able to run “clustat” on either node, although your services will be failed or disabled for now.

Finally, let’s finish up fencing.

See http://linuxadministration.us/?p=256 for ESXi 5.1 (if you’re using something else, you’ll have to do this part yourself I’m afraid)

Test fencing with the “fence_node” command. Do not skip this step! Make sure fencing works before moving forward.

HA Linux Mail Server – Part 0

I’ve been working on a project to learn more about RHCS in general and decided to build an HA mail server.

Technologies Used:
  • CentOS 6.6
  • RHCS
  • Postfix
  • Dovecot (IMAP)
  • OpenLDAP
  • GlusterFS
  • MySQL
  • Roundcube

Over the coming weeks I’ll be posting the steps required to set this up.

The general idea is to have two RHCS services

mail-svc

  • VIP for postfix/dovecot
  • glusterfs for delivered mail, postfix and dovecot configuration files

mysql-svc

  • VIP for mysql
  • glusterfs for mysql config and data

Each node runs the following locally

  • httpd + roundcube
  • OpenLDAP replicating from another LDAP server

While the above two services could certainly be cluster services, this is not required.

We NAT our public IP to the VIP for mail-svc.

To avoid SSH host key warnings, copy the SSH host keys from /etc/ssh to the other node, so that SSH to the public IP does not produce errors after a failover.

RHCS ESXi 5.1 Fencing

Turns out that using the “free” version of ESXi 5.1 does not work with RHCS fencing due to SOAP limitations.

Place the following file in /usr/sbin/fence_esxi and chmod a+x /usr/sbin/fence_esxi

Make sure to install paramiko (yum install python-paramiko)

#!/usr/bin/python

import paramiko
import sys
import time
import datetime
import re
sys.path.append("/usr/share/fence")
from fencing import *

device_opt = [  "help", "version", "agent", "quiet", "verbose", "debug", "action", "ipaddr", "login", "passwd", "passwd_script", "ssl", "port", "uuid", "separator", "ipport", "power_timeout", "shell_timeout", "login_timeout", "power_wait" ]

options = check_input(device_opt, process_input(device_opt))

f = open("/var/log/cluster/fence_esxi.log","w+")
ts = time.time()
st = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')
f.write(st + " starting fencing.\n")
f.write("-n " + options["-n"] + "\n")
f.write("-a " + options["-a"] + "\n")
f.write("-l " + options["-l"] + "\n")
# Do not write the password to the log in plain text
f.write("-p ********\n")

client = paramiko.SSHClient()
client.load_system_host_keys()
client.connect(options["-a"],username=options["-l"],password=options["-p"])

command="esxcli vm process list | grep ^" + options["-n"]  + " -A 1 | tail -n 1 | sed \'s/  */ /g\' | cut -d \" \" -f 4"

f.write("Cmd: " + command + "\n")

stdin, stdout, stderr = client.exec_command(command)
while not stdout.channel.exit_status_ready():
        f.write("Waiting for command to finish... \n")
        time.sleep(2)

# Strip the trailing newline so the ID can be appended to later commands
wwid = stdout.read().strip()
f.write("wwid: " + wwid + "\n")

if len(wwid) < 2:
	f.write("VM not found or already offline \n")
	client.close()
	sys.exit(1)

f.write("VM found \n")
command="esxcli vm process kill --type=soft --world-id=" + wwid
f.write("Cmd: " + command + "\n")

stdin, stdout, stderr = client.exec_command(command)
while not stdout.channel.exit_status_ready():
        f.write("Waiting for command to finish... \n")
        time.sleep(2)

#Give the VM some time to shut down gracefully
time.sleep(30)
f.write("Waited 30 seconds \n")

command="vm-support -V | grep centos | cut -d \"(\" -f 2 | cut -d \")\" -f 1"
f.write("Cmd: " + command + "\n")

stdin, stdout, stderr = client.exec_command(command)
while not stdout.channel.exit_status_ready():
        f.write("Waiting for command to finish... \n")
        time.sleep(2)

status = stdout.read()
f.write("VM Status: " + status + "\n")
sregex = re.compile('Running')

if sregex.search(status):
	f.write("VM still running, hard kill required \n")
	command="esxcli vm process kill --type=hard --world-id=" + wwid
	f.write("Cmd: " + command + "\n")
	stdin, stdout, stderr = client.exec_command(command)
	while not stdout.channel.exit_status_ready():
	        f.write("Waiting for command to finish... \n")
	        time.sleep(2)

	time.sleep(30)
else:
	f.write("VM successfully soft killed \n")

#Get VM info while powered off
command="vim-cmd vmsvc/getallvms | grep " + options["-n"] + " | sed 's/  */ /g' | cut -d \" \" -f 1"
f.write("Cmd: " + command + "\n")
stdin, stdout, stderr = client.exec_command(command)
while not stdout.channel.exit_status_ready():
        f.write("Waiting for command to finish... \n")
        time.sleep(2)

vmid = stdout.read().strip()

#Start VM back up
command="vim-cmd vmsvc/power.on " + vmid
f.write("Cmd: " + command + "\n")
stdin, stdout, stderr = client.exec_command(command)

while not stdout.channel.exit_status_ready():
	f.write("Waiting for command to finish... \n")
	time.sleep(2)

f.write("fence_esxi exiting...")
f.close()
client.close()
sys.exit(0)
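
One thing to be aware of in the script above: the world ID lookup uses grep ^NAME, which is a prefix match, so a VM called centos-cluster1 would also match centos-cluster10. If that matters to you, the parsing can be done in Python instead. The following is a sketch only. It assumes that "esxcli vm process list" prints each VM name on an unindented line followed by indented "Key: value" lines, which is the layout the grep/sed/cut pipeline relies on; the sample text and the function name find_world_id are made up for illustration.

```python
def find_world_id(process_list, vm_name):
    """Return the World ID string for an exactly matching VM name, or None."""
    in_block = False
    for line in process_list.splitlines():
        if not line.startswith((" ", "\t")):
            # Unindented line: start of a new VM block (or a blank separator).
            in_block = (line.strip() == vm_name)
            continue
        if in_block:
            key, _, value = line.strip().partition(":")
            if key == "World ID":
                return value.strip()
    return None


# Illustrative sample in the assumed layout, not captured from a real host.
SAMPLE = """centos-cluster1
   World ID: 12345
   Process ID: 0
   Display Name: centos-cluster1

centos-cluster10
   World ID: 67890
   Process ID: 0
   Display Name: centos-cluster10
"""

print(find_world_id(SAMPLE, "centos-cluster1"))
print(find_world_id(SAMPLE, "centos"))
```

To use it in the agent you would run plain "esxcli vm process list" over SSH and pass stdout.read() to the function. The snippet avoids anything specific to Python 3, so it should also work on the Python 2 interpreter the agent runs under.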

In your cluster.conf you would then have something like the following. Make sure to enable SSH on your ESXi hosts


<fence>
<method name="fence-cluster1">
<device name="esx1" port="centos-cluster1" ssl="on" />
</method>
</fence>

<fencedevices>
<fencedevice agent="fence_esxi" ipaddr="esx2.FQDN.com" login="root" name="esx2" passwd="YOURPASSWORDHERE" delay="60" />
</fencedevices>
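
An easy mistake here is a device name under a node's fence section that has no fencedevice with the same name. A small Python 3 check can catch that before you find out during a real fence event. Treat this as a sketch: the function name undefined_fence_devices and the cut-down sample cluster.conf (cluster name, node name, password placeholder) are assumptions for illustration, not my actual configuration.

```python
import xml.etree.ElementTree as ET


def undefined_fence_devices(cluster_conf_xml):
    """Return device names used by nodes but not defined as a fencedevice."""
    root = ET.fromstring(cluster_conf_xml)
    defined = set(d.get("name") for d in root.iter("fencedevice") if d.get("name"))
    used = set(d.get("name") for d in root.iter("device") if d.get("name"))
    return sorted(used - defined)


# Hypothetical, heavily trimmed cluster.conf with a deliberate mismatch.
SAMPLE_CONF = """<cluster name="mailcluster" config_version="1">
  <clusternodes>
    <clusternode name="centos-cluster1" nodeid="1">
      <fence>
        <method name="fence-cluster1">
          <device name="esx1" port="centos-cluster1" ssl="on" />
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_esxi" ipaddr="esx2.FQDN.com" login="root" name="esx2" passwd="YOURPASSWORDHERE" delay="60" />
  </fencedevices>
</cluster>
"""

print(undefined_fence_devices(SAMPLE_CONF))  # prints ['esx1']
```

On a node you would read /etc/cluster/cluster.conf and expect an empty list.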

OpenLDAP SSL Replication

Following the excellent guide here: https://wiki.debian.org/LDAP/OpenLDAPSetup

I was able to get LDAP replication working fairly easily. There are two problems with this however.

1. The default slapd configuration uses dc=nodomain if no domain was chosen at install time, and otherwise whatever domain was chosen then. You are not prompted to pick one, so if the replica's domain differs from that of your LDAP server, replication will not function.

2. The above guide does NOT use SSL for replication.

On your client, do the following to change dc=nodomain to whatever it should be for replication


/etc/init.d/slapd stop
rm /var/lib/ldap/*
vi /etc/ldap/slapd.d/cn\=config/olcDatabase\=\{1\}hdb.ldif

Update all dc=nodomain entries to dc=your,dc=domain

Then start slapd

/etc/init.d/slapd start

Create an LDIF file like the following (in this case, mirror.ldif)


dn: olcDatabase={1}hdb,cn=config
changeType: modify
add: olcSyncrepl
olcSyncrepl: rid=004 provider=ldaps://YOURMASTERHOSTNAME:636 bindmethod=simple binddn="cn=mirrormode,dc=bbis,dc=us" credentials=YOURPASSWORD tls_reqcert=never searchbase="dc=bbis,dc=us" schemachecking=on type=refreshAndPersist retry="60 +" tls_cert=/etc/ldap/ssl/server.pem tls_cacert=/etc/ldap/ssl/server.pem tls_key=/etc/ldap/ssl/server.pem
-
add: olcMirrorMode
olcMirrorMode: TRUE
-

Note that “rid=004” should be different for each LDAP server you bring in to play. Replace dc=bbis,dc=us with your domain.
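
Since the rid has to be unique per server, it can be handy to generate the LDIF instead of editing copies by hand. Below is a Python 3 sketch that reproduces the file above for a given rid. The function name mirror_ldif is made up, the 0 to 999 check reflects my understanding that OpenLDAP limits rid to three digits, and the certificate path default is simply the one used in this post.

```python
def mirror_ldif(rid, provider, base_dn, bind_dn, password,
                cert="/etc/ldap/ssl/server.pem"):
    """Build the syncrepl/mirror mode LDIF shown above for one replica."""
    if not 0 <= rid <= 999:
        raise ValueError("rid must be between 0 and 999")
    syncrepl = " ".join([
        "rid=%03d" % rid,
        "provider=%s" % provider,
        "bindmethod=simple",
        'binddn="%s"' % bind_dn,
        "credentials=%s" % password,
        "tls_reqcert=never",
        'searchbase="%s"' % base_dn,
        "schemachecking=on",
        "type=refreshAndPersist",
        'retry="60 +"',
        "tls_cert=%s" % cert,
        "tls_cacert=%s" % cert,
        "tls_key=%s" % cert,
    ])
    lines = [
        "dn: olcDatabase={1}hdb,cn=config",
        "changeType: modify",
        "add: olcSyncrepl",
        "olcSyncrepl: " + syncrepl,
        "-",
        "add: olcMirrorMode",
        "olcMirrorMode: TRUE",
        "-",
        "",
    ]
    return "\n".join(lines)


print(mirror_ldif(4, "ldaps://YOURMASTERHOSTNAME:636", "dc=bbis,dc=us",
                  "cn=mirrormode,dc=bbis,dc=us", "YOURPASSWORD"))
```

Redirect the output to mirror.ldif and bump the first argument for each additional server.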

Now apply it to your configuration

ldapmodify -QY EXTERNAL -H ldapi:/// -f mirror.ldif

Use ldapsearch to verify functionality

ldapsearch -H ldap://127.0.0.1 -x

Cisco IOS Import UCC Certificate

This assumes you have already requested and received your UCC certificate (IIS/Apache/etc.)

crypto ca trustpoint godaddy
enrollment terminal
chain-validation stop
revocation-check none
exit

crypto ca authenticate godaddy
-----BEGIN CERTIFICATE-----
Root Godaddy CA Cert (gd-class2-root.crt)
https://certs.godaddy.com/anonymous/repository.pki
-----END CERTIFICATE-----

!Intermediate trustpoint
crypto ca trustpoint intermediate-primary
enrollment terminal
chain-validation continue godaddy
revocation-check none

crypto ca authenticate intermediate-primary
-----BEGIN CERTIFICATE-----
This is the first file inside the PFX container (gd-g2_iis_intermediates)
-----END CERTIFICATE-----

crypto ca trustpoint intermediate-secondary
enrollment terminal
chain-validation continue intermediate-primary

crypto ca authenticate intermediate-secondary
-----BEGIN CERTIFICATE-----
This is the second file inside the PFX container (gd-g2_iis_intermediates)
-----END CERTIFICATE-----

crypto pki import godaddypriv pkcs12 tftp: password PASSWORDHERE
! This is the pkcs12 file you export from Windows

crypto pki trustpoint intermediate-secondary
rsakeypair godaddypriv

crypto ca import intermediate-secondary certificate
-----BEGIN CERTIFICATE-----
This should be the CRT godaddy gave you, the file you import into IIS
-----END CERTIFICATE-----
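
Each "crypto ca authenticate" step above wants exactly one certificate pasted in. If you have the intermediates as a single PEM text file with several certificates back to back, they need splitting first. The following Python 3 sketch does that. It assumes the bundle is already PEM text (it does not convert a binary or PKCS#7 file), and the function name split_pem_bundle and the placeholder certificate bodies are made up for illustration.

```python
import re

# Non-greedy match so each BEGIN pairs with the nearest END.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL)


def split_pem_bundle(bundle_text):
    """Return every certificate block in the bundle, in file order."""
    return [match.group(0) + "\n" for match in PEM_BLOCK.finditer(bundle_text)]


# Placeholder bundle; the bodies are not real certificate data.
SAMPLE_BUNDLE = """-----BEGIN CERTIFICATE-----
FIRSTINTERMEDIATEPLACEHOLDER
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
SECONDINTERMEDIATEPLACEHOLDER
-----END CERTIFICATE-----
"""

for number, cert in enumerate(split_pem_bundle(SAMPLE_BUNDLE), 1):
    print("certificate %d:" % number)
    print(cert)
```

The first block would go to intermediate-primary and the second to intermediate-secondary, matching the order described above.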

Debian 7 LACP Bonding /etc/network/interfaces

/etc/network/interfaces


auto bond0
iface bond0 inet static
address 10.10.0.100
gateway 10.10.0.1
netmask 255.255.255.0
slaves eth0 eth1 eth2
bond-mode 802.3ad
bond-miimon 100
bond-lacp-rate 1
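
After bringing bond0 up, the kernel reports the bond state in /proc/net/bonding/bond0. The following Python 3 sketch pulls out the mode and the link status of each slave. The function name bond_summary is made up, and the sample text is a shortened, illustrative version of what the bonding driver prints rather than output from my machine.

```python
def bond_summary(proc_text):
    """Return (bonding mode, {slave interface: MII status}) from the proc text."""
    mode = None
    slaves = {}
    current = None
    for line in proc_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = None
        elif key == "MII Status" and current is not None:
            # Only MII Status lines after a Slave Interface belong to a slave.
            slaves[current] = value
    return mode, slaves


# Illustrative, shortened sample of /proc/net/bonding/bond0.
SAMPLE_PROC = """Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up
MII Polling Interval (ms): 100

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""

mode, slaves = bond_summary(SAMPLE_PROC)
print(mode)
print(slaves)
```

On the server you would pass open("/proc/net/bonding/bond0").read() and check that the mode mentions 802.3ad and that every slave reports up.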

ExtremeWare configuration

As you may be able to guess, this configures ports 41, 42 and 43 for LACP


enable sharing 41 grouping 41,42,43 dynamic

Summit 400-48t Enable Advanced Edge License

I noticed the license is a 7-digit numeric value, so I modified an existing script that has been floating around for the Summit 400-48t. It should work with any Summit 200/300/400, however.

You may need to change the “Summit400” strings near the end of the file to match your prompt. This runs much quicker than I expected; I got my license after 4 hours or so.

Uncomment the Dump_Log line if you need to debug problems


#!/usr/bin/perl
use Net::Telnet;
use warnings;

$t = new Net::Telnet(
Timeout => 10,
# Dump_Log => "dump.log",
Prompt => '/\* Summit*$/',
Errmode=>'die'
);
$t->open('SWITCHIPADDRESS');
$t->waitfor('/login: $/i');
$t->print('admin');
$t->waitfor('/password: $/i');
$t->print('YOURPASSWORD');
$t->waitfor('/Summit400/i');

for($x = 1; $x < 9999999; ++$x) {
print "Trying: $x\n";
#@lines = $t->cmd("enable license advanced-edge $x");
$t->print("enable license advanced-edge $x");
$t->waitfor('/Summit400/i');
}