Category: Work


Assuming part I and part II worked for you, and you have tested that keepalived does indeed fail over and route traffic, you need to deal with the services on those hosts. In the case of a LAMP environment, you are looking primarily at MySQL and Apache – and you have to address config files and data replication for each.

MySQL is fairly straightforward, and multi-master replication is covered well elsewhere. Personally, I used this guide to cover the setup. Glossing over most of the details, the process is this:

  1. set the unique IDs for each database instance
  2. set the update increment for each instance (so multi-master won’t step on itself)
  3. set each to replicate from the other
  4. stop external connections to the databases
  5. make sure the same data is loaded into each
  6. enable replication
  7. test
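
As a rough sketch of steps 1 and 2, the per-node settings could be generated along these lines. The function name mysql_repl_settings is mine, and mysql-bin is just a common choice of log name; server-id, log-bin, auto_increment_increment and auto_increment_offset are standard MySQL options, but the guide you follow is the authority on the full list.

```shell
# Print the [mysqld] settings that differ between multi-master nodes.
# $1 = this node's number (1, 2, ...), $2 = total number of masters.
mysql_repl_settings() {
    node=$1
    total=$2
    printf '%s\n' '[mysqld]'
    printf 'server-id = %s\n' "$node"                   # step 1: unique ID per instance
    printf '%s\n' 'log-bin = mysql-bin'                 # binary log the other node reads from
    printf 'auto_increment_increment = %s\n' "$total"   # step 2: step by the number of masters
    printf 'auto_increment_offset = %s\n' "$node"       # ...starting from this node's own offset
}

# Example: settings for the second of two nodes
# mysql_repl_settings 2 2
```

With two nodes, this should have node 1 hand out auto-increment values 1, 3, 5 and so on while node 2 hands out 2, 4, 6, which is what keeps inserts on either side from colliding.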

Setting up Apache is slightly more complex, assuming you don’t have expensive shared clustered storage, as your config files (and website content) need to mirror one another. This is best handled in some kind of Source Code Management engine, like SVN or Git. Git is the better choice for this because it’s fully distributed: each copy of the repository is a full copy, and can stand alone in case of total failure of everything else in the world other than the server it is on. Sadly, I can’t share the scripts we use at work (nasty IP lawyers), so until I develop one in my free time to post here, this post is going to be short on content. In essence, what you do is make each server a git repo that can serve content over HTTPS or SSH. Script up the connection between the two (to push/pull data between your web server nodes), and then check that copy out to your preferred development platform.
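
This is not the script we use at work, just a minimal sketch of the pull half of that idea. The function name sync_from_peer, the paths and the peer URL are all made up for illustration, and it assumes the peer’s repository is reachable over SSH and that you only ever want fast-forward updates.

```shell
# Pull the latest committed content from the other web node.
# $1 = local repository directory, $2 = URL of the peer's repository,
# $3 = branch to pull (defaults to master).
# GIT can be overridden (e.g. GIT=echo) to print the command instead of running it.
sync_from_peer() {
    repo_dir=$1
    peer_url=$2
    branch=${3:-master}
    # Run in a subshell so the caller's working directory is untouched.
    # --ff-only refuses to merge if the two nodes have diverged.
    ( cd "$repo_dir" && "${GIT:-git}" pull --ff-only "$peer_url" "$branch" )
}

# Hypothetical usage, e.g. from cron on the first web node:
# sync_from_peer /var/www/html ssh://web2.example.com/var/www/html
```

Refusing anything but a fast-forward is a deliberate choice here: if both nodes somehow received different commits, I would rather the sync fail loudly than have a cron job merge them for me.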

If you want to get really fancy (which you should), set up a second VIP in keepalived so you have a place to test development changes before pushing the button and going live. As most LAMP stacks draw their content from the database backend anyway, these changes won’t happen too often, but it’s always good to be able to test on a real server before blowing up production. (To be extra super fancy, you’ll run two sets of apache daemons – with separate sets of config files – so when you add the new version of PHP, you don’t destroy the production environment.)

In the next and final installment of this series, we’ll look at harvesting the apache logs, looking for attackers, and adding them to the block lists automatically.

Now that you have keepalived up, and you are able to fail the IP addresses over between the hosts, you will need to address another crucial question: under what conditions should keepalived fail the IPs to the second host? For us it was pretty simple: if apache and mysql are both running, the node is healthy. If you have a shell script you’ve written that can determine node health, that’s great too. Just pay attention to the exit code you set, and you should be good to go.
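
If you go the script route, a minimal sketch might look like the following. The function name check_all is hypothetical; the only contract that matters is the exit code, where 0 means healthy and anything else means failed.

```shell
# Run every check command given as an argument; succeed only if all succeed.
# Each argument is a full command line, e.g. "killall -0 httpd".
check_all() {
    for check in "$@"; do
        # Discard output; only the exit status matters to keepalived.
        if ! sh -c "$check" >/dev/null 2>&1; then
            return 1    # the first failing check marks the node unhealthy
        fi
    done
    return 0            # every check passed
}

# In a standalone script you would finish with something like:
# check_all "killall -0 httpd" "killall -0 mysqld"
# exit $?
```

Keeping the checks as arguments means the same script can grow extra tests later (a disk-space check, say) without touching the logic that decides the exit code.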

The setup is done with the vrrp_script directive, where you tell keepalived what scripts to run, and what effect the outcome of those scripts will have on the priority level of each VIP. Two things to note: you need at least keepalived version 1.1.13 for vrrp_script to work, and you need to list the scripts before the vrrp_instance they are called in. Again, if you use my example password and the IP address listed here, you deserve any punishment you get.

Here is my example configuration file, after the additions of the scripts:


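vrrp_script chk_httpd {
 script "killall -0 httpd"    # exits 0 if an httpd process exists
 interval 2                   # run the check every 2 seconds
 weight 50                    # worth 50 points of priority
}


vrrp_script chk_mysqld {
 script "killall -0 mysqld"   # exits 0 if a mysqld process exists
 interval 2                   # run the check every 2 seconds
 weight 50                    # worth 50 points of priority
}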


vrrp_instance VI_1 {
 state MASTER
 interface eth0
 virtual_router_id 1
 priority 101
 advert_int 1
 authentication {
  auth_type PASS
  auth_pass 319e49e4-88c2-4f83-a0e0-c0f332f6427c
 }


 virtual_ipaddress {
  1.2.3.4
 }
 track_script {
  chk_httpd
  chk_mysqld
 }
}

The script directive should be pretty obvious: it’s the command you are going to execute to get a return status. (I’m using killall -0, as it’s an exceptionally cheap way to determine if a process with a given name exists at least once in the process table.) The interval is the time, in seconds, between checks, and the weight is the number of priority points riding on the check: with a positive weight like this one, keepalived adds the points while the check passes, so a node with a failed check sits 50 points below where it would be if healthy. So, as long as the priority in your BACKUP config is set to a number higher than 51 (and lower than the MASTER’s 101), the VIP should fail over when either mysqld or httpd isn’t running.
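
To make the arithmetic concrete, here is a small sketch. It assumes keepalived’s documented behaviour for a positive weight, which is to add the weight to the configured priority for each tracked script that currently succeeds. The helper name effective_priority is made up, and 75 is the BACKUP priority from part I.

```shell
# Effective VRRP priority = configured priority + weight for each passing check.
# $1 = configured priority, $2 = weight, $3 = number of checks passing.
effective_priority() {
    echo $(( $1 + $2 * $3 ))
}

effective_priority 101 50 2   # MASTER, httpd and mysqld both up: 201
effective_priority 101 50 1   # MASTER, one service down:         151
effective_priority 75 50 2    # BACKUP, both services up:         175
```

Since 151 is lower than 175, the BACKUP node wins the election and takes the VIP. A BACKUP priority of 51 or less would only reach 151 at best, which no longer beats the MASTER, and that is where the “higher than 51” rule comes from.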

Setting up a modern, highly available system without external dependencies is a tricky business. It’s even trickier to think through when doing it on a shoestring budget. However, it’s critical for many operations. My employer, for example, keeps a set of tools available to administrators on a well-secured web server. This system is a front-end to a complex Nagios deployment, user management tools, and system maintenance tools. We can’t afford for it to go down when other systems are offline, so its design principles are much like those of a monitoring system: everything possible is internal to the system. Luckily, it’s not as hard as it sounds, and it’s free.

If you use CentOS or RHEL, you can follow this guide pretty much directly. Other flavors will have to adapt as needed to reflect local packaging standards, service administration and filesystem layouts. This guide is going to focus mostly on setting up keepalived to act as a failover mechanism, while also looking at replicated MySQL and failure detection for apache. Also, keep in mind this is about High Availability and failover, not load balancing.

First things first: you are going to want to have two servers that are pretty much identical. It helps in the long run for sizing and replacements. They are going to need to be somewhat beefy, as they are going to host the web servers and the database servers that administrators are going to hammer when there’s a system outage. There is nothing quite like melting the face off the server that the admins are using to put everything else back together. Also, put the servers in different data centers/different buildings, so when power goes out to one, the other is still up. The tricky bit is the servers are also going to need to be in the same Layer 2 broadcast domain, while being geographically separated. This usually means being on the same VLAN, and also usually means spanning those VLANs. Networking groups don’t usually like this. The alternative is MPLS and VPLS, which is expensive – at least if you use Cisco gear – and well beyond my knowledge. I understand it’s possible, but I digress.

Once they are on the same segment, it’s time to set up keepalived. Keepalived is usually used as a load balancer, with VRRP passing the “router” address back and forth between the balancer nodes for redundancy. We’re going to use VRRP to do most of our magic today. Good news: on most modern Red Hat-based distributions, installing keepalived is as simple as:

yum install keepalived

It’s not part of the Advanced Platform Server Magick that Red Hat sells at a premium price. It’s just basic, vanilla, plain-Jane keepalived. Your config file is equally simple, with two minor differences between the two hosts that will be sharing the service. On the first, set state to MASTER and priority to 101, and on the second set them to BACKUP and 75. Also, do yourself a favor and generate the passwords with uuidgen. I’ve added a password generated with uuidgen, but don’t use the one I’m pasting in here. Really. Also, give yourself a real IP address, not 1.2.3.4. (Honestly, if you use a password copied off a blog post and the IP address 1.2.3.4… well, you deserve what you get).


vrrp_instance VI_1 {
 state MASTER
 interface eth0
 virtual_router_id 1
 priority 101
 advert_int 1
 authentication {
  auth_type PASS
  auth_pass 319e49e4-88c2-4f83-a0e0-c0f332f6427c
 }


 virtual_ipaddress {
  1.2.3.4
 }
}
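
Since the two files differ in only two lines, one way to avoid typos is to derive the second from the first. This is just a sketch: make_backup_conf is a made-up name, and it assumes the MASTER file uses exactly the state and priority lines shown above.

```shell
# Read a MASTER keepalived config on stdin, write the BACKUP variant to stdout.
# Only the two lines that differ between the hosts are rewritten.
make_backup_conf() {
    sed -e 's/state MASTER/state BACKUP/' \
        -e 's/priority 101/priority 75/'
}

# Hypothetical usage, run on the first host:
# make_backup_conf < /etc/keepalived/keepalived.conf > keepalived.conf.backup
```

Everything else, including virtual_router_id and auth_pass, passes through untouched, which is the point: those are the values that have to match on both hosts.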

Once that’s done, and the config files are in place on both hosts, fire up keepalived on the master. There will be some messages in /var/log/messages talking about keepalived coming online, setting up loopbacks and entering MASTER state. You should now be able to ping the machine externally, and SSH into the virtual IP address you specified above. Bringing the second instance of keepalived online should start a short conversation in the logs – but the MASTER should remain master.

If there’s no conversation in the logs, check the virtual_router_id and auth_pass in the config files. If they match, and there’s still no conversation, you may have to poke iptables, as you need to explicitly allow vrrp traffic in iptables. In /etc/sysconfig/iptables, add the following:


# Keepalived VRRP traffic
-A INPUT -p vrrp -i eth0 -j ACCEPT

Once the hosts talk to each other, test failover. Turn off keepalived on the first host. It will take 2 or 3 seconds to fail over to the second host. At that point, you’ll be able to ping, but your SSH client will scream at you about a key mismatch. This is expected, and what you want to see. Success! The VIP has moved. I’ll remind you again, this is HA, not load balancing. When you bring keepalived back online, the IP will move back to the host you have set as MASTER in the config files.
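
If you would rather not rely on SSH key warnings to tell you where the VIP lives, you can ask each node directly. A small sketch, assuming the iproute2 ip tool is installed and that the VIP shows up as an extra address on eth0; has_vip is a name I made up.

```shell
# Succeed if the address listing on stdin contains the given IPv4 address.
# Feed it the output of: ip -o -4 addr show dev eth0
has_vip() {
    # Match " inet <address>/" literally so 1.2.3.4 does not match 11.2.3.4.
    grep -F -q " inet $1/"
}

# Hypothetical usage, run on each node:
# ip -o -4 addr show dev eth0 | has_vip 1.2.3.4 && echo "this node holds the VIP"
```

Run it on both hosts before and after stopping keepalived on the first one, and exactly one of them should report holding the address at any given time.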

At this point, you’ve done half of the hard work. As a breather, fire up apache on both boxes and fail keepalived back and forth between the hosts, and you’ll be able to see traffic going to one or the other. It’s pretty slick. In the next installment, I’ll talk about MySQL replication between the hosts.

My Office

The weather in my office has reached a sustained level of insanity. The cleverly designed nuclear furnace attached to my wall (disguised as a heater) is set to “MOLTEN SODIUM”. Nothing I can do about it. It’s also below freezing outside. Can’t change that either, as my weather control satellites are broken.

My options are:

a) Sit in my office with my door and my window open, where I alternate between boiling and freezing

b) Close my door and leave the window open, where there isn’t enough airflow, and I slowly roast

c) Close the window, where I break a sweat in less than 20 minutes.

Of course, being sane, I choose a). This has the nasty side effect of exposing me to sonic torture as the HVAC system located below my window comes on and off, as well as general road noise. There’s also the occasional odor of the street – carbon monoxide has never smelled so good. These factors combined make my sinuses try to kill me every day that I come to work, and have started waging war on my brain.

And Then There Was One…

It looks like Shufflegazine is dead:

Resolving www.shufflegazine.com... 64.202.189.170
Connecting to www.shufflegazine.com[64.202.189.170]:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.pxlwall.com/ [following]

It’s now a photoblog. I guess I won’t have a chance to archive some of the content I wrote (and foolishly didn’t back up), nor will I be able to point people to the blog when I reference it on LinkedIn or on my résumé. I wish all the former staff – especially the former editor – good luck and godspeed.