December 7, 2008

Day 7 - Host vs Service

When talking about servers and services, it is important to treat them as separate things. Build automation in terms of configuration sets, not in terms of servers.

I tend to think of servers, machines, devices, whatever, as having labels or tags. Each label refers to a particular configuration set. Your automation tools should know what labels are on a host and only apply changes based on those labels. Modern administration tools such as Capistrano and Puppet are designed with this distinction in mind. Capistrano calls them 'roles' and Puppet calls them 'classes,' but ultimately they're just a name you apply to a piece of configuration or a change.

Labels can be anything, but they should be meaningful. You might have "mysql-debug" and "mysql-production" service labels, both of which cause MySQL to be installed, but the debug label also enables heavier logging such as full query logging.

Configuring with labels instead of individual hosts helps you scale up. Managing configuration changes for a specific service lets you make one change to a service and have it deploy on any host having that service. Further, if you buy new server hardware, simply adding the appropriate labels to a host will let your automation system do the hard work of installation and configuration.

It helps you scale down, too. Here's a fictional example:

Quality control requested a production-like environment to test release candidates before pushing to production, but the budget will only allow you to use two server hosts for this. Production uses many more than this. If you automate based on labels instead of hosts, you could easily spread the required services across your two servers by simply labelling them, and automation would take care of the installation and configuration.
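As a rough sketch of that scenario, the same labels can be mapped onto many production hosts or onto just two QA hosts, and a small check can confirm that QA still covers every production label. All host names, label assignments, and helper names here (`labels_in`, `missing_labels`) are made up for illustration.

```ruby
# Hypothetical host => labels mappings for two environments.
PRODUCTION = {
  "db1.prod.example"    => ["mysql-production"],
  "cache1.prod.example" => ["memcache"],
  "web1.prod.example"   => ["frontend"],
  "web2.prod.example"   => ["frontend"],
}

# The same labels squeezed onto the two hosts the budget allows.
QA = {
  "qa1.example" => ["mysql-production", "memcache"],
  "qa2.example" => ["frontend"],
}

# Every distinct label used anywhere in an environment, sorted.
def labels_in(env)
  env.values.flatten.uniq.sort
end

# Labels present in the reference environment but absent from the candidate.
def missing_labels(reference, candidate)
  labels_in(reference) - labels_in(candidate)
end

puts "QA is missing: #{missing_labels(PRODUCTION, QA).inspect}"
```

An empty list means the two QA hosts carry every service that production does, just packed more densely.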

Assuming you have the development time or the tools available, you can use labels all over your automation:

  • Generate dns entries for all hosts with a specific label
  • Configure your monitoring system based on labels on a host
  • Configure firewall rules
  • Configure backup policy
  • etc...

A simple implementation of this would be a small YAML file with host:label mappings:
host1.prod.yourdomain:
- mysql-debug
host2.prod.yourdomain:
- memcache
- frontend
The deployment of these labels is up to you and the needs of your automation system. Keeping this in revision control gives you history with logs. Along with the other automation code and configuration you should be keeping in revision control, you might just be one step closer to being able to do more while working less.
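Most of the uses listed above (DNS, monitoring, firewall rules, backups) start from the same question: which hosts carry a given label? Here is a minimal sketch of that lookup. The YAML is inlined only to keep the example self-contained, and `hosts_with_label` is a hypothetical helper name, not part of any tool.

```ruby
require "yaml"

# Same shape as the example file above; inlined so the sketch runs on its own.
HOSTLABELS_YAML = <<~EOS
  host1.prod.yourdomain:
  - mysql-debug
  host2.prod.yourdomain:
  - memcache
  - frontend
EOS

# Return the sorted list of hosts that carry the given label.
def hosts_with_label(hostmap, label)
  hostmap.select { |_host, labels| labels.include?(label) }.keys.sort
end

hostmap = YAML.safe_load(HOSTLABELS_YAML)
puts hosts_with_label(hostmap, "frontend")
```

A DNS or monitoring generator could loop over that list and emit whatever format its target system expects.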
With puppet
If you're using Puppet, telling each host what its labels (aka Puppet classes) are is easy: you need only write a script that tells Puppet which classes to apply to a host (or node, in Puppet's terms). This document will show you how in Puppet.
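Such a script might look like the sketch below. It assumes Puppet's external node interface as I understand it: Puppet runs the script with the node name as its only argument and reads YAML with a `classes` key from standard output. It also assumes the mapping lives in a file named 'hostlabels.yaml' in the working directory. `node_definition` is a hypothetical helper name. Check the Puppet documentation for how to point Puppet at the script.

```ruby
#!/usr/bin/env ruby
require "yaml"

# Assumed location of the host:label mapping file.
HOSTLABELS = "hostlabels.yaml"

# Build the structure to print for one node; hosts that are not
# listed in the mapping get an empty class list.
def node_definition(hostmap, node)
  { "classes" => hostmap.fetch(node, []) }
end

# Only act when run directly with exactly one argument (the node name).
if __FILE__ == $PROGRAM_NAME && ARGV.size == 1
  hostmap = YAML.load_file(HOSTLABELS)
  puts node_definition(hostmap, ARGV.first).to_yaml
end
```

Because the script only reads the YAML file, adding a label to a host there is all it takes for the next Puppet run to pick up the new class.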
With capistrano
You'll want some piece of code that turns your YAML file of host:label entries into 'role <label>, <host1, host2, ...>' calls. Something like this Ruby may do (I called our YAML file 'hostlabels.yaml'):
# roles.rb
require "yaml"

# Invert the host => labels mapping into label => hosts.
labelmap = Hash.new { |h, k| h[k] = [] } # default hash value is empty array
hostlabels = YAML.load_file("hostlabels.yaml")
hostlabels.each do |host, labels|
  labels.each { |label| labelmap[label] << host }
end

# Declare one Capistrano role per label.
labelmap.each do |label, label_hosts|
  role label, *label_hosts
end
And in your Capfile:
load "roles"   # use 'load' not 'require'

task :uptime, :roles => "frontend" do
  run "uptime"
end
And now 'cap uptime' will only hit servers listed in your yaml file as having the label 'frontend'. Cool.
I wanted to provide an example with cfengine, too, but I'm not familiar enough with the tool and my time ran out learning how to do it.

The YAML file example is not ideal, but it's a start if you have nothing. Evolutions beyond the simple host:label mapping are configuration management tools that store state, where you record what is true about every machine that exists: MAC addresses, IPs, service labels, hardware type, and so on. That category might also include the "enterprise inventory management" suites by Oracle and others.
