This was written by Aleksey Tsalolikhin (http://verticalsysadmin.com/blog/). Illustrations by Joseph Kern
If you ask Wikipedia, "Configuration management (CM) is a field of management that focuses on establishing and maintaining consistency of a system."
Configuration management tools increase sysadmin efficiency and make sysadmin life better. As our systems grow larger and more complex, we need better tools to help us maintain control and reliability as the quantity and complexity of our computing systems keep growing. Examples of such tools include Bcfg2, CFEngine, Chef, and Puppet, all of which are open source!
Configuring systems manually in interactive sessions is error-prone and extremely labor-intensive. Even with mostly automated approaches, such as the typical "ssh and a for-loop" solution, pushing ad-hoc changes is still error-prone. For example, if a system is down for maintenance while a change is being pushed out over ssh, it will miss that change, and "state drift" will occur between it and other systems in the same class.
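The drift scenario above can be made concrete with a small sketch. The host names and the push_change function are hypothetical stand-ins for a real ssh call, and web2 is hard-coded as the machine that happens to be down for maintenance.

```shell
#!/bin/sh
# Sketch of the "ssh and a for-loop" approach and how drift creeps in.
# push_change is a hypothetical stand-in for: ssh "$host" 'some command'

push_change() {
    # Simulate web2 being down for maintenance: fail for that host only.
    [ "$1" != "web2" ]
}

missed=""
for host in web1 web2 web3; do
    if push_change "$host"; then
        echo "updated: $host"
    else
        echo "MISSED:  $host" >&2
        missed="$missed $host"
    fi
done

# Without this bookkeeping, nobody knows that web2 has drifted.
echo "hosts needing a retry:$missed"
```

Even with the retry list, someone still has to remember to act on it, which is the gap a configuration management tool is meant to close.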
You want a tool that helps keep actual and desired state the same.
System imaging is a common strategy for dealing with complexities of config management - make a copy of a system image, label it "gold master", and clone it to make new systems. While this approach helps to crank out identically configured systems, it has the weakness that updating the master image can be a pain and it does nothing to maintain the systems configured after the initial deploy. It is also not very auditable (what changed between golden image v1 and v2?).
Many sysadmins still configure systems with more traditional manual, ad-hoc, and hard-to-audit methods. In some cases, sysadmin teams build home-grown tools to solve these problems. An example of this is Ticketmaster, who released their own config management, "ssh and for loop" tool, and provisioning systems.
Why do we care to do this? Well, why do we administer systems? Correct configuration helps keep computer systems in use by human civilization.
CM tools free sysadmins' time for more challenging and creative system engineering and architecture work, and for taking the naps that power such work.
Minimize Manual Effort
Minimize manual effort by automatically configuring new systems. This works well because repeatable work is best left to computers; they don't get bored, and they don't forget steps.
"Go away or I will replace you with a very small shell script" - you've probably seen this shirt before, right? How about hearing someone recommend "automating yourself out of a job"? Building systems and fighting fires without any tools is a slow task that is difficult to repeat accurately, and with many sysadmin skills being software-related, it is in your interest to automate system turn up, maintenance, and repair. Automation helps reduce time spent in corrective actions, reduces mental energy consumed, reduces stress, and increases business value and agility. Winning!
In using a config management system, you are implicitly documenting the system's "desired state": Why is the system configured this way? What are its dependencies? Who cares about the system? This documenting capability helps protect against knowledge loss by moving configuration knowledge out of your brain and into a version control system. This helps defend against knowledge lost through forgetfulness or staff changes, and it also facilitates alignment of efforts on a multi-sysadmin team.
In general, configuration management is in the realm of "Infrastructure as Code". Once your infrastructure is represented in code, you can think about applying release engineering and other tools: tag a new policy as "unstable", test it, then move the new policy into the "stable" branch where servers will apply it.
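As a rough sketch of that unstable-to-stable flow, the following uses two plain directories in place of version control branches and "sh -n" (a syntax check only) in place of a real policy test. The promote function, the directory layout, and the file names are all invented for illustration; with a real tool you would more likely use branches in your version control system and the tool's own validator.

```shell
#!/bin/sh
# promote POLICY UNSTABLE_DIR STABLE_DIR: copy a policy from the unstable
# area to the stable area only if it passes a check. The check here is
# "sh -n" because the policies in this sketch are shell scripts.
promote() {
    policy=$1
    unstable=$2
    stable=$3
    if sh -n "$unstable/$policy" 2>/dev/null; then
        cp "$unstable/$policy" "$stable/$policy" && echo "promoted: $policy"
    else
        echo "rejected: $policy failed its check" >&2
        return 1
    fi
}

# Scratch directories standing in for the two branches.
work=$(mktemp -d)
mkdir "$work/unstable" "$work/stable"

echo 'chmod 600 /tmp/testfile' > "$work/unstable/good.sh"
echo 'if [ -f /tmp/testfile ]; then' > "$work/unstable/bad.sh"   # missing "fi"

promote good.sh "$work/unstable" "$work/stable"
promote bad.sh  "$work/unstable" "$work/stable" || true
```

Only the policy that passes the check reaches the stable directory, so servers reading from stable never see the broken one.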
A Visualization
Getting Started
To encourage sysadmins to start using configuration management, the following is a rough guide to performing a few small tasks in several open-source configuration management tools, demonstrating what policies look like in each. Bourne shell examples are provided to aid understanding.
Using these examples
- Bourne shell: Can be run on the command line or via cron
- CFEngine: Follow the quick start guide. In a nutshell, put the promises into a bundle inside a policy file (example.cf) and run it from the command line with "cf-agent -f example.cf -b $bundlename"; or integrate them into the default policy set in promises.cf in the CFEngine work directory, often found in /var/cfengine/inputs.
- Chef: Follow the Chef Fast Start guide.
- Puppet: Follow the Getting Started guide. For quick testing of these examples, you can write them to a file 'foo.pp' and execute them with "puppet apply foo.pp". Puppet also supports a client-server model that is more common for production deployments.
Set Permissions on a File
Bourne shell
chmod 600 /tmp/testfile
CFEngine
files:
  "/tmp/testfile"
    perms => m("600");
Chef
file "/tmp/testfile" do
  mode "0600"
end
Puppet
file { "/tmp/testfile":
  mode => "0600",
}
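The shell one-liner runs chmod every time, whereas the tools above first compare the actual state with the desired state and act only on a difference. A minimal sketch of that check-then-fix pattern in shell follows. The ensure_mode function is invented for this example, GNU stat (as found on most Linux systems) is assumed, and the demo works on a scratch file rather than /tmp/testfile.

```shell
#!/bin/sh
# ensure_mode FILE MODE: set FILE's permissions to MODE only if they differ.
# Assumes GNU stat, where "stat -c %a" prints the octal permission bits.
ensure_mode() {
    file=$1
    desired=$2
    actual=$(stat -c '%a' "$file") || return 1
    if [ "$actual" = "$desired" ]; then
        echo "ok: $file already has mode $desired"
    else
        chmod "$desired" "$file" && echo "fixed: $file mode $actual -> $desired"
    fi
}

# Demonstrated on a scratch file that starts out with the wrong mode.
demo=$(mktemp)
chmod 644 "$demo"
ensure_mode "$demo" 600   # first run repairs the drift
ensure_mode "$demo" 600   # second run finds nothing to do
```

Because the second run reports that nothing changed, the function is safe to call repeatedly, for example from cron.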
Create with some content
Bourne shell
echo 'Server will be down for maintenance 2 AM - 4 AM' > /etc/nologin
CFEngine
files:
  "/etc/nologin"
    create => "true",
    edit_line => insert_lines("Server will be down for maintenance 2 AM - 4 AM");
Chef
file "/etc/nologin" do
  content 'Server will be down for maintenance 2 AM - 4 AM'
end
Puppet
file { "/etc/nologin":
  ensure  => present,
  content => "Server will be down for maintenance 2 AM - 4 AM",
}
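The same check-before-change idea applies to file content. The sketch below rewrites a file only when its content differs from what is wanted. The ensure_content function is invented for this example, and the demo uses a scratch file so that it does not touch the real /etc/nologin.

```shell
#!/bin/sh
# ensure_content FILE TEXT: make FILE contain TEXT followed by a newline,
# rewriting it only when the current content differs. Trailing newlines
# are ignored in the comparison because $(...) strips them.
ensure_content() {
    file=$1
    desired=$2
    if [ -f "$file" ] && [ "$(cat "$file")" = "$desired" ]; then
        echo "ok: $file"
    else
        printf '%s\n' "$desired" > "$file" && echo "updated: $file"
    fi
}

# Demonstrated on a scratch file rather than the real /etc/nologin.
target=$(mktemp)
msg='Server will be down for maintenance 2 AM - 4 AM'
ensure_content "$target" "$msg"   # writes the file
ensure_content "$target" "$msg"   # no change needed
```

Reporting "updated" versus "ok" also gives you a crude audit trail of when a file had drifted.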
Install a package
Bourne shell
yum -y install httpd
CFEngine
packages:
  "httpd"
    package_policy => "add",
    package_method => yum;
Chef
package "httpd"
Puppet
package { "httpd": ensure => present; }
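For comparison, here is a sketch of a shell version that asks rpm whether the package is present before calling yum. The ensure_package function and the DRY_RUN variable are invented for this example; DRY_RUN=1 prints the install command instead of running it, so you can try the sketch without root access.

```shell
#!/bin/sh
# ensure_package NAME: install NAME with yum only if rpm does not already
# report it as installed. With DRY_RUN=1 (a variable invented for this
# sketch) the install command is printed instead of executed.
ensure_package() {
    pkg=$1
    if rpm -q "$pkg" >/dev/null 2>&1; then
        echo "ok: $pkg already installed"
    elif [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: yum -y install $pkg"
    else
        yum -y install "$pkg"
    fi
}

DRY_RUN=1
ensure_package httpd
```

On a system without rpm the query simply fails, so the sketch treats the package as missing; a real tool would pick the right package manager for the platform.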
Make sure a service daemon is running
Bourne shell
# The [h] keeps grep from matching its own entry in the process list.
ps -ef | grep '[h]ttpd' >/dev/null
if [ $? -ne 0 ]
then
    /etc/init.d/httpd start
fi
CFEngine
processes:
  "httpd"
    restart_class => "restart_httpd";

commands:
  restart_httpd::
    "/etc/init.d/httpd start";
Chef
service "httpd" do
  action :start
end
Puppet
service { "httpd": ensure => running; }
Final Thoughts
There's going to be a learning curve to any config management system, but I have found that the benefits in being able to audit, repeat, test, and share "desired state" in code far outweigh any time spent learning the config management tools.
Further Reading
- MTTR is more important than MTBF - John Allspaw argues that for many types of failure, speed of recovery is often more important than frequency of failure.
The CFEngine policies are made to look much more complicated than necessary here, because they don't make use of the standard library, so the editing example ought to be written
files:
  "/etc/nologin"
    create => "true",
    edit_line => insert_lines("Server will be down for maintenance 2 AM - 4 AM");
though your example does show how the CFEngine version can be extended in ways the others can't.
Also the CFEngine examples have a lot of comments that seem to make them longer. Comments are, of course, a good thing and CFEngine is a lot stronger on Knowledge Management as we saw at LISA this year.
Agree with the previous comment, but I think Aleksey was showing that it is possible to customize the behaviour of things in the CFEngine language - while many assumptions are hard-coded in Puppet and Chef.
Using the abstraction facilities in CFEngine, we could make all the examples into one-liners too, so CFEngine extends to a lower level without having to open up the code and program, but in normal usage, you would just use these higher level abstractions.