December 23, 2010

Day 23 - Package vs Config management.

Written by Joshua Timberman

Package management is a best practice in system administration. So is automated configuration management. However, the maintainer scripts run by package management tools are an anti-pattern: they often work in direct conflict, or competition, with configuration management systems.

In my examples I'm going to talk about Debian packages and Chef, because that is what I use. Adapt your mindset for your own favorite distribution and configuration management tool.

Server Lifecycle

When almost all of today's popular Linux distributions were created, servers had a general lifecycle and were expected to be supportable throughout it. Some distributions have a commercial entity that provides paid support. Others have an excellent user community that volunteers its time to help users and administrators. Many considerations in the development of a Linux distribution stem from the expectation that someone will require support, so the distribution should provide a supportable release. In addition, the package's maintainer scripts are what provide additional configuration, such as creating users or starting the services the package provides.

Package Management

One of the value-adds of most Linux distributions is the package management system. Package management behavior and maintainer scripts are well documented by the distribution so that they can be supported by a company of support engineers or a community of volunteers. For system administrators, however, the main reason to use package management is to get pre-compiled software onto the system and to resolve and install any dependencies that software has; having a service start on package install is much less important. For example, CouchDB requires Erlang and various other libraries, so the package manager installs those libraries, Erlang and CouchDB. Package management has many other benefits, such as version management, and packages can also do things like drop off configuration files and start up the daemons they install. There is definite business value in using packages, and that's why it is a sysadmin best practice.
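Dependency resolution is, at its core, an ordering problem: each package has to be installed after the packages it depends on. As a toy illustration only (this is not how apt or dpkg actually resolve dependencies, and the dependency list below is made up and simplified rather than CouchDB's real one), Ruby's standard `tsort` library can produce such an order:

```ruby
require "tsort"

# Illustrative dependency graph: package => packages it depends on.
# These names and edges are made up for the example; they are not the
# real Debian dependencies of CouchDB.
DEPS = {
  "couchdb"  => ["erlang", "libicu", "libmozjs"],
  "erlang"   => ["libssl"],
  "libicu"   => [],
  "libmozjs" => [],
  "libssl"   => []
}.freeze

class DependencyGraph
  include TSort

  def initialize(deps)
    @deps = deps
  end

  # TSort calls this to enumerate every package in the graph.
  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  # TSort calls this to enumerate a package's direct dependencies.
  def tsort_each_child(pkg, &block)
    @deps.fetch(pkg, []).each(&block)
  end
end

# tsort lists children before parents, i.e. dependencies before the
# packages that need them; a dependency cycle raises TSort::Cyclic.
install_order = DependencyGraph.new(DEPS).tsort
puts install_order.join(" -> ")
```

A real package manager also has to weigh versions, conflicts and alternatives, which is exactly the work you want it to do for you.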

Many system administrators create their own packages and host them in an internal repository. In most of the environments I've worked in, these packages were simple: they just managed the files included in the package, usually ignoring the upstream culture of maintainer scripts and other policies, because the system administrator planned to use a configuration management tool to automate setup and maintenance of the software that runs the business application. In these cases, the software provided by the distribution did not meet the needs of the business in some way. Perhaps an application required a newer library version, you needed to patch in a feature or bug fix, or the default setup of a package conflicted with the way a business application was deployed.

Configuration Management

There are as many different application deployments as there are businesses. The different ways the application stacks are deployed provide a specific business value. The application stack often includes a number of the distribution-provided packages, as well as the code written by the business's software developers.

However, most companies have unique needs when it comes to how the software runs in their environment. Perhaps the HTTP server's default configuration isn't properly tuned for the web application it serves. Maybe the business requires that the MySQL server have replication slaves, and this configuration is not enabled by default. Perhaps the system administrator(s) who run the servers have tuned a particular web server for performance, but it conflicts with another web server package. The actual conflict is in the configuration, not in the binaries: by default, both packages listen on the same port when their services are started.

For these reasons and more, automated configuration management tools such as Chef are now modern system administration best practice.

The problem we face is that the packages we install often run a number of maintainer scripts to ensure that the package is set up and configured. The distribution includes these scripts to enforce policy, such as where to put certain configuration files, whether to start services, or where to locate data files created by the packaged software. In some cases, the maintainer scripts only perform actions when the package is removed (postrm in Debian/Ubuntu), so any problems don't surface until the package is removed.
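Before deciding how much a package's scripts will fight your configuration management, it helps to know which ones it ships. A minimal sketch, assuming dpkg's usual layout where the scripts of an installed package live in `/var/lib/dpkg/info/<package>.<script>` (multiarch systems name some of them `<package>:<arch>.<script>`, which this sketch ignores); `maintainer_scripts` is a hypothetical helper:

```ruby
# dpkg maintainer scripts, plus the debconf "config" script that asks
# the install-time questions.
SCRIPT_NAMES = %w[preinst postinst prerm postrm config].freeze

# Returns the scripts present for an installed package, assuming the
# /var/lib/dpkg/info/<package>.<script> naming convention.
def maintainer_scripts(package, info_dir = "/var/lib/dpkg/info")
  SCRIPT_NAMES.select do |script|
    File.exist?(File.join(info_dir, "#{package}.#{script}"))
  end
end

# Which hooks run when mysql-server-5.1 is installed or removed?
# (Prints [] on a system where the package is not installed.)
p maintainer_scripts("mysql-server-5.1")
```

Reading the postinst and postrm found this way shows what the package will do on install and removal before Chef ever touches it.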

Example of the Conflict

To illustrate the conflict between package maintainer scripts and configuration management systems, let's look at a couple of use cases with MySQL. We are using Chef to automatically install the mysql-server package on Ubuntu 10.04 LTS running on an Amazon EC2 instance. Our two business requirements are setting a randomly generated root password and moving the MySQL data directory to ephemeral storage, as the default location is on a smaller filesystem. Normally, the package installation on Ubuntu prompts the user for the root password, which we need to work around to automate the installation. To do that, we'll generate a preseed file that gives the proper answers to the package manager. First, we install mysql-server on a test system:

sudo apt-get install mysql-server

(And enter a bogus password when prompted, which is what we are trying to avoid).

To get the preseed settings for the package, we need the debconf-get-selections command, which is provided by the debconf-utils package:

sudo apt-get install debconf-utils

Then we get the mysql-server settings for our preseed file:

sudo debconf-get-selections | grep ^mysql-server > mysql-server.seed
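Each answer in that output is one line of the form `owner question type value`, with the value possibly empty, and debconf-get-selections also emits `#` comment lines describing each question. As a sketch for inspecting the captured answers from Ruby (the regular expression is an approximation of that format, not debconf's own parser, and `parse_selections` is a hypothetical helper), the `grep ^mysql-server` filter plus field splitting looks like:

```ruby
# One debconf answer: "<owner> <question> <type> <value>".
Selection = Struct.new(:owner, :question, :type, :value)

# Three whitespace-separated fields, then an optional free-form value.
SELECTION_LINE = /\A(\S+)\s+(\S+)\s+(\S+)(?:\s(.*))?\z/

# Keeps only answers whose owner starts with owner_prefix, which is
# roughly what `grep ^mysql-server` does to the raw output.
def parse_selections(text, owner_prefix)
  text.each_line.map do |line|
    line = line.chomp
    next if line.start_with?("#") # skip the question-description comments
    m = SELECTION_LINE.match(line)
    next unless m && m[1].start_with?(owner_prefix)
    Selection.new(m[1], m[2], m[3], m[4].to_s)
  end.compact
end

# On a host with debconf-utils installed (run as root to see passwords):
# text = IO.popen(["debconf-get-selections"], &:read)
# parse_selections(text, "mysql-server").each { |s| puts "#{s.question} (#{s.type})" }
```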

We'll use a template that has a generated password (@mysql_root_password), along with the rest of the contents in the file:

mysql-server-5.1 mysql-server/root_password_again password <%= @mysql_root_password %>
mysql-server-5.1 mysql-server/root_password password <%= @mysql_root_password %>
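To check what such a template produces outside of Chef, here is a plain-Ruby sketch that renders the two seed lines with the standard `erb` library. How the password is generated isn't specified above, so `SecureRandom.hex` is an assumption, `SeedRenderer` and `generate_root_password` are hypothetical names, and this sketch uses debconf's `password` question type for both lines:

```ruby
require "erb"
require "securerandom"

# The two seed lines, using debconf's "password" question type.
SEED_TEMPLATE = <<~'SEED'
  mysql-server-5.1 mysql-server/root_password_again password <%= @mysql_root_password %>
  mysql-server-5.1 mysql-server/root_password password <%= @mysql_root_password %>
SEED

# Renders a template with @mysql_root_password in scope, similar to how
# Chef exposes template variables as instance variables.
class SeedRenderer
  def initialize(mysql_root_password)
    @mysql_root_password = mysql_root_password
  end

  def render(template)
    ERB.new(template).result(binding)
  end
end

# Hex output avoids whitespace and shell metacharacters in the password.
def generate_root_password
  SecureRandom.hex(16) # 32 hexadecimal characters
end

puts SeedRenderer.new(generate_root_password).render(SEED_TEMPLATE)
```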

And we set this up with Chef using a template resource that notifies an execute resource:

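# Chef does not create missing parent directories for a template.
directory "/var/cache/local/preseeding" do
  owner "root"
  group "root"
  mode "0755"
  recursive true
end

template "/var/cache/local/preseeding/mysql-server.seed" do
  source "mysql-server.seed.erb"
  owner "root"
  group "root"
  mode "0600"
  # Exposes @mysql_root_password to the template. The attribute name is an
  # assumption; use wherever your recipe stores the generated password.
  variables(:mysql_root_password => node[:mysql][:server_root_password])
  notifies :run, "execute[preseed mysql-server]", :immediately
end

execute "preseed mysql-server" do
  command "debconf-set-selections /var/cache/local/preseeding/mysql-server.seed"
  action :nothing
end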
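The template/execute pairing means debconf-set-selections only runs when the seed file's contents actually change. A plain-Ruby sketch of that idea (an illustration of the pattern, not how Chef implements notifications; `write_if_changed` and `converge_seed` are hypothetical helpers):

```ruby
require "fileutils"

# Writes content to path only if it differs from what is already there.
# Returns true when the file actually changed.
def write_if_changed(path, content, mode = 0o600)
  return false if File.exist?(path) && File.read(path) == content

  FileUtils.mkdir_p(File.dirname(path))
  File.open(path, "w", mode) { |f| f.write(content) }
  File.chmod(mode, path) # the open mode only applies when the file is created
  true
end

# Runs the follow-up action only when the file changed, which is the idea
# behind `notifies :run, ..., :immediately` on an execute whose own
# action is :nothing.
def converge_seed(path, content)
  yield path if write_if_changed(path, content)
end

# Hypothetical usage on a Debian/Ubuntu host, as root:
# converge_seed("/var/cache/local/preseeding/mysql-server.seed", rendered) do |p|
#   system("debconf-set-selections", p) or raise "debconf-set-selections failed"
# end
```

Running it twice with the same content writes nothing and runs nothing the second time, which is what keeps repeated Chef runs quiet.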

Then we have a package resource that installs mysql-server:

package "mysql-server"
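Because the answers are already in the debconf database, the install should not need to prompt. For comparison outside Chef, here is a sketch of an equivalent prompt-free install; it assumes `apt-get` and the `DEBIAN_FRONTEND=noninteractive` environment variable that debconf honors, and `apt_install_command` is a hypothetical helper (this is not how Chef's package provider is implemented):

```ruby
# Builds the environment and argv for a prompt-free apt-get install.
# DEBIAN_FRONTEND=noninteractive tells debconf not to ask questions, so
# preseeded answers (or package defaults) are used instead.
def apt_install_command(*packages)
  env = { "DEBIAN_FRONTEND" => "noninteractive" }
  argv = ["apt-get", "install", "-y", *packages]
  [env, argv]
end

env, argv = apt_install_command("mysql-server")
p env, argv

# On a real host, as root:
# system(env, *argv) or raise "apt-get install failed"
```

Passing the command as an argument array rather than a single string avoids shell quoting issues with package names.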

Next, we want to configure an alternate location for the MySQL database on the ephemeral storage, as the database size may grow beyond the default root partition size (10G). An example Chef recipe to do this might look like:

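# Stop MySQL only while the data directory still needs to be moved,
# so later Chef runs don't restart the database every time.
service "mysql" do
  action :stop
  not_if { ::File.directory?("/mnt/mysql") }
end

# Move the existing data directory onto the ephemeral storage.
execute "move-mysql-datadir" do
  command "mv /var/lib/mysql /mnt/mysql"
  not_if { ::File.directory?("/mnt/mysql") }
end

directory "/mnt/mysql" do
  owner "mysql"
  group "mysql"
end

# The mv removed /var/lib/mysql, so recreate it as the bind-mount point.
directory "/var/lib/mysql" do
  owner "mysql"
  group "mysql"
end

# Bind-mount the new location over the old path, and record it in
# /etc/fstab (:enable) so the mount survives a reboot.
mount "/var/lib/mysql" do
  device "/mnt/mysql"
  fstype "none"
  options "bind,rw"
  action [:mount, :enable]
end

service "mysql" do
  action :start
end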

We have to stop MySQL, move the directory, and restart MySQL. We use a bind mount so that the configuration in /etc/mysql/my.cnf does not need to be changed. Pointing datadir in my.cnf at the new location instead would require additional configuration, such as updating the AppArmor profile that Ubuntu ships for mysqld.
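To survive a reboot, the bind mount also needs an /etc/fstab entry (Chef's mount resource can write one with its :enable action). As a plain-Ruby sketch of what that entry looks like, with an idempotent check before appending it (`bind_mount_entry` and `fstab_has_mount?` are hypothetical helpers, and the check only looks at the mount-point field):

```ruby
# Builds the fstab line for a bind mount, using the same device, fstype
# and options as the mount resource in the recipe.
def bind_mount_entry(source, target, options = "bind,rw")
  [source, target, "none", options, "0", "0"].join(" ")
end

# True if a non-comment fstab line already mounts something on target.
def fstab_has_mount?(fstab_text, target)
  fstab_text.each_line.any? do |line|
    fields = line.split
    !line.lstrip.start_with?("#") && fields[1] == target
  end
end

entry = bind_mount_entry("/mnt/mysql", "/var/lib/mysql")
puts entry

# Append only when missing (as root):
# fstab = File.read("/etc/fstab")
# File.open("/etc/fstab", "a") { |f| f.puts(entry) } unless fstab_has_mount?(fstab, "/var/lib/mysql")
```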

Neither of these scenarios takes into account the additional complexity of managing the Debian system maintenance user (debian-sys-maint) that the MySQL package sets up, or the countless possible MySQL tuning parameters and database formats.

We're forced here to do extra work to skirt around problems created by the package management tool trying to be responsible for things outside of packaging. The anti-pattern is exacerbated if we have to manage the package and installation on a different OS: we'd have to redo the whole dance for another platform. If our package manager simply dropped off the binaries and libraries, and we handled all of this configuration directly in the configuration management tool, it would be much easier to manage in a heterogeneous environment.

Conclusion

Package management certainly has value! It allows system administrators to install a base OS image that gives all the hardware support and user-land well known and loved in Unix/Linux systems. When it comes to the application stack required by the business, custom configuration is often required. Package maintainers don't, and can't be expected to, imagine every possible custom configuration. Configuration management tools can, however, be used to cover any custom configuration, since that is their job.

After all, part of the Unix (and Linux) philosophy is that each program should do one thing well.

About the author

Joshua Timberman is a Technical Evangelist for Opscode. He has worked for a wide range of companies as a system administrator: from small company IT support to Enterprise web infrastructure delivery for Fortune 500 companies. He helps companies and individuals learn how to use Chef and the Opscode Platform. He wrote the majority of the Chef cookbooks Opscode publishes, teaches the Chef Fundamentals class, and speaks at user groups and conferences. He can be found as jtimberman on Twitter, Skype, Freenode, GitHub and more, or via email joshua@opscode.com.

8 comments :

asq said...

given how many packages are broken (of course not in debian, but ie. rpm world sometimes is scary) i like to think that configuration management objects are superior over package's %pre-s and %post-s. first install rpm, then configure it (provide .conf from template, chkconfig, sysconfig, put it into HA cluster) via puppet. package is a type of puppet, after all :)

Unknown said...

The only problem here is that debian/ubuntu scripts are stupid - they should *not* start the server by default, or prompt interactively for configuration information on startup.

Unknown said...

Calling something "broken" or "stupid" because they don't behave the way an expert would prefer isn't really accurate. Debian behaves the way it does because it needs to work for beginners as well as experts that use configuration management. Sadly, these two groups don't always need the same thing.

But experts should know to alias their config management installer to something like:
DEBIAN_FRONTEND=noninteractive /usr/bin/apt-get --yes -o DPkg::Options='--force-confold' -o DPkg::Options='--unpack' install ${pkg_list}
and then leverage their configuration management system to do the configuration.

Expecting beginners to install a config management system, or tweak everything by hand would grind the adoption of the OS to a halt.

Unknown said...

Leaving beginners vs. experts out of this, in what way is starting a service without it being configured, and possibly without firewall protection not both broken *and* stupid?

Anonymous said...

What about the idea of packages that patch or modify other packages? At my work we use a package management system which allows that; I assume Debian and Red Hat packages support similar functionality.

Then for example if you don't like the way the base package modifies a config script, you can include a package which 'patches' the base package and modifies the config file.

Obviously this sort of thing is dependent on the specifics of the packaging system you use. Our goal in my group is to never push a change outside of a package and we generally (but not always) succeed at that.

Thanks for the great article, this area is a particular interest of mine!

Tollef Fog Heen said...

Robin, if you don't want services started by default, just drop a file containing

#! /bin/sh
exit 101

into /usr/sbin/policy-rc.d, and make it executable. See the invoke-rc.d man page for more details.

I generally prefer my services to be running by default, since they tend to ship with reasonable configurations, and in the cases where they don't, you need to restart or reload the config when you put that in anyway.

As for firewalling, if you believe in firewalls, surely you have a default policy of drop, so not having the firewall configured for a service means it's not exposed to the world, not that it's open by default.

Potamianos Gregory said...

Great article!
There is a small error though. debconf-get-selections exists in debconf-utils package and not debconf-get-selections.

Zoredache said...

> in what way is starting a service without it being configured, and possibly without firewall protection not both broken *and* stupid?

Most packages that automatically start are not without a configuration. They generally come with a default somewhat safe config.

The novice would probably simply try to start the service anyway, using that default config, if it wasn't auto-started.

The advanced users would almost certainly have a default deny firewall in place on both the host and at the border.

So the way it is 'not stupid' is simple pragmatism. Give the novices a somewhat secure working system after installing packages, leave the experts the tools to get what they need done.