December 10, 2012

Day 10 - Packages Doing Too Much?

This was written by Miah Johnson.

Before we had configuration management tools, our UNIX vendors added similar functionality to their package managers. For example, the ability to run arbitrary scripts during package installation made it possible to start the new service automatically, generate a configuration, create a default database, add users, and do almost anything else.

Of course, packages have other useful features like file checksums, dependency resolution, metadata, and cryptographic signatures. Again, these are features that are typically provided by our configuration management tool.

Because of these overlapping features, we have race conditions. Imagine the situation where we want to manage collectd with Chef. Our goal is a fully configured collectd instance being supervised by runit.

The Chef package provider is basically going to run this command:

apt-get install collectd

What's wrong with this? Well, let's take a look inside the collectd-core package:

[~/tmp/packages]$: ls
collectd-core_4.10.1-2.1ubuntu7_amd64.deb

[~/tmp/packages]$: dpkg -e collectd-core_4.10.1-2.1ubuntu7_amd64.deb
[~/tmp/packages]$: ls DEBIAN/
conffiles       control         postinst        prerm
config          md5sums         postrm          templates

Above, we've extracted the control information from the collectd-core package. You can see we have some files prefixed with post and pre. These are scripts that run before or after the corresponding action (install or remove). With that in mind, let's take a look at postinst (I've removed most of the comments and blank lines for brevity):

set -e
. /usr/share/debconf/confmodule
case "$1" in
    configure)
        db_get collectd/auto-migrate-3-4
        if [ "$RET" = "true" ]; then
            tmpdir=`mktemp -dt collectd.XXXXXXXXXX`
            hostname=`hostname`
            if [ -z "$hostname" ]; then hostname="localhost"; fi
            cp -a /var/lib/collectd/ /var/backups/collectd-"$2"
            /usr/lib/collectd/utils/migrate-3-4.px \
                --hostname="$hostname" --outdir="$tmpdir" | bash
            rm -rf /var/lib/collectd/
            mkdir /var/lib/collectd/
            mv $tmpdir /var/lib/collectd/rrd
            chmod 0755 /var/lib/collectd/rrd
            # this is only available on Solaris using libkstat
            rm -f /var/lib/collectd/rrd/$hostname/swap/swap-reserved.rrd
        fi
    ;;
    abort-upgrade|abort-remove|abort-deconfigure)
    ;;
    *)
        echo "postinst called with unknown argument \`$1'" >&2
        exit 1
    ;;
esac
db_stop
if [ -x "/etc/init.d/collectd" ]; then
  if [ ! -e "/etc/init/collectd.conf" ]; then
    update-rc.d collectd defaults 95 >/dev/null
  fi
  invoke-rc.d collectd start || exit $?
fi
exit 0

As you can see, the script runs migrate-3-4.px, which upgrades RRDs and various other files. It also starts collectd via the included init script and adds it to the init system for automatic startup. These scripts are usually fine if you're not managing your system with a configuration management tool. If you are running a CM tool, then you've just discovered some code you need to be aware of and either disable or work around.
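One way to find this kind of code before it surprises you is to search the extracted control directory for init system calls. This is only a rough sketch: audit_maintainer_scripts is a name I made up, and it only looks for the two commands seen in the postinst above (invoke-rc.d and update-rc.d), so other side effects will slip past it.

```shell
#!/bin/sh
# Hypothetical helper (not part of dpkg): list the lines in a package's
# maintainer scripts that touch the init system.
# Usage: audit_maintainer_scripts DIR
#   DIR is an extracted control directory, e.g. the DEBIAN/ created by dpkg -e.
audit_maintainer_scripts() {
    dir="$1"
    for script in preinst postinst prerm postrm; do
        # Not every package ships all four scripts.
        [ -f "$dir/$script" ] || continue
        # -n prints line numbers; passing /dev/null as a second file makes
        # grep prefix each match with the file name.
        grep -nE 'invoke-rc\.d|update-rc\.d' "$dir/$script" /dev/null || true
    done
}
```

Running audit_maintainer_scripts DEBIAN against the collectd-core control directory should point at the update-rc.d and invoke-rc.d lines near the end of postinst.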

Because dpkg and apt-get lack a good way to disable these scripts, they will be run when you install or remove packages. There are, of course, other ways to remove them, but it's going to add to your backlog. Additionally, you will have to make your CM tool handle disabling it in the init system and stopping the current instance of collectd. Hopefully, this doesn't change too often, but it's entirely possible that collectd may switch init systems (say, to upstart) and change how you disable and stop it.
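One partial workaround on Debian and Ubuntu is a policy-rc.d script. invoke-rc.d consults /usr/sbin/policy-rc.d if it exists, and an exit status of 101 means the action is forbidden by policy. The sketch below assumes that mechanism; install_deny_policy is a hypothetical helper name. Note that this only blocks actions that go through invoke-rc.d. The rest of the postinst, including update-rc.d, still runs.

```shell
#!/bin/sh
# Sketch: write a policy-rc.d that denies every init script action.
# install_deny_policy is a hypothetical helper name. The optional path
# argument defaults to the location invoke-rc.d checks.
install_deny_policy() {
    target="${1:-/usr/sbin/policy-rc.d}"
    cat > "$target" <<'EOF'
#!/bin/sh
# Exit status 101 tells invoke-rc.d the action is forbidden by policy.
exit 101
EOF
    chmod 0755 "$target"
}
```

You would call install_deny_policy as root before the package install and remove the file afterwards, otherwise nothing that relies on invoke-rc.d will start services on that machine.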

It seems like you would be better off managing the package yourself rather than fighting upstream decisions. In the long run, managing packages yourself will guarantee the software meets your exact needs. You can version it appropriately, customize build options, properly integrate it with your CM tool, and appropriately test each package.

As I think more about this problem, I realize that most UNIX distributions are doing operations a disservice. The common UNIX install is full of assumptions about how you're going to use it and configure it. It's difficult to customize without a series of DSL wrappers linked into your CM.

Why does it have to be so difficult? How is a UNIX wrapped in a CM different from its variants? Once you've abstracted your infrastructure into code, what is the difference between Ubuntu and Red Hat or Solaris? The feature variation between Red Hat and Ubuntu is basically non-existent once you're using a CM.

Linux is Linux. The other variations are assumptions about how you want to use the system. In reality, we want a CM to manage all of the resources installed on the system. We want the ability to choose how each piece is deployed, upgraded, configured, logged, etc.

What is the point of using a CM tool if it is only managing a small percentage of the system? Ultimately, infrastructure management is a problem waiting to be programmed away. We need to keep thinking about this as we build new services, because everything I've talked about regarding packages looks a lot like wasted effort. Think of all the energy that goes into the collectd postinst script above, and add the work you have to do to essentially revert all the actions done by that script. That is wasted effort!
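To make that revert work concrete, here is roughly what undoing the init system side of the postinst might look like. This is a sketch under assumptions: undo_collectd_postinst and the RUN dry-run switch are names I invented, and it assumes the SysV path through the postinst (no /etc/init/collectd.conf upstart job), so update-rc.d actually created rc symlinks.

```shell
#!/bin/sh
# Sketch: undo what the collectd postinst did to the init system.
# RUN is a hypothetical dry-run switch: when unset it defaults to "echo",
# so the commands are only printed. Set RUN= (empty) to execute them as root.
undo_collectd_postinst() {
    run="${RUN-echo}"
    # Stop the daemon that the package started on install.
    $run invoke-rc.d collectd stop
    # Remove the rc symlinks that update-rc.d created.
    $run update-rc.d -f collectd remove
}
```

Your CM tool then has to carry something equivalent to this for every package that makes similar assumptions.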

There are, of course, some ideas on how to do this. I think it's something we all need to start exploring, but a good starting point would be NixOS.

I think Nix, and systems like it, are going to be our 'third generation' configuration management systems. This idea was proposed before Nix, but I don't think it received as much attention. You may have forgotten about rPath and their Conary project.

Recently, the GNU project forked the Nix package manager to investigate using Scheme as the internal configuration language (rather than bash). Their project is called Guix.

9 comments:

Jan-Erik said...

And that's why I love pacman so much (the package manager in Arch Linux).

It installs the package and not much more (except updating icon databases for GUI programs) but it will never ever start any daemon on its own.

With programs default-configured to listen on 0.0.0.0 this becomes a security risk, too.
Managing all my software packages myself is not really an option either.

Anonymous said...

Miah you're asking some good questions. Pay no attention to anonymous trolls.

Unknown said...

I think the article is interesting. As the Crowbar (http://crowbar.github.com) team works on abstracting the CMs and providing a multi-OS automation framework, we have to struggle with these issues. Makes me think that it's no longer single systems that are the snowflakes to be melted, but entire clusters of services and applications.

Anonymous said...

While I overall like packaging systems in each OS, the more I do in Operations the more I end up agreeing with this post. Already in my Puppet configs I end up having to roll my own packages/tarballs anyway, simply because the default packages assume WAY too much about my environment. It'd be nice if we could pull in some more features from Gentoo's system (being able to enable/disable/set variables for all packages system-wide and still use the updater) without it being quite as horrific long-term as Gentoo can be (sorry, I ran Gentoo for three years on a few boxes and while fun, "emerge world" never ceased to be scary).

John M. said...

@Anonymous: Please enlighten us. I would like to hear a rebuttal on the topic that is more detailed than 'awful'.

Please explain why these scripts exist, and how a CM can be leveraged to use them effectively and economically.

Miah said...

@Jan-Erik

Indeed. I was AUR maintainer for puppet for a short time and did some work with the ArchServer project. Pacman definitely comes to mind when I think 'simple packaging system with integrated builds'.

I have built and maintained too many RPMs, and the Pacman system is just much simpler to understand and get started in. There aren't tons of undocumented macros to go learn either =)

@Judd

Crowbar looks really interesting. I haven't had a chance to take a look at it yet. It looks like an interesting alternative to Cobbler (http://cobbler.github.com/).

As much of my experience lately is dealing with 'cloud' provisioned systems I haven't had a direct need for working directly with the hardware. But either of these tools would be a huge help in large environments. Especially those that haven't already gained traction with configuration management.

Jordan Sissel said...

+1 all this.

Pretty much every time I find myself hating debian or redhat, it's because I'm actually trying to use them for a purpose.

Providing config management via 'apt-get' and post-install scripts is weak and fragile. How many people do 'apt-get install apache' and go "OK, now my apache server is totally ready to do things for my business!" without configuring it at all? Nobody, right?

The last time I tried to uninstall collectd, it failed because the collectd service wasn't running, because puppet already shut it down in preparation for removing the package. That is dumb.

Speaking of dumb, here's a fun hack I made to strip scripts (postinst, etc) from debian packages at install-time using an Apt hook - https://gist.github.com/1729559 - warning, though, because debian does dumb things (requires postinst scripts to function) with python packages, using this will break python. I'd love to see this expanded to 'allow' the necessary-but-dumb things (debian's pycentral crap) but strip all the really dumb things (initialize mysql on install).

Ryan Miller said...

This looks to me just like the ongoing maturation of the discipline, with consequent differentiation. At some point, desktop Linux distros (or even just package groups within a distro) diverged from servers. Now we need distros (or again, packagegroups/installers) that differentiate between standalone servers (for playing at home, a single VPS, etc) and servers with config management. The --with-cm install switch would turn off all of the daemon starts, firewall rule changes, database initializations/upgrades, etc.

And as those plugging pacman (which I haven't personally used) seem to indicate, like with desktop v server, it's more of a continuum than a binary. Pacman may be ahead of RHEL/RPM, but RHEL/RPM is waaaay ahead of Debian/DPKG (and derivatives, like Ubuntu). RHEL applies many fewer and simpler patches to its sources (making rebuilds easier) and does less intrusive stuff with its included scripts.

Kris said...

It's funny how anno 2012 people mostly complain about packages starting services by default because of configuration management pains and not because of high availability constraints.

No I don't want to launch mysql by default on that node .. I want pacemaker to manage where a resource is running..

People seem to have moved to cfgmgmt and forgotten about real IT issues ..