December 17, 2015

Day 17 - Grokking systemd for Fun and Profit

Written by: Tyler Langlois (@leothrix)
Edited by: Ben Cotton (@funnelfiasco)

That's right: this post is about the "s" word. We'll be looking at how to leverage systemd for great good, whether you're a wary user on a foreign system or intending to use these features extensively in your own infrastructure. I'd encourage readers to set aside any existing biases about the project and join me in exploring some of the underlying capabilities of these tools - because whether you agree with its design philosophy or not, understanding the project better can only improve the conversation.

Before we start throwing levers and pushing buttons, it behooves the wary sysadmin to get familiar with the logging system behind systemd: journald.

Dear Journal:

Like other systemd tools, journald is somewhat familiar yet functionally different from its predecessors. If you're coming from a traditional /var/log/ background, some of these differences are particularly worth noting:

  • A simple tail on journal log files in /var/log/journal isn't sufficient - journald encodes stored logs in a binary format that includes supplemental metadata fields (and some "trusted" fields we'll get into later) which require use of the journalctl utility.
  • By default, journald will attempt to intelligently rotate old logs based upon available disk space: the daemon will use either up to 10% of available space or keep 15% available on the filesystem it logs to, whichever is smaller (incidentally, this also means logrotate is somewhat moot for journal files). The man page for journald.conf explains the disk usage logic in greater detail, and the thresholds can be set explicitly - see the sample configuration after this list.
  • Log entries and the aforementioned metadata can be read with a straightforward pager and subjected to basic query logic (matching fields based on AND, OR, and so on) through journalctl.
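
If those defaults don't suit a particular machine, the thresholds can be pinned down explicitly in /etc/systemd/journald.conf - a minimal sketch, with values chosen purely for illustration:

[Journal]
# never let journal files consume more than 1G...
SystemMaxUse=1G
# ...and always keep at least 2G free on the backing filesystem
SystemKeepFree=2G

A restart of systemd-journald picks up the new limits.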

Log metadata is stored within fields in a log entry. The following example demonstrates what a snippet of this structure looks like, taken from the logs of an sshd unit:

...snip...
MESSAGE=Server listening on 0.0.0.0 port 22.
_PID=226
_COMM=sshd
...snip...

Many of these fields are self-explanatory: the daemon emitted some text on stdout, which is the MESSAGE field. There are also two additional fields here prefixed with "_", indicating their status as "trusted" fields. This means their content has been derived from the supervising process and is outside the control of the process emitting the log in question - hence, the validity of their contents is "trusted".

Given that information, we can trust that this log really originated from process ID 226, which arose from an invocation of the sshd command.

However, this also implies we can control journal fields! It turns out that daemons can make journald calls that pass explicit values for fields. Note that most of the time, daemons will simply write to stdout and populate MESSAGE implicitly unless special code is written to tell journald about extra fields (I'll use the ruby journald library to create fields in the following examples). What kind of functionality can we build on top of this with some simple code?

Based upon simple use cases I've played with, user fields have been particularly handy when looking to elevate logs that need to be seen by a human. For example, the following dead-simple ruby daemon listens for journal events that have a user field of NOTIFY_SLACK=1:

#!/usr/bin/env ruby

require 'systemd/journal'
require 'slack-notifier'

webhook_url = ENV['SLACK_WEBHOOK_URL']

j = Systemd::Journal.new
notifier = Slack::Notifier.new webhook_url,
                               channel: '#bots'

# only deliver entries carrying our user field, starting from new entries
j.add_filter('NOTIFY_SLACK', '1')
j.seek(:tail)

j.watch do |log|
  notifier.ping "#{log.message} (/cc @tylerjl)"
end
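
Since this watcher is itself a long-running daemon, it's a natural candidate for its own service unit. A minimal sketch - the install path and webhook value here are hypothetical:

$ cat /etc/systemd/system/journal-slack.service
[Unit]
Description=elevate tagged journal entries to Slack
After=network.target

[Service]
# hypothetical install location for the ruby daemon above
ExecStart=/usr/local/bin/journal-slack.rb
Environment=SLACK_WEBHOOK_URL=https://hooks.slack.com/services/XXX
Restart=on-failure

[Install]
WantedBy=multi-user.target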

I run this daemon on my personal server and make calls like the following within my personal tools when I need them to ping me on Slack (again, example in ruby):

require 'systemd/journal'

Systemd::Journal.message(
  message: "Singularity achieved!",
  notify_slack: '1'
)

This means I can watch system-wide for logs I've tagged for elevation into chat channels. Note also that filtering on a user field from the command line is a similar exercise to the ruby code above:

journalctl NOTIFY_SLACK=1
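
Matches against different fields on one command line are ANDed together, while a '+' separates groups that are ORed, so refinements like these work as expected:

# tagged for Slack AND emitted by a ruby process
journalctl NOTIFY_SLACK=1 _COMM=ruby

# tagged for Slack, OR anything at all from the sshd unit
journalctl NOTIFY_SLACK=1 + _SYSTEMD_UNIT=sshd.service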

Some journalctl flags are particularly useful additions to an admin's toolbox:

  • Live-tail the system's logs: journalctl -f
  • Examine all logs from the previous boot, starting at the end: journalctl -e -b -1
  • Run a check against all journald log files to verify file integrity: journalctl --verify

The Wild World of Unit Types

The .service unit is a friendly face: intended to represent persistent services and daemons, it's pretty self-evident what happens when one invokes systemctl start sshd.service.

However, systemd units model more than persistent or oneshot daemons. You've probably heard of a couple of these:

  • timer - activates associated service units based on monotonic or realtime intervals.
  • socket - listens for data and passes it along or explicitly starts other units.
  • path - watches directories and files to activate other units based on inotify-driven events.
  • target - analogous to SysV runlevels, these permit you to create your own arbitrary targets to organize groups of other units.
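
A quick way to survey units of a given type on a live host is to filter systemctl's listing:

systemctl list-units --type=timer --all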

Consider the path unit. This unit type can create fairly straightforward relationships between inotify events and service units. One use case for this could be a path unit that watches a directory for new backups to appear, and activates another oneshot service unit to archive newly appeared files to S3 or another remote endpoint.
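
A minimal sketch of that idea (unit name and watched path are hypothetical; a matching backup-archive.service would hold the actual upload logic):

$ cat /etc/systemd/system/backup-archive.path
[Unit]
Description=watch for new backups to archive

[Path]
# fires whenever the watched directory contains any files
DirectoryNotEmpty=/var/backups

[Install]
WantedBy=multi-user.target

Unless overridden with Unit=, a path unit activates the service of the same name when its condition fires.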

The timer unit, in particular, has a familiar use case: triggering events based on a schedule. Paired with oneshot services, timers typically replace crontabs in systemd-centric environments. There are some general differences from cron here, including calendar formatting and the introduction of persistent and monotonic timers. The reference under man systemd.timer is a sufficiently detailed guide.
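
As a concrete sketch, a weekly cron entry might translate into a pair of units like the following (names here are hypothetical; a matching weekly-job.service would hold the actual command):

$ cat /etc/systemd/system/weekly-job.timer
[Unit]
Description=run weekly-job.service once a week

[Timer]
OnCalendar=weekly
# if the machine was off when the timer was due, fire on the next boot
Persistent=true

[Install]
WantedBy=timers.target

Enabling the timer with systemctl enable --now weekly-job.timer then takes the place of editing a crontab.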

If you're among the jaded rank-and-file of ops people just waiting for things to fail, one noteworthy option in this use case is OnFailure=. I've often found introspection into the success or failure of cron jobs to be fairly opaque; the default email behavior can be difficult to leverage effectively. This option enables a degree of reporting for units that may experience problems.

Consider this example oneshot unit that checks the status of a ZFS pool (if you haven't tried ZFS on Linux yet, you totally should):

$ cat /etc/systemd/system/zfs-check.service
[Unit]
Description=check result of last ZFS scrub
OnFailure=failure-report@%n.service

[Service]
Type=oneshot
ExecStart=/usr/bin/sh -c 'zpool status | grep "No known data errors"'

This service could tentatively be triggered weekly by a timer along the lines of the earlier sketch. Notice the OnFailure= option. That service file could look something like this:

$ cat /etc/systemd/system/failure-report@.service
[Unit]
Description=failure notification for %i

[Service]
Type=oneshot
ExecStart=/usr/local/bin/unit-notify.rb %i

The "@" in the unit name means this unit can be instantiated with different suffixes. In this case, we're invoking the unit as failure-report@%n.service from the ZFS unit, which translates %n into the unit name, zfs-check. The unit-notify.rb script can then be something as simple as:

$ cat /usr/local/bin/unit-notify.rb
#!/usr/bin/ruby

require 'systemd/journal'

Systemd::Journal.message(
  message: "Unit #{ARGV[0]} has failed!",
  notify_slack: '1'
)
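
One nice property of template units is that they're easy to exercise by hand - instantiating the template with an arbitrary suffix should produce a Slack ping naming that suffix:

systemctl start failure-report@manual-test.service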

Coupled with the previously mentioned journald daemon, notifications for failed units can be fairly streamlined - just add OnFailure=failure-report@%n.service and go. When running systems at greater scale, this could also be applied to calling webhooks to gather unit activity in aggregate.

Into the Belly of the Beast: dbus

There may come a time in the course of dev-ing your ops when you need more finely-grained programmatic access to systemd. In such cases, although unit files are usually available on the filesystem, more detailed information regarding system state can often be acquired through dbus.

The documentation for the systemd dbus API is fairly comprehensive and outlines all of the capabilities of this interface to systemd. In a nutshell:

  • methods can be called from the API to perform certain actions, such as reloading a unit or rebooting the host machine.
  • signals can be subscribed to in code to trigger actions - think interrupts for your init system.
  • properties represent settings or traits about units, the host system, and other systemd objects.

Amid all these sources of data, there are rules and interfaces for accessing them. One thing to bear in mind when coding against dbus APIs is that they can get a little wordy. Consider this code snippet that prints the environment for the sshd service:

from dbus import SystemBus, Interface

bus = SystemBus()
# proxy for the systemd manager object
systemd = bus.get_object('org.freedesktop.systemd1',
                         '/org/freedesktop/systemd1')
manager = Interface(systemd,
                    dbus_interface='org.freedesktop.systemd1.Manager')
# resolve the unit's object path, then speak the generic properties interface to it
unit_path = manager.LoadUnit('sshd.service')
unit_proxy = bus.get_object('org.freedesktop.systemd1', str(unit_path))
unit_props = Interface(unit_proxy,
                       dbus_interface='org.freedesktop.DBus.Properties')

for env_var in unit_props.Get('org.freedesktop.systemd1.Service', 'Environment'):
    print(env_var)

There's a great deal of flexibility here, especially when you peruse the list of interfaces on the API documentation - as long as you're willing to slog through some fairly verbose setup. Here are some use cases for the dbus API that I've personally found useful:

  • Subscribing to the PropertiesChanged signal of a unit to watch for unit state changes (see the sketch after this list).
  • Bringing custom BusNames online to handle special dependency management. For example, a custom service may successfully fork but be unable to serve a dependent service until it verifies itself as healthy; in that case, the BusName= service option can be used to watch for the green light before dependent units are told to proceed.
  • Obtaining a list of all units currently loaded, running, or failed on the system.
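
To flesh out the first of those use cases, here's a minimal sketch (assuming dbus-python with the GLib main loop and an sshd.service on the host) that prints a line whenever sshd's ActiveState changes; note that systemd suppresses these signals until a client calls Subscribe():

from dbus import SystemBus, Interface
from dbus.mainloop.glib import DBusGMainLoop
from gi.repository import GLib

# a main loop must be attached before signals can be delivered
DBusGMainLoop(set_as_default=True)

bus = SystemBus()
systemd = bus.get_object('org.freedesktop.systemd1',
                         '/org/freedesktop/systemd1')
manager = Interface(systemd,
                    dbus_interface='org.freedesktop.systemd1.Manager')
# ask systemd to start emitting signals to us
manager.Subscribe()
unit_path = manager.LoadUnit('sshd.service')

def on_properties_changed(interface, changed, invalidated):
    if 'ActiveState' in changed:
        print('sshd.service is now %s' % changed['ActiveState'])

bus.add_signal_receiver(on_properties_changed,
                        signal_name='PropertiesChanged',
                        dbus_interface='org.freedesktop.DBus.Properties',
                        path=str(unit_path))

GLib.MainLoop().run()

Restarting sshd from another terminal should produce a handful of lines as the unit passes through its deactivating and activating states.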

Bonus Round: Ops Grab Bag!

If nothing else, I hope that you may find something here that will make your life easier when working with systemd. With that in mind, consider these tips when you're in the trenches:

  • Need to quickly override a service option that shipped with your distro's service file? Use systemctl edit foo.service to drop into an editor and create an override file that will be automatically written to /etc/systemd/system/foo.service.d/override.conf. The override file's options take precedence over the stock unit (see the example after this list).
  • Need to express that nginx.service requires php-fpm.service to be running? Run systemctl add-requires nginx.service php-fpm.service and the appropriate symlink will be created for you in /etc/systemd/system/.
  • Need to watch the log files for a service? Live-tail a service's journal with journalctl -u foo.service -f.
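
As an example of the first tip, the generated override file might contain nothing more than this (the option shown is purely illustrative):

$ cat /etc/systemd/system/foo.service.d/override.conf
[Service]
# illustrative: raise the open file limit for just this service
LimitNOFILE=65536

If you create such a file by hand rather than through systemctl edit, follow it with systemctl daemon-reload.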

Conclusion

I do hope that this has been an informative tour of some of the practical considerations to bear in mind when working with systemd. The jury is still out on whether systemd will destroy the planet or usher in a thousand-year peace between all distributions, but a little extra knowledge never hurts.

I'd like to thank Ben Cotton for generously editing this piece, and the SysAdvent organizers for being a delightful group of people to work with. May your pager be silent and holiday be downtime-free!
