December 3, 2014

Day 3 - So Server, tell me about yourself. An introduction to facter, osquery and sysdig

Written by: Gareth Rushgrove (@garethr)
Edited by: Hugh Brown (@saintaardvark)

Introduction

Linux and Unix have always had powerful, low-level tools capable of telling you exactly what your computer system is doing (strace, DTrace, systemtap, top, ps). But these tools often have complex user interfaces and platform differences, which means not everyone has the time to master them. This article is all about several new tools that aim not just to be powerful debugging tools, but to provide a pleasant user interface too.

Facter is a simple inventory application providing a single, cross-platform interface to a range of structured data about your system. Everything from network interfaces to available hardware to the operating system version is available.

osquery is a new open source tool from Facebook that exposes low-level details of your system via a familiar SQL interface. Want to query for processes listening on a given network interface? Or for services that launch at startup? This is the tool for you.

Sysdig is another open source tool for system-level exploration and tracing that aims to be both powerful and easy to use. Sysdig focuses on helping you investigate issues as they happen, in real time.

Installation

I’m running all of the following on an Ubuntu 14.04 virtual machine, but you should be able to find the installation commands for your favourite distribution too. As for supporting other operating systems: Facter also runs on Windows and OS X; osquery also runs on OS X; and Sysdig is Linux only.

Facter

Facter has been around for a while (it’s a core part of Puppet), and is included in lots of distribution repositories already. However, for this walkthrough, we’re going to use the preview version of facter.

First let’s install the official Puppet Labs repositories:

wget https://apt.puppetlabs.com/puppetlabs-release-trusty.deb
sudo dpkg -i puppetlabs-release-trusty.deb

Next let’s install the nightly build repository for facter. Note that the repository and package are called cfacter to allow it to be installed alongside the stable version of facter.

cd /etc/apt/sources.list.d
sudo wget http://nightlies.puppetlabs.com/cfacter-latest/repo_configs/deb/pl-cfacter-latest-trusty.list

For the curious, or for those wanting to use a different operating system, feel free to read up on the nightly repositories.
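
With both repositories in place, update apt and install the cfacter package itself:

sudo apt-get update
sudo apt-get install -y cfacter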

osquery

osquery is quite new, and packages aren’t available just yet – so we’ll need to compile from source. First let’s download the latest release:

wget https://github.com/facebook/osquery/archive/1.1.0.tar.gz
tar -zxvf 1.1.0.tar.gz
cd osquery-1.1.0

And then we’ll install its dependencies and compile the osquery tools. This will take a little while but I promise it will be worth it.

sudo make deps
sudo make
sudo make install

For the full installation instructions see the osquery wiki.

Note that if you want to use osquery for anything more than a quick demo, you can create your own package using the makefile:

make package

The resulting system package (Ubuntu or CentOS at the moment) can then be used to install the binaries without needing to compile on every machine.
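
The exact package filename will depend on the version you built, but installing it on another Ubuntu machine is then just a matter of something like:

sudo dpkg -i osquery-*.deb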

Sysdig

Sysdig handily provides a one-line installer which detects your operating system and installs the relevant packages:

curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | sudo bash

If you would rather do that manually, full installation instructions are available.

Usage

Facter

Facter is the most straightforward of the three tools we’re taking a look at. When run, it simply outputs structured information about the host, collected from various other tools or the operating system itself. This can be hugely useful if you’re on a machine and want to know everything about it quickly – but it’s also useful if you’re using an unfamiliar operating system, as it provides a single, consistent way of accessing lots of information.

The quickest way of understanding this is just to run it:

cfacter | head -n 20

(Feel free to leave out the pipe to head if you’re running this yourself.) The output is over 100 lines long, and looks something like this:

cfacterversion => 0.2.0
disks => {
  sda => {
    model => "VBOX HARDDISK",
    size => "40.00 GiB",
    size_bytes => 42949672960,
    vendor => "ATA"
  }
}
dmi => {
  bios => {
    release_date => "12/01/2006",
    vendor => "innotek GmbH",
    version => "VirtualBox"
  },
  board => {
    manufacturer => "Oracle Corporation",
    product => "VirtualBox"
  },
  chassis => {

You can see that I’m running this on a VirtualBox virtual machine with a 40GB hard drive.

Facter supports other output formats too, including JSON and YAML. For instance you can run:

cfacter -y | head

And you’ll receive YAML:

cfacterversion: 0.2.0
disks:
  sda:
    model: VBOX HARDDISK
    size: 40.00 GiB
    size_bytes: 42949672960
    vendor: ATA
dmi:
  bios:
    release_date: 12/01/2006

Facter also supports returning just a single value, so if you know the name of the fact you want to check you can simply ask for that. For instance:

cfacter ruby.version
1.9.3
cfacter os.distro.codename
trusty
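
Because the output is just the value, single facts are easy to use from shell scripts too. Here’s a minimal sketch using the os.distro.codename fact from above:

#!/bin/bash
# Branch on the distribution codename reported by cfacter
codename=$(cfacter os.distro.codename)
if [ "$codename" = "trusty" ]; then
  echo "Running Ubuntu 14.04 (trusty)"
else
  echo "Running $codename instead"
fi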

As well as the large number of facts provided out-of-the-box on a range of operating systems, facter also allows for writing your own facts. A very simple example might be exposing the version of Python to facter. First write a script that outputs a simple key=value pair. Save the following as /etc/facter/facts.d/python-version.sh.

#!/bin/bash
# python prints its version to stderr, hence the 2>&1 redirect
var=$(python --version 2>&1)
echo "python_version=$var"
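
You may also need to make the script executable so facter will run it:

chmod +x /etc/facter/facts.d/python-version.sh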

Now we can ask for the python_version fact (the name of our key in the script above) like so:

cfacter --external-dir /etc/facter/facts.d/ python_version
Python 2.7.6

Facter can be extended in a number of different ways, and any custom facts written for previous versions of Facter should work with the new implementation.

osquery

osquery serves a similar purpose to Facter, providing a universal interface to information about a machine. osquery presents information about the system as tables, which can be queried via SQL. Queries tend to return a dynamic list of results – for instance, the users present on a machine or the host entries in the local hosts file. Again, here’s a quick example:

echo "SELECT * FROM etc_hosts;" | osqueryi

This will output something like the following:

+-----------+----------------------------+
| address   | hostnames                  |
+-----------+----------------------------+
| 127.0.0.1 | localhost                  |
| ::1       | ip6-localhost ip6-loopback |
| fe00::0   | ip6-localnet               |
| ff00::0   | ip6-mcastprefix            |
| ff02::1   | ip6-allnodes               |
| ff02::2   | ip6-allrouters             |
| ff02::3   | ip6-allhosts               |
+-----------+----------------------------+

Let’s change the host entries on our machine and rerun the query:

echo "127.0.0.1 testingosquery" | sudo tee -a /etc/hosts
echo "SELECT * FROM etc_hosts;" | osqueryi

Now you’ll see something like:

+-----------+----------------------------+
| address   | hostnames                  |
+-----------+----------------------------+
| 127.0.0.1 | localhost                  |
| ::1       | ip6-localhost ip6-loopback |
| fe00::0   | ip6-localnet               |
| ff00::0   | ip6-mcastprefix            |
| ff02::1   | ip6-allnodes               |
| ff02::2   | ip6-allrouters             |
| ff02::3   | ip6-allhosts               |
| 127.0.0.1 | testingosquery             |
+-----------+----------------------------+
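
Because it’s just SQL, you can also filter the results with a WHERE clause. For example, to show only the loopback entries:

echo "SELECT * FROM etc_hosts WHERE address = '127.0.0.1';" | osqueryi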

The above examples use the osqueryi tool, which can take a query on stdin and return the results. You can also run osqueryi on its own to open an interactive osquery SQL shell.

osqueryi

With the shell open, let’s build a more complex query by joining two tables together.

osquery> SELECT
    ...> u.username,
    ...> g.groupname
    ...> FROM users as u
    ...> JOIN groups as g ON u.gid = g.gid
    ...> LIMIT 10;

This should produce something like the following:

+----------+-----------+
| username | groupname |
+----------+-----------+
| root     | root      |
| daemon   | daemon    |
| bin      | bin       |
| sys      | sys       |
| sync     | nogroup   |
| games    | games     |
| man      | man       |
| lp       | lp        |
| mail     | mail      |
| news     | news      |
+----------+-----------+

osquery supports a large and growing number of tables – everything from arp_cache and bash_history, to crontab records and kernel_modules. It’s also possible to write your own tables if you’re happy getting your hands into the code.
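
As a taster, here’s a rough sketch of the kind of query that answers the question from the introduction about processes listening on the network (assuming your build includes the processes and listening_ports tables):

echo "SELECT DISTINCT p.name, lp.port, lp.address FROM processes AS p JOIN listening_ports AS lp ON p.pid = lp.pid;" | osqueryi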

osquery also supports a long-running daemon process called osqueryd; this allows for scheduling queries for execution across your infrastructure, aggregating the results over time and generating logs of any changes in state.
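
Configuring osqueryd is out of scope for this article, but to give a flavour, scheduled queries are described in a JSON config file. The sketch below is based on the format documented around the 1.1.0 release; the keys and default config path have changed between versions, so treat it as an illustration and check the wiki for your release:

# NOTE: the config path and the scheduledQueries key are assumptions based
# on the 1.1.0-era documentation; adjust for the release you're running.
sudo mkdir -p /etc/osquery
sudo tee /etc/osquery/osquery.conf <<'EOF'
{
  "scheduledQueries": [
    {
      "name": "kernel_modules",
      "query": "SELECT name, size FROM kernel_modules;",
      "interval": 300
    }
  ]
}
EOF
sudo osqueryd --config_path /etc/osquery/osquery.conf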

Sysdig

Whereas Facter and osquery are predominantly about querying infrequently changing information, Sysdig is much more suited to working with real-time data streams – for example, network or file I/O, or tracking errors in running processes.

Here are a few examples. First let’s watch for any operations that open the /etc/hosts file:

sudo sysdig evt.type=open and fd.name contains /etc/hosts

Now in another tab or ssh session, open the /etc/hosts file with vim or another editor of your choice:

vim /etc/hosts

This should output something like the following:

3145 12:05:06.477169760 1 vim (3835) < open fd=3(<f>/etc/hosts) name=/etc/hosts flags=1(O_RDONLY) mode=0

Here we can see that vim made an open syscall on the /etc/hosts file.

Let’s do something a bit more practical: we’ll look for any I/O calls that have a latency greater than 1ms. This would be useful if you were tracking down certain kinds of performance issues:

sudo sysdig -c fileslower 1

fileslower is what’s called a Chisel; you can find out more in the Chisel user guide, or read this tutorial about how to write your own.
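
If you want to see what other chisels ship with sysdig, you can list them and get more detail on any individual one:

sudo sysdig -cl
sudo sysdig -i fileslower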

In another tab or session let’s run a command that should trigger a bit of I/O. We’ll use these packages in the next example too.

sudo apt-get install nginx apache2-utils -y

On my virtual machine, this resulted in the following:

2014-11-28 12:10:02.821 dpkg         read           1 /var/lib/dpkg/info/linux-headers-3.13.0-34.list
2014-11-28 12:10:04.517 mandb        read           1 /usr/share/man/man5/resolv.conf.5.gz
2014-11-28 12:10:06.323 apt-get      read          15 /var/cache/apt/srcpkgcache.bin

The output here shows any files where the I/O latency was greater than 1ms. Each line shows the binary (apt-get, dpkg, etc.), the action (read in this case) and the latency (1ms or 15ms). If you were using sysdig to debug a real performance problem, this kind of information can quickly point you at the culprit.

I mentioned above that we’d make use of the nginx and apache2-utils packages for our next example. Let’s watch all the events related to requests served by nginx in real time.

sudo sysdig -A -c echo_fds proc.name=nginx

And again in another tab or session, let’s run ApacheBench (ab) to generate some traffic against our local nginx web server.

ab -n 5 -c 1 http://localhost/

This should output something like the following:

------ Read 77B from 127.0.0.1:58089->127.0.0.1:80

GET / HTTP/1.0
Host: localhost
User-Agent: ApacheBench/2.3
Accept: */*

------ Write 241B to 127.0.0.1:58089->127.0.0.1:80

HTTP/1.1 200 OK
Server: nginx/1.4.6 (Ubuntu)
Date: Fri, 28 Nov 2014 12:17:02 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 04 Mar 2014 11:46:45 GMT
Connection: close
ETag: "5315bd25-264"
Accept-Ranges: bytes

------ Write 90B to /var/log/nginx/access.log

127.0.0.1 - - [28/Nov/2014:12:17:02 +0000] "GET / HTTP/1.0" 200 612 "-" "ApacheBench/2.3"

Note that we’re seeing the request, the response and the log lines being written - all from the same command and all in real time. Imagine how useful that would be when debugging a production web server.

The folks behind Sysdig provide lots of examples which give you an idea of all the possibilities: from watching the behaviour of particular users, to tracking busy processes, to recording the users of a specific application.
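
One more trick worth knowing: sysdig can capture everything to a trace file with -w and replay it later with -r, so you can apply filters and chisels to an incident after the fact rather than having to catch it live. A quick sketch:

sudo sysdig -w trace.scap
sysdig -r trace.scap -c topprocs_cpu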

Conclusions

Hopefully these quick examples have given you an insight into three useful tools and into why you might want them around when you have a problem. All three of these tools present lots of opportunities for integration with your monitoring or configuration management framework.
