December 21, 2016

Day 21 - Reusable Application Packaging With Habitat

Written by: Joshua Timberman (@jtimberman)
Edited by: Dan Webb (@dan_webb)

Introduction

Habitat by Chef is a framework for building, deploying, and running any kind of application. Chef’s blog has a good introductory series on the concepts behind Habitat, and the Habitat tutorial is a good place to start learning. In this post, I’m going to take a look at how to integrate Habitat with Chef to deploy and run a package we’ve created for an example application.

The sample application is Heroku’s “ruby-rails-sample.” This application is a simple example, but it does make a database connection, so we can see how that works out, too. The Habitat organization on GitHub has a forked repository where we’ve made changes to add Habitat and integration with Chef and Chef Automate. Let’s take a look at those changes.

Habitat Background

First, we have created a habitat directory. This is where the Habitat configuration goes. In this directory, we have:

  • config
  • default.toml
  • hooks
  • plan.sh

The config directory contains handlebars.js templates for the configuration files for the application. In this case, we only have the Rails database.yml. It looks like this:

default: &default
  adapter: postgresql
  encoding: unicode
  pool: 5

production:
  <<: *default
  database: {{cfg.database_name}}
  username: {{cfg.database_username}}
  password: {{cfg.database_password}}
  host: {{cfg.database_host}}
  port: {{cfg.database_port}}

The next part of the Habitat configuration is the default.toml. This file contains all the default variables that will be used to configure the package. These variables are accessible in hooks, and the templates in config. The configuration file above replaces {{cfg.VARIABLE}} with the value from the default.toml. These values can also be dynamically changed at run time. The default.toml looks like this:

rails_binding_ip = "0.0.0.0"
rails_port = 3000
database_name = "ruby-rails-sample_production"
database_username = "ruby-rails-sample"
database_password = ""
database_host = "localhost"
database_port = 5432
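
To make the relationship between default.toml and the templates concrete, here is a minimal sketch of the substitution in plain Bash. Habitat renders templates with its own Handlebars engine, not with anything like this; render_cfg is a hypothetical helper that only imitates the {{cfg.KEY}} replacement for two of the values above.

```shell
#!/usr/bin/env bash
# Illustration only: imitate how {{cfg.KEY}} placeholders get filled in.
# render_cfg is a hypothetical helper, not part of Habitat.
render_cfg() {
  local text="$1"; shift
  local pair key value placeholder
  for pair in "$@"; do
    key="${pair%%=*}"        # text before the first '='
    value="${pair#*=}"       # text after the first '='
    placeholder="{{cfg.${key}}}"
    # Quoting the pattern makes Bash treat it as a literal string.
    text=${text//"$placeholder"/$value}
  done
  printf '%s\n' "$text"
}

template='host: {{cfg.database_host}}
port: {{cfg.database_port}}'

rendered="$(render_cfg "$template" database_host=localhost database_port=5432)"
printf '%s\n' "$rendered"
```

Passing a different database_host on the command line changes the rendered output without touching the template, which is the same separation Habitat gives us between the package and its configuration.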

The hooks directory contains shell scripts that we call “hooks”. For the ruby-rails-sample application, we have two hooks, init, and run. The init hook is used to initialize the application. It looks like this:

#!/bin/sh
rm -rf {{pkg.svc_static_path}}/*
cp -a {{pkg.path}}/static/* {{pkg.svc_static_path}}
cp {{pkg.svc_config_path}}/database.yml {{pkg.svc_static_path}}/config/database.yml
export GEM_HOME="{{pkg.svc_static_path}}/vendor/bundle/ruby/2.3.0"
export GEM_PATH="$(hab pkg path core/ruby)/lib/ruby/gems/2.3.0:$(hab pkg path core/bundler):$GEM_HOME"
export LD_LIBRARY_PATH="$(hab pkg path core/gcc-libs)/lib"
export PATH="$PATH:{{pkg.svc_static_path}}/bin"
export RAILS_ENV="production"
chown -R hab:hab {{pkg.svc_static_path}}
cd {{pkg.svc_static_path}}
exec 2>&1
if [[ ! -f {{pkg.svc_static_path}}/.migrations_complete ]]; then
  echo "Running 'rake bootstrap' in ${PWD}"
  # No exec here: exec would replace this shell, so the touch would never run.
  chpst -u hab bin/rake bootstrap && touch {{pkg.svc_static_path}}/.migrations_complete
fi

Hooks are templates too, like the configuration file database.yml. The values that come from the {{pkg.VARIABLE}} variables are set by the package and are fully documented. To initialize the application, we remove the existing deployed version and copy the new version from the package to the “static” path. This is because we treat the extracted package as immutable. We copy the config file from the service config directory to the static path’s config directory because of how Rails looks for the database.yml file. Then, we ensure that the entire application is owned by the application runtime user, hab. Finally, if we haven’t completed a database migration, we do that.
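
The marker-file check at the end of the init hook is a general “run once” pattern. Here is a minimal sketch of it in a temporary directory, so it can be tried without Habitat or Rails. The run_once and bootstrap functions are hypothetical stand-ins; bootstrap takes the place of bin/rake bootstrap.

```shell
#!/usr/bin/env bash
# Run a command only if its marker file does not exist yet.
run_once() {
  local marker="$1"; shift
  if [[ ! -f "$marker" ]]; then
    # Run the command in this shell (no exec) so that touch can run afterwards.
    "$@" && touch "$marker"
  fi
}

workdir="$(mktemp -d)"
count_file="$workdir/count"
: > "$count_file"

# Stand-in for the real migration; it just records that it ran.
bootstrap() { echo ran >> "$count_file"; }

run_once "$workdir/.migrations_complete" bootstrap
run_once "$workdir/.migrations_complete" bootstrap   # skipped: marker exists

runs="$(wc -l < "$count_file" | tr -d ' ')"
echo "bootstrap ran ${runs} time(s)"
```

Because the marker is only written when the command succeeds, a failed migration is retried the next time the hook runs.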

Next, we have a run hook because in order to start our application we need to set some environment variables so Rails knows where to find Ruby and the gems.

#!/bin/sh
export GEM_HOME="{{pkg.svc_static_path}}/vendor/bundle/ruby/2.3.0"
export GEM_PATH="$(hab pkg path core/ruby)/lib/ruby/gems/2.3.0:$(hab pkg path core/bundler):$GEM_HOME"
export LD_LIBRARY_PATH="$(hab pkg path core/gcc-libs)/lib"
export RAILS_ENV="production"

cd {{pkg.svc_static_path}}

exec 2>&1
exec chpst -u hab ./bin/rails server -b {{cfg.rails_binding_ip}} -p {{cfg.rails_port}}

Rails itself doesn’t support dropping privileges, so we use the chpst command to run the application as the hab user.

Next, we have the plan itself, plan.sh. This is a shell script executed by Habitat’s build script. Plans are documented in great detail, so I’ll cover only the highlights here. The plan is a Bourne Again Shell (Bash) script that contains metadata variables that start with pkg_, and callback functions that start with do_. The build script supplies default values and default callbacks for anything your plan does not set or override. You can view the full plan.sh in the GitHub repository.
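
The override mechanism is ordinary Bash: a function defined later replaces one defined earlier. The sketch below is not Habitat’s build script. It only illustrates the idea with made-up default callbacks, followed by a “plan” that overrides one of them.

```shell
#!/usr/bin/env bash
# Defaults, as a build program might define them (illustrative only).
do_build()   { echo "default build: ./configure && make"; }
do_install() { echo "default install: make install"; }

# A plan would normally be sourced from plan.sh at this point.
# It sets metadata and redefines only the callbacks it cares about.
pkg_name="ruby-rails-sample"
do_build() { echo "plan build for ${pkg_name}: bundle install"; }

build_output="$(do_build)"       # uses the plan's version
install_output="$(do_install)"   # falls back to the default
printf '%s\n' "$build_output" "$install_output"
```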

First, we want to ensure that we execute the hooks as root by setting the pkg_svc_user and pkg_svc_group variables. This is because the init hook needs to create files and directories in a privileged directory where the service runs.

pkg_svc_user="root"
pkg_svc_group=$pkg_svc_user

Habitat packages are built in a “cleanroom” we call a “studio.” This is a stripped-down environment that isn’t a full Linux distribution; it has just enough OS to build the package, and everything else needed for the build must be declared as a dependency in pkg_build_deps. Many application build scripts nonetheless assume that /usr/bin/env is available. Rubygems with native extensions are one example, and we cannot know in advance what an arbitrary Rubygem is going to do. The first callback function we override is therefore do_prepare(), where we symlink the Habitat core/coreutils package’s bin/env command to /usr/bin/env if that does not already exist. The symlink is removed later, in do_install(), after the package is built.

do_prepare() {
  build_line "Setting link for /usr/bin/env to 'coreutils'"
  [[ ! -f /usr/bin/env ]] && ln -s "$(pkg_path_for coreutils)/bin/env" /usr/bin/env
  return 0
}
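
The pairing of “create the link only if it is missing” and “remove it only if it is ours” can be tried safely in a scratch directory. In this sketch the paths under sandbox are stand-ins for $(pkg_path_for coreutils)/bin/env and /usr/bin/env, and add_link and remove_link are hypothetical names.

```shell
#!/usr/bin/env bash
sandbox="$(mktemp -d)"
target="$sandbox/coreutils/bin/env"   # stands in for the coreutils env binary
link="$sandbox/usr/bin/env"           # stands in for /usr/bin/env
mkdir -p "$(dirname "$target")" "$(dirname "$link")"
: > "$target"

# Create the symlink only when nothing exists at that path.
add_link() {
  [[ -e "$link" || -L "$link" ]] || ln -s "$target" "$link"
}

# Remove the symlink only when it points at the target we set up.
remove_link() {
  if [[ "$(readlink "$link")" = "$target" ]]; then
    rm "$link"
  fi
}

add_link
add_link                       # second call is a no-op
created="$(readlink "$link")"
remove_link
echo "link pointed at: ${created}"
```

Checking where the link points before deleting it means a real /usr/bin/env that was already present is never removed.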

The next function in the plan.sh is do_build(). Many software packages in the open source world are built by doing ./configure && make, and the default do_build() function in Habitat does that as a “sane default.” However, Ruby on Rails applications are built using bundler, to download all the Rubygem dependencies required to run the application. Habitat packages have their own runtime dependencies, specified with pkg_deps, and these packages are isolated away from the underlying OS in the /hab directory. This means we need to tell the build script where to find all the libraries we’re going to need to install the Rails application bundle. This includes any Rubygems that install native extensions, such as nokogiri or the PostgreSQL client, pg. The full, commented version is on GitHub.

do_build() {
  # solve compiling nokogiri native extensions!
  local _libxml2_dir=$(pkg_path_for libxml2)
  local _libxslt_dir=$(pkg_path_for libxslt)
  # The next two definitions are assumed from the pattern above;
  # the full plan.sh on GitHub is authoritative.
  local _zlib_dir=$(pkg_path_for zlib)
  local _pgconfig="$(pkg_path_for postgresql)/bin/pg_config"
  export NOKOGIRI_CONFIG="--use-system-libraries --with-zlib-dir=${_zlib_dir} --with-xslt-dir=${_libxslt_dir} --with-xml2-include=${_libxml2_dir}/include/libxml2 --with-xml2-lib=${_libxml2_dir}/lib"
  bundle config build.nokogiri '${NOKOGIRI_CONFIG}'
  bundle config build.pg --with-pg-config="${_pgconfig}"
  bundle install --jobs 2 --retry 5 --path vendor/bundle --binstubs
}

The next callback function we define is do_install(). As with do_build(), most open source software does its installation with make install. That isn’t the case with our Rails application, so we need to define our own function. The intent here is to install the content into the correct prefix’s static directory so we can create the artifact. We also need to ensure that any binaries shipped in the application use the correct Ruby by replacing their shebang line. Finally, we clean up the symlink created in do_prepare().

do_install() {
  cp -R . "${pkg_prefix}/static"
  for binstub in ${pkg_prefix}/static/bin/*; do
    [[ -f $binstub ]] && sed -e "s#/usr/bin/env ruby#$(pkg_path_for ruby)/bin/ruby#" -i "$binstub"
  done
  if [[ $(readlink /usr/bin/env) = "$(pkg_path_for coreutils)/bin/env" ]]; then
    rm /usr/bin/env
  fi
}
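
To see what the sed expression does to a binstub, here is a minimal sketch that runs it on a throwaway file. The ruby_path value is a made-up placeholder for whatever $(pkg_path_for ruby)/bin/ruby returns in the studio, and the sketch writes to a temporary file instead of using sed -i so that it behaves the same with GNU and BSD sed.

```shell
#!/usr/bin/env bash
workdir="$(mktemp -d)"
binstub="$workdir/rails"
printf '%s\n' '#!/usr/bin/env ruby' 'puts "hello"' > "$binstub"

# Hypothetical path; in the plan this comes from $(pkg_path_for ruby)/bin/ruby.
ruby_path="/hab/pkgs/core/ruby/EXAMPLE/bin/ruby"

# Same substitution as in do_install, with '#' as the sed delimiter
# so the slashes in the paths need no escaping.
sed -e "s#/usr/bin/env ruby#${ruby_path}#" "$binstub" > "$binstub.new"
mv "$binstub.new" "$binstub"

first_line="$(head -n 1 "$binstub")"
echo "$first_line"
```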

Continuous Integration/Delivery

With the contents of our habitat directory in place, we’re ready to build the package. We do this using the “Habitat Studio”. While an application artifact can be built anywhere Habitat runs, we strongly recommend doing this in an automated CI/CD pipeline. In the case of this project, we’re going to automatically build the package and upload it to a Habitat package depot using Chef Automate. The .delivery directory in the project contains a Chef Automate “Build Cookbook” and a configuration file. This cookbook is run on worker “build” nodes in Chef Automate. For this project, we wrap the habitat-build cookbook, which does the heavy lifting. The metadata.rb in .delivery/build-cookbook declares a dependency on habitat-build:

depends 'habitat-build'

This allows us to include the habitat-build cookbook’s recipes in the various phases within the pipeline. The phases we’re interested in are:

  • Lint: Check that we’re using good shell script practices in our plan.sh using shellcheck.
  • Syntax: Verify that the script is valid Bash with bash.
  • Publish: Build the artifact and upload it to a Habitat Depot.
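
A syntax check of this kind can be reproduced by hand with bash -n, which parses a script without executing it. I am assuming here that the syntax phase amounts to something similar; the habitat-build cookbook may do it differently. The file names and the check_syntax helper are made up for the sketch.

```shell
#!/usr/bin/env bash
workdir="$(mktemp -d)"

# A well-formed script and one with an unclosed function body.
printf '%s\n' 'do_build() {' '  echo "building"' '}' > "$workdir/good.sh"
printf '%s\n' 'do_build() {' '  echo "building"'     > "$workdir/bad.sh"

# bash -n only parses the file; nothing in it is run.
check_syntax() { bash -n "$1" 2>/dev/null; }

if check_syntax "$workdir/good.sh"; then good_result="ok"; else good_result="syntax error"; fi
if check_syntax "$workdir/bad.sh";  then bad_result="ok";  else bad_result="syntax error";  fi

echo "good.sh: ${good_result}"
echo "bad.sh: ${bad_result}"
```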

Each of these is a recipe in the project’s build cookbook. Importantly, we use the publish recipe in .delivery/build-cookbook/recipes/publish.rb:

include_recipe 'habitat-build::publish'

This recipe is in the habitat-build cookbook. When the build node runs the publish recipe, it loads the origin key from an encrypted data bag on the Chef Server used with Automate. Then, it executes the hab studio build command with an ephemeral workspace directory.

execute 'build-plan' do
  command "unset TERM; HAB_ORIGIN=#{origin} sudo -E #{hab_binary} studio" \
          " -r #{hab_studio_path}" \
          " build #{habitat_plan_dir}"
  cwd node['delivery']['workspace']['repo']
  live_stream true
end

This builds the package according to the plan.sh, and creates a “results” directory that has the output Habitat artifact (.hart file) and a file with information about the build. The recipe loads that file and stores the content in a data bag on the Chef Server, and it uploads the package to a Habitat Depot - the publicly available one at app.habitat.sh.
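
Outside of Chef, finding the artifact name from the build information might look like the sketch below. It assumes the file is a simple KEY=value list with a pkg_artifact entry. The file name build-info.env, the build_info helper, and all of the values are invented for the example.

```shell
#!/usr/bin/env bash
results_dir="$(mktemp -d)"

# Made-up build information in an assumed KEY=value format.
cat > "$results_dir/build-info.env" <<'EOF'
pkg_origin=delivery-example
pkg_name=ruby-rails-sample
pkg_artifact=delivery-example-ruby-rails-sample-0.0.1-20161221000000-x86_64-linux.hart
EOF

# Print the value for one key, or return 1 if the key is absent.
build_info() {
  local wanted="$1" file="$2" key value
  while IFS='=' read -r key value; do
    if [[ "$key" = "$wanted" ]]; then
      printf '%s\n' "$value"
      return 0
    fi
  done < "$file"
  return 1
}

artifact="$(build_info pkg_artifact "$results_dir/build-info.env")"
echo "would upload: ${results_dir}/${artifact}"
```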

execute 'upload-pkg' do
  command lazy {
    "#{hab_binary} pkg upload" \
    " --url #{node['habitat-build']['depot-url']}" \
    " #{hab_studio_path}/src/results/#{artifact}"
  }
  env(
    'HOME' => delivery_workspace,
    'HAB_AUTH_TOKEN' => depot_token
  )
  live_stream true
  sensitive true
end

Deployment

Once we have a package, it’s time to deploy it. Generally speaking, this is as simple as installing Habitat on a system and then running hab start delivery-example/ruby-rails-sample. Of course, we want to automate that, and we do so in our Chef Automate pipeline. After the publish phase come the provision and deploy phases, where we provision infrastructure (an EC2 node in this case) and run Chef on it to deploy the application. In our project, the .delivery/build-cookbook has the provision and deploy recipes to handle this, because it’s outside the scope of habitat-build. Those recipes use chef-provisioning, but one could also write a recipe that uses Terraform or some other provisioning tool. The recipe that actually deploys the application is in the cookbooks/ruby-rails-sample cookbook at the top of the repository. This cookbook uses the habitat cookbook, which provides resources for installing Habitat, installing Habitat packages, and enabling Habitat services.

In the metadata.rb:

depends 'habitat'

There is only a default.rb recipe in the ruby-rails-sample cookbook. First, it loads details about the database that it connects to:

database_details = {
  'host' => '',
  'username' => '',
  'password' => ''
}

The astute reader will note that these are empty values. For now, we don’t have any connection information handling here because these are secrets and should be managed appropriately. Previous iterations of this cookbook used a hardcoded plain text password, so we’re going to move away from that in a later version. For now, let’s step through the recipe.

Habitat is good about managing the application. However, there are still things we need a configuration management system to do on the node(s) that run the application. Notably, we’re going to ensure the hab user and group required to run the application are present.

execute('apt-get update') { ignore_failure true }

package 'iproute2'

group 'hab'

user 'hab' do
  group 'hab'
  home '/hab'
end

Next, we need to ensure that Habitat itself is installed, and that the application’s package is installed as well. Not only do we want them installed, but we want them to be the latest version available. We’re doing this in a continuous delivery pipeline, so we will assume everything is tested in an acceptance environment before it gets delivered. Right? :-)

hab_install 'habitat' do
  action :upgrade
end

hab_package 'delivery-example/ruby-rails-sample' do
  action :upgrade
end

Next, we’re going to manage the runtime configuration file for the application. Remember the default.toml from earlier? That file contains default values. We want to override some of them at run time, so we do that with a user.toml file in the service directory. Habitat creates this directory by default when it starts the application, but we need to make sure it exists first so the application starts properly the first time.

directory '/hab/svc/ruby-rails-sample' do
  recursive true
end

template '/hab/svc/ruby-rails-sample/user.toml' do
  variables database_details
  owner 'hab'
  group 'hab'
  mode '0600'
end
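
For a sense of what ends up on disk, here is a minimal shell sketch that writes an equivalent user.toml by hand. The key names come from the default.toml shown earlier; the values, and the assumption that the template maps host, username, and password onto those keys, are mine. A temporary directory stands in for /hab/svc/ruby-rails-sample.

```shell
#!/usr/bin/env bash
svc_dir="$(mktemp -d)/ruby-rails-sample"   # stands in for /hab/svc/ruby-rails-sample
mkdir -p "$svc_dir"

# Hypothetical connection details; real ones belong in a secrets store.
db_host="db.example.com"
db_user="ruby-rails-sample"
db_pass="not-a-real-password"

# The subshell keeps the restrictive umask from leaking into the rest of
# the script; umask 077 creates the file readable by its owner only.
(
  umask 077
  cat > "$svc_dir/user.toml" <<EOF
database_host = "${db_host}"
database_username = "${db_user}"
database_password = "${db_pass}"
EOF
)

cat "$svc_dir/user.toml"
```

Only the keys that differ from default.toml need to appear in user.toml; everything else keeps its default.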

We’ve passed in the database_details hash we set up earlier. In a future version of this recipe, that hash will come from an encrypted data bag on the Chef Server, and all we need to do is change that data structure in the recipe. And later on, we can also change this project to deploy a PostgreSQL server using Habitat and use the supervisor’s gossip to discover that and configure the application automatically. But, that is an article for another time.

Finally, we want to enable and start the service with Habitat. When we run hab start delivery-example/ruby-rails-sample on a system, it will run the service in the foreground under the Habitat Supervisor. If we do that in a Chef recipe, Chef will hang here forever. The hab_service resource will set up the service to run as a systemd unit.

hab_service 'delivery-example/ruby-rails-sample' do
  action [:enable, :start]
end

After this recipe runs, we will have Habitat installed at the latest version, and our application package will be installed and running as the hab user. If we are working in an environment where we have multiple Rails applications to manage, we can use this pattern across those other projects and automate ourselves out of a job.
