December 14, 2014

Day 14 - Using Chef Provisioning to Build Chef Server

Or, Yo Dawg, I heard you like Chef.

Written by: Joshua Timberman (@jtimberman)
Edited by: Paul Graydon (@twirrim)

This post is dedicated to Ezra Zygmuntowicz. Without Ezra, we wouldn’t have had Merb for the original Chef server, chef-solo, and maybe not even Chef itself. His contributions to the Ruby, Rails, and Chef communities are immense. Thanks, Ezra, RIP.

In this post, I will walk through a use case for Chef Provisioning used at Chef Software, Inc.: building a new Hosted Chef infrastructure with Chef Server 12 on Amazon EC2. This isn’t an in-depth how to guide, but I will illustrate the important components to discuss what is required to setup Chef Provisioning, with a real world example. Think of it as a whirlwind tour of Chef Provisioning and Chef Server 12.

Background

If you have used Chef for awhile, you may recall the wiki page “Bootstrap Chef RubyGems Installation” - the installation guide that uses cookbooks with chef-solo to install all the components required to run an open source Chef Server. This idea was a natural fit in the omnibus packages for Enterprise Chef (nee Private Chef) in the form of private-chef-ctl reconfigure: that command kicks off a chef-solo run that configures and starts all the Chef Server services.

It should be no surprise, that at CHEF we build Hosted Chef using Chef. Yes, it’s turtles and yo-dawg jokes all the way down. As the CHEF CTO Adam described when talking about one Chef Server codebase, we want to bring our internal deployment and development practices in line with what we’re shipping to customers, and we want to unify our approach so we can provide better support.

Chef Server 12

As announced recently, Chef Server 12 is generally available. For purposes of the example discussed below, we’ll provision three machines: one backend, one frontend (with Chef Manage and Chef Reporting), and one running Chef Analytics. While Chef Server 12 has the capability to install add-ons, we have a special cookbook with a resource to manage the installation of “Chef Server Ingredients.” This is so we can also install the chef-server-core package used by both the API frontend nodes and the backend nodes.

Chef Provisioning

Chef Provisioning is a new capability for Chef, where users can define “machines” as Chef resources in recipes, and then converge those recipes on a node. This means that new machines are created using a variety of possible providers (AWS, OpenStack, or Docker, to name a few), and they can have recipes applied from other cookbooks available on the Chef Server.

Chef Provisioning “runs” on a provisioner node. This is often a local workstation, but it could be a specially designated node in a data center or cloud provider. It is simply a recipe run by chef-client (or chef-solo). When using chef-client, any Chef Server will do, including Hosted Chef. Of course, the idea here is we don’t have a Chef Server yet. In my examples in this post, I’ll use my OS X laptop as the provisioner, and Chef Zero as the server.

Assemble the Pieces

The cookbook that does the work using Chef Provisioning is chef-server-cluster. Note that this cookbook is under active development, and the code it contains may differ from the code in this post. As such, I’ll post relevant portions to show the use of Chef Provisioning, and the supporting local setup required to make it go. Refer to the README.md in the cookbook for the most recent information on how to use it.

Amazon Web Services EC2

The first thing we need is an AWS account for the EC2 instances. Once we have that, we need an IAM user that has privileges to manage EC2, and an SSH keypair to log into the instances. It is outside the scope of this post to provide details on how to assemble those pieces. However once those are acquired, do the following:

Put the access key and secret access key configuration in ~/.aws/config. This is automatically used by chef-provisioning’s AWS provider. The SSH keys will be used in a data bag item (JSON) that is described later. You will then want to choose an AWS region to use. For sake of example, my keypair is named hc-metal-provisioner in the us-west-2 region.

Chef Provisioning needs to know about the SSH keys in three places:

  1. In the .chef/knife.rb, the private_keys and public_keys configuration settings.
  2. In the machine_options that is used to configure the (AWS) driver so it can connect to the machine instances.
  3. In a recipe.

This is described in more detail below.

Chef Repository

We use a Chef Repository to store all the pieces and parts for the Hosted Chef infrastructure. For example purposes I’ll use a brand new repository. I’ll use ChefDK’s chef generate command:

% chef generate repo sysadvent-chef-cluster

This repository will have a Policyfile.rb, a .chef/knife.rb config file, and a couple of data bags. The latest implementation specifics can be found in the chef-server-cluster cookbook’s README.md.

Chef Zero and Knife Config

As mentioned above, Chef Zero will be the Chef Server for this example, and it will run on a specific port (7799). I started it up in a separate terminal with:

% chef-zero -l debug -p 7799

The knife config file will serve two purposes. First, it will be used to load all the artifacts into Chef Zero. Second, it will provide essential configuration to use with chef-client. Let’s look at the required configuration.

This portion tells chef, knife, and chef-client to use the chef-zero instance started earlier.

chef_server_url 'http://localhost:7799'
node_name       'chef-provisioner'

In the next section, I’ll discuss the policyfile feature in more detail. These configuration settings tell chef-client to use policyfiles, and which deployment group the client should use.

use_policyfile   true
deployment_group 'sysadvent-demo-provisioner'

As mentioned above, these are the configuration options that tell Chef Provisioning where the keys are located. The key files must exist on the provisioning node somewhere.

First here’s the knife config:

private_keys     'hc-metal-provisioner' => '/tmp/ssh/id_rsa'
public_keys      'hc-metal-provisioner' => '/tmp/ssh/id_rsa.pub'

Then the recipe - this is from the current version of chef-server-cluster::setup-ssh-keys.

fog_key_pair node['chef-server-cluster']['aws']['machine_options']['bootstrap_options']['key_name'] do
  private_key_path '/tmp/ssh/id_rsa'
  public_key_path '/tmp/ssh/id_rsa.pub'
end

The attribute here is part of the driver options set using the with_machine_options method for Chef Provisioning in chef-server-cluster::setup-provisioner. For further reading about machine options, see Chef Provisioning configuration documentation. While the machine options will automatically use keys stored in ~/.chef/keys or ~/.ssh, we do this to avoid strange conflicts on local development systems used for test provisioning. An issue has been opened to revisit this.

Policyfile.rb

Beware, gentle reader! This is an experimental new feature that mayWwill change. However, I wanted to try it out, as it made sense for the workflow when I was assembling this post. Read more about Policyfiles in the ChefDK repository. In particular, read the “Motivation and FAQ” section. Also, Chef (client) 12 is required, which is included in the ChefDK package I have installed on my provisioning system.

The general idea behind Policyfiles is to assemble node’s run list as an artifact, including all the roles and recipes needed to fulfill its job in the infrastructure. Each policyfile.rb contains at least the following.

  • name: the name of the policy
  • run_list: the run list for nodes that use this policy
  • default_source: the source where cookbooks should be downloaded (e.g., Supermarket)
  • cookbook: define the cookbooks required to fulfill this policy

As an example, here is the Policyfile.rb I’m using, at the toplevel of the repository:

name            'sysadvent-demo'
run_list        'chef-server-cluster::cluster-provision'
default_source  :community
cookbook        'chef-server-ingredient', '>= 0.0.0',
                :github => 'opscode-cookbooks/chef-server-ingredient'
cookbook        'chef-server-cluster', '>= 0.0.0',
                :github => 'opscode-cookbooks/chef-server-cluster'

Once the Policyfile.rb is written, it needs to be compiled to a lock file (Policyfile.lock.json) with chef install. Installing the policy does the following.

  • Build the policy
  • “Install” the cookbooks to the cookbook store (~/.chefdk/cache/cookbooks)
  • Write the lockfile

This doesn’t put the cookbooks (or the policy) on the Chef Server. We’ll do that in the upload section with chef push.

Data Bags

At CHEF, we prefer to move configurable data and secrets to data bags. For secrets, we generally use Chef Vault, though for the purpose of this example we’re going to skip that here. The chef-server-cluster cookbook has a few data bag items that are required before we can run Chef Client.

Under data_bags, I have these directories/files.

  • secrets/hc-metal-provisioner-chef-aws-us-west-2.json: the name hc-metal-provisioner-chef-aws-us-west-2 is an attribute in the chef-server-cluster::setup-ssh-keys recipe to load the correct item; the private and public SSH keys for the AWS keypair are written out to /tmp/ssh on the provisioner node
  • secrets/private-chef-secrets-_default.json: the complete set of secrets for the Chef Server systems, written to /etc/opscode/private-chef-secrets.json
  • chef_server/topology.json: the topology and configuration of the Chef Server. Currently this doesn’t do much but will be expanded in future to inform /etc/opscode/chef-server.rb with more configuration options

See the chef-server-cluster cookbook README.md for the latest details about the data bag items required. Note At this time, chef-vault is not used for secrets, but that will change in the future.

Upload the Repository

Now that we’ve assembled all the required components to converge the provisioner node and start up the Chef Server cluster, let’s get everything loaded on the Chef Server.

Ensure the policyfile is compiled and installed, then push it as the provisioner deployment group. The group name is combined with the policy name in the config that we saw earlier in knife.rb. The chef push command uploads the cookbooks, and also creates a data bag item that stores the policyfile’s rendered JSON.

% chef install
% chef push provisioner

Next, upload the data bags.

% knife upload data_bags

We can now use knife to confirm that everything we need is on the Chef Server:

% knife data bag list
chef_server
policyfiles
secrets
% knife cookbook list
apt                      11131342171167261.63923027125258247.235168191861173
chef-server-cluster      2285060862094129.64629594500995644.198889591798187
chef-server-ingredient   37684361341419357.41541897591682737.246865540583454
chef-vault               11505292086701548.4466613666701158.13536425383812

What’s with those crazy versions? That is what the policyfile feature does. The human readable versions are no longer used, cookbook versions are locked using unique, automatically generated version strings, so based on the policy we know the precise cookbook dependency graph for any given policy. When Chef runs on the provisioner node, it will use the versions in its policy. When Chef runs on the machine instances, since they’re not using Policyfiles, it will use the latest version. In the future we’ll have policies for each of the nodes that are managed with Chef Provisioning.

Checkpoint

At this point, we have:

  • ChefDK installed on the local privisioning node (laptop) with Chef client version 12
  • AWS IAM user credentials in ~/.aws/config for managing EC2 instances
  • A running Chef Server using chef-zero on the local node
  • The chef-server-cluster cookbook and its dependencies
  • The data bag items required to use chef-server-cluster’s recipes, including the SSH keys Chef Provisioning will use to log into the EC2 instances
  • A knife.rb config file that will point chef-client at the chef-zero server, and tells it to use policyfiles

Chef Client

Finally, the moment (or several moments…) we have been waiting for! It’s time to run chef-client on the provisioning node.

% chef-client -c .chef/knife.rb

While that runs, let’s talk about what’s going on here.

Normally when chef-client runs, it reads configuration from /etc/chef/client.rb. As I mentioned, I’m using my laptop, which has its own run list and configuration, so I need to specify the knife.rb discussed earlier. This will use the chef-zero Chef Server running on port 7799, and the policyfile deployment group.

In the output, we’ll see Chef get its run list from the policy file, which looks like this:

resolving cookbooks for run list: ["chef-server-cluster::cluster-provision@0.0.7 (081e403)"]
Synchronizing Cookbooks:
  - chef-server-ingredient
  - chef-server-cluster
  - apt
  - chef-vault

The rest of the output should be familiar to Chef users, but let’s talk about some of the things Chef Provisioning is doing. First, the following resource is in the chef-server-cluster::cluster-provision recipe:

machine 'bootstrap-backend' do
  recipe 'chef-server-cluster::bootstrap'
  ohai_hints 'ec2' => '{}'
  action :converge
  converge true
end

The first system that we build in a Chef Server cluster is a backend node that “bootstraps” the data store that will be used by the other nodes. This includes the postgresql database, the RabbitMQ queues, etc. Here’s the output of Chef Provisioning creating this machine resource.

Recipe: chef-server-cluster::cluster-provision
  * machine[bootstrap-backend] action converge
    - creating machine bootstrap-backend on fog:AWS:862552916454:us-west-2
    -   key_name: "hc-metal-provisioner"
    -   image_id: "ami-b99ed989"
    -   flavor_id: "m3.medium"
    - machine bootstrap-backend created as i-14dec01b on fog:AWS:862552916454:us-west-2
    - Update tags for bootstrap-backend on fog:AWS:862552916454:us-west-2
    -   Add Name = "bootstrap-backend"
    -   Add BootstrapId = "http://localhost:7799/nodes/bootstrap-backend"
    -   Add BootstrapHost = "champagne.local"
    -   Add BootstrapUser = "jtimberman"
    - create node bootstrap-backend at http://localhost:7799
    -   add normal.tags = nil
    -   add normal.chef_provisioning = {"location"=>{"driver_url"=>"fog:AWS:XXXXXXXXXXXX:us-west-2", "driver_version"=>"0.11", "server_id"=>"i-14dec01b", "creator"=>"user/IAMUSERNAME, "allocated_at"=>1417385355, "key_name"=>"hc-metal-provisioner", "ssh_username"=>"ubuntu"}}
    -   update run_list from [] to ["recipe[chef-server-cluster::bootstrap]"]
    - waiting for bootstrap-backend (i-14dec01b on fog:AWS:XXXXXXXXXXXX:us-west-2) to be ready ...
    - bootstrap-backend is now ready
    - waiting for bootstrap-backend (i-14dec01b on fog:AWS:XXXXXXXXXXXX:us-west-2) to be connectable (transport up and running) ...
    - bootstrap-backend is now connectable
    - generate private key (2048 bits)
    - create directory /etc/chef on bootstrap-backend
    - write file /etc/chef/client.pem on bootstrap-backend
    - create client bootstrap-backend at clients
    -   add public_key = "-----BEGIN PUBLIC KEY-----\n..."
    - create directory /etc/chef/ohai/hints on bootstrap-backend
    - write file /etc/chef/ohai/hints/ec2.json on bootstrap-backend
    - write file /etc/chef/client.rb on bootstrap-backend
    - write file /tmp/chef-install.sh on bootstrap-backend
    - run 'bash -c ' bash /tmp/chef-install.sh'' on bootstrap-backend

From here, Chef Provisioning kicks off a chef-client run on the machine it just created. This install.sh script is the one that uses CHEF’s omnitruck service. It will install the current released version of Chef, which is 11.16.4 at the time of writing. Note that this is not version 12, so that’s another reason we can’t use Policyfiles on the machines. The chef-client run is started on the backend instance using the run list specified in the machine resource.

Starting Chef Client, version 11.16.4
 resolving cookbooks for run list: ["chef-server-cluster::bootstrap"]
 Synchronizing Cookbooks:
   - chef-server-cluster
   - chef-server-ingredient
   - chef-vault
   - apt

In the output, we see this recipe and resource:

Recipe: chef-server-cluster::default
  * chef_server_ingredient[chef-server-core] action reconfigure
    * execute[chef-server-core-reconfigure] action run
      - execute chef-server-ctl reconfigure

An “ingredient” is a Chef Server component, either the core package (above), or one of the Chef Server add-ons like Chef Manage or Chef Reporting. In normal installation instructions for each of the add-ons, their appropriate ctl reconfigure is run, which is all handled by the chef_server_ingredient resource. The reconfigure actually runs Chef Solo, so we’re running chef-solo in a chef-client run started inside a chef-client run.

The bootstrap-backend node generates some files that we need on other nodes. To make those available using Chef Provisioning, we use machine_file resources.

%w{ actions-source.json webui_priv.pem }.each do |analytics_file|
  machine_file "/etc/opscode-analytics/#{analytics_file}" do
    local_path "/tmp/stash/#{analytics_file}"
    machine 'bootstrap-backend'
    action :download
  end
end

machine_file '/etc/opscode/webui_pub.pem' do
  local_path '/tmp/stash/webui_pub.pem'
  machine 'bootstrap-backend'
  action :download
end

These are “stashed” on the local node - the provisioner. They’re used for Chef Manage webui, and the Chef Analytics node. When the recipe runs on the provisioner, we see this output:

  * machine_file[/etc/opscode-analytics/actions-source.json] action download
    - download file /etc/opscode-analytics/actions-source.json on bootstrap-backend to /tmp/stash/actions-source.json
  * machine_file[/etc/opscode-analytics/webui_priv.pem] action download
    - download file /etc/opscode-analytics/webui_priv.pem on bootstrap-backend to /tmp/stash/webui_priv.pem
  * machine_file[/etc/opscode/webui_pub.pem] action download
    - download file /etc/opscode/webui_pub.pem on bootstrap-backend to /tmp/stash/webui_pub.pem

They are uploaded to the frontend and analytics machines with the files resource attribute. Files are specified as a hash. The key is the target file to upload to the machine, and the value is the source file from the provisioning node.

machine 'frontend' do
  recipe 'chef-server-cluster::frontend'
  files(
        '/etc/opscode/webui_priv.pem' => '/tmp/stash/webui_priv.pem',
        '/etc/opscode/webui_pub.pem' => '/tmp/stash/webui_pub.pem'
       )
end

machine 'analytics' do
  recipe 'chef-server-cluster::analytics'
  files(
        '/etc/opscode-analytics/actions-source.json' => '/tmp/stash/actions-source.json',
        '/etc/opscode-analytics/webui_priv.pem' => '/tmp/stash/webui_priv.pem'
       )
end

Note These files are transferred using SSH, so they’re not passed around in the clear.

The provisioner will converge the frontend next, followed by the analytics node. We’ll skip the bulk of the output since we saw it earlier with the backend.

  * machine[frontend] action converge
  ... SNIP
    - upload file /tmp/stash/webui_priv.pem to /etc/opscode/webui_priv.pem on frontend
    - upload file /tmp/stash/webui_pub.pem to /etc/opscode/webui_pub.pem on frontend

Here is where the files are uploaded to the frontend, so the webui will work (it’s an API client itself, like knife, or chef-client).

When the frontend runs chef-client, not only does it install the chef-server-core and run chef-server-ctl reconfigure via the ingredient resource, it also gets the manage and reporting addons:

* chef_server_ingredient[opscode-manage] action install
  * package[opscode-manage] action install
    - install version 1.6.2-1 of package opscode-manage
* chef_server_ingredient[opscode-reporting] action install
   * package[opscode-reporting] action install
     - install version 1.2.1-1 of package opscode-reporting
Recipe: chef-server-cluster::frontend
  * chef_server_ingredient[opscode-manage] action reconfigure
    * execute[opscode-manage-reconfigure] action run
      - execute opscode-manage-ctl reconfigure
  * chef_server_ingredient[opscode-reporting] action reconfigure
    * execute[opscode-reporting-reconfigure] action run
      - execute opscode-reporting-ctl reconfigure

Similar to the frontend above, the analytics node will be created as an EC2 instance, and we’ll see the files uploaded:

    - upload file /tmp/stash/actions-source.json to /etc/opscode-analytics/actions-source.json on analytics
    - upload file /tmp/stash/webui_priv.pem to /etc/opscode-analytics/webui_priv.pem on analytics

Then, the analytics package is installed as an ingredient, and reconfigured:

* chef_server_ingredient[opscode-analytics] action install
* package[opscode-analytics] action install
  - install version 1.0.4-1 of package opscode-analytics
* chef_server_ingredient[opscode-analytics] action reconfigure
  * execute[opscode-analytics-reconfigure] action run
    - execute opscode-analytics-ctl reconfigure
...
Chef Client finished, 10/15 resources updated in 1108.3078 seconds

This will be the last thing in the chef-client run on the provisioner, so let’s take a look at what we have.

Results and Verification

We now have three nodes running as EC2 instances for the backend, frontend, and analytics systems in the Chef Server. We can view the node objects on our chef-zero server:

% knife node list
analytics
bootstrap-backend
chef-provisioner
frontend

We can use search:

% knife search node 'ec2:*' -r
3 items found

analytics:
  run_list: recipe[chef-server-cluster::analytics]

bootstrap-backend:
  run_list: recipe[chef-server-cluster::bootstrap]

frontend:
  run_list: recipe[chef-server-cluster::frontend]

% knife search node 'ec2:*' -a ipaddress
3 items found

analytics:
  ipaddress: 172.31.13.203

bootstrap-backend:
  ipaddress: 172.31.1.60

frontend:
  ipaddress: 172.31.1.120

If we navigate to the frontend IP, we can sign up using the Chef Server management console, then download a starter kit and use that to bootstrap new nodes against the freshly built Chef Server.

% unzip chef-starter.zip
Archive:  chef-starter.zip
...
  inflating: chef-repo/.chef/sysadvent-demo.pem
  inflating: chef-repo/.chef/sysadvent-demo-validator.pem
% cd chef-repo
% knife client list
sysadvent-demo-validator
% knife node create sysadvent-node1 -d
Created node[sysadvent-node1]

If we navigate to the analytics IP, we can sign in with the user we just created, and view the events from downloading the starter kit: the validator client key was regenerated, and the node was created.

Next Steps

For those following at home, this is now a fully functional Chef Server. It does have premium features (manage, reporting, analytics), but those are free up to 25 nodes. We can also destroy the cluster, using the cleanup recipe. That can be applied by disabling policyfile in .chef/knife.rb:

% grep policyfile .chef/knife.rb
# use_policyfile   true
% chef-client -c .chef/knife.rb -o chef-server-cluster::cluster-clean
Recipe: chef-server-cluster::cluster-clean
  * machine[analytics] action destroy
    - destroy machine analytics (i-5cdac453 at fog:AWS:XXXXXXXXXXXX:us-west-2)
    - delete node analytics at http://localhost:7799
    - delete client analytics at clients
  * machine[frontend] action destroy
    - destroy machine frontend (i-68dfc167 at fog:AWS:XXXXXXXXXXXX:us-west-2)
    - delete node frontend at http://localhost:7799
    - delete client frontend at clients
  * machine[bootstrap-backend] action destroy
    - destroy machine bootstrap-backend (i-14dec01b at fog:AWS:XXXXXXXXXXXXX:us-west-2)
    - delete node bootstrap-backend at http://localhost:7799
    - delete client bootstrap-backend at clients
  * directory[/tmp/ssh] action delete
    - delete existing directory /tmp/ssh
  * directory[/tmp/stash] action delete
    - delete existing directory /tmp/stash

As you can see, the Chef Provisioning capability is powerful, and gives us a lot of flexibility for running a Chef Server 12 cluster. Over time as we rebuild Hosted Chef with it, we’ll add more capability to the cookbook, including HA, scaled out frontends, and splitting up frontend services onto separate nodes.

1 comment :

Em said...

Thanks for the post! And thanks for the code on github, which helps playing around Chef Provisioning and installs Chef Server. Exited about the prospect of using it after waiting for it to mature.

One thing I had to work out (a little too hard) and was not obvious from the start was what the cluster-provision recipe was actually going to do. The fact that the use of the name "boostrap-backend" confused me a fair bit did not help: where is actually the Chef Server? Is it in the frontend too, since chef-server-core is installed on it? Why boostrap?

It might have helped me to know that the cluster-provision recipe was going to:

1) Create the backend machine, which actually hosts the deployed Chef Server
2) Download from the backend machine to the Chef Provisioning server (the Chef Zero instance on the local machine) all the files that will be later used to configure the frontend machine and the analytics machine
3) Create the frontend machine serving the web GUI to the backend, and upload the configuration files previously downloaded from the backend
4) Create the analytics machine holding the information on operations done on the backend, and upload the configuration files previously downloaded from the backend

Also, for me personally, the addition of the Policyfile.rb concept added on to the complexity and made it harder to understand how it all works. Maybe leave it to another post, together with Chef Vault?

My 2 c.

Em