December 25, 2013

Day 25 - Go at a glance

Written by: Kelsey Hightower (@kelseyhightower)
Edited by: Ben Cotton (@funnelfiasco)

2013 was a fantastic year for Go and system tools written in the language. Derek Collison, Founder & CEO of Apcera Inc, predicted in September 2012 that “Go will become the dominant language for systems work in IaaS, Orchestration, and PaaS in 24 months”. The release of the following projects, written in Go, should serve as proof that we are on track to see that happen:
  • Docker (Pack, ship and run any application as a lightweight container)
  • Packer (Automates the creation of any type of machine image)
  • Etcd (Highly-available key value store for shared configuration and service discovery)
  • SkyDNS (DNS service for service discovery)
  • Heka (Data collection and processing made easy)
  • Groupcache (Memcache replacement for caching and cache-filling)
I take this as a strong signal that the time to learn Go is now. The language is stable, has a large and active community, and Go is getting better with every release. The Go feature list is extensive, but when it comes to system administration the following items stand out for me:
  • Simplicity
  • Large Standard Library
  • Statically Compiled Binaries
  • Concurrency

Simplicity

Go only has 25 keywords. There are no classes, exceptions, or inheritance, which keeps the language small and approachable. Go doesn’t force a particular paradigm: you can get a lot done writing plain structured code and still end up with an elegant solution.
Let’s review a short example to get a feel for the language:
// main.go
package main

import (
    "log"
)

func main() {
    log.Print("I'm using Go")
}

Every Go program starts with a main package.

package main

Next we import the log package from the standard library:

import (
    "log"
)

Finally we define the main entry point of the program. In the body of main we call the Print() function from the log package, which we use to log "I'm using Go" to the system console:

func main() {
    log.Print("I'm using Go")
}

Unlike Ruby or Python, Go is a compiled language. We must compile our code before we can run it. You’ll find it refreshing to know Go makes the compilation process easy and fast. So fast that we can compile and run our code in a single step:

go run main.go
2013/12/22 19:38:40 I'm using Go

This is great during development, but when you’re ready for deployment you’ll want to create a static binary:

go build -o main main.go
./main
2013/12/22 19:38:40 I'm using Go

You can get a deeper dive into the language by taking A Tour of Go.

Large Standard Library

Go ships with a large standard library containing just about everything you need for system administration out of the box:
  • command line flag processing
  • interacting with the OS -- executing shell commands, reading and writing files, etc
  • working with various encodings including JSON, CSV, XML, and base64
Go also ships with a very performant HTTP server implementation for serving static files and building REST services.

Concurrency

Oftentimes the tools we build start out small; when it comes time to scale for performance, we often need to turn to third-party frameworks or libraries. With Go you don’t have that problem: concurrency is built in, and not only that, it’s easy to use.

Check out the excellent talk on Go concurrency patterns by Rob Pike, one of the original authors of Go.

Statically-compiled Binaries

My favorite Go feature is the static binary. Running the go build command produces a self-contained binary which includes all package dependencies and the Go runtime. This means I don’t have to install Go on every target system or worry about conflicting dependencies at deploy time. This is by far the largest time saver I’ve come to appreciate when working with Go.

There is, however, a catch. Since Go compiles to machine code, you need to build binaries for each platform you want your code to run on; Linux binaries won't work on Windows. But just like everything else in Go, the process of building static binaries for other platforms is pretty straightforward. Go ships with all the bits necessary for cross-compiling your code, but you'll have to set up the proper toolchain first. Bootstrapping the environment for cross-compiling can be painful, especially for people new to the process, but there are a few tools to help you automate the entire process:

  • goxc (build tool focused on cross-compiling and packaging)
  • gox (simple, no frills Go cross compile tool)

Once the toolchain is in place you can simply override the GOARCH and GOOS environment variables to cross-compile your code. For example, to target Windows on the 64-bit architecture, run the following command:

GOARCH="amd64" GOOS="windows" go build

For more details about building static binaries for other platforms, check out Dave Cheney’s Introduction to cross compilation with Go.


I encourage you to consider Go for your next system administration task. If you do, you’ll quickly realize the attention Go has been getting lately is more than hype. Go delivers a modern platform with classic language features that make it easy to get things done. In addition to everything you’d expect from a programming language, its collection of unique features such as static binaries, fast compilation, and built-in concurrency is sure to change the way you approach everyday problems.

December 24, 2013

Day 24 - Ars Longa, Vita Brevis

Written By: Mohit Chawla (@a1cy)
Edited By: Aleksey Tsalolikhin (@atsaloli)

A common, recurring observation about the field of operations is that it’s quite young compared to other engineering disciplines. Becoming a good operations engineer benefits from, and requires, a multi- and interdisciplinary approach: the career, the culture, the technical practices and runbooks, and the tools that we design, build and use every day all borrow ideas from other, more established and better understood industries and professions, in order to progress and improve our understanding of our own endeavours in the field.

One fundamental area of study in operations, is the human-computer interface. From the design of our systems' peripherals, to the automation, monitoring technologies we use and develop, and the postmortems we carry out of failures and disasters, a comprehensive understanding and acknowledgement of the intricacies of this interaction is central, as emphasized effectively in the talks and writings of many respectable members from the community.

Another area of research at these crossroads, though not directly related to system administration, is the digital humanities, which makes use of (but is not solely characterized by) powerful tools for information retrieval, text analysis, visualization and statistics to enhance existing understanding, and to gain new insights, in the humanities and social sciences.

To complement existing ongoing projects and efforts aiming to formalize, develop and solidify knowledge in operations, such as Ops School, SABOK, and other scattered forms of dissemination such as articles, podcasts, weeklies, books, talks and mailing lists, it could be beneficial to use some of the techniques employed by digital humanists, as an additional aid, an extra lens to view our field from.

In particular, Topic Modeling is an indispensable technique in the digital humanities.

'Topic models are a suite of algorithms that uncover the hidden thematic structure in document collections. These algorithms help us develop new ways to search, browse and summarize large archives of texts.'

The above definition is taken from the page of one of the pioneers of the field, David M. Blei, who is also one of the original developers of the Latent Dirichlet Allocation (LDA) algorithm, the dominant topic modeling algorithm used in digital humanities.

By doing a comparative analysis of corpora from other engineering disciplines against the existing literature and understanding of system administration, we can possibly develop new ways of inference.

An oversimplified version of this process could be stated in the following steps:
1) Selection and gathering of literature from different engineering and scientific disciplines.
2) Gathering literature about system administration.
3) Application of LDA to these collections, followed by comparative analysis between the disciplines.

All three tasks pose various practical problems: procurement of literature limited by economic and personal resources, non-overlapping or conflicting vocabularies and terminologies across disciplines, the relative lack of formally published literature in system administration, and the tuning of the algorithms themselves. Perhaps by the next installment of sysadvent, I'll have ironed out some of these problems. Meanwhile, if other members of the community are interested in the idea, do get in touch.

1) John Allspaw's Blog
2) Jordan Sissel's entry from SysAdvent 2011 on Learning From Other Industries
3) Slides by Lindsay Holmwood on Alert Design
4) Accessible introduction to Topic Modeling
5) David M. Blei's Topic Modeling page
6) gensim, a python library for topic modeling
7) Mark Burgess's Amazon page
8) SABOK
9) Ops School

December 23, 2013

Day 23 - The profession of operations and why it's the coolest thing ever

Written by: John Vincent (@lusis)
Edited by: Ben Cotton (@funnelfiasco)

I own a lot of t-shirts. I'm addicted to them. I have the one in this picture and it's one of my favorite ones. It's a favorite because it is the most accurate description of what a sysadmin does: everything.
This is why operations is one of the coolest careers on the planet.

The IT organization

IT organizations range anywhere from flatter-than-Kansas to complex hierarchies where each physical cable on the network has its own org chart. Regardless of how they're structured, there's usually one person who has the most comprehensive view of every moving piece and has most likely performed every single role at some point in time. That person is probably the sysadmin.

It's worth noting that every role in IT is important. This isn't an attempt to denigrate those who are in those roles. Quite the contrary. One of the best things about being a sysadmin is that you get to do them all!


One thing that has always fascinated me about systems administration and operations in general is that it is largely a self-taught field. Even today there are, according to Wikipedia, very few actual sysadmin degrees. It's largely an uncodified discipline (though initiatives like Ops School are trying to change that), and yet it is the core and foundation of a company's IT. You can take courses, get degrees and be certified on everything that a sysadmin does and yet there's no official "sysadmin" certification (that is worth anything, anyway).

There are many theories for this but the one I tend to stick with is that to do your job you simply have to understand every single aspect. In fact, it's expected of you.

The sound of one hand clapping

If a server isn't connected to a network, is it still a server? Right out of the gate, you're dealing with networking. But what good is a server if it's not serving? Let's throw some database software on there.

Now you've got performance issues. Looks like there are I/O problems. Now you're a storage admin figuring out iSCSI issues (which also happens to be networking). Now you're looking at what your database software is doing to treat your poor disks that way. You look at tablespaces and hit ratios. You're running explains and tuning bufferpool sizes. When did you become a DBA?
Almost everyone I've ever talked to has come to a specialization in IT by way of being a sysadmin (though that could be biased based on my typical peers). The rare exception is development, which has always been a more formal discipline.

Called to service

One of the things that has always attracted me to being a sysadmin outside of getting to touch pretty much every aspect of technology is that of service. You run servers. Servers "serve". You serve others.

You keep the lights on. This is a powerful ideal. Permit me a bit of hyperbole if you will: when your servers go down, your company isn't making money. When companies don't make money, people could lose jobs, or worse, everyone could lose their job.

Even if your primary industry isn't technology, having backend systems like financials offline can lead to lost revenue, delayed payroll and the like. Even if you outsource all of that and have minimal infrastructure, there's still a sysadmin somewhere in the mix keeping things running. It's systems administrators all the way down.

And let's not forget the users OUTSIDE of your company as well. Think about how people have used tools like Twitter to organize the wholesale replacement of corrupt governments. People have used Facebook to help find loved ones after natural disasters.

To the team of system administrators who kept the Neil Gaiman Yahoo! group running 11 years ago - thank you. Without that I would have never met my wife.

Even if you aren't working in operations at some major internet company, there's a person in your company who didn't have to fight technology for a day and it made them happy.

There's nothing wrong with being a sysadmin

Many people think of being a sysadmin as just a stepping stone. A way in the door. You do your time working the pager and maybe at some point you move to another team. There's nothing wrong with that.

But there's nothing wrong with being a systems administrator. Not only is it one of the most important parts of the bigger picture but quite frankly it's one of the most fun.

You should be PROUD of being in operations. Be proud of being a sysadmin. You don't need a new title.

Trust me. I get it. People can be assholes.
"Oh you're a sysadmin? We'll I'M a devops!".
Avoid those people. They're very unhappy people anyway.

Not a sysadmin?

That's okay. You're important too.

The fact is, we're all important. The people who write the software we use. The people who keep the pipes fast. The people who keep the storage going. We're all in this together and having fun. We cross-pollinate. We share. We learn. We thrive.

Sysadmins are still the best though. =)

December 22, 2013

Day 22 - Getting Started Testing Your Puppet Modules

Written By: Christopher Webber (@cwebber)

So, you read the great article by Paul Czarkowski (@pczarkowski) on December 11th, The Lazy SysAdmin's Guide to Test Driven Chef Cookbooks, but felt left out because you run Puppet... Well this article is for you.

Getting the Environment Setup

The first hurdle to get over is the version of Ruby and where it comes from. My personal preference is to use something like RVM or rbenv, but the only real thing I recommend is making sure you use the same version of Ruby you have in production. I have been bitten by things that work in 1.8.7, the version of Ruby that Puppet uses in production in my environment, but didn't work in 1.9.3, the version I was testing with on my workstation.

The Gems

You will need/want the following gems:

  • puppet (Once again, use the version you are running in production)
  • rspec-puppet
  • puppet-lint
  • puppetlabs_spec_helper

Once you have installed the gems, you should be ready to move on to the next step.

Generating A New Module

For this we are going to use the handy puppet module command to generate our starting module. We will be working with a module called cwebber-example.

$ puppet module generate cwebber-example
Notice: Generating module at /Users/cwebber/cwebber-example

Getting some testing boilerplate in place

Before we start writing tests, we need some boilerplate in place. I think the boilerplate spec_helper.rb that rspec-puppet creates is a better starting point than the generated one, so I run rm -rf spec first and let rspec-puppet recreate the files. To get the boilerplate in place run:

$ rspec-puppet-init
 + spec/
 + spec/classes/
 + spec/defines/
 + spec/functions/
 + spec/hosts/
 + spec/fixtures/
 + spec/fixtures/manifests/
 + spec/fixtures/modules/
 + spec/fixtures/modules/example/
 + spec/fixtures/manifests/site.pp
 + spec/fixtures/modules/example/manifests
 + spec/fixtures/modules/example/templates
 + spec/spec_helper.rb
 + Rakefile

Fixing the rake tasks

Finally, I make a slight change to the Rakefile so that a few additional tasks are supported and a few advanced features we will touch on a little later are enabled. Update your Rakefile to look like the following:

require 'rake'

require 'rspec/core/rake_task'
require 'puppetlabs_spec_helper/rake_tasks'

Writing the first test

Given the popularity of TDD, I am going to show the example of writing the test first. While this makes sense for this example, please don't feel bad if you find yourself writing the code first and writing tests to cover that code later.

To keep things simple, we are going to use puppet to create a file called /foo with the contents bar. While this is a little arbitrary, we can get a feel for what it might look like to test a config file. We are going to put this file resource in the init.pp for the module we created.

Creating the test involves adding the following contents to the file spec/classes/example_spec.rb.

require 'spec_helper'

describe "example" do
  it do
    should contain_file('/foo').with({
      'ensure'  => 'present',
      'content' => %r{^bar},
    })
  end
end

Running this test should fail as the resource has not been added to the class yet. To run the test, we run rake spec to kick it off.

$ rake spec
/usr/bin/ruby -S rspec spec/classes/example_spec.rb --color


  1) example should contain File[/foo] with ensure => "present" and content matching /^bar/
     Failure/Error: })
       expected that the catalogue would contain File[/foo]
     # ./spec/classes/example_spec.rb:9

Finished in 0.08232 seconds
1 example, 1 failure

Failed examples:

rspec ./spec/classes/example_spec.rb:5 # example should contain File[/foo] with ensure => "present" and content matching /^bar/
/usr/bin/ruby -S rspec spec/classes/example_spec.rb --color failed

Now that there is a failing test, time to create the puppet code to go with it.

The corresponding puppet code

The puppet code that goes with this test is pretty straightforward. In manifests/init.pp we add the following inside the example class.

file { '/foo':
  ensure  => present,
  content => 'bar',
}

Now that we have code in place, let's make sure it passes our test by running rake spec again.

$ rake spec
/usr/bin/ruby -S rspec spec/classes/example_spec.rb --color

Finished in 0.07837 seconds
1 example, 0 failures

Useful things to know about


One of the things I never cared for was the lack of information when you had passing tests. If you add --format documentation to ~/.rspec you get output that looks like this:

/usr/bin/ruby -S rspec spec/classes/example_spec.rb --color

example
  should contain File[/foo] with ensure => "present" and content matching /^bar/

Finished in 0.09379 seconds
1 example, 0 failures

rake lint

One of the additional tasks that gets added when we change the Rakefile is the lint task. This task runs puppet-lint across all of the puppet files, verifying that they pass a certain coding standard. To get more info, please visit the puppet-lint website.

.fixtures.yml

As soon as you start to get into more complex modules, you will see dependencies on other modules. The .fixtures.yml file is there to make sure the appropriate puppet modules are checked out for testing use. See the puppetlabs_spec_helper documentation for more details.
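
For illustration, a minimal .fixtures.yml for our cwebber-example module might look like the following; the stdlib dependency and its URL are assumptions for the sake of example:

```yaml
fixtures:
  repositories:
    # Pull module dependencies from their repositories for the test run.
    stdlib: "git://github.com/puppetlabs/puppetlabs-stdlib.git"
  symlinks:
    # Make the module under test available under its short name.
    example: "#{source_dir}"
```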

December 21, 2013

Day 21 - Making the web secure, one unit test at a time

Written By: Gareth Rushgrove (@garethr)
Edited By: Michael Stahnke (@stahnma)

Writing automated tests for your code is one of those things that, once you have gotten into it, you never want to see code without tests ever again. Why write pages and pages of documentation about how something should work when you can write tests to show exactly how something does work? Looking at the number and quality of testing tools and frameworks (like cucumber, rspec, Test Kitchen, Server Spec, Beaker, Casper and Jasmine to name a few) that have popped up in the last year or so I'm obviously not the only person who has a thing for testing utilities.

One of the other things I am interested in is web application security, so this post is all about using the tools and techniques from unit testing to avoid common web application security issues. I'm using Ruby in the examples but you could quickly convert these to other languages if you desire.

Any port in a storm

Let's start out with something simple. Accidentally exposing applications on TCP ports can lead to data loss or introduce a vector for attack. Maybe your main website is super secure, but you left the port for your database open to the internet. It's the server configuration equivalent of forgetting to lock the back door.

Nmap is a tool lots of people will be familiar with for scanning for open ports. As well as a command line interface, Nmap also has good library support in lots of languages, so let's try to write a simple test suite around it.

require "tempfile"
require "nmap/program"
require "nmap/xml"

describe "the website" do
  file ="nmap.xml")
  before(:all) do
    Nmap::Program.scan do |nmap|
      nmap.xml = file.path
      nmap.targets = ""

  @open_ports = []"scan.xml") do |xml|
    xml.each_host do |host|
      host.each_port do |port|
        @open_ports << port.number if port.state == :open

With the above code in place we can then write tests like:

it "should have two ports open" do
  @open_ports.should have(2).items

it "should have port 80 open" do
  @open_ports.should include(80)

it "should have port 22 closed" do
  @open_ports.should_not include(22)

We can run these manually, but also potentially as part of a continuous integration build or constantly as part of a monitoring suite.

Run the Gauntlt

We had to do quite a bit of work wrapping Nmap before we could write the tests above. Wouldn't it be nice if someone had already wrapped lots of useful security-minded tools for us? Gauntlt is pretty much just that: a security testing framework based on cucumber which currently supports curl, nmap, sslyze, sqlmap, garmr and a bunch more tools in master. Let's do something more advanced than our port scanning test above by testing a URL for a SQL injection vulnerability.

Feature: Run sqlmap against a target
  Scenario: Identify SQL injection vulnerabilities
    Given "sqlmap" is installed
    And the following profile:
      | name       | value                                      |
      | target_url | http://localhost/sql-injection?number_id=1 |
    When I launch a "sqlmap" attack with:
      python <sqlmap_path> -u <target_url> --dbms sqlite --batch -v 0 --tables
    Then the output should contain:
      sqlmap identified the following injection points
    And the output should contain:
      [2 tables]
      | numbers         |
      | sqlite_sequence |

The Gauntlt team publish lots of examples like this one alongside the source code, so getting started is easy. Gauntlt is very powerful, but as you'll see from the example above you need to know quite a bit about the underlying tools it is using. In the case above you need to know the various arguments to sqlmap and also how to interpret the output.

Enter Prodder

Prodder is a tool I put together to automate a few specific types of security testing. In many ways it's very similar to Gauntlt; it uses the cucumber testing framework and uses some of the same tools (like nmap and sslyze) under the hood. However rather than a general purpose security framework like Gauntlt, Prodder is higher level and very opinionated. Here's an example:

Feature: SSL
  In order to ensure secure connections
  I want to check the SSL configuration of my servers
    Given "" is installed
    Scenario: Check SSLv2 is disabled
      When we test using the "sslv2" protocol
      Then the exit status should be 0
      And the output should contain "SSLv2 disabled"

    Scenario: Check certificate is trusted
      When we check the certificate
      Then the output should contain "Certificate is Trusted"
      And the output should match /OK - (Common|Subject Alternative) Name Matches/
      And the output should not contain "Signature Algorithm: md5"
      And the output should not contain "Signature Algorithm: md2"
      And the output should contain "Key Size: 2048"

    Scenario: Check certificate renegotiations
      When we test certificate renegotiation
      Then the output should contain "Client-initiated Renegotiations: Rejected"
      And the output should contain "Secure Renegotiation: Supported"

    Scenario: Check SSLv3 is not using weak ciphers
      When we test using the "sslv3" protocol
      Then the output should not contain "Anon"
      And the output should not contain "96bits"
      And the output should not contain "40bits"
      And the output should not contain " 0bits"

This is a little higher level than the Gauntlt example; it's not exposing the workings of sslyze, which does the actual testing. All you need is an understanding of SSL certificates. Even if you're not an expert on SSL you can accept the aforementioned opinions of Prodder about what good looks like. Prodder currently contains steps and examples for port scanning, SSL certificates and security-minded HTTP headers. If you already have a cucumber-based test suite (including one based on Gauntlt) you can reuse the step definitions in that too.

I'm hoping to build upon Prodder, adding more types of tests and getting agreement on the included opinions from the wider systems administration community. By having a default set of shared assertions about the expected security of our systems, we can more easily move on to new projects, safe in the knowledge that a test will fail if someone messes up our once-secure configuration.

I'm convinced, what should I do next?

As well as trying out some of the above tools and techniques for yourself, I'd recommend encouraging more security conversations in your development and operations teams. Here are a few places to start:

December 20, 2013

Day 20 - Distributed configuration data with etcd

Written by: Kelsey Hightower (@kelseyhightower)
Edited by: Ben Cotton (@funnelfiasco)


I’ve been managing applications for a long time, but I’ve never stopped to ponder why most application configurations are managed via files. Just about every application deployed today requires a configuration file stored in the correct location, with the proper permissions, and valid content on every host that runs the application.

If not, things break.

Sure, configuration management tools provide everything you need to automate the process of constructing and syncing these files, but the whole process is starting to feel a bit outdated. Why are we still writing applications from scratch that rely on external tools (or even worse, people) to manage configuration files?

Think about that for a moment.

The state of application configuration seems a bit stagnant, especially when compared to the innovation happening in the world of application deployment. Thanks to virtualization we have the ability to deploy applications in minutes, and with advances in containerization, we get the same results in seconds.

However, all those application instances need to be configured. Is there a better way of doing this, or are we stuck with configuration files as the primary solution?

Introducing etcd

What is etcd? Straight from the docs:
A highly-available key value store for shared configuration and service discovery. etcd is inspired by zookeeper and doozer, with a focus on:
  • Simple: curl'able user facing API (HTTP+JSON)
  • Secure: optional SSL client cert authentication
  • Fast: benchmarked 1000s of writes/s per instance
  • Reliable: Robustly distributed using Raft
On the surface it appears that etcd could be swapped out with any key/value store, but if you did that you would be missing out on some key features such as:
  • Notification on key changes
  • TTLs on keys
But why would anyone choose to move configuration data from files to something like etcd? Well, for the same reasons DNS moved away from zone files to a distributed database: speed and portability.

Speed

When using etcd all consumers have immediate access to configuration data. etcd makes it easy for applications to watch for changes, which reduces the time between a configuration change and propagation of that change throughout the infrastructure. In contrast, syncing files around takes time and in many cases you need to know the location of the consumer before files can be pushed. This becomes a pain point when you bring autoscaling into the picture.

Portability

Using a remote database of any kind can make data more portable. This holds true for configuration data and etcd -- access to configuration data stored in etcd is the same regardless of OS, device, or application platform in use.

Hands on with etcd

Let's run through a few quick examples to get a feel for how etcd works, then we’ll move on to a real world use case.

Adding values

curl -X PUT -L -d value=""

Retrieving values

curl -L


Deleting values

curl -L -XDELETE

That’s all there is to it. No need for a database library or specialized client, we can utilize all of etcd’s features using curl.

A real world use case

To really appreciate the full power of etcd we need to look at a real world example. I’ve put together an example weather-app that caches weather data in a Redis database, which just so happens to utilize etcd for configuration.

First we need to populate etcd with the configuration data required by the weather app:

We can do this using curl:
curl -XPUT -L -d value="Portland"
curl -XPUT -L -d value="5"
curl -XPUT -L \
  -d value=""
curl -XPUT -L \
  -d value=""

Next we need to set the etcd host used by the weather app:

For this example I’m using an environment variable to bootstrap things. The preferred method would be to use a DNS service record instead, so we can avoid relying on local settings.

Now with our configuration data in place, and the etcd host set, we are ready to start the weather-app:
2013/12/18 21:17:36 weather app starting ...
2013/12/18 21:17:37 Setting current temp for Portland: 30.92
2013/12/18 21:17:42 Setting current temp for Portland: 30.92

Things seem to be working. From the output above I can tell I’m hitting the right URL to grab the current weather for the city of Portland every 5 seconds. If I check the Redis database, I see that the temperature is being cached:
redis> get Portland

Nothing too exciting there. But watch what happens if I change the value of the /weather_app/city key in etcd:
curl -XPUT -L -d value="Atlanta"

We end up with:
2013/12/18 21:17:36 weather app starting ...
2013/12/18 21:17:37 Setting current temp for Portland: 30.92
2013/12/18 21:17:42 Setting current temp for Portland: 30.92
2013/12/18 21:17:48 Setting current temp for Portland: 30.92
2013/12/18 21:17:53 Setting current temp for Atlanta: 29.84
2013/12/18 21:17:59 Setting current temp for Atlanta: 29.84

Notice how we are now tracking the current temperature for Atlanta instead of Portland. The results are cached in Redis just as expected:
redis> get Atlanta

etcd makes it really easy to update configuration, watch for changes, and apply the results at run-time. While this might seem like overkill for a single app instance, it’s incredibly useful when running large clusters or when autoscaling comes into the picture.

Everything we’ve done so far was pretty basic. We used curl to set some configuration, then had our application use those settings. But we can push this idea even further. There is no reason to limit our applications to read-only operations. We can also write configuration data to etcd directly. This unlocks a whole new world of possibilities. Web applications can expose their IP addresses and ports for use by load-balancers. Databases could expose connection details to entire clusters. This would mean making changes to existing applications, and perhaps more importantly would mean changing how we design new applications. But maybe it’s time for AppOps -- lets get out of the way and let the applications configure themselves.


Hopefully this post has highlighted how etcd can go beyond traditional configuration methods by exposing configuration data directly to applications. Does this mean we can get rid of configuration files? Nope. Today the file system provides a standard interface that works just about anywhere. However, it should be clear that files are not the only option for modern application configuration, and viable alternatives do exist.

December 19, 2013

Day 19 - Automating IAM Credentials with Ruby and Chef

Written by: Joshua Timberman (@jtimberman)
Edited by: Shaun Mouton (@sdmouton)

Chef, nee Opscode, has long used Amazon Web Services. In fact, the original iteration of "Hosted Enterprise Chef," "The Opscode Platform," was deployed entirely in EC2. In the time since, AWS has introduced many excellent features and libraries to work with them, including Identity and Access Management (IAM), and the AWS SDK. Especially relevant to our interests is the Ruby SDK, which is available as the aws-sdk RubyGem. Additionally, the operations team at Nordstrom has released a gem for managing encrypted data bags called chef-vault. In this post, I will describe how we use the AWS IAM feature, how we automate it with the aws-sdk gem, and store secrets securely using chef-vault.


First, here are a few definitions and references for readers.
  • Hosted Enterprise Chef - Enterprise Chef as a hosted service.
  • AWS IAM - management system for authentication/authorization to Amazon Web Services resources such as EC2, S3, and others.
  • AWS SDK for Ruby - RubyGem providing Ruby classes for AWS services.
  • Encrypted Data Bags - Feature of Chef Server and Enterprise Chef that allows users to encrypt data content with a shared secret.
  • Chef Vault - RubyGem to encrypt data bags using public keys of nodes on a chef server.

How We Use AWS and IAM

We have used AWS for a long time, before the IAM feature existed. Originally with The Opscode Platform, we used EC2 to run all the instances. While we have moved our production systems to a dedicated hosting environment, we do have non-production services in EC2. We also have some external monitoring systems in EC2. Hosted Enterprise Chef uses S3 to store cookbook content. Those with an account can see this with knife cookbook show COOKBOOK VERSION, and note the URL for the files. We also use S3 for storing the packages from our omnibus build tool. The omnitruck metadata API service exposes this.

All these AWS resources - EC2 instances, S3 buckets - are distributed across a few different AWS accounts. Before IAM, there was no way to have data segregation because the account credentials were shared across the entire account. For (hopefully obvious) security reasons, we need to have the customer content separate from our non-production EC2 instances. Similarly, we need to have the metadata about the omnibus packages separate from the packages themselves. In order to manage all these different accounts and their credentials which need to be automatically distributed to systems that need them, we use IAM users, encrypted data bags, and Chef.

Unfortunately, using various accounts adds complexity in managing all this, but through the tooling I'm about to describe, it is a lot easier to manage now than it was in the past. We use a fairly simple data file format of JSON data, and a Ruby script that uses the AWS SDK RubyGem. I'll describe the parts of the JSON file, and then the script.

IAM Permissions

IAM allows customers to create separate groups, which are containers of users with permissions to different AWS resources. Customers can manage these through the AWS console or through the API. The API uses JSON documents to manage the policy statements of permissions a user has to AWS resources. Here's an example:
  "Statement": [
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": [
Granted to an IAM user, this will allow that user to perform all S3 actions to the bucket an-s3-bucket and all the files it contains. Without the /*, only operations against the bucket itself would be allowed. To set read-only permissions, use only the List and Get actions:
"Action": [
Since this is JSON data, we can easily parse and manipulate this through the API. I'll cover that shortly.

See the IAM policy documentation for more information.

Chef Vault

We use data bags to store secret credentials we want to configure through Chef recipes. In order to protect these secrets further, we encrypt the data bags, using chef-vault. As I have previously written about chef-vault in general, this section will describe what we're interested in from our automation perspective.

Chef vault itself is concerned with three things:
  1. The content to encrypt.
  2. The nodes that should have access (via a search query).
  3. The administrators (human users) who should have access.
"Access" means that those entities are allowed to decrypt the encrypted content. In the case of our IAM users, this is the AWS access key ID and the AWS secret access key, which will be the content to encrypt. The nodes will come from a search query to the Chef Server, which will be added as a field in the JSON document that will be used in a later section. Finally, the administrators will simply be the list of users from the Chef Server.

Data File Format

The script reads a JSON file, described here:
  "accounts": [
  "user": "secret-files",
  "group": "secret-files",
  "policy": {
    "Statement": [
        "Action": "s3:*",
        "Effect": "Allow",
        "Resource": [
  "search_query": "role:secret-files-server"
This is an example of the JSON we use. The fields:
  • accounts: an array of AWS account names that have authentication credentials configured in ~/.aws/config - see my post about managing multiple AWS accounts
  • user: the IAM user to create.
  • group: the IAM group for the created user. We use a 1:1 user:group mapping.
  • policy: the IAM policy of permissions, with the action, the effect, and the AWS resources. See the IAM documentation for more information about this.
  • search_query: the Chef search query to perform to get the nodes that should have access to the resources. For example, this one will allow all nodes that have the Chef role secret-files-server in their expanded run list.
These JSON files can go anywhere; the script takes the file path as an argument.
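Since the script assumes these fields exist, a tiny pre-flight check can fail fast on a malformed data file. This is a hypothetical helper, not part of the original script:

```ruby
#!/usr/bin/env ruby
# Hypothetical pre-flight check: confirm a data file has every field the
# create-iam script expects before running it against real AWS accounts.
require 'json'

REQUIRED_FIELDS = %w[accounts user group policy search_query]

# Return the required fields that are absent from the parsed document.
def missing_fields(doc)
  REQUIRED_FIELDS.reject { |field| doc.key?(field) }
end

doc = JSON.parse('{
  "accounts": ["an-aws-account-name"],
  "user": "secret-files",
  "group": "secret-files",
  "policy": {"Statement": []},
  "search_query": "role:secret-files-server"
}')

missing = missing_fields(doc)
puts missing.empty? ? "OK" : "missing fields: #{missing.join(', ')}"
```

For a complete file like the example above, this prints "OK".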

Create IAM Script

Note: This script is cleaned up to save space and get to the meat of it. I'm planning to make it into a knife plugin but haven't gotten a round tuit yet.
require 'inifile'
require 'aws-sdk'
require 'json'
filename = ARGV[0]
dirname  = File.dirname(filename)
aws_data = JSON.parse(
aws_data['accounts'].each do |account|
  aws_creds = {}
  aws_access_keys = {}
  # load the aws config for the specified account
  IniFile.load("#{ENV['HOME']}/.aws/config")[account].map{|k,v| aws_creds[k.gsub(/aws_/,'')]=v}
  iam =
    :access_key_id     => aws_creds['access_key_id'],
    :secret_access_key => aws_creds['secret_access_key']
  )
  # Create the group
  group = iam.groups.create(aws_data['group'])
  # Load policy from the JSON file
  policy = AWS::IAM::Policy.from_json(aws_data['policy'].to_json)
  group.policies[aws_data['group']] = policy
  # Create the user
  user = iam.users.create(aws_data['user'])
  # Add the user to the group
  group.users.add(user)
  # Create the access keys
  access_keys = user.access_keys.create
  aws_access_keys['aws_access_key_id'] = access_keys.credentials.fetch(:access_key_id)
  aws_access_keys['aws_secret_access_key'] = access_keys.credentials.fetch(:secret_access_key)
  # Create the JSON content to encrypt w/ Chef Vault
  vault_file ="#{File.dirname(__FILE__)}/../data_bags/vault/#{account}_#{aws_data['user']}_unencrypted.json", 'w')
  vault_file.puts JSON.pretty_generate(
    {
      'id' => "#{account}_#{aws_data['user']}",
      'data' => aws_access_keys,
      'search_query' => aws_data['search_query']
    }
  )
  vault_file.close
  # This would be loaded directly with Chef Vault if this were a knife plugin...
  puts <<-eoh
knife encrypt create vault #{account}_#{aws_data['user']} \\
  --search '#{aws_data['search_query']}' \\
  --mode client \\
  --json data_bags/vault/#{account}_#{aws_data['user']}_unencrypted.json \\
  --admins "`knife user list | paste -sd ',' -`"
  eoh
end
This is invoked with:
% ./create-iam.rb ./iam-json-data/filename.json
The script iterates over each of the AWS account credentials named in the accounts field of the JSON file named, and loads the credentials from the ~/.aws/config file. Then, it uses the aws-sdk Ruby library to authenticate a connection to AWS IAM API endpoint. This instance object, iam, then uses methods to work with the API to create the group, user, policy, etc. The policy comes from the JSON document as described above. It will create user access keys, and it writes these, along with some other metadata for Chef Vault to a new JSON file that will be loaded and encrypted with the knife encrypt plugin.

As described, it will display a command to copy/paste. This is technical debt, as it was easier than directly working with the Chef Vault API at the time :).

Using Knife Encrypt

After running the script, we have an unencrypted JSON file in the Chef repository's data_bags/vault directory, named for the user created, e.g., data_bags/vault/secret-files_unencrypted.json.
  "id": "secret-files",
  "data": {
    "aws_access_key_id": "the access key generated through the AWS API",
    "aws_secret_access_key": "the secret key generated through the AWS API"
  "search_query": "roles:secret-files-server"
The knife encrypt command is from the plugin that Chef Vault provides. The create-iam.rb script prints out how to use it:
% knife encrypt create vault an-aws-account-name_secret-files \
  --search 'roles:secret-files-server' \
  --mode client \
  --json data_bags/vault/an-aws-account-name_secret-files_unencrypted.json \
  --admins "`knife user list | paste -sd ',' -`"


After running the create-iam.rb script with the example data file, and the unencrypted JSON output, we'll have the following:
  1. An IAM group in the AWS account named secret-files.
  2. An IAM user named secret-files, added to the secret-files group.
  3. Permission for the secret-files user to perform any S3 operations
    on the secret-files bucket (and files it contains).
  4. A Chef Data Bag Item named an-aws-account-name_secret-files in the vault Bag, which will have encrypted contents.
  5. All nodes matching the search roles:secret-files-server will be present as clients in the item an-aws-account-name_secret-files_keys (in the vault bag).
  6. All users who exist on the Chef Server will be admins in the an-aws-account-name_secret-files_keys item.
To view AWS access key data, use the knife decrypt command.
% knife decrypt vault secret-files data --mode client
    data: {"aws_access_key_id"=>"the key", "aws_secret_access_key"=>"the secret key"}
The way knife decrypt works is that you give it the field of encrypted data to decrypt; this is why the unencrypted JSON had a field named data: so we could use it to access any of the encrypted data we wanted. Similarly, we could pass search_query instead of data to get the search query used, in case we wanted to update the access list of nodes.

In a recipe, we use the chef-vault cookbook's chef_vault_item helper method to access the content:
require 'chef-vault'
aws = chef_vault_item('vault', 'an-aws-account_secret-files')['data']


I wrote this script to automate the creation of a few dozen IAM users across several AWS accounts. Unsurprisingly, it took longer to test the recipe code and AWS resource access across the various Chef recipes than it took to write the script and run it.

Hopefully this is useful for those who are using AWS and Chef, and were wondering how to manage IAM users. Since this is "done" I may or may not get around to releasing a knife plugin.

December 18, 2013

Day 18 - Wide Columns, Shaggy Yaks: HBase on EMR

Written by: Bridget Kromhout (@bridgetkromhout)
Edited by: Shaun Mouton (@sdmouton)

My phone rang at 4am one day last spring. When I dug it out from under my pillow, I wasn't greeted by the automated PagerDuty voice, which makes sense; I wasn't on call. It was the lead developer at 8thBridge, the social commerce startup where I do operations, and he didn't sound happy. "The HBase cluster is gone," he said. "Amazon says the instances are terminated. Can you fix it?"

Spoiler alert: the answer to that question turned out to be yes. In the process (and while stabilizing our cluster), I learned an assortment of things that weren't entirely clear to me from the AWS docs. This SysAdvent offering is not a step-by-step how-to; it's a collection of observations, anecdotes, and breadcrumbs for future googling.

  1. When Automatic Termination Protection Isn't
  2. Master and Servants
  3. A Daemon in My View
  4. Watching Your Every Move
  5. Only Back Up the Data You Want to Keep
  6. A Penny Saved is a Penny You Can Throw at Your AWS Bill
  7. Coroner Cases

1. When Automatic Termination Protection Isn't

But wait, you say! "Amazon EMR launches HBase clusters with termination protection turned on." True, but only if your HBase intent is explicit at launch time.

You'll notice that Amazon Web Services calls the Elastic MapReduce Hadoop clusters "job flows". This term reveals a not-insignificant tenet of the AWS perception of your likely workflow: you are expected to spin up a job flow, load data, crunch data, send results elsewhere in a pipeline, and terminate the job flow. There is some mention of data warehousing in the docs, but the defaults are geared towards loading in data from external to your cluster (often S3).

Since AWS expects you to be launching and terminating clusters regularly, their config examples are either in the form of "bootstrap actions" (configuration options you can only pass to a cluster at start time; they run after instance launch but before daemons start) or "job flow steps" (commands you can run against your existing cluster while it is operational). The cluster lifecycle image in the AWS documentation makes this clear.

Because we don't launch clusters with the CLI but rather via the boto python interface, we start HBase as a bootstrap action, post-launch:

BootstrapAction("Install HBase", "s3://elasticmapreduce/bootstrap-actions/setup-hbase", None)

When AWS support says that clusters running HBase are automatically termination-protected, they mean "only if you launched them with the --hbase option or its gui equivalent".

There's also overlap in their terms. The options for long-running clusters show an example of setting "Auto-terminate" to No. This is the "Keep Alive" setting (--alive with the CLI) that prevents automatic cluster termination when a job ends successfully; it's not the same as Termination Protection, which prevents automatic cluster termination due to errors (human or machine). You'll want to set both if you're using HBase.

In our case, the cluster hit a bug in the Amazon Machine Image and crashed, which then led to automatic termination. Lesson the first: you can prevent this from happening to you!
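Since we launch via boto, that prevention looks roughly like this. This is a sketch assuming 2013-era boto 2.x and its EMR API; instance counts and types are omitted for brevity:

```python
from boto.emr.connection import EmrConnection
from boto.emr.bootstrap_action import BootstrapAction

conn = EmrConnection()  # reads AWS credentials from the boto config / environment

jobflow_id = conn.run_jobflow(
    name="hbase-cluster",
    keep_alive=True,  # "Keep Alive": don't auto-terminate when a job finishes
    bootstrap_actions=[
        BootstrapAction("Install HBase",
                        "s3://elasticmapreduce/bootstrap-actions/setup-hbase",
                        None),
    ],
)

# Keep Alive alone isn't enough: also protect the cluster from
# termination due to errors (human or machine).
conn.set_termination_protection(jobflow_id, True)
```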

2. Master and Servants

Master group

For whatever size of cluster you launch, EMR will assign one node to be your Master node (the sole member of the Master group). The Master won't be running the map-reduce jobs; rather, it will be directing the work of the other nodes. You can choose a different-sized instance for your Master node, but we run the same size as we do for the others, since it actually does need plenty of memory and CPU. The Master node will also govern your HBase cluster.

This is a partial jps output on the Master node:

NameNode manages Hadoop's distributed filesystem (HDFS).

JobTracker allocates the map-reduce jobs to the slave nodes.

HRegionMaster manages HBase.

ZooKeeper is a coordination service used by HBase.

$ cat /mnt/var/lib/info/instance.json

Core group

By default, after one node is added to the Master group, EMR will assign the rest of the nodes in your cluster to what it terms the Core group; these are slave nodes in standard Hadoop terms. The Master, Core, and Task nodes (more on those in a moment) will all need to talk to one another. A lot. On ports you won't anticipate. Put them in security groups you open completely to one another (though not, obviously, the world - that's what SSH tunnels are for).

This is a partial jps output on the Core nodes:

DataNode stores HDFS data.

TaskTracker runs map-reduce jobs.

HRegionServer runs HBase by hosting regions.

$ cat /mnt/var/lib/info/instance.json
If you start with a small cluster and then grow it, be warned that the AWS replication factor defaults to 1 for a cluster with 1-3 Core nodes, 2 for 4-9 Core nodes, and 3 for 10 or more Core nodes. The stock Hadoop default is a replication factor of 3, and you probably want to set that if running HBase on EMR. The file is hdfs-site.xml and here is the syntax:
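A standard Hadoop property block does it; set the value to the replication factor you want:

```xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```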
Every now and again, a Core node will become unreachable (like any EC2 instance can). These aren't EBS-backed instances; you can't stop and re-launch them. If they cannot be rebooted, you will need to terminate them and let them be automatically replaced. So, having each of your blocks replicated to more than one instance's local disk is wise. Also, there's a cluster ID file in HDFS called /hbase/ which HBase needs in order to work. You don't want to lose the instance with the only copy of that, or you'll need to restore it from backup.

If you are co-locating map-reduce jobs on the cluster where you also run HBase, you'll notice that AWS decreases the map and reduce slots available on Core nodes when you install HBase. For this use case, a Task group is very helpful; you can allow more mappers and reducers for that group.

Also, while you can grow a Core group, you can never shrink it. (Terminating instances leads to them being marked as dead and a replacement instance spawning.) If you want some temporary extra mapping and reducing power, you don't want to add Core nodes; you want to add a Task group.

Task group

Task nodes will only run TaskTracker, not any of the data-hosting processes. So, they'll help alleviate any mapper or reducer bottlenecks, but they won't help with resource starvation on your RegionServers.

A Task group can shrink and grow as needed. Setting a bid price at Task group creation is how to make the Task group use Spot Instances instead of on-demand instances. You cannot modify the bid price after Task group creation. Review the pricing history for your desired instance type in your desired region before you choose your bid price; you also cannot change instance type after Task group creation, and you can only have one Task group per cluster. If you intend an HBase cluster to persist, I do not recommend launching its Master or Core groups as Spot Instances; no matter how high your bid price, there's always a chance someone will outbid you and your cluster will be summarily terminated.

If you'd like a new node to choose its configs based on if it's in the Task group or not, you can find that information here:
$ cat /mnt/var/lib/info/instance.json
With whatever you're using to configure new instances, you can tell the Task nodes to increase and mapred.tasktracker.reduce.tasks.maximum. These are set in mapred-site.xml, and AWS documents their recommended amounts for various instance types. Now your Task nodes won't have the job-running constraints of the Core nodes that are busy serving HBase.

3. A Daemon in My View

The EMR Way of changing config files differs from how one would approach stock Hadoop.

Use the AWS-provided bootstrap actions mentioned earlier to configure any EMR clusters:
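As a sketch with the Ruby elastic-mapreduce CLI (the bootstrap-action path is AWS's published configure-hadoop action; the dfs.replication setting is just an illustration):

```shell
elastic-mapreduce --create --alive --hbase \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "--hdfs-key-value,dfs.replication=3"
```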


But HBase-specific configs need to be changed in their own files:
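For example, with AWS's configure-hbase bootstrap action (a sketch; the site-config file on S3 is a placeholder):

```shell
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hbase \
  --args "--site-config-file,s3://your-S3-bucket/hbase-site.xml"
```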


For any configuration setting, you'll need to specify which file you expect to find it in. This will vary greatly between Hadoop versions; check the defaults in your conf directory as a starting point. You can also specify your own bootstrap actions from a file on S3, like this:
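Something along these lines, where the script path is a placeholder for your own S3 bucket:

```shell
--bootstrap-action s3://your-S3-bucket/bootstrap-actions/your-custom-action.sh
```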


Bootstrap actions are performed on all newly-instantiated instances (including if you need to replace or add a Core instance), so you will need logic in your custom bootstrap that makes specific changes for only new Task instances if desired, as mentioned above.

As mentioned above, the AWS way to make changes after your cluster is running is something called "job flow steps".

While you may find it easier to edit the config files in ~hadoop/conf/ or to have your config management replace them, if you want to capture what you did, so you can replicate it when you relaunch your cluster, framing it as bootstrap actions or job flow steps in your launch script is advisable.

Note that a config change made in a job flow step just logs in and updates your config files for you; it does not restart any relevant daemons. You'll need to determine which one(s) you need to restart, and do so with the appropriate init script.

The recommended way to restart a daemon is to use the init script to stop it (or kill it, if necessary) and then let service nanny (part of EMR's stock image) restart it.

The service nanny process is supposed to keep your cluster humming along smoothly, restarting any processes that may have died. One warning, though: if you're running your Core nodes out of memory, service nanny might also get the OOM hatchet. I ended up just dropping in a once-a-minute nanny-cam cron job so that if the Nagios process check found it wasn't running, it would get restarted:

if ( [ -f /usr/lib/nagios/plugins/check_procs ] && [[ $(/usr/lib/nagios/plugins/check_procs -w 1:1 -C service-nanny) =~ "WARNING" ]]); then
  sudo /etc/init.d/service-nanny restart >> /home/hadoop/nanny-cam.log 2>&1
  exit 0
fi
Insert standard disclaimer about how, depending on versions, your bash syntax may vary.

4. Watching Your Every Move

Ganglia allows you to see how the memory use in your cluster looks and to visualize what might be going on with cluster problems. You can install Ganglia on EMR quite easily, as long as you decide before cluster launch and set the two required bootstrap actions.

Use this bootstrap action to install Ganglia:
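Assuming AWS's published path hasn't moved, that's:

```shell
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/install-ganglia
```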


Use this bootstrap action to configure Ganglia for HBase:
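Again assuming the published path:

```shell
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hbase-for-ganglia
```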


If you're using Task nodes which come and go, while your job flow persists, your Ganglia console will be cluttered with the ghosts of Task nodes past, and you'll be muttering, "No, my cluster is not 80% down." This is easy to fix by restarting gmond and gmetad on your EMR Master (ideally via cron on a regular basis - I do so hourly):

sudo kill $(ps aux | grep gmond | awk '{print $2}') > /dev/null 2>&1
sudo -u nobody gmond
sudo kill $(ps aux | grep gmetad | awk '{print $2}') > /dev/null 2>&1
You don't need to restart gmetad; that will happen automagically. (And yes, you can get fancier with awk; going for straightforward, here.)

As for Nagios or your preferred alerting mechanism, answering on a port isn't any indication of HBase actually working. Certainly if a RegionServer process dies and doesn't restart, you'll want to correct that, but the most useful check is hbase hbck. Here's a nagios plugin to run it. This will alert in most cases that would cause HBase not to function as desired. I also run the usual NRPE checks on the Master and Core nodes, though allowing for the much higher memory use and loads that typify EMR instances. I don't actually bother monitoring the Task group nodes, as they are typically short-lived and don't run any daemons but TaskTracker. When memory frequently spikes on a Core node, that's a good sign that region hotspotting is happening. (More on that later.)

Other than HBase being functional, you might also want to keep an eye on region balance. In our long-running clusters, the HBase balancer, which is supposed to distribute regions evenly across the RegionServers, turns itself off after a RegionServer restart. I check the HBase console page with Nagios and alert if any RegionServer has fewer than 59 regions. (Configure that according to your own expected region counts.)

check_http!-H your-emr-master -p 60010 -u "/master-status" -r "numberOfOnlineRegions\=[0-5][0-9]?\," --invert-regex

We're trying to keep around 70 regions per RegionServer, and if a RegionServer restarts, it often won't serve as many regions as it previously did. You can manually run the balancer from the HBase shell. The balance_switch command returns its previous status, while the balancer command returns its own success or failure.
$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.92.0, r0ab71deb2d842ba5d49a261513d28f862ea8ce60, Fri May 17 00:16:53 UTC 2013
hbase(main):001:0> balance_switch true
0 row(s) in 0.0410 seconds
hbase(main):002:0> balancer
0 row(s) in 0.0110 seconds
Regions aren't well-balanced per-table in HBase 0.92.x, but that is reportedly improved in 0.94.x. You can manually move regions around, if you need to get a specific job such as a backup to run; you can also manually split regions. (I'll elaborate on that in a bit.) Note that the automatic balancer won't run immediately after a manual region move.

5. Only Back Up the Data You Want to Keep

The AWS backup tool for HBase uses a wrapper around Hadoop's DistCp to back your data up to S3. Here's how we use it as a job flow step against a running cluster, using the Ruby elastic-mapreduce CLI (as the new unified AWS cli doesn't yet offer EMR support):

$ elastic-mapreduce --jobflow j-your-jobflow-ID --hbase-schedule-backup --full-backup-time-interval 7 --full-backup-time-unit days --incremental-backup-time-interval 24 --incremental-backup-time-unit hours --backup-dir s3://your-S3-bucketname/backups/j-your-jobflow-ID --start-time 2013-12-11T11:00Z
Make sure that in a case of an incremental backup failure, you immediately run a full backup and then carry on with periodic incrementals after that. If you need to restore from these backups, a failed incremental will break your chain back to the most recent full backup, and the restore will fail. It's possible to get around this via manual edits to the Manifest the backup stores on S3, but you're better off avoiding that.

To identify successful backups, you'll see this line on your EMR Master in the file /mnt/var/log/hbase/hbase-hadoop-master-YOUR_EMR_MASTER.out:
13/12/11 13:34:35 INFO metrics.UpdateBackupMetrics: Changing /hbaseBackup/backupMetricsInfo  node data to: {"lastBackupFailed":false,"lastSuccessfulBackupStartTime":"2013-12-11T11:00:20.287Z","lastBackupStartTime":"2013-12-11T11:00:20.287Z","lastBackupFinishTime":"2013-12-11T13:34:35.498Z"}
Backups created with S3DistCp leave temporary files in /tmp on your HDFS; failed backups leave even larger temporary files. To be safe you need as much room to run backups as your cluster occupies (that is, don't allow HDFS to get over 50% full, or your backups will fail.) This isn't as burdensome as it sounds; given the specs of the available EMR instance options, long before you run out of disk, you'll lack enough memory for your jobs to run.

If a backup job hangs, it is likely to hang your HBase cluster. Backups can hang if RegionServers crash. If you need to kill the backup, it's running on your HBase cluster like any map-reduce job and can be killed like any job. It will look like this in your jobtracker:
S3DistCp: hdfs://your-emr-master:9000/hbase -> s3://your-S3-bucket/backups/j-your-jobflow/20131211T110020Z
HBase cluster replication is not supported on EMR images before you get to AWS's premium offerings. If you ever need to migrate your data to a new cluster, you will be wanting replication, because backing up to and then restoring from S3 is not fast (and we haven't even discussed the write lock that consistent backups would want). If you plan to keep using your old cluster until your new one is up and operational, you'll end up needing to use CopyTable or Export/Import.

I've found it's easy to run your Core instances out of memory and hang your cluster with CopyTable if you try to use it on large tables with many regions. I've gotten better results using a time-delimited Export starting from before your last backup started, and then Importing it to your new cluster. Also note that although the docs on Export don't make it explicit, it's implied in CopyTable's example that the time format desired is epoch time in milliseconds (UTC). Export also requires that. Export respects HBase's internal versioning, so it won't overwrite newer data.
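A sketch of that time-delimited Export followed by an Import (table and directory names are placeholders; the trailing arguments to Export are max versions, then start and end times in epoch milliseconds, UTC):

```shell
hbase org.apache.hadoop.hbase.mapreduce.Export 'your-table' \
  hdfs:///export/your-table 1 1386759600000 1386846000000

# then, with the client pointed at the new cluster:
hbase org.apache.hadoop.hbase.mapreduce.Import 'your-table' \
  hdfs:///export/your-table
```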

After asking AWS Support over many months, I was delighted to see that Hadoop MRv2 and HBase 0.94.x became available at the end of October. We're on the previous offering of MRv1 with HBase 0.92.0, and with retail clients we aren't going to upgrade during prime shopping season, but I look forward to January this year for reasons beyond great snowshoeing. For everything in this post, assume Hadoop 1.0.3 and HBase 0.92.0.

6. A Penny Saved Is a Penny You Can Throw at Your AWS Bill

Since we are using Python to talk to HBase, we use the lightweight Thrift interface. (For actual savings, you want Spot Instances.) Running the Thrift daemon on the EMR Master and then querying it from our applications led to your friend and mine, the OOM-killer, hitting Thrift more often than not. Running it on our Core nodes didn't work well either; they need all their spare memory for the RegionServer processes. (Conspiracy theory sidebar: Java is memory-gobbling, and Sun Microsystems (RIP) made hardware. I'm just saying.) I considered and rejected a dedicated Thrift server, since I didn't want to introduce a new single point of failure. It ended up working better installing a local Thrift daemon on select servers (via a Chef recipe applied to their roles). We also use MongoDB, and talking to local Thrift ended up working much like talking to local mongos.

There's a lot of info out there about building Thrift from source. That's entirely unnecessary, since a Thrift daemon is included with HBase. So, all you need is a JVM and HBase (not that you'll use most of it.) Install HBase from a tarball or via your preferred method. Configure hbase-site.xml so that it can find your EMR Master; this is all that file needs:
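A minimal hbase-site.xml along these lines should suffice (HBase on EMR runs ZooKeeper on the Master node; substitute your Master's hostname):

```xml
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>your-emr-master</value>
  </property>
</configuration>
```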
Start the Thrift daemon with something along these lines from your preferred startup tool (with paths changed as appropriate for your environment):

env JAVA_HOME="/usr/lib/jvm/java-6-openjdk-amd64/jre/"
exec /opt/hbase/bin/hbase thrift start >> /var/log/thrift/thrift.log 2>&1
Now you can have your application talk to localhost:9090. You'll need to open arbitrary high ports from application servers running this local Thrift daemon to your EMR Master and Core nodes both.

7. Coroner Cases

You know, the corner cases that give you a heart attack and then you wake up in the morgue, only to actually wake up and realize you've been having a nightmare where you are trampled by yellow elephants... just me, then?

HBase HMaster process won't start

You know it's going to be a fun oncall when the HBase HMaster will not start, and logs this error:

NotServingRegionException: Region is not online: -ROOT-,,0

The AWS support team for EMR is very helpful, but of course none of them were awake when this happened. Enough googling eventually led me to the exact info I needed.

In this case, Zookeeper (which you will usually treat as a black box) is holding onto an incorrect instance IP for where the -ROOT- table is being hosted. Fortunately, this is easy to correct:
$ hbase zkcli
zookeeper_cli> rmr /hbase/root-region-server
Now you can restart HMaster if service-nanny hasn't beat you to it:
$ /etc/init.d/hbase-hmaster start

Instance Controller

If the instance controller stops running on your Master node, you can see strange side effects like an inability to launch new Task nodes or an inability to reschedule the time your backups run.

It's possible that you might need to edit /usr/bin/instance-controller and increase the amount of memory allocated to it in the -Xmx directive.

Another cause for failure is if the instance controller has too many logs it hasn't yet archived to S3, or if the disk with the logs fills up.

If the instance controller dies it can then go into a tailspin with the service-nanny attempting to respawn it forever. You may need to disable service-nanny, then stop and start the instance-controller with its init script before re-enabling service-nanny.

A Word On Hotspotting

Choose your keys in this key-value game carefully. If they are sequential, you're likely to end up with hotspotting. While you can certainly turn the balancer off and manually move your regions around between RegionServers, using a hashed key will save you a lot of hassle, if it's possible. (In some cases we key off organic characteristics in our data we can neither control nor predict, so it's not possible in our most hotspotting-prone tables.)
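A hashed key can be as simple as prefixing the natural key with a few characters of its own digest; this sketch (my own illustration, not code from our systems) keeps the original key recoverable while scattering consecutive writes across regions:

```python
import hashlib

def salted_key(natural_key: str) -> str:
    """Prefix a sequential key with a short digest so consecutive
    writes land in different regions instead of one hot one."""
    prefix = hashlib.md5(natural_key.encode("utf-8")).hexdigest()[:4]
    return "%s-%s" % (prefix, natural_key)
```

The trade-off: range scans over the natural key ordering no longer work, which is exactly why this isn't possible for tables keyed off organic characteristics you need to scan.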

If you limit automatic splitting you might need to manually split a hot region before your backups will succeed. Your task logs will likely indicate which region is hot, though you can also check filesize changes in HDFS. The HBase console on your-emr-master:60010 has links to each table, and a section at the bottom of a table's page where you can do a split.

Optionally you can specify a "Region Key", but it took a bit to figure out which format it expects. (The "Region Start Key" isn't the same thing.) The format you want for a Region Key when doing a manual split is what is listed as "Name" on that HBase console page. It will have a format like this (table name and start key here are illustrative):

my-table,my-start-key,1379638609844.dace217f50fb37b69844a0df864999bc.

In this, 1379638609844 is an epoch timestamp in milliseconds and dace217f50fb37b69844a0df864999bc is the region ID.


The Apache project has a page called "the important configurations". An alternate title might be "if you don't set these, best of luck to you, because you're going to need it". Plenty of detailed diagrams out there to explain "regions", but from a resource consumption standpoint, you minimally want to know this:
  • A table starts out in one region.
  • If the region grows to the hbase.regionserver.max.filesize, the region splits in two. Now your table has two regions. Rinse, repeat.
  • Each region takes a minimum amount of memory to serve.
  • If your RegionServer processes end up burdened by serving more regions than ideal, they stand a good chance of encountering the Out of Memory killer (especially while running backups or other HBase-intensive jobs).
  • RegionServers constantly restarting makes your HBase cluster unstable.
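To see why hbase.regionserver.max.filesize matters so much, some back-of-the-envelope arithmetic (the sizes below are hypothetical, not our cluster's): a table's region count is roughly its size divided by the split threshold.

```python
import math

def estimated_regions(table_bytes, max_filesize_bytes):
    """Rough estimate of how many regions a table splits into
    at a given hbase.regionserver.max.filesize."""
    return max(1, math.ceil(table_bytes / max_filesize_bytes))

MB, GB, TB = 2**20, 2**30, 2**40
# A 1 TB table at a 256 MB split size is ~4096 regions -- far past
# the ~100-per-RegionServer guideline for a small cluster.
print(estimated_regions(1 * TB, 256 * MB))  # 4096
print(estimated_regions(1 * TB, 4 * GB))    # 256
```

Multiply the memory cost per region by those counts and the OOM behavior above stops being mysterious.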
I chased a lot of ghosts (Juliet Pause, how I wanted it to be you!) before finally increasing hbase.regionserver.max.filesize. If running 0.92.x, it's not possible to use online merges to decrease the number of regions in a table. The best way I found to shrink our largest table's number of regions was to simply CopyTable it to a new name, then cut over production to write to the new name, then Export/Import changes. (Table renaming is supported starting in 0.94.x with the snapshot facility.)

The conventional wisdom says to limit the number of regions served by any given RegionServer to around 100. In my experience, while it's possible to serve three to four times more on m1.xlarges, you're liable to OOM your RegionServer processes every few days. This isn't great for cluster stability.

Closure on the Opening Anecdote

After that 4am phone call, I did get our HBase cluster and (nearly) all its data back. It was a monumental yak-shave spanning multiple days, but in short, here's how: I started a new cluster that was a restore from the last unbroken spot in the incremental chain. On that cluster, I restored the data from any other valid incremental backups on top of the original restore. Splitlog files in the partial backups on S3 turned out to be unhelpful here; editing the Manifest so it wasn't looking for them would have saved hassle. And for the backups that failed and couldn't be used for a restore with the AWS tools, I asked one of our developers with deep Java skills to parse out the desired data from each partial backup's files on S3 and write it to TSV for me. I then used Import to read those rows into any tables missing them. And we were back in business, map-reducing for fun and profit!

Now that you know everything I didn't, you can prevent a cascading failure of flaky RegionServers leading to failed backups on a cluster unprotected against termination. If you run HBase on EMR after reading all this, you may have new and exciting things go pear-shaped. Please blog about them, ideally somewhere that they will show up in my search results. :)

Lessons learned? Launch your HBase clusters with both Keep Alive and Termination Protection. Config early and often. Use Spot Instances (but only for Task nodes). Monitor and alert for great justice. Make sure your HBase backups are succeeding. If they aren't, take a close look at your RegionServers. Don't allow too much splitting.

And most important of all, have a happy and healthy holiday season while you're mapping, reducing, and enjoying wide columns in your data store!

December 17, 2013

Day 17 - Stupid SSH tricks

Written by: Corey Quinn (@kb1jwq)
Edited by: Ben Cotton (@funnelfiasco)

Every year or two, I like to look back over my SSH client configuration file and assess what I've changed.

This year's emphasis has been on a few options that center around session persistence. I’ve been spending a lot of time on the road this year, using SSH to log into remote servers over terrible hotel wireless networks. As a result, I’ve found myself plagued by SSH session resets. This can be somewhat distracting when I’m in the midst of a task that requires deep concentration— or in the middle of editing a configuration file without the use of screen or tmux.

ServerAliveInterval 60
This triggers a message from the client to the server every sixty seconds requesting a response, in the event that data haven’t been received from the server in that time. This message is sent via SSH’s encrypted channel.

ServerAliveCountMax 10
This sets the number of server alive messages that may be sent without receiving a response before the client gives up. Combined with ServerAliveInterval, this means that the route to the server can vanish for 11 minutes before the client will forcibly disconnect. Note that in many environments, the system’s TCP timeout will be reached before this.

TCPKeepAlive no
Counterintuitively, setting this results in fewer disconnections from your host, as transient TCP problems can self-repair in ways that fly below SSH's radar. You may not want to apply this to scripts that work via SSH, as "parts of the SSH tunnel going non-responsive" may work in ways you neither want nor expect!

ControlMaster auto
ControlPath ~/.ssh/%r@%h:%p
ControlPersist 4h
These three are a bit interesting. ControlMaster auto permits multiple SSH sessions to opportunistically reuse an existing connection, the socket for which lives at ControlPath (in this case, a socket file at ~/.ssh/$REMOTE_LOGIN_USERNAME@$HOST:$SSH_PORT). Should that socket not exist, it will be created, and thanks to ControlPersist, it will continue to exist for four hours. Taken as a whole, this has the effect of letting subsequent SSH connections (including scp, sftp, and rsync when using SSH as its transport) skip session establishment entirely.

As a quick test, my initial connection with these settings takes a bit over 2 seconds to complete. Subsequent connections to that same host complete in 0.3 seconds -- almost an order of magnitude faster. This is particularly useful when using a configuration management tool that establishes repeated SSH connections to the same host, such as Ansible or salt-ssh. It’s worth mentioning that ControlMaster was introduced in OpenSSH 4.0, whereas ControlPersist didn’t arrive until OpenSSH 5.6.
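Rolled together, the options above make a compact ~/.ssh/config block (applied to every host here; scope the Host pattern down as you see fit):

```
# ~/.ssh/config: persistence and connection reuse
Host *
    ServerAliveInterval 60
    ServerAliveCountMax 10
    TCPKeepAlive no
    ControlMaster auto
    ControlPath ~/.ssh/%r@%h:%p
    ControlPersist 4h
```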

The last trick is a bit off the topic of SSH, as it’s not (strictly speaking) SSH based. Mosh (from “mobile shell”) is a project that uses SSH for its initial authentication, but then switches to a UDP-based transport. It offers intelligent local echoing over latent links (text that the server hasn’t acknowledged shows up as underlined locally), and persists through connection changes. Effectively, I can start up a mosh session, close my laptop, and go to another location. When I connect to a new wireless network, the session resumes seamlessly. This has the effect of making latent links far more comfortable to work with; I’m typing this post in vim on a server that’s currently 6000 miles and 150ms away from my laptop, for instance.

As an added benefit, mosh prioritizes Ctrl-C; if you’ve ever accidentally catted a 3GB log file, you’ll appreciate this little nicety! Ctrl-C stops the flood virtually instantly.

I will say that mosh is relatively new, and implements a different cryptography scheme than SSH does. As a result, you may not be comfortable running this across the open internet. Personally, I run it over OpenVPN only; while I have no reason to doubt its cryptography implementation, I tend to lean more toward a paranoid stance when it comes to new cryptographic systems.

Hopefully this has been enlightening; SSH has a lot of obscure options that allow for somewhat nifty behavior in the face of network issues, and mosh is a bit of a game-changer in this space as well.

December 16, 2013

Day 16 - omnibus'ing your way to happiness

Written by: John Vincent (@lusis)
Edited by: Ben Cotton (@funnelfiasco)

We've all been there.

You find this awesome Python library or some cool new utility that you want to run.

You check your distribution's package repository. It's not there, or worse, it's an ancient version.

You check EPEL or a PPA for it. Again, it's either not available or it's an ancient version. Oh, and it comes broken out into 20 sub-packages, so you can't just grab one package and install it. And you have to add a third-party repository which may pull in other stuff you don't want.

You try to compile it yourself and realize that your version of OpenSSL isn't supported or you need a newer version of Python.

You Google furiously to see if someone has posted a spec file or some blog entry on building it. None of it works. Four hours later you still don't have it installed. You look all around and see nothing but yak hair on the ground. Oh and there's a yak with a full coat grinning at you.

My friend, have I got a deal for you. Omnibus!

What is omnibus?

Omnibus is a toolchain of sorts. It was created by Chef (formerly Opscode) as a way to provide a full install of everything needed to run their products in a single native system package. While you may not have a need to build Chef, Omnibus is flexible enough to build pretty much anything you can throw at it.

Omnibus can be confusing to get started with so consider this your guidebook to getting started and hopefully having more time to do the fun things in your job. Omnibus works by leveraging two tools you may already be using - Vagrant and FPM. While it's written in Ruby, you don't have to have Ruby installed (unless you're creating your own software to bundle) and you don't even have to have FPM installed. All you need is Vagrant and two vagrant plugins.

The omnibus workflow

The general flow of an omnibus build is as follows:
  • Check out an omnibus project (or create your own)
  • Run vagrant up or vagrant up <basebox name>
  • Go get coffee
  • Come back to a shiny new native system package of whatever it was omnibus was building
Under the covers while you're drinking your coffee, omnibus is going through a lot of machinations to get you that package (all inside the vagrant vm):
  • Installing Chef
  • Checking out a few Chef cookbooks to run for prepping the system as a build host
  • Building Ruby with chef-solo and the rbenv cookbook
  • Installing omnibus and fpm via bundler
  • Using the newly built Ruby to run the omnibus tool itself against your project.
From here Omnibus is compiling everything above libc on the distro it's running under from source and installing it into /opt/<project-name>/. This includes basics such as OpenSSL, zlib, libxml2, pcre and pretty much everything else you might need for your final package. Every step of the build process sets your LDFLAGS, CFLAGS and everything else to point to the project directory in /opt.

After everything is compiled and it thinks it's done, it runs a sanity check (by running ldd against everything in that directory) to ensure that nothing has linked against anything on the system itself.
If that check passes, it calls FPM to package up that directory under /opt into the actual package and drops it off in a subdirectory of your vagrant shared folder.

The added benefit here is that the contents of the libraries included in the system package are almost exactly the same across every distro omnibus builds your project against. Gone are the days of having to worry about what version of Python is installed on your distro. It will be the same one everywhere.

A sample project

For the purposes of this post, we're going to work with a use case that I would consider fairly common - a Python application.

You can check out the repo used for this here:

As I mentioned previously, the only thing you need installed is Vagrant and the following two plugins:
  • vagrant-omnibus
  • vagrant-berkshelf
If you have that done, you can simply change into the project directory and do a vagrant up. I would advise against that, however, as this project is configured to build packages for Ubuntu-10.04 through Ubuntu-12.04 as well as CentOS 5 and CentOS 6. Instead, I would run it against just a specific distribution, such as with vagrant up ubuntu-12.04.

Note that on my desktop (quad-core i7, 32GB of memory) this build took 13 minutes.

While it's building, you should also check out the README in the project.

Let's also look at a few key files here.

The project configuration

The project definition file is the container that describes the software you're building. The project file is always located in the config/projects directory and is always named the same as the package you're building. This is important, as Omnibus is pretty strict about the names you use aligning across the board.

Let's look at the definition for this project in config/projects/sample-python-app.rb. Here is an annotated version of that file:
annotated project def
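As a rough sketch of the shape of the Omnibus project DSL (the maintainer, homepage, version, and dependency list below are illustrative, not necessarily the sample project's actual values):

```ruby
# config/projects/sample-python-app.rb (sketch)
name            "sample-python-app"
maintainer      "You <you@example.com>"
homepage        "https://example.com"

# Everything gets compiled into and installed under this path.
install_path    "/opt/sample-python-app"
build_version   "1.0.0"
build_iteration 1

# Each dependency maps to a software definition, resolved from
# config/software/ or the upstream omnibus-software repository.
dependency "python"
dependency "pip"
dependency "virtualenv"
dependency "pyopenssl"
```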

The things you will largely be concerned with are the block of dependencies. Each of these corresponds to a file in one of two places:
  • a ruby file in <project root>/config/software
  • a file in the official omnibus-software repository on GitHub
This dependency resolution issue is important and we'll address it below.

A software definition

The software definition has a structure similar to the project definition. Not every dependency you have listed needs to live in your repository but if it is not there, it will be resolved from the master branch of the opscode repository as mentioned.

This can obviously affect the determinism of your build. It's a best practice to copy any dependencies explicitly into your repository to ensure that Chef doesn't introduce a breaking change upstream. This is as simple as copying the appropriate file from the official repository into your software directory.

We're going to gloss over that for this article since we're focusing on writing our own software definitions. If you look at the dependencies we've defined, you'll see a few towards the end that are "custom". Let's focus on pyopenssl first as that's one that is always a pain from distro to distro:

annotated software def
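A heavily simplified sketch of what a software definition like this looks like (the version and exact pip invocation are illustrative; the real file in the repo may differ):

```ruby
# config/software/pyopenssl.rb (sketch)
name "pyopenssl"
version "0.13"

dependency "python"
dependency "pip"

build do
  # Use the pip that Omnibus built into the package so pyopenssl
  # compiles and links against the bundled OpenSSL, not the distro's.
  command "#{install_dir}/embedded/bin/pip install" \
          " --install-option=\"--install-scripts=#{install_dir}/bin\"" \
          " pyopenssl==0.13"
end
```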

The reason I chose pyopenssl was not only that it's a common need, but also its sensitivity to the version of OpenSSL it builds against.

This shows the real value of Omnibus. You are not forced to only use a specific version of pyopenssl that matches your distro's OpenSSL library. You can use the same version on ALL distros because they all link against the same version of OpenSSL which Omnibus has kindly built for you.
This also shows how you can take something that has an external library dependency and ensure that it links against the version in your package.

Let's look at another software definition - config/software/virtualenv.rb
Note that this is nothing more than a standard pip install with some custom options passed. We're ensuring we're calling the right version of pip by using the one in our package directory - install_dir matches up with the install_path value in config/projects/sample-python-app.rb which is /opt/sample-python-app.

Some other things to note:
  • The embedded directory: This is a "standard" best practice in omnibus. The idea is that all the bits of your package are installed into /opt/package-name/embedded. This directory contains the normal [bin|sbin|lib|include] directory structure you're familiar with. The intent is to signal to end-users that the stuff under embedded is internal and not something you should ever need to touch.
  • Passing --install-option="--install-scripts=#{install_dir}/bin" to the python package: This ensures that pip will install the library binaries into /opt/package-name/bin. This is the public-facing side of your package, if you will. The bits/binaries that users actually need to call should either be installed in a top-level hierarchy under /opt/package-name or symlinked from a file in the embedded directory to the top-level directory. You'll see a bit of this in the post-install file your package will call below.


The final directory we want to look at is <project-root>/package-scripts/sample-python-app. This contains files that are passed to fpm as postinstall and postremove scripts for the package manager.
annotated postinstall

The biggest thing to note here is the chown. This is the thing that bites folks the most. Since fpm simply creates tar files of directories, those files will always be owned by the user that ran fpm. With Omnibus, that's the vagrant user. What ends up happening is that your package will install, but the files will all be owned by whatever uid/gid matches the one used to package the files. This isn't what you want. In this case we simply do a quick chown post-install to fix that.

As I mentioned above, we're also symlinking some of the embedded files into the top-level to signal to the user they're intended for use.
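A minimal postinstall along those lines might look like the following sketch (paths and the symlinked binary are illustrative; check the actual script in the repo's package-scripts directory):

```sh
#!/bin/sh
# Postinstall sketch for the sample project.
INSTALL_DIR=/opt/sample-python-app

# Fix ownership: fpm tarred these files up as the vagrant build user.
chown -R root:root "$INSTALL_DIR"

# Expose selected embedded binaries at the package's top level.
mkdir -p "$INSTALL_DIR/bin"
ln -sf "$INSTALL_DIR/embedded/bin/virtualenv" "$INSTALL_DIR/bin/virtualenv"

exit 0
```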

Installing the final artifact

At this point your build should be done and you should now have a pkg directory under your project root. Note that since this is using vagrant shared folders, you still haven't ever logged on to the actual system where the package is built. You don't even need to scp the file over.

If you want to "test" your package you can do the following:
  • vagrant destroy -f ubuntu-12.04
  • vagrant up ubuntu-12.04 --no-provision
  • vagrant ssh ubuntu-12.04 -c 'PATH=/opt/sample-python-app/bin:\$PATH virtualenv --version'

Next steps

Obviously there's not always going to be an omnibus project waiting for everything you might need. The whole point of this is to create system packages of things that you need - maybe even your company's own application codebase.

If you want to make your own omnibus project, you'll need Ruby and the omnibus gem installed. This is only to create the skeleton project. Once you have those installed somewhere/anywhere, just run:
omnibus project <project-name>

This will create a skeleton project called omnibus-<project-name>. As I mentioned earlier, naming is important. The name you pass to the project command will define the package name and all values in the config/projects/ directory.

You'll likely want to customize the Vagrantfile a bit and you'll probably need to write some of your own software definitions. You'll have a few examples created for you but they're only marginally helpful. Your best bet is to remove them and learn from examples online in other repos. Also don't forget that there's a plethora of predefined software in the Opscode omnibus-software repository.
Using the above walk-through, you should be able to easily navigate any omnibus project you come across and leverage it to help you write your own project.

If I could give one word of advice here about the generated skeleton - ignore the generated README file. It's confusing and doesn't provide much information about how you should use omnibus. The vagrant process I've described above is the way to go. This is mentioned in the README but it's not the focus of it.

Going beyond vagrant

Obviously, this workflow is great for your internal development flow. There might come a time when you want to make this part of the official release process as part of a build step in something like Jenkins. You can do that as well but you'll probably not be using vagrant up for that.
You have a few options:
  • Duplicate the steps performed by the Vagrantfile on your build slaves:
  • install chef from opscode packages
  • use the omnibus cookbook to prep the system
  • run bundle install
  • run omnibus build
Obviously you can also prep some of this up front on your build slaves using prebuilt images.
  • Use omnibus-omnibus-omnibus. This will build you a system package with steps 1-3 above taken care of. You can then just clone your project and run /opt/omnibus/bin/omnibus build against it.

Getting help

There's not an official community around omnibus as of yet. The best place to get help right now is twitter (feel free to /cc @lusis), the #chef channel on Freenode IRC and the chef-users mailing list.
Here are a few links and sample projects that can hopefully also help: