December 24, 2019

Day 24 - Expanding on Infrastructure as Code

By: Wyatt Walter (@wyattwalter)
Edited by: Joshua Smith (@jcsmith)

Introduction

As operators thinking about infrastructure as code, we often think of infrastructure as just the stuff that runs inside our data centers or cloud providers. I recently worked on a project that expanded my view of what I consider “infrastructure” and of what was within reach of being managed the same way I manage cloud resources. In this post I want to inspire you to expand your view of what infrastructure might be for your organization, and give an example using Terraform to make that more concrete.

First, the example I’ll use is a workflow for managing GitHub repositories at an organization. There are tons of other services Terraform can manage (“providers” in Terraform terms), but this example is a service that is free to recreate if you want to experiment. Then, we’ll dig into why you’d even want to go through the trouble of setting something like this up. Lastly, I’ll leave you with some inspiration on other services or ideas on where this can be applied.

The example and source code are very contrived, but available here (link: https://github.com/sysadventco-2019/sysadventco-terraform).

An example using GitHub

At SysAdventCo, developers use GitHub as a tool for source code management. The GitHub organization for the company is managed by a central IT team. While the IT team did grant a few individuals throughout the company permission to create repositories or teams, some actions were only accessible to administrators. So, even though teams could modify or create some settings, the IT team was often a bottleneck because many individuals needed to see or modify settings that they could not reach on their own.

So, the IT team imported the configuration for their organization into Terraform and allowed anyone in the organization to view it and submit pull requests to make changes. Their role has shifted from taking in tickets to modify settings (which often involved multiple rounds of back-and-forth to ensure correctness) and manually making changes, to simply approving pull requests. In each pull request, they can see exactly what is being asked for, and CI systems validate and report the exact impact the change would have.

A stripped down version of the configuration looks something like this:

# We define a couple of variables we can pass via environment variables.
variable "github_token" {
  type = string
}

variable "github_organization" {
  type = string
}

# Include the GitHub provider, set some basics
# for the example, set these with environment variables:
# TF_VAR_github_token=asdf TF_VAR_github_organization=sysadventco terraform plan
provider "github" {
  token        = var.github_token
  organization = var.github_organization
}

# This one is a bit meta: the definition for this repository
resource "github_repository" "sysadventco-terraform" {
  name               = "sysadventco-terraform"
  description        = "example Terraform source for managing the example-service repository"
  homepage_url       = "https://sysadvent.blogspot.com"
  gitignore_template = "Terraform"
}

SysAdventCo operates a number of services. The one we'll focus on is example-service. It's a Rails application, and has its own entry in the configuration:

resource "github_repository" "example-service" {
  name               = "example-service"
  description        = "the source code for example-service"
  homepage_url       = "https://sysadvent.blogspot.com/"
  gitignore_template = "Rails"
}
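
Other repository settings can live in the same configuration. As a sketch, branch protection could be added alongside the repository definition (the `github_branch_protection` resource and these attribute names come from the GitHub provider of that era; the `ci/tests` status check context is a made-up example, so check the provider docs for your version):

```hcl
# Hypothetical: protect master on example-service and require CI to pass
resource "github_branch_protection" "example-service-master" {
  repository = github_repository.example-service.name
  branch     = "master"

  required_status_checks {
    strict   = true
    contexts = ["ci/tests"]
  }
}
```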

The team that builds and operates example-service wants to integrate a new tool into their testing processes that requires an additional webhook. In some organizations, a member of the team may have access to edit that directly. In others, they may have to find a GitHub administrator and ask for help. In either case, only those who have access to change the settings can even see how the webhooks are configured. Luckily, things work a bit differently at SysAdventCo.

The developer working on example-service already has access to see what webhooks are configured for this repository. She is ready to start testing the new service, so she submits a small PR (link: https://github.com/sysadventco-2019/sysadventco-terraform/pull/2):

+
+resource "github_repository_webhook" "example-service-new-hook" {
+  repository = github_repository.example-service.name
+
+  configuration {
+    url          = "https://web.hook.com/"
+    content_type = "form"
+    insecure_ssl = false
+  }
+
+  active = false
+
+  events = ["issues"]
+}

The system then automatically creates a comment showing exactly what actions Terraform would take if the change were approved, so a member of the IT team can review it and collaborate with the developer requesting the change.

No one is stuck filling out a form or ticket, trying to explain in words what is needed so that someone else can translate those words into manual actions. The developer simply updates the configuration herself; it is automatically validated, and a comment is added with the exact details of how the repository would change as a result of the request. Once the pull request is approved and merged, it is automatically applied.

This seems like a lot of work — why bother?

What an astute observation, dear reader! Yes, there is a good deal of setup involved once you get past this simple example. And yes, managing more things automatically can often be more work. In addition, if your organization already exists but doesn’t use a method like this, you probably have a good deal of configuration to import into the tool of your choice. Still, I’d argue there are a number of reasons to consider using a tool like this to manage resources that aren’t strictly servers, firewall rules, and so on.

First, and the reason I reached for it in the first place, is that we can track changes the same way we do for other things in the delivery pipeline while also ensuring consistency. On my project, importing the configuration of a PagerDuty account into Terraform let me see inconsistencies in the manually configured service. The tool added value, but a huge part of that value was the simple act of doing the import and having a tool that enforced consistency. I caught a number of things that, under the right conditions, could have misrouted alerts — before they became issues.
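
Importing follows the same pattern regardless of provider: write a resource block describing the existing object, then run `terraform import` to attach the real object to it. A sketch with the PagerDuty provider (the resource type `pagerduty_service` is real, but the names and the escalation policy reference here are hypothetical):

```hcl
# Hypothetical: an existing, manually created PagerDuty service
resource "pagerduty_service" "example-service" {
  name              = "example-service"
  escalation_policy = pagerduty_escalation_policy.default.id
}

# Then, on the command line, bring the real service under management:
#   terraform import pagerduty_service.example-service <SERVICE_ID>
```

After the import, `terraform plan` shows any drift between the hand-built service and the configuration, which is where the inconsistencies surface.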

The next, and to me most compelling, reasons are freeing up administrative time and giving teams the ability to effect changes directly without creating a free-for-all. You can restrict administrative access to a very small number of people (or just a bot) without creating a huge bottleneck. It also allows anyone without elevated privileges to confirm settings without having to ask someone else. I’d also argue this makes an excellent basis for a change control process in organizations that require one.

A further advantage is that, since none of these tools exist in isolation, using a method like this can give you an opportunity to reference configuration dynamically. This allows your team to spin up full environments to test configuration end-to-end.
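
As a sketch of what referencing configuration across tools can look like (the PagerDuty integration resource and the Events API URL shape below are assumptions worth verifying against the provider docs, and the names are made up):

```hcl
# Hypothetical: feed a PagerDuty integration's key into a GitHub webhook,
# so the alert route and the hook that feeds it are defined together and
# can never drift apart.
resource "github_repository_webhook" "example-service-alerts" {
  repository = github_repository.example-service.name

  configuration {
    url          = "https://events.pagerduty.com/integration/${pagerduty_service_integration.example.integration_key}/enqueue"
    content_type = "json"
    insecure_ssl = false
  }

  events = ["issues"]
}
```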

But wait, there’s more!

Within the Terraform ecosystem, there’s an entire world of providers just waiting for you to explore! Imagine using the same tools you use to manage AWS or GCP resources to manage the other important services your team relies on:

  • Manage your on-call rotations, escalation paths, routing decisions, and more with the PagerDuty provider
  • Manage the application list and alerts in NewRelic
  • Add external HTTP monitoring using tools like StatusCake or Pingdom
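
For a taste of the first item, an on-call rotation and escalation path with the PagerDuty provider might look like this sketch (the resource types are real, but the user, timings, and names are invented for illustration):

```hcl
# Hypothetical: a weekly primary on-call rotation for example-service
resource "pagerduty_schedule" "primary" {
  name      = "example-service primary on-call"
  time_zone = "America/Chicago"

  layer {
    name                         = "weekly rotation"
    start                        = "2019-12-01T00:00:00-06:00"
    rotation_virtual_start       = "2019-12-01T00:00:00-06:00"
    rotation_turn_length_seconds = 604800  # one week
    users                        = [pagerduty_user.alice.id]
  }
}

# Hypothetical: page the schedule above, escalating after 15 minutes
resource "pagerduty_escalation_policy" "example-service" {
  name = "example-service escalation"

  rule {
    escalation_delay_in_minutes = 15
    target {
      type = "schedule_reference"
      id   = pagerduty_schedule.primary.id
    }
  }
}
```

The same pull-request workflow from the GitHub example then applies: anyone can see who is on call and propose a rotation change, and an administrator (or bot) only has to approve it.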
