December 25, 2017

Day 25 - How to choose a data store for the new shiny thing

By: Silvia Botros (@dbsmasher)

Edited By: Kirstin Slevin (@andersonkirstin)

Databases can be hard. You know what’s harder? Choosing one in the first place. This is challenging whether you are in a new company that is still finding its product/market fit or in a company that has found its audience and is simply expanding its product offering. When building a new thing, one of the very first parts of the design process is deciding which data store to use, and whether that should be one store or several. Should we use a relational store, or do we need to pick a key-value store? What about time-series options? Should we also sprinkle in some distributed log replay? So. Many. Options…

I will try in this article to describe a process that will hopefully guide that decision and, where applicable, explain how the size and maturity of your organization can impact this decision.

Baseline requirements

Data is the lifeblood of any product. Even if we’re planning to use bleeding-edge technology to store the application state (because MySQL or Postgres aren’t “cool” anymore), whatever we choose is still a data store and hence requires the same rigor when making our selection. The important thing to remember is that nothing is free. All data stores come with compromises, and if you are not explicit about which compromises you are taking as a business risk, you will be taking unknown risk that will show itself at the worst possible time.

Your product manager is unlikely to know or even need to care what you use for your data store but they will drive the needs that shrink the option list. Sometimes even that needs some nudging by the development team, though. Here is a list of things you need to ask the product team to help drive your options:

  • Growth rate - How is the data itself or the access to it expected to change over time?
  • How will the billing team use this new data?
  • How will the ETL team use this data?
  • What accuracy/consistency requirements are expected for this new feature?
    • What time span for that consistency is acceptable? Is post-processing correction acceptable?

Find the context that is not said

The choice of the data store is not reserved for the DBA, the Ops team, or even just the engineer writing the code. For an already mature organization with a known addressable market, the requirements that feed this decision need to come from across the organization. If the requirements from the product team fit a dozen data stores, how do you determine the requirements that were not explicitly called out? You need to surface unspoken requirements as soon as possible, because leaving them implicit is the road to failed expectations down the line. A lot of implied requirements can land you in this ‘too many choices’ trap. This includes but is not limited to:

  • Incomplete feature lists
  • Performance requirements that are not explicitly listed
  • Consistency needs that are assumed
  • Growth rate that is not specified
  • Billing or ETL query needs that aren’t yet available/known

Any of these can leave an engineering team spinning its wheels for too long, vetting a long list of data store choices simply because the explicit criteria they are working with are too permissive or incomplete.

For more ‘greenfield’ products, as I mentioned before, your goal is flexibility. A more general-purpose, known-quantity data store will help you get closer to a deliverable, with the knowledge that down the line you may need to move to a data store that is more amenable to your new scale.

Make your list

It is time to filter potential solutions by your list of requirements. The resulting list should be no more than a handful of possible data stores. If the list of potential databases is longer than that, then your requirements are too permissive and you need to go back and gather more information.

For younger, less mature companies, data store requirements are where the most unknowns live. You are possibly building a new thing that no one offers just yet, so things like total addressable market size and growth rate may be relatively unknown and hard to quantify. In this case, you need to avoid constraining yourself too early in the lifetime of your new company by using a one-trick-pony data store. Yes, at some point your data will grow in new and unexpected ways, but what you need right now is flexibility as you try to find your market niche and learn what the growth of your data will look like and which specific scalability features will become crucial to that growth.

If you are a larger company with a growing number of paying customers, your task here is to shrink the option list, preferably to data stores you already have and maintain. When you already have a lot of paying customers, the risk of adding a new data store that your team is not familiar with becomes higher and, depending on the context of the data, simply unacceptable. Another thing to keep in mind is what tooling already exists for your current data stores and what adopting a new one would mean in terms of up-front work for your team: configuration management, backup scripts, data recovery scripts, new monitoring checks, new dashboards to build and get familiar with. The operational cost of a new data store, regardless of risk, is not trivial.

Choose your poison

So here is a badly kept secret that DBAs hold on to: databases are all terrible at something. There is even a whole theorem about that. Not just databases in the traditional sense, but any tech that stores state will be horrible in a way unique to how you use it. That is a fact of life you had better internalize now. No, I am not saying you should avoid using any of these technologies; I am saying keep your expectations sane and know that YOU, and only you and your team, ultimately own delivering on the promises you make.

What does this mean in non-abstract terms? Once you have a solid idea which data stores are going to be part of what you are building, start by learning the weaknesses of those data stores. The questions to ask include but are not limited to:

  • Does this datastore work well under scan queries?
  • Does this datastore rely on a gossip protocol for data replication? If so, how does it handle network partitions? How much data is involved in that gossip?
  • Does this datastore have a single point of failure?
  • How mature are the drivers in the community to talk to it or do you need to roll your own?
  • This list can be huge

Thinking through the weaknesses of the potential solutions still on your list should knock more options off the list. This is now reality meeting the lofty promises of tech.

Spreadsheet and Bake off!

Once your list of choices is down to a small handful, it is time to put them all in a spreadsheet and start digging a little deeper. You need a pros column and a cons column, and at this point you will need to spend some time in each database's documentation to find out the nitty-gritty details of how to do certain tasks. If this is data you expect to grow quickly, you need to know which of these options is easier to scale out. If this is a feature that does a lot of fuzzy search, you need to know which data store handles scans or searches through a large number of rows better, and with what design. The target at this stage is to whittle the list down to ideally 2 or 3 options via documentation alone, because if this new feature is critical enough to the company's success, you will have to benchmark all of them.

Why benchmark, you say? Because no two companies use the same data store the same way. Because sometimes documentation implies caveats that only get exposed in other people’s war stories. Because no one owns the stability, the reliability and the predictability of this data store but you.

Design your benchmark in advance. Ideally, you set up a full instance of each data store on your list with production-level specifications, and produce test data that is large enough for load testing to be meaningful. Make sure to not only benchmark ‘normal load’ but also to test some failure scenarios. The hope is that through the benchmark you can find any caveats that are severe enough to make you revisit the option list now, instead of later when all the code is written and you are in the fire-drill phase with a lot of time and effort committed to the choice you made.
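
As an illustration, if MySQL were one of the finalists, a benchmark run with sysbench (one tool among many; the hostname, table sizes, thread count and duration below are placeholders you would tune to match your expected production load) might look like this:

$ sysbench oltp_read_write --mysql-host=candidate-db.internal --mysql-user=bench \
    --tables=16 --table-size=10000000 prepare
$ sysbench oltp_read_write --mysql-host=candidate-db.internal --mysql-user=bench \
    --tables=16 --table-size=10000000 --threads=64 --time=1800 --report-interval=10 run

Run the same scripted workload against every candidate, and rerun it while you kill a replica or saturate the network to cover the failure scenarios mentioned above.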

Document your choice

No matter what you do, you must document and broadcast internally the method by which you reached your choice and the alternatives that were investigated on the route to that decision. Presuming there is an overarching architecture blueprint of how this new feature and all its components will be built, make sure to create a section dedicated to the data store powering the new feature, with links to all the benchmarks that fed the decision the team came to. This is not just for the benefit of future new hires but also for your team’s benefit in the present. A document that people can asynchronously read and form opinions on keeps the decision process transparent, grows a sense of best intent among team members, and can bring in criticism from perspectives you didn’t foresee.

Wrap up

These steps will not only lead to data-informed decisions when growing the business offering, but will also lead to a robust infrastructure and a more disciplined approach to when and where you use an ever-growing field of technologies to provide value to your paying customers.

December 24, 2017

Day 24 - On-premise Kubernetes with dynamic load balancing using rke, Helm and NGINX

By: Sebastiaan van Steenis (@svsteenis)

Edited By: Mike Ciavarella (@mxcia)

Containers are a great solution for consistent software deployments. When you start using containers in your environment, you and your team will quickly realise that you need a system to automate container operations. You need a system that keeps your containers running when stuff breaks (which always happens, so expect failure!), lets you scale up and down, and is also extensible, so you can interact with it or build upon it to get the functionality you need. The most popular system for deploying, scaling and managing containerized applications is Kubernetes.

Kubernetes is a great piece of software. It includes all the functionality you'll initially need to deploy, scale and operate your containers, as well as more advanced options for customising exactly how your containers are managed. A number of companies provide managed Kubernetes-as-a-service, but there are still plenty of use-cases that need to run on bare metal to meet regulatory requirements, use existing investments in hardware, or for other reasons. In this post we will use Rancher Kubernetes Engine (rke) to deploy a Kubernetes cluster on any machines you prefer, install the NGINX ingress controller, and set up dynamic load balancing across containers using that ingress controller.

Setting up Kubernetes

Let's briefly go through the Kubernetes components before we deploy them. You can use the picture below for visualisation. Thanks to Lucas Käldström (@kubernetesonarm) for creating it; it was used in his presentation at KubeCon.

Using rke, we can define 3 roles for our hosts:

  • control (controlplane)

    The controlplane consists of all the master components. In rke the etcd role is specified separately but can be placed on the same host as the controlplane. The API server is the frontend to your cluster, handling the API requests you make (for example, through the Kubernetes CLI client kubectl, which we talk about later). The controlplane also runs the controller manager, which is responsible for running controllers that execute routine tasks.

  • etcd

    The key-value store and the only component which has state, hence the term SSOT in the picture (Single Source of Truth). etcd needs quorum to operate; you can calculate quorum using (n/2)+1, where n is the number of members (which are usually hosts). This means that for a production deployment you would deploy at least 3 hosts with the etcd role. etcd will continue to function as long as it has quorum, so with 3 etcd hosts you can lose one host before you get in real trouble. Also make sure you have a backup plan for etcd.

  • worker

    A host with the worker role will be used to run the actual workloads. It will run the kubelet, which is basically the Kubernetes agent on a host. As one of its activities, kubelet will process the requested workload(s) for that host. Each worker will also run kube-proxy which is responsible for the networking rules and port forwarding. The container runtime we are using is Docker, and for this setup we'll be using the Flannel CNI plugin to handle the networking between all the deployed services on your cluster. Flannel will create an overlay network between the hosts, so that deployed containers can talk to each other.

For more information on Kubernetes components, see the Kubernetes documentation.

For this setup we'll be using 3 hosts: 1 host will be used as the controlplane (master) and etcd (persistent data store) node, 1 will be used as a worker for running containers, and 1 will be used as a worker and as the loadbalancer entrypoint for your cluster.

Hosts

You need at least OpenSSH server 7 installed on your hosts, so rke can use it to tunnel to the Docker socket. Please note that there is a known issue when connecting as the root user on RHEL/CentOS based systems; you should use another user on these systems.

SSH key authentication will be used to set up an SSH tunnel to the Docker socket, to launch the containers needed for Kubernetes to function. Tutorials on how to set this up can be found for Linux and Windows.

Make sure you either have swap disabled on the host, or configure the following in cluster.yml for the kubelet (we will generate cluster.yml in the next step):

kubelet:
  image: rancher/k8s:v1.8.3-rancher2
  extra_args: {"fail-swap-on":"false"}

Docker

The hosts need to run Linux and use Docker version 1.12.6, 1.13.1 or 17.03.2. These are the Docker versions that are validated for Kubernetes 1.8, which we will be deploying. For easy installation of Docker, Rancher provides shell scripts to install a specific Docker version. For this setup we will be using 17.03.2 which you can install using (for other versions, see https://github.com/rancher/install-docker):

curl https://releases.rancher.com/install-docker/17.03.2.sh | sudo sh

If you are not using the root user to connect to the host, make sure the user you are using can access the Docker socket (/var/run/docker.sock) on the host. This can be achieved by adding the user to the docker group (e.g. by using sudo usermod -aG docker your_username). For complete instructions, see the Docker documentation.

Networking

The network ports that will be used by rke are port 22 (to all hosts, for SSH) and port 6443 (to the master node, Kubernetes API).

rke

Note: in the examples we are using rke_darwin-amd64, which is the binary for macOS. If you are using Linux, replace it with rke_linux-amd64.

Before we can use rke, we need to get the latest rke release; at the moment this is v0.0.8-dev. Download rke v0.0.8-dev from the GitHub release page, and place it in a rke directory. We will be using this directory to create the cluster configuration file cluster.yml. Open a terminal, make sure you are in your rke directory (or that rke_darwin-amd64 is in your $PATH), and run ./rke_darwin-amd64 config. Pay close attention to specifying the correct SSH Private Key Path and SSH User for each host:

$ ./rke_darwin-amd64 config
Cluster Level SSH Private Key Path [~/.ssh/id_rsa]:
Number of Hosts [3]: 3
SSH Address of host (1) [none]: IP_MASTER_HOST
SSH Private Key Path of host (IP_MASTER_HOST) [none]:
SSH Private Key of host (IP_MASTER_HOST) [none]:
SSH User of host (IP_MASTER_HOST) [ubuntu]: root
Is host (IP_MASTER_HOST) a control host (y/n)? [y]: y
Is host (IP_MASTER_HOST) a worker host (y/n)? [n]: n
Is host (IP_MASTER_HOST) an Etcd host (y/n)? [n]: y
Override Hostname of host (IP_MASTER_HOST) [none]:
Internal IP of host (IP_MASTER_HOST) [none]:
Docker socket path on host (IP_MASTER_HOST) [/var/run/docker.sock]:
SSH Address of host (2) [none]: IP_WORKER_HOST
SSH Private Key Path of host (IP_WORKER_HOST) [none]:
SSH Private Key of host (IP_WORKER_HOST) [none]:
SSH User of host (IP_WORKER_HOST) [ubuntu]: root
Is host (IP_WORKER_HOST) a control host (y/n)? [y]: n
Is host (IP_WORKER_HOST) a worker host (y/n)? [n]: y
Is host (IP_WORKER_HOST) an Etcd host (y/n)? [n]: n
Override Hostname of host (IP_WORKER_HOST) [none]:
Internal IP of host (IP_WORKER_HOST) [none]:
Docker socket path on host (IP_WORKER_HOST) [/var/run/docker.sock]:
SSH Address of host (3) [none]: IP_WORKER_LB_HOST
SSH Private Key Path of host (IP_WORKER_LB_HOST) [none]:
SSH Private Key of host (IP_WORKER_LB_HOST) [none]:
SSH User of host (IP_WORKER_LB_HOST) [ubuntu]: root
Is host (IP_WORKER_LB_HOST) a control host (y/n)? [y]: n
Is host (IP_WORKER_LB_HOST) a worker host (y/n)? [n]: y
Is host (IP_WORKER_LB_HOST) an Etcd host (y/n)? [n]: n
Override Hostname of host (IP_WORKER_LB_HOST) [none]:
Internal IP of host (IP_WORKER_LB_HOST) [none]:
Docker socket path on host (IP_WORKER_LB_HOST) [/var/run/docker.sock]:
Network Plugin Type [flannel]:
Authentication Strategy [x509]:
Etcd Docker Image [quay.io/coreos/etcd:latest]:
Kubernetes Docker image [rancher/k8s:v1.8.3-rancher2]:
Cluster domain [cluster.local]:
Service Cluster IP Range [10.233.0.0/18]:
Cluster Network CIDR [10.233.64.0/18]:
Cluster DNS Service IP [10.233.0.3]:
Infra Container image [gcr.io/google_containers/pause-amd64:3.0]:

This will generate a cluster.yml file, which can be used by rke to setup the cluster. By default, Flannel is used as CNI network plugin. To secure the Kubernetes components, rke generates certificates and configures the Kubernetes components to use the created certificates.

You can always check or edit the file (cluster.yml) if you made a typo or used the wrong IP address somewhere.

We are now ready to let rke create the cluster for us (specifying --config is only necessary when cluster.yml is not present in the directory where you are running the rke command):

$ ./rke_darwin-amd64 up --config cluster.yml
INFO[0000] Building Kubernetes cluster
INFO[0000] [ssh] Setup tunnel for host [IP_MASTER_HOST]
INFO[0000] [ssh] Setup tunnel for host [IP_MASTER_HOST]
INFO[0000] [ssh] Setup tunnel for host [IP_WORKER_HOST]
INFO[0001] [ssh] Setup tunnel for host [IP_WORKER_LB_HOST]
INFO[0001] [certificates] Generating kubernetes certificates
INFO[0001] [certificates] Generating CA kubernetes certificates
INFO[0002] [certificates] Generating Kubernetes API server certificates
INFO[0002] [certificates] Generating Kube Controller certificates
INFO[0002] [certificates] Generating Kube Scheduler certificates
INFO[0002] [certificates] Generating Kube Proxy certificates
INFO[0003] [certificates] Generating Node certificate
INFO[0003] [certificates] Generating admin certificates and kubeconfig
INFO[0003] [reconcile] Reconciling cluster state
INFO[0003] [reconcile] This is newly generated cluster
...
INFO[0263] Finished building Kubernetes cluster successfully

All done! Your Kubernetes cluster is up and running in under 5 minutes, and most of that time was spent on pulling the needed Docker images.

kubectl

The most common way to interact with Kubernetes is using kubectl. After the cluster has been set up, rke generates a ready-to-use configuration file for kubectl, called .kube_config_cluster.yml. Before we can use the file, you will need to install kubectl; please refer to the Kubernetes documentation on how to do this for your operating system.

Note: the Kubernetes documentation helps you to place the downloaded binary in a directory in your $PATH. The following commands are based on having kubectl in your PATH.

When you have kubectl installed, make sure you execute the following commands in the rke directory (because we point to .kube_config_cluster.yml in that directory).
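
If you would rather not pass --kubeconfig to every command, kubectl also honours the KUBECONFIG environment variable. A small convenience sketch (run it from the rke directory, or use an absolute path):

$ export KUBECONFIG=$PWD/.kube_config_cluster.yml
$ kubectl get nodes

The examples below keep the explicit --kubeconfig flag so they work regardless of your environment.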

Now you can check the cluster by getting the node status:

$ kubectl --kubeconfig .kube_config_cluster.yml get nodes --show-labels
NAME              STATUS    ROLES         AGE       VERSION           LABELS
IP_MASTER_HOST   Ready     etcd,master   5m       v1.8.3-rancher1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=IP_MASTER_HOST,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true
IP_WORKER_HOST     Ready     worker        5m       v1.8.3-rancher1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=IP_WORKER_HOST,node-role.kubernetes.io/worker=true
IP_WORKER_LB_HOST     Ready     worker        5m       v1.8.3-rancher1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=IP_WORKER_LB_HOST,node-role.kubernetes.io/worker=true

Note: as a reference for each node, we will be using IP_MASTER_HOST, IP_WORKER_HOST and IP_WORKER_LB_HOST to identify, respectively, the master, the worker, and the worker functioning as entrypoint (loadbalancer).

A three node cluster, ready to run some containers. In the beginning I noted that we are going to use one worker node as loadbalancer, but at this point we can't tell the two worker nodes apart: both simply have the role worker. Let's make the loadbalancer node distinguishable by adding a label to it:

$ kubectl --kubeconfig .kube_config_cluster.yml \
  label nodes IP_WORKER_LB_HOST role=loadbalancer
node "IP_WORKER_LB_HOST" labeled

Great, let's check if it was applied correctly:

$ kubectl --kubeconfig .kube_config_cluster.yml get nodes --show-labels
NAME              STATUS    ROLES         AGE       VERSION           LABELS
IP_MASTER_HOST   Ready     etcd,master   6m       v1.8.3-rancher1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=IP_MASTER_HOST,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true
IP_WORKER_HOST     Ready     worker        6m       v1.8.3-rancher1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=IP_WORKER_HOST,node-role.kubernetes.io/worker=true
IP_WORKER_LB_HOST     Ready     worker        6m       v1.8.3-rancher1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=IP_WORKER_LB_HOST,node-role.kubernetes.io/worker=true,role=loadbalancer

Note: If you mistakenly applied the label to the wrong host, you can remove it by appending a minus to the label key (e.g. kubectl --kubeconfig .kube_config_cluster.yml label nodes IP_WORKER_LB_HOST role-)

Install and configure NGINX ingress controller

Helm

Helm is the package manager for Kubernetes, and allows you to easily install applications on your cluster. Helm uses charts to deploy applications; a chart is a collection of files that describe a related set of Kubernetes resources. Helm needs two components: a client (helm) and a server (tiller). Helm binaries are provided for all major platforms; download one and make sure it's available on your command line (move it to a location in your $PATH). When installed correctly, you should be able to run helm help from the command line.

We bootstrap Helm by using the helm client to install tiller to the cluster. The helm command can use the same Kubernetes configuration file generated by rke. We tell helm which configuration to use by setting the KUBECONFIG environment variable as shown below:

$ cd rke
$ KUBECONFIG=.kube_config_cluster.yml helm init
Creating /homedirectory/username/.helm
Creating /homedirectory/username/.helm/repository
Creating /homedirectory/username/.helm/repository/cache
Creating /homedirectory/username/.helm/repository/local
Creating /homedirectory/username/.helm/plugins
Creating /homedirectory/username/.helm/starters
Creating /homedirectory/username/.helm/cache/archive
Creating /homedirectory/username/.helm/repository/repositories.yaml
Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com
Adding local repo with URL: http://127.0.0.1:8879/charts
$HELM_HOME has been configured at /homedirectory/username/.helm.

Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
Happy Helming!

Assuming all went well, we can now check whether Tiller is running by asking for the running version. The server should return a version here, as the command queries the server-side component (Tiller). It may take a minute for Tiller to start.

$ KUBECONFIG=.kube_config_cluster.yml helm version    
Client: &version.Version{SemVer:"v2.7.2", GitCommit:"8478fb4fc723885b155c924d1c8c410b7a9444e6", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.7.2", GitCommit:"8478fb4fc723885b155c924d1c8c410b7a9444e6", GitTreeState:"clean"}

A little bit on Pods, Services and Service Types

Services enable service discovery within a Kubernetes cluster. A Service provides an abstraction over one or more pods in your cluster. What is a pod? A pod is a set of one or more containers (usually Docker containers), with shared networking and storage. If you just run a pod in your cluster, you usually end up with two problems:

  • Scale: When running a single pod, you don't have any redundancy. You want a mechanism that ensures a given number of pods is running, and that can scale when needed. We will talk more about this when we deploy our demo application later on.
  • Accessibility: Which pods do you need to reach? (One static pod on one host is reachable, but what about scaling up and down, or rescheduled pods?) And what IP address or name do you use to access the pod(s)?

By default, a Service has the service type ClusterIP, which means it gets an internally accessible IP that you can use to access your pods. The way the Service knows which pods to target is by using a Label Selector, which tells the Service which pod labels to target.

Other service types are:

  • NodePort: exposes the service on every host's IP on a selected port, or on a port randomly selected from the configured NodePort range (default: 30000-32767)
  • LoadBalancer: if a cloud provider is configured, this will request a loadbalancer from that cloud provider and configure it as the entrypoint. Cloud providers include AWS, Azure and GCE, among others.
  • ExternalName: This makes it possible to configure a service to route to a predefined name outside the cluster by using a CNAME record in DNS.

Installing NGINX ingress controller

As the NGINX ingress controller meets all of the criteria of the technical requirements, it resides in the stable directory of Helm charts. As noted before, we labeled one node as our point of entry by applying the role=loadbalancer label to that node. We'll pass that label to the Helm chart so that the NGINX ingress controller gets placed on the correct node. By default, the NGINX ingress controller gets created as service type LoadBalancer. Because we are assuming that you are running on-premise, this will not work: the LoadBalancer service type will try to provision a loadbalancer from the configured cloud provider, which we didn't configure and which is usually not available in an on-premise setup. Because of this we will set the service type to ClusterIP using the --set controller.service.type=ClusterIP argument. Secondly, because we don't have an external loadbalancer in front of the services, we will configure the controller to use host networking. This way, the NGINX ingress controller will be reachable on the IP of the host. You can do so by setting controller.hostNetwork to true.

NOTE: Another option is to use NodePort, which will use a port from the cluster-defined range (30000-32767). You can then use an external loadbalancer to balance traffic to this port on the node. For the simplicity of this post, I went with hostNetwork.

$ KUBECONFIG=.kube_config_cluster.yml helm install stable/nginx-ingress \
--name nginx-ingress --set controller.nodeSelector."role"=loadbalancer --set controller.service.type=ClusterIP --set controller.hostNetwork=true

Run the following command to see if the deployment was successful; we should see that it rolled out:

$ kubectl --kubeconfig .kube_config_cluster.yml rollout \
  status deploy/nginx-ingress-nginx-ingress-controller
deployment "nginx-ingress-nginx-ingress-controller" successfully rolled out

By default, the NGINX ingress controller chart also deploys a default backend which returns default backend - 404 when no hostname matches a configured service. Let's test whether the default backend was deployed successfully:

# First we get the loadbalancer IP (IP of the host running the NGINX ingress controller) and save it to variable $LOADBALANCERIP
$ LOADBALANCERIP=`kubectl --kubeconfig .kube_config_cluster.yml get node -l role=loadbalancer -o jsonpath={.items[*].status.addresses[?\(@.type==\"InternalIP\"\)].address}`
# Now we can curl that IP to see if we get the correct response
$ curl $LOADBALANCERIP
default backend - 404

Excellent, we reached the NGINX ingress controller. As there are no services defined, we get routed to the default backend which returns a 404.

Setup wildcard DNS entry

For this post, I decided to make a single host the entrypoint to the cluster. We applied the label role=loadbalancer to this host, and used it to schedule the deployment of the NGINX ingress controller. Now you can point a wildcard DNS record (*.kubernetes.yourdomain.com, for example) to this IP. This will make sure that the hostname we will use for our demo application ends up on the host running the NGINX ingress controller (our designated entrypoint). In DNS terminology this would be the following (in this example, $LOADBALANCERIP is 10.10.10.10):

*.kubernetes IN A 10.10.10.10

With this configured, you can try reaching the default backend by running curl against a host which resides under this wildcard record, e.g. dummy.kubernetes.yourdomain.com.

$ curl dummy.kubernetes.yourdomain.com
default backend - 404

Running and accessing the demo application

A little bit on ReplicaSet, Deployment and Ingress

Before we deploy our demo application, some explanation of the terminology is needed. Earlier, we talked about Services and service types to provide access to your pod or group of pods, and noted that running pods alone is not a failure-tolerant way of running your workload. To make this better, we can use a ReplicaSet. The basic functionality of a ReplicaSet is to run a specified number of pods, which solves our problem of running single pods.

From the Kubernetes documentation:

While ReplicaSets can be used independently, today it’s mainly used by Deployments as a mechanism to orchestrate pod creation, deletion and updates. When you use Deployments you don’t have to worry about managing the ReplicaSets that they create. Deployments own and manage their ReplicaSets.

Deployments give us some other nice benefits, like checking the rollout status using kubectl rollout status. We will be using this when we deploy our demo application.

Last but not least, the Ingress. Usually, the components in your cluster are for internal use: the components need to reach each other (web application, key-value store, database) over the cluster network. But sometimes you want to reach cluster services from the outside (like our demo application). To make this possible, you need to deploy an Ingress definition. Deploying an Ingress definition without an Ingress controller gives you limited functionality; that's why we deployed the NGINX ingress controller. By adding the following key/value under annotations, we make sure the NGINX ingress controller picks up our Ingress definition: kubernetes.io/ingress.class: "nginx"

Deploy demo application

For this post, we are using a simple web application. When you visit this web application, the UI will show you every container serving requests for this web application.

Let's create the files necessary to deploy our application. We'll be using a Deployment to create a ReplicaSet with 2 replicas, and a Service to link our Ingress to. Save the following as docker-demo.yml in the rke directory.

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: docker-demo-deployment
spec:
  selector:
    matchLabels:
      app: docker-demo
  replicas: 2
  template:
    metadata:
      labels:
        app: docker-demo
    spec:
      containers:
      - name: docker-demo
        image: ehazlett/docker-demo
        ports:
        - containerPort: 8080

---

apiVersion: v1
kind: Service
metadata:
  name: docker-demo-svc
spec:
  ports:
  - port: 8080
    targetPort: 8080
    protocol: TCP
  selector:
    app: docker-demo

Let's deploy this using kubectl:

$ kubectl --kubeconfig .kube_config_cluster.yml create -f docker-demo.yml
deployment "docker-demo-deployment" created
service "docker-demo-svc" created

Again, like in the previous deployment, we can query the deployment for its rollout status:

$ kubectl --kubeconfig .kube_config_cluster.yml rollout \
  status deploy/docker-demo-deployment
deployment "docker-demo-deployment" successfully rolled out

With this running, the web application is now accessible within the cluster. This is great when you need to connect web applications to backends like key-value stores, databases, etcetera. For now, we just want this web application to be available through our loadbalancer. As we've already deployed the NGINX ingress controller, we can now make our application accessible by using an Ingress resource. Let's create the ingress.yml file:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: docker-demo-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: docker-demo.kubernetes.yourdomain.com
    http:
      paths:
      - path: /
        backend:
          serviceName: docker-demo-svc
          servicePort: 8080

This is a fairly standard Ingress definition: we define a name and rules to access an application. We define the host that should be matched, docker-demo.kubernetes.yourdomain.com, and which path should route to which backend service on which port. The annotation kubernetes.io/ingress.class: "nginx" tells the NGINX ingress controller that this Ingress resource should be processed. When this Ingress is created, the NGINX ingress controller will see it and process the rules (in this case, create a "vhost" and point its backend/upstream to the pods in the ReplicaSet created by the Deployment). This means that, after creation, you should be able to reach this web application on http://docker-demo.kubernetes.yourdomain.com. Let's launch the Ingress and find out:

$ kubectl --kubeconfig .kube_config_cluster.yml create -f ingress.yml
ingress "docker-demo-ingress" created

Check out the web application on http://docker-demo.kubernetes.yourdomain.com

Wrapping up

Is this a full-blown production setup? No. Keep in mind that it will need some more work, but hopefully you have gained a basic understanding of some of the core parts of Kubernetes: how to deploy Kubernetes using rke, how to use Helm, and the basics of the NGINX ingress controller. Let me give you some resources to continue the journey:

  • Try to scale your deployment to show more containers in the web application (kubectl scale -h); see the example after this list

  • rke supports HA, you can (and should) deploy multiple hosts with the controlplane and/or the etcd role.

  • Take a look at all the options of the NGINX ingress controller, see if it suits your needs

  • Explore how easy it is to use Let's Encrypt certificates on your ingresses by setting an extra annotation using kube-lego.

  • The NGINX ingress controller is a single point of failure (SPOF) now, explore how you can make this go away. Most companies use some kind of external loadbalancer which you could use for this.

  • Keep an eye on the Kubernetes Incubator Project external-dns, which can automatically create DNS records in supported providers.

  • To gain a deeper understanding of all the Kubernetes components, check out Kubernetes The Hard Way.
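
As a starting point for the first item in the list above, scaling the demo deployment is a one-liner (pick any replica count you like, then refresh the web application to watch the extra containers appear):

$ kubectl --kubeconfig .kube_config_cluster.yml scale deployment docker-demo-deployment --replicas=4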

December 23, 2017

Day 23 - Open Source Licensing in the Real World

By: Carl Perry (@edolnx)
Edited By: Amy Tobey (@AlTobey)

Before we get started, I need to say a couple of things. I am not a lawyer. What I am sharing should not be considered legal advice. The idea of this article is to share my experiences to date and provide you resources you should use to start conversations with your lawyer, your boss, and your company’s legal team. You should also know that my experiences are primarily based on laws in the United States of America, the State of Texas, and the State of California. Your governmental structure may be different, but most of what I am going to talk about should be at least partially applicable no matter where you are. Now, with that out of the way, let’s get started!

Open Source vs Free Software

I don’t want to spend a lot of time getting into a philosophical debate about Free Software vs Open Source, so I’m going to define how I use those terms for this article. I refer to Free Software as things that either come from the Free Software Foundation (FSF) or use an FSF license. Anything where the source code is made publicly available I refer to as Open Source. Yes, this means that to me FSF projects are Open Source, and it also means things like “shared source” are Open Source. But the important part is the license, which is what we are here to talk about.

The mixing of two worlds that don’t always see eye to eye

Capitalism and Open Source Software don’t always mix well. There are exceptions, but most companies see software as no different from a brick: cheap, replaceable, frequently made by an outside provider, durable, and invisible (until it fails or is missing). We know this is not the case, and that is where the problems begin. Many corporate lawyers treat software identically to any other corporate purchase: a thing with a perpetual license that places all liability on the provider. We know that’s not true either, especially with Open Source. That means education is super important.

Everything starts with copyright

Copyright has been around for a long time, and the Berne Convention (which the US joined in 1989) really formed the basis for the copyright we know today. Many countries ratified the treaty, and others are still forced to follow it through various other treaties. The basics are that anything created is automatically copyrighted by the “author” for a minimum of 50 years; no filing is required. When something is copyrighted, all rights are reserved by the “author”. This is the crux of the problem: it means you cannot copy, reuse, integrate, or otherwise manipulate the work at all. This is where licensing comes in.

Licensing to the rescue?

Licensing a work is a contract that allows the “author” to grant rights to other parties, typically with restrictions. Many commercial software licenses simply allow use and redistribution of the work within an organization in binary form only, in exchange for lack of warranty and lack of liability. Open Source and Free Software licenses work quite a bit differently. Back in the early days of computing, even commercial software included the source code. Failing to include the source code usually meant you had something to hide, until some companies decided that source code was intellectual property and not to be shared. But enough history for now; let’s get into Open Source and Free Software licenses and what they do.

Free Software licenses

The most common Free Software licenses are the GNU General Public License and its derivatives (GPLv2, GPLv3, LGPLv2, LGPLv3, AGPLv3, et cetera). These are all based on a very clever legal hack called “copyleft”: the idea is to use a license to enforce the exact opposite of all rights reserved. Instead, the license guarantees the right of access to the source, the ability to modify the source, and the ability to redistribute, along with the inability to use the work as part of a closed source project and the inability to charge money for the software itself. There are exceptions, in that you can charge for distribution costs for example, but that’s pretty much it. It’s important to realize that this is a one-way street; things can start under some other license, but once code becomes GPL it cannot go back.

How businesses see the Free Software licenses

Typically, businesses do not have a favorable view of the GPL and its derivatives. Part of the reason is patents (the clauses are vaguely worded at best), but the biggest reason is the perception of “losing IP” and “competitive advantage”. Because you have to release all your changes, it can make life difficult. Great examples of this conflict are GPU and SoC drivers: there is a lot of proprietary tech in those, but you have to make all the code that controls it open to the public.

There are bounding boxes, however. Any code that is derived from a Free Software based product must use the same or a compatible license. If your code is not based on a Free Software base, vendors often choose to make a Free Software shim that converts calls from the Free Software project (like the Linux kernel) to vendor proprietary binaries or code under a different license. This is how ZFS and the binary NVIDIA drivers work with the kernel. This is not always true, as the Linux kernel has a very clear line of demarcation in its LICENSE file which states that public APIs are just that: public, and not subject to the GPLv2. That makes the kernel a unique case. Otherwise, any lack of clarification like that leaves you with the poorly defined “linking clause” to deal with.

“Ah, but the LGPL doesn’t have those restrictions!” someone is shouting at their keyboard. That’s not entirely true. The LGPL lacks the “linking clause”, which can help with adoption, but that’s it; the license is otherwise identical to the GPL. What does this mean? Great question: that is left up to the “author”. An excellent example of how to do it well is the libssh project, which clearly spells out that you can use libssh in a commercial product as long as you do not modify the library itself (see their features page under the “License Explained” section). But if a project does not do this in its documentation, then it can be ambiguous.

The Academic licenses

This is a term I use to lump the BSD, MIT, CERN, X11, and their cousin licenses in one place. These licenses are super simple, typically less than a paragraph long. They also don’t do much except limit liability and allow for royalty free redistribution.

How businesses see the Academic licenses

Most companies do not have a problem with these licenses. They are simple, and allow for packaging into commercial products without issue. The only major sticking point is patents: they are not covered at all by many of these licenses, and that can be an issue for projects that are sponsored by a company. However, using code licensed under one of the Academic licenses tends to be a non-issue.

The Common Open Source licenses

This is another term I use, for things like the Apache v2 License, the Artistic License (used by Perl), the LLVM License, the Eclipse Public License, the Mozilla Public License, and a few others that are widely used and well understood. These are typically very much like the Academic licenses but were written by corporate lawyers and are thus verbose. That's not a bad thing: their verbosity means they are very clear about what they are trying to do. Most also have provisions for patents.

How businesses see Common Open Source licenses

Much like the Academic licenses, these are typically not an issue. There are some options available for patent protection and/or indemnification under different licenses. Typically projects that are under some form of corporate sponsorship tend to use Apache v2 for these reasons over Academic licenses.

Creative Commons

So far, everything we have talked about has been for software generally, and source code specifically. But what about things like blog articles, artwork, sound files, videos, schematics, 3D objects, basically anything that isn’t code? That’s what Creative Commons is for. They have an excellent license chooser on their site: answer three questions and you are done. They have plain-text descriptions of what each license does, and well-structured legal text to back up those descriptions and keep the lawyers happy.

But why not just use one of the above licenses for things like 3D objects, PCB Gerbers, and schematics? Aren’t they just source code that uses a special compiler? Sort of, but frequently for non-code items the base components are remixed through other applications, so it’s not so cut and dried. CC licenses help with this immensely. Also, there is CC0 as a “public domain work-alike” license (since it’s not always clear how to put something in the public domain, no matter how hard you try).

Every other license

There are far too many to go over here, but I’ll give some highlights: CDDL, Shared Source, the JSON License (yes, really), et cetera. My biggest lessons to pass along here are twofold: don’t create your own license, and if you stumble across one of these, you will need to start having conversations with lawyers. The CDDL was made so that Oracle didn’t have to see all the wonderful work from Sun Microsystems/Solaris wind up in the Linux kernel; I’ll leave it up to the reader to figure out how that worked out for them. The JSON License holds a special place in my heart: “The software shall be used for good, not evil.” Funny, right? Not so much. The maintainers of tools that use the JSON License are routinely hounded by IBM, because IBM wants to use those tools and cannot guarantee that IBM or its clients will only use the software for good. Literally, people get calls from IBM lawyers every quarter about this. Not so funny now. Don’t invent licenses, and don’t be cute in them. For everyone’s sake.

Why do you always say “author” in quotes like that?

Let’s take a quick segue to talk about when the person who wrote something is not the “author”. If you are writing code as part of your job, you are likely working under a “work for hire” contract. You need to check this, and what its bounds are. In Texas, companies can get away with just about anything in this realm, as there are no strong employee protection laws in the state. California, on the other hand, has very strong laws about this: work done on your own time, not using any company resources (like a work laptop or AWS account), cannot be claimed by the company as part of their work for hire. Texas is a lot less clear. Your local jurisdiction may vary. If you are working for the United States Government, for example, work created for the US Government cannot be copyrighted by a company or individual. So, you may not be the author of the code you write. It’s important to understand that, because of what we are about to talk about next….

Sometimes the license is not enough

Many commercially sponsored projects have additional clarifications put in place to deal with shortcomings of the license, or to assign copyright back to the sponsoring organization to simplify distribution issues. These are typically handled with a Contributor License Agreement (CLA), usually managed by an out-of-band process and enforced using some form of gating system. But there are problems here as well: many projects want contributing to be as lightweight as possible, so they implement a technical solution (an example is the Developer Certificate of Origin). This is great for the expediency of individual contributors but can be a real pain for corporate contributors.

Protecting yourself

OK, so now that I’ve probably scared you, let’s talk about risk and how to mitigate it. Step one is to get a lawyer. It’s a hard step, but if/when you need a lawyer, it’s better to know who to call instead of getting someone who just deals with traffic tickets. It’s also important to point out that, in almost all cases, your company lawyer/legal team does not have your best interests in mind: they work for the company, not you. If you need help finding a lawyer who understands this field, the EFF is a great resource.

Second: understand your employment contract. You have one, even if you don’t think you do. Many companies have you sign a piece of paper saying you agree to the Employee Manual (or the like) when you join: congratulations, that makes the employee manual your employment contract. Understand whether you are a work-for-hire employee or treated as a contractor. This has huge ramifications for what you can contribute to outside Open Source projects and who is the “author” in those cases.

Third: it’s likely your contract will not cover things like this. Fix that. Talk to your boss/manager and get an understanding of what the company’s expectations are, and what their expectations of you as an Open Source contributor are. It’s best to do this as part of your negotiations when being hired, but either way you need to have those conversations. It’s important to start with your boss/manager instead of legal, because the last thing you want to do is confuse or annoy legal. If you work in a large company, expect this process to take a while, and while it does, do not contribute to open source projects during company time or using company resources. I cannot stress that enough. If the company doesn’t want you contributing and you do, then you are on the hook, legally speaking. Be up front and transparent. If you have already contributed, start discussions now and stop contributing while you do. Hiding information is worse than making an honest mistake.

Fourth: understand the licenses used by projects you are using and/or contributing to. Also understand that if there is no license, you legally cannot use it. This includes Stack Overflow, GitHub, and random things found in search results. It’s much safer to find something that is properly licensed, or to use what you find as a reference and reimplement the concepts yourself.

Fifth: leave an audit trail. Did you get a chunk of code from somewhere? Link to it in a comment. Note where you are getting libraries and support applications from, and the licenses they use. If you need to use an open source piece of code but modify it (like a Chef cookbook or an Ansible playbook), then add a file with the source (including the version, or better yet an immutable link such as a GitHub URL with the revision SHA), what you changed (just a list of files), and why, so that if it needs to be upgraded later it is easier to understand what your past self was thinking.
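
For example, a short provenance note committed alongside the vendored code might look like this (every value below is hypothetical):

# Source:   https://github.com/example-org/example-cookbook (hypothetical upstream)
# Revision: the upstream git SHA you copied from
# License:  Apache-2.0
# Changed:  templates/default/config.erb, attributes/default.rb
# Why:      upstream hard-codes the listen port; we need it configurable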

Sixth: If you are coding in an organization, find out what their open source policy is (or help build one). At several places I have worked, the company didn’t care about open source licenses we used, but we occasionally had contracts with customers/vendors who did. One in particular had a “No GPLv3 code would be delivered” clause in our contract and that caused some issues. It’s important to make this known to avoid surprises later.

Protecting your organization

Protecting your organization doesn’t just mean protecting the company you work for; it’s just as important to protect the open source communities you participate in and contribute to. A lot of what I suggested in the last section works wonders for both. But there are a couple of other steps that can be effective, depending on your level of involvement or contractual obligations:

Dependency Audit

These two words tend to strike a sense of horror and dread into most developers. Don’t let them. There are some great tools out there to help in the popular languages, and you may discover some interesting surprises when you dive all the way down your dependency tree. Use of language-native package management (pip, npm, rubygems, et cetera) can make this easier; with other languages (like Java and C#) you may have a harder time and need to do a lot more manual work. You may also find that there are libraries in there you do not want due to license concerns, but don’t fret too much, as there are usually replacements. A great example of this is libreadline, which is used frequently in tools with an interactive CLI. Good old libreadline is GPL (2 or 3 depending on version), but there exists the equivalent libedit, which is 100% API compatible and is BSD licensed. Things like that are a somewhat easy fix; you may need to build some more dependencies in your build pipeline to get the license coverage you desire. Some may be harder.
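
As an example, for Python and Node.js projects two third-party license reporters (not endorsed here, and only a first pass rather than legal truth) can dump your dependency tree's declared licenses:

$ pip install pip-licenses && pip-licenses   # Python: list each installed package with its declared license
$ npx license-checker                        # Node.js: walk node_modules and print each dependency's license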

Legal Fiction

Do you run an open source project? You may want to build, or have it become part of, an organization to help protect it and you from a legal and financial standpoint. Examples of larger groups are the Free Software Foundation, Software in the Public Interest, the Linux Foundation, and the Apache Foundation. Others build their own, like the Blender Foundation and VLC. The idea is that the legal fiction (read: company) can absorb most or all of the legal risk from the individual contributors. It’s not perfect, as the recent unpleasantness with netfilter shows, but it can help. Larger groups can even provide legal support without you going out on your own.

Contributor License Agreements

Your organization may want to implement a CLA for external (and internal) contributors to do things like assign copyright to the organization or grant royalty-free patent licenses. If you are going to do something like that, think about your users. If you expect or want contributions from corporations, think really hard about making something that is easy for corporate legal teams to work with, rather than placing the onus solely on the contributor. As much of a pain as it can be, just having a Corporate Contributor License Agreement (CCLA) that sits out of band with your other processes can be enough to allow contributions from such an org. This lets your lawyers and their lawyers work it out, and can be beneficial for larger orgs that don’t yet get Open Source.

Wrapping Up

This stuff is important, and it’s complicated, but it can be surmounted by anyone. To quote Lawrence Lessig’s excellent book title “Code is Law”, and using the transitive property “Law is Code”. All these licenses are written in legal code, and like any other language they can (and should) be read. If you are inexperienced, start reading licenses. When you get confused, ask for help (preferably from a lawyer). The more we all understand this, the better the world will be. Thanks for your time, and feel free to reach out if you have questions!

-Carl @edolnx on Twitter and on the HangOps Slack There is a discussion of this on my blog: https://www.gigofham.com/post/2017/07/23-sysadvent/

December 22, 2017

Day 22 - Building a secure bastion host, or, 50 ways to kill your server

By: Anna Kennedy (@anna_ken_)
Edited By: Gillian Gunson (@shebang_the_cat)

Bastion (noun) 1. A projecting part of a fortification 2. A special purpose computer on a network specifically designed and configured to withstand attacks

If you deploy servers to a private network, then you also need a way to connect to them. The two most common methods are to use a VPN, or to ssh through a bastion host (also known as a jump box). Shielding services this way massively reduces your attack surface, but you need to make sure that the one server exposed to the internet is as secure as you can make it.

At Telenor Digital we have about 20 federated AWS accounts, and we wanted to avoid having to set up a complex system of VPNs. Additionally, we wanted to be able to connect to any account from anywhere, and not just from designated IP ranges. Deploying a bastion host to each account would allow us to connect easily to instances via ssh forwarding. Our preferred forwarding solution is sshuttle, a "transparent proxy server / poor man's VPN".
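
For example, once a bastion is up, pushing traffic for a private subnet through it is a one-liner (the hostname and CIDR below are placeholders):

$ sshuttle -r you@bastion.example.com 10.0.0.0/16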

This is where it got... interesting. At the time of writing, Amazon AWS did not have a designated bastion host instance type available. Nor, in fact, did any of the other main cloud providers, nor did there appear to be any trustworthy bastions available from other sources. There didn’t even seem to be any information about how other people were solving this problem.

I knew that we wanted a secure bastion such that:

  1. Only authorised users can ssh into the bastion
  2. The bastion is useless for anything BUT ssh'ing through

How hard could making such a bastion possibly be?

Constraints and processes

Our technology stack uses Ubuntu exclusively, and we wanted the bastion to be compatible with the various services we already deploy, such as Consul, Ansible, and Filebeat. Beyond that, I personally have a lot more experience with Ubuntu than I do with any other OS.

For these reasons we decided to base the bastion on a minimal Ubuntu install, strip out as many packages as possible, add some extra security, and make a golden image bastion AMI. Had it not been for these constraints, there might be better OSs to start with, such as Alpine Linux.

Additionally, we run everything in AWS, so one or two of the following points are AWS-specific, but based on a lot of conversations it seems that the bastion problem affects a much wider range of architectures.

We use Packer to build our AMIs, Ansible to set them up and Serverspec to test them, so building AMIs is a pretty fast process, typically taking about five minutes. After that we deploy everything using Terraform, so it's a quick turnaround from code commit to running instance.
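As a rough sketch of that pipeline (the template name here is hypothetical):

# Bake and test a new bastion AMI (Ansible provisioner plus Serverspec checks)
packer build bastion.json

# Roll the resulting AMI out to our accounts
terraform apply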

Starting point: Ubuntu minimal-server

My first port of call: what packages are pre-installed in an Ubuntu minimal-server? Inspection via $ apt list --installed or $ dpkg-query -W showed over 2000 packages, and I was surprised by how many of them I'd never heard of. Of the ones I had heard of, I was further surprised by how many seemed, well, superfluous.

I spent some time and made a few spreadsheets trying to figure out what all the mystery packages were, before I got bored and had the bright idea of leveraging Ubuntu's package priority system: every package is labelled as one of required, important, standard, optional, or extra.

$ dpkg-query -Wf '${Package;-40}${Priority}\n'
apt                             important
adduser                         required
at                              standard
a11y-profile-manager-indicator  optional
adium-theme-ubuntu              extra

Remove optional and extra packages

Those optional and extra packages sounded very nonessential. I was pretty sure I could rip those out with a nice one-liner and be done.

dpkg-query -Wf '${Package;-40}${Priority}\n' | awk '$2 ~ /optional|extra/ { print $1 }' | xargs -I % sudo apt-get -y purge %

Turns out this was not my best ever idea. All sorts of surprising packages were marked optional or extra and were thus unceremoniously removed, including:

  • cloud-init
  • grub
  • linux-base
  • openssh-server
  • resolvconf
  • ubuntu-server (meta-package)

It doesn't take a genius to realise that removing grub, openssh-server, or resolvconf is colossally ill-advised, but even when I kept those and uninstalled the rest I had no luck. On every build I got an unstable and/or unusable image, often one that didn't boot at all. Interestingly, it broke in a different way each time, possibly to do with how fast it was uprooting various dependencies before it reached an unbootable state. After quite a lot of experimenting with package-removal lists and getting apparently nondeterministic results, it was time for a new strategy.

Remove a selected list of packages

I revised my plan somewhat, realising that blindly removing lots of packages wasn't the best of ideas. Maybe I could look through the package list, pick out the ones that seemed the most 'useful', and remove just those. Some obvious candidates for removal were the various scripting languages, plus tools like curl and net-tools. I was pretty sure these were just peripherals to a minimal server.

Package name   Ok to remove?   Dependency
curl           no              Consul
ed             yes
ftp            yes
gawk           yes
nano           yes
net-tools      no              sshuttle
perl           no              ssh
python 2.7     no              Ansible
python 3       no              AWS instance checks
rsync          yes
screen         yes
tar            no              Ansible
tmux           yes
vim            yes
wget           yes

It turns out I was incorrect. Thanks to the dependencies introduced by Consul, sshuttle, Ansible, and AWS, about half of my hitlist was unremovable.
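The packages that did turn out to be removable could at least go in a single pass:

# Purge the 'yes' rows from the table above
sudo apt-get -y purge ed ftp gawk nano rsync screen tmux vim wget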

To compensate for the limitations in my "remove all the things" strategy, I decided to explore limiting user powers.

Restrict user capabilities

Really, I didn't want users to be able to do anything - they should only be allowed to ssh tunnel or sshuttle through the bastion. Therefore locking down the specific commands a user could issue ought to limit potential damage. To restrict user capabilities, I found four possible methods:

  • Change all user shells to /bin/nologin
  • Use rbash instead of bash
  • Restrict allowed commands in authorized_keys
  • Remove sudo from all users

All seemed like good ideas - but on testing I discovered that the first three options only work for pure ssh tunnelling, and don’t work in conjunction with sshuttle.
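For reference, the first and third options look roughly like this (the username and key are hypothetical); both break sshuttle because it needs to run a helper process on the remote host:

# Option 1: give a user no interactive shell (plain ssh -N tunnelling still works)
sudo usermod --shell /usr/sbin/nologin alice

# Option 3: per-key restrictions in ~/.ssh/authorized_keys, limiting forwarding to one host and port
no-pty,command="/bin/false",permitopen="10.0.1.23:22" ssh-ed25519 AAAA... alice@laptop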

Remove sudo

I disabled sudo by removing all users from the sudo group, which worked perfectly apart from introducing a new dimension to bastion troubleshooting - without sudo it’s not possible to read the logs or perform any meaningful investigation on the instance.
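On Ubuntu that’s a one-liner per user (the username is hypothetical):

# Take 'alice' out of the sudo group (equivalently: sudo gpasswd -d alice sudo)
sudo deluser alice sudo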

We offset most of the pain by having the bastion export its logs to our ELG (Elasticsearch, Logstash, Graylog) logging stack, and export metrics to Prometheus. Between these, most issues were easily identified without needing sudo access on the instance. For the couple of bigger build issues that I hit in the later stages of development, I built two versions of the bastion at a time, with and without sudo. A little clunky, but only a temporary measure.

With the bastion locked down as much as possible, I then added in a few more restrictions to finalise the hardening.

Install fail2ban

An oldie but a goodie, fail2ban is fantastic at restricting logon attempts: anyone who fails to log in three times in a row gets locked out for a set time period.
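A minimal jail configuration along those lines might look like the following sketch (the values are illustrative, not our production settings):

# /etc/fail2ban/jail.local
# Ban for an hour after three failed logins within ten minutes
[sshd]
enabled  = yes
maxretry = 3
findtime = 600
bantime  = 3600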

Use 2FA and port knocking

Some clever folks in my team ended up making a version of sshuttle that invokes AWS two-factor authentication for our users, and implements a port knocking capability which only opens the ssh port in response to a certain request. The details of this are outside the scope of this article, but we hope to make this open-source in the near future.

Finally! A bastion!

After a lot of experiments and some heavy testing, the bastion was declared production-ready and deployed to all accounts. The final image runs on a t2.nano instance, but we’ve not seen any performance problems so far, as the ssh forwarding is so lightweight.

It's now been in use for at least half a year, and it's been surprisingly stable. We have a Jenkins job that builds and tests the bastion AMI on any change to the source AMI or on code merge, and we redeploy our bastions every few weeks.

I still lie awake in bed sometimes and try to work out how to build a bastion from the ground up, but to all intents and purposes I think the one we have is just fine.

December 21, 2017

Day 21 - Lighting Up Your Haunted Graveyards

By: Carla Geisser
Edited By: Brendan Murtagh (@bmurts)

The subsystem that shows up as a cause in every postmortem, but never gets fixed. The source file that begins with comments by several generations of engineers warning of the hazard that lies within. These are your organization’s haunted graveyards.

[image: haunted graveyard]

A common conversation goes like this -

Enthusiastic New Person: “My manager suggested I add better monitoring to the $scary_thing, where should I start?”

Grumpy Senior Engineer: “Um, the code is over there, but it’s 50,000 lines of spaghetti. The last time we touched it we learned it also processes payroll, and the person who wrote it quit 6 months ago. We just try not to touch it.”

A Different Grumpy Senior Engineer: “Yup, I looked at fixing it a few years ago and gave up. When it crashes we just restart it and hope it keeps going.”

Great. Now you have a system so haunted that two senior members on the team refuse to go near it. You’ve chosen to encase it in concrete and warning signs rather than fix it.

This is a huge trap. If you’re very lucky, you’ll only have to walk into the graveyard for security and platform updates, which are probably going to be ok. More likely the system will break spectacularly when you least expect it, and you’ll write a postmortem containing the phrases “technical debt” and “key components with no owner.”

What do you do now? March in with flashlights and throw a party.

History Class

Find out as much as you can about the system, especially about how it has evolved over time. You want facts, not tales that get told over $beverages after a big outage. Seek out what has been tried before, and why those attempts failed. Understand the circumstances, since things are probably different now.

Information Gathering

Figure out what the $scary_thing does and how it interacts with the world. You want to avoid touching the actual system because it is fragile and scary, but get as much data as you can from the outside.

Application, Client, Data, and Network logs can provide indications about what the system expects as inputs and outputs. If logs aren’t available, consider putting a proxy layer around the system and logging everything there (more about that later).

Turn it off and see who yells. This is not recommended for components suspected to be mission critical, but can be very illuminating. And it is faster than staring at logs all day.

Develop A Risk Assessment

Often you’ll find that the circumstances have changed since the last attempt to make progress. Clients of the system may have better resiliency to brief interruptions. You may have moved 80% of the functionality elsewhere.

Look at the downside too: how much scarier do things get if you don’t try and fix this system? Do your Ops people rage-quit from frustration? Does it endanger your next product launch? Is it preventing other systems from improving?

Change Something Small

You know what the thing does (kinda) and you know what else depends on it (maybe?), so now you can change it. Pick something small: a minor library update, running a code linter, or adding some log lines.

Treat the first few changes as incidents before you even start. Write up a plan, gather all the experts in case something goes wrong. Book a conference room (or chat channel) and order take-out.

Write a postmortem, even if everything goes fine.

Rinse, Repeat as Necessary

A key reason systems become haunted is lack of practice, which leads to fear of change. If a process is scary you should do it frequently until it isn’t scary anymore.

[image: sunny graveyard]

Proxies and Mirrored Requests

If leaping right in seems too risky, you have another option. Replace the $scary_thing with a tiny piece of code that forwards each request to the real thing.

Once this proxy is in place, you can use it for instrumentation or as a tool to deprecate your haunted system.

Here is an example workflow:

  1. Replace the $scary_thing with a tiny piece of code that forwards each request to the real thing and pushes the response back to the client.
  2. Have the proxy log everything it sees (or a sample if there’s a lot of it)
  3. Rewrite functionality into the proxy layer. If you’re paranoid, and you should be, run both the old and new workflows and have the proxy log any differences in results.
  4. Slowly migrate requests from the $scary_thing to the proxy layer which is now actually a rewrite of the functionality of the old thing.
  5. Turn off the $scary_thing. Have a party.

This process can be slow and requires engineering, but it is much safer and more reliable than trying to do a rewrite.
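As a rough illustration of steps 1 and 2, even a stock tool can stand in as the pass-through layer while you study the traffic, assuming the $scary_thing speaks plain TCP on a known port (the names and ports here are invented):

# Listen where clients already connect, forward everything to the real thing,
# and log every byte that passes through (-v copies the traffic to stderr)
socat -v TCP-LISTEN:8080,fork,reuseaddr TCP:scary-thing.internal:8081 2>> /var/log/scary-proxy.log

From step 3 onwards you need real code you can extend, but a capture like this is often enough to learn what the inputs and outputs actually look like.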

Preventing the Haunting

Ideally your systems should never get into this state in the first place. To that end, here are some tips to help prevent your own $scary_thing.

  1. Rebuild and deploy code often, even if not much has changed.
  2. Ensure every component has an owner.
  3. Build robust logging and observability into everything so you can refer to data rather than lore.
  4. Treat technology and platform migrations as real projects, and don’t declare them done until all the old components are gone.

Happy Solstice everyone!

December 20, 2017

Day 20 - Regarding the Responsibility of Systems Administrators

By: Ben Rockwood (@benr)
Edited By: Joshua Zimmerman (@TheJewberwocky)

Anyone who has worked for me will tell you I’m a big fan of reflection. At the end of every year I utilize a simple Compact Calendar created by David Seah to look back at the year and sketch out, week by week, what happened. By looking back, we can chart the direction that we’ve been traveling, with or, more often, without our conscious knowledge. In DevOps we often say that it’s all about “velocity” and “velocity is speed with direction”, thus our reflections allow us an opportunity to course correct when we determine that the direction we’ve been traveling on doesn’t actually align with our desired course.

SysAdvent is, to me, like a SysAdmin year book. We can see the topics that were important to us during a given time, allowing for reflections in the years to come. There is, therefore, a certain responsibility upon us as we contribute.

Many great things have happened this year in tech for us to reflect on. I strongly feel that 2017 was the year in which containers finally proved that they are indeed here to stay. The technology itself was never really in question, but a massive ecosystem needed to grow up around the fundamental technology to make it truly feasible. Kubernetes; Service Meshes powered by intelligent and lightweight reverse proxies like Envoy; Serverless; application-centric monitoring and metrics empowered by Prometheus; major changes in SD-WAN and NFV that will power Hybrid Cloud and reliance on telecom networks in a way never before seen; and on and on: so many technologies have matured this year and begun to shape a new era in the way we operate services. And while all these things are important, there is a topic far more important, and indeed less enjoyable, which must be addressed as we reflect.

The biggest event of the year in tech was arguably the massive Equifax breach. Equifax made a lot of mistakes in a variety of ways, but if you are reading this you likely know how the IT and Operations functions are managed and run inside most companies. I doubt it would take much effort for you to examine the few facts we have about what happened at Equifax and build your own mental picture of what it was like before and during the breach. Tell me honestly, could this have been prevented? We all know that it could have been. Tell me honestly, is your organization any better? I’m willing to bet it is not. Yes, Equifax Inc. took the blame, and certainly there is a very real systemic problem there. But while the ultimate responsibility lies with the leadership of the company, the problem occurred because the SysAdmins, our brothers and sisters, failed to prevent it.

If we’re not stopping to reflect on our part in these massive breaches, then we’re frankly unqualified to do this type of work. When you hear of these breaches, do you get a chill down your spine with fear that you too are vulnerable? That you could be next? And what do you do in response? My guess is very little. There may be a short burst of focused attention on security, but after we’ve worked off enough of our guilt we tend back towards business as usual.

In 2011 when I delivered my LISA keynote “The DevOps Transformation”, I tried to make a case for a new level of professionalism within Systems Administration by, among other things, casting DevOps not as any one specific thing but rather as a “banner for change”, dispelling a general fear of standards and best practices such as ITIL and COBIT, and connecting Systems Administration with the greater tradition of Operations Management over the previous 100 years. I have also predicted elsewhere that tech will become a regulated industry by 2030. We are seeing events nearly every month which make me more and more convinced of these positions. The level of professionalism and accountability in our trade must rise significantly. With no credible industry governing bodies to drive this transformation, it must happen culturally. DevOps was that cultural change, but the sheer magnitude of the changes that have needed to take place has largely overshadowed this meta-theme.

I firmly believe that every operations team, of any size, should be externally audited for SOC 2 Type II compliance with, at a minimum, the Security Principle. Furthermore, the European Union’s General Data Protection Regulation (GDPR), while it may have its faults, is a quantum leap forward for both the rights of individuals and the accountability of corporations. We can no longer afford to remain cynical geeks belittling such reasonable advances as a PITA. If you’re unsure of where to start, consider the Cloud Security Alliance’s STAR Self-Assessment.

Above all, Systems Administrators must step up and take a more active role in systems engineering and architecture. SysAdmins don’t create business software; we assemble software to provide value. That is, SysAdmins engineer and architect solutions as integrators, creating, maintaining, and protecting business value. All too often we feel lucky just to keep up with last year’s trends and handle the day-to-day incidents and maintenance tasks.

The solution isn’t to hit the brakes on innovation, lock everything down and focus just on security and privacy. Rather, we have a variety of innovations that give administrators increasing levels of sophistication and capability. Docker and Kubernetes allow for very small surface areas and a more testable and deployable solution set than we’ve ever had before. Hashicorp’s Vault and Consul allow for security capabilities that were only a dream just a couple of years ago.

One concept that particularly excites me is the Kubernetes Operator, which “represents human operational knowledge in software to reliably manage an application.” While this isn’t a terribly new concept, it’s never looked so achievable as it does with Kubernetes. It drives towards the famous old SysAdmin mantra of “automate yourself out of a job.” Clearly, that never happens, but extending Operators with an increasing body of knowledge, rather than creating text documents and procedures, is clearly the way of the future, particularly when Kubernetes gives us a rich API and a consistent deployment model.

And so, as we come to the end of 2017, I’d strongly encourage you to grab a Compact Calendar and reflect on your year to examine your velocity. How fast did you really go? What direction did you really go? And, above all, did you make the most important things the most important things? As we close out the year, let’s use all these amazing new innovations to serve the goal of providing real value to real people. We are in the best possible position to advocate on their behalf and create real change to safeguard the rights to privacy and security that all people should enjoy.

December 19, 2017

Day 19 - Infrastructure Testing: Sanity Checking VPC Routes

By: Dan Stark (@danstarkdevops)
Edited By: James Wen

Testing Infrastructure is hard; testing VPCs is even harder

Infrastructure testing is hard. My entire career I’ve tried to bring traditional development testing practices into operations. Linters. Rspec. Mock objects. These tools provide semantic and syntax checking, as well as unit and integration level coverage for infrastructure as code. Ideally, we would also test the system after the code is deployed. End-to-end infrastructure testing has always been a stretch goal – too time-consuming to implement from scratch. This is especially true of network level testing. I am not aware of any existing tools that provide self-contained, end-to-end tests to ensure VPCs, subnets, and route tables are properly configured. As a result, production network deployments can be incredibly anxiety-inducing. Recently, my coworkers and I set up an entire VPC (virtual private cloud) using infrastructure as code, but felt we needed a tool that could perform a VPC specification validation to catch bugs or typos before deployment. The goal was to sanity check our VPC routes using a real resource in every subnet.

Why is this necessary

A typical VPC architecture may contain multiple VPCs and include peering, internal/external subnets, NAT instances/gateways, and internet gateways. In the “Reliability Pillar” of their “Well-Architected Framework” whitepaper, AWS recommends designing your system based on your availability needs. At Element 84, we desired 99.9% reliability for EC2, which required three external and three internal subnets with CIDR blocks of /20 or smaller. In addition, we needed nine of these redundant VPCs to provide the required network segregation. It took significant effort to carve out VPCs with dependent rules and resources across three availability zones.

Here is a hypothetical example of multiple VPCs (Dev, Stage, Prod) over two regions:

[image 1: multiple VPCs (Dev, Stage, Prod) over two regions]

Extending this example with additional VPCs for bastion hosts, reporting, and demilitarized zones (DMZs) for contractors, both NonProd and Prod:

[image 2: the same layout extended with bastion, reporting, and DMZ VPCs]

It’s too easy for a human to make a mistake, even with infrastructure as code.

Managing VPC infrastructure

Here’s one example of how we want a VPC to behave:

We want a Utility VPC peered to a Staging VPC. The Utility VPC contains bastion EC2 instances living in external subnets, and the Staging VPC contains application EC2 instances in internal subnets. We want to test and ensure the connectivity between these resources. Also, we want to verify that every bastion EC2 instance can communicate with all the potential application EC2 instances, across all subnets. Additionally, we want to test connectivity to the external internet for the application EC2 instances, in this case via a NAT gateway.

These behaviors are well defined and should be tested. We decided to write a tool to help manage testing our VPCs and ensuring these kinds of behaviors. It contains:

  1. a maintainable top level DSL written in YAML to declare the VPC specification that sits above the VPC configuration code; and
  2. a mechanism to be able to test the network level connectivity between VPCs, subnets, IGW/NAT and report any problems.

Introducing: VpcSpecValidator

This project is “VpcSpecValidator,” a Python 3 library built on top of boto3.

There are a few requirements in how you deploy your VPCs to use this library:

  1. You must have deployed your VPCs with CloudFormation and have Outputs for each subnet containing the strings “Private” or “Public” and “Subnet”, e.g. “DevPrivateSubnetAZ1A”, “DevPrivateSubnetAZ1B”, “DevPrivateSubnetAZ1C.”
  2. All VPCs should be tagged with ‘Name’ tags in your region(s).
  3. You must ensure that a security group attached to these instances allows SSH access between your VPC peers. This is not recommended for production, so you may want to remove these rules after testing.
  4. You must have permissions to create/destroy EC2 instances for complete setup and teardown. The destroy method has multiple guards to prevent you from accidentally deleting EC2 instances not created by this project.

You supply a YAML configuration file to outline your VPCs’ structure. Using our example above, this would look like:

project_name: mycompany
region: us-east-1
availability_zones:
  - us-east-1a
  - us-east-1b
  - us-east-1c

# Environment specification
dev:
  peers:
    - nonprod-util
    - nonprod-reporting
  us-east-1a:
    public: 172.16.0.0/23
    private: 172.16.6.0/23
  us-east-1b:
    public: 172.16.2.0/23
    private: 172.16.8.0/23
  us-east-1c:
    public: 172.16.4.0/23
    private: 172.16.10.0/23

stage:
  peers:
    - nonprod-util
    - nonprod-reporting
  us-east-1a:
    public: 172.17.0.0/23
    private: 172.17.6.0/23
  us-east-1b:
    public: 172.17.2.0/23
    private: 172.17.8.0/23
  us-east-1c:
    public: 172.17.4.0/23
    private: 172.17.10.0/23

nonprod-util:
  peers:
    - dev
    - stage
  us-east-1a:
    public: 172.19.0.0/23
    private: 172.19.6.0/23
  us-east-1b:
    public: 172.19.2.0/23
    private: 172.19.8.0/23
  us-east-1c:
    public: 172.19.4.0/23
    private: 172.19.10.0/23

nonprod-reporting:
  peers:
    - dev
    - stage
  us-east-1a:
    public: 172.20.0.0/23
    private: 172.20.6.0/23
  us-east-1b:
    public: 172.20.2.0/23
    private: 172.20.8.0/23
  us-east-1c:
    public: 172.20.4.0/23
    private: 172.20.10.0/23

prod:
  peers:
    - prod-util
  us-east-1a:
    public: 172.18.0.0/23
    private: 172.18.6.0/23
  us-east-1b:
    public: 172.18.2.0/23
    private: 172.18.8.0/23
  us-east-1c:
    public: 172.18.4.0/23
    private: 172.18.10.0/23

prod-util:
  peers:
    - prod
  us-east-1a:
    public: 172.19.208.0/23
    private: 172.19.214.0/23
  us-east-1b:
    public: 172.19.210.0/23
    private: 172.19.216.0/23
  us-east-1c:
    public: 172.19.212.0/23
    private: 172.19.218.0/23

prod-reporting:
  peers:
    - prod
  us-east-1a:
    public: 172.20.208.0/23
    private: 172.20.214.0/23
  us-east-1b:
    public: 172.20.210.0/23
    private: 172.20.216.0/23
  us-east-1c:
    public: 172.20.212.0/23
    private: 172.20.218.0/23

The code will:

  1. parse the YAML for a user-specified VPC,
  2. get the public or private subnets associated with each Availability Zone’s CIDR range in that VPC,
  3. launch an instance in those subnets,
  4. identify the peering VPC(s),
  5. create a list of the test instances in the peer’s subnets (public or private, depending on what was specified in step 2),
  6. attempt a TCP socket connection using the private IP and port 22 for each instance in this list.

Step 5 posed an interesting deployment challenge. We decided UserData was a good option to bootstrap and clone the repo on an EC2 instance, but did not know how to pass it the peered VPCs’ private IP addresses as SSH targets.

Given the entire specification is in one file and the CIDR ranges are available, we can cheat and look at the Outputs of the peer(s)’ CloudFormation stack and see if any instances created in Step 3 match.

def get_ip_of_peer_instances_and_write_to_settings_file(self):

    '''
    This is run on the source EC2 instance as part of UserData bootstrapping
    1) Look at the peer(s)' VPC CloudFormation Stack's Outputs for a list of subnets, public or private as defined in the constructor.
    2) Find instances in those subnets created by this library
    3) Get the Private IP address of target instances and write it to a local configuration file
    '''
        
    # Query for peer CloudFormation, get instances
    target_subnet_list = []
    target_ip_list = []
    with open(self.config_file_path, 'r') as ymlfile:
        cfg = yaml.load(ymlfile)
    
    for peer in self.peers_list:
        peer_stack_name = "{}-vpc-{}-{}".format(self.project_name, peer, cfg['region'])
    
        # Look at each peer's CloudFormation Stack Outputs and get a list of subnets (public or private)
        client = boto3.client('cloudformation')
        response = client.describe_stacks(StackName=peer_stack_name)
        response_outputs = response['Stacks'][0]['Outputs']
    
        for i in range(0,len(response_outputs)):
            if self.subnet_type == 'public':
                if 'Subnet' in response_outputs[i]['OutputKey'] and 'Public' in \
                        response_outputs[i]['OutputKey']:
                    subnet_id = response_outputs[i]['OutputValue']
                    target_subnet_list.append(subnet_id)
    
            else:
                if 'Subnet' in response_outputs[i]['OutputKey'] and 'Private' in \
                        response_outputs[i]['OutputKey']:
                    subnet_id = response_outputs[i]['OutputValue']
                    target_subnet_list.append(subnet_id)
    
    
        # Search the instances in the targeted subnets for a Name tag of VpcSpecValidator
        client = boto3.client('ec2')
        describe_response = client.describe_instances(
            Filters=[{
                'Name': 'tag:Name',
                'Values': ['VpcSpecValidator-test-runner-{}-*'.format(peer)]
            }]
        )
    
        # Get Private IP addresses of these instances and write them to target_ip_list.settings
    
        for i in range(0,len(describe_response['Reservations'])):
            target_ip_list.append(describe_response['Reservations'][i]['Instances'][0]['PrivateIpAddress'])
    
        # Write the list to a configuration file used at runtime for EC2 instance
        with open('config/env_settings/target_ip_list.settings', 'w') as settings_file:
            settings_file.write(str(target_ip_list))
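The connectivity test in step 6 then amounts to the moral equivalent of probing port 22 on each address in target_ip_list.settings from the source instance, much as you might check a single peer by hand (the IP here is made up):

# Does anything answer on ssh at the peer's private address within five seconds?
nc -z -w 5 172.16.6.25 22 && echo reachable || echo unreachable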

There is also a friendly method to ensure that the YAML specification matches what is actually deployed via CloudFormation templates.

spec = VpcSpecValidator('dev', subnet_type='private')

spec.does_spec_match_cloudformation()

Next Steps

At Element 84, we believe our work benefits our world, and open source is one way to personify this value. We’re in the process of open sourcing this library at the moment. Please check back soon and we’ll update this blog post with a link. We will also post a link to the repo on our company Twitter.

Future features we would like to add:

  • Make the VpcSpecValidator integration/usage requirements less strict.
  • Add methods to test internet connectivity.
  • Dynamic EC2 key pair generation/destruction. The key pairs should be unique and thrown away after the test.
  • Compatibility with Terraform.
  • CI as a first-class citizen by aggregating results in JUnit-compatible format. Although I think it would be overkill to run these tests with every application code commit, it may make sense for infrastructure commits or running on a schedule.

Wrap Up - Testing VPCs is difficult but important

Many businesses use one big, poorly defined (default) VPC. There are a few problems with this:

Development resources can impact production

At a fundamental level, you want to have as many barriers between development and production environments as possible. This isn’t necessarily to stop developers from caring about production. As an operator, I want to prevent my developers from being in a position where they might unintentionally impact production. In addition to security group restrictions, make these potential mishaps impossible from a network perspective. To steal an idea from Google Cloud Platform, we want to establish “layers of security.” This tool helps to enforce these paradigms by validating VPC behavior prior to deployment.

Well-defined and well-tested architecture is necessary for scaling

This exercise forced our team to think about our architecture and its future. What are the dependencies as they sit today? How would we scale to multi-region? What about third party access? We would want them in a DMZ yet still able to get the information they need. How big do we expect these VPCs to scale?

It’s critical to catch these issues before anything is deployed

The best time to find typos, configuration mistakes, and logic errors is before the networking is in use. Once deployed, these are hard errors to troubleshoot because of the built-in redundancy. The goal is to prevent an autoscaling event from yielding a “how was this ever working” alarm at 3 AM because one subnet’s route table is misconfigured. That’s why we feel a tool like this has a place in the community. Feel free to add comments and voice a +1 in support.

Happy SysAdvent!