By: Eric Sigler (@esigler)
Edited By: Michelle Carroll (@miiiiiche)
Introduction
At some point you may find yourself wanting to run work on multiple infrastructure providers — for reliability against certain kinds of failures, to take advantage of lower costs in capacity between providers during certain times, or for any other reason specific to your infrastructure. This used to be a very frustrating problem, as you’d be restricted to a “lowest common denominator” set of tools, or have to build up your own infrastructure primitives across multiple providers. With Kubernetes, we have a new, more sophisticated set of tools to apply to this problem.
Today we’re going to walk through how to set up multiple Kubernetes clusters on different infrastructure providers (specifically Google Cloud Platform and Amazon Web Services), and then connect them together using federation. Then we’ll go over how you can submit a batch job task to this infrastructure, and have it run wherever there’s available capacity. Finally, we’ll wrap up with how to clean up from this tutorial.
Overview
Unfortunately, there isn’t a one-step “make me a bunch of federated Kubernetes clusters” button. Instead, we’ve got several parts we’ll need to take care of:
- Have all of the prerequisites in place.
- Create a work cluster in AWS.
- Create a work cluster in GCE.
- Create a host cluster for the federation control plane in AWS.
- Join the work clusters to the federation control plane.
- Configure all clusters to correctly process batch jobs.
- Submit an example batch job to test everything.
Disclaimers
- KubeCon is the first week of December, and Kubernetes 1.9.0 is likely to be released the second week of December, which means this tutorial may go stale quickly. I’ll try to call out what is likely to change, but if you’re reading this and it’s any time after December 2017, caveat emptor.
- This is not the only way to set up Kubernetes (and federation). One of the two work clusters could be used for the federation control plane, and having a Kubernetes cluster with only one node is bad for reliability. A final example is that kops is a fantastic tool for managing Kubernetes cluster state, but production infrastructure state management often has additional complexity.
- All of the various CLI tools involved (gcloud, aws, kube*, and kops) have really useful environment variables and configuration files that can decrease the verbosity needed to execute commands. I’m going to avoid many of those in favor of being more explicit in this tutorial, and initialize the rest at the beginning of the setup (there’s a small sketch of the environment-variable approach right after this list).
- This tutorial is based on information from the Kubernetes federation documentation and the kops Getting Started documentation for AWS and GCE wherever possible. When in doubt, there’s always the source code on GitHub.
- The free tiers of each platform won’t cover all the costs of going through this tutorial, and there are instructions at the end for how to clean up so that you shouldn’t incur unplanned expense — but always double check your accounts to be sure!
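As a quick illustration of the environment-variable approach mentioned above (a sketch only; the rest of this tutorial keeps the explicit flags), kops reads its state-store location from KOPS_STATE_STORE, which would let you drop the --state flag from the kops commands below:
export KOPS_STATE_STORE="s3://put-your-unique-bucket-name-here"
kops get clusters   # no --state flag needed once KOPS_STATE_STORE is set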
Setting up federated Kubernetes clusters on AWS and GCE
Part 1: Take care of the prerequisites
- Sign up for accounts on AWS and GCE.
- Install the AWS Command Line Interface: brew install awscli.
- Install the Google Cloud SDK.
- Install the Kubernetes command line tools: brew install kubernetes-cli kubectl kops.
- Install the kubefed binary from the appropriate tarball for your system.
- Make sure you have an SSH key, or generate a new one.
- Use credentials that have sufficient access to create resources in both AWS and GCE. You can use something like IAM accounts.
- Have appropriate domain names registered, and a DNS zone configured, for each provider you’re using (Route53 for AWS, Cloud DNS for GCP). I will use “example.com” below; note that you’ll need to keep track of the appropriate records. (There are a few quick commands to sanity-check these prerequisites right after this list.)
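If you want to sanity-check the credentials and DNS setup before creating anything, a few read-only commands along these lines should do it (assuming both CLIs are already authenticated):
aws sts get-caller-identity     # shows which AWS credentials are active
aws route53 list-hosted-zones   # should list the zone(s) you configured for this tutorial
gcloud auth list                # shows which Google account is active
gcloud dns managed-zones list   # should list the zone(s) you configured for this tutorial
ssh-keygen -t rsa               # only if you need a new key; kops uses ~/.ssh/id_rsa.pub by default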
Finally, you’ll need to pick a few unique bucket names in order to run the steps below. Here are the environment variables that you will need to set beforehand:
export S3_BUCKET_NAME="put-your-unique-bucket-name-here"
export GS_BUCKET_NAME="put-your-unique-bucket-name-here"
Part 2: Set up the work cluster in AWS
To begin, you’ll need to set up the persistent storage that kops will use for the AWS work cluster:
aws s3api create-bucket --bucket $S3_BUCKET_NAME
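Optionally, enable versioning on the bucket as well; the kops documentation recommends it so you can recover a previous version of your cluster state if needed:
aws s3api put-bucket-versioning --bucket $S3_BUCKET_NAME --versioning-configuration Status=Enabled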
Then, it’s time to create the configuration for the cluster:
kops create cluster \
--name="aws.example.com" \
--dns-zone="aws.example.com" \
--zones="us-east-1a" \
--master-size="t2.medium" \
--node-size="t2.medium" \
--node-count="1" \
--state="s3://$S3_BUCKET_NAME" \
--kubernetes-version="1.8.0" \
--cloud=aws
If you want to review the configuration, use kops edit cluster aws.example.com --state="s3://$S3_BUCKET_NAME". When you’re ready to proceed, provision the AWS work cluster by running:
kops update cluster aws.example.com --yes --state="s3://$S3_BUCKET_NAME"
Wait until kubectl get nodes --show-labels --context=aws.example.com shows the NODE role as Ready (it should take 3–5 minutes). Congratulations, you have your first (of three) Kubernetes clusters ready!
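If you prefer to let kops do the checking, kops validate cluster reports whether the master and nodes are up and healthy; something like:
kops validate cluster aws.example.com --state="s3://$S3_BUCKET_NAME"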
Part 3: Set up the work cluster in GCE
OK, now we’re going to do a very similar set of steps for our second work cluster, this one on GCE. First though, we need to have a few extra environment variables set:
export PROJECT=`gcloud config get-value project`
export KOPS_FEATURE_FLAGS=AlphaAllowGCE
As the documentation points out, using kops with GCE is still considered alpha. To keep each cluster using vendor-specific tools, let’s set up state storage for the GCE work cluster using Google Storage:
gsutil mb gs://$GS_BUCKET_NAME/
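As with the S3 bucket, you can optionally turn on object versioning for this state bucket too:
gsutil versioning set on gs://$GS_BUCKET_NAME/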
Now it’s time to generate the configuration for the GCE work cluster:
kops create cluster \
--name="gcp.example.com" \
--dns-zone="gcp.example.com" \
--zones="us-east1-b" \
--state="gs://$GS_BUCKET_NAME/" \
--project="$PROJECT" \
--kubernetes-version="1.8.0" \
--cloud=gce
As before, use kops edit cluster gcp.example.com --state="gs://$GS_BUCKET_NAME/" to peruse the configuration. When ready, provision the GCE work cluster by running:
kops update cluster gcp.example.com --yes --state="gs://$GS_BUCKET_NAME/"
And once kubectl get nodes --show-labels --context=gcp.example.com shows the NODE role as Ready, your second work cluster is complete!
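If you’re curious what was actually provisioned, the instances kops created should show up in your project:
gcloud compute instances list --project="$PROJECT"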
Part 4: Set up the host cluster
It’s useful to have a separate cluster that hosts the federation control plane. In production, this isolation makes it easier to reason about failure modes for different components; in this tutorial, it also makes it clearer which cluster is doing what work.
In this case, we can use the existing S3 bucket we’ve previously created to hold the configuration for our second AWS cluster — no additional S3 bucket needed! Let’s generate the configuration for the host cluster, which will run the federation control plane:
kops create cluster \
--name="host.example.com" \
--dns-zone="host.example.com" \
--zones=us-east-1b \
--master-size="t2.medium" \
--node-size="t2.medium" \
--node-count="1" \
--state="s3://$S3_BUCKET_NAME" \
--kubernetes-version="1.8.0" \
--cloud=aws
Once you’re ready, run this command to provision the cluster:
kops update cluster host.example.com --yes --state="s3://$S3_BUCKET_NAME"
And one last time, wait until kubectl get nodes --show-labels --context=host.example.com shows the NODE role as Ready.
Part 5: Set up the federation control plane
Now that we have all of the pieces we need to do work across multiple providers, let’s connect them together using federation. First, add aliases for each of the clusters:
kubectl config set-context aws --cluster=aws.example.com --user=aws.example.com
kubectl config set-context gcp --cluster=gcp.example.com --user=gcp.example.com
kubectl config set-context host --cluster=host.example.com --user=host.example.com
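You can confirm the aliases exist (alongside the original contexts kops created) with:
kubectl config get-contexts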
Next up, we use the kubefed command to initialize the control plane on the host cluster (this also adds a “fed” context to your kubeconfig):
kubectl config use-context host
kubefed init fed --host-cluster-context=host --dns-provider=aws-route53 --dns-zone-name="example.com"
If kubefed init sits at “Waiting for federation control plane to come up” for an unreasonably long time, you can check the underlying pods for any issues by running:
kubectl get all --context=host.example.com --namespace=federation-system
kubectl describe po/fed-controller-manager-EXAMPLE-ID --context=host.example.com --namespace=federation-system
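The controller manager’s logs are also worth a look if the pods are running but the control plane still isn’t coming up:
kubectl logs fed-controller-manager-EXAMPLE-ID --context=host.example.com --namespace=federation-system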
Once you see “Federation API server is running,” we can join the work clusters to the federation control plane:
kubectl config use-context fed
kubefed join aws --host-cluster-context=host --cluster-context=aws
kubefed join gcp --host-cluster-context=host --cluster-context=gcp
kubectl --context=fed create namespace default
To confirm everything’s working, you should see the aws and gcp clusters when you run:
kubectl --context=fed get clusters
Part 6: Set up the batch job API
(Note: This is likely to change as Kubernetes evolves; this was tested on 1.8.0.) We’ll need to edit the federation API server in the control plane and enable the batch job API. First, let’s edit the deployment for the fed-apiserver:
kubectl --context=host --namespace=federation-system edit deploy/fed-apiserver
And within the federation-apiserver section of the configuration, add a --runtime-config=batch/v1 line, like so:
containers:
- command:
  - /hyperkube
  - federation-apiserver
  - --admission-control=NamespaceLifecycle
  - --bind-address=0.0.0.0
  - --client-ca-file=/etc/federation/apiserver/ca.crt
  - --etcd-servers=http://localhost:2379
  - --secure-port=8443
  - --tls-cert-file=/etc/federation/apiserver/server.crt
  - --tls-private-key-file=/etc/federation/apiserver/server.key
  ...
Add the line:
  - --runtime-config=batch/v1
Then restart the Federation API Server and Cluster Manager pods by rebooting the node running them. Watch kubectl get all --context=host --namespace=federation-system if you want to see the various components change state. You can verify that the change was applied by running the following Python code:
# Sample code from the Kubernetes Python client
from kubernetes import client, config


def main():
    # Uses the current kubeconfig context (still "fed" from Part 5),
    # so this queries the federation API server.
    config.load_kube_config()

    print("Supported APIs (* is preferred version):")
    print("%-20s %s" %
          ("core", ",".join(client.CoreApi().get_api_versions().versions)))
    for api in client.ApisApi().get_api_versions().groups:
        versions = []
        for v in api.versions:
            name = ""
            if v.version == api.preferred_version.version and len(
                    api.versions) > 1:
                name += "*"
            name += v.version
            versions.append(name)
        print("%-40s %s" % (api.name, ",".join(versions)))


if __name__ == '__main__':
    main()
You should see output from that Python script that looks something like:
> python api.py
Supported APIs (* is preferred version):
core v1
federation v1beta1
extensions v1beta1
batch v1
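If you’d rather skip the Python, a rough equivalent is to ask kubectl for the API groups directly; batch/v1 should show up in the list once the flag change has taken effect:
kubectl --context=fed api-versions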
Part 7: Submitting an example job
Following along from the Kubernetes batch job documentation, create a file, pi.yaml, with the following contents:
apiVersion: batch/v1
kind: Job
metadata:
  generateName: pi-
spec:
  template:
    metadata:
      name: pi
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
This job spec:
- Runs a single container to generate the first 2,000 digits of Pi.
- Uses a generateName, so you can submit it multiple times (each time it will have a different name).
- Sets restartPolicy: Never, but OnFailure is also allowed for batch jobs.
- Sets backoffLimit. This generates a parse violation in 1.8, so we have to disable validation.
Now you can submit the job, and follow it across your federated set of Kubernetes clusters. First, at the federated control plane level, submit and see which work cluster it lands on:
kubectl --validate=false --context=fed create -f ./pi.yaml
kubectl --context=fed get jobs
kubectl --context=fed describe job/<JOB IDENTIFIER>
Then (assuming it’s the AWS cluster — if not, switch the context below), dive in deeper to see how the job finished:
kubectl --context=aws get jobs
kubectl --context=aws describe job/<JOB IDENTIFIER>
kubectl --context=aws get pods
kubectl --context=aws describe pod/<POD IDENTIFIER>
kubectl --context=aws logs <POD IDENTIFIER>
If all went well, you should see the output from the job. Congratulations!
Cleaning up
Once you’re done trying out this demonstration cluster, you can clean up all of the resources you created by running:
kops delete cluster gcp.example.com --yes --state="gs://$GS_BUCKET_NAME/"
kops delete cluster aws.example.com --yes --state="s3://$S3_BUCKET_NAME"
kops delete cluster host.example.com --yes --state="s3://$S3_BUCKET_NAME"
Don’t forget to verify in the AWS and GCE console that everything was removed, to avoid any unexpected expenses.
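kops won’t remove the state buckets themselves. If you created them only for this tutorial, you can delete them once the cluster deletions have finished (this permanently removes the stored state; if you enabled versioning on the S3 bucket, you may also need to clear old object versions, which is easiest from the console):
aws s3 rb "s3://$S3_BUCKET_NAME" --force
gsutil rm -r "gs://$GS_BUCKET_NAME/"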
Conclusion
Kubernetes provides a tremendous amount of infrastructure flexibility to everyone involved in developing and operating software. There are many different applications for federated Kubernetes clusters, including:
- Using Cluster Autoscaler or kubernetes-ec2-autoscaler in combination with spot pricing to run workloads at the lowest cost.
- Leveraging the recently added GPU scheduling capability and Node Affinity rules to assign jobs to wherever there’s available GPU capacity.
- Having fault isolation across geographic regions and/or infrastructure providers, lower latency to your end users, and much more!
Good luck to you in whatever your Kubernetes design patterns may be, and happy SysAdvent!