
December 16, 2021

Day 16 - Setting up k3s in your home lab

By: Joe Block (@curiousbiped)
Edited by: Jennifer Davis (@sigje)

Background

Compute, even at home with consumer-grade hardware, has gotten ridiculously cheap. You can get a quad-core ARM machine with 4GB of RAM, like a Raspberry Pi 4, for under $150 including power supply and SD card for booting - and it'll idle at less than 5 watts of power draw and be completely silent because it is fanless.

What we're going to do

In this post, I'll show you how to set up a Kubernetes cluster on a cheap ARM board (or an x86 box if you prefer) using k3s and k3sup so you can learn Kubernetes without breaking an environment in use.

These instructions will also work on x86 machines, so you can repurpose that old hardware instead of buying a new Raspberry Pi.

Why k3s?

k3s was created by Rancher as a lightweight, easy to install, and secure Kubernetes option.

It's packaged as a single ~40MB binary that reduces the dependencies needed to get a cluster up and running. It even includes an embedded containerd, so you don't need to install that or docker. The ARM64 and ARM7 architectures are fully supported, so it's perfect for running on a Raspberry Pi in a home lab environment.

Why k3sup?

Alex Ellis wrote k3sup, a great tool for bringing up k3s clusters, and we're going to use it in this post to simplify setting up a brand new cluster. With k3sup, we'll have a running Kubernetes cluster in less than ten minutes.

Let's get started!

Prerequisites

  • A spare linux box. I'll be using a Raspberry Pi for my examples, but you can follow along on an x86 linux box or VM if you prefer.
  • k3sup - download the latest release from k3sup/releases into a directory in your $PATH.

Set up your cluster.

In the following example, I'm assuming you've created a user for configuring the cluster (you can use the pi user on an rPi if you prefer; I used borg below), you've added your ssh public key to that user's ~/.ssh/authorized_keys, and that the user has sudo privileges. I'm also assuming you've downloaded k3sup and put it into /usr/local/bin, and that /usr/local/bin is in your $PATH.
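That one-time user prep can be sketched as follows, run as a user that already has sudo (e.g. pi). The borg name and the ed25519 key path are examples from this post, not requirements - adjust them to your setup:

```shell
# Create the cluster admin user (no password; SSH key only)
sudo adduser --disabled-password --gecos "" borg
# Install your public key for passwordless SSH
sudo mkdir -p /home/borg/.ssh
cat ~/.ssh/id_ed25519.pub | sudo tee /home/borg/.ssh/authorized_keys
sudo chown -R borg:borg /home/borg/.ssh
sudo chmod 700 /home/borg/.ssh
sudo chmod 600 /home/borg/.ssh/authorized_keys
# Grant passwordless sudo so k3sup can install k3s non-interactively
echo 'borg ALL=(ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/borg
sudo chmod 440 /etc/sudoers.d/borg
```

After this you should be able to `ssh borg@yourhost sudo true` without being prompted for a password, which is exactly what k3sup needs.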

Create the leader node

The first step is to create the leader node with the k3sup utility:


k3sup install --host $HOSTNAME --user pi

Below is the output from running it against my scratch rPi. In the scrollback you'll see that I'm using my borg account instead of the pi user - after setting up the rPi, the first thing I did was disable the well-known pi account. I also specify the path to an SSH key that is in the borg account's authorized_keys, and the borg account is configured to allow passwordless sudo.

Notice that I don't have to specify an architecture - k3sup automagically determines the architecture of the host and installs the correct binaries when it connects to the machine. All I have to do is tell it what host to connect to, what user to use, what ssh key, and whether I want to use the stable or latest k3s channels or a specific version.


❯ k3sup install --host cephalopod.example.com --user borg --ssh-key demo-key --k3s-channel stable
Running: k3sup install
2021/12/13 16:30:49 cephalopod.example.com
Public IP: cephalopod.example.com
[INFO]  Finding release for channel stable
[INFO]  Using v1.21.7+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.21.7+k3s1/sha256sum-arm64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.21.7+k3s1/k3s-arm64
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Skipping /usr/local/bin/ctr symlink to k3s, command exists in PATH at /usr/bin/ctr
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s
Result: [INFO]  Finding release for channel stable
[INFO]  Using v1.21.7+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.21.7+k3s1/sha256sum-arm64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.21.7+k3s1/k3s-arm64
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Skipping /usr/local/bin/ctr symlink to k3s, command exists in PATH at /usr/bin/ctr
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
[INFO]  systemd: Starting k3s
 Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.

Saving file to: /Users/jpb/democluster/kubeconfig

# Test your cluster with:
export KUBECONFIG=/Users/jpb/democluster/kubeconfig
kubectl config set-context default
kubectl get node -o wide

Test it out

Per the directions output by k3sup, you can now test your brand new cluster by setting the KUBECONFIG environment variable and then running kubectl to work with your new cluster.

My steps to verify my new cluster is up and running:

  1. export KUBECONFIG=/Users/jpb/democluster/kubeconfig
  2. kubectl config set-context default
  3. kubectl get node -o wide

And I see healthy output where the status shows Ready:



NAME         STATUS   ROLES                  AGE     VERSION        INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
cephalopod   Ready    control-plane,master   2m53s   v1.21.7+k3s1   10.1.2.3      <none>        Ubuntu 18.04.3 LTS   4.9.196-63       containerd://1.4.12-k3s1

And I can also look at pods in the cluster



❯ kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   coredns-7448499f4d-b2rdp                  1/1     Running     0          9m29s
kube-system   local-path-provisioner-5ff76fc89d-d9rrc   1/1     Running     0          9m29s
kube-system   metrics-server-86cbb8457f-cqk6q           1/1     Running     0          9m29s
kube-system   helm-install-traefik-crd-jgk2x            0/1     Completed   0          9m29s
kube-system   helm-install-traefik-l2j96                0/1     Completed   2          9m29s
kube-system   svclb-traefik-7tzzs                       2/2     Running     0          8m38s
kube-system   traefik-6b84f7cbc-92kkp                   1/1     Running     0          8m38s

Clean Up

k3s is tidy and easy to uninstall, so you can stand up a cluster on a machine, do some experimentation, then dispose of the cluster and have a clean slate for your next experiment. This makes it great for continuous integration!


Run sudo /usr/local/bin/k3s-uninstall.sh to shut down the node and delete /var/lib/rancher and the data stored there.

Next Steps

Learn Kubernetes! Some interesting tutorials that I recommend:

Finally, now that you've set up a cluster the easy way, if you want to understand everything k3sup did behind the scenes to get your Kubernetes cluster up and running, Kubernetes the Hard Way by Kelsey Hightower is a must-read.

December 24, 2017

Day 24 - On-premise Kubernetes with dynamic load balancing using rke, Helm and NGINX

By: Sebastiaan van Steenis (@svsteenis)

Edited By: Mike Ciavarella (@mxcia)

Containers are a great solution for consistent software deployments. When you start using containers in your environment, you and your team will quickly realise that you need a system which allows you to automate container operations. You need a system to keep your containers running when stuff breaks (which always happens - expect failure!), to scale up and down, and which is also extensible, so you can interact with it or build upon it to get the functionality you need. The most popular system for deploying, scaling and managing containerized applications is Kubernetes.

Kubernetes is a great piece of software. It includes all the functionality you'll initially need to deploy, scale and operate your containers, as well as more advanced options for customising exactly how your containers are managed. A number of companies provide managed Kubernetes-as-a-service, but there are still plenty of use-cases that need to run on bare metal to meet regulatory requirements, use existing investments in hardware, or for other reasons. In this post we will use Rancher Kubernetes Engine (rke) to deploy a Kubernetes cluster on any machine you prefer, install the NGINX ingress controller, and setup dynamic load balancing across containers, using that NGINX ingress controller.

Setting up Kubernetes

Let's briefly go through the Kubernetes components before we deploy them. You can use the picture below for visualisation. Thanks to Lucas Käldström (@kubernetesonarm) for creating this, used in his presentation at KubeCon.

Using rke, we can define 3 roles for our hosts:

  • control (controlplane)

    The controlplane consists of all the master components. In rke the etcd role is specified separately, but it can be placed on the same host as the controlplane. The API server is the frontend to your cluster, handling the API requests you run (for example, through the Kubernetes CLI client kubectl, which we talk about later). The controlplane also runs the controller manager, which is responsible for running controllers that execute routine tasks.

  • etcd

    The key-value store and the only component which has state, hence the term SSOT in the picture (Single Source of Truth). etcd needs quorum to operate; you can calculate quorum by using (n/2)+1, where n is the number of members (which are usually hosts). This means that for a production deployment, you would deploy at least 3 hosts with the etcd role. etcd will continue to function as long as it has quorum, so with 3 hosts with the etcd role you can have one host fail before you get into real trouble. Also make sure you have a backup plan for etcd.

  • worker

    A host with the worker role will be used to run the actual workloads. It will run the kubelet, which is basically the Kubernetes agent on a host. As one of its activities, kubelet will process the requested workload(s) for that host. Each worker will also run kube-proxy which is responsible for the networking rules and port forwarding. The container runtime we are using is Docker, and for this setup we'll be using the Flannel CNI plugin to handle the networking between all the deployed services on your cluster. Flannel will create an overlay network between the hosts, so that deployed containers can talk to each other.

For more information on Kubernetes components, see the Kubernetes documentation.
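The etcd quorum arithmetic above can be sketched directly in the shell (integer division, matching the (n/2)+1 formula):

```shell
# Quorum and fault tolerance for etcd clusters of various sizes.
# quorum = (n/2)+1 with integer division; a cluster survives n-quorum failures.
for n in 1 3 5; do
  quorum=$(( n / 2 + 1 ))
  failures=$(( n - quorum ))
  echo "members=$n quorum=$quorum tolerated_failures=$failures"
done
```

Note that an even member count buys you nothing: 4 members still tolerate only 1 failure, since quorum rises to 3. That is why etcd clusters are deployed with odd member counts.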

For this setup we'll be using 3 hosts: 1 host will be used as controlplane (master) and etcd (persistent data store) node, 1 will be used as worker for running containers, and 1 host will be used as worker and loadbalancer entrypoint for your cluster.

Hosts

You need at least OpenSSH server 7 installed on your hosts, so rke can use it to tunnel to the Docker socket. Please note that there is a known issue when connecting as the root user on RHEL/CentOS based systems; you should use another user on these systems.

SSH key authentication will be used to set up an SSH tunnel to the Docker socket, to launch the needed containers for Kubernetes to function. Tutorials on how to set this up can be found for both Linux and Windows.

Make sure you either have swap disabled on the host, or configure the following in cluster.yml for the kubelet (we will generate cluster.yml in the next step):

kubelet:
  image: rancher/k8s:v1.8.3-rancher2
  extra_args: {"fail-swap-on":"false"}

Docker

The hosts need to run Linux and use Docker version 1.12.6, 1.13.1 or 17.03.2. These are the Docker versions that are validated for Kubernetes 1.8, which we will be deploying. For easy installation of Docker, Rancher provides shell scripts to install a specific Docker version. For this setup we will be using 17.03.2 which you can install using (for other versions, see https://github.com/rancher/install-docker):

curl https://releases.rancher.com/install-docker/17.03.2.sh | sudo sh

If you are not using the root user to connect to the host, make sure the user you are using can access the Docker socket (/var/run/docker.sock) on the host. This can be achieved by adding the user to the docker group (e.g. by using sudo usermod -aG docker your_username). For complete instructions, see the Docker documentation.

Networking

The network ports that will be used by rke are port 22 (to all hosts, for SSH) and port 6443 (to the master node, Kubernetes API).
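If your hosts run a host firewall, those two ports need to be open. As a sketch using ufw (an assumption - adjust for firewalld or raw iptables if that's what your distribution uses):

```shell
# On every host: allow SSH so rke can tunnel to the Docker socket
sudo ufw allow 22/tcp
# On the master only: allow kubectl and the cluster components to reach the API server
sudo ufw allow 6443/tcp
sudo ufw reload
```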

rke

Note: in the examples we are using rke_darwin-amd64, which is the binary for macOS. If you are using Linux, replace it with rke_linux-amd64.

Before we can use rke, we need to get the latest rke release; at the moment this is v0.0.8-dev. Download rke v0.0.8-dev from the GitHub release page, and place it in a rke directory. We will be using this directory to create the cluster configuration file cluster.yml. Open a terminal, make sure you are in your rke directory (or that rke_darwin-amd64 is in your $PATH), and run ./rke_darwin-amd64 config. Pay close attention to specifying the correct SSH Private Key Path and the SSH User of each host:

$ ./rke_darwin-amd64 config
Cluster Level SSH Private Key Path [~/.ssh/id_rsa]:
Number of Hosts [3]: 3
SSH Address of host (1) [none]: IP_MASTER_HOST
SSH Private Key Path of host (IP_MASTER_HOST) [none]:
SSH Private Key of host (IP_MASTER_HOST) [none]:
SSH User of host (IP_MASTER_HOST) [ubuntu]: root
Is host (IP_MASTER_HOST) a control host (y/n)? [y]: y
Is host (IP_MASTER_HOST) a worker host (y/n)? [n]: n
Is host (IP_MASTER_HOST) an Etcd host (y/n)? [n]: y
Override Hostname of host (IP_MASTER_HOST) [none]:
Internal IP of host (IP_MASTER_HOST) [none]:
Docker socket path on host (IP_MASTER_HOST) [/var/run/docker.sock]:
SSH Address of host (2) [none]: IP_WORKER_HOST
SSH Private Key Path of host (IP_WORKER_HOST) [none]:
SSH Private Key of host (IP_WORKER_HOST) [none]:
SSH User of host (IP_WORKER_HOST) [ubuntu]: root
Is host (IP_WORKER_HOST) a control host (y/n)? [y]: n
Is host (IP_WORKER_HOST) a worker host (y/n)? [n]: y
Is host (IP_WORKER_HOST) an Etcd host (y/n)? [n]: n
Override Hostname of host (IP_WORKER_HOST) [none]:
Internal IP of host (IP_WORKER_HOST) [none]:
Docker socket path on host (IP_WORKER_HOST) [/var/run/docker.sock]:
SSH Address of host (3) [none]: IP_WORKER_LB_HOST
SSH Private Key Path of host (IP_WORKER_LB_HOST) [none]:
SSH Private Key of host (IP_WORKER_LB_HOST) [none]:
SSH User of host (IP_WORKER_LB_HOST) [ubuntu]: root
Is host (IP_WORKER_LB_HOST) a control host (y/n)? [y]: n
Is host (IP_WORKER_LB_HOST) a worker host (y/n)? [n]: y
Is host (IP_WORKER_LB_HOST) an Etcd host (y/n)? [n]: n
Override Hostname of host (IP_WORKER_LB_HOST) [none]:
Internal IP of host (IP_WORKER_LB_HOST) [none]:
Docker socket path on host (IP_WORKER_LB_HOST) [/var/run/docker.sock]:
Network Plugin Type [flannel]:
Authentication Strategy [x509]:
Etcd Docker Image [quay.io/coreos/etcd:latest]:
Kubernetes Docker image [rancher/k8s:v1.8.3-rancher2]:
Cluster domain [cluster.local]:
Service Cluster IP Range [10.233.0.0/18]:
Cluster Network CIDR [10.233.64.0/18]:
Cluster DNS Service IP [10.233.0.3]:
Infra Container image [gcr.io/google_containers/pause-amd64:3.0]:

This generates a cluster.yml file, which rke uses to set up the cluster. By default, Flannel is used as the CNI network plugin. To secure the Kubernetes components, rke generates certificates and configures the Kubernetes components to use them.

You can always check or edit the file (cluster.yml) if you made a typo or used the wrong IP address somewhere.

We are now ready to let rke create the cluster for us (specifying --config is only necessary when cluster.yml is not present in the same directory where you are running the rke command)

$ ./rke_darwin-amd64 up --config cluster.yml
INFO[0000] Building Kubernetes cluster
INFO[0000] [ssh] Setup tunnel for host [IP_MASTER_HOST]
INFO[0000] [ssh] Setup tunnel for host [IP_MASTER_HOST]
INFO[0000] [ssh] Setup tunnel for host [IP_WORKER_HOST]
INFO[0001] [ssh] Setup tunnel for host [IP_WORKER_LB_HOST]
INFO[0001] [certificates] Generating kubernetes certificates
INFO[0001] [certificates] Generating CA kubernetes certificates
INFO[0002] [certificates] Generating Kubernetes API server certificates
INFO[0002] [certificates] Generating Kube Controller certificates
INFO[0002] [certificates] Generating Kube Scheduler certificates
INFO[0002] [certificates] Generating Kube Proxy certificates
INFO[0003] [certificates] Generating Node certificate
INFO[0003] [certificates] Generating admin certificates and kubeconfig
INFO[0003] [reconcile] Reconciling cluster state
INFO[0003] [reconcile] This is newly generated cluster
...
INFO[0263] Finished building Kubernetes cluster successfully

All done! Your Kubernetes cluster is up and running in under 5 minutes, and most of that time was spent on pulling the needed Docker images.

kubectl

The most common way to interact with Kubernetes is using kubectl. After the cluster has been set up, rke generates a ready-to-use configuration file which you can use with kubectl, called .kube_config_cluster.yml. Before we can use the file, you will need to install kubectl. Please refer to the Kubernetes documentation on how to do this for your operating system.

Note: the Kubernetes documentation helps you to place the downloaded binary in a directory in your $PATH. The following commands are based on having kubectl in your PATH.
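For Linux, the install commands from the Kubernetes docs of this era look roughly like the sketch below (the download URL pattern is the documented one, but double-check the docs for your platform and architecture):

```shell
# Download the latest stable kubectl for Linux amd64 and put it on your PATH
curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/kubectl
```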

When you have kubectl installed, make sure you execute the commands in the rke directory (because we point to .kube_config_cluster.yml in that directory).

Now you can check the cluster by getting the node status:

$ kubectl --kubeconfig .kube_config_cluster.yml get nodes --show-labels
NAME              STATUS    ROLES         AGE       VERSION           LABELS
IP_MASTER_HOST   Ready     etcd,master   5m       v1.8.3-rancher1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=IP_MASTER_HOST,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true
IP_WORKER_HOST     Ready     worker        5m       v1.8.3-rancher1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=IP_WORKER_HOST,node-role.kubernetes.io/worker=true
IP_WORKER_LB_HOST     Ready     worker        5m       v1.8.3-rancher1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=IP_WORKER_LB_HOST,node-role.kubernetes.io/worker=true

Note: as a reference to each node, we will be using IP_MASTER_HOST, IP_WORKER_HOST and IP_WORKER_LB_HOST to identify the master, the worker, and the worker functioning as entrypoint (loadbalancer), respectively.

Our three-node cluster is ready to run some containers. In the beginning I noted that we are going to use one worker node as loadbalancer, but at this point we can't differentiate between the two worker nodes - both simply have the role worker. Let's make that possible by adding a label to the designated node:

$ kubectl --kubeconfig .kube_config_cluster.yml \
  label nodes IP_WORKER_LB_HOST role=loadbalancer
node "IP_WORKER_LB_HOST" labeled

Great, let's check if it was applied correctly:

$ kubectl --kubeconfig .kube_config_cluster.yml get nodes --show-labels
NAME              STATUS    ROLES         AGE       VERSION           LABELS
IP_MASTER_HOST   Ready     etcd,master   6m       v1.8.3-rancher1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=IP_MASTER_HOST,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true
IP_WORKER_HOST     Ready     worker        6m       v1.8.3-rancher1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=IP_WORKER_HOST,node-role.kubernetes.io/worker=true
IP_WORKER_LB_HOST     Ready     worker        6m       v1.8.3-rancher1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=IP_WORKER_LB_HOST,node-role.kubernetes.io/worker=true,role=loadbalancer

Note: If you mistakenly applied the label to the wrong host, you can remove it by adding a minus to the end of the label (e.g. kubectl --kubeconfig .kube_config_cluster.yml label nodes IP_WORKER_LB_HOST role=loadbalancer-)

Install and configure NGINX ingress controller

Helm

Helm is the package manager for Kubernetes, and allows you to easily install applications to your cluster. Helm uses charts to deploy applications; a chart is a collection of files that describe a related set of Kubernetes resources. Helm needs two components: a client (helm) and a server (tiller). Helm binaries are provided for all major platforms, download one and make sure it's available on your commandline (move it to a location in your $PATH). When installed correctly, you should be able to run helm help from the command line.

We bootstrap Helm by using the helm client to install tiller to the cluster. The helm command can use the same Kubernetes configuration file generated by rke. We tell helm which configuration to use by setting the KUBECONFIG environment variable as shown below:

$ cd rke
$ KUBECONFIG=.kube_config_cluster.yml helm init
Creating /homedirectory/username/.helm
Creating /homedirectory/username/.helm/repository
Creating /homedirectory/username/.helm/repository/cache
Creating /homedirectory/username/.helm/repository/local
Creating /homedirectory/username/.helm/plugins
Creating /homedirectory/username/.helm/starters
Creating /homedirectory/username/.helm/cache/archive
Creating /homedirectory/username/.helm/repository/repositories.yaml
Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com
Adding local repo with URL: http://127.0.0.1:8879/charts
$HELM_HOME has been configured at /homedirectory/username/.helm.

Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
Happy Helming!

Assuming all went well, we can now check if Tiller is running by asking for the running version. The Server line should return a version here, as the command queries the server-side component (Tiller). It may take a minute for Tiller to start.

$ KUBECONFIG=.kube_config_cluster.yml helm version    
Client: &version.Version{SemVer:"v2.7.2", GitCommit:"8478fb4fc723885b155c924d1c8c410b7a9444e6", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.7.2", GitCommit:"8478fb4fc723885b155c924d1c8c410b7a9444e6", GitTreeState:"clean"}

A little bit on Pods, Services and Service Types

Services enable us to use service discovery within a Kubernetes cluster. A Service provides an abstraction over one or more pods in your cluster. What is a pod? A pod is a set of one or more containers (usually Docker containers), with shared networking and storage. If you run pods on their own in your cluster, you usually end up with two problems:

  • Scale: When running a single pod, you don't have any redundancy. You want a mechanism which ensures that a given number of pods is running, and which can scale if needed. We will talk more on this when we deploy our demo application later on.
  • Accessibility: Which pods do you need to reach? (One static pod on one host is reachable, but what about pods that scale up and down, or get rescheduled?) And what IP address or name do you use to access the pod(s)?

By default, a Service has the type ClusterIP, which means it gets an internally accessible IP that you can use to reach your pods. The Service knows which pods to target by using a Label Selector, which specifies the pod labels to match.

Other service types are:

  • NodePort: expose the service on every host's IP, on a selected port or one randomly selected from the configured NodePort range (default: 30000-32767)
  • LoadBalancer: if a cloud provider is configured, this will request a loadbalancer from that cloud provider and configure it as entrypoint. Cloud providers include AWS, Azure and GCE, among others.
  • ExternalName: This makes it possible to configure a service to route to a predefined name outside the cluster by using a CNAME record in DNS.
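As an illustration of the difference, a NodePort Service changes only the type field compared to the default. The names below are placeholders for illustration, not part of this post's setup:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-svc
spec:
  type: NodePort        # expose on every node's IP
  ports:
  - port: 8080          # the Service port inside the cluster
    targetPort: 8080    # the container port the Service forwards to
    nodePort: 30080     # optional; if omitted, Kubernetes picks one from 30000-32767
  selector:
    app: example        # label selector: pods with this label receive traffic
```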

Installing NGINX ingress controller

As the NGINX ingress controller meets all of the criteria of the technical requirements, it resides in the stable directory of Helm charts. As noted before, we labeled one node as our point of entry by applying the role=loadbalancer label to that node. We'll pass that label to the Helm chart so the NGINX ingress controller gets placed on the correct node. By default, the NGINX ingress controller is created with service type LoadBalancer. Because we are assuming that you are running on-premise, this will not work: the LoadBalancer type provisions a loadbalancer from the configured cloud provider, which we didn't configure and which is usually not available in an on-premise setup. Because of this we will set the service type to ClusterIP using the --set controller.service.type=ClusterIP argument. Secondly, because we don't have an external loadbalancer to provide access to the services, we will configure the controller to use host networking. This way, the NGINX ingress controller will be reachable on the IP of the host. You can do so by setting controller.hostNetwork to true.

NOTE: Another option is to use NodePort, which will use a port from the cluster-defined range (default: 30000-32767). You can then use an external loadbalancer to balance traffic to this port on the node. For the simplicity of this post, I went with hostNetwork.

$ KUBECONFIG=.kube_config_cluster.yml helm install stable/nginx-ingress \
--name nginx-ingress --set controller.nodeSelector."role"=loadbalancer --set controller.service.type=ClusterIP --set controller.hostNetwork=true

Run the following command to check whether the deployment was successful:

$ kubectl --kubeconfig .kube_config_cluster.yml rollout \
  status deploy/nginx-ingress-nginx-ingress-controller
deployment "nginx-ingress-nginx-ingress-controller" successfully rolled out

By default, the NGINX ingress controller chart will also deploy a default backend which will return default backend - 404 when no hostname was matched to a service. Let's test if the default backend was deployed successfully:

# First we get the loadbalancer IP (IP of the host running the NGINX ingress controller) and save it to variable $LOADBALANCERIP
$ LOADBALANCERIP=`kubectl --kubeconfig .kube_config_cluster.yml get node -l role=loadbalancer -o jsonpath={.items[*].status.addresses[?\(@.type==\"InternalIP\"\)].address}`
# Now we can curl that IP to see if we get the correct response
$ curl $LOADBALANCERIP
default backend - 404

Excellent, we reached the NGINX ingress controller. As there are no services defined, we get routed to the default backend which returns a 404.

Setup wildcard DNS entry

For this post, I decided to make a single host the entrypoint to the cluster. We applied the label role=loadbalancer to this host, and used it to schedule the deployment of the NGINX ingress controller. Now you can point a wildcard DNS record (*.kubernetes.yourdomain.com for example) to this IP. This makes sure that the hostname we will use for our demo application ends up on the host running the NGINX ingress controller (our designated entrypoint). In DNS terminology this would be (in this example, $LOADBALANCERIP is 10.10.10.10):

*.kubernetes IN A 10.10.10.10

With this configured, you can try reaching the default backend by curling a host which resides under this wildcard record, i.e. dummy.kubernetes.yourdomain.com.

$ curl dummy.kubernetes.yourdomain.com
default backend - 404

Running and accessing the demo application

A little bit on ReplicaSet, Deployment and Ingress

Before we deploy our demo application, some explanation of the terminology is needed. Earlier, we talked about Services and service types to provide access to your pod or group of pods, and noted that running pods alone is not a failure-tolerant way of running your workload. To improve on this, we can use a ReplicaSet. The basic functionality of a ReplicaSet is to run a specified number of pods, which solves our problem of running single pods.

From the Kubernetes documentation:

While ReplicaSets can be used independently, today it’s mainly used by Deployments as a mechanism to orchestrate pod creation, deletion and updates. When you use Deployments you don’t have to worry about managing the ReplicaSets that they create. Deployments own and manage their ReplicaSets.

Deployments give us some other nice benefits, like checking the rollout status using kubectl rollout status. We will be using this when we deploy our demo application.

Last but not least, the Ingress. Usually, the components in your cluster will be for internal use. The components need to reach each other (web application, key value store, database) using the cluster network. But sometimes, you want to reach the cluster services from the outside (like our demo application). To make this possible, you need to deploy an Ingress definition. Deploying an Ingress definition without using an Ingress controller will give you limited functionality, that's why we deployed the NGINX ingress controller. By using the following key value under annotations, we make sure the NGINX ingress controller picks up our Ingress definition: kubernetes.io/ingress.class: "nginx"

Deploy demo application

For this post, we are using a simple web application. When you visit this web application, the UI will show you every container serving requests for this web application.

Let's create the files necessary to deploy our application. We'll be using a Deployment to create a ReplicaSet with 2 replicas, and a Service to link our Ingress to. Save the following as docker-demo.yml in the rke directory.

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: docker-demo-deployment
spec:
  selector:
    matchLabels:
      app: docker-demo
  replicas: 2
  template:
    metadata:
      labels:
        app: docker-demo
    spec:
      containers:
      - name: docker-demo
        image: ehazlett/docker-demo
        ports:
        - containerPort: 8080

---

apiVersion: v1
kind: Service
metadata:
  name: docker-demo-svc
spec:
  ports:
  - port: 8080
    targetPort: 8080
    protocol: TCP
  selector:
    app: docker-demo

Let's deploy this using kubectl:

$ kubectl --kubeconfig .kube_config_cluster.yml create -f docker-demo.yml
deployment "docker-demo-deployment" created
service "docker-demo-svc" created

Again, like in the previous deployment, we can query the deployment for its rollout status:

$ kubectl --kubeconfig .kube_config_cluster.yml rollout \
  status deploy/docker-demo-deployment
deployment "docker-demo-deployment" successfully rolled out

With this running, the web application is now accessible within the cluster. This is great when you need to connect web applications with backends like key-value stores, databases, etcetera. For now, we just want this web application to be available through our loadbalancer. As we've already deployed the NGINX ingress controller, we can now make our application accessible by using an Ingress resource. Let's create the ingress.yml file:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: docker-demo-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: docker-demo.kubernetes.yourdomain.com
    http:
      paths:
      - path: /
        backend:
          serviceName: docker-demo-svc
          servicePort: 8080

This is a fairly standard Ingress definition: we define a name and rules to access an application. We define the host that should be matched, docker-demo.kubernetes.yourdomain.com, and which path should route to which backend service on which port. The annotation kubernetes.io/ingress.class: "nginx" tells the NGINX ingress controller that this Ingress resource should be processed. When this Ingress is created, the NGINX ingress controller will see it and process the rules (in this case, create a "vhost" and point its backend/upstream to the pods in the ReplicaSet created by the Deployment). This means that, after creation, you should be able to reach this web application at http://docker-demo.kubernetes.yourdomain.com. Let's launch the ingress and find out:

$ kubectl --kubeconfig .kube_config_cluster.yml create -f ingress.yml
ingress "docker-demo-ingress" created

Check out the web application on http://docker-demo.kubernetes.yourdomain.com
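The host/path matching the controller performs for each request can be sketched in a few lines. This is a toy illustration only (the NGINX ingress controller actually generates native NGINX configuration); the rule data mirrors our ingress.yml:

```python
# Toy host/path router, illustrating what an ingress controller derives
# from Ingress rules. Illustrative sketch, not the real implementation.
rules = {
    "docker-demo.kubernetes.yourdomain.com": {"/": ("docker-demo-svc", 8080)},
}

def route(host, path):
    """Return the (service, port) backend for a request, or None if no rule matches."""
    paths = rules.get(host)
    if paths is None:
        return None
    # Longest matching path prefix wins, similar to NGINX location blocks.
    best = max((p for p in paths if path.startswith(p)), key=len, default=None)
    return paths[best] if best else None

print(route("docker-demo.kubernetes.yourdomain.com", "/index.html"))
# ('docker-demo-svc', 8080)
```

Requests whose Host header matches no rule fall through to the controller's default backend (typically a 404).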

Wrapping up

Is this a full-blown production setup? No; keep in mind that it will need some work. But hopefully you have gained a basic understanding of some of the core parts of Kubernetes: how to deploy Kubernetes using rke, how to use Helm, and the basics of the NGINX ingress controller. Let me give you some resources to continue the journey:

  • Try to scale your deployment to show more containers in the web application (kubectl scale -h)

  • rke supports HA, you can (and should) deploy multiple hosts with the controlplane and/or the etcd role.

  • Take a look at all the options of the NGINX ingress controller, see if it suits your needs

  • Explore how easy it is to use Let's Encrypt certificates on your ingresses by setting an extra annotation using kube-lego.

  • The NGINX ingress controller is a single point of failure (SPOF) now, explore how you can make this go away. Most companies use some kind of external loadbalancer which you could use for this.

  • Keep an eye on the Kubernetes Incubator Project external-dns, which can automatically create DNS records in supported providers.

  • To gain a deeper understanding of all the Kubernetes components, check out Kubernetes The Hard Way.

December 9, 2017

Day 9 - Using Kubernetes for multi-provider, multi-region batch jobs

By: Eric Sigler (@esigler)
Edited By: Michelle Carroll (@miiiiiche)

Introduction

At some point you may find yourself wanting to run work on multiple infrastructure providers — for reliability against certain kinds of failures, to take advantage of lower costs in capacity between providers during certain times, or for any other reason specific to your infrastructure. This used to be a very frustrating problem, as you’d be restricted to a “lowest common denominator” set of tools, or have to build up your own infrastructure primitives across multiple providers. With Kubernetes, we have a new, more sophisticated set of tools to apply to this problem.

Today we’re going to walk through how to set up multiple Kubernetes clusters on different infrastructure providers (specifically Google Cloud Platform and Amazon Web Services), and then connect them together using federation. Then we’ll go over how you can submit a batch job task to this infrastructure, and have it run wherever there’s available capacity. Finally, we’ll wrap up with how to clean up from this tutorial.

Overview

Unfortunately, there isn’t a one-step “make me a bunch of federated Kubernetes clusters” button. Instead, we’ve got several parts we’ll need to take care of:

  1. Have all of the prerequisites in place.
  2. Create a work cluster in AWS.
  3. Create a work cluster in GCE.
  4. Create a host cluster for the federation control plane in AWS.
  5. Join the work clusters to the federation control plane.
  6. Configure all clusters to correctly process batch jobs.
  7. Submit an example batch job to test everything.

Disclaimers

  1. Kubecon is the first week of December, and Kubernetes 1.9.0 is likely to be released the second week of December, which means this tutorial may go stale quickly. I’ll try to call out what is likely to change, but if you’re reading this and it’s any time after December 2017, caveat emptor.
  2. This is not the only way to set up Kubernetes (and federation). One of the two work clusters could be used for the federation control plane, and having a Kubernetes cluster with only one node is bad for reliability. A final example is that kops is a fantastic tool for managing Kubernetes cluster state, but production infrastructure state management often has additional complexity.
  3. All of the various CLI tools involved (gcloud, aws, kube*, and kops) have really useful environment variables and configuration files that can decrease the verbosity needed to execute commands. I’m going to avoid many of those in favor of being more explicit in this tutorial, and initialize the rest at the beginning of the setup.
  4. This tutorial is based off information from the Kubernetes federation documentation and kops Getting Started documentation for AWS and GCE wherever possible. When in doubt, there’s always the source code on GitHub.
  5. The free tiers of each platform won’t cover all the costs of going through this tutorial, and there are instructions at the end for how to clean up so that you shouldn’t incur unplanned expense — but always double check your accounts to be sure!

Setting up federated Kubernetes clusters on AWS and GCE

Part 1: Take care of the prerequisites

  1. Sign up for accounts on AWS and GCE.
  2. Install the AWS Command Line Interface - brew install awscli.
  3. Install the Google Cloud SDK.
  4. Install the Kubernetes command line tools - brew install kubernetes-cli kops (the kubernetes-cli formula provides kubectl)
  5. Install the kubefed binary from the appropriate tarball for your system.
  6. Make sure you have an SSH key, or generate a new one.
  7. Use credentials that have sufficient access to create resources in both AWS and GCE. You can use something like IAM accounts.
  8. Have appropriate domain names registered, and a DNS zone configured, for each provider you’re using (Route53 for AWS, Cloud DNS for GCP). I will use “example.com” below — note that you’ll need to keep track of the appropriate records.

Finally, you’ll need to pick a few unique names in order to run the below steps. Here are the environment variables that you will need to set beforehand:

export S3_BUCKET_NAME="put-your-unique-bucket-name-here"
export GS_BUCKET_NAME="put-your-unique-bucket-name-here"

Part 2: Set up the work cluster in AWS

To begin, you’ll need to set up the persistent storage that kops will use for the AWS work cluster:

aws s3api create-bucket --bucket $S3_BUCKET_NAME

Then, it’s time to create the configuration for the cluster:

kops create cluster \
 --name="aws.example.com" \
 --dns-zone="aws.example.com" \
 --zones="us-east-1a" \
 --master-size="t2.medium" \
 --node-size="t2.medium" \
 --node-count="1" \
 --state="s3://$S3_BUCKET_NAME" \
 --kubernetes-version="1.8.0" \
 --cloud=aws

If you want to review the configuration, use kops edit cluster aws.example.com --state="s3://$S3_BUCKET_NAME". When you’re ready to proceed, provision the AWS work cluster by running:

kops update cluster aws.example.com --yes --state="s3://$S3_BUCKET_NAME"

Wait until kubectl get nodes --show-labels --context=aws.example.com shows the NODE role as Ready (it should take 3–5 minutes). Congratulations, you have your first (of three) Kubernetes clusters ready!

Part 3: Set up the work cluster in GCE

OK, now we’re going to do a very similar set of steps for our second work cluster, this one on GCE. First though, we need to have a few extra environment variables set:

export PROJECT=`gcloud config get-value project`
export KOPS_FEATURE_FLAGS=AlphaAllowGCE

As the documentation points out, using kops with GCE is still considered alpha. To keep each cluster using vendor-specific tools, let’s set up state storage for the GCE work cluster using Google Storage:

gsutil mb gs://$GS_BUCKET_NAME/

Now it’s time to generate the configuration for the GCE work cluster:

kops create cluster \
 --name="gcp.example.com" \
 --dns-zone="gcp.example.com" \
 --zones="us-east1-b" \
 --state="gs://$GS_BUCKET_NAME/" \
 --project="$PROJECT" \
 --kubernetes-version="1.8.0" \
 --cloud=gce

As before, use kops edit cluster gcp.example.com --state="gs://$GS_BUCKET_NAME/" to peruse the configuration. When ready, provision the GCE work cluster by running:

kops update cluster gcp.example.com --yes --state="gs://$GS_BUCKET_NAME/"

And once kubectl get nodes --show-labels --context=gcp.example.com shows the NODE role as Ready, your second work cluster is complete!

Part 4: Set up the host cluster

It’s useful to have a separate cluster that hosts the federation control plane. In production, it’s better to have this isolation to be able to reason about failure modes for different components. In the context of this tutorial, it’s easier to reason about which cluster is doing what work.

In this case, we can use the existing S3 bucket we’ve previously created to hold the configuration for our second AWS cluster — no additional S3 bucket needed! Let’s generate the configuration for the host cluster, which will run the federation control plane:

kops create cluster \
 --name="host.example.com" \
 --dns-zone="host.example.com" \
 --zones=us-east-1b \
 --master-size="t2.medium" \
 --node-size="t2.medium" \
 --node-count="1" \
 --state="s3://$S3_BUCKET_NAME" \
 --kubernetes-version="1.8.0" \
 --cloud=aws

Once you’re ready, run this command to provision the cluster:

kops update cluster host.example.com --yes --state="s3://$S3_BUCKET_NAME"

And one last time, wait until kubectl get nodes --show-labels --context=host.example.com shows the NODE role as Ready.

Part 5: Set up the federation control plane

Now that we have all of the pieces we need to do work across multiple providers, let’s connect them together using federation. First, add aliases for each of the clusters:

kubectl config set-context aws --cluster=aws.example.com --user=aws.example.com
kubectl config set-context gcp --cluster=gcp.example.com --user=gcp.example.com
kubectl config set-context host --cluster=host.example.com --user=host.example.com

Next up, we use the kubefed command to initialize the federation control plane on the host cluster:

kubectl config use-context host
kubefed init fed --host-cluster-context=host --dns-provider=aws-route53 --dns-zone-name="example.com"

If the message “Waiting for federation control plane to come up” takes an unreasonably long amount of time to appear, you can check the underlying pods for any issues by running:

kubectl get all --context=host.example.com --namespace=federation-system
kubectl describe po/fed-controller-manager-EXAMPLE-ID --context=host.example.com --namespace=federation-system

Once you see “Federation API server is running,” we can join the work clusters to the federation control plane:

kubectl config use-context fed
kubefed join aws --host-cluster-context=host --cluster-context=aws
kubefed join gcp --host-cluster-context=host --cluster-context=gcp
kubectl --context=fed create namespace default

To confirm everything’s working, you should see the aws and gcp clusters when you run:

kubectl --context=fed get clusters

Part 6: Set up the batch job API

(Note: This is likely to change as Kubernetes evolves — this was tested on 1.8.0.) We’ll need to edit the federation API server in the control plane, and enable the batch job API. First, let’s edit the deployment for the fed-apiserver:

kubectl --context=host --namespace=federation-system edit deploy/fed-apiserver

And within the configuration, in the federation-apiserver section, add a --runtime-config=batch/v1 line, like so:

  containers:
  - command:
    - /hyperkube
    - federation-apiserver
    - --admission-control=NamespaceLifecycle
    - --bind-address=0.0.0.0
    - --client-ca-file=/etc/federation/apiserver/ca.crt
    - --etcd-servers=http://localhost:2379
    - --secure-port=8443
    - --tls-cert-file=/etc/federation/apiserver/server.crt
    - --tls-private-key-file=/etc/federation/apiserver/server.key
    - --runtime-config=batch/v1   # <-- the line to add

Then restart the Federation API Server and Controller Manager pods by rebooting the node running them. Watch kubectl get all --context=host --namespace=federation-system if you want to see the various components change state. You can verify that the change applied by running the following Python code:

# Sample code from Kubernetes Python client
from kubernetes import client, config


def main():
    config.load_kube_config()

    print("Supported APIs (* is preferred version):")
    print("%-20s %s" %
          ("core", ",".join(client.CoreApi().get_api_versions().versions)))
    for api in client.ApisApi().get_api_versions().groups:
        versions = []
        for v in api.versions:
            name = ""
            if v.version == api.preferred_version.version and len(
                    api.versions) > 1:
                name += "*"
            name += v.version
            versions.append(name)
        print("%-40s %s" % (api.name, ",".join(versions)))

if __name__ == '__main__':
    main()      

You should see output from that Python script that looks something like:

> python api.py
Supported APIs (* is preferred version):
core                 v1
federation           v1beta1
extensions           v1beta1
batch                v1

Part 7: Submitting an example job

Following along from the Kubernetes batch job documentation, create a file, pi.yaml with the following contents:

apiVersion: batch/v1
kind: Job
metadata:
  generateName: pi-
spec:
  template:
    metadata:
      name: pi
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4

This job spec:

  • Runs a single container to generate the first 2,000 digits of Pi.
  • Uses a generateName, so you can submit it multiple times (each time it will have a different name).
  • Sets restartPolicy: Never, but OnFailure is also allowed for batch jobs.
  • Sets backoffLimit. This generates a parse violation in 1.8, so we have to disable validation.
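For comparison, here is a stand-in for the perl one-liner written in Python. This is just a sketch using Machin's formula with plain integer arithmetic; the actual job above uses perl's bignum bpi:

```python
def pi_digits(n):
    """Return the first n digits of pi using Machin's formula,
    pi = 16*arctan(1/5) - 4*arctan(1/239), with integer arithmetic
    and 10 guard digits to absorb truncation error."""
    scale = 10 ** (n + 10)

    def arctan_inv(x):
        # arctan(1/x) * scale via the alternating Taylor series
        total = term = scale // x
        xsq, k, sign = x * x, 3, -1
        while term:
            term //= xsq
            total += sign * (term // k)
            k += 2
            sign = -sign
        return total

    pi = 4 * (4 * arctan_inv(5) - arctan_inv(239))
    return str(pi // 10 ** 10)[:n]

print(pi_digits(20))  # 31415926535897932384
```

Packaged in a container, this would serve equally well as a CPU-bound demo payload for a batch Job.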

Now you can submit the job, and follow it across your federated set of Kubernetes clusters. First, at the federated control plane level, submit and see which work cluster it lands on:

kubectl --validate=false --context=fed create -f ./pi.yaml 
kubectl --context=fed get jobs
kubectl --context=fed describe job/<JOB IDENTIFIER>

Then (assuming it’s the AWS cluster — if not, switch the context below), dive in deeper to see how the job finished:

kubectl --context=aws get jobs
kubectl --context=aws describe job/<JOB IDENTIFIER>
kubectl --context=aws get pods
kubectl --context=aws describe pod/<POD IDENTIFIER>
kubectl --context=aws logs <POD IDENTIFIER>

If all went well, you should see the output from the job. Congratulations!

Cleaning up

Once you’re done trying out this demonstration cluster, you can clean up all of the resources you created by running:

kops delete cluster gcp.example.com --yes --state="gs://$GS_BUCKET_NAME/"
kops delete cluster aws.example.com --yes --state="s3://$S3_BUCKET_NAME"
kops delete cluster host.example.com --yes --state="s3://$S3_BUCKET_NAME"

Don’t forget to verify in the AWS and GCE console that everything was removed, to avoid any unexpected expenses.

Conclusion

Kubernetes provides a tremendous amount of infrastructure flexibility to everyone involved in developing and operating software, and there are many different applications for federated Kubernetes clusters.

Good luck to you in whatever your Kubernetes design patterns may be, and happy SysAdvent!

December 5, 2017

Day 5 - Do you want to build a helm chart?

By: Paul Czarkowski (@pczarkowski)

Edited By: Paul Stack (@stack72)

Kubernetes Kubernetes Kubernetes is the new Docker Docker Docker

“Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.”

https://kubernetes.io

Remember way back in 2015 when all anybody would talk about was Docker even though nobody actually knew what it was or what to do with it? That’s where Kubernetes is right now. Before we get into Helm charts, a quick primer on Kubernetes is a good idea.

Kubernetes provides scheduling and management for your containerized applications as well as the networking and other necessary plumbing and surfaces its resources to the developer in the form of declarative manifests written in YAML or JSON.

A Pod is the smallest deployable unit that can be deployed with Kubernetes. It contains one or more collocated containers that share the same [internal] IP address. Generally a pod is just a single container, but if your application requires a sidecar container to share the same IP or a shared volume then you would declare multiple containers in the same pod.

A Pod is unmanaged and will not be recreated if the process inside the container ends. Kubernetes has a number of resources that build upon a pod to provide different types of lifecycle management such as a Deployment that will ensure the correct number of replicas of your Pod are running.
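As an illustration, a minimal bare Pod manifest is only a few lines (names here are arbitrary examples):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  labels:
    run: example
spec:
  containers:
  - name: web
    image: nginx:alpine
    ports:
    - containerPort: 80
```

Delete this Pod (or let its process exit) and nothing brings it back, which is why higher-level resources like Deployments exist.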

A Service provides a stable name, IP address, and DNS (if KubeDNS is enabled) across a number of pods and acts as a basic load balancer. This allows pods to easily communicate to each other and can also provide a way for Kubernetes to expose your pod externally.

Helm is a package manager for Kubernetes. It doesn't package the containers (or Pods) directly, but instead packages the Kubernetes manifests used to build those Pods. It also provides a templating engine that allows you to deploy your application in a number of different scenarios and configurations.

[Helm] Charts are easy to create, version, share, and publish — so start using Helm and stop the copy-and-paste madness. – https://helm.sh/

Let’s Build a Helm Chart!

In order to follow along with this tutorial you will need to install the following:

If you are on a Mac you should be able to use the following to install the necessary bits:

$ brew cask install minikube
$ brew install kubernetes-helm

If you already have a Kubernetes manifest, it's very easy to turn it into a Helm Chart that you can then iterate over and improve as you need to add more flexibility. In fact, your first iteration of a Helm chart can be your existing manifests tied together with a simple Chart.yaml file.

Prepare Environment

Bring up a test Kubernetes environment using Minikube:

$ minikube start
Starting local Kubernetes v1.7.5 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.

Wait a minute or so and then install Helm’s tiller service to Kubernetes:

$ helm init
$HELM_HOME has been configured at /home/XXXX/.helm.

Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
Happy Helming!

If it fails, you may need to wait a few more minutes for minikube to become accessible.

Create a path to work in:

$ mkdir -p ~/development/my-first-helm-chart

$ cd ~/development/my-first-helm-chart

Create Example Kubernetes Manifest.

Writing a Helm Chart is easier when you’re starting with an existing set of Kubernetes manifests. One of the easiest ways to get a basic working manifest is to ask Kubernetes to run something and output the resultant manifest to a file.

Run a basic nginx Deployment and expose it via a NodePort Service:

$ mkdir -p templates

$ kubectl run example --image=nginx:alpine \
    -o yaml > templates/deployment.yaml
    
$ kubectl expose deployment example --port=80 --type=NodePort \
    -o yaml > templates/service.yaml

Minikube has some helper functions to let you easily find the URL of your service. Run curl against your service to ensure that it's running as expected:

$ minikube service example --url
http://192.168.99.100:30254

$ curl $(minikube service example --url)
...
<title>Welcome to nginx!</title>
...

You’ll see you now have two Kubernetes manifests saved. We can use these to bootstrap our helm charts:

$ tree
└── templates
    ├── deployment.yaml
    └── service.yaml

Explore the deployment.yaml file in a text editor. Following is an abbreviated version of it with comments to help you understand what some of the sections mean:

# These first two lines appear in every Kubernetes manifest and provide
# a way to declare the type of resource and the version of the API to
# interact with.
apiVersion: apps/v1beta1
kind: Deployment
# under metadata you set the resource's name and can assign labels to it
# these labels can be used to tie resources together. In the service.yaml
# file you'll see it refers back to this `run: example` label.
metadata:
  labels:
    run: example
  name: example
# how many replicas of the declared pod should I run ?
spec:
  replicas: 1
  selector:
    matchLabels:
      run: example
# the Pod that the Deployment will manage the lifecycle of.
# You can see once again the use of the label and the containers
# to run as part of the pod.
  template:
    metadata:
      labels:
        run: example
    spec:
      containers:
      - image: nginx:alpine
        imagePullPolicy: IfNotPresent
        name: example

Explore the service.yaml file in a text editor. Following is an abbreviated version of it:

apiVersion: v1
kind: Service
metadata:
  labels:
    run: example
  name: example
spec:
# The clusterIP is the IP address that other nodes can use to access the pods
# Since we didn't specify an IP, Kubernetes picked one for us.
  clusterIP: 10.0.0.62
# The Port mappings for the service.
  ports:
  - nodePort: 32587
    port: 80
    protocol: TCP
    targetPort: 80
# Any pods that have this label will be exposed by this service.    
  selector:
    run: example
# All Kubernetes worker nodes will expose this service to the outside world
# on the port specified above as `nodePort`.
  type: NodePort

Delete the resources you just created so that you can move on to creating the Helm Chart:

$ kubectl delete service,deployment example
service "example" deleted
deployment "example" deleted

Create and Deploy a Basic Helm Chart

The minimum set of things needed for a valid helm chart is a set of templates (which we just created) and a Chart.yaml file which we need to create.

Copy and paste the following into your text editor of choice and save it as Chart.yaml:

Note: the file should be capitalized as shown above in order for Helm to use it correctly.

apiVersion: v1
description: My First Helm Chart
name: my-first-helm-chart
version: 0.1.0

We now have the most basic Helm Chart possible:

$ tree
.
├── Chart.yaml
└── templates
    ├── deployment.yaml
    └── service.yaml

Next you should be able to install this Helm chart, giving it a release name of example and using the current directory as the source of the Helm Chart:

$ helm install -n example .
NAME:   example
LAST DEPLOYED: Wed Nov 22 10:55:11 2017
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1beta1/Deployment
NAME     DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
example  1        1        1           0          1s

==> v1/Service
NAME     CLUSTER-IP  EXTERNAL-IP  PORT(S)       AGE
example  10.0.0.62   <nodes>      80:32587/TCP  1s

Just as you did earlier you can use minikube to get the URL:

$ curl $(minikube service example --url)
...
<title>Welcome to nginx!</title>

Congratulations! You’ve just created and deployed your first Helm chart. However, it’s a little basic; the next step is to add some templating to the manifests and update the deployment.

Add variables to your Helm Chart

In order to render templates you need a set of variables. Helm charts can come with a values.yaml file which declares a set of variables and their default values that can be used in your templates. Create a values.yaml file that looks like this:

replicaCount: 2
image: "nginx:alpine"

These values can be accessed in the templates using the Go templating engine. For example, the value replicaCount would be written as {{ .Values.replicaCount }}. Helm also provides information about the Chart and Release that can be handy to utilize.
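Helm renders these templates with Go's text/template engine. As a rough illustration of the substitution mechanics only, here is a toy Python stand-in (not Helm's actual renderer, which additionally supports pipelines, functions, and conditionals):

```python
import re

def render(template, ctx):
    """Toy stand-in for Go template rendering: replace {{ .A.b }}
    with ctx["A"]["b"]. Illustrative only."""
    def sub(match):
        obj = ctx
        for part in match.group(1).split("."):
            obj = obj[part]  # walk the dotted path through nested dicts
        return str(obj)
    return re.sub(r"\{\{\s*\.([\w.]+)\s*\}\}", sub, template)

# Mirrors our values.yaml plus the release metadata Helm supplies.
ctx = {
    "Values": {"replicaCount": 2, "image": "nginx:alpine"},
    "Release": {"Name": "example"},
}
print(render("replicas: {{ .Values.replicaCount }}", ctx))  # replicas: 2
```

The same lookup happens for every `{{ ... }}` expression in templates/deployment.yaml and templates/service.yaml below.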

Update your templates/deployment.yaml to utilize our values:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  labels:
    run: "{{ .Release.Name }}"
    chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    release: "{{ .Release.Name }}"
  name: "{{ .Release.Name }}"
  namespace: default
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      run: "{{ .Release.Name }}"
  template:
    metadata:
      labels:
        run: "{{ .Release.Name }}"
    spec:
      containers:
      - image: "{{ .Values.image }}"
        name: "{{ .Release.Name }}"

Edit your templates/service.yaml to look like:

apiVersion: v1
kind: Service
metadata:
  name: "{{ .Release.Name }}"
  labels:
    run: "{{ .Release.Name }}"
    chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    release: "{{ .Release.Name }}"
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: "{{ .Release.Name }}"
  type: NodePort

Once your files are written out you should be able to update your deployment:

$ helm upgrade example .
Release "example" has been upgraded. Happy Helming!
LAST DEPLOYED: Wed Nov 22 11:12:25 2017
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/Service
NAME     CLUSTER-IP  EXTERNAL-IP  PORT(S)       AGE
example  10.0.0.79   <nodes>      80:31664/TCP  14s

==> v1beta1/Deployment
NAME     DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
example  2        2        2           2          14s

You’ll notice that your Deployment now shows as having two replicas of your pod demonstrating that the replicas value provided has been applied:

$ kubectl get deployments
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
example   2         2         2            2           2m

$ kubectl get pods       
NAME                       READY     STATUS    RESTARTS   AGE
example-5c794cbb55-cvn4k   1/1       Running   0          2m
example-5c794cbb55-dc7gf   1/1       Running   0          2m

$ curl $(minikube service example --url)
...
<title>Welcome to nginx!</title>

You can override values on the command line when you install (or upgrade) a Release of your Helm Chart. Create a new release of your helm chart setting the image to apache instead of nginx:

$ helm install -n apache . --set image=httpd:alpine --set replicaCount=3
NAME:   apache
LAST DEPLOYED: Wed Nov 22 11:20:06 2017
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1beta1/Deployment
NAME    DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
apache  3        3        3           0          0s

==> v1/Service
NAME    CLUSTER-IP  EXTERNAL-IP  PORT(S)       AGE
apache  10.0.0.220  <nodes>      80:30841/TCP  0s

Kubernetes will now show two sets of Deployments and Services and their corresponding pods:

$ kubectl get svc,deployment,pod                                        
NAME             TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
svc/apache       NodePort    10.0.0.220   <none>        80:30841/TCP   1m
svc/example      NodePort    10.0.0.79    <none>        80:31664/TCP   8m
svc/kubernetes   ClusterIP   10.0.0.1     <none>        443/TCP        58m

NAME             DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/apache    3         3         3            3           1m
deploy/example   2         2         2            2           8m

NAME                          READY     STATUS    RESTARTS   AGE
po/apache-5dc6dcd8b5-2xmpn    1/1       Running   0          1m
po/apache-5dc6dcd8b5-4kkt7    1/1       Running   0          1m
po/apache-5dc6dcd8b5-d2pvt    1/1       Running   0          1m
po/example-5c794cbb55-cvn4k   1/1       Running   0          8m
po/example-5c794cbb55-dc7gf   1/1       Running   0          8m

Because we templated the manifests earlier to use the Helm release name in the labels of the Kubernetes resources, the Service for each release will only talk to its corresponding Deployment:

$ curl $(minikube service example --url)
...
<title>Welcome to nginx!</title>

$ curl $(minikube service apache --url)
<html><body><h1>It works!</h1></body></html>
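This isolation works because of label selectors. A minimal sketch of the matching rule a Service applies (illustrative only; the label values mirror the two releases above):

```python
def selects(selector, pod_labels):
    """A Service matches a Pod when every key/value in the Service's
    selector is present among the Pod's labels."""
    return all(pod_labels.get(k) == v for k, v in selector.items())

# The two releases carry different `run` labels, so each Service
# only picks up its own pods.
example_selector = {"run": "example"}
print(selects(example_selector, {"run": "example", "pod-template-hash": "abc"}))  # True
print(selects(example_selector, {"run": "apache", "pod-template-hash": "def"}))   # False
```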

Clean Up

Delete your helm deployments:

$ helm delete example --purge          
release "example" deleted

$ helm delete apache --purge          
release "apache" deleted

$ minikube delete
Deleting local Kubernetes cluster...
Machine deleted.

Summary

Congratulations! You have deployed a Kubernetes cluster on your laptop using minikube, and deployed a basic application to it by creating a Deployment and a Service. You have also built your very first Helm chart and used the Helm templating engine to deploy different versions of the application.

Helm is a very powerful way to package up your Kubernetes manifests to make them extensible and portable. While it is quite complex, it's fairly easy to get started with, and if you're like me you'll find yourself replacing the Kubernetes manifests in your code repos with Helm Charts.

There’s a lot more you can do with Helm; we’ve just scratched the surface. Enjoy using it and learning more!

December 19, 2016

Day 19 - Troubleshooting Docker and Kubernetes

Written by: Jorge Salamero (@bencerillo)
Edited by: Brian O'Rourke (@borourke)

Container orchestration platforms like Kubernetes, DC/OS, Mesos, or Docker Swarm help make your experience like riding a unicorn over a rainbow, but they don’t help much with troubleshooting containers:

  • They are isolated: there is a barrier between you and the process you want to monitor, and traditional troubleshooting tools running on the host don’t understand containers, namespaces, and orchestration platforms.
  • They bring a minimal runtime, just the service and its dependencies without all the troubleshooting tools, think of troubleshooting with just busybox!
  • They are scheduled across your cluster: containers move, and scale up and down. They are highly volatile, appearing and disappearing as processes end.
  • And talk to each other through new virtual network layers.

Today we will demonstrate, through a real use case, how to do troubleshooting in Kubernetes. The scenario we will use is a simple Kubernetes service with 3 Nginx pods and a client with curl. In the previous link you will find the backend.yaml file we will use for this scenario.

If you are new to Kubernetes services, we explained how to deploy this service and how it works in Understanding how Kubernetes DNS Services work. To bring up the setup, we will run:

$ kubectl create namespace critical-app
namespace "critical-app" created

$ kubectl create -f backend.yaml
service "backend" created
deployment "backend" created

And then we will spawn a client to load our backend service:

$ kubectl run -it --image=tutum/curl client --namespace critical-app --restart=Never

Part 1: Network troubleshooting Kubernetes services

From our client container we could simply run a test with root@client:/# curl backend to see how our Kubernetes service works. But we don’t want to leave things loose, and we thought that using fully qualified domain names is a good idea. If we check the Kubernetes documentation, it says that every service gets this default DNS entry: my-svc.my-namespace.svc.cluster.local. So let’s use the full domain name instead.

Let’s go back to the curl client container shell and run: root@client:/# curl backend.critical-app.svc.cluster.local. This time curl hangs for 10 seconds and then correctly returns the expected website! As a distributed systems engineer, this is one of the worst things that can happen: you want something to fail or succeed straight away, not after a 10 second wait.

To troubleshoot what’s going on, we will use sysdig. Sysdig is an open source Linux visibility tool that offers native visibility into containers, including Docker, Kubernetes, DC/OS, and Mesos, to name a few. Combining the functionality of htop, tcpdump, strace, lsof, netstat, and more in one open source tool, Sysdig gives you all of the system calls and application data in the context of your Kubernetes infrastructure. Monitoring Kubernetes with Sysdig is a good introduction to using the tool with Kubernetes.

To analyze what is going on, we will ask sysdig to dump all the information into a capture file:

$ sudo sysdig -k http://127.0.0.1:8080 -s8192 -zw capture.scap

I’ll quickly explain each parameter here:

-k http://127.0.0.1:8080 connects to the Kubernetes API server

-s8192 enlarges the IO buffers; we need to see full content, which otherwise gets cut off by default

-zw capture.scap compresses all system calls and metadata and dumps them into a file

In parallel, we'll reproduce the hairy issue by running the curl command again: # curl backend.critical-app.svc.cluster.local. This ensures the file we captured above has all the appropriate data to reproduce the scenario and troubleshoot the issue.

Once curl returns, we can Ctrl+C sysdig to stop the capture, and we will have a ~10s capture file of everything that happened on our Kubernetes host. We can now start troubleshooting the issue either in the cluster or out of band, basically anywhere we can copy the file to a machine with sysdig installed.

$ sysdig -r capture.scap -pk -NA "fd.type in (ipv4, ipv6) and (k8s.ns.name=critical-app or proc.name=skydns)" | less

Let me explain each parameter here as well:

-r capture.scap reads from a capture file

-pk prints Kubernetes fields to stdout

-NA shows ASCII output

And the filter between double quotes: Sysdig understands Kubernetes semantics, so we can filter traffic on IPv4 or IPv6 sockets coming from any container in the critical-app namespace or from any process named skydns. We included proc.name=skydns because this is the internal Kubernetes DNS resolver and it runs outside our namespace, as part of the Kubernetes infrastructure.

Sysdig also has an interactive ncurses interface, similar to htop.

In order to follow along with this troubleshooting example, you can download the capture file capture.scap and explore it yourself with sysdig. Immediately we see how curl tries to resolve the domain name, but the DNS query payload (10049) contains something odd: backend.critical-app.svc.cluster.local.critical-app.svc.cluster.local. It seems that for some reason curl didn't understand it was already given a fully qualified domain name, and decided to append a search domain to it.

[…]

10030 16:41:39.536689965 0 client (b3a718d8b339) curl (22370:13) < socket fd=3(<4>)
10031 16:41:39.536694724 0 client (b3a718d8b339) curl (22370:13) > connect fd=3(<4>)
10032 16:41:39.536703160 0 client (b3a718d8b339) curl (22370:13) < connect res=0 tuple=172.17.0.7:46162->10.0.2.15:53
10048 16:41:39.536831645 1 <NA> (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
10049 16:41:39.536834352 1 <NA> (36ae6d09d26e) skydns (17280:11) < recvmsg res=87 size=87 data=backendcritical-appsvcclusterlocalcritical-appsvcclusterlocal tuple=::ffff:172.17.0.7:46162->:::53
10050 16:41:39.536837173 1 <NA> (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)

[…]

SkyDNS makes a request (10097) to /local/cluster/svc/critical-app/local/cluster/svc/critical-app/backend through the etcd API. Obviously etcd doesn't recognize that service and returns (10167) a "Key not found". This is passed back to curl in the DNS query response.

[…]

10096 16:41:39.538247116 1 <NA> (36ae6d09d26e) skydns (4639:8) > write fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=221
10097 16:41:39.538275108 1 <NA> (36ae6d09d26e) skydns (4639:8) < write res=221 data=GET /v2/keys/skydns/local/cluster/svc/critical-app/local/cluster/svc/critical-app/backend?quorum=false&recursive=true&sorted=false HTTP/1.1
Host: 10.0.2.15:4001
User-Agent: Go 1.1 package http
Accept-Encoding: gzip
10166 16:41:39.538636659 1 <NA> (36ae6d09d26e) skydns (4617:1) > read fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=4096
10167 16:41:39.538638040 1 <NA> (36ae6d09d26e) skydns (4617:1) < read res=285 data=HTTP/1.1 404 Not Found
Content-Type: application/json
X-Etcd-Cluster-Id: 7e27652122e8b2ae
X-Etcd-Index: 1259
Date: Thu, 08 Dec 2016 15:41:39 GMT
Content-Length: 112

{"errorCode":100,"message":"Key not found","cause":"/skydns/local/cluster/svc/critical-app/local","index":1259}

[…]

curl doesn't give up and tries again (10242), but this time with backend.critical-app.svc.cluster.local.svc.cluster.local. It looks like curl is trying a different search domain, since critical-app was removed from the appended domain. Of course, when forwarded to etcd (10274), this fails again (10345).

[…]

10218 16:41:39.538914765 0 client (b3a718d8b339) curl (22370:13) < connect res=0 tuple=172.17.0.7:35547->10.0.2.15:53
10242 16:41:39.539005618 1 <NA> (36ae6d09d26e) skydns (17280:11) < recvmsg res=74 size=74 data=backendcritical-appsvcclusterlocalsvcclusterlocal tuple=::ffff:172.17.0.7:35547->:::53
10247 16:41:39.539018226 1 <NA> (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
10248 16:41:39.539019925 1 <NA> (36ae6d09d26e) skydns (17280:11) < recvmsg res=74 size=74 data=0]backendcritical-appsvcclusterlocalsvcclusterlocal tuple=::ffff:172.17.0.7:35547->:::53
10249 16:41:39.539022522 1 <NA> (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
10273 16:41:39.539210393 1 <NA> (36ae6d09d26e) skydns (4639:8) > write fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=208
10274 16:41:39.539239613 1 <NA> (36ae6d09d26e) skydns (4639:8) < write res=208 data=GET /v2/keys/skydns/local/cluster/svc/local/cluster/svc/critical-app/backend?quorum=false&recursive=true&sorted=false HTTP/1.1
Host: 10.0.2.15:4001
User-Agent: Go 1.1 package http
Accept-Encoding: gzip
10343 16:41:39.539465153 1 <NA> (36ae6d09d26e) skydns (4617:1) > read fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=4096
10345 16:41:39.539467440 1 <NA> (36ae6d09d26e) skydns (4617:1) < read res=271 data=HTTP/1.1 404 Not Found
[…]

curl tries once again, this time appending cluster.local, as we can see in the DNS query request (10418) to backend.critical-app.svc.cluster.local.cluster.local. This one (10479) obviously fails as well (10524).

[…]

10396 16:41:39.539686075 0 client (b3a718d8b339) curl (22370:13) < connect res=0 tuple=172.17.0.7:40788->10.0.2.15:53
10418 16:41:39.539755453 0 <NA> (36ae6d09d26e) skydns (17280:11) < recvmsg res=70 size=70 data=backendcritical-appsvcclusterlocalclusterlocal tuple=::ffff:172.17.0.7:40788->:::53
10433 16:41:39.539800679 0 <NA> (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
10434 16:41:39.539802549 0 <NA> (36ae6d09d26e) skydns (17280:11) < recvmsg res=70 size=70 data=backendcritical-appsvcclusterlocalclusterlocal tuple=::ffff:172.17.0.7:40788->:::53
10437 16:41:39.539805177 0 <NA> (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
10478 16:41:39.540166087 1 <NA> (36ae6d09d26e) skydns (4639:8) > write fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=204
10479 16:41:39.540183401 1 <NA> (36ae6d09d26e) skydns (4639:8) < write res=204 data=GET /v2/keys/skydns/local/cluster/local/cluster/svc/critical-app/backend?quorum=false&recursive=true&sorted=false HTTP/1.1
Host: 10.0.2.15:4001
User-Agent: Go 1.1 package http
Accept-Encoding: gzip
10523 16:41:39.540421040 1 <NA> (36ae6d09d26e) skydns (4617:1) > read fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=4096
10524 16:41:39.540422241 1 <NA> (36ae6d09d26e) skydns (4617:1) < read res=267 data=HTTP/1.1 404 Not Found
[…]

To the untrained eye, it might look like we have found the issue: a bunch of inefficient calls. But that is actually not true. If we look at the difference between the first etcd request (10097) and the last one (10479), the timestamps in the second column are less than 10ms apart. We are looking at an issue of seconds, not milliseconds; so where is the wait?

When we keep looking through the capture file, we can see that curl doesn't stop sending DNS queries to SkyDNS, now with backend.critical-app.svc.cluster.local.localdomain (10703). This .localdomain is not recognized by SkyDNS as an internal Kubernetes domain, so instead of going to etcd it decides to forward the query to its upstream DNS resolver (10691).

[…]

10690 16:41:39.541376928 1 <NA> (36ae6d09d26e) skydns (4639:8) > connect fd=8(<4>)
10691 16:41:39.541381577 1 <NA> (36ae6d09d26e) skydns (4639:8) < connect res=0 tuple=10.0.2.15:44249->8.8.8.8:53
10702 16:41:39.541415384 1 <NA> (36ae6d09d26e) skydns (4639:8) > write fd=8(<4u>10.0.2.15:44249->8.8.8.8:53) size=68
10703 16:41:39.541531434 1 <NA> (36ae6d09d26e) skydns (4639:8) < write res=68 data=Nbackendcritical-appsvcclusterlocallocaldomain
10717 16:41:39.541629507 1 <NA> (36ae6d09d26e) skydns (4639:8) > read fd=8(<4u>10.0.2.15:44249->8.8.8.8:53) size=512
10718 16:41:39.541632726 1 <NA> (36ae6d09d26e) skydns (4639:8) < read res=-11(EAGAIN) data=
58215 16:41:43.541261462 1 <NA> (36ae6d09d26e) skydns (4640:9) > close fd=7(<4u>10.0.2.15:54272->8.8.8.8:53)
58216 16:41:43.541263355 1 <NA> (36ae6d09d26e) skydns (4640:9) < close res=0

[…]

Scanning down the timestamp column, we see the first large gap when SkyDNS sends out the request and then hangs for about 4 seconds (10718-58215). Given that .localdomain is not a valid TLD (top level domain), the upstream server simply ignores the request. After the timeout, SkyDNS tries again with the same query (75923), hanging for a few more seconds (75927-104208). In total, we have been waiting around 8 seconds for a DNS entry that doesn't exist and is being ignored.
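A quick way to sanity-check these gaps is to convert the timestamps from the second column into seconds and subtract. A minimal sketch (ts_diff is a hypothetical helper, not part of sysdig; the timestamps are copied from the events above):

```shell
#!/bin/sh
# Compute the difference in seconds between two HH:MM:SS.frac timestamps.
ts_diff() {
  LC_ALL=C awk -v a="$1" -v b="$2" '
    function secs(t, p) { split(t, p, ":"); return p[1]*3600 + p[2]*60 + p[3] }
    BEGIN { printf "%.3f\n", secs(b) - secs(a) }'
}

# First timeout: skydns reads EAGAIN at 16:41:39.54 and closes the socket at 16:41:43.54
ts_diff 16:41:39.541632726 16:41:43.541261462   # prints 4.000

# Second timeout: query retried at 16:41:44.54, socket closed at 16:41:47.55
ts_diff 16:41:44.543569823 16:41:47.551459027   # prints 3.008
```

Roughly 4 + 3 seconds of dead time, plus the smaller waits around them, accounts for the ~8 second hang we measured with curl.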

[…]

58292 16:41:43.542822050 1 <NA> (36ae6d09d26e) skydns (4640:9) < write res=68 data=Nbackendcritical-appsvcclusterlocallocaldomain
58293 16:41:43.542829001 1 <NA> (36ae6d09d26e) skydns (4640:9) > read fd=8(<4u>10.0.2.15:56371->8.8.8.8:53) size=512
58294 16:41:43.542831896 1 <NA> (36ae6d09d26e) skydns (4640:9) < read res=-11(EAGAIN) data=
75904 16:41:44.543459524 0 <NA> (36ae6d09d26e) skydns (17280:11) < recvmsg res=68 size=68 data=[…]
75923 16:41:44.543560717 0 <NA> (36ae6d09d26e) skydns (17280:11) < recvmsg res=68 size=68 data=Nbackendcritical-appsvcclusterlocallocaldomain tuple=::ffff:172.17.0.7:47441->:::53
75927 16:41:44.543569823 0 <NA> (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
104208 16:41:47.551459027 1 <NA> (36ae6d09d26e) skydns (4640:9) > close fd=7(<4u>10.0.2.15:42126->8.8.8.8:53)
104209 16:41:47.551460674 1 <NA> (36ae6d09d26e) skydns (4640:9) < close res=0

[…]

But finally, it all works! Why? Because curl stops applying search domains and tries the domain name verbatim, exactly as we typed it on the command line. This time, the DNS request is resolved by SkyDNS through an etcd API request (104406). A connection is opened against the service IP address (107992), then forwarded to the pod with iptables, and the HTTP response travels back to the curl container (108024).

[…]

104406 16:41:47.552626262 0 <NA> (36ae6d09d26e) skydns (4639:8) < write res=190 data=GET /v2/keys/skydns/local/cluster/svc/critical-app/backend?quorum=false&recursive=true&sorted=false HTTP/1.1[…]

104457 16:41:47.552919333 1 <NA> (36ae6d09d26e) skydns (4617:1) < read res=543 data=HTTP/1.1 200 OK[…]

{"action":"get","node":{"key":"/skydns/local/cluster/svc/critical-app/backend","dir":true,"nodes":[{"key":"/skydns/local/cluster/svc/critical-app/backend/6ead029a","value":"{\"host\":\"10.3.0.214\",\"priority\":10,\"weight\":10,\"ttl\":30,\"targetstrip\":0}","modifiedIndex":270,"createdIndex":270}],"modifiedIndex":270,"createdIndex":270}}[…]

107992 16:41:48.087346702 1 client (b3a718d8b339) curl (22369:12) < connect res=-115(EINPROGRESS) tuple=172.17.0.7:36404->10.3.0.214:80
108002 16:41:48.087377769 1 client (b3a718d8b339) curl (22369:12) > sendto fd=3(<4t>172.17.0.7:36404->10.3.0.214:80) size=102 tuple=NULL
108005 16:41:48.087401339 0 backend-1440326531-csj02 (730a6f492270) nginx (7203:6) < accept fd=3(<4t>172.17.0.7:36404->172.17.0.5:80) tuple=172.17.0.7:36404->172.17.0.5:80 queuepct=0 queuelen=0 queuemax=128
108006 16:41:48.087406626 1 client (b3a718d8b339) curl (22369:12) < sendto res=102 data=GET / HTTP/1.1
[…]

108024 16:41:48.087541774 0 backend-1440326531-csj02 (730a6f492270) nginx (7203:6) < writev res=238 data=HTTP/1.1 200 OK
Server: nginx/1.10.2
[…]

Looking at how things operate at the system level, we can conclude that there are two different issues at the root of this problem. First, curl doesn't trust us when we give it an FQDN, and applies a search domain algorithm anyway. Second, .localdomain should never have been there, because it's not routable within our Kubernetes cluster.

If for a second you thought this could have been done with tcpdump, you haven't tried it yourself. I'm 100% sure it's not going to be installed inside your container. You can run it outside, from the host, but good luck finding the network interface that matches the network namespace of the container Kubernetes scheduled. Don't believe me? Keep reading: we are not done with the troubleshooting yet.

Part 2: DNS resolution troubleshooting

Let’s have a look at what’s in the resolv.conf file. The container could be gone already, or the file could have changed after the curl call.  But we have a sysdig capture that contains everything that happened.

Usually containers live only as long as the process running inside them, disappearing when that process dies. This is one of the most challenging parts of troubleshooting containers: how can we explore something that's already gone? How can we reproduce exactly what happened? Sysdig capture files come in extremely useful in these cases.

Let's analyze the capture file, but instead of filtering the network traffic, this time we will filter on file activity. We want to see resolv.conf exactly as it was read by curl, to confirm what we suspect: that it contains localdomain.

$ sysdig -pk -NA -r capture.scap -c echo_fds "fd.type=file and fd.name=/etc/resolv.conf"

------ Read 119B from [k8s_client.eee910bc_client_critical-app_da587b4d-bd5a-11e6-8bdb-028ce2cfb533_bacd01b6] [b3a718d8b339] /etc/resolv.conf (curl)

search critical-app.svc.cluster.local svc.cluster.local cluster.local localdomain

nameserver 10.0.2.15

options ndots:5

[…]

Here’s a new way to use sysdig:

-c echo_fds uses a Sysdig chisel (an add-on script) to aggregate the information and format the output.

The filter also restricts output to I/O activity on file descriptors that are files named /etc/resolv.conf: exactly what we are looking for.

Through the syscalls, we see there is an option called ndots. This option is the reason why curl didn't trust our FQDN (fully qualified domain name) and tried to append all the search domains first. If you read the manpage, ndots tells libc that any domain name with fewer than 5 dots won't be treated as an FQDN; instead, the resolver will first try appending each of the search domains. ndots is there for a good reason: it lets us simply run curl backend. But who added localdomain there?
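The expansion behaviour we observed in Part 1 can be sketched as a tiny shell simulation. This is a simplified model of the libc search-list logic, not real resolver code, and expand_queries is a hypothetical helper; but with the search list from the resolv.conf above, it reproduces exactly the sequence of DNS queries we saw in the capture:

```shell
#!/bin/sh
# Simulate the search-list expansion: with "options ndots:5", any name with
# fewer than 5 dots gets every search domain appended first; the name is tried
# verbatim only as a last resort.
expand_queries() {
  name=$1; shift
  ndots=5
  # Count the dots in the name (labels minus one)
  dots=$(printf '%s' "$name" | awk -F. '{print NF-1}')
  if [ "$dots" -lt "$ndots" ]; then
    for domain in "$@"; do
      printf '%s.%s\n' "$name" "$domain"
    done
  fi
  printf '%s\n' "$name"
}

# Search list taken from the captured resolv.conf
expand_queries backend.critical-app.svc.cluster.local \
  critical-app.svc.cluster.local svc.cluster.local cluster.local localdomain
```

Our "FQDN" only has 4 dots, so it gets all four search domains appended (including localdomain, which caused the upstream timeouts) before being tried as typed.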

Part 3: Troubleshooting Docker containers run by Kubernetes

We don't want to finish our troubleshooting without finding the culprit behind this localdomain. That way, we can blame software and not people :) Was it Docker who added that search domain? Or was it Kubernetes, instructing Docker when creating the container?

Since we know that all control communication between Kubernetes and Docker is done through a Unix socket, we can use that to filter things out:

$ sudo sysdig -pk -s8192 -c echo_fds -NA "fd.type in (unix) and evt.buffer contains localdomain"

This time we will be capturing live, with the help of an awesome filter: evt.buffer contains. This filter scans every event buffer and, if one contains the string we are looking for, passes it on to our chisel for printing and formatting.

Now I need to create a new client to spy on what happens at container orchestration time:

$ kubectl run -it --image=tutum/curl client-foobar --namespace critical-app --restart=Never

I can see that hyperkube, which is part of Kubernetes, wrote to /var/run/docker.sock using the Docker API: an HTTP POST request to /containers/create. If we read through it, we will find that this request contains the option "DnsSearch":["critical-app.svc.cluster.local", "svc.cluster.local", "cluster.local", "localdomain"]. Kubernetes, we caught you! Most probably it was there for some reason, like my local development machine having that search domain set up. In any case, that's a different story.

[…]

------ Write 2.61KB to [k8s-kubelet] [de7157ba23c4] (hyperkube)

POST /containers/create?name=k8s_POD.d8dbe16c_client-foobar_critical-app_085ac98f-bd64-11e6-8bdb-028ce2cfb533_9430448e HTTP/1.1
Host: docker
[…]

  "DnsSearch":["critical-app.svc.cluster.local","svc.cluster.local","cluster.local","localdomain"],

[…]
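If you save a create-request body like the one above to a shell variable or file, pulling the search list out is a one-liner. A sketch, using plain grep and tr on an illustrative fragment; the DnsSearch values are the ones from the capture, while the surrounding fields are trimmed placeholders, not the full real request:

```shell
#!/bin/sh
# Extract the "DnsSearch" array from a Docker /containers/create request body.
body='{"Hostname":"","DnsSearch":["critical-app.svc.cluster.local","svc.cluster.local","cluster.local","localdomain"],"Image":"tutum/curl"}'

# Grab the DnsSearch array, then put one search domain per line
printf '%s\n' "$body" \
  | grep -o '"DnsSearch":\[[^]]*\]' \
  | tr ',' '\n'
```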

Conclusion

Reproducing exactly what happened inside a container can be very challenging, since containers terminate when the process inside dies or simply ends. Sysdig captures contain all the information from the system calls, including network traffic, file system I/O and process behaviour, providing all the data required for troubleshooting.

When troubleshooting in a container environment, being able to filter and add container contextual information like Docker container names or Kubernetes metadata makes our lives significantly easier.

Sysdig is available in all the major Linux distros, on OSX and also on Windows. Download it from here to get the latest version. Sysdig is an open source tool, but the company behind the project also offers a commercial product to monitor and troubleshoot containers and microservices across multiple hosts.