December 8, 2019

Day 8 - Going Nomad in a Kubernetes World

By: Paul Welch (@pwelch)
Edited By: Nathen Harvey (@nathenharvey)

What is Nomad

Nomad by HashiCorp is a flexible orchestration tool that allows for management and scheduling of different types of compute workloads. Nomad is able to orchestrate legacy applications, containers, and even machine learning tasks. Kubernetes is a well-known orchestration platform, but this post provides an introduction to Nomad, a tool that can provide some of the same orchestration capabilities.

Most distributed systems benefit from having schedulers and a facility for service discovery. Schedulers programmatically manage compute resources across a large number of nodes. Service discovery tools are used to distribute information about services in a cluster.

How is Nomad Different

Let’s look at some of the differences between Nomad and Kubernetes. Both Nomad and Kubernetes are able to manage thousands of nodes across multiple availability zones or regions, but this is where they begin to differ. Kubernetes is designed specifically to manage containerized workloads and is built from more than a half-dozen interconnected services that together provide full functionality. Administering a Kubernetes management cluster can be a full-time job if you are not able to leverage one of the many managed services most major cloud providers offer today, e.g., Amazon EKS, Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE).

In contrast, Nomad is a more general purpose scheduler supporting virtualized applications, containers, standalone binaries, and even tasks requiring GPU resource management. Nomad is a single binary for both clients and servers that provides a lightweight scheduler and resource manager. Nomad aims to follow the Unix design philosophy of having a smaller scope focusing on cluster management and scheduling, while leveraging other tools, such as Consul for service discovery and Vault for secrets management.

Getting Started

This article is merely a quick introduction to getting started with Nomad in a local development environment with Docker installed. The steps described have been tested with Nomad version 0.10.1. Other great resources for learning more include Learn Nomad and the Nomad Documentation.

To get started, grab the latest version from the Nomad download page for your platform. Note that there is only one binary to install. The binary will run in server or client mode based on the configuration file given. Once you have it installed you can run nomad --version to verify a successful install.

With a successful install confirmed, let’s dive into setting up a local running instance.

Nomad has many options for task drivers available but this demo will be using Docker. Make sure you have Docker installed and running locally.

Nomad Server

A Nomad cluster consists of agents running in server mode, typically three to five server instances, and any number of agents running in client mode on hosts that will accept jobs from the Nomad servers. For our purposes, a single server and a single client will be enough.

First create a server.hcl file with the following basic configuration:

# server.hcl
# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/tmp/server1"

# Give the agent a unique name. Defaults to hostname
name = "server1"

# Enable the server
server {
  enabled = true

  # Self-elect, should be 3 or 5 for production
  bootstrap_expect = 1
}
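
In production you would not self-elect a single server; instead you would run three or five servers and have them join each other. As a rough sketch only, the server stanza might grow a server_join block like the one below; the IP addresses are placeholders for your own server addresses:

server {
  enabled          = true
  bootstrap_expect = 3

  # Placeholder addresses -- replace with your real server IPs,
  # or use a discovery mechanism such as Consul instead.
  server_join {
    retry_join = ["10.0.0.10", "10.0.0.11", "10.0.0.12"]
  }
}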

In a new terminal, run nomad agent -config server.hcl. This will start a development server that includes a Web UI available at http://localhost:4646. Here you will be able to see details on the Nomad servers and clients in this cluster, as well as current and past jobs. Now that we have a server to manage our jobs and resources, let’s add a client.

Nomad Client

Nomad clusters will have agents deployed in client mode on any host that has resources that need to be managed for jobs. In a new terminal window, let’s create the following configuration file and run nomad agent -config client1.hcl.

# client1.hcl
# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/tmp/client1"

# Give the agent a unique name. Defaults to hostname
name = "client1"

# Enable the client
client {
  enabled = true

  # For demo assume we are talking to server1. For production,
  # this should be like "nomad.service.consul:4647" and a system
  # like Consul used for service discovery.
  servers = ["127.0.0.1:4647"]
}

# Modify our port to avoid a collision with server1
ports {
  http = 5656
}
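
If you want to experiment with scheduling across more than one node, the same pattern scales out locally. A hypothetical client2.hcl only needs a different name, data directory, and HTTP port (run it with nomad agent -config client2.hcl):

# client2.hcl
log_level = "DEBUG"

data_dir = "/tmp/client2"

name = "client2"

client {
  enabled = true
  servers = ["127.0.0.1:4647"]
}

# Avoid colliding with server1 and client1
ports {
  http = 5657
}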

Now when you revisit the Web UI for the Nomad Server you should see a new client listed. You can click into the client details to see information such as available resources or drivers that Nomad has detected (e.g. Docker or QEMU).

Nomad Job Configuration

Now that we have an operational Nomad cluster, let’s create a job to be orchestrated. Generate a new Nomad job with nomad job init. This will create a new file called example.nomad in your current directory.

The Nomad job specification has many options, but for this article we are only going to focus on some of the primary stanzas. For a more in-depth breakdown, check out the Nomad Job Specification documentation.

Job

The job stanza is the topmost configuration block in a job specification file. There can only be one job stanza per file, and its name must be unique per Nomad region. Parameters, such as resource constraints, can be set at the job, group, or task level based on your needs. A Nomad job is roughly comparable to a Kubernetes Deployment, with its groups playing a role similar to Pods.
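
As a minimal sketch, a job stanza wraps everything else in the file; the job name "example" and datacenter "dc1" here are just placeholders, matching what nomad job init generates:

job "example" {
  # Which datacenters the job is eligible to run in
  datacenters = ["dc1"]

  # Scheduler type, covered next
  type = "service"

  # group and task stanzas are nested here
}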

Type

The job type refers to the type of scheduler Nomad should use when creating the job. The options are service, batch, and system. The two most frequently used options will probably be service and batch. The service scheduler is used for long-running tasks that should never go down, such as a web application or a cache like Redis. A batch task is similar to a service task but is less sensitive to performance and is expected to finish within a few minutes to a few days. The system scheduler type is useful for deploying tasks that should be present on every node.
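
For example, a hypothetical backup job could use the batch scheduler instead; this is only a sketch, with the group and task stanzas omitted:

job "nightly-backup" {
  datacenters = ["dc1"]

  # Run the job to completion rather than keeping it alive like a service
  type = "batch"

  # group and task stanzas omitted for brevity
}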

Group

A job can have many groups and each group can have many tasks. The group stanza is used to define the tasks that should be co-located on the same Nomad client. Any task defined in a group will be placed on the same client node. It’s out of scope for this tutorial, but for failure tolerance configurations, see the spread stanza documentation.
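
Here is a sketch of a group stanza; the name "cache" and the count are illustrative, with count being the number of copies of the group Nomad should schedule:

group "cache" {
  # Schedule two instances of this group; the tasks within each
  # instance are always placed together on the same client node.
  count = 2

  # task stanzas go here
}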

Task

A task is the unit of work in Nomad, whether that work is a Docker container, a standalone binary, or any of the other supported task drivers. This is where you specify what you want to run and how you want it to run, with parameters such as command arguments, services to register for service discovery, and resource requirements, to name a few.
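
Below is a sketch of a Docker task, loosely based on the Redis task that nomad job init generates; the image, port label, and resource values are illustrative:

# Nested inside a group stanza
task "redis" {
  driver = "docker"

  config {
    image = "redis:3.2"

    # Map the container's Redis port to a dynamic host port labeled "db"
    port_map {
      db = 6379
    }
  }

  resources {
    cpu    = 500 # MHz
    memory = 256 # MB

    network {
      port "db" {}
    }
  }
}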

Service

Not to be confused with the service job type mentioned above, the service stanza tells Nomad to register the task as a service with Consul for service discovery. This allows you to reference the resource by its service name in other Nomad configurations.
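
Here is a sketch of a service stanza for the Redis task above, similar to the one nomad job init generates; the service name, tags, and health check values are illustrative:

# Nested inside the task stanza
service {
  name = "redis-cache"
  tags = ["cache"]
  port = "db" # the port label from the task's network stanza

  # A basic TCP health check that Consul will run against the service
  check {
    name     = "alive"
    type     = "tcp"
    interval = "10s"
    timeout  = "2s"
  }
}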

Each job has exactly one type. A job can contain any number of groups, each made up of any number of tasks, and each task can register a service.

job
 | type (1)
  \_ group (N)
        \_ task (N)
          | service (1)

Nomad Job Execution

After reviewing some of the basics of a Nomad job specification, it’s time to deploy a job to our local Nomad cluster.

The example Nomad job created with nomad job init defaults to a service job with a task that runs a Redis instance using the Docker driver. Let’s deploy that now.

In a separate terminal, with your server and client nodes running in the others, submit the job by running nomad job run example.nomad. You have now set up a basic Nomad installation and deployed your first job!

You can view the status and details of the job with nomad job status example or via the Web UI mentioned previously. Since we are using the Docker driver, we can also see the running container Nomad is managing by running docker ps. Each running instance of a job is tracked as an allocation with its own ID. From the nomad job status example output you can retrieve the allocation ID and see its details by running nomad alloc status ALLOC_ID. As with other Nomad commands, running nomad alloc by itself lists the available subcommands, such as exec, fs, stop, and logs, which help you manage and inspect allocations.

Scheduler Example

Now that we have a running job, let’s see the scheduler maintain the running service. Get the “Allocation ID” from the job status by running nomad job status example. Then run the allocation status command nomad alloc status ALLOC_ID. If you want, you can use the details in the “Address” field from the nomad alloc status command to connect to the redis container. For example, if the value is db: 192.168.1.33:29906 you can connect to it by running redis-cli -h 192.168.1.33 -p 29906.

To see the scheduler in action, let’s cause the Docker container to fail and use the alloc status command to watch it recover. Run docker ps and note the ID of the running Redis container, then stop it with docker stop CONTAINER_ID. After a few seconds, run nomad alloc status ALLOC_ID again to see the updated status and event details for the job. If you run docker ps as well, you will see that the Nomad scheduler has started a new Redis container! If this had been a production cluster and the node running our job had failed, Nomad would have rescheduled the task onto a healthy node. This is one of the advantages of using an orchestration tool like Nomad.
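
How aggressively Nomad restarts a failed task on the same node, before giving up and rescheduling it elsewhere, is controlled by the restart stanza inside the group; here is a sketch similar to what the generated example.nomad contains:

restart {
  attempts = 2      # restart a failing task up to two times...
  interval = "30m"  # ...within a rolling 30 minute window
  delay    = "15s"  # wait 15 seconds between restart attempts
  mode     = "fail" # after that, mark the allocation as failed
}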

Wrap-up

Nomad is an exciting project because it focuses on managing resources across a fleet of systems, regardless of the type of resource. This is particularly useful in diverse environments where there is a need to manage multiple types of workloads (e.g., binary applications, LXC containers, QEMU virtual machines, etc.). No one tool is perfect for every environment, but hopefully this article has helped you determine whether Nomad might be the right solution for you.
