December 6, 2015

Day 6 - Apache Mesos and the Rise of the Datacenter OS

Written by: Roger Ignazio (@rogerignazio)
Edited by: Justin Garrison (@rothgar)

Roger Ignazio is an Infrastructure Automation Engineer at Mesosphere and the author of "Mesos In Action." Thanks to the generosity of the team at Manning Publications, SysAdvent readers enjoy a 40% discount when they use the code “mesysad” at https://manning.com/books/mesos-in-action.

Containers and application orchestration are hot topics as organizations and engineering teams attempt to deploy changes to applications and infrastructure as quickly as possible, all while improving overall datacenter efficiency. When you read articles about containers, mentions of Apache Mesos (paper) usually aren’t too far away. You may be wondering what Mesos is, and how to use it for managing applications at scale.

In this article, I’ll provide an introduction to Mesos, drawing a number of comparisons between the Linux kernel and Mesos to help you understand how it works. I’ll cover two open source projects that allow engineering teams to quickly and easily deploy applications and scheduled tasks on a cluster. And finally, I’ll discuss how Mesosphere is combining all of this to create a datacenter-wide operating system, with Mesos at its core.

Mesos – A Distributed Kernel

Whether you’re reading this post on your laptop, smartphone, or tablet, chances are that you have no idea which core of the processor your web browser is using. Sure, you could find out, but why bother? The operating system’s kernel handles the resource abstraction and scheduling for you. In the end, the operating system probably doesn’t even matter all that much; you just want a way to run the apps you love, and the operating system is a means to provide that experience.

Mesos, not unlike the kernel of an operating system, provides a way to abstract resources from physical or virtual machines. But where it begins to differ from OS kernels (such as Linux) is that the abstraction isn’t bound to a single host. Instead, Mesos provides a way to abstract resources for any number of machines (from tens to over 10,000) and program them as a single entity, leading to simplified systems management and improved resource utilization. Many companies, including Apple, Twitter, Airbnb, and Bloomberg, have turned to Mesos to power their computing infrastructure.

To better visualize what I just explained, take a look at the following graphic comparing the resource abstraction and scheduling between a single machine running the Linux kernel, and multiple machines taking part in a Mesos cluster.

In both cases, some abstraction layer—whether it’s single machine hardware with the Linux kernel or multiple machines with Mesos—is responsible for offering compute resources (processor cores, memory, storage, and network ports) to applications. Linux does a fantastic job of doing this on a single machine, but what happens when you want to deploy applications to multiple machines? Each machine effectively becomes a silo, only able to provide resources to applications that also run on that box.

Resource scheduling

Despite being a popular buzzword lately, "resource scheduling" is anything but a new concept. Two popular examples are the Completely Fair Scheduler in the Linux kernel and the Distributed Resource Scheduler (DRS) in VMware vSphere. In both cases, these schedulers seek to optimize the scheduling of tasks based on the available resources. Mesos borrows ideas from both of these schedulers and builds them into an abstraction layer, or "distributed kernel" of its own. But unlike the Linux kernel, which primarily provides access to underlying physical (or virtual) compute resources on a single machine, Mesos agents offer resources to a Mesos master, to then be consumed by various applications.

The Mesos master implements a two-tier scheduling model: the master sends resource offers to an application (or framework, in Mesos terms), and the framework then accepts or declines each offer based on the attributes in the offer and on whether it has any tasks to launch. The Mesos master is able to share multiple resource types (CPUs, memory, disk, and ports) among different applications by using the Dominant Resource Fairness algorithm built into its resource allocation module. So instead of provisioning a number of machines to run a specific service, you’re now able to define the amount of resources an application needs, and allow it to be scheduled anywhere on the cluster.
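To make the idea behind Dominant Resource Fairness a little more concrete, here is a minimal Python sketch. It is not the Mesos allocator, which also deals with offers, roles, and weights; it only shows the core rule of repeatedly serving the framework whose dominant share (its largest fractional share of any one resource type) is currently lowest. The function names, the cluster size, and the per-task demands are all hypothetical and chosen purely for illustration.

```python
from fractions import Fraction


def dominant_share(allocated, total):
    """Return the largest fraction of any one resource type that is allocated."""
    return max(Fraction(allocated[r], total[r]) for r in total)


def drf_allocate(total, demands):
    """Hand out one task at a time to the framework with the lowest dominant share.

    total:   cluster capacity per resource type (integers)
    demands: per-task resource needs for each framework (positive integers)
    Returns the number of tasks launched per framework.
    """
    allocated = {name: {r: 0 for r in total} for name in demands}
    used = {r: 0 for r in total}
    tasks = {name: 0 for name in demands}
    while True:
        # Only consider frameworks whose next task still fits in the cluster.
        candidates = [
            name for name, need in demands.items()
            if all(used[r] + need[r] <= total[r] for r in total)
        ]
        if not candidates:
            return tasks
        chosen = min(candidates, key=lambda n: dominant_share(allocated[n], total))
        for r in total:
            allocated[chosen][r] += demands[chosen][r]
            used[r] += demands[chosen][r]
        tasks[chosen] += 1


# Hypothetical cluster and per-task demands, for illustration only.
cluster = {"cpus": 9, "mem_gb": 18}
demands = {
    "rails": {"cpus": 1, "mem_gb": 4},    # memory-heavy tasks
    "jenkins": {"cpus": 3, "mem_gb": 1},  # CPU-heavy tasks
}
print(drf_allocate(cluster, demands))
```

With these numbers the memory-heavy framework ends up with three tasks and the CPU-heavy one with two, leaving each holding two thirds of its dominant resource.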

The end of static partitioning

Let’s take a step back for a moment and consider the following scenario. You have two services in your datacenter: a Ruby on Rails application, and a Jenkins CI cluster. When one of these services requires more resources than a single machine can provide, you provision more machines. If your Rails app needs to handle additional users, you’re required to provision a new server and [re]configure your load balancer. If builds are being queued up by your Jenkins CI master, you provision an additional Jenkins agent and manually attach it to the master.

That scenario suggests that humans are being left to perform resource allocation and capacity planning by hand, and frankly, we’re pretty bad at it. Chances are that each of those applications doesn’t operate at 100% utilization 100% of the time, leading to a disappointing industry-average 6-12% system utilization. When you start measuring overall utilization against operating costs, that’s a lot of wasted capital! Regardless of whether you’re running your own datacenter or using an Infrastructure-as-a-Service provider like AWS or Azure, they’re all servers with unused cycles at the end of the day.

By abstracting system resources, you’re able to stop guessing at the number of machines required for a specific application and instead focus on the amount of resources it actually consumes, that is, the number of CPUs and the amount of memory required to run your Rails app or your CI service. As long as the machines in the datacenter provide enough resources to run all of your workloads, you’re in good shape. And if not, the workloads can be queued until resources become available. But the key takeaway is that you’re now adding compute resources to the larger Mesos cluster rather than to a number of small, statically partitioned services, and you’re no longer manually creating and configuring VMs.
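As a rough illustration of thinking in resources rather than machines, the following Python sketch places workloads on whichever machine has room and queues anything that does not fit. It is a simple first-fit loop, not how Mesos itself places tasks (frameworks make that decision from resource offers), and the agent names, workload names, and sizes are all made up for the example.

```python
def schedule(agents, workloads):
    """First-fit placement of workloads onto agents.

    agents:    {agent_name: {"cpus": ..., "mem": ...}} free resources
    workloads: list of (workload_name, {"cpus": ..., "mem": ...}) in arrival order
    Returns (placements, queued): where each workload landed, and what must wait.
    """
    free = {name: dict(res) for name, res in agents.items()}  # copy, don't mutate input
    placements, queued = {}, []
    for name, need in workloads:
        for agent, res in free.items():
            if all(res[r] >= need[r] for r in need):
                for r in need:
                    res[r] -= need[r]
                placements[name] = agent
                break
        else:
            # No agent has enough free resources right now.
            queued.append(name)
    return placements, queued


# Hypothetical two-machine cluster and workloads (mem in MB).
agents = {
    "agent-1": {"cpus": 4, "mem": 8192},
    "agent-2": {"cpus": 4, "mem": 8192},
}
workloads = [
    ("rails", {"cpus": 2, "mem": 4096}),
    ("jenkins-master", {"cpus": 1, "mem": 2048}),
    ("jenkins-build", {"cpus": 4, "mem": 4096}),
    ("rails-2", {"cpus": 2, "mem": 4096}),
]
placements, queued = schedule(agents, workloads)
print(placements)
print(queued)
```

Here the second Rails instance is queued because no single machine has two free CPUs left; adding a third machine to the `agents` dictionary is all it takes for it to be placed.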

Containerization

The rise of the intermodal shipping container transformed the freight industry during the 20th century, and we’re seeing a parallel movement in the IT industry, but at a staggering pace. In fact, the use of containers in computing isn’t exactly new either; control groups (or cgroups, for short) were first added to the Linux kernel back in January 2008 as a way to isolate individual processes. In the last couple of years, Docker has made it incredibly simple for end users to get started with this technology.

Because containers provide a lightweight alternative to virtual machines, allowing users to run their applications in isolated environments, Mesos is built with containers at its core, supporting both Linux cgroups and Docker containers. Each of these container technologies allows tasks to run at varying levels of isolation from other tasks on the same machine.

But being a distributed kernel, Mesos only provides a means to launch processes using these container technologies, and handles things like resource allocation and port and volume mappings. To launch tasks using Mesos, we need an init system to manage the tasks, and it would be nice to have a cron system to go with it.

Marathon and Chronos – Init and Cron Frameworks

I already mentioned that Mesos provides a way for multiple applications (or frameworks, to use Mesos terms) to share multiple types of resources on a given cluster. A framework registers with the Mesos master and receives resource offers. There are a number of different frameworks currently available, including big data processing (Spark, Kafka), distributed storage and databases (HDFS, Cassandra), batch scheduling (Chronos, Aurora), and long-running services (Marathon, Aurora). I’ll focus on two popular frameworks: Marathon and Chronos. These frameworks are used to deploy long-running services (like web apps) and distributed cron jobs, respectively.
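The framework side of the offer cycle can be sketched in a few lines of Python. This is a toy model rather than the real Mesos scheduler API: the `Offer` and `Task` classes, the `handle_offer` function, and the `rack` attribute are all hypothetical names used only to show how a framework might decide whether to accept or decline an offer.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    agent: str
    cpus: float
    mem: float
    attributes: dict = field(default_factory=dict)


@dataclass
class Task:
    name: str
    cpus: float
    mem: float


def handle_offer(offer, pending, required_attributes=None):
    """Return the tasks to launch on this offer; an empty list means decline.

    Tasks that are launched are removed from the pending list.
    """
    required = required_attributes or {}
    # Decline if the offer does not carry the attributes we care about.
    if any(offer.attributes.get(k) != v for k, v in required.items()):
        return []
    launched = []
    cpus, mem = offer.cpus, offer.mem
    for task in list(pending):  # iterate over a copy while removing
        if task.cpus <= cpus and task.mem <= mem:
            cpus -= task.cpus
            mem -= task.mem
            pending.remove(task)
            launched.append(task)
    return launched


pending = [Task("web", 1.0, 512), Task("worker", 1.5, 256)]
offer = Offer("agent-1", cpus=2.0, mem=1024, attributes={"rack": "a"})
launched = handle_offer(offer, pending, required_attributes={"rack": "a"})
print([t.name for t in launched], [t.name for t in pending])
```

In this run the framework launches `web` and leaves `worker` pending, because only one CPU remains in the offer after the first task is placed. An offer with a different `rack` value, or an empty pending list, produces an empty result, which corresponds to declining the offer.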

Marathon

Marathon is an open source init system for Mesos developed by Mesosphere. It’s roughly equivalent to supervisord in that it manages long-running tasks and automatically restarts application instances if one of them should fail. So if a machine in your cluster fails in the middle of the night, Marathon automatically reschedules the failed applications on an available machine.

Marathon supports launching both cgroups and Docker containers on a Mesos cluster, and can quickly and easily scale an application up to N instances.

Marathon also includes an extensive REST API which allows you to create, modify, and delete applications, and query the service for information about running instances. This allows you to automatically perform rolling upgrades of your application using your CI system, or to dynamically create HAProxy configurations and reload the service when changes have occurred. When it comes to application management, Marathon allows you to take the worst part of your scheduling, the human, out of your infrastructure.
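As a sketch of what talking to that API might look like, the Python below builds (but does not send) two HTTP requests: one that creates an application and one that scales it. The `/v2/apps` endpoint and the `id`, `cmd`, `cpus`, `mem`, and `instances` fields follow Marathon's app definition as I understand it; the Marathon address, the app name, and the helper function names are assumptions made for the example, so check them against the API documentation for your Marathon version.

```python
import json
from urllib.request import Request

MARATHON_URL = "http://marathon.example.com:8080"  # hypothetical address


def _json_request(method, path, payload):
    """Build an HTTP request with a JSON body; nothing is sent here."""
    return Request(
        MARATHON_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def create_app(app_id, cmd, cpus, mem, instances):
    """Request that asks Marathon to start a new long-running application."""
    app = {"id": app_id, "cmd": cmd, "cpus": cpus, "mem": mem, "instances": instances}
    return _json_request("POST", "/v2/apps", app)


def scale_app(app_id, instances):
    """Request that changes the number of running instances of an application."""
    return _json_request("PUT", "/v2/apps/" + app_id.strip("/"), {"instances": instances})


create = create_app("/rails-app", "bundle exec rails server -p $PORT", 0.5, 512, 2)
scale = scale_app("/rails-app", 5)
for req in (create, scale):
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing either request to `urllib.request.urlopen` would send it to a real Marathon instance; a CI job could do the same thing to roll out a new version after a successful build.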

Chronos

Chronos is an open source cron system for Mesos originally developed at Airbnb. It builds upon traditional cron with features such as ISO 8601-formatted timestamps, automatic retries of failed jobs, specifying a maximum number of times a task should run, and the ability for a job to have dependencies on other jobs. Like Marathon, it also supports running tasks in cgroups and Docker containers and provides an extensive REST API that can be used for creating, modifying, deleting, and manually triggering jobs.
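To give a feel for those features, here is a small Python sketch that assembles two job definitions: one on an ISO 8601 repeating schedule and one that depends on it. The `R<count>/<start>/<interval>` schedule notation is standard ISO 8601; the field names (`name`, `command`, `schedule`, `retries`, `parents`) reflect my understanding of the Chronos job format, and the job names, commands, and helper functions are invented for illustration, so treat the details as assumptions.

```python
import json
from datetime import datetime


def iso8601_schedule(start, interval, repetitions=None):
    """Build an ISO 8601 repeating interval such as R/2015-12-07T02:00:00Z/PT24H.

    start:       naive datetime, assumed to be in UTC
    interval:    ISO 8601 duration string, e.g. "PT24H"
    repetitions: maximum number of runs, or None to repeat indefinitely
    """
    count = "" if repetitions is None else str(repetitions)
    return f"R{count}/{start.strftime('%Y-%m-%dT%H:%M:%SZ')}/{interval}"


def scheduled_job(name, command, schedule, retries=2):
    """Job that runs on a schedule and is retried if it fails."""
    return {"name": name, "command": command, "schedule": schedule, "retries": retries}


def dependent_job(name, command, parents):
    """Job that runs only after all of its parent jobs have succeeded."""
    return {"name": name, "command": command, "parents": list(parents)}


nightly = scheduled_job(
    "nightly-backup",                      # hypothetical job name
    "/usr/local/bin/backup.sh",            # hypothetical command
    iso8601_schedule(datetime(2015, 12, 7, 2, 0, 0), "PT24H"),
)
report = dependent_job("backup-report", "/usr/local/bin/report.sh", ["nightly-backup"])
print(json.dumps([nightly, report], indent=2))
```

Passing `repetitions=5` to `iso8601_schedule` yields a schedule that starts with `R5/`, which is how a maximum number of runs is expressed in this notation.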

Mesosphere DCOS – A Mesos-based Operating System

If you take a minute to think about the components that make up an operating system such as Red Hat Enterprise Linux or Ubuntu, you’d probably identify the following:

  • Init system: A daemon (PID 1) such as systemd (RHEL 7) or Upstart (Ubuntu 14.04) manages long-running services and can automatically restart (or respawn) them if and when they fail.

  • Package management: A package format (rpm, deb), package manager (yum, apt), and a set of base repositories (base, main).

  • Command line interface: A shell that is launched when a user logs in (bash, zsh).

  • Graphical user interface: An optional graphical user interface for monitoring and administering the system.

With the open source Mesos project as its distributed kernel and Marathon as its init system, Mesosphere has set out to build a modern, distributed, enterprise-grade operating system. This system, appropriately named the Datacenter Operating System (DCOS), provides a way for systems administrators to deploy applications and services at scale without needing to worry about things like statically partitioning services or machines failing in the middle of the night. DCOS is currently offered in two flavors: Enterprise and Community.

Package management

At the time of this writing, Mesosphere provides two package repositories for the DCOS: Universe and Multiverse. These repositories host production-ready and beta packages, respectively. The documentation for the Universe covers the schema quite nicely, so I won’t cover it all here, but it essentially boils down to a package definition being a JSON object that can be processed by the DCOS CLI and understood by Marathon’s API.

Command line interface

The DCOS CLI can be installed on your laptop or workstation and interacts with various services in DCOS. It provides functionality for managing packages, services, and nodes in a DCOS cluster.

# Install several packages from the DCOS package repositories in one pass.
SERVICES=( chronos jenkins spark hdfs cassandra kubernetes )
for service in "${SERVICES[@]}"; do
    dcos package install --yes "$service"
done

Some of these services—Cassandra, HDFS, Kubernetes—require non-trivial amounts of effort to deploy effectively. The team at Mesosphere, using these package repositories, provides and maintains turn-key solutions for deploying these services in your own datacenter in a fully automated, fault-tolerant manner.

Graphical user interface

Although the DCOS CLI allows you to fully administer the operating system from the command line, the web interface provides information about the cluster including installed services, running tasks, and nodes belonging to the cluster.

To deploy your own applications and Docker containers, you can use the CLI or navigate to the Marathon web interface (available on the Services tab). There you can create a new application, specifying the required resources, the number of instances, the information for the Docker container image, and so on. Although the DCOS includes Marathon as its init system, it’s also possible to deploy multiple instances of Marathon on top of Marathon so that you can provide individual teams with their own Platform-as-a-Service without worrying about them affecting another team’s applications.
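For a sense of what such an application looks like when written down rather than clicked together, here is a Python sketch that assembles a Marathon-style definition for a Docker container. The `container`, `docker`, `image`, `network`, and `portMappings` fields follow the Marathon app format as I understand it at the time of writing; the `docker_app` helper, the app name, and the image tag are assumptions for the example, and field names may differ between Marathon versions.

```python
import json


def docker_app(app_id, image, cpus, mem, instances, container_port):
    """Assemble an app definition that runs a Docker image with bridged networking."""
    return {
        "id": app_id,
        "cpus": cpus,            # CPU share per instance
        "mem": mem,              # memory per instance, in MB
        "instances": instances,  # how many copies to keep running
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks for a dynamically assigned port on the host.
                "portMappings": [{"containerPort": container_port, "hostPort": 0}],
            },
        },
    }


# Hypothetical app name and image, for illustration only.
app = docker_app("/web", "nginx:1.9", cpus=0.5, mem=128, instances=3, container_port=80)
print(json.dumps(app, indent=2))
```

The printed JSON is the kind of document you would save to a file and submit through the CLI or Marathon's API instead of filling in the web form.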

Summary

Mesos provides a layer of abstraction for the resources on many machines in a datacenter, allowing them to be programmed as a single entity. It allows multiple applications to share a single cluster of machines without worrying about statically partitioning services within the datacenter. Marathon allows you to deploy applications and long-running services on the cluster, and the information available via its API can be used to dynamically create load balancer configurations and reload them when changes occur. Mesosphere DCOS combines a number of open source and commercial components into a Mesos cluster that is easy to deploy and manage, allowing you to quickly deploy applications and containers.

So, if you’re looking to improve the resource utilization of your own infrastructure and get rid of your human scheduling bottleneck, or if you’re just looking for something new to play with, maybe it’s time to give Mesos and Mesosphere DCOS a try.

1 comment :

Mathias Lafeldt said...

Hey Roger,

Your article does a great job of explaining what Mesos is and how it can serve as the foundation for other powerful services like Marathon or even DCOS. I really enjoyed reading it and I'm going to recommend it to people not familiar with Mesos yet.

Coincidentally, I also wrote a blog post about DCOS some weeks ago. The post gives a high-level overview of DCOS, but also explains how to easily bootstrap your own DCOS cluster, which might be interesting to Sysadvent readers as well.

You can find it here: https://mlafeldt.github.io/blog/getting-started-with-the-mesosphere-dcos/

Again, thanks for your article.

-Mathias