December 10, 2019

Day 10 - It’s OK if you’re not running Kubernetes

By: Mattias Geniar (@mattiasgeniar)

I love technology. We’re in an industry that is fast-paced, ever improving and loves to be cutting-edge and bold. It’s this very drive that gives us exciting new tech like HTTP/3, Kubernetes, Golang & so many other interesting projects.

But I also love stability, predictability and reliability. And that’s why I’m here to say that it’s OK if you’re not running the very latest flavor-du-jour insert-new-project-here.

The media tell us only half the truth

If you only read the media headlines or news outlets, you would believe everyone is running their applications on top of an auto-scaling, load-balanced, geo-distributed Kubernetes cluster, built by only a handful of developers who set the whole thing up overnight. It was an instant success!

Well no. That’s not how that works.

The reality is, most Linux or open source applications today still run on a traditional Debian, Ubuntu or CentOS server. As a VM or as a physical server.

I’ve managed thousands of servers over my lifetime and have watched technology come and go. Today, Kubernetes is very hot. A few years ago it was Openstack. Go back some more and you’ll find KVM & Xen, paravirtualization & plenty more.

I’m not saying these technologies will vanish - far from it. There’s merit in each project or tool; they all solve particular problems. If your organisation has a problem one of them solves, great!

There’s still much to improve on the old & boring side of technology

My background is mostly in PHP. We started out using CGI & FastCGI to run our PHP applications and have since moved from mod_php to php-fpm. For many sysadmins, that’s where it ended.

But there’s so much room for improvement here. The same applies to Python, Node or Ruby. We can further optimize our old and boring setups (you know, the ones being used by 90% of the web) and make them even safer, more performant and more robust.

Were you able to check every config and parameter? What does that obscure setting do, exactly? What happens if you start sending malicious traffic to your box? Can you improve the performance of the OS scheduler? Are you monitoring everything you should be?
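To make that concrete: a single php-fpm pool alone exposes dozens of settings worth revisiting. Below is a sketch of a tuned pool configuration (the directives are real php-fpm settings, but every value is illustrative rather than a recommendation; size them against your own traffic and memory budget):

```ini
; /etc/php-fpm.d/www.conf (sketch; values are illustrative only)
[www]
pm = dynamic                     ; spawn workers on demand
pm.max_children = 50             ; hard cap: size this to fit in available RAM
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 15
pm.max_requests = 500            ; recycle workers to contain slow memory leaks

; find the slow requests before your users do
slowlog = /var/log/php-fpm/www-slow.log
request_slowlog_timeout = 5s

; fail fast instead of queueing requests forever
request_terminate_timeout = 30s
```

Each of these knobs answers one of the questions above: capacity, leak containment, slow-request visibility, and failure behavior under load.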

That Linux server that runs your applications isn’t finished. It requires maintenance, monitoring, upgrades, patches, interventions, back-ups, security fixes, troubleshooting, …

Please don’t let the media convince you that you should be running Kubernetes just because it’s hot. You have servers running that you know best, and they still have room for improvement. They can be faster. They can be safer.

Get satisfaction in knowing that you’re making a difference for the business & its developers because your servers are running as best they can.

What you do matters, even if it looks like the industry has all gone and left for The Next Big Thing (tm).

But don’t sit still

Don’t take this as an excuse to stop looking for new projects or tools. Have you taken the time yet to look at Kubernetes? Do you think your business would benefit from such a system? Can everyone understand how it works? Its pitfalls?

Ask yourself the hard questions first. There’s a reason organisations adopt new technology. It’s because it solves a problem. You might have the same problems!

Every day new projects & tools come out. I know because I write a weekly newsletter about it. Make sure you stay up-to-date. Follow the news. If something looks interesting, try it out!

But don’t be afraid to stick to the old and boring server setups if that’s what your business requires.

December 9, 2019

Day 9 - In Defense Of The Modern Day JVM (Java Virtual Machine)

By: Gene Kim (@realgenekim)
Edited By: Joshua Smith (@jcsmith)

In this post, I'm going to tell you about something very surprising that happened to me earlier this year, which has led me to make an impassioned (and perhaps equally surprising) defense of something that I feel has been unfairly maligned.

In September, I had the privilege of attending the Sensu Summit here in my hometown of Portland, Oregon. On the evening before the first day of talks, I ran into so many friends I've met over the last 10+ years. Not surprisingly, it was great to catch up with everyone and hear all about their great new adventures.

When people asked what I’ve been up to, I remember enthusiastically telling everyone how much fun I've been having programming in Clojure, which I later wrote extensively about in a blog post called "Love Letter to Clojure."

I told people how initially difficult I found functional programming and Clojure to be, given that it was a LISP ("doesn't even look like code!") and didn't allow mutation of state ("you can't change variables!"), but also the sense of incredible accomplishment I felt being able to quickly, easily, and safely solve problems, in a manner completely unlike the first 35 years of my programming career ("even after 3 years, my code base hasn't collapsed in on itself like a house of cards!").

I remember laughing, jubilantly telling everyone this, and then looking around, and then freezing in surprise... Something had changed... People weren't smiling at me anymore. In fact, people were looking at me as if I had said something incredibly impolite, uncouth, or maybe even immoral.

"Uh, what did I say?" I remember asking everyone in the group. No one said anything, instead just looking back at me with a forced smile. "No, really, what did I say?" I insisted. Still just polite smiles.

In my head, I furiously went through everything I had said, trying to figure out what I might have said that was offensive. I had mentioned Clojure, functional programming, LISP, that I loved that Clojure ran on the JVM, the programs I had written, what I learned, and...

"Wait, is it because I said I loved that Clojure runs on the JVM?" I asked. Several people around me finally laughed, apparently with complete disbelief that a fellow enlightened DevOpser could say such a thing. When I asked what was so surprising, it’s like the floodgates had opened.

They told me about all their horrific war stories of their lives in Ops: being thrown incomprehensible and completely opaque Java JAR files, which then invariably detonated in production, resulting in endless firefighting, at night, on weekends, during birthday parties...

"Holy cow," I remember saying, shaking my head in disbelief. "I totally forgot about all of that..."

Basically, when I said the word “JVM,” they heard, “Here's my JAR file. Good luck, chumps. Kbye.”

Why I Love And Appreciate The JVM! And It’s Not Just Me!

Until that moment, if you asked me what I thought about the JVM, I would have told you something like this:

"The JVM is amazing! Clojure runs on the JVM, and takes advantage of the billions of dollars of R&D spent over twenty years that has made it one of the most battle-tested and performant compute platforms around.

“And it can use any of the Java components in the Maven ecosystem — Maven -> Java as NPM -> NodeJS, Gems -> Ruby, Pip -> Python, etc... And there’s so much innovation happening right now, thanks to Red Hat's Quarkus, Oracle's GraalVM, Amazon AWS, Azul, and so much more.

The JVM has enabled me to be so productive! There’s never been a better time to be using the JVM than now!”

But after that astonishing evening at Sensu Summit, for weeks, I kept thinking, “Am I having so much fun programming in Clojure, being a Dev, that I’ve completely forgotten what it’s like to do Ops? Is it possible that Dev and Ops really do have two very different views of the JVM?”

As an experiment, I put out the following tweet (and this one, too):

I recently observed something interesting/unexpected. I’m performing an experiment & will report out results.

Please reply to this tweet w/following info:

1. Which do you identify as? Dev or Ops

2. Then type any words / emotions that come to mind when I say ‘JVM’ or ‘Java Virtual Machine’.

Begin. Thx! 🙏❤️

Amazingly, I got over 300 replies, which included some of these gems that evoked bad memories from the past:

  • Very-annoying memory-hog
  • "Write once, run anywhere" and "It's running slow... Let's just give it more memory."
  • Pain, anguish, suffering, and screams of "Whhhhyyyyyyy?!?!?"
  • Possibly fast, definitely difficult-to-troubleshoot opaque process that is not working and no one knows why! But it's the most important thing in our stack!
  • Oh no, this thing runs in what now? It's horrendously slow, and will crash at inopportune times
  • Bane of my early Ops existence.
  • Pain
  • Oh FFS, another 3am callout!!
  • Out of memory again

And yet, I got a couple comments like these:

  • Amazingly cool under appreciated tech under the hood. See Clojure, JRuby.
  • My life 10 years ago. Can be really really fast and stable if you really really understand how to drive it.

My sample skewed decisively “Ops.” Last week, I asked some friends in the Java Dev community to repeat the tweet, and again, we quickly got over 200 responses. Here is a sample from the replies we got — my thanks to Stu Halloway from Clojure fame, Dr. Mik Kersten from Tasktop, and Josh Long from Pivotal.

Look how differently they talk about the JVM!

  • Fast, reliable, ubiquitous, large ecosystem, easy packaging
  • Brilliant piece of engineering
  • Love the JVM, bored with the core language, great library ecosystem, solid, reliable, familiar, Clojure runs great on it.
  • Robust, ubiquitous, ponderous, vintage, solipsistic
  • Solid, battle-tested with great ecosystem
  • Impressive, stable, rich, complex, ubiquitous, pervasive, Graal, native-awkward, powerful, elegant, clever, surprisingly long lived, under threat (licensing), tool supported, marketed.
  • It's not just for Java; Reliable; It's grown with me over the years; Runs everywhere including my toaster; Under-valued
  • Safe, known, capable, low risk
  • Backend, concurrency, stability, performance, maturity, excellent design
  • Runs everywhere. Java vs Kotlin vs Scala. Spring Boot.
  • useful, but not trendy with the webdevs
  • Polyglot, JIT, fast, happy

I plan on creating a word cloud of all the amazing replies, and grouping them by sentiment — which will be written in Clojure and run on the JVM, of course. But due to deadlines, I’m not sure I can get it done in time for it to be included here. Stay tuned!

My Top Seven Things You Should Know About The Modern JVM

To the Sysadvent community, I wanted to share some things that excite me most about the JVM, which may surprise you. My hope is that you’ll see the JVM as a vibrant and viable way to run kickass applications in production, that you’ll see some incredibly valuable characteristics of it that benefit all of us, and that we can move dramatically more of the JVM responsibilities to Dev (e.g., “you build it, you configure it, you run it”).

  1. The JVM runs more than just Java: some of the well-known languages include Groovy, Kotlin, and Clojure. There are also implementations of other languages, such as JRuby (I had fun reading the slides from this 2019 presentation from the JRuby core team) and Jython, which were created to take advantage of the amazing run-time performance and multi-threaded capabilities of the JVM. You can find a more extensive list here.
  2. The JVM runs some of the most compute- and data-intensive business processes on the planet, including at the technology giants (aka, the FAANGs, or Facebook, Amazon, Apple, Netflix, Google, and really, Microsoft should be in there, too — although I suspect it’s unlikely you’ll see too much of the JVM at Microsoft).

    You can see some of the fun stats and names in posts like this one.

    And many of the most famous data platforms run on the JVM, either entirely or in significant components: Hadoop, Spark, Storm, Kafka, etc…
  3. The JVM runs some of the biggest web properties on the planet, like eBay, Google, Amazon, Alibaba, Twitter, Netflix, and more. The JVM has been used at scale for decades, and this lived experience has borne a rich, mature ecosystem of options for application developers. Frameworks like Spring (and increasingly Spring Boot) power Netflix, eBay, all of Alibaba’s various online properties, and more. Developers do not need to choose between simplicity and power: they can have both.
  4. There are many JVM options out there, beyond the Oracle version and OpenJDK. And there are now really great utilities that make it easy to install, upgrade and even swap JVMs on your laptop, thanks to utilities like SDK for MacOS — it supports many JVMs, including Corretto (Amazon), GraalVM (Oracle), Zulu (Azul)...

    Personally, I’ve found it remarkably easy using SDK to switch my Clojure programs between different JVMs, following this fabulous tutorial. That’s how I explored GraalVM, which I’m very excited about and will describe next.

    And new JVMs now have a bunch of new memory garbage collectors, including the Shenandoah GC, written by Red Hat, which finally brings continuous compaction to the JVM, eliminating the need for “stop the world” GC pauses. Here’s a great talk about A New Age of JVM Garbage Collectors by Alexander Yakushev that I saw at the Clojure/conj conference two weeks ago!
  5. I think the GraalVM project is so exciting! For me, GraalVM defies easy explanation — it’s an ambitious, entirely new polyglot JVM that allows running Java and other JVM languages, as well as being able to host languages such as JavaScript, Ruby, R, Python and LLVM-based languages. (!!!)

    GraalVM also enables native-image compilation, which essentially compiles your code into native executables with nearly instant, sub-millisecond startup times, addressing one of the primary complaints about conventional JVMs. These binaries are usually small (e.g., 33 MB executables instead of 300 MB uberjar files).

    GraalVM is the brainchild of Dr. Thomas Wuerthinger, Senior Director of Research at Oracle. You can listen to an interview of him on Software Engineering Daily. I suspect you’ll be blown away by his ambitious vision, and the massive productivity advantages they’re gaining by writing a JVM in a language that’s not C++.

    I loved this video of how Twitter is now using GraalVM to run all their Scala workloads by Chris Thalinger, resulting in a 20% improvement in performance and compute density. And here’s a video of Jan Stepien presenting on how to create native images for Clojure programs. Here’s another great article from AstRecipes, showing how native images resulted in a 300x improvement in startup times.

    GraalVM seems to be energizing a flurry of innovation outside of Oracle — Red Hat has created the Quarkus platform, intended to optimize the JVM for Kubernetes and similar environments. I remember reading this blog post, getting excited about instantaneous startup times and significantly reduced memory footprints. You can find an interview of Guillaume Smet and Emmanuel Bernard on Software Engineering Daily — I found it especially fascinating that they are using Go as the benchmark to beat.
  6. The JVM and Maven packaging ecosystem is a calming breath of fresh air: The Maven packaging ecosystem is one of the longest-lived and most successful package repositories. In terms of number of packages/components and versions, only NPM for NodeJS is larger.

    In a presentation that I did with Dr. Stephen Magill at GitHub Universe, we presented on the findings from the State of the Software Supply Chain research we did with Sonatype, studying the update behaviors within the software supply chain in the Maven ecosystem.

    One of the things this research reinforced for me is that packaging churn is growing to be untenable, especially in NPM in NodeJS. This funny tweet thread sums it nicely: “When npm was first released in 2010, the release cycle for typical nodeJS package was 4 months, and npm restore took 15-30 seconds on an average project. By early 2018, the average release cycle for a JS package was 11 days, and the average npm restore step took 3-4 minutes...."

    Of course, the catastrophic “nodularity” is a joke — but there are so many stories of projects where “I didn’t touch the project for 4 months, and now it no longer builds if you update any of the dependencies, and I probably need to update npm, too.”

    In other words, if every dependency you rely on is updating every week, and they are introducing breaking changes, you are in a world of hurt. In a world where the best way to keep your software supply chain secure is to integrate updating dependencies into your daily work, this becomes impossible.

    The Maven ecosystem is full of components that just work and have been working in production for years. In fact, there have been almost no breaking changes to the Clojure core libraries in 12 years!

    If you choose carefully, the components in the JVM ecosystem are stable, reliable, and updates can be made quickly, reliably, easily, without everything blowing up in your face.

    And by the way, one of most amazing talks I've seen is from @BrianGoetz, the Java language architect, on his stewardship of Java ecosystem from Clojure/conj 2016. What's unmistakable & so admirable is his sense of responsibility to not break the billions of lines of code that 9MM developers have written over the last two decades.

    The promise they make to them: “we won’t break your code”
  7. Devs should configure and run their own applications and JVMs: The days of Devs throwing JAR files over the wall to Ops, who must then figure out how to run them in production, are over. Instead, the more modern pattern is having the Devs configure their own JVMs however they want, which then get deployed into a platform that Ops creates and maintains — but it’s the Devs who will be woken up if and when things blow up...


I hope I’ve told you something about the JVM that you may not have known, and made the case that the modern day JVM should be a great thing, both for Devs and for Ops!

PS: You can read more about Clojure and functional programming here, and more about how critical it is for Ops to create platforms that enable developers to be productive in my description of the Five Ideals here, featured in “The Unicorn Project: A Novel about Developers, Digital Disruption, and Thriving in the Age of Data” (this book about DevOps is now a Wall Street Journal bestseller!!)

December 8, 2019

Day 8 - Going Nomad in a Kubernetes World

By: Paul Welch (@pwelch)
Edited By: Nathen Harvey (@nathenharvey)

What is Nomad

Nomad by HashiCorp is a flexible orchestration tool that allows for management and scheduling of different types of compute workloads. Nomad is able to orchestrate legacy applications, containers, and even Machine Learning tasks. Kubernetes is a well-known orchestration platform but this post provides an introduction to Nomad, a tool that can provide some of the same orchestration capabilities.

Most distributed systems benefit from having schedulers and a facility for service discovery. Schedulers programmatically manage compute resources across a large number of nodes. Service discovery tools are used to distribute information about services in a cluster.

How is Nomad Different

Let’s look at some of the differences between Nomad and Kubernetes. Both Nomad and Kubernetes are able to manage thousands of nodes across multiple availability zones or regions, but this is where they begin to differ. Kubernetes is specifically designed to manage Docker containers. It is designed with more than a half-dozen services interconnected to provide full functionality. Administering a Kubernetes management cluster can be a full-time job if you are not able to leverage one of the many managed services most major cloud providers offer today, e.g., Amazon EKS, Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE).

In contrast, Nomad is a more general purpose scheduler supporting virtualized applications, containers, standalone binaries, and even tasks requiring GPU resource management. Nomad is a single binary for both clients and servers that provides a lightweight scheduler and resource manager. Nomad aims to follow the Unix design philosophy of having a smaller scope focusing on cluster management and scheduling, while leveraging other tools, such as Consul for service discovery and Vault for secrets management.

Getting Started

This article is merely a quick introduction to getting started with Nomad using a local development environment with Docker installed. The steps described have been tested with Nomad version 0.10.1. Other great resources for learning more include Learn Nomad and the Nomad Documentation.

To get started, grab the latest version from the Nomad download page for your platform. Note that there is only one binary to install. The binary will run in server or client mode based on the configuration file given. Once you have it installed you can run nomad --version to verify a successful install.

With a successful install confirmed, let’s dive into setting up a local running instance.

Nomad has many options for task drivers available but this demo will be using Docker. Make sure you have Docker installed and running locally.

Nomad Server

Nomad consists of several agents running in server mode, typically 3–5 server instances at a minimum, and any number of agents running in client mode on hosts that will accept jobs from the Nomad server. For our purposes, a single server and client will be enough.

First create a server.hcl file with the following basic configuration:

# server.hcl
# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/tmp/server1"

# Give the agent a unique name. Defaults to hostname
name = "server1"

# Enable the server
server {
  enabled = true

  # Self-elect, should be 3 or 5 for production
  bootstrap_expect = 1
}

In a new terminal, run nomad agent -config server.hcl. This will start a development server that includes a Web UI available at http://localhost:4646. Here you will be able to see details on Nomad Servers and Clients in this cluster, as well as current and past jobs. Now that we have a server to manage our jobs and resources, let’s add a client.

Nomad Client

Nomad clusters will have agents deployed in client mode on any host that has resources that need to be managed for jobs. In a new terminal window, let’s create the following configuration file and run nomad agent -config client1.hcl.

# client1.hcl
# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/tmp/client1"

# Give the agent a unique name. Defaults to hostname
name = "client1"

# Enable the client
client {
  enabled = true

  # For demo assume we are talking to server1. For production,
  # this should be like "nomad.service.consul:4647" and a system
  # like Consul used for service discovery.
  servers = ["127.0.0.1:4647"]
}

# Modify our port to avoid a collision with server1
ports {
  http = 5656
}

Now when you revisit the Web UI for the Nomad Server you should see a new client listed. You can click into the client details to see information such as available resources or drivers that Nomad has detected (e.g. Docker or QEMU).

Nomad Job Configuration

Now that we have an operational Nomad cluster, let’s create a job to be orchestrated. Generate a new Nomad job with nomad job init. This will create a new file called example.nomad in your current directory.

The Nomad job specifications have many options but for this article we are only going to focus on some of the primary stanzas. For a more in-depth breakdown, check out the Nomad Job Specifications documentation.


Job

The job stanza is the topmost configuration value for a job specification file. There can only be one job stanza in a file and it must be unique per Nomad region. Parameters, such as resource constraints, can be set at the job, group, or task level based on your needs. A Nomad Job is similar to a Kubernetes Pod.


Type

The job type refers to the type of scheduler Nomad should use when creating the job. The options are service, batch, and system. The two most frequently used options will probably be service and batch. The service scheduler type is used for long-running tasks that should never go down, such as an application or a cache service like Redis. A batch task is similar to a service task but is less sensitive to performance and is expected to finish within a few minutes to a few days. The system scheduler type is useful for deploying tasks that should be present on every node.


Group

A job can have many groups and each group can have many tasks. The group stanza is used to define the tasks that should be co-located on the same Nomad client. Any task defined in a group will be placed on the same client node. It’s out of scope for this tutorial, but for failure tolerance configurations, see the spread stanza documentation.


Task

A task is the unit of work, whether that is a Docker container, a binary application, or any of the other Nomad-supported task types. This is where you specify what you want to run and how you want it to run, with parameters such as command arguments, services using service discovery, or resource requirements, to name a few.


Service

Not to be confused with the Nomad job type mentioned above, the service stanza tells Nomad to register a service with Consul for service discovery. This allows you to reference the resource in other Nomad configurations by the service name.

Each job has exactly one type. A job can have N groups, each comprised of N tasks, and each task can register one service.

job
 |_ type (1)
 \_ group (N)
     \_ task (N)
         \_ service (1)
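Putting those stanzas together, here is a trimmed-down sketch in the spirit of what nomad job init generates for Nomad 0.10: a service job running Redis under the Docker driver. The group, task, and service names and the resource values here are illustrative, not canonical:

```hcl
# example.nomad — a minimal service job (illustrative values)
job "example" {
  datacenters = ["dc1"]
  type        = "service"      # long-running workload

  group "cache" {
    count = 1                  # number of instances of this group

    task "redis" {
      driver = "docker"

      config {
        image = "redis:5.0"
        port_map {
          db = 6379            # map the container port to a host port
        }
      }

      resources {
        cpu    = 500           # MHz
        memory = 256           # MB
        network {
          mbits = 10
          port "db" {}         # dynamically allocated port, labeled "db"
        }
      }

      service {
        name = "redis-cache"   # registered in Consul for discovery
        port = "db"
      }
    }
  }
}
```

Note how the job holds exactly one type, the group co-locates its tasks, and the service stanza lives inside the task it advertises.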

Nomad Job Execution

After reviewing some of the basics of a Nomad job specification, it’s time to deploy a job to our local Nomad cluster.

The example nomad job created with nomad job init defaults to a service job with a task to run a Redis instance using the Docker driver. Let’s deploy that now.

In a separate terminal, with your server and client nodes running in the others, submit the job by running nomad job run example.nomad. You have now set up a basic Nomad installation and deployed your first job!

You can view the status and details of the job with nomad job status example or via the Web UI mentioned previously. Since we are using the Docker driver, we can also see the running container Nomad is managing by running docker ps. Each job is given an allocation number. From the nomad job status example command you can retrieve the allocation number and see the details by running nomad alloc status ALLOC_ID. As with other Nomad commands, running nomad alloc by itself lists the available subcommands, such as exec, fs, stop, and logs, which help manage jobs.

Scheduler Example

Now that we have a running job, let’s see the scheduler maintain the running service. Get the “Allocation ID” from the job status by running nomad job status example. Then run the allocation status command nomad alloc status ALLOC_ID. If you want, you can use the host and port in the “Address” field from the nomad alloc status command to connect to the Redis container. For example, if the address listed for db is on port 29906, you can connect to it by running redis-cli -h <host> -p 29906.

To see the scheduler recover from a failure, run docker ps and note the running Redis container’s ID. Now stop the container with docker stop CONTAINER_ID. After a few seconds, run nomad alloc status ALLOC_ID again to see the updated status and event details for the job. If you run docker ps as well, you will see that the Nomad scheduler has started a new Redis container! If this had been a production cluster and the node running our job had failed, Nomad would have rescheduled the task to a healthy node. This is one of the advantages of using an orchestration tool like Nomad.


Conclusion

Nomad is an exciting project because it focuses on managing resources across a fleet of systems, regardless of the type of resource. This is particularly useful in diverse environments where there is a need to manage multiple types of resources (e.g., binary applications, LXC containers, QEMU virtual machines, etc.). No one tool is perfect for every environment, but hopefully this article has helped you determine if Nomad might be the right solution for you.

December 7, 2019

Day 7 - Be a Boat Lifter! A Rising Tide that Lifts Everyone Around You

By: Mike Dorman (@misterdorm)
Edited By: Ben Cotton (@Funnelfiasco)

I want to suggest to you what I think is the number one tip for advancing your career (and, really, for advancing your life, too). It’s not about education, a certain new skill you have to learn, or anything like that.

It’s simply this: do everything you can to help other people.

We all have a responsibility to help each other get better. None of us can reach our full potential unless we’re helping each other to learn and grow. This is especially true in the technology industry: things change so fast and move so quickly, it’s impossible for any one person to keep up.

An analogy I like is being a “boat lifter”: a rising tide that lifts up everyone around you. (That’s actually how I describe my work role sometimes. I’m a boat lifter–I’m here to help out everyone around me.) I think it’s a good metaphor for us to think about what we should be doing every day.

You probably already know some people who are living this out. Here are a couple of examples of folks that I know:

I want to point out this quote from John Allspaw:

“Mature engineers lift the skills and expertise of those around them.”

— “On Being a Senior Engineer,” Kitchen Soap

And I want you to notice it says “mature engineers” here. Not “senior engineers.” Not “principal engineers.” While this practice is a necessary condition for any senior role, this can and should be practiced by everybody at all levels.

So let me suggest 10 disciplines we can instill in ourselves and others to help each other get better.

1. Extra Set of Eyes

Realize that you bring an extra set of eyes and a unique perspective to every situation. Those around you can learn and be helped by this, as long as you’re willing to share it.

Your past experiences give you a kind of view into the future, where you can see some problems before they happen. Make sure you’re calling these out! It’s so important to those around you!

Think about it this way: it’s like clearing away the mines before other people walk across the field. It’s all about identifying and stopping problems before they happen.

Code reviews and pair programming are a perfect example of this. Think of how often a bug is overlooked or a typo makes its way into a merged pull request. So much of that can be prevented on the front end simply by having someone else look it over. Automated tests and CI/CD systems can only get you so far—nothing can replace another human.

Make it a priority in your daily routine to ask for—and more importantly, perform—reviews. This is simply the most valuable thing you can do for your coworkers, your organization, and your career.

2. Patterns of Pain

Be on the lookout for “patterns of pain.” These are the recurring things that are always causing trouble and burning a lot of time. Focus yourself and others on dealing with those things first. Otherwise, they’ll just keep dragging you down.

Tech debt comes to mind here, but that’s really not what I mean. When we talk about tech debt, we mostly mean old or “dusty” code, or things that aren’t quite as perfect as we’d like. But, they’re actually not causing any problems. They’re working just fine.

What I mean are the things that are actively on fire and giving you headaches every day. The false positive alerts that are going off, the “automation” that’s failing more often than it’s succeeding, the server that crashes once a week and no one knows why, so you just reboot it every time. Those are the things you need to go after first!

3. Focus on Outcomes

This borrows from Agile quite a bit, but I actually think it’s super useful in our work roles, as well as everyday life: Focus everything you do on specific outcomes. The outcomes are your finish line, your ultimate goal, what you actually want to have happen.

As an example, after you eat dinner, you have to wash your dishes and clean up your kitchen, right? Those are the tasks, not the outcome. The outcome you’re after is to have enough clean dishes to cook and eat your next meal. There are a lot of different ways to accomplish that goal! So really try to aim for the outcome and don’t focus too much on the specific tasks that get you there.

So often we immediately jump to the implementation details, or the “how”, instead of really thinking through the “what.” I get it: you want to jump in, get your hands dirty, and build something. Your brain immediately starts churning with ideas about how to make it happen. But your finished product will be so much better if you truly understand the end goal before focusing on how to get there.

4. Continuous Evaluation

Constantly be evaluating how you’re doing against those outcomes. You can never get any better unless you’re doing this. Build that habit in others as well. It can be really easy to lose this perspective.

A good litmus test for that is to take your team’s outcomes and try to connect those to the higher-level goals of the company. You all need to be pulling in the same direction. After all, this is how your team provides value to the organization.

You can do this in your personal life, too. Just ask yourself, “is what I’m doing in my life helping to advance, encourage, or otherwise help those around me?”

5. Be Ready to Change Direction

If you have trouble making those connections, be prepared to shift your priorities. And this could even mean abandoning work or projects that are already in progress. It’s ok not to finish that stuff! If it’s not giving you the value and outcomes you want, there’s no point in doing it.

This is the “fail fast” mantra we’ve all heard. The most successful companies and teams jettison anything they’re doing that’s not producing benefit for their customers. That way they can focus their time and energy on things that are truly valuable.

But this could be a problem when you have to kill someone’s pet project. It’s pretty painful to have your hard work abandoned. That’s a difficult situation for people to be in. So what you need to do is help others to see that their value comes from their knowledge and skill set, not the particular project they work on.

6. Real Feedback

A good way to do that is to get people thinking about their career goals (really, their career outcomes that they want), and how to get there. Constructively point out areas they can be working on.

A good place to start is your company’s job leveling or career bands document (hopefully you have one!). Be familiar with the expectations for each category and level, and use that as a guide to identify improvement and growth opportunities in others.

This is especially important for those of you who have more experience. You have tons of great career advice that people newer to the industry need to hear from you! Be constructive, but also be honest. It’s super important to actually provide real feedback about this.

Sometimes we have a tendency to just say, “Hey, you are doing a great job, keep it up! Just keep doing what you’re doing!” I’ve certainly had some reviews like that (maybe you have, too), where there’s really no constructive advice about what to do differently.

But that is so not helpful. People need real feedback about how to get better!

7. Healthy Work/Life Balance

Next, encourage, instill, and practice a healthy work-life balance. We all know people who will stay up all night to fix an outage or knock out a big project. Sometimes we refer to these people as our “heroes.” But they’re really not heroes at all. They’re encouraging all the wrong behavior, and it’s not good for anybody. If you’re not sure who the “hero” on your team is, maybe it’s you!

Watch for people who are burned out, or aren’t as happy as they used to be. Talk to them and find out why! Don’t let them go unnoticed! Encourage them that it’s OK to take time to recharge. We need to be actively advocating for this in our industry! And in other areas of life, as well. I think we can agree that we’re all just way too busy.

8. Be Humble

Seek to be humble, and don’t attract a lot of attention to yourself. (This is good general life advice, too.) When complimented for a job well done, make sure to give credit also to all those around you who helped. Very little of what we do is completely on our own. Again, this is about us all working together to make everybody better.

Resist the temptation to believe that you’re better than everyone else. Let’s be honest: we all have this tendency sometimes. You might be the most senior and experienced person in your company, but remember there’s always something that somebody else knows more about than you.

It might be frustrating to walk someone more junior step-by-step through completing a project. But you were once at that level, too, and you needed the help and guidance of others to get to where you are now.

9. Walk the Path Together

Recognize that there will always be people in front of you and behind you on your journey. And you need help from both. You need both mentors and mentees in your life. You can lean on all the experience of those who have gone before you. And you need to be an advocate and encouragement to those walking the path behind.

Find a couple of people more senior and a couple of people more junior than you, and meet with them regularly. These are times not to work on projects or “talk shop”, but instead to dig into career goals, feedback about how things are going, and to soak up the knowledge and insights the other person has for you.

This is very much like the “extra set of eyes” discussed above. Often others are able to notice things in us that we can’t see for ourselves.

10. Teach Others to Fish

Finally, don’t keep these skills to yourself. We all need to be doing this. We can’t do it in a vacuum. Don’t just do these things, but teach others to do them as well. Work to build a culture in your organization, in your family, and among your friends, of helping each other succeed.

If you think you don’t have time for this, frankly, you’re wrong. You must make time. This is immensely more important and valuable than any code you’ll ever write or servers you operate. The joke about automating and delegating yourself out of a job works in exactly the opposite way: the more knowledge and skills you impart to others, the more important and critical you become.

So I want to take you back to Swarna’s advice: helping others be successful is your success.

Let’s all be boat lifters every day!

December 6, 2019

Day 6 - KubeVirt

By: Tyler Auerbeck (@tylerauerbeck)
Edited By: Adam Whitlock (@alloydwhitlock)

Traditionally there have been very clear battle lines drawn for application and infrastructure deployment. When you need to run a Virtual Machine, you run it on your virtualization platform (Openstack, VMWare, etc.) and when you need to run a container workload, you run it on your container platform (Kubernetes). But when you’re deploying your application, do you really care where it runs? Or do you just care that it runs somewhere?

This is where I entered this discussion and I quickly realized that in most cases, I really didn’t care. What I knew was that I needed to have the things I required to build my application or run my training. I also knew that if I could avoid having to manage multiple sets of automation – that would be an even bigger benefit. So if I could have both running within a single platform, I was absolutely on board to give it a shot.

What is KubeVirt?

KubeVirt is a set of tools for running Virtual Machines on top of a Kubernetes environment. You may need to read that a few times, but it’s true: Virtual Machines running on top of your container platform. There’s no need for separate VMs running elsewhere; there’s just one place to deploy all of your things. I’m sure that, like many others, you’ve heard “You can run anything in a container!” While that’s mostly true, it doesn’t guarantee that it won’t be hard, or that it won’t force you to make some terrible decisions along the way. So if you find yourself heading down this path, ask yourself the following question: “If you can have both while reducing your cost (both technical and mental), what’s stopping you?”

What benefits does KubeVirt provide?

So what actual benefits does KubeVirt provide? From my experience, it reduces the cognitive load on folks who are trying to deploy your application (whether that be manual deployments or the automation for those deployments). Rather than having to manage multiple workflows that know whether something is going to platform A or platform B, we now have a common deployment model. And it’s YAML all the way down, my friends.


And while we all may have our gripes with YAML, it reduces the cognitive lift of figuring out what you’re looking at when you’re handed a new application to deal with. You may not know exactly what it is that you’re deploying, but you can safely assume that you’re just a kubectl apply -f away from finding out. This standardization can greatly increase the efficiency of your dev/ops/devops teams, because now they’re all operating and communicating with a common set of tooling rather than being broken into smaller teams based on different skill sets.

The second benefit that you can get from using KubeVirt is (potential) savings from consolidating your tech stack. Rather than running separate sets of infrastructure for your virtualization and container platforms, you can begin consolidating these stacks for common purposes. To be clear, this isn’t something that you wave a magic wand at and suddenly have a consolidated stack; it looks more like a gradual migration. However, once you begin that journey, you will begin to see potential savings in things like software and utility costs. Depending on your workloads, you may also see the added benefit of being able to shrink your infrastructure, because Kubernetes may be better at packing/scheduling your applications together than other systems. These benefits tend to vary based on workload, so your mileage may vary.

The last benefit that comes to mind is everything a container platform provides you. When a virtual machine dies, it generally stays dead until something tells it to power back on, whether that’s something inside the virtualization platform itself or some other monitoring system that either tells it (or tells someone) to bring it back online. Even then, the system may come back online, but it may not be in great shape (re: healthy). The scheduling capability that Kubernetes provides is a huge boost: if something is scheduled to be running, Kubernetes will continue to make sure that it is running. And with things like liveness and readiness probes, you get these low-level monitoring components for free. The need to engage an external system or members of your team decreases from these capabilities alone. Now if something goes wrong, it really must have gone wrong before you need to get involved.

Scenario: Building an Ansible training on Kubernetes

These details are all fine and dandy, but for me these concepts are always easier to grasp after seeing the tools in action. So I’m going to walk through the scenario that had me looking at KubeVirt in the first place. It all started with not having the permissions that I needed…

Jake the Dog Screaming

I was working with a customer, and we quickly realized that we needed to get them up to speed quickly if we were going to be useful to them in the small amount of time we had scheduled together. This meant that I needed to introduce the tools, make sure they were able to work with them (re: install and use them), and then make sure they would be in good shape once we left. We do this a lot, so helping folks get up to speed didn’t concern me at all. However, the next piece of information I was given haunts me anytime I hear it.

We don’t have privileges on our local machines.

Listen. I understand. We live in a scary world. There is always someone looking to poke a hole in your organization. But there needs to be balance. You need to make sure the people you employ have the ability to do their jobs. Otherwise, you are wasting valuable time that could be spent providing value to your organization, effectively telling people to sit on their hands until someone says they can begin working again. I also understand that there are ways to effectively manage these types of risks. This was not one of those times.

So, stepping off my soapbox and back to our scenario. Once I heard this, and it was explained to me what needed to happen to get the necessary software onto their machines, I knew I needed to come up with a plan. To be successful, I needed to get a set of tools (primarily Ansible) onto their local machines, and then make sure that they had a set of machines to work against. Problem number one was getting these tools approved for installation. At a high level, this required a significant amount of paperwork (tickets, sign-offs, etc.). The next part was getting the appropriate teams to provide a set of VMs to each of our developers so that they could use them to get familiar with Ansible. The rough estimate I was given would have chewed through half of our scheduled time together before we even received them. Considering that we couldn’t do much without getting them familiar with the tools beforehand, this was a non-starter for me.

So it was time to get creative.

Mad Scientist

The problem?

So let’s first clearly define the problem. I needed to get Ansible in the hands of developers and provide them with a way to begin learning this technology.

This required the following:
- One (1) workstation to run Ansible from
- Two (2) VMs to run Ansible against

🚨🚨 Note: Yes, I could have used that single workstation both to run Ansible from and to run Ansible against. However, I was looking to provide a better illustration of the benefits of Ansible, along with how it’s used in a real-world scenario. 🚨🚨

The goal?

So we have a clearly defined problem. At this point, I at least had a cluster available to me throughout the engagement. I also knew by the end that the customer would also have a cluster available to them. The goal here was to both upskill them on new technologies and also ensure that they were able to “take it home” with them so that they could begin upskilling others in their organization. While I could have just solved this with my own cloud money, this wouldn’t have solved exactly what we were trying to do there. Granted this was a self-imposed constraint, but one that I decided was valid because of the restrictions that were in place in their organization and would likely take some time to unwind before they were (hopefully) lifted.

The solution!

This is when I ran into KubeVirt. I quickly realized that we had all the access we needed to deploy and run things in the cluster. So after a quick proof-of-concept, I felt fairly confident that we could do what we needed to do all within the cluster. The big challenge at the time was ensuring that we were able to communicate over the network using the protocols we needed (primarily SSH, required by Ansible). There were likely a ton of better ways to solve this problem, and there are absolutely better ways that I know of now. But to avoid digging too far into Kubernetes networking, I’ll stick with the simplest hack I found to get around the problem at the time. So without further delay, let’s get to work.

Step 1: Find yourself a Kubernetes platform

While this scenario was originally run on an Openshift cluster, this will work on any Kubernetes platform. To keep things simple, I’ll use KinD. The only thing you need to get this running is to grab the KinD release from Github and to have Docker running on your machine. For the purposes of this article, we’ll be using KinD version v0.6.0.

Once we’ve pulled down the binary, we’ll need to ensure that it’s somewhere in our path. We also need to define what our cluster is going to look like. Let’s create a directory called kv-demo and then create the following file called cluster.yml.

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
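If you prefer to script this step, the directory and config file can be created in one go. A minimal sketch, assuming the v1alpha4 config API used by kind v0.6.x (the apiVersion may differ on other kind releases):

```shell
# Create the working directory and the kind cluster config in one step.
# Assumption: the kind.x-k8s.io/v1alpha4 config API of kind v0.6.x.
mkdir -p kv-demo && cd kv-demo
cat > cluster.yml <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
EOF
grep -c 'role:' cluster.yml   # prints 2: one line per node entry
```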

What this gives us is a cluster with one control-plane node and one worker to start with. All we need to do now is deploy it and we’ll be ready to get started. You can do this with the following command: kind create cluster --config cluster.yml --name kv-demo. After running this command you’ll see output similar to the one below:

Creating cluster "kv-demo" ...
 ✓ Ensuring node image (kindest/node:v1.16.3) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
 ✓ Joining worker nodes 🚜
Set kubectl context to "kind-kv-demo"
You can now use your cluster with:

kubectl cluster-info --context kind-kv-demo

Thanks for using kind! 😊

As long as this matches up, move on to the next step.

Step 2: Install KubeVirt

Now that we have a platform to deploy on top of, we can go ahead and get the required components up and running. For this example, we’ll use KubeVirt v0.23.0. If you would like to use the CLI virtctl, this is where you can grab it from. Otherwise there’s nothing that you need directly from this repository as we’ll be referencing files remotely.

Nested Virtualization Check

The first thing you need to do is check whether nested virtualization is enabled on your host. To do this, run the following: cat /sys/module/kvm_intel/parameters/nested (on an AMD host, check /sys/module/kvm_amd/parameters/nested instead). If the response is Y (or 1), move on to the next step. If the response is N (or 0), you’ll need to create a configmap with the following: kubectl create configmap kubevirt-config -n kubevirt --from-literal debug.useEmulation=true.
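That decision can be sketched as a tiny shell helper. This is illustrative only: the needs_emulation function is my own naming, and the kvm_amd path is the AMD equivalent of the Intel flag:

```shell
# Decide whether KubeVirt will need software emulation on this host.
# Intel hosts report Y/N in kvm_intel; AMD hosts report 1/0 in kvm_amd.
needs_emulation() {
  case "$1" in
    Y|y|1) echo "no"  ;;  # nested virtualization is available
    *)     echo "yes" ;;  # fall back to debug.useEmulation=true
  esac
}

flag="N"
for f in /sys/module/kvm_intel/parameters/nested /sys/module/kvm_amd/parameters/nested; do
  [ -r "$f" ] && flag="$(cat "$f")"
done
needs_emulation "$flag"
```

If the helper prints yes, create the kubevirt-config configmap exactly as shown above.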

Deploy KubeVirt Operator

Once the config is in place, we can deploy the initial KubeVirt components. The first piece that needs to be put in place is the operator. Deploy it by running kubectl create -f against the operator manifest from the KubeVirt release. This deploys the following components:
- kubevirt namespace
- kubevirts custom resource definition
- clusterrole
- kubevirt-operator service account
- kubevirt-operator clusterrole
- kubevirt-operator clusterrolebinding
- virt-operator deployment

Once you see these objects applied, you’ll need to monitor the progress of your deployment. You can do this by watching for the pods to be created in the kubevirt namespace with kubectl get pods -n kubevirt -w. Once everything is running and ready, we can move on to the next step. See the output below for reference:

NAME                             READY     STATUS    RESTARTS   AGE
virt-operator-6b494c9fc8-l466w   1/1       Running   0          4m28s
virt-operator-6b494c9fc8-zql77   1/1       Running   0          4m28s
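If you’d rather script this readiness check than eyeball it, a small helper can parse the listing. A sketch (the all_ready helper is my own, not part of kubectl or KubeVirt):

```shell
# Succeed only when every pod in a `kubectl get pods` listing is Running
# with all of its containers ready (READY column like "1/1").
all_ready() {
  awk 'NR > 1 { split($2, a, "/"); if (a[1] != a[2] || $3 != "Running") bad = 1 }
       END { exit bad }'
}

# Checked here against a captured sample; in the cluster you would pipe
# `kubectl get pods -n kubevirt` into the same helper.
printf 'NAME READY STATUS RESTARTS AGE\nvirt-operator-a 1/1 Running 0 4m\n' \
  | all_ready && echo "all pods ready"
```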

Deploy KubeVirt

Now that our operator is up and running, we can create a custom resource that tells the operator to deploy KubeVirt itself. This consists of the virt-api, virt-controller and virt-handler components. If you’re interested in the architectural specifics of the deployment, you can see a nice description here from the KV team. Once you’re ready to deploy these components, run kubectl create -f against the KubeVirt custom resource manifest from the same release. You’ll see that this creates a single instance of a kubevirt resource called kubevirt. This deployment can take some time, but you can again run kubectl get pods -n kubevirt -w to monitor the progress. You should eventually see output that looks something like what you see below:

NAME                               READY     STATUS    RESTARTS   AGE
virt-api-69769c7c48-dh2zj          1/1       Running   0          3m42s
virt-api-69769c7c48-p84lq          1/1       Running   0          3m42s
virt-controller-6f97b858b7-94fv7   1/1       Running   0          3m15s
virt-controller-6f97b858b7-cfhnv   1/1       Running   0          3m15s
virt-handler-bmmwn                 1/1       Running   0          3m15s
virt-operator-6b494c9fc8-l466w     1/1       Running   0          24m
virt-operator-6b494c9fc8-zql77     1/1       Running   0          24m


Congratulations! At this point, we’ve got KubeVirt up and running. But before we start deploying our virtual machines, we want to make sure we’ve got somewhere to work from. To accomplish this, we’ll deploy a container inside of our cluster that we can connect to. This accomplishes two goals. The first is that by defining an image with all of our tools pre-baked in, we leave one less thing up to user error.

works on my machine

The second goal that we accomplish is that we ensure we have access to a common network and can easily communicate with our virtual machines. Again, this was a bit of a hack that we went through at the time and there are absolutely better ways of doing this. But that discussion can be saved for another time.

Now back to work.

Step 3: Get yourself a workspace container

As mentioned above, the next step is ensuring that we have a workspace to operate from. We’ll do this by using a container that has the necessary toolset already made available for us. In our scenario, we already had one pre-built, which you can find the Dockerfile for here. This has a number of tools pre-installed on top of a RHEL 8 base image, but most importantly it has Ansible. You can cut this down as you see fit or use it as is; your choice. If you’re satisfied with how this looks, you can rely on a pre-built version of this image that we have hosted in Quay.

So once you’ve decided how you’d like to use this image, we can deploy it into our cluster with the following:

kubectl create namespace user-ns
kubectl create deployment tool-box -n user-ns --image=<your tool-box image>

This will give us a new namespace and will then create a deployment of our tool-box container. You should see this is running with kubectl get pods -n user-ns.

NAME                      READY     STATUS    RESTARTS   AGE
tool-box-64f5d796-2db66   1/1       Running   0          3m49s

🚨🚨 Note: If you built this locally or pushed it to your own remote registry, simply replace the quay registry above with the appropriate image registry, image name, and tag. 🚨🚨

Now in order to check that we have connectivity to our workspace, we exec into our pod and run some commands to make sure we’re in good shape. Run the following:

kubectl exec -it tool-box-64f5d796-2db66 -n user-ns -- /bin/bash
whoami
ansible --version

🚨🚨 Note: You should replace the name of the toolbox container above with the one that appears on your screen. This will absolutely be different in your cluster than what I have in mine. 🚨🚨

After running the above commands, you’ll see output similar to the following:

bash-4.4$ whoami
bash-4.4$ ansible --version
ansible 2.8.6.post0
  config file = None
  configured module search path = ['/home/tool-box/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.6.8 (default, Oct 11 2019, 15:04:54) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]

As long as this all checks out, we’ve got a functional workspace. Now we’re ready to start working with some VMs!

Step 4: Now for some Virtual Machines

As part of our KubeVirt deployment, we created a handful of CRDs. One of those is called virtualmachineinstances. As you may guess, this is one way of creating a VM. There are additional methods, but this is the approach we’ll take in this example. We’ll focus on deploying just a single VM for brevity. However, you can turn this into a template to fit your requirements (or simply modify it and run it multiple times) in order to create multiple VMs. To create our single instance, run the following:

kubectl create -f -n user-ns
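The manifest itself was linked from the original post. As a rough sketch of what such a VirtualMachineInstance looks like (field names assume the kubevirt.io/v1alpha3 API of this KubeVirt release; the values here are illustrative, not copied from the actual file):

```yaml
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstance
metadata:
  name: vmi-fedora-0
spec:
  domain:
    devices:
      disks:
        - name: containerdisk
          disk:
            bus: virtio
        - name: cloudinitdisk
          disk:
            bus: virtio
    resources:
      requests:
        memory: 1024M
  volumes:
    - name: containerdisk
      containerDisk:
        image: kubevirt/fedora-cloud-container-disk-demo:v0.21.0
    - name: cloudinitdisk
      cloudInitNoCloud:
        userData: |
          hostname: fedora-0
```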

After a few minutes, you should see something similar to the following:

NAME                               READY     STATUS    RESTARTS   AGE
tool-box-64f5d796-2db66            1/1       Running   0          54m
virt-launcher-vmi-fedora-0-cj4c2   2/2       Running   0          7m18s

What this will do is create a VM called vmi-fedora-0 in your user-ns namespace. This will then be recognized by KubeVirt and it will then begin deploying your VM. If you inspect the yml above, you’ll notice a few things:

  - containerDisk:
      image: kubevirt/fedora-cloud-container-disk-demo:v0.21.0

This image that we refer to is a pre-built Fedora image from the KV team. If you’d like to know more particulars about how this image is built, you can read more here. All you need to know for now though is that this is just a Fedora base image that we use to spin up our VM.

The next piece you may notice is the following cloudInit:

      userData: |
        hostname: fedora-0
        password: fedora
        chpasswd: { expire: False }
        ssh_pwauth: True
        disable_root: false
        ssh_authorized_keys:
          - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDsYL8SnJf3blzXmsqJrdyz8RF88W+k9tv/5muoL9ieUGpI67cCKbzCInfKRiuMaDd51D8f+ezZzwx6x/sSbhaDIA90cPBCJIVXY3sVLTSIYK+EzfzDdgYBdpphsRCapwK++5Yev68NT/02BJRbqXhNrYcE4bj2GEQX6Tq8n3LqOYg3j5TvmCBvxut7qztn16rNHFBFF2K/AEavzkyFrzaddFAdVzmV79zBAhCYwoRWhXffMr0NxihxdbglT7qNRtJbOlvBgbYinn2rSsXrSF+1TdCHk3Uo+H5q2sfSDtMQCN32Oh+bCG/zxwL8p2hbdC6AKIk3LzICTqFa+gRCvOWR

There are plenty of things that you can specify here, but the ones to take note of here are:
- Setting a hostname
- Setting a default password
- Not disabling the root user
- Providing a public key to be inserted as part of the image (you can modify and use your own key)

To avoid introducing too many moving parts, I had initially decided to hard-code some of these things so that we could slowly build up a comfort and knowledge-base versus opening the firehose to start. If you would like to use the key that is already added to this yml, you can retrieve it here.

🚨🚨 Note: I know that keeping private keys in public repositories is bad. This is for testing purposes only, and I don’t ever recommend doing this for any real-world workloads. 🚨🚨

Step 5: The finish: Let’s run a playbook!

Alright, so now we’re in the home stretch. We have a workspace to do our work from and we have a virtual machine to do our work against. Now we just need to add the last piece: the work. So let’s get started! The first thing we’ll want to do is take note of a piece of information that we’ll need later. In order for us to write an Ansible playbook, we need to know what host we need to run against. Again, in this case we’re going to keep it simple and just grab the IP address for the VM. You can get this by running kubectl get vmi -n user-ns. You should then see output similar to:

Get your IP address

NAME           AGE       PHASE     IP            NODENAME
vmi-fedora-0   18m       Running   kv-demo-worker

You should notice the IP column. Grab this value and stash it for later.
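If you’d rather not copy it by hand, the IP column can be pulled out with a little awk. A sketch (the get_vmi_ip helper and the 10.244.1.7 sample address are mine, for illustration):

```shell
# Print the IP column for a named VMI from `kubectl get vmi` output.
get_vmi_ip() {
  awk -v vmi="$1" 'NR > 1 && $1 == vmi { print $4 }'
}

# Fed from a captured sample here; against the live cluster you would pipe
# `kubectl get vmi -n user-ns` into the same helper.
printf 'NAME AGE PHASE IP NODENAME\nvmi-fedora-0 18m Running 10.244.1.7 kv-demo-worker\n' \
  | get_vmi_ip vmi-fedora-0   # prints 10.244.1.7
```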

Configure your workspace

Now that we have the IP address we need to connect to, let’s make sure that our workspace is in good shape to begin communicating with it. The first thing we need to do is reconnect to our tool-box container. We can do this by running kubectl exec -it tool-box-64f5d796-2db66 -n user-ns -- /bin/bash. Once we’re dropped back into our terminal, we need to do a few things:
- Retrieve our private key
- Configure an Ansible inventory
- Create a simple Ansible playbook

Retrieve Private Key and Test Connectivity

We need to use curl in order to download our private key from the repository noted above (if you chose to use the example-provided public key). We could get fancier and mount the key in as part of our deployment, but the point here was to keep things as simple as possible to start.

To get our key into the right space with the right permissions, run the following:

mkdir -p /home/tool-box/.ssh
cd /home/tool-box/.ssh
curl -O
chmod 600 id_rsa

We can then test that we can connect to our VM with the following: ssh fedora@ hostname (using the IP address we stashed earlier). This should return the hostname and look similar to this output:

bash-4.4$ ssh fedora@ hostname
fedora-0

Ansible 101

Now that our workspace and VM can communicate with each other, we’ll roll in the last layer of our exercise: teaching some Ansible. The goal of this article isn’t to make you an Ansible expert, so we’re just going to do a very simple user creation to demonstrate tying all of these pieces together. The first thing we need to do is pull together our inventory. This tells our playbook where it’s connecting to and how. Let’s do the following in our tool-box container:

mkdir inventory
touch inventory/hosts

Once this structure has been created, add the following to your hosts file:

[my-test-group]
 ansible_ssh_user=fedora

This will allow us to specify this group to execute our playbook against. The playbook itself is the second key to all of this. This specifies what actions we’re going to be taking. To get this created, let’s run the following (again inside of the tool-box container):

touch playbook.yml

Once this file has been created, add the following content:

- name: Create Test User
  hosts: my-test-group
  tasks:
    - name: Add the user 'james' with a bash shell, appending the group 'admins' and 'developers' to the user's groups
      user:
        name: james
        shell: /bin/bash
        append: yes
      become: true

Once this is ready, we can run our playbook (for example, with ansible-playbook -i inventory/hosts playbook.yml), which creates a user named james on the VM that we created. We can confirm this by accessing the host via SSH and checking that the user james exists.

bash-4.4$ ssh fedora@ "id james"
uid=1001(james) gid=1001(james) groups=1001(james)


This approach is meant to show how you can use these technologies together. I was able to collapse the need to manage multiple methods of automation to create a Kubernetes cluster and separate VMs for my training. On top of that, I was able to collapse things down even further: previously, we needed to be able to deploy these types of automation in various cloud environments (AWS, Azure, GCP, etc.) as well as different on-premises environments. By taking this approach, we broke things down to the point that we only needed to rely on a single interface: the Kubernetes API.


These are just my thoughts and experiences. For more thorough information on KubeVirt visit their website or check out their Github. There are plenty of extra bells and whistles available to be used – so make sure to dig deeper to continue to gain the benefits that they provide!

December 5, 2019

Day 5 - Break up your Terraform project before it breaks you

By: Kief Morris (@kief)
Edited By: Kerim Satirli (@ksatirli)

Your Terraform project is out of control. Oh, it started fine. You and your crew needed a standard set of infrastructure for the application. Web server, application server, database cluster. A few security groups, network routes, a gateway, some IAM roles, that sort of thing. Five or six files, none of them more than sixty lines of HCL.

Playing dice with your universe

But look at it now. You started building containers for your application as it turned into a bunch of micro(ish)-services, and obviously, you needed a Kubernetes cluster to run them. There’s code for the cluster, and more to build the host nodes and manage how they get scaled up and down. You’ve reshuffled that code a few times, nicely organized into four .tf files rather than one massive .tf file.

At least you finally moved off your hand-rolled ELK stack onto your cloud provider’s log aggregation service, but that’s still a couple of .tf files. And a few people on your team are working on the new API gateway implementation, which will be handy now that you’re running a couple dozen micro(ish)-services.

This environment is a beast. Running terraform apply can take an hour, sometimes more. And you just hope it doesn’t fail and leave the state file wedged.

You’re up to four environments now, and whenever you make a change to one, you have to copy it to the other three projects. Sometimes you don’t have time to copy a change to all of them. Because of this, the code for each project is … different. “Special.”

Running terraform apply feels like rolling dice.

Break it up

I bet you can guess what I’m going to tell you. Sure, the title of this article gives it away, but even without that, you and your team must have discussed splitting your infrastructure code across multiple projects, rather than having it all in one big one.

Let’s make one thing clear. Modules won’t help you with this. I’m not saying that you shouldn’t use modules to share code, that’s fine. But modules won’t shrink your big hairy Terraform project; they’ll only make it look organized. When you run terraform apply, you’re still rolling dice with a big honking pile of infrastructure and a single state file.

So when we talk about breaking your project up, we’re talking about breaking it up so that a given environment is composed of multiple state files. We’re talking about making it so you can change the code for one project, and plan and apply it on its own.

If your team has chewed over the idea of breaking up your project this way, you’ve probably also considered some of the things that would make it hard to pull off:

  • Your projects will have dependencies across them, so changing one project might break another. For example, if one project creates a server in a subnet from a different project, you might have to tear down the server when you change the subnet.
  • While each Terraform project would be smaller on its own, you’d have a lot more of them to manage.
  • How will you move the infrastructure out of the current monolithic Terraform project into smaller projects?

I have suggestions.

Integrate loosely

I suspect you’re aware of microservices, given the buzz (not to say hype) in the industry. You can read what my colleague James Lewis wrote on Martin Fowler’s website a few years ago, or read our friend Sam Newman’s books on the subject.

The “small pieces, loosely joined” idea behind microservice architecture makes a lot of sense for infrastructure, too. Just like with user-facing software, you should organize your projects so that they’re not tightly coupled. The point is that you can make a change to one project, and not worry too much about the others. If every time you change one project, you also have to change code in other projects, you’re doing it wrong.

There are two key concerns. One is where to draw the boundaries between your projects. You want to organize your infrastructure code so that each project works on its own. Usually, this means organizing it in a way that matches applications and services, and teams (see Conway’s Law).

The other key concern is how your projects integrate. Returning to the example of one project that makes a subnet, and another that creates a server in that subnet, how does your server project know the subnet_id to use?

Avoid tight coupling between projects

How does the project that creates the subnet make it available to projects that use the subnet? With software that integrates over the network, you have an API, using a protocol like REST. With Terraform, you have a few options.

The most popular way to integrate across Terraform projects is to point your server project at the subnet project’s state file. You write a data "terraform_remote_state" block, and then refer to the outputs from the other project, as in this example from the Terraform docs.
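To make the coupling concrete, here is a minimal sketch of that pattern (the backend settings, bucket name, and `subnet_id` output name are illustrative, not from the original):

```
# Consumer project: read the provider project's state file directly.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"    # hypothetical state bucket
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  # Couples this project to the "subnet_id" output of the network project,
  # and to the network project's state file format.
  subnet_id     = data.terraform_remote_state.network.outputs.subnet_id
  ami           = var.ami_id
  instance_type = "t3.micro"
}
```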

This is a bad idea. Integrating using state files creates a tight coupling between projects.

As you may know, coupling is a term that describes how easy or hard it is to change one project without affecting the other.

When you write your server project to depend on an output of the subnet project, you are coupling to that output and its name. That’s usually OK. The team that owns the subnet project needs to understand that its outputs are a contract. They have the constraint that they can’t change that output name whenever they want without breaking other code. As long as everyone understands that, you’re OK.

But integrating with an output via the state file, at least the way Terraform currently implements it, couples more tightly than just the output name. It couples you to the state file’s data structure. When you apply your consumer project, Terraform reads the state file for your provider project to find the subnet ID.

This can be a problem if your projects use different versions of Terraform. If you upgrade to a new version of Terraform that changes the data structures in the state files, you need to upgrade all of your projects together.

The point is that you need to be aware of how your projects are coupled. Requiring all of your infrastructure projects to use the same version of the same tool is tight coupling.

So what are the alternatives?

You can integrate using data "aws_subnet", discovering the subnet based on its name or tags. The integration contract is then the name or tag. The project that creates the subnet can’t change these things without breaking consumers.
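A sketch of that discovery approach (the tag name and value are hypothetical; they are the contract between the two projects):

```
# Consumer project: look up the subnet by tag instead of reading state files.
data "aws_subnet" "app" {
  filter {
    name   = "tag:Name"
    values = ["app-subnet"]    # the provider project must keep this tag stable
  }
}

resource "aws_instance" "app" {
  subnet_id     = data.aws_subnet.app.id
  ami           = var.ami_id
  instance_type = "t3.micro"
}
```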

Another is to use a configuration registry, something like HashiCorp’s Consul, or any key-value parameter store. Your provider stack puts the value of the subnet_id in the registry; your consumer stack reads it from there. Doing this makes the integration point more explicit - the provider stack code needs to put the value under a specific name.
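As one illustration of the registry idea, here is a sketch using AWS SSM Parameter Store (the parameter name is hypothetical; it becomes the explicit contract):

```
# Provider project: publish the subnet ID under a well-known name.
resource "aws_ssm_parameter" "app_subnet_id" {
  name  = "/network/app-subnet-id"    # hypothetical parameter path
  type  = "String"
  value = aws_subnet.app.id
}

# Consumer project: read the value back, with no knowledge of the
# provider project's state file or resource names.
data "aws_ssm_parameter" "app_subnet_id" {
  name = "/network/app-subnet-id"
}

resource "aws_instance" "app" {
  subnet_id     = data.aws_ssm_parameter.app_subnet_id.value
  ami           = var.ami_id
  instance_type = "t3.micro"
}
```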

Another way you can do this is with dependency injection. My colleague Vincenzo Fabrizi suggested this to me.

Your consumer project code shouldn’t be responsible for getting the value. Instead, you pass the value as a parameter. If you run Terraform with a script or Makefile, then that script fetches it and passes it in. This keeps your code decoupled, which comes in handy for testing.
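A sketch of what that wrapper might look like, assuming the consumer project declares a plain `variable "subnet_id"` input (the tag filter and script are illustrative):

```
#!/bin/sh
# Wrapper script: fetch the integration value outside Terraform,
# then inject it as an ordinary variable. The consumer project has
# no data sources and no knowledge of where the value came from.
SUBNET_ID=$(aws ec2 describe-subnets \
  --filters "Name=tag:Name,Values=app-subnet" \
  --query "Subnets[0].SubnetId" --output text)

terraform apply -var "subnet_id=${SUBNET_ID}"
```

In a test, the same project can be applied with a throwaway subnet ID, which is what makes the decoupling useful.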

Testing your Terraform projects

You should write automated tests for your Terraform projects, and use CI or even CD to run those tests every time someone commits a change. Testing Terraform code is a big, messy topic. I’ll foist you off to my former colleague Rosemary Wang, who wrote about TDD for infrastructure, and current colleagues Effy Elden and Shohre Mansouri, who created a site with examples of infrastructure testing tools and code.

Using these tools and techniques, you should create and test each of your Terraform projects on its own. Avoid needing to create instances of other projects to test one project. Doing this enforces loose coupling. If you can’t do this, then your projects are probably too tightly coupled. Put in the effort to redesign and restructure your projects until you can.

If you can create a standalone instance of your project, and have automated tests you can run against it, then you can use a pipeline to deliver changes to different environments.

Automate your automation

One of the problems you have in your team is environments getting hosed when different people apply changes. One person works on a change and applies it to a test environment. Then someone else applies a different version of the code from their own machine, wiping out the first person’s work. Things get confusing and messy.

You should make sure that the code for any given instance of the project is only applied from one place. If you’re applying code from your laptop, nobody else should be doing anything to that environment. Nobody should apply code from their laptop to shared use environments, like “dev”, “staging”, and “production”, except in an emergency. Instead, you should always apply the code from a central location.

You might use a hosted service like Terraform Cloud to apply your code. Or you could run Terraform from a CI job or CD stage. Either way, you can use the central service to control what version of code is applied where. A central runner also ensures Terraform runs from a consistent environment, with the same version of Terraform.

In addition to applying code to your environments more cleanly, using a central runner helps to enforce the decoupled design of your infrastructure code. Every time someone commits a change, the project has to run cleanly on its own.

Refactoring for fun and profit

Refactoring is changing code without changing its behavior. When you refactor a single Terraform project into separate projects, the resulting infrastructure should look the same.

Refactoring application code is straightforward. Change the code, rebuild it, run tests that show that it behaves the same, then deploy it. Refactoring Terraform code is similar, except for the “deploy” part. You’re not only changing the code, but you’re also changing the state files.

The Terraform CLI includes the state command, which you can use to move state from one state file to another. You create your new Terraform project, and you transfer the state for resources from the original project’s state file to the new project state file. You can run terraform plan as you go to check that each piece is moved. You know everything is complete when terraform plan reports that it won’t change anything on the new project.
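The mechanics look roughly like this (resource addresses and paths are illustrative; this sketch assumes local state files):

```
# Back up both state files before touching anything.
cp terraform.tfstate terraform.tfstate.backup
cp ../new-project/terraform.tfstate ../new-project/terraform.tfstate.backup

# In the original project: move the subnet's state into the
# new project's state file.
terraform state mv \
  -state-out=../new-project/terraform.tfstate \
  aws_subnet.app aws_subnet.app

# In each project, confirm the move is complete. You're done when
# plan reports no changes in both projects.
terraform plan
```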

This is a very tricky operation. One wrong command, and you’re in a crisis, scrambling to undo the damage. If you back up your state files beforehand and don’t run terraform apply until terraform plan tells you you’re clear, you shouldn’t damage your live system. But it’s a painstaking process, and overly manual.

I’m not currently aware of any tools to automate this process, but I’m sure we’ll see them soon. However, I have a technique that can de-risk the process that you can use today.

My friend Pat Downey explained how his team uses the “expand-contract” pattern used for refactoring database schemas, also called “Parallel Change”, for Terraform projects.

In a nutshell, this is a three-step process:

  1. Expand: Create the infrastructure in the new project, without touching the old project. For example, create your new project that creates a new subnet.
  2. Migrate: Make and apply code changes needed to start using the new infrastructure. This involves implementing whatever integration technique you’ve chosen. In our example, you edit the original project so that the server uses the new subnet id. At this point, the old subnet still exists. If multiple things are using the subnet, you might break this step of the process down to migrate them one by one.
  3. Contract: Once the original resource is no longer used, delete it from the code and remove it from the infrastructure.
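Using the subnet example, the migrate commit in the original project might be a one-line change (resource names are hypothetical):

```
# Migrate step: switch the server to the subnet created by the new project.
resource "aws_instance" "app" {
  # Was: subnet_id = aws_subnet.main.id   (subnet defined in this project)
  subnet_id     = data.aws_subnet.new.id  # now discovered from the new project
  ami           = var.ami_id
  instance_type = "t3.micro"
}
```

The contract commit then deletes the old `resource "aws_subnet" "main"` block, and applying it removes the unused subnet.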

You can handle the projects in different ways. For example, rather than creating the subnet in a new project and leaving the server in the old one, you might do it the other way around. Or you might make two new projects, one for the subnet and one for the server, completely retiring the original project once everything is moved out.

The advantage of expanding and contracting infrastructure, rather than editing state files, is that you can implement it entirely with CD pipelines, avoiding the need to run any special one-off commands. Create a pipeline for each project involved and then push the code changes through them. The expand, migrate, and contract steps are each a commit to the codebase, pushed through the pipeline. The pipeline tests each change in an upstream environment before applying it to production.

Splitting stacks for fun and profit

If you want to learn more, I recommend watching Nicki Watt’s presentation on Evolving Your Infrastructure with Terraform.

I expect we’ll be breaking monolithic stacks apart more often in the next few years. We’re already seeing the first generation of Terraform projects grow and mature, and many people are in this situation. And even once we all get a better handle on how to design and manage modern infrastructure, our systems will continuously evolve, so we’ll always need to refactor and improve them.

December 4, 2019

Day 4 - Successful projects without all the pain

By: Kirstin Slevin (@andersonkirstin)
Edited By: Jessica Ulyate (@julyate)

Projects are an unavoidable part of our work in tech and we’ve all seen examples of the good and the bad. Good projects look like a cohesive effort towards a clear goal, a reasonable amount of planning, meeting the challenges that come up, and camaraderie all along the way. In contrast, poor projects look disjointed and disorganized. They become even more frustrating when given the dreaded ‘behind schedule’ designation, when there is confusion as to why the project even exists, when nobody is sure what they should be working on, and when it devolves into endless meetings. This is only a small set of common project pain points.

While almost all of us will be involved in or leading projects from time to time, most of us are not full time specialists in managing projects nor do we usually have the assistance of a dedicated project manager. So how then can you set a project up for success, while keeping as much time as possible for all the other things you do?

To begin to answer this, we should first start with defining what a “project” is. A project can be defined (and indeed is, by the Project Management Institute) as “a temporary endeavor undertaken to create a unique product, service or result.” A project should have a reasonably clear start, middle, and end. Moving an application from a company data center to the public cloud is an example of a project. Establishing and practicing a security incident response team is also a project. On-call support is not a project - it’s an on-going activity that is not expected to end. However, establishing a revised on-call structure and response processes to meet holiday business needs could be considered a project. Now that we know what a project looks like, let’s talk about important practices.

  • Start with a charter; A project charter is a simple but critical foundation for a good project. It should answer questions like: Why are we doing this? Who is involved and who are the stakeholders? How will we know when we’re done? How will we know if we were successful? As the old saying goes, “An ounce of prevention is worth a pound of cure.” A solid charter can do wonders to guide the project and help keep everyone on the same page. But do note that starting with a charter assumes a decent upstream process that prioritized this project in the first place; if you’re not sure you have that, first work to make sure prioritization is clear and in order.
  • Break down the work; The charter should give a good idea of what you want to accomplish at a high level, and next you want to begin to break that down into smaller pieces of work. Keep this simple to start - an hour of focused attention at a whiteboard can get you far. You’ll continue to refine more as you go, and things will inevitably change, but laying out the broad strokes of how to get to the goal, plus more detailed steps for the shorter-term work, is a must.
  • Set up some basic work tracking; Now that you’ve broken down the work, you want to make that plan usable. This means perhaps moving the plan you whiteboarded into a shared document, a task-tracking tool like Jira, or a project plan. Stick with something that is familiar to you and your team. It’s important that everyone can see the plan, collaborate, and keep it updated; beyond that, advanced features are often less important and an easy way to get distracted.
  • Provide regular updates; Once your project is in motion, it’s important to distill a snapshot of the project regularly and share with stakeholders. Even if no one asks for these updates, even if you’re not sure anyone is looking at them, do this anyway. Providing regular updates ensures that you take a step back from the day to day view of the project and look more widely at how it’s progressing. A simple format should include: key milestone or estimated completion date, likelihood of hitting the date, top priorities for the next one or two weeks, and notable call outs.
  • Celebrate wins; With the project progressing, there should then be some solid achievements along the way. These can be simple moments, like a ‘Hello, world!’ from an application being moved to the public cloud. Take the opportunity to celebrate and bring attention to these achievements. The goal here is to help keep the enthusiasm high. Don’t underestimate the motivation and satisfaction that can be achieved by doing this.
  • Model behaviors, always; No matter what your role on the project is, you have an opportunity to make the project more successful, and certainly a more enjoyable experience, by setting a positive example. Get the team together to break down some more of the work, proactively share an update on your work, or call attention to a win from the team. Often, the difference between a tedious project and a gratifying one is in the way the team acts towards the work and towards each other.
  • Learn something for next time; Ongoing retrospectives are a great practice that gives the team a regular opportunity to look back, reflect, and improve. Good retrospectives though require strong facilitation, time, and follow up which may not always be possible. At the very least, you should look back in some way at the end of the project with the team and generate ideas for how a similar project could go better in the future.

We are all likely to be involved in or managing projects from time to time, and we want to do so as effectively as possible. A few key practices, used throughout the life of the project, can go far towards setting the project up for success.