December 19, 2012

Day 19 - Modeling Deployments on Legos

This was written by Sascha Bates.
When deployment scripts grow organically, you typically end up with a brittle, poorly documented suite understandable only by the original authors or people who have been working with them for years. The suite often contains repetitive logic in different files, exacerbated by code offering little in the way of documentation or understanding. There are probably several sections commented out, possibly by people who no longer work there, as well as several backup files with extensions like .bk, .old, .12-20-2004, .david, and .david-oldtest.

This code has probably never been threatened with version control. It's easy for bugs to lurk, waiting for just the right edge case to destroy your deployment and ruin your night. Deployments are known to be buggy and incident-prone. The natural result of this situation is that all deployments require an enormous spreadsheeted playbook requiring review by a large committee, a monster conference bridge with all possible players and problem solvers online "just in case," and probably overnight deployments in order to reduce impact. Regardless of root cause, the Ops team probably bears the brunt of the abuse and may be considered "not very bright" due to their inability to smooth out deployments.

When you are mired in a situation like this, it's easy to despair. Realistically speaking, it's probably not just the deployment scripts. Deployments and configurations probably vary wildly between environments with very little automation in place. The company probably has canyon-sized cultural divides and a passion for silos.  You've already had the sales pitch on configuration management and continuous integration testing and, while they are critical to systems stability, I'm not here to talk about them. I'm here to talk about Legos!
"LEGOS: interlocking plastic bricks and an accompanying array of gears, mini figures and various other parts. Lego bricks can be assembled and connected in many ways, to construct such objects as vehicles, buildings, and even working robots. Anything constructed can then be taken apart again, and the pieces used to make other objects."
Instead of looking at your deployment as just a linear progression, consider it a collection of interchangeable actions and states. A way to address the quagmire of scripts is to replace them inch by inch with a framework.  And it’s not so much what you use to create the framework, but how you ensure flexibility and extensibility.  Your deployment framework should be like a Lego kit: a collection of interlocking building blocks that can build something yet be disassembled to build something else.

Building blocks, not buildings: A good deployment system should be composed of a frame of flexible, unambiguous code snippets (bricks). These could be Bash, Ruby, Puppet or Chef blocks or components from a commercial pipeline tool. Configuration management and pipeline tools have the built-in advantage of already providing this logic and idempotence.

A brick should do one thing: manage services, copy a file, delete a directory, drain sessions from an application instance. Bricks should connect at action points based on external logic, which brings me to my next point, separation of duties.
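As a rough illustration of what single-purpose bricks might look like, here is a minimal Python sketch. The function names (copy_file, delete_directory) are hypothetical and not taken from any particular tool; each brick does one thing, is safe to run repeatedly, and reports whether it changed anything so that the deciding can happen elsewhere.

```python
import shutil
from pathlib import Path


def copy_file(src: Path, dest: Path) -> bool:
    """Brick: copy one file. Returns True only if something changed."""
    if dest.exists() and dest.read_bytes() == src.read_bytes():
        return False  # already in the desired state, nothing to do
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return True


def delete_directory(path: Path) -> bool:
    """Brick: remove one directory tree. Returns True only if something changed."""
    if not path.exists():
        return False  # already gone, nothing to do
    shutil.rmtree(path)
    return True
```

Neither brick knows why it is being called or what comes next. That knowledge belongs to whatever connects the bricks together.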

Actors and Actions and a Central Authority: In order to keep deployment logic from bloating our bricks, orchestrators and actors should be separate. I think this is the most difficult requirement to achieve, which is why we end up with bloated, brittle deployment scripts and 90-line spreadsheeted playbooks. Deployments are not just moving files around and restarting Java containers. Deployments can contain database updates, configuration file changes, load balancer management, complex clustering, queue managers, and so much more. Start and stop order of components can be critical, and verification of state and functionality must often happen prior to continuing.

For example, I may need to set Apache to stop sending traffic to my A cluster, and I need to verify there are no active sessions in the A cluster application instances before continuing. If I can make a decision on the state something should be in, I should not also be the executor. You should have a central command and control that understands the desired state of all components during a deployment and makes decisions based on the state of the entire system. There are tools for this. Capistrano is famous for its ability to do much of this. Rundeck can manage multiple levels of orchestration, and there are several commercial tools.
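The split between deciding and doing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Step, Orchestrator, and drain_cluster_steps are hypothetical names, and the cluster is an in-memory dictionary standing in for real load balancer and session state. It does not reflect how Capistrano or Rundeck work internally.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """One unit of work plus the check that proves it took effect."""
    name: str
    action: Callable[[], None]   # the actor: performs the change
    verify: Callable[[], bool]   # reports whether the desired state was reached


@dataclass
class Orchestrator:
    """Central authority: owns the order of steps and decides whether to go on."""
    steps: List[Step]
    log: List[str] = field(default_factory=list)

    def run(self) -> bool:
        for step in self.steps:
            step.action()
            if not step.verify():
                self.log.append(f"{step.name}: verification failed, halting")
                return False
            self.log.append(f"{step.name}: ok")
        return True


def drain_cluster_steps(cluster: Dict[str, int]) -> List[Step]:
    """Hypothetical steps for taking the A cluster out of rotation.

    `cluster` is an in-memory stand-in for real load balancer and session state.
    """
    def stop_traffic() -> None:
        cluster["receiving_traffic"] = 0

    def wait_for_sessions() -> None:
        cluster["active_sessions"] = 0   # a real actor would poll until drained

    return [
        Step("stop traffic to A", stop_traffic,
             lambda: cluster["receiving_traffic"] == 0),
        Step("drain sessions on A", wait_for_sessions,
             lambda: cluster["active_sessions"] == 0),
    ]
```

Calling Orchestrator(drain_cluster_steps(cluster)).run() executes the steps in order and stops at the first one whose verification fails, so later steps never run against a system in an unexpected state.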

Visible, Understandable Flow: Your tool should be able to display workflow in such a way that it doesn’t take a genius to understand what they’re looking at. While the language or implementation you are using should not matter, your implementation style should. You should not have to be an expert in the app to understand the process flow when looking at a diagram; i.e., the flow should be obvious from looking at the tool. This is a place where I find Jenkins and other traditional build tools really fall short.

If you install enough plugins into Jenkins, it will twist itself around and try to do anything you want. The trouble with that is it becomes impossible to follow the pipeline and decision-making process. Jenkins is good for pushing a button to start a job, but it generally has no view into your application servers or web servers. If you set it up to make orchestration decisions, it's easy to get lost in the pipeline. I've often met people who want to kick off remote shell scripts with Jenkins as part of a deployment, and I continue to object to this because you pretty much just launched a balloon off a cliff. You don't know what's happening on the other end, and Jenkins will never know unless that node has a Jenkins agent.

At this point you've removed the central management and decision-making process from your deployment orchestration. I will continue to maintain that Jenkins makes a fantastic build and continuous testing server, but it is not meant for more comprehensive orchestration.

Communicative Metrics and Alerts: Dashboards are critical for communicating to stakeholders. Stakeholders are the appdev team, operations, other teams impacted by deployments, managers, business owners, and your teammates. The more you give people pretty pictures to look at, the less likely it is they’ll make you sit on an enormous conference bridge at deployment time.

Your system should have the ability to collect metrics, trigger alerts and display dashboards. I don’t necessarily mean that the tools should come with built-in dashboarding and reporting, although if you paid for it, it should. If it doesn’t have built-in dashboards, it should have pluggable options for metrics collection and alerting to something like Nagios, which will allow you to design some dashboards.

Test Execution: Once you have your reporting and metrics collection in place, the first thing you should be looking at is execution of tests and metrics collection on pass/fail status. The deployment framework should be able to incorporate test results into its reporting. 
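One way to picture test results feeding into reporting is a small summary that a dashboard or alert could consume. The sketch below is an assumption about shape only: summarize_results and its field names are hypothetical, and the input is a plain list of (test name, passed) pairs rather than output from any specific test runner.

```python
from typing import Dict, Iterable, Tuple


def summarize_results(results: Iterable[Tuple[str, bool]]) -> Dict[str, object]:
    """Collapse (test name, passed) pairs into counts a report can display."""
    results = list(results)
    failed_tests = [name for name, passed in results if not passed]
    return {
        "total": len(results),
        "passed": len(results) - len(failed_tests),
        "failed": len(failed_tests),
        "failed_tests": failed_tests,
        "all_passed": not failed_tests,
    }
```

A deployment step could then refuse to continue, or raise an alert, whenever all_passed is false, and the same dictionary can be sent to whatever collects your metrics.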

Pretty and Easy to Use: The deployment framework should be compelling. People should want to use it. They should be excited to use it. You need a simple UI that is pretty with big candy-colored buttons. It should be communicative, with real-time updates visible. This will often be one of the last things you achieve, but I consider it critical because it helps move deployments out from under just the person(s) who designed the process to entire teams who need to use it.

Source Control and Configuration Management: Of course, with all of this, your framework should be checked into source control. You should comment it heavily. Do I really need to say more about that?  You should deploy to systems already under configuration management. If you are deploying to systems still configured by hand, your deployment success will be at risk until you standardize and automate. 

You're Not In This Alone: It's hard work sometimes, living in chaos. You're not alone. Many of us have been there, and the reason I can write about this so much and have so many opinions is that I've seen it done wrong throughout my career. Whether because of time and people constraints, lack of concern for the future (we'll scale when we get there) or ignorance, many of us have encountered and even perpetuated less than optimal scripting suites and spent miserable hours on deployment firefighting bridges.

Open source tooling and communities are all over the internet. Get an IRC client and pop into one of the freenode chat rooms for #puppet, #chef, #capistrano, ##infra-talk and ask questions. People are friendly and often want to help.  Find your local DevOps or tooling meetup and go talk to people.

Existing Tooling: And finally, you don't have to roll your own. Like Legos, there are many custom kits and how-tos for existing kits. There's no reason not to build on the shoulders of giants instead of starting from scratch. Here are some examples of open source tools specifically for deployment or general orchestration.
