December 11, 2015

Day 11 - 5 years of Puppet experience

Written by: Julien Pivotto (@roidelapluie)
Edited by: Jan Ivar Beddari (@beddari)

Puppet was game changing for infrastructure automation and config management and is one of the most used tools in the area. It is still a fast changing platform, and different ways of using it are evolving at a rapid pace as well. Looking back at the last 5 years, I've been working on many diverse Puppet setups of different sizes. Depending on what you are trying to achieve there are a lot of choices to be made and approaches to consider. Today I'll try to share some of the experience I've gained.

'Puppet' is the name of the core tool but I'll also use it to reference the larger ecosystem, which includes other tools like PuppetDB, Hiera, and mcollective.

The Puppet way of doing system administration

IT is evolving at seemingly ever-increasing speed but looking closer, infrastructure trends follow a common pattern: more and more systems need to be managed, they need to be highly available, and they need to be deployed and kept up to date faster. Starting out, simple tools like kickstart and bash could be all that is needed, but soon enough choosing specialized tools will make your automation efforts a lot more reliable, faster and ultimately easier.

Puppet is a configuration management tool - it helps you manage the configuration of your systems: which services should be running, which packages should be installed, which files should be present, and so on. It has a lot of features but they have a limited scope. In some areas and use cases, Puppet is used to deploy other complementary services that cover the last mile towards a completely managed infrastructure. For example, while Puppet can ensure that a service is running it can't give you high availability by itself. Thus, tools like pacemaker are required as well. As another example, if you need to disable a backend on a reverse proxy during an upgrade, you should look at tools like Consul or Ansible. While Puppet can still be the mechanism that provisions and configures these third-party tools, it also means that it is not the only tool needed to satisfy the requirements sysadmins are facing.
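To make that scope concrete, here is a minimal sketch of the kind of resources Puppet manages - a package, its configuration file and its service (the ntp names and paths are only illustrative):

    package { 'ntp':
      ensure => installed,
    }

    # Ship the configuration file from the module and keep it in sync.
    file { '/etc/ntp.conf':
      ensure  => file,
      source  => 'puppet:///modules/ntp/ntp.conf',
      require => Package['ntp'],
    }

    # Keep the daemon running and restart it when the config changes.
    service { 'ntp':
      ensure    => running,
      enable    => true,
      subscribe => File['/etc/ntp.conf'],
    }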

Reproducible infrastructures

One of the main goals to always keep in mind is the ability to fully reproduce a complete infrastructure from scratch. It should be possible to do this within a limited timeframe with no or very little human cost. This is ambitious and requires a lot of work but still is an investment that needs to happen. Executed right, this will avoid bad surprises and prepare any organization for the future. The ability to bootstrap new hosts at any time for any purpose will benefit all. Hardware fails, business requirements change, scope evolves, there are so many reasons to be ready!

In each datacenter/network segment you will want to have basic services like DNS available, as well as software repositories. This is important for the infrastructure itself, to make it possible to rebuild new hosts at any time. It is equally important for software delivery, enabling delivery of a new build even if third party services are down (github, rubygems.org, pypi, etc). Being dependent on upstream is a big problem: there is nothing you can do about what you do not own and host yourself. If there is downtime, you could be blocked for hours. If some input changes on external mirrors, you could well spend a lot of time adapting and testing that new input - at a moment when you would have gotten much more value from getting things back to normal.

The easy answer to this is to package all your third party dependencies (Jenkins plugins, anyone?) and store them in tools like pulp or aptly. This category of tools manages software artifacts and repositories in an efficient way and helps replicate the content as needed.

Splitting infrastructure into several distinct environments is frequently done. They should be independent to the point where you can run tests on one platform without affecting the others. I commonly see a pattern with at least three of them - a development environment used for integration tests, UAT for user acceptance testing and a production environment. Environments should be used to test Puppet code but also your own software and products. Because your application deployments will most likely live within the lifetime of your infrastructure, this doubling up in process makes sense. In the same way you ship and promote new versions of your software, you should ship and promote Puppet code through the environments.

To ensure that your infrastructure is really reproducible you should practice randomly destroying hosts in your environments. To help onboard your organization, provide Vagrant definitions for your developers using the same code you use in production. Note that both of these approaches require a certain level of confidence, but don't wait too long!

The automation mindset

At any point in time during the lifetime of an infrastructure you need to keep automation in mind. What will be the cost of automating every new piece of software? If that price is too high, are there any other possible solutions? As soon as you plan to use new software you need to start looking for relevant Puppet modules, and the results of that research should be present in your pros/cons checklists. If you deploy any software, even in a temporary environment, only for 'a few days' or as a proof-of-concept, it should be automated. As the Puppet community puts it, it should be #puppetize'd (a common verb for long-time Puppet users). Some companies tend to go directly from proof-of-concept to production, and if you did not properly automate the initial implementation, production will start off with debt you will need to pay off very soon.

Even software installed on a single server should be automated. You might think that it is so small that there is no need, but suddenly you get more of the same. Then you start forgetting about that special easy fix, and in the end, when a problem occurs, you can no longer remember how you set up that particular thing, or why. Do not put yourself through this.

Not only applications should be automated - support services like backups and monitoring too. Even an application that comes production-ready should be automated, with its backup and monitoring set up as well. Puppet comes with handy ways of doing this, like collected resources. Once these three aspects are automated, the application becomes part of your environment and you can start really testing and using it.
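As a sketch of how collected resources can wire monitoring in, assuming PuppetDB is available and the nagios service itself is managed elsewhere (the specific check is only illustrative):

    # On the application node: export a monitoring check.
    @@nagios_service { "check_http_${::fqdn}":
      host_name           => $::fqdn,
      check_command       => 'check_http',
      service_description => 'HTTP',
      use                 => 'generic-service',
      target              => '/etc/nagios/conf.d/collected.cfg',
    }

    # On the monitoring node: collect every exported check.
    Nagios_service <<| |>>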

The way you bootstrap your hosts also matters. Human interaction after bootstrap should be non-existent. You need to find the right balance between what you put in your kickstart files and your Puppet code. Tools like The Foreman will help you with that and can also be used to manage Puppet certificates.

Avoid the shortcuts

Each time someone takes a manual action, it increases your technical debt. At some point in time, events outside of your control will force you to pay it back: a virtual machine is destroyed, a datacenter is migrated, or a 3rd party provider changes. When this happens, if you took shortcuts or never gained enough confidence to fully re-bootstrap your environments, you might be left with a false sense of security. You expect Puppet to give you rebuild-as-disaster-recovery for free, but that is an illusion if you never ran actual recovery tests. A 5 minute hack made 2 years ago is impossible to remember in a rush. In contrast, if no shortcuts were made, you can look directly at the Puppet code to know all the components of a host and how they were installed.

Puppet is not the end of the old way of doing systems administration, it is built on top of it. Packaging is a good example: it has always been there, and with Puppet it stays very important. You should always prefer writing an OS package over lower level alternatives like wget+untar. Packaging provides a lot of mature tooling for free - sanity checks, consistent package versioning, keeping track of installed files, and more. Simplified, once you have all the static files in an OS package, Puppet can change the default configuration of the package to fit the requirements of the current environment.

One extra advantage of getting as much as possible delivered as packages is the standardization it represents for configuration management tools - packages can be used with any of them. Each artifact you package will reduce the cost of switching to another tool.

Be part of the community

Puppet is also a community - a lot of people solving the same problems in similar ways. Each time we share code or experiences it has value for many others, creating a positive feedback loop. While the majority of people in the community are just listeners, they could gain a lot by exchanging tips and questions within the community. As part of this community, there are many people who use Puppet in less common, more specialized ways. This could be by using the Puppet Ruby gems directly, by maintaining their own fork of Puppet, or by maintaining their very own stack of modules. There is a lot of hidden power in the community that is not currently expressed. This could be said for most open source projects, but an important difference with Puppet is that it most likely isn't part of your core business. You have nothing to lose by sharing your experience.

Puppetlabs as a company has traditionally given a fair share of attention to the community. Today, as they are mixing open and closed source solutions, they are not always as giving or open as we'd like them to be. Still, they definitely were the drivers behind building the community, and if it continues to be as productive and brilliant as it is today, it is mainly because of them. A lot of events are organized with or by the community, like PuppetConf and the Puppet Camps, which attract visitors with a wide variety of user profiles - open source users, Puppet Enterprise users, or newcomers. If you want more technical or in-depth talks, look for a local Puppet User Group. These smaller events often connect people who share the same kind of ecosystem. The contributor summits are also recommended for people who directly contribute to Puppet core and the modules.

It is equally important to take part in Puppet development and follow any updates to the roadmap. You can always contribute by giving feedback, testing upcoming releases, or fixing bugs. If your infrastructure depends on Puppet, your business does as well. Be careful not to underestimate the impact of upstream decisions. Knowing in which direction Puppet is heading will help you make the right decisions for your infrastructure.

Take care of Puppet

Puppet itself can't be forgotten in your infrastructure. Like any other service, Puppet daemons and Puppet runs need to be monitored. When do they happen, what did they change, were they supposed to change something? When you push a change you want to be aware of the result of that change, and immediately take action in case of failure. You want to know the last time your hosts applied a catalog, because you want to be sure that your infrastructure matches the current version of your Puppet code.

Making sure Puppet reports are backed up for further reviews and storing backup copies of SSL keys can save you a lot of trouble. If Puppet is broken your whole infrastructure will be frozen as a consequence - no changes will be possible and any deployments will be uncertain. Try to prevent this from happening!

This leads to the fact that Puppet itself should be managed by Puppet. This enables you to change the way it is triggered, or to manage it updating itself. It also helps you keep a consistent configuration for that service in all your environments. Having your Puppet servers be highly available is also a good idea - an external CA and DNS round robin could work well for this.
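A minimal sketch of Puppet managing its own agent, assuming a local profile module provides the puppet.conf template (the paths match a Puppet 3-era layout and are only illustrative):

    package { 'puppet':
      ensure => installed,
    }

    file { '/etc/puppet/puppet.conf':
      ensure  => file,
      content => template('profile_puppet/puppet.conf.erb'),
      require => Package['puppet'],
    }

    # Run the agent as a service; how it is triggered stays under
    # Puppet's own control, so it can be changed consistently everywhere.
    service { 'puppet':
      ensure    => running,
      enable    => true,
      subscribe => File['/etc/puppet/puppet.conf'],
    }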

Puppet Trees

A Puppet tree is the code (manifests and modules) needed to build your infrastructure. A mental switch needs to happen here: you, as a sysadmin writing Puppet code, are a developer. This is a fact that you need to fully embrace. You have to get confident with git, Jenkins and other tools developers use. Git, for example, brings you all the flexibility needed to include Puppet modules, via the mechanism of git submodules. As stated before, you obviously want to mirror all the modules as git repositories in your own infrastructure. That will save you from third party downtime and allow you to do some nice things like enforcing style checks on every git push.

I do not think that librarian-puppet is useful in the development of an infrastructure. Looking at the requirements, you want to be able to quickly check the content of modules at the exact version they will be deployed. You don't want to rely on any forge or service, but you do want to be able to patch any Puppet module as quickly as possible. Using librarian-puppet would mean you need a mirror of the forge in your own infrastructure. With git submodules, all the tools already know how git works. Adding an extra layer does not bring any value for internal development.

An important principle to understand is that your Puppet tree should be the same between your environments. It should first be deployed to your development environment(s), then to your UAT environment(s), and eventually to your production environment(s). This promotion should be as quick as possible: a feature should not block deployment of newer versions of your tree. If needed, use feature switches that depend on the environment to prevent important changes from propagating. Do not stop deploying code to production for any reason. In some situations you will need to push a time critical security improvement to your production environment - but you want to do it without losing the ability to test it in your other environments first.
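A sketch of such a feature switch keyed on the environment (the profile class and its parameter are hypothetical):

    # The new behaviour ships with the same tree everywhere, but only
    # becomes active outside production until it has been validated.
    $use_new_backend = $environment ? {
      'production' => false,
      default      => true,
    }

    class { 'profile_proxy':
      use_new_backend => $use_new_backend,
    }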

Following the same principles, you should not use git on your Puppet masters, and never push directly to the final environments. As with any codebase, tests must be run by your favorite CI system before deployment to any environment: syntax checks, style checks, and even more in-depth checks (compiling catalogs with puppet master --compile). You should be able to run any upstream tests with your local Puppet version, and you should write tests for your own modules as well. That allows you to get some consistency and to preserve the 'API' of your modules so you can reuse them between multiple projects.

It also implies that you package up your Puppet tree. It is a trivial operation with tools like fpm and it will allow you to deploy the tree in a consistent way across multiple servers. It is also more reliable than git, as connectivity to the server could drop or a submodule could be missing some commits. Moreover, it helps prevent people from changing the code on the server directly, which is probably one of the biggest anti-patterns with Puppet.

You might also want to follow the standard methodologies of modern software development: agile, kanban, ... Stand-up meetings and communication are very important and will help you make the right decisions together with the people in your team. The quality of the code should also be regularly reviewed, in several ways: in addition to style and syntax checks, fix any deprecation warnings or other bad signs that Puppet shows you. This also implies that you reduce the size of your codebase to the strict minimum. Do not keep old, abandoned code around if you don't need it any more. A tool like puppet-ghostbuster can help you find which Puppet modules you are not using.

Puppet Modules

Puppet modules are the building blocks of your Puppet tree. You should always look for upstream modules before starting to write your own. And if you really do need to write a Puppet module, then you should consider sharing it with the community. That way your module will get some visibility and, if it is used enough, will gain in quality, test coverage and many other aspects that the community can bring to you.

Upstream modules come in all kinds and forms: you will find a large range in code quality, and you will need to quickly scan a module to see if it matches your way of doing IT: if a module uses wget or curl, it will most likely not be acceptable for serious infrastructures. In the same way, if a module has been built for one operating system, check whether upstream would be willing to support other operating systems before starting to work on this module.

Keep an eye on the versions of your modules. New versions of modules, like any other software, fix (security) bugs, change defaults, change features or scope, and sometimes add support for newer versions of the software they manage. So once again, test new versions in your development environment first before deploying them to a more sensitive environment. To make that update process easier, you should try to get your local patches accepted upstream. That requires some extra work, because you will probably need to port your new features to an OS you do not manage, and write extra tests to verify the quality of your code.

You do not need to have full stack modules. Always prefer a module that does one and only one thing. A bad example would be to manage Puppet and PuppetDB with the same module, or to mix Apache service configuration with website configuration. One module should meet a single purpose or target. Then, using your profile modules, mix different base modules together. The profile module can export your monitoring resources and your Apache vhosts, but those should not belong to the same module as your application. Mixing everything in one module makes it difficult to adapt your code to all use cases; if you want to change or swap just one part of it, things quickly become complicated. Don't lose your focus and think about the scope of each piece of code in your Puppet tree.
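As a sketch, a profile that composes single-purpose modules instead of building one full-stack module might look like this (it assumes the puppetlabs-apache module; the class and vhost names are illustrative):

    class profile_webapp {
      # Reuse the single-purpose upstream module for the service itself.
      include ::apache

      # The site-specific glue - the vhost - lives in the profile,
      # not in the base module.
      apache::vhost { 'app.example.com':
        port    => 80,
        docroot => '/var/www/app',
      }
    }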

Separate data from code

Since Puppet 3 and its native Hiera support, splitting data from code is easier and happens almost everywhere now. Some people use Hiera for this, others use an ENC; the idea is that you do not want to change Puppet code and deploy to 3 different environments just to make trivial data changes. Examples of data that can easily be externalized are DNS zones, unix users, or ssh keys. Keep in mind that you should test any YAML and JSON data structures in CI before they are packaged and deployed.
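A sketch of that separation, assuming a unix_users hash is defined somewhere in your Hiera hierarchy (the key name and defaults are illustrative):

    # The manifest only describes how data becomes resources;
    # the user list itself lives in Hiera, not in the code.
    $users = hiera_hash('unix_users', {})

    create_resources('user', $users, {
      'ensure'     => 'present',
      'managehome' => true,
    })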

As part of data separation it can be useful to implement profile modules (e.g. profile_apache) that help you bring in extra features that can't be merged upstream. Profiles can also provide an API that you maintain as an independent layer, separated from upstream. If you manage different projects for different customers or apps, you can also share these profile modules.

Puppet DSL Tips and Tricks

Read the docs! This is definitely a good trick for any software, including Puppet. Reading other people's modules can show you the most used features and relevant code examples. If you are faced with a particular edge case, knowing the lesser-known features of Puppet can help. Some examples follow.

For example, if you manage files containing credentials or private certificates, then knowing the show_diff parameter of the file type is helpful: set to false, it removes the diff output when the file changes. You can also prevent those files from being saved to the Puppet filebucket with backup => false. To hide some resources or give them more visibility, use the loglevel metaparameter: 'debug' will reduce visibility while 'warning' will increase it.
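A sketch of a file resource protecting its sensitive content (the path and template are illustrative):

    file { '/etc/myapp/credentials.conf':
      ensure    => file,
      content   => template('profile_myapp/credentials.conf.erb'),
      mode      => '0600',
      show_diff => false,  # no diff output in reports when it changes
      backup    => false,  # no copy kept in the filebucket
      loglevel  => debug,  # keep the change quiet in the logs
    }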

Collected resources are great, but you have to be careful about the way you collect them. Resources get a lot of tags by default. If you have a file resource in a profile_customer::reverse_proxy::vhost definition, it will get many tags: file, profile_customer::reverse_proxy::vhost, profile_customer, reverse_proxy, vhost. When you export a resource, be sure to manually add a "logical" tag that will not overlap with any other auto-added tag, or you might just collect too many resources.
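A sketch of exporting with an explicit tag and collecting only on it, using the built-in host type as an illustrative example:

    # Export with a manually chosen, unambiguous tag...
    @@host { $::fqdn:
      ip  => $::ipaddress,
      tag => 'customer1_reverse_proxy_backend',
    }

    # ...and collect only what carries that tag, not everything that
    # happens to share an auto-added tag like 'host' or 'vhost'.
    Host <<| tag == 'customer1_reverse_proxy_backend' |>>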

When something does not match your expectations, you can use the fail() function, which will prevent the catalog from being compiled and applied to your host. This is often a much better failure scenario than getting a catalog partially applied.
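For instance, a guard like this (the supported OS family is only an example) aborts compilation instead of letting a half-correct catalog run:

    if $::osfamily != 'RedHat' {
      fail("This profile only supports RedHat-like systems, got ${::osfamily}")
    }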

Some resource types allow you to purge unmanaged resources, like yum repositories. This helps you know exactly what your end system will look like, since any extra additions will be removed. It is done with the special 'resources' type.
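For yum repositories, that pruning looks like this:

    # Remove any yumrepo on the node that Puppet does not manage.
    resources { 'yumrepo':
      purge => true,
    }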

Ensure that your resources are really idempotent. You should be able to apply the same resource again and again without modification. A bad example of this can be found in the puppetlabs-puppetdb module, where the Ruby function to_yaml is used in a template. With Ruby 1.8.7, that function is not consistent across multiple runs, causing useless service restarts (see MODULES-1335).

Exec resources are special and a continuous source of failure. The refresh/notify pattern is potentially dangerous: if a dependency fails or a Puppet run is interrupted, the refresh can be lost forever. If the exec itself fails, Puppet will not run it again on the next run, by design. To avoid that, make the exec idempotent. Prefer unless over subscribe/notify - you can even use an array as unless or onlyif. Exec resources should be small and fast - overly complicated execs should be rewritten as custom types and providers.
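A sketch of an idempotent exec guarded by unless rather than triggered by a notify (the commands and database names are illustrative):

    exec { 'load-application-schema':
      command => '/usr/bin/psql -f /opt/app/schema.sql appdb',
      user    => 'postgres',
      # Checked on every run: if the schema is missing - because a
      # refresh was lost or a previous run was interrupted - the exec
      # simply runs again.
      unless  => ['/usr/bin/psql -c "SELECT 1 FROM schema_version" appdb'],
    }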

Conclusion

Puppet comes with great power and will quickly enable and help you in your day to day work. It will also help large teams work better by providing visibility and change tracking for infrastructure management. One of the best abilities you get is being able to git-bisect a feature or code change, quickly finding exactly why and when it was implemented.

The biggest change I've seen over the years is Puppet getting a lot of competitors. This has happened not only at the technology level, but also at a higher, more abstract level. New services and tools allow you to run automated infrastructure in different ways than traditional configuration management. It is always great to have more choice - and the open source world is big enough to welcome competition. However, in most cases, the price of switching from one tool or one way of working to another is high. If you invested heavily in automation less than 10 years ago, are you ready for another pivot point?

During these 5 years the Puppet experience and world has been evolving fast. While the basics can be established for a few years at a time, it is not easy to make everyone understand the larger vision you want for your infrastructure. In many cases shortcuts tend to be used as short term quick fixes. It is very easy to ignore or forget about the costs of normalizing all those shortcuts once you have no other choice.

Any Puppet journey starts with a cultural shift more than a technical one. The general concepts of automation are now established knowledge, but we should not forget all the side lessons we've learned along the way. The biggest failures come when you do not share your experiences - or when you think that something is not important enough to spend time doing it correctly.
