December 25, 2014

Day 25 - Windows has Configuration Management?!?

Written by: Steven Murawski (@stevenmurawski)
Edited by: William Shipway (@shipw)

Windows Server administration has long been the domain of “admins” mousing their way through a number of Microsoft and third party management UIs (and I was one of them for a while). There have always been a stalwart few who, by hook or by crook, found a way to automate the almost unautomateable. But this group remained on the fringes of Windows administration. They were labeled as heretics and shunned, until someone needed to do something not easily accomplished by a swipe of the mouse.

The sea winds have shifted and over the past seven or eight years, Microsoft released PowerShell and began focusing on providing a first class experience to the tool makers and automation-minded. The earlier group of tool makers and automators gained traction and began to develop a larger following, as more Microsoft and third party products added support for PowerShell. That intrepid group of early automators formed the core of the PowerShell community and began welcoming new converts - whether they were true believers or forced into acceptance by the lack of some capability in their comfortable management UIs. Now, most Windows Server administrators have delved into the command line and have begun to succumb to the siren call of automation.

Just as the PowerShell community’s evangelism was reaching a fever pitch, Microsoft added another management tool - Desired State Configuration. The tool-makers and automators were stunned. Cries of “what about my deployment scripts?” and “but, I already built my VM templates!” echoed through the halls. Early adopters of PowerShell v3 lamented “isn’t this what workflows were for?”. Some had already begun to explore the dark arts of configuration management using tools like Chef and Puppet to bring order to their infrastructure management. With the help of those in the community who blazed a trail in implementing configuration management on Windows, those cries of dismay began to turn into rabid curiosity and even envy. The administrators began to read books like The Phoenix Project and hear stories from companies like Stack Exchange, Etsy, Facebook, and Amazon about this cult of DevOps. They wanted access to this new realm of possibilities, where production deployments don’t mean a week of late nights in the office and requests for new servers don’t go to the bottom of the pile to sit for a month to “percolate”.

Read on, dear reader, to understand the full story of Desired State Configuration and its place in the new DevOps world where Windows Server administrators find themselves.

An Introduction to Desired State Configuration

With the release of Windows Server 2012 R2 and Windows Management Framework 4, Microsoft introduced Desired State Configuration (DSC). DSC consists of three main components: the Local Configuration Manager, a configuration Domain Specific Language (DSL), and resources (with a pattern for building more). DSC is available on Windows Server 2012 R2 and 64-bit Windows 8.1 out of the box and can be installed on Windows Server 2012, Windows Server 2008 R2, and 64-bit Windows 7 with Windows Management Framework 4. There is an evolving ecosystem around Desired State Configuration, including support for a number of systems management and deployment projects. To me, one of the most important benefits of the introduction of Desired State Configuration is the awakening of the Windows administration community to configuration management concepts.

A Platform Play

The inclusion of Desired State Configuration may seem like a slap in the face to existing configuration management vendors, but that is not the case. Desired State Configuration is a platform-level capability similar to PerfMon or Event Tracing for Windows. DSC is not intended to wholesale replace other configuration management platforms, but to be a base on which other platforms can build in a consistent manner.

The Evolution of DSC

One of the major knocks against administering Windows servers in the past has been the horrendous story around automation. Command-line tools were either lacking coverage or just plain missing. The shell was in a sorry state.

Then, shortly before Windows Server 2008 shipped, PowerShell came about. Initially, PowerShell had relatively poor native coverage for managing Windows, but it worked with .NET, WMI, and COM, so it could do just about anything you needed.

More coverage was introduced with each release of Windows Server. Windows Server 2012 had an explosion of coverage via native PowerShell commands for just about everything on the platform.

PowerShell emerged as the management API for configuring Windows servers. The downside of a straight PowerShell interface is that PowerShell commands aren’t necessarily idempotent. Some, like Add-WindowsFeature, are, and do the right thing if the command is run repeatedly. Others are not, like New-Website, which will throw errors if the site already exists.
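
For illustration, here is a minimal sketch of the guard you end up writing yourself around a non-idempotent cmdlet (the site name and path are placeholder values):

Import-Module WebAdministration

# Only create the site if it does not already exist;
# New-Website errors out when the site is already present.
if (-not (Get-Website -Name 'FourthCoffee')) {
    New-Website -Name 'FourthCoffee' -PhysicalPath 'C:\websites\fourthcoffee'
}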

DSC was introduced to provide a common management API that offers consistent behavior. Under the covers, it is mostly PowerShell that is running, but the patterns the resources follow ensure that only the work that needs to be done is done, and when a resource is in the proper state, that it is left alone.

Being a platform feature means that there is a consistent, supported mechanism for customers and vendors to manage and evolve the configured state of Windows servers.

Standards Based

Desired State Configuration was built using standards already supported on the Windows platform - CIM and WSMAN.

CIM, the Common Information Model, is the DMTF standard that WMI is based on; it provides the structure and schema for DSC.

WSMAN, WS-Management, is a web services protocol and DMTF standard for management traffic. WinRM and PowerShell remoting are built on this transport as well.

While these might not be the greatest standards in the world, they do provide a consistent manner for interacting with the Desired State Configuration service.

An Evolving API

Though Windows Management Framework (WMF) 4 was released just over a year ago, WMF 5 development is well under way and includes many enhancements and bug fixes. One major change makes the DSC engine’s API friendlier for third-party configuration management systems to use.

There was also a recent rollup patch for Server 2012 R2 (KB3000850) that contains a number of bug fixes and some tweaks to ensure compatibility with changes coming in WMF 5.

Diving In

Now that we’ve got a bit of history and rationale for existence out of the way, we can dig into the substance of Desired State Configuration.

The Local Configuration Manager

The engine that manages the consistency of a Windows server is the Local Configuration Manager (LCM). The LCM is exposed as a WMI (CIM) class (MSFT_DscLocalConfigurationManager) in the Root/Microsoft/Windows/DesiredStateConfiguration namespace.

The LCM is responsible for periodically checking the state of resources in a configuration document. This agent controls

  • whether resources are allowed to reboot the node as part of a configuration cycle
  • how the agent should treat deviance from the configuration state (apply and never check, apply and report deviance, apply and autocorrect problems)
  • how often consistency checks should be run
  • and more… (a sample meta-configuration follows this list)
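
Meta-configuration for the LCM is itself written as a configuration and applied with Set-DscLocalConfigurationManager. A minimal sketch (the values shown are illustrative, not recommendations):

configuration LcmSettings
{
    node 'localhost'
    {
        LocalConfigurationManager
        {
            ConfigurationMode              = 'ApplyAndAutoCorrect'
            ConfigurationModeFrequencyMins = 30
            RebootNodeIfNeeded             = $true
        }
    }
}

# Generates .\LcmSettings\localhost.meta.mof and applies it to the local LCM
LcmSettings
Set-DscLocalConfigurationManager -Path .\LcmSettings -Verbose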

It has a plugin/extension point with the concept of Download Managers. Download Managers are used for Pull mode configurations. There are two download managers that ship in the box: one using a simple REST endpoint to retrieve configurations and one using an SMB file share. As it currently stands, these are not open for replacement by third parties (but it could be made so - please weigh in to the PowerShell team about that before WMF 5 is done!).

A Quick Note - Push vs. Pull

DSC configurations can be imperatively pushed to a node (via the Start-DscConfiguration cmdlet or directly via the WMI API), or, if a Download Manager is configured, a node can pull a configuration and resources from a central repository (currently either an SMB file share or a REST-based pull server). If a node is in PULL mode, when a new configuration is retrieved, it is parsed to find the various modules required for the configuration to be applied. If any of the requisite modules and versions are not present on the local node, the pull server can supply those.
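
For example, pushing a folder of generated MOF documents to a node is a single cmdlet call (the path and computer name here are placeholders):

Start-DscConfiguration -Path .\MyConfiguration -ComputerName server1 -Wait -Verbose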

DSC Resources

Resources are the second major component of the DSC ecosystem, and are what make things happen in the context of DSC. There are three ways of creating DSC resources: they can be written in PowerShell, as WMI classes, or (in Windows Management Framework 5) as PowerShell classes. As PowerShell class-based resources are still an experimental feature and the level of effort to create WMI-based resources is pretty high, we’ll focus on PowerShell-based resources here.

DSC resources are implemented as PowerShell modules. They are hosted inside another PowerShell module under a DSCResources folder. The host module needs to have a module metadata file (manifest) with a module version defined in order for it to host DSC resources.
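
The on-disk layout looks roughly like this - the folder and file names below use the cWebAdministration module that appears later in this post, and are illustrative rather than exact:

cWebAdministration\                <- host module
  cWebAdministration.psd1          <- module manifest defining ModuleVersion
  DSCResources\
    PSHOrg_cWebsite\               <- one folder per resource
      PSHOrg_cWebsite.psm1         <- Get/Test/Set-TargetResource functions
      PSHOrg_cWebsite.schema.mof   <- CIM schema describing the resource properties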

The resources themselves are PowerShell modules that expose three functions or cmdlets:

  • Get-TargetResource
  • Test-TargetResource
  • Set-TargetResource

Get-TargetResource returns the currently configured state (or lack thereof) of the resource. The function returns a hashtable that the LCM converts to an object at a later stage.

Test-TargetResource is used to determine if the resource is in the desired state or not. It returns a boolean.

Set-TargetResource is responsible for getting the resource into the desired state. It is only executed when Test-TargetResource reports that the resource is not in the desired state.
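
A skeletal, hypothetical PowerShell resource module shows how the three functions fit together (the Path and Ensure parameters are purely illustrative):

function Get-TargetResource
{
    param ([Parameter(Mandatory)] [string] $Path)

    $ensure = 'Absent'
    if (Test-Path -Path $Path) { $ensure = 'Present' }

    # Return the current state as a hashtable
    return @{ Path = $Path; Ensure = $ensure }
}

function Test-TargetResource
{
    param (
        [Parameter(Mandatory)] [string] $Path,
        [ValidateSet('Present','Absent')] [string] $Ensure = 'Present'
    )

    # Return $true only when the current state already matches the desired state
    $exists = Test-Path -Path $Path
    return (($exists -and ($Ensure -eq 'Present')) -or ((-not $exists) -and ($Ensure -eq 'Absent')))
}

function Set-TargetResource
{
    param (
        [Parameter(Mandatory)] [string] $Path,
        [ValidateSet('Present','Absent')] [string] $Ensure = 'Present'
    )

    # Only called when Test-TargetResource returned $false
    if ($Ensure -eq 'Present') {
        New-Item -ItemType Directory -Path $Path -Force | Out-Null
    }
    else {
        Remove-Item -Path $Path -Recurse -Force
    }
}

Export-ModuleMember -Function *-TargetResource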

The Configuration DSL

Also introduced with Desired State Configuration are some domain specific language extensions on top of PowerShell. Actually, Windows Management Framework 4 added some public extension points in PowerShell for creating new keywords, which is what DSC uses.

Stick with me here, as it may get a bit confusing - I’ll be using “configuration” in two contexts. First is the configuration script. This is defined in PowerShell and can be defined in a script file, a module, or an ad hoc entry at the command line. The second use of “configuration” is in the context of the configuration document. This is the final serialized representation of the configuration for a particular machine or class of machines. This document is in Managed Object Format (MOF) and is how CIM classes are serialized.

The first keyword defined is configuration. The configuration keyword indicates that the subsequent scriptblock defines a configuration and should be parsed differently. All your standard PowerShell constructs and commands are valid inside of a configuration, as are a few new keywords. There are two static keywords and a series of dynamic keywords available inside a configuration.

The first two static keywords are node and Import-DscResource. I’ll deal with the latter first, since it seems very oddly named. Import-DscResource looks like a cmdlet or function by name, but it is a keyword that is valid only inside a configuration and only outside of the context of a node. Import-DscResource identifies custom and third-party modules to make available in a configuration. By default, only DSC resources in modules located at $pshome/modules (usually c:\windows\system32\windowspowershell\v1.0\modules) can be used; anything else requires Import-DscResource to specify which modules to make resources available from. The second static keyword is node. Node is used to identify the machine or class of machines that the configuration is targeted at. Resources are generally assigned inside node declarations.

The configuration also includes a number of potential dynamic keywords which represent the DSC resources available for the configuration.

An example configuration script looks something like:

configuration SysAdvent
{
    Import-DscResource -ModuleName cWebAdministration

    node $AllNodes.where({$_.role -like 'web'}).NodeName
    {
      windowsfeature IIS
      {
        Name = 'web-server'
      }

      cWebsite FourthCoffee
      {
        Name = 'FourthCoffee'
        State = 'Started'
        ApplicationPool = 'FourthCoffeeAppPool'
        PhysicalPath = 'c:\websites\fourthcoffee'
        DependsOn = '[windowsfeature]IIS'
      }

    }
}

The above configuration script, when run, creates a command in the current PowerShell session called SysAdvent. Running that command will generate a configuration document for every server in a collection that has the role of a web server. The configuration command has a common parameter of ConfigurationData which is where AllNodes comes from (more on that in a bit). The result of this command will be a MOF document describing the desired configuration for every node identified as a web server.

MOF documents created by the command are written to a folder (named after the configuration) in the current working directory. Files are named for the node they represent (e.g. server1.mof). You can specify a custom output location with the -OutputPath parameter. Here is our newly created MOF document:

/*
@TargetNode='localhost'
@GeneratedBy=Administrator
@GenerationDate=12/22/2014 04:12:56
@GenerationHost=ARMORY
*/

instance of MSFT_RoleResource as $MSFT_RoleResource1ref
{
 SourceInfo = "::7::7::windowsfeature";
 ModuleName = "PSDesiredStateConfiguration";
 ModuleVersion = "1.0";
 ResourceID = "[WindowsFeature]IIS";
 Name = "web-server";
 ConfigurationName = "SysAdvent";
};

instance of PSHOrg_cWebsite as $PSHOrg_cWebsite1ref
{
 ResourceID = "[cWebsite]FourthCoffee";
 PhysicalPath = "c:\\websites\\fourthcoffee";
 State = "Started";
 ApplicationPool = "FourthCoffeeAppPool";
 SourceInfo = "::12::7::cWebsite";
 Name = "FourthCoffee";
 ModuleName = "cWebAdministration";
 ModuleVersion = "1.1.1";
 DependsOn = { "[windowsfeature]IIS" };
 ConfigurationName = "SysAdvent";
};

instance of OMI_ConfigurationDocument
{
 Version="1.0.0";
 Author="Administrator";
 GenerationDate="12/22/2014 04:12:56";
 GenerationHost="ARMORY";
 Name="SysAdvent";
};

Other Tidbits

There are a few other things one should know in preparation for digging into DSC.

ConfigurationData and AllNodes

Configurations have support for a convention-based approach to separating environmental data from the structural configuration. The configuration script represents the structure or model for the machine, and the environmental data (via ConfigurationData) fleshes out the details.

ConfigurationData is represented by a hashtable with at least one key - AllNodes. AllNodes is an array of hashtables representing the nodes that should have configurations generated; it becomes an automatic variable ($AllNodes) that can be referenced in the configuration (as in the example above). The full hashtable is also available as $ConfigurationData, so you can create custom keys and reference them in your configuration. The PowerShell team reserves the right to use any key in the ConfigurationData hashtable that is prefixed with PS.

Example:

$ConfigurationData = @{
  AllNodes = @(
      @{NodeName = '*'; InterestingData = 'Every node can reference me.'},
      @{NodeName = 'Server1'; Role = 'Web'},
      @{NodeName = 'Server2'; Role = 'SQL'}
  )
}

SysAdvent -ConfigurationData $ConfigurationData

DependsOn

Resources in DSC are not ordered by default and there is no guarantee of ordering. The current WMF 4 implementation and the previews of WMF 5 all seem to process resources serially, but there is NO guarantee that will stay that way. If you need things to happen in a certain order, you need to use DependsOn to tell a resource what must happen before it can execute.

Node Names

In PUSH mode, the node name is either the server name, FQDN, or IP address (any valid way you can address that node via PowerShell remoting).

In PULL mode, the node name is not the server name. Servers are assigned a GUID and they use that to identify which configuration to retrieve from a pull server. Where this GUID comes from is up to you - you can generate them on the fly, pull one from AD, or use one from another system. Since the GUID is the identifier, you can use one GUID to represent an individual server or a class of servers.
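
For example, a pull-mode LCM meta-configuration assigns the GUID via the ConfigurationID setting and points the download manager at a pull server (the GUID and URL below are placeholders):

configuration PullClient
{
    node 'localhost'
    {
        LocalConfigurationManager
        {
            ConfigurationID           = '1d545e3b-60c3-47a0-bf65-5afc05182fd0'
            RefreshMode               = 'Pull'
            DownloadManagerName       = 'WebDownloadManager'
            DownloadManagerCustomData = @{
                ServerUrl = 'https://pull.example.com:8080/PSDSCPullServer.svc'
            }
        }
    }
}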

WMF 5 - In Production

If you are running Windows Server 2012 R2, you can stay on the bleeding edge AND get production support. The PowerShell team recently announced that if you are using WMF 5, you can get production support for what they call “stable” designs - those features that either existed in previous versions of the Management Framework or have reached a level that the team is ready to provide support. Other features, which are more in flux, are labeled experimental and don’t carry the same support level. With this change, you can safely deploy WMF 5 and begin to test new features and get the bug fixes faster than waiting for the full release. WMF previews are released roughly quarterly.

With WMF 5, you can dig into new and advanced features like Debug mode, partial configurations, and separate pull servers for different resource types.

Building an Ecosystem

No tooling is complete without a community around it and Desired State Configuration is no different.

PowerShellGet and OneGet

OneGet and PowerShellGet are coming onto the scene with WMF 5 (although after release they should be available somewhat downlevel too). OneGet is a package manager manager: it provides an abstraction layer on top of things like NuGet, Chocolatey, and PowerShellGet, and eventually tools like npm, RubyGems, and more. PowerShellGet provides a way to publish and consume external modules, including those that contain DSC resources.

Finding new resources becomes as easy as:

Find-Module -Includes DscResource

Third Parties

Chef

Back in July 2014, we at Chef made a preview of our DSC integration available (video, cookbook), and in September we shipped our first production-supported integration (the dsc_script resource), with more on the way. DSC offers Chef increased coverage on the Windows platform.

ScriptRock

The guys at ScriptRock (full disclosure - they are friends of mine) have done a pretty interesting thing by taking a configuration visualization and testing tool and offering an export of the configuration as a DSC script. Very cool.

Puppet

There is a Puppet module on the Forge showing some DSC integration. I’m not too familiar with the state of that project, but it’s great to see it!

Aditi

Brewmaster from Aditi is a deployment tool and can leverage DSC to get a server in shape to host a particular application, allowing you to distribute a DSC configuration with an application.

PowerShell.Org

PowerShell.Org hosts a DSC Hub containing forums, blog posts, podcasts, videos and a free e-book on DSC.

So, What Are You Waiting For?

Start digging in! There’s a ton of content out there. Shout at me on Twitter (@stevenmurawski) or via my blog if you have any questions.

December 24, 2014

Day 24 - 12 days of SecDevOps

Written by: Jen Andre (@fun_cuddles)
Edited by: Ben Cotton (@funnelfiasco)

Ah, the holidays. The time of year when we want to be throwing back the eggnogs, chilling in front of our fake fireplaces, maybe catching a funny Christmas day movie… but oh no we can’t, because guess what, a certain entertainment company was held hostage by a security breach the likes of which corporate America has never seen before… and no more movie for you.

It’s an interesting time to be a security defender. The recent Sony breach has just put a period on the worst-of-the-worst scenarios that we tinfoil-hat, paranoid security people have been ranting about all along: one bad breach could be business shattering.

But let’s step back, and look at the theme of this blog: the 12 days of SecDevOps. Besides being a ridiculous title that I’m 90% sure my ops director chose specifically as a troll for me (thanks, Pete), it underlines an important concept. Whether `security` is in your job title or not, operations is increasingly becoming the front-line for implementing security defenses.

Given that reality, and the fact that security breaches are NOT going away, and that most of us don’t have yacht-sized security budgets, I thought it would be interesting to come up with 12 practical, high-impact things that small organizations could be doing to shore up their security posture.

Day 1: Fear and Loathing and Risk Assessment and Hipsters

Risk assessment. It’s not just some big words auditors love to use. It’s simply weighing the probability of bad things happening against the cost to mitigate the risk of that bad thing happening. And using that to make good security decisions as you make day-to-day architecture and ops choices:

risk = (threat) x (probability) x (business impact)*

*whoever told you there would be no math lied to you

You may not be aware of it, but as an ops person you are likely doing risk assessment already, except more likely around things like uptime and reliability. Consider this scenario:

  • John, the web guy, proposes replacing PostgreSQL with SomeNewHipsterDB.
  • You ask yourself, ‘huh, what are the chances that I’m going to get paged at 3am because writes stop happening and my web site starts screaming in pain?’ You are probably not having warm-fuzzy feelings about this plan.
  • Your development and ops teams evaluate the benefits to engineering and the business of switching to SomeNewHipsterDB, weigh them against the probability that you are going to get woken up all of the time (and the impact that will have on your sunny disposition), and decide that yeah… maybe not gonna do it.
  • Or, you do, except you mitigate this risk by saying ‘John, you will be forever paged for all SomeNewHipsterDB issues. Done.’

Cool. Now do this for security. Every time you are making architecture choices, or changing configuration of your infrastructure, or considering some new third-party service SaaS you’ll be sending data to, you should be asking yourself: what’s the impact if that service or system gets hacked? How will you mitigate the risks?

This doesn’t have to be a formal or fancy report. It can be a running text file or spreadsheet with all of the possible points of failure. Get everyone involved with thinking of ways pieces of the infrastructure or organization can be hacked, and ways you are protected against those worst-case scenarios. It can be like ‘ANYONE WHO OWNS OUR CHEF SERVER COULD DESTROY EVERYTHING [but we have uber-monitoring and Jane over there reviews audit logs daily]’. Start with a “what if…?” scenario and have conversations where engineers and business owners defend why what you’re doing is good enough. Make security a fundamentally collaborative process.

Day 2: Shared Secrets: Figure it Out Now

There are three things in life that are inevitable: death, taxes… and the fact that a sales guy left to his own devices will always put all of his passwords in a plain text file (or, if fancy, an Excel spreadsheet).

The lesson is this: password management isn’t something that just the technical team decides and manages for itself. We should be advocating organization-wide education on managing credentials, because guess what? Salesforce, Gmail, and all of these SaaS services holding sensitive business data are used by people who are not engineers.

Solution? As part of every employee’s onboarding process, install a password manager on their workstation, and show them how to use it (e.g. 1Password or LastPass, or whatever your tool of choice is). Start doing this from the outset, as it’s best to figure this out on Day 1 rather than 200 employees in.

Day 3: Shared Secrets for Infrastructure, Too

When it comes to infrastructure secrets, there are extra concerns because in most cases, systems need to be able to access these secrets in a non-interactive, automated way (e.g. I need to be able to spin up an app server that knows how to authenticate to my database).

If all of your infra passwords start out unencrypted somewhere in a git repo, You Are Going To Have A Bad Time. Noah has a good article on various options for managing shared secrets in your infrastructure.

Day 4: Config Management On All Of The Things (So You Aren’t Sweating from Shell Shocks)

This should be obvious to everyone who drinks the DevOps Kool-Aid, but CM has done beautiful things for patch management. It may be tempting to deploy a one-off box used for dev manually, without config management installed, but guess what? In the case of BrowserStack, that turned out to be a massive Achilles heel.

Making the process easy for devs to get access to the infrastructure they need (while giving you the ability to manage systems) is key. Do this right away.

Day 5: Secure your Development Environments (Because No One Else Will)

If left to their own devices, development environments tend to veer toward chaos. This isn’t just because developers are lazy (and as a developer, I mean this in the nicest possible way) but because of the nature of the prototyping and testing process.

From a security perspective, this all means bad juju (see the BrowserStack example above). I can assure you that if you start building your prototype or dev infrastructure exposed to the public internet, deploying it without even basic config management, it will stay that way forever.

So: if you are using AWS, start with an Amazon VPC with strict perimeter security, and require VPN access for any development infrastructure. Get some config management on everything, even if it’s just for system patches.

Put some bounds around the chaos early on, and this will make it easy to mature the security controls as the product and organization mature.

Day 6: 2-Factor all of the things (well, the important things)

Require 2-factor wherever you can. Google Apps has made enforcing this super easy, and technologies like DuoSecurity and YubiKey make adding 2-factor to your critical infrastructure (e.g., your VPN accounts) far, far less annoying than it used to be.

Day 7: Encrypt your Emails (and other communications)

Encrypt your emails. It’s annoying to set up, but guess what? Hackers just love to post juicy stuff on pastebin. Again, from Day 1, help every single employee configure PGP or S/MIME encryption as part of the onboarding process. Once installed, it’s relatively painless to use (as long as you don’t mind archaic mail clients from 1999).

This is especially important to drill into executives because they tend to have more sensitive emails (e.g. their private boardroom chatter), and are particularly susceptible to phishing-style attacks. With the recent Sony email leaks, you now have some leverage. You can throw the ‘Angelina Jolie’ emails in front of them and ask: how much do you think business and reputations would suffer were their entire email archives publicly disclosed via a breach?

For many of us, chat is as crucial as email in terms of the type of reputation-critical information we put there. It may not be reasonable to switch to a self-hosted chat solution, but in that case, ensure you are picking a service that helps YOU mitigate your risk. E.g., do you need all of the history? Do you need private history for user chats?

Day 8: Security Monitoring: Start Small, Plan Big

Put the infrastructure in place to collect as much security data as possible, then start slowly making potential security issues visible by adding reports and alerts that deal with threat scenarios you are most worried about.

Start small. Remember that risk assessment list you made? Identify what you are most afraid of (um, that PHP CMS that has hundreds of vulnerabilities reported per year? Your VPN server?) and tackle monitoring for those items first.

Instrumenting your infrastructure from day 1 for security monitoring (even if it’s just collecting all of the system and application logs) puts you in a good position later on to start sophisticated reporting and intrusion detection on that data.

Day 9: Code/Design Reviews

Although there has been a lot of advancement in static and dynamic source code analysis tools (which you can integrate right into your CI process), a good old-fashioned code review by a human being goes a long way. If you’re using GitHub, just make it part of the development workflow and testing pipeline. Whenever changes are made to authentication or authorization, have someone look for automated tests that deal with those cases.

Day 10: Test Your Users

Phish yourself regularly. It’s really easy to do, and can be illuminating for the rest of the business, which may not be as technical as the operations/engineering side and may not really understand the impact of opening an email attachment or of not checking the URL of a site they are logging into. You can use some open source tools, but there are also many services now that you can pay to do this for you.

Day 11: Make an Incident Response Plan Now

So, you see something odd in your logs. Like, Bob your DBA ran a Postgres backup on the production DB, tar’d it up, and sent it to an FTP server in Singapore. Bob lives in Reston, VA, and this is definitely not normal. You start seeing evidence of other weird stuff ‘bob’ is doing that he shouldn’t be.

What now? Do you email Bob and say ‘something weird is happening?’ Do you call the Director of Ops? Do you put a message in a lonely chat room?

Figure out a plan for escalating possible critical security issues. It doesn’t have to be fancy or use specialized ITIL incident response workflow tools. Make a group in PagerDuty. Have an out-of-band channel for communicating details, in case your normal network goes the way of Sony and is totally compromised or just plain isn’t working. Maybe it’s as simple as an email list that doesn’t use the corporate email accounts, or a conference bridge everyone can hop on.

Day 12: Don’t be the Security ‘A**hole’

You. Yes, you. Don’t be the security a**hole that gets in everyone’s way and loses sight of the real reason for everyone’s existence: to run a business. You can be the security champion without being the blocker. In fact, that’s the only way to be effective. If a user is coming to you and saying ‘this is really really annoying, I don’t want to do it’ - listen to them. Too many security personnel disregard the usability issue of security controls for the sake of security theater, which leads to (unsurprisingly) abandonment, cynicism, and apathy when it comes to real security concerns.

DevOps is really a philosophy: it’s not a job title or a set of tools, it’s the concept of using modern tools and processes to facilitate collaboration between the engineers who deliver the code and those who must maintain it. Um, that was a lot of words, but the key word is collaboration. It’s no longer acceptable to throw ‘security over the wall’ and expect your users and ops people to just do what you say.

The best security cultures are not prescriptive, they are collaborative. They understand that business needs to get done. They are intellectually honest and admit ‘yeah, we could get hacked’ - but what can we do about this in a way that doesn’t bring everything to a halt? Zane Lackey has a great talk on building a modern security engineering organization that expounds many of these ideas, and more.

December 23, 2014

Day 23 - The Importance of Pluralism, or The Danger of the Letter "S"

Written by: Mike Fiedler (@mikefiedler)
Edited by: Hugh Brown (@saintaardvark)

Prologue: A Concept

One aspect of Chef that’s confusing to people comes up when searching for nodes that have some attribute: just what is the difference between a node’s reported ‘role’ attribute and its ‘roles’ attribute? It seems like it could almost be taken for a typo – but underlying it are some very deep statements about pluralism, pluralization, and the differences between them.

One definition of the term ‘pluralism’ is “a condition or system in which two or more states, groups, principles, sources of authority, etc., coexist.” And while pluralism is common in descriptions of politics, religion and culture, it also has a place in computing: to describe situations in which many systems are in more than one desired state.

Once a desired state is determined, it’s enforced. But then time passes – days, minutes, seconds or even nanoseconds – and every moment has the potential to change the server’s actual state. Files are edited, hardware degrades, new data is pulled from external sources; anyone who has run a production service can attest to this.

Act I: Terms

Businesses commonly offer products. These products may be composed of multiple systems, where each system could be a collection of services, which run on any number of servers, which run on some number of hosts. Each host, in turn, provides another set of services to the server that makes up part of the system, which then makes up part of the product, which the business sells.

An example to illustrate: MyFace offers a social web site (the product), which may need a web portal, a user authentication system, index and search systems, long-term photo storage systems, and many more. The web portal system may need servers like Apache or Nginx, running on any number of instances. A given server-instance will need to use any number of host services, such as I/O, CPU, memory, and more.

So what we loosely have is: products => systems => services => servers => hosts => services. (Turtles, turtles, turtles.)

In Days of Yore, when a Company ran a ‘Web Site’, they may have had a single System, maybe some web content Service, made up of a web Server, a database Server (maybe even on the same host) - both consuming host services (CPU, Memory, Disk, Network) - to provide the Service the Company then sells, hopefully at a profit (right!?).

Back then, if you wanted to enact a change on the web and database at the same time (maybe release a new feature), it was relatively simple, as you could control both things in one place, at roughly the same time.

Intermission

In English, to pluralize something, we generally add a suffix of “s” to the word. For instance, to convey more than one instance, “instance” becomes “instances”, “server” becomes “servers”, “system” becomes “systems”, “turtle” becomes “turtles”.

We commonly use pluralization to describe the concept of a collection of similar items, like “apples”, “oranges”, “users”, “web pages”, “databases”, “servers”, “hosts”, “turtles”. I think you see the pattern.

This extends even into programming languages and idiomatic use in development frameworks. For example, a Rails application will typically pluralize the table name for a model named Apple to apples.

This emphasizes that the table in question does not store a singular Apple; rather, many Apple instances will be located in a table named apples.

This is not pluralism, this is pluralization - don’t get them confused. Let’s move on to the next act.

Act II: Progress

We’ve evolved quite a bit since the Days of Yore. Now, a given business product can span hundreds or even thousands of systems of servers running on hosts all over the world.

As systems grow, it becomes more difficult to enact a desired change at a deterministic point in time across a fleet of servers and hosts.

In the realm of systems deployment, many solutions perform what has become known as “test-and-repair” operations - meaning that, when provided a “map” of the desired state (which typically manifests as human-written, readable code), the tool will test the current state of a given host and perform “repair” operations to bring the host to the desired state - whether that means installing packages, writing files, or the like.

Each system calls this map something different - cfengine:policies, bcfg2:specifications, puppet:modules, chef:recipes, ansible:playbooks, and so on. While they don’t always map 1:1, they all have some sort of concept for ‘Things that are similar, but not the same.’ Hosts will have unique IP addresses and hostnames, while sharing enough common features to be termed something like “web heads” or the like.

Act III: Change

In the previous sections, I laid the groundwork to understand one of the more subtle features in Chef. This feature may be available in other services, but I’ll describe the one I know.

Using Chef, there is a common deployment model where Chef Clients check in with a Chef Server to ask “What is the desired state I should have?” The Chef terminology is ‘a node asks the server for its run list’.

A run list can contain a list of recipes and/or roles. A recipe tells Chef how to accomplish a particular set of tasks, like installing a package or editing a file. A role is typically a collection of recipes, and maybe some role-specific metadata (‘attributes’ in Chef lingo).
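
For example, a simple role (the names and attributes here are hypothetical) bundles recipes and role-specific attributes:

# roles/webhead.rb
name 'webhead'
description 'Front-end web servers'
run_list 'recipe[base::packages]', 'recipe[nginx]', 'recipe[webapp]'
default_attributes 'nginx' => { 'worker_processes' => 4 }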

The node may be in any state at this point. Chef will test for each desired state, and take action to enforce it: install this package, write that file, etc. The end result should either be “this node now conforms to the desired state” or “this node was unable to comply”.

When the node completes successfully, it will report back to Chef Server that “I am node ‘XYZZY’, and my roles are ‘base’ and ‘webhead’, my recipes are ‘base::packages’, ‘nginx’, ‘webapp’”, along with a lot of node-specific metadata (IP addresses, CPU, Memory, Disk, and much more).

This information is then indexed and available for others to search for. A common use case we have is where a load balancing node will perform a search for all nodes holding the webhead role, and add these to the balancing list.

Pièce de résistance, or Searching for Servers

In a world where we continue to scale and deploy systems rapidly and repeatedly, we often choose to reduce the need for strong consistency amongst a cluster of hosts. This means we cannot expect to change all hosts at the precise same moment. Rather we opt for eventual consistency: either all my nodes will eventually be correct, or failures will occur and I’ll be notified that something is wrong.

This changes how we think about deployments and, more importantly, how we use our tools to find other nodes.

Using Chef’s search feature, a search like this:

webheads = search(:node, 'role:webheads')

will use the node index (a collection of node data) to look for nodes with the webheads role in the node’s run list - this will also return nodes that have not yet completed an initial Chef run and reported the complete run list back to Chef Server.

This means that my load balancer could find a node that is still mid-provisioning, and potentially begin to send traffic to a node that’s not ready to receive yet, based on the role assignment alone.

A better search, in this case might be:

webheads = search(:node, 'roles:webheads')

One letter, and all the difference.

This search now looks for an “expanded list” that the node has reported back. Any node with the role webheads that has completed a Chef run would be included. If the mandate is that only webhead nodes get the webhead role assigned to them, then I can safely use this search to include nodes that have completed their provisioning cycle.

Another way to use this search to our benefit is to search one axis and compare with another to find nodes that never completed provisioning:

badnodes = search(:node, 'role:webheads AND NOT roles:webheads')
# Or, with knife command line:
$ knife search node 'role:webheads AND NOT roles:webheads'

This will grab any nodes with an assignment but not a completion – very helpful when launching large numbers of nodes.

Note: This is not restricted to roles; this also applies to recipe/recipes. I’ve used roles here, as we use them heavily in our organization, but the same search patterns apply for using recipes directly in a run list.

Curtain

This little tidbit of role vs roles has proven time and again to be a confusing point when someone tries to pick up more of Chef’s searching abilities. But having both attributes describe the state of the node is helpful in determining what state the node is in, and whether it should be included in some other node’s list (such as in the load balancer/webhead example from before).

Now, you may argue against the use of roles entirely, or the use of Chef Server and search, and use something else for service discovery. This is a valid argument - but be careful you’re not tethering a racehorse to a city carriage. If you don’t fully understand its abilities, someday it might run away on you.

Epilogue

A surgeon spends a lot of time learning how to use a sharpened bit of metal to fix the human body. While there are many instruments he or she will go on to master, the scalpel remains the fundamental tool, available when all else is gone.

While we don’t have the same risks involved as a surgeon, the tools we use can be more complex, and provide us with a large amount of power at our fingertips.

It behooves us to learn how they work, and when and how to use their features to provide better systems and services for our businesses.

Chef’s ability to discern between what a node has been told about itself and what it reports about itself can make all the difference when using Chef to accomplish complex deployment scenarios and maintain flexible infrastructure as code. This not only gives you the fundamentals of service discovery with fewer hard-coded configurations, but also lets you avoid the uncertainty of bringing in yet another outside tool.

On that note, Happy Holiday(s)!

December 22, 2014

Day 22 - Largely Unappreciated Applicability

Written by: John Vincent (@lusis)
Edited by: Joseph Kern (@josephkern)

I have had the privilege of writing a post for SysAdvent for the past several years. In general these posts have been focused on broader cultural issues. This year I wanted to do something more technical and this topic gave me a chance to do that. It’s also just REALLY cool so there’s that.

Nginx

I’m sure most people are familiar with Nginx but I’m going to provide a short history anyway. Nginx is a webserver created by Igor Sysoev around 2002 to address the C10K problem. The C10K problem isn’t really a “problem” anymore in the sense that it was originally. It’s morphed into the C10M problem. With the rise of sensors, we may be dealing with a Cgazillion problem before we know it.

Nginx addressed this largely with an event loop (I know some folks think the event loop was invented in 2009). The dominant webserver at the time (and still), Apache, used a model of spawning a new process or a new thread for each connection. Nginx did it differently: it spawns a master process (which handles configuration, launching worker processes, and the like) and a pool of worker processes, each with its own event loop. Workers share no state between each other and select on a shared socket to process requests. This particular model works and scales very well. There’s more on the history of Nginx in its section in the second edition of AOSA. It’s a good read and while not 100% current, the basics are unchanged.

Lua

Lua is a programming language invented in 1993. The title of this article is a shout out to how underappreciated Lua is not only as a language but in its myriad uses. Most people who have heard of Lua know it as the language used for World of Warcraft plugins.

Lua is an interesting language. Anyone with experience in Ruby will likely find themselves picking it up very quickly. It has a very small core language, first-class functions, and coroutines. It is dynamically typed and has one native data structure - the table. When you work in Lua, you will learn to love and appreciate the power of tables. They feel a lot like a Ruby hash and are the foundation of most advanced Lua.

It has no classes, but they can be implemented after a fashion using tables. Since Lua has first-class functions, you can create a “class” by lumping data and functions into a table. There’s no inheritance, but instead you have prototypes. (There’s a bit of sugar to help you out when working with these ‘objects’ - e.g. calling foo:somefunc() implies self as the first argument, as opposed to foo.somefunc(self).)
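
A minimal sketch of that pattern (the names are made up):

-- A table-based "class" with prototype-style method lookup
local Greeter = {}
Greeter.__index = Greeter

function Greeter.new(name)
  local self = setmetatable({}, Greeter)
  self.name = name
  return self
end

-- g:hello() is sugar for Greeter.hello(g); self is passed implicitly
function Greeter:hello()
  return "hello " .. self.name
end

local g = Greeter.new("bob")
print(g:hello())   --> hello bob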

For a good read on the language history, see Wikipedia and the Lua website.

For some basics on the language itself, the Wikipedia article has code samples, as does the official documentation. There is also a section on Lua in the newest edition of the Seven Languages series - Seven More Languages in Seven Weeks.

I’ve written a couple of modules as well, primarily for use with OpenResty.

If you want to see an example of how the “classes” work with Lua, take a look at the github example and compare the usage described in the README with the module itself.

Combining the two

As I mentioned, Lua is an easily embeddable language. I’ve been unable to find a date on when Lua support was added to Nginx but it was a very early version (~ 0.5).

One of the pain points of Nginx is that it doesn’t support dynamically loaded modules. All extended functionality outside the core must be compiled in. Lua support in Nginx made it so that you could add some advanced functionality to Nginx via Lua that would normally require a C module and a recompile.

Much of the Nginx API itself is exposed to Lua directly, and Lua can be used at multiple places in the Nginx request lifecycle. You can, for example, generate responses (content_by_lua), rewrite requests and control access (rewrite_by_lua, access_by_lua), set variables (set_by_lua), and filter response headers and bodies (header_filter_by_lua, body_filter_by_lua).

All of these are documented on the Nginx website.

For example, if I wanted to have the response body be entirely created by Lua, I could do the following in Nginx:

location /foo {
  content_by_lua '
    ngx.header.content_type = "text/plain"
    local username = "bob"
    ngx.say("hello ", username)
    ngx.exit(ngx.HTTP_OK)
  ';
}

This example would return hello bob as plain text to your browser when you requested /foo from Nginx.

Obviously escaping can get to be a headache here, so each of the *_by_lua directives (which inline Lua code in the Nginx config files) can be replaced with a *_by_lua_file equivalent, where the Lua code is stored in an external file.
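
The file-based form of the earlier example would look something like this (the path is arbitrary), with the Lua code from the inline example living in /etc/nginx/lua/foo.lua:

location /foo {
  content_by_lua_file /etc/nginx/lua/foo.lua;
}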

Another neat trick available to you is the cosocket API, which lets you open arbitrary non-blocking network connections via Lua from inside an Nginx worker.
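
A small sketch of the cosocket API in a content handler, talking to a hypothetical Redis-like backend on localhost:

local sock = ngx.socket.tcp()
sock:settimeout(1000)  -- milliseconds

local ok, err = sock:connect("127.0.0.1", 6379)
if not ok then
  ngx.log(ngx.ERR, "failed to connect: ", err)
  return ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
end

sock:send("PING\r\n")
local line = sock:receive()  -- reads a single line by default
ngx.say("backend said: ", line)
sock:close()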

As you can see, this is pretty powerful. Additionally, the Lua functionality is provided in Nginx via a project called LuaJIT which offers amazing speed and predictable usage. By default, Lua code is cached in Nginx but this can be disabled at run-time to help speed up the development process.

Enter the OpenResty

If it wasn’t clear yet, the combination of Nginx and Lua basically gives you an application server right in the Nginx core. Others have created Lua modules specifically for use within Nginx and a few years ago an enterprising soul started bundling them up into something called OpenResty.

OpenResty combines checkpointed versions of Nginx, modified versions of the Lua module (largely maintained by the OpenResty folks anyway), curated versions of LuaJIT and a boatload of Nginx-specific Lua modules into a single distribution. OpenResty builds of Nginx can be used anywhere out-of-the-box that you would use a non-Lua version of Nginx. Currently OpenResty is sponsored by CloudFlare where the primary author, Yichun Zhang (who prefers to go by “agentzh” everywhere) is employed.

OpenResty is a pretty straightforward “configure/make/make install” beast. There is a slightly dated omnibus project on Github from my friend Brian Akins that we’ve contributed to in the past (and will be contributing our current changes back to in the future). Much of my appreciation and knowledge of Lua and OpenResty comes directly from Brian and his omnibus packages are how I got started.

But nobody builds system packages anymore

Obviously system packages are the domain of grey-haired BOFHs who think servers are for serving. Since we’re all refined and there are buzzword quotas to be maintained, you should probably just use Docker (but you have to say it like Benny says “Spaceship”).

Seriously though, Docker as a packaging format is pretty neato and for what I wanted to do, Docker was the best route. To that end I give you an OpenResty tutorial in a box (well, a container).

The purpose of this repo is to help you get your feet wet with some examples of using Lua with Nginx via the latest OpenResty build. It ships with a Makefile to wrap up all the Docker invocations and hopefully make things dead simple. It works its way up from the basics I’ve described all the way to communicating between workers via a shared dictionary, making remote API calls to Github, two Slack chat websocket “clients” and the skeleton of a dynamic load balancer in Nginx backed by etcd:

In addition, because I know how difficult it can be to develop and troubleshoot code running inside Nginx, I’ve created a web-based REPL for testing out and experimenting with the Nginx Lua API.

To use the basic examples in the container, you can simply clone the repo and run make all. This will build the container and then start OpenResty listening on port 3131 (and etcd on 5001 for one of the demos). The directory var_nginx will be mounted inside the container as /var/nginx and contains all the necessary config files and Lua code for you to poke/prod/experiment with. Logs will be written to var_nginx/logs so you can tail them if you’d like. As you can see, it also uses Bootstrap for the UI, so we’ve pretty much rounded out the “what the hell have you built” graph.

Please note that while the repo presents some neat tricks, the code inside is not optimized by any stretch. The etcd code especially may have some blocking implications but I’ve not yet confirmed that. The purpose is to teach and inspire more than “take it and run it in prod”.

Advanced Examples

If you’d like to work with the Slack examples, you’ll need to generate a Slack “bot” integration token for use. The Makefile includes support for running an etcd container appropriate for use with the tutorial container. If you aren’t a Slack user, then here’s a screenshot so you can see what it WOULD look like.

Wrap up

Maybe this post has inspired you to at least take a look at OpenResty. Lua is a really neat language and very easy to pick up and add to your toolbelt. We use OpenResty builds of Nginx in many places internally, from proxy servers to even powering our own internal SSO system based on GitHub OAuth and group memberships. While most people simply use Nginx as a proxy and static content server, we treat it like an application server as well, and leverage the flexibility of not requiring another microservice to handle certain tasks.

The combination of Nginx and Lua won’t fit every use case, but by learning the system better, you can better leverage Nginx across the board.

December 21, 2014

Day 21 - Baking Delicious Resources with Chef

Written by: Jennifer Davis (@sigje)
Edited by: Nathen Harvey (@nathenharvey)

Growing up, every Christmas time included the sweet smells of fresh baked cookies. The kitchen would get incredibly messy as we prepped a wide assortment from carefully frosted sugar cookies to peanut butter cookies. Holiday tins would be packed to the brim to share with neighbors and visiting friends.

Sugar Cookies

My earliest memories of this tradition are of my grandmother showing me how to carefully imprint each peanut butter cookie with a crosshatch. We’d dip the fork into sugar to prevent the dough from sticking and then carefully press into the cookie dough. Carrying on the cookie tradition, I am introducing the concepts necessary to extend your Chef knowledge and bake up cookies using LWRPs.

To follow the walkthrough example as written you will need to have the Chef Development Kit (Chef DK), Vagrant, and VirtualBox installed (or use the Chef DK with a modified .kitchen.yml configuration to use a cloud compute provider such as Amazon).

Resource and Provider Review

Resources are the fundamental building blocks of Chef. There are many available resources included with Chef. Resources are declarative interfaces, meaning that we describe the state we want the resource to be in, rather than the steps required to reach that state. Resources have a type, name, one or more parameters, actions, and notifications.

Let’s take a look at one sample resource, Route.

route "NAME" do
  gateway "10.0.0.20"
  action :delete
end

The route resource describes the system routing table. The type of resource is route. The name of the resource is the string that follows the type. The route resource includes optional parameters of device, gateway, netmask, provider, and target. In this specific example, we are only declaring the gateway parameter. In the above example we are using the delete action and there are no notifications.

Each Chef resource includes one or more providers responsible for actually bringing the resource to the desired state. It is usually not necessary to select a provider when using the Chef-provided resources; Chef will select the best provider for the job at hand. We can look at the underlying Chef code to examine the provider. For example, here is the Route provider code and rubydoc for the class.

While there are ready-made resources and providers, they may not be sufficient to meet our needs to programmatically describe our infrastructure with small clear recipes. We reach that point where we want to reduce repetition, reduce complexity, or improve readability. Chef gives us the ability to extend functionality with Definitions, Heavy Weight Resources and Providers (HWRP), and Light Weight Resources and Providers (LWRP).

Definitions are essentially recipe macros. They are stored within a definitions directory within a specific cookbook. They cannot receive notifications.

HWRPs are pure ruby stored in the libraries directory within a specific cookbook. They cannot use core resources from the Chef DSL by default.

LWRPs, the main subject of this article, are a combination of Chef DSL and ruby. They are useful to abstract repeated patterns. They are parsed at runtime and compile into ruby classes.

LWRPs

Extending resources requires us to revisit the elements of a resource: type, name, parameters, actions, and notifications.

Idempotence and convergence must also be considered.

Idempotence means that the provider ensures that the state of a resource is only changed if a change is required to bring that resource into compliance with our desired state or policy.

Convergence means that the provider brings the current resource state closer to the desired resource state.

Resources have a type. The LWRP's resource type is defined by the name of the file within the cookbook. This implicit name follows the formula cookbook_resource. If the default.rb file is used, the new resource will simply be named after the cookbook.

File names should match for the LWRP’s resource and provider within the resources and providers directories. The chef generators will ensure that the files are created appropriately.

The resource and its available actions are described in the LWRP's resource file.

The steps required to bring the piece of the system to the desired state are described in the LWRP’s provider file. Both idempontence and convergence must also be considered when writing the provider.

Resource DSL

The LWRP resource file defines the characteristics of the new resource we want to provide using the Chef Resource DSL. The Resource DSL has multiple methods: actions, attribute, and default_action.

Resources have a name. The Resource DSL allows us to tag a specific parameter as the name of the resource with :name_attribute.

Resources have actions. The Resource DSL uses the actions method to define a set of supported actions with a comma separated list of symbols. The Resource DSL uses the default_action method to define the action used when no action is specified in the recipe.

Note: It is recommended to always define a default_action.

Resources have parameters. The Resource DSL uses the attribute method to define a new parameter for the resource. We can provide a set of validation parameters associated with each parameter.

Let's take a look at an example of an LWRP resource from an existing cookbook.

djbdns includes the djbdns_rr resource.

actions :add
default_action :add

attribute :fqdn,     :kind_of => String, :name_attribute => true
attribute :ip,       :kind_of => String, :required => true
attribute :type,     :kind_of => String, :default => "host"
attribute :cwd,      :kind_of => String

The rr resource as defined here will have one action (add) and four attributes: fqdn, ip, type, and cwd. The validation parameters show that all of these attributes are expected to be of the String class. Additionally, ip is the only required attribute when using this resource in our recipes.
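
Using the resource in a recipe then looks like this (the FQDN and IP are placeholder values):

djbdns_rr 'www.example.com' do
  ip   '10.0.0.10'
  type 'host'
  action :add
end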

Provider DSL

The LWRP provider file defines the “how” of our new resource using the Chef Provider DSL.

In order to ensure that our new resource functionality is idempotent and convergent, we need to know the:

  • desired state of the resource
  • current state of the resource
  • end state of the resource after the run

Requirement      Chef DSL Provider Method
Desired State    new_resource
Current State    load_current_resource
End State        updated_by_last_action

Let’s take a look at an example of a LWRP provider from an existing cookbook to illustrate the Chef DSL provider methods.

The djbdns cookbook includes the djbdns_rr provider.

action :add do
  type = new_resource.type
  fqdn = new_resource.fqdn
  ip = new_resource.ip
  cwd = new_resource.cwd ? new_resource.cwd : "#{node['djbdns']['tinydns_internal_dir']}/root"

  unless IO.readlines("#{cwd}/data").grep(/^[\.\+=]#{fqdn}:#{ip}/).length >= 1
    execute "./add-#{type} #{fqdn} #{ip}" do
      cwd cwd
      ignore_failure true
    end
    new_resource.updated_by_last_action(true)
  end
end
new_resource

new_resource returns an object that represents the desired state of the resource. We can access all of its attributes as methods on that object, which lets us programmatically inspect the desired end state of the resource.

type = new_resource.type assigns the value of the type attribute of the new_resource object that is created when we use the rr resource in a recipe with a type parameter.

load_current_resource

load_current_resource is an empty method by default. We define it to load the current state of the resource, conventionally into @current_resource, so that the provider can compare the current state against the desired state.

In our example above we are not using load_current_resource.
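
For illustration only, here is a rough sketch of what a load_current_resource for the rr provider could look like (this is not part of the djbdns cookbook, and it records the state in a simple instance variable rather than a full @current_resource object):

def load_current_resource
  cwd = new_resource.cwd || "#{node['djbdns']['tinydns_internal_dir']}/root"
  # Record whether a matching record already exists so the action can skip work.
  @record_exists = ::File.exist?("#{cwd}/data") &&
    ::IO.readlines("#{cwd}/data").grep(/^[\.\+=]#{new_resource.fqdn}:#{new_resource.ip}/).any?
end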

updated_by_last_action

updated_by_last_action notifies Chef that a change happened to converge our resource to its desired state.

Within the unless block, calling new_resource.updated_by_last_action(true) notifies Chef that a change was made to converge our resource.

Actions

We need to define an action block in the LWRP provider file for each action declared in the resource file. Each block should handle whatever is needed to bring that piece of the system to the desired state.

In the djbdns_rr provider, the one action defined is :add, which matches the actions declared in the LWRP resource file.

Cooking up a cookies_cookie resource

Preparing our kitchen

First, we need to set up our kitchen for some holiday baking! Test Kitchen is part of the suite of tools that come with the Chef DK. This omnibus package includes a lot of tools that can be used to personalize and optimize your workflow. For now, it’s back to the kitchen.

Kitchen Utensils

Note: On Windows you need to verify your PATH is set correctly to include the installed packages. See this article for guidance.

Download and install both Vagrant and VirtualBox if you don’t already have them. You can also modify your .kitchen.yml to use AWS instead.

We’re going to create a “cookies” cookbook that will hold all of our cookie recipes. First, we use the chef CLI to generate the cookbook with the default generator. You can customize default cookbook creation for your own environment.

chef generate cookbook cookies
Compiling Cookbooks...
Recipe: code_generator::cookbook

followed by more output.

We’ll be working within our cookies cookbook, so go ahead and switch into the cookbook’s directory.

$ cd cookies

By running chef generate cookbook we get a number of preconfigured items. One of these is a default Test Kitchen configuration file. We can examine our kitchen configuration by looking at the .kitchen.yml file:

$ cat .kitchen.yml

---
driver:
  name: vagrant

provisioner:
  name: chef_zero

platforms:
  - name: ubuntu-12.04
  - name: centos-6.5

suites:
  - name: default
    run_list:
      - recipe[cookies::default]
    attributes:

The driver section tells Test Kitchen how to create the instances it tests against. In this case we will be using the kitchen-vagrant driver that comes with Chef DK. We could easily configure this to use AWS or another cloud compute provider.

The provisioner is chef_zero, which gives us most of the functionality of integrating with a Chef Server without the overhead of installing and managing one.

The platforms define the operating systems that we want to test against. Today we will only work with the CentOS platform as defined in this file. You can delete or comment out the Ubuntu line.

The suites section defines what we want to test. This includes a run_list containing the cookies::default recipe.

Next, we will spin up the CentOS instance.

Preheat Oven

Note: Test Kitchen will automatically download the vagrant box file if it’s not already available on your workstation. Make sure you’re connected to a sufficiently speedy network!

$ kitchen create

Let’s verify that our instance has been created.

$ kitchen list

Instance             Driver   Provisioner  Last Action
default-centos-65    Vagrant  ChefZero     Created

This confirms that a local virtualized node has been created.

Let’s go ahead and converge our node, which will install Chef on the virtual instance.

$ kitchen converge

Cookie LWRP prep

We need to create a LWRP resource and provider file and update our default recipe.

We create the LWRP base files using the chef CLI included in the Chef DK. This creates two files: resources/cookie.rb and providers/cookie.rb.

chef generate lwrp cookie

Let’s edit our cookie LWRP resource file and add a single supported action of create.

Edit the resources/cookie.rb file with the following content:

actions :create

Next, edit our cookie LWRP provider file and define the supported create action. Our create action will log a message to STDOUT that includes the name of our new_resource.

Edit the providers/cookie.rb file with the following content:

use_inline_resources

action :create do
 log " My name is #{new_resource.name}"
end

Note: use_inline_resources was introduced in Chef version 11. This modifies how LWRP resources are handled to enable the inline evaluation of resources. This changes how notifications work, so read carefully before modifying LWRPs in use!

Note: The Chef Resource DSL method is actions (plural) even when only one action is declared; each declared action gets its own action block in the provider file.

We will now test out our new resource functionality by writing a recipe that uses it. Edit the cookies cookbook default recipe (recipes/default.rb). The new resource follows the naming format of #{cookbookname}_#{resource}, so ours is cookies_cookie.

cookies_cookie "peanutbutter" do
   action :create
end

Converge the image again.

$ kitchen converge

Within the output:

Converging 1 resources
Recipe: cookies::default
  * cookies_cookie[peanutbutter] action create[2014-12-19T02:17:39+00:00] INFO: Processing cookies_cookie[peanutbutter] action create (cookies::default line 1)
 (up to date)
  * log[ My name is peanutbutter] action write[2014-12-19T02:17:39+00:00] INFO: Processing log[ My name is peanutbutter] action write (/tmp/kitchen/cache/cookbooks/cookies/providers/cookie.rb line 2)
[2014-12-19T02:17:39+00:00] INFO:  My name is peanutbutter

Our cookies_cookie resource is successfully logging a message!

Improving the Cookie LWRP

We want to improve our cookies_cookie resource by adding some parameters. To determine the appropriate parameters for a LWRP resource, we need to think about the components of the thing we want to manage.

Delicious delicious ingredients parameter

There are some basic common components of cookies. The essential components are fat, binder, sweetener, leavening agent, flour, and additions like chocolate chips or peanut butter. The fat provides flavor, texture, and spread of a cookie. The binder helps “glue” the ingredients together. The sweetener affects the color, flavor, texture, and tenderness of a cookie. The leavening agent adds air to our cookie, changing its texture and height. The flour provides texture as well as the bulk of the cookie structure. The additional ingredients differentiate our cookies’ flavor.

A generic recipe would involve combining the wet ingredients and the dry ingredients separately, blending them together, and adding the additional ingredients last. For now, we’ll lump all of our ingredients into a single parameter.

Other than ingredients, we need to know the temperature at which we are going to bake our cookies, and for how long.

When we add parameters to our LWRP resource, each declaration starts with the keyword attribute, followed by an attribute name and zero or more validation parameters.

Edit the resources/cookie.rb file:

actions :create  

attribute :name, :name_attribute => true
attribute :bake_time
attribute :temperature
attribute :ingredients

We’ll update our recipe to incorporate these attributes.

cookies_cookie "peanutbutter" do
   bake_time 10
   temperature 350
   action :create
end

Using a Data Bag

While we could pass the ingredients as a string or array directly in the recipe, in this case we will separate them from our code. One way to do this is with data bags.

We’ll use a data_bag to hold our cookie ingredients. Production data_bags normally live outside of our cookbook, in our organization’s policy repo. Since we are developing with chef_zero, we’ll include our data bag within our cookbook in the test/integration/data_bags directory.

To do this in our development environment, we update our .kitchen.yml so that chef_zero can find our data_bags.

For testing our new resource functionality, add the following to the default suite section of your .kitchen.yml:

data_bags_path: "test/integration/data_bags"

At this point your .kitchen.yml should look something like the sketch below.
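
Assuming you removed the Ubuntu platform as suggested earlier, the file would read roughly:

---
driver:
  name: vagrant

provisioner:
  name: chef_zero

platforms:
  - name: centos-6.5

suites:
  - name: default
    data_bags_path: "test/integration/data_bags"
    run_list:
      - recipe[cookies::default]
    attributes:

Next, create the directory that will hold our data bag items: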

$ mkdir -p test/integration/data_bags/cookies_ingredients

Create a peanutbutter item in our cookies_ingredients data_bag by creating a file named peanutbutter.json in the directory we just created:

{
  "id" : "peanutbutter",
  "ingredients" :
    [
      "1 cup peanut butter",
      "1 cup sugar",
      "1 egg"
    ]
}

We’ll update our recipe to actually use the cookies_ingredients data_bag:

search('cookies_ingredients', '*:*').each do |cookie_type|
  cookies_cookie cookie_type['id'] do
    ingredients cookie_type['ingredients']
    bake_time 10
    temperature 350
    action :create
  end
end

Now we’ll update our LWRP resource to validate its input parameters, and update our provider to create a file on our node using those attributes. We’ll also create an ‘eat’ action for our resource.

Edit the resources/cookie.rb file with the following content:

actions :create, :eat

attribute :name, :name_attribute => true
# bake time in minutes
attribute :bake_time, :kind_of => Integer
# temperature in F
attribute :temperature, :kind_of => Integer
attribute :ingredients, :kind_of => Array

We’ll update our provider so that it creates a file on our node rather than just logging a message. We’ll use a template resource in our provider, so first we will create the required template.

Create a template file:

$ chef generate template basic_recipe

Edit the templates/default/basic_recipe.erb to have the following content:

Recipe: <%= @name %> cookies

<% @ingredients.each do |ingredient| %>
<%= ingredient %>
<% end %>

Combine wet ingredients.
Combine dry ingredients.

Bake at <%= @temperature %>F for <%= @bake_time %> minutes.

Now we will update our cookie provider to use the template and pass the attributes over to it. We will also define our new eat action, which deletes the file created by create.

Edit the providers/cookie.rb file with the following content:

use_inline_resources

action :create do

  template "/tmp/#{new_resource.name}" do
    source "basic_recipe.erb"
    mode "0644"
    variables(
      :ingredients => new_resource.ingredients,
      :bake_time   => new_resource.bake_time,
      :temperature => new_resource.temperature,
      :name        => new_resource.name,
    )
  end
end

action :eat do

  file "/tmp/#{new_resource.name}" do
    action :delete
  end

end

Try out our updated LWRP by converging your Test Kitchen instance.

kitchen converge

Let’s confirm that the peanutbutter resource created its file by logging into our node.

kitchen login

Our new file was created at /tmp/peanutbutter. Check it out:

[vagrant@default-centos-65 ~]$ cat /tmp/peanutbutter
Recipe: peanutbutter cookies

1 cup peanut butter
1 cup sugar
1 egg

Combine wet ingredients.
Combine dry ingredients.

Bake at 350F for 10 minutes.

Peanut Butter Cookie Time

Let’s try out our eat action. Update our recipe with:

search("cookies_ingredients", "*:*").each do |cookie_type|
  cookies_cookie cookie_type['id'] do
    action :eat
  end
end

Converge our node, log in, and verify that the file no longer exists.

$ kitchen converge
$ kitchen login
Last login: Fri Dec 19 05:45:23 2014 from 10.0.2.2
[vagrant@default-centos-65 ~]$ cat /tmp/peanutbutter
cat: /tmp/peanutbutter: No such file or directory

To add additional cookie types, we can just create new data_bag items.
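
For example, a hypothetical chocolatechip item would just be another JSON file at test/integration/data_bags/cookies_ingredients/chocolatechip.json (the ingredient list below is illustrative):

{
  "id" : "chocolatechip",
  "ingredients" :
    [
      "1 cup butter",
      "1 cup sugar",
      "2 eggs",
      "2 1/4 cups flour",
      "2 cups chocolate chips"
    ]
}

Because our recipe searches the cookies_ingredients data_bag with a wildcard, the new item is picked up on the next converge without any change to the LWRP.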

Cleaning up the kitchen

Messy Kitchen

Finally, once we are done testing in our kitchen today, we can go ahead and clean up our virtualized instance with kitchen destroy.

kitchen destroy

Next Steps

We have successfully made a batch of peanut butter cookies, yet barely scratched the surface of extending Chef with LWRPs. Check out Chapter 8 in Jon Cowie’s book Customizing Chef and Doug Ireton’s helpful three-part article on creating LWRPs. You should examine and extend this example to use load_current_resource and updated_by_last_action; one possible direction is sketched below. Try to figure out how to add why_run functionality. I look forward to seeing you share your LWRPs with the Chef community!
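
As a rough starting point only (this sketch is not from the article’s cookbook and shows just the :create action), the provider could grow a load_current_resource that records whether the recipe card already exists, and then report updates explicitly:

use_inline_resources

def load_current_resource
  # Record whether the recipe card has already been written to the node.
  @already_written = ::File.exist?("/tmp/#{new_resource.name}")
end

action :create do
  unless @already_written
    template "/tmp/#{new_resource.name}" do
      source "basic_recipe.erb"
      mode "0644"
      variables(
        :ingredients => new_resource.ingredients,
        :bake_time   => new_resource.bake_time,
        :temperature => new_resource.temperature,
        :name        => new_resource.name
      )
    end
    # With use_inline_resources the inline template already reports updates,
    # so this explicit call mainly illustrates the mechanism.
    new_resource.updated_by_last_action(true)
  end
end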

Feedback and suggestions are welcome at iennae@gmail.com.

Thank you

Thank you to my awesome editors who helped me ensure that these cookies were tasty!

December 20, 2014

Day 20 - The Pursuit of Learning through Bad Ideas

Written by: Michael Stahnke (@stahnma)
Edited by: Michelle Carroll (@miiiiiche)

I have a confession: I love terrible ideas. I really enjoy trying to think of the absolute worst way to solve problems, largely because being a contrarian is fun. Then I realized something — coming up with the exact wrong way to solve a problem is not only a good time, but can actually be helpful.

My love for sharing terrible ideas was codified when one of my teams (and several people from other areas inside engineering) decided to embrace this behavior and create “Bad Idea Monday.” After participating in several debates fueled by the worst ideas available, some tangible benefits emerged.

Happy employees do better work. This has been proven countless times. What makes employees happy? Fun things, perks, benefits, and pay are up there, but in my experience, what really gets people engaged is learning. Encouraging and embracing new ways of learning are paramount to building the culture you want. Capturing the desire to talk about the worst ways to solve your problems provides a lot of fresh opportunities to learn.

The worst can make you better

As you throw out the absolute worst idea possible to solve something, several outcomes can occur.

  1. Your idea, while terrible, just isn’t bad enough. Somebody else in the discussion thinks they can do better (worse). They try to one-up you. They often succeed, and it’s amazing. This sport of spouting bad ideas leads to collaboration, as one person’s idea gets picked up and added to by others.

  2. A terrible idea isn’t understood by everybody to be terrible. This often happens when there’s a wide range of experience, either in the job, or within this specific problem domain. The discussion can help spread knowledge, as a more experienced team member explains why your solution of “install head mounted GoPro cameras for auditing purposes” might not actually make your audits any cleaner.

  3. Experienced people get a new viewpoint on problems. The problems you face today may be similar to ones you’ve seen before. Trying to think of the worst possible solution forces you to deviate from your usual viewpoint, and can lead to another level of understanding. It can also lead to you reaching for tools or solutions that you’d normally not have considered.

  4. You come up with a real, legitimate solution. It’s likely one you and your team would not have arrived at without getting creative and trying to think of the worst idea. For example, choosing a Google spreadsheet[1] as the back end for an internal service. It sounds like a terrible idea. A spreadsheet isn’t really a database. It doesn’t really have a great query language, it can’t handle lots of updates per second, but it has access control, it’s a familiar interface for non-technical folks, and doesn’t require significant upgrades or maintenance.

  5. The team learns to debate and discuss ideas. This is important. Because these ideas are intentionally terrible, people don’t get offended when somebody shoots down the idea (or builds on it to come up with something worse). It helps the team learn how to debate properly. Learning how to dismantle ideas without judgment is a much healthier and more productive practice than attacking the person with the idea.

How does it work?

Bad Idea Monday doesn’t have to be a Monday, but it works well when it is. Because, let’s be honest, Mondays are the day of the week that people normally dread. There are copious jokes, cartoons, and comics about how much we all hate the first day back at work after a nice weekend. Capitalize on Monday’s bad reputation, and use it to get your team to generate the worst possible ideas.

How do you get started? First, you need a problem. This problem could come from your ticketing system, a chat conversation, or a face-to-face discussion of something just not working the way it should. The input queue is more or less limitless. After you have a situation, don’t try to solve it — at least not the way you normally would. Turn it on its head. This doesn’t require a meeting. It can happen in any medium, and occur numerous times throughout the day.

Allow me to walk through an example.

Bad Idea Monday in practice

When Puppet Labs was moving our server-side stack from a Ruby-based solution to Clojure and JRuby, we uncovered a new set of problems. We knew we needed a JRE, but that was about all we knew. Did we need a specific JRE? Did we want to compile a JVM for the ~30 permutations of platforms supported as masters on Puppet Enterprise? Were we going to have to package it? Did we want to require that the end-user brings in libalsa because that’s what normal JVMs do?

So the fundamental problem: how do we ship/bundle a JVM to our enterprise customers? What’s the worst answer to this? We could just unzip a binary of the JVM and somehow work it into our filesystem path — that solution was rejected because it wasn’t bad enough. We could use netcat and dd for distribution, but that wasn’t interesting enough. Then we got an idea. An awful idea. We got a wonderful, awful idea!

the grinch gets a bad idea

We ship the JVM as a gem. Rubygems allows you to compile things on the fly. Rubygems is cross platform. Rubygems is available over the network. Sure, this content wasn’t Ruby, but why should that stop us?

This is a terrible idea. Why? Well, you would need way too many dependencies. You have to have Ruby on the box already. You have to be connected to a network for a successful installation. You can’t express C-header dependencies in Rubygems. You have to have a compiler on the target system. You have to wait something like 35 minutes for the JDK to compile during a Rubygems installation. In most cases, you actually need a JVM in order to bootstrap and compile a JVM. You have to write a mkmf file to instruct the machine how to do that. At the time, signing gems was basically unheard of. You probably don’t want the JVM in your Ruby load path, but maybe you could move the files in a gem postinstall with enough finagling.

This conversation ended shortly after it started, with the team providing these counterexamples, in addition to others not covered here. We knew it was doomed. It was fun though.

We ended up shipping a version of OpenJDK that we built and optimized for our workload using the native package manager for the platforms. However, when we were dealing with some pretty hairy Ruby problems in subsequent releases, we were able to build on our knowledge of the limitations (and advantages) of the more esoteric features of Rubygems — stuff we’d looked into while identifying why it was the worst way to deliver a Java solution. When we needed to bundle some Ruby content with our distribution, that earlier discussion was extremely useful.

What did we learn from the conversation?

  • Knowledge of some of the newer (and esoteric) features of Rubygems. By the end, we’d figured out answers to questions like: What does the postinstall situation really look like? What’s the state of signing a package? What type of compiler manipulation can reasonably be done and expected on an end-user’s system?
  • Why library managers are bad general purpose package managers.[2] This may seem obvious, but it’s a good discussion for those who haven’t really thought about it.
  • Bootstrapping a JVM is a hard problem.

We also had a great time thinking of ways to bend Rubygems to our will.

The rest of the week

The team liked Bad Idea Monday so much, they created theme days for the rest of the week. I’ll walk through them quickly:

Positive Tuesday. This is a day to be positive. The original intent was to offset the perceived negativity perpetuated with bad ideas that happened on Monday, but it’s really not needed for those reasons. The thing I like about it is the ‘find something you like about it’ attitude, which sometimes can help. Everything is not always wonderful. When it’s not, at least on a Tuesday, we can try to improve our outlook by identifying the good parts (or potentially decent outcomes) of an otherwise less-than-awesome situation. This assists in scenarios where you may have lost a debate, but need to move forward. It can bolster a “disagree and commit” interaction paradigm.

Noncommittal Wednesday. Why make a decision today when you could put it off until tomorrow? I think this started as the neutral leg to balance the bad (Monday) and good (Tuesday). Since then, this day hasn’t done much. I mean, I could tell you more about it, but I just can’t seem to commit to it.

Troll Thursday. Trolling your coworkers can be fun. We keep it pretty clean and innocent, but some days, you just have to see if you can engage the team on something ridiculous, believe some crazy story, or convince them that DECnet[3] really is the one true networking protocol. I enjoy Troll Thursday because it can be used for learning rather than simply for my own amusement. Also, I am not immune to being trolled. ABT.

FriDre. On Friday, two things happen. One, somebody will forget. Two, we will remind them. Heck, our chat bot will remind you. I’ll admit that Not Forgetting About Dre[4] is a little less fun now that he’s the first billionaire in hip hop. Nonetheless, remembering Dre is something that’s been a part of the culture at Puppet Labs for a long time — nearly as long as I’ve been on board. What purpose does it serve? Other than being fun, I have no idea. I’m even pretty sure I’m the one who decided we shouldn’t forget about Dre.

Conclusion

These theme days have made it easier for me to demonstrate three things: the team is creative, they have fun while they work, and they’re an awesome group. We have a wide variety of people, ranging from their mid-twenties to mid-forties. We have people who have worked in tech for years, and people in their first technical role. Some live in the US, and at least one doesn’t. We’re not all men. We’re not all packaging geeks. In short, it’s a good mix. A big part of building this team and culture has been finding ways to keep things fun and to keep driving learning, even as the organization grows and faces new sets of challenges. I encourage you to take an unorthodox look at encouraging learning, management styles, and the non-technical ideas your teammates are bringing to the table — maybe you’ll find something new to dive into.

Footnotes

[1] If you’re wondering, www.puppetlabs.com/beer is backed by a Google spreadsheet.

[2] An excellent talk by Ryan McKern called “Packaging is the Worst Way to Distribute Software, Except for Everything else.”

[3] http://en.wikipedia.org/wiki/DECnet

[4] This can help you remember.

Further Learning