sysadvent: Day 20 - Distributed configuration data with etcd

Written by: Kelsey Hightower (@kelseyhightower)
Edited by: Ben Cotton (@funnelfiasco)

Intro

I’ve been managing applications for a long time, but I’ve never stopped to ponder why most application configurations are managed via files. Just about every application deployed today requires a configuration file stored in the correct location, with the proper permissions, and valid content on every host that runs the application.

If not, things break.

Sure configuration management tools provide everything you need to automate the process of constructing and syncing these files, but the whole process is starting to feel a bit outdated. Why are we still writing applications from scratch that rely on external tools (and even worse, people) to manage configuration files?

Think about that for a moment.

The state of application configuration seems a bit stagnant, especially when compared to the innovation happening in the world of application deployment. Thanks to virtualization we have the ability to deploy applications in minutes, and with advances in containerization, we get the same results in seconds.

However, all those application instances need to be configured. Is there a better way of doing this, or are we stuck with configuration files as the primary solution?

Introducing etcd

What is etcd? Straight from the docs:
A highly-available key value store for shared configuration and service discovery. etcd is inspired by zookeeper and doozer, with a focus on:

Simple: curl'able user facing API (HTTP+JSON)
Secure: optional SSL client cert authentication
Fast: benchmarked 1000s of writes/s per instance
Reliable: Robustly distributed using Raft

On the surface it appears that etcd could be swapped out with any key/value store, but if you did that you would be missing out on some key features such as:

Notification on key changes
TTLs on keys

But why would anyone choose to move configuration data from files to something like etcd? Well for the same reasons DNS moved away from zone files to a distributed database: speed and portability.

Speed

When using etcd all consumers have immediate access to configuration data. etcd makes it easy for applications to watch for changes, which reduces the time between a configuration change and propagation of that change throughout the infrastructure. In contrast, syncing files around takes time and in many cases you need to know the location of the consumer before files can be pushed. This becomes a pain point when you bring autoscaling into the picture.

Portability

Using a remote database of any kind can make data more portable. This holds true for configuration data and etcd -- access to configuration data stored in etcd is the same regardless of OS, device, or application platform in use.

Hands on with etcd

Lets run through a few quick examples to get a feel for how etcd works, then we’ll move on to a real world use case.

Adding values

curl -X PUT -L http://127.0.0.1:4001/v2/keys/url -d value="db.example.com"

Retrieving values

curl -L http://127.0.0.1:4001/v2/keys/url

{
   "action":"get",
   "node":{
   "key":"/url",
   "value":"db.example.com",
   "modifiedIndex":1,
   "createdIndex":1
  }
}

Deleting values

curl -L -XDELETE http://127.0.0.1:4001/v2/keys/url

That’s all there is to it. No need for a database library or specialized client, we can utilize all of etcd’s features using curl.

A real world use case

To really appreciate the full power of etcd we need to look at a real world example. I’ve put together an example weather-app that caches weather data in a redis database, which just so happens to utilize etcd for configuration.

First we need to populate etcd with the configuration data required by the weather app:

/weather_app/city
/weather_app/interval
/weather_app/redis_url
/weather_app/weather_url

We can do this using curl:

curl -XPUT -L http://127.0.0.1:4001/v2/keys/weather_app/city -d value="Portland"
curl -XPUT -L http://127.0.0.1:4001/v2/keys/weather_app/interval -d value="5"
curl -XPUT -L http://127.0.0.1:4001/v2/keys/weather_app/redis_url \
  -d value="127.0.0.1:6379"
curl -XPUT -L http://127.0.0.1:4001/v2/keys/weather_app/weather_url \
  -d value="http://api.openweathermap.org/data/2.5/weather"

Next we need to set the etcd host used by the weather app:

export WEATHER_APP_ETCD_URL="http://127.0.0.1:4001"

For this example I’m using an environment variable to bootstrap things. The prefered method would be to use a DNS service record instead, so we can avoid relying on local settings.

Now with our configuration data in place, and the etcd host set, we are ready to start the weather-app:

 ./weather-app 
2013/12/18 21:17:36 weather app starting ...
2013/12/18 21:17:37 Setting current temp for Portland: 30.92
2013/12/18 21:17:42 Setting current temp for Portland: 30.92

Things seem to be working. From the output above I can tell I’m hitting the right URL to grab the current weather for the city of Portland every 5 seconds. If I check my the redis database, I see that the temperature is being cached:

redis-cli 
redis 127.0.0.1:6379> get Portland
"30.916418"

Nothing too exciting there. But watch what happens if I change the value of the /weather_app/city key in etcd:

curl -XPUT -L http://127.0.0.1:4001/v2/keys/weather_app/city -d value="Atlanta"

We end up with:

./weather-app 
2013/12/18 21:17:36 weather app starting ...
2013/12/18 21:17:37 Setting current temp for Portland: 30.92
2013/12/18 21:17:42 Setting current temp for Portland: 30.92
2013/12/18 21:17:48 Setting current temp for Portland: 30.92
2013/12/18 21:17:53 Setting current temp for Atlanta: 29.84
2013/12/18 21:17:59 Setting current temp for Atlanta: 29.84

Notice how we are now tracking the current temperature for Atlanta instead of Portland. The results are cached in Redis just as expected:

redis-cli 
redis 127.0.0.1:6379> get Atlanta
"29.836407"

etcd makes it really easy to update and watch for configuration changes; then apply the results at run-time. While this might seem a bit overkill for a single app instance, it’s incredibly useful when running large clusters or when autoscaling comes into the picture.

Everything we’ve done so far was pretty basic. We used curl to set some configuration, then had our application use those settings. But we can push this idea even further. There is no reason to limit our applications to read-only operations. We can also write configuration data to etcd directly. This unlocks a whole new world of possibilities. Web applications can expose their IP addresses and ports for use by load-balancers. Databases could expose connection details to entire clusters. This would mean making changes to existing applications, and perhaps more importantly would mean changing how we design new applications. But maybe it’s time for AppOps -- lets get out of the way and let the applications configure themselves.

Conclusion

Hopefully this post has highlighted how etcd can go beyond traditional configuration methods by exposing configuration data directly to applications. Does this mean we can get rid of configuration files? Nope. Today the file system provides a standard interface that works just about anywhere. However, it should be clear that files are not the only option for modern application configuration, and viable alternatives do exist.

December 20, 2013

Day 20 - Distributed configuration data with etcd