December 11, 2010

Day 11 - A Journey to NoSQL

Written by Michael Stahnke (@stahnma)

The N00b

When I was first learning about being a Unix Admin, I just wanted to know what systems my team supported, so that when I got called at 2 AM, I could either make some weak attempt at getting online and fixing a problem (I was new...very new), or promptly help that application analyst find the correct support team pager number. It was the week before I first went into our pager rotation that I realized something was very wrong. I had no idea what systems we actually supported. I wasn't the only one.

There had been some form of reorganization right before I was hired at this company. What was once four teams (IBM AIX, HP-UX, Sun Solaris and Red Hat Linux) was becoming three teams (Capacity Planning, Systems Implementation and Systems Operations). However, there were still other server teams at other sites, plus Unix workstation support, and some IRIX somewhere out there. The fundamental problem, though, was, "Do I have the ability to help the person who has paged my team?"

A solution...sort of

I found this state extremely undesirable, so I started writing a Unix server tracking system. It started out as a basic web application utilizing a MySQL back-end. It worked great. The teams loved it. They knew what we supported and what we didn't. Then, the requests for enhancement came in. I needed to add MAC addresses, world wide port names, cluster licensing terms, customer information, out-of-band management URLs, etc. This quickly grew, but I was still happy with it. We designed several workflow automations through the tool as well. However, as the tool grew larger and less maintainable, I was starting to get extremely frustrated with it.

While problems for this application were abundant, there were two issues that made it less of an operational platform than I desired. The first problem was that in order to do any type of CRUD action, you had to have database drivers on the client. This was a big challenge. We had an extremely heterogeneous environment, multiple firewalls, and some ancient operating systems that probably couldn't have had a MySQL driver loaded on them without sacrificing some type of domesticated animal and praying to a deity that was anything but righteous.

The other problem was flexibility of schema. Each time we added a new piece of data to track, it had to be analyzed, and then added into the schema. Normalization was great for one:many and many:many relationships, but it made the SQL queries much more complex with joins or sub-queries, especially for Unix admins with little or no SQL background. In short, the relational portion of the RDBMS was in the way.

Another solution...getting warmer

I left that shop before that problem was really solved, but since I had an opportunity at my next assignment to solve a similar problem, I decided I would try some things in a different way. My first thoughts were around putting some form of web-services infrastructure in front of a basic RDBMS-backed web application. I thought that speaking HTTP would be easier than MySQL, Oracle or even DBI for most clients. I toyed with it and did some mock-ups, but I still felt like the data model was complicated and required many calls and client-side parsing to really get the data into usable formats for automation, updates, or to generate Nagios configuration, etc. It was time for something completely different.

NoSQL. It was obvious. Of course, at this time (2006) I had never heard of the term NoSQL, but looking back on it, that was the epiphany I had. If relationships are difficult to model and manage, maybe some other model would work. Then it hit me: LDAP. The LDAP container is designed for easy replication, extremely granular security controls, and availability. On top of that, those features were all there out of the box. Schemas could be programmatically deployed, and many of the data model questions were things like 'should this be single-valued or multi-valued'. Those questions were quite simple when compared to joining 17 tables to see a complete system configuration in the old RDBMS I had authored. As an added bonus, using LDAP didn't introduce a new source of truth for the environment since it was in use for account management.

LDAP also had a good solution for the driver problem. We were using LDAP for user authentication, so our systems already had LDAP client libraries loaded. Even for the few that didn't, client-side libraries were readily available, even on my less-than-favorite flavors of Unix.

We modified schema and populated data, first by hand and then with some simple scripts. Life was good, at least for a while. After a couple of years operating in this mode, the schema became a bit more problematic. Extending schema at will was not the greatest idea I've ever had. We also had a problem where some admins would make new objectClasses rather than extend one, or inherit from one. This led to conflicts in schema and some data integrity issues. None of it was absolutely horrible, but in the end it smelled like a chilli dog left in a desk drawer overnight.

The search continues

I had a lot of discussions about this problem with a group of my friends (and eventual business partners). We spent hours going back and forth on how to model host information and metadata and expose that information to our configuration management, monitoring, accounting, chargeback, and provisioning systems. It always came back to a discussion on discrete math: use set theory. The best, and possibly only sane, way to keep this data organized was to use set theory.

Luckily, we had a greenfield to play with as we were forming a new company. We tried it out. We tried not to extend or customize schema for host information beyond loading in well-known IANA-referenced schemas. The basic premise, obviously, is that everything can be grouped into sets. We created an OU=Sets at the top level of our LDAP directory. Under OU=Sets, we created an entry named after each set; for example, dn: cn=centos5,ou=Sets,dc=websages,dc=com is an entry in our directory. It is set up as a groupOfUniqueNames and contains the DN of each host that is in fact a CentOS 5 host. The nice thing about OU=Sets is you can just keep adding things into it, without extending schema.
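
To make the set idea concrete, here is a minimal sketch in plain Ruby that needs no directory server. It assumes the members of each set have already been read into memory. The SETS constant, the helper names, and the host DNs under dc=example,dc=com are all invented for illustration and are not part of the Websages directory.

```ruby
require 'set'

# Hypothetical data: each key mirrors a cn under ou=Sets, and each value
# holds the DNs of the member hosts (host names invented for illustration).
SETS = {
  'centos5'      => Set['cn=web01,ou=Hosts,dc=example,dc=com',
                        'cn=ldap01,ou=Hosts,dc=example,dc=com'],
  'ldap_servers' => Set['cn=ldap01,ou=Hosts,dc=example,dc=com',
                        'cn=ldap02,ou=Hosts,dc=example,dc=com']
}.freeze

# Hosts present in every named set (intersection)
def hosts_in_all(*names)
  names.map { |name| SETS.fetch(name) }.reduce(:&)
end

# Hosts present in at least one named set (union)
def hosts_in_any(*names)
  names.map { |name| SETS.fetch(name) }.reduce(:|)
end

# Hosts in the first set that are not in the second (difference)
def hosts_in_only(first, second)
  SETS.fetch(first) - SETS.fetch(second)
end

# CentOS 5 hosts that are also LDAP servers
puts hosts_in_all('centos5', 'ldap_servers').to_a
```

Questions such as "which LDAP servers are not CentOS 5?" then become a one-line difference rather than a new attribute in the schema.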

It may seem a bit backward at first to have the attribute as the set name and then the host dn as the entry, but it seems to work. LDAP also allows groups within groups, so nesting works perfectly. As an example, if cn=ldap_servers,ou=sets exists, it may contain cn=ldap_write_servers,ou=sets and cn=ldap_replicas,ou=sets. Grouping in this manner allows one change to cascade through the directory.
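
Because membership lives on the set rather than on the host, asking "which sets is this host in?" means inverting the mapping. The following is a small sketch of that inversion in plain Ruby, assuming the set contents are already in memory. SET_MEMBERS, sets_by_host, and the host DNs are hypothetical names used only for illustration.

```ruby
# Hypothetical data: set name => DNs of member hosts (invented for illustration)
SET_MEMBERS = {
  'centos5'       => ['cn=web01,dc=example,dc=com', 'cn=ldap01,dc=example,dc=com'],
  'ldap_replicas' => ['cn=ldap01,dc=example,dc=com']
}.freeze

# Invert the set => hosts mapping into host => sorted list of set names
def sets_by_host(set_members)
  index = {}
  set_members.each do |set_name, hosts|
    hosts.each { |host| (index[host] ||= []) << set_name }
  end
  index.each_value(&:sort!)
  index
end

puts sets_by_host(SET_MEMBERS)['cn=ldap01,dc=example,dc=com']
```

Against a live directory the same question is usually a search on the uniqueMember attribute with the host DN as the value, so the inversion above is mostly useful for cached or exported data.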

Of course, with every good solution, there are more problems to be solved. In this case it's recursion. OpenLDAP and 389/RHDS/Fedora-DS/SunOne/iPlanet/et al. don't seem to automatically recurse nested groups, though I have heard that some LDAP implementations do. Luckily, it's not that big of a problem.


In this example, I'll be looking for all LDAP servers. Our directory information tree is set up such that we have three groups:

  • ldap_write_servers
  • ldap_replicas
  • ldap_servers

The ldap_servers entry is a groupOfUniqueNames whose uniqueMembers are the other two groups. To traverse this, we'll need some recursion.

Sample Code

In my code, I most often use Ruby. When working with LDAP, I've used the classic ldap bindings heavily, but recently I've really taken a liking to activeldap. Activeldap borrows heavily from the Active Record design pattern and applies it to LDAP. It is not a perfect translation of Active Record, but it is quite nice for most operations on a directory server.

Activeldap requires some minimal setup to be useful. You can install it with RubyGems or your favorite package manager.

require 'rubygems'
require 'active_ldap'

class Entry < ActiveLdap::Base
end

# Connect over SSL with an authenticated (non-anonymous) bind
ActiveLdap::Base.setup_connection(
  :host => '', :port => 636, :method => :ssl,
  :base => 'dc=websages,dc=com',
  :bind_dn => "uid=stahnma,ou=people,dc=websages,dc=com",
  :password => ENV['LDAP_PASSWORD'], :allow_anonymous => false)

This is a simple setup section for some code using activeldap. Require the library (and rubygems, unless your environment loads it for you or you installed activeldap by some other method). Then you run setup_connection. The Websages directory server requires SSL and does not allow anonymous bind, so a few more parameters are used than you might see on a clear-text, anonymous setup.

From there, it's really not very difficult to recurse through groups and find the entries.

# Returns the members of an LDAP groupOfUniqueNames
def find_members(search, members = [])
  Entry.find(:all, search).each do |ent|
    # Ensure the search result is a group
    if ent.classes.include?('groupOfUniqueNames')
      # Check to see if each member is a group
      ent.uniqueMember.each do |dn|
        members << find_members(dn, members)
      end
    else
      # Add the results to the members array
      members << search
    end
  end
  # Clean up the array before returning
  members.flatten.uniq
end

The above code will find all members of a groupOfUniqueNames including entries of groups within groups.

My calling function is just:

puts find_members('cn=ldap_servers')
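
The same recursion can be tried without a directory server. This is only a sketch: it assumes the group memberships have been loaded into a hash. The GROUPS constant, the expand_group method, and the host DNs are hypothetical and invented for illustration. It also adds a guard the LDAP version above does not have, namely a record of groups already visited, so that a group that (directly or indirectly) contains itself cannot recurse forever.

```ruby
require 'set'

# Hypothetical directory contents: group DN => uniqueMember DNs.
# Any DN that is not a key here is treated as a host (a leaf entry).
GROUPS = {
  'cn=ldap_servers,ou=sets'       => ['cn=ldap_write_servers,ou=sets',
                                      'cn=ldap_replicas,ou=sets'],
  'cn=ldap_write_servers,ou=sets' => ['cn=ldap01,ou=hosts'],
  'cn=ldap_replicas,ou=sets'      => ['cn=ldap02,ou=hosts',
                                      'cn=ldap03,ou=hosts',
                                      'cn=ldap_servers,ou=sets'] # deliberate cycle
}.freeze

# Returns the leaf members of a group, expanding nested groups depth-first.
# `seen` records groups already expanded so a cycle cannot recurse forever.
def expand_group(dn, seen = Set.new)
  return [dn] unless GROUPS.key?(dn) # not a group, so it is a member itself
  return [] unless seen.add?(dn)     # group already expanded, stop here

  GROUPS[dn].flat_map { |member| expand_group(member, seen) }.uniq
end

puts expand_group('cn=ldap_servers,ou=sets')
```

Passing the visited set down through the recursion keeps the check cheap and means each group is expanded at most once per lookup.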

Another excellent feature of activeldap is that if you simply puts an activeldap object, the LDIF text for the object is displayed on standard out.

Entry.find(:all, "cn=ldap_servers").each do |h|
  puts h
end

Produces a simple LDIF output:

version: 1
dn: cn=ldap_servers,ou=Sets,dc=websages,dc=com
cn: ldap_servers
description: Hosts acting as LDAP Servers
objectClass: groupOfUniqueNames
objectClass: top
uniqueMember: cn=ldap_replicas,ou=sets,dc=websages,dc=com
uniqueMember: cn=ldap_write_servers,ou=sets,dc=websages,dc=com
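
If that LDIF text is all a script has to work with, the member DNs can be pulled back out with a few lines of Ruby. This is a rough sketch under simplifying assumptions: a single entry, no folded (continuation) lines, and no base64-encoded values, all of which real LDIF can contain. The parse_ldif_entry method and the LDIF_TEXT constant are hypothetical names, not part of activeldap.

```ruby
# The LDIF output shown above, embedded so the sketch is self-contained
LDIF_TEXT = <<~TEXT
  version: 1
  dn: cn=ldap_servers,ou=Sets,dc=websages,dc=com
  cn: ldap_servers
  description: Hosts acting as LDAP Servers
  objectClass: groupOfUniqueNames
  objectClass: top
  uniqueMember: cn=ldap_replicas,ou=sets,dc=websages,dc=com
  uniqueMember: cn=ldap_write_servers,ou=sets,dc=websages,dc=com
TEXT

# Collect the attributes of one LDIF entry into a hash of arrays.
# Assumes no folded lines and no base64 ("attr:: value") encoding.
def parse_ldif_entry(text)
  attrs = {}
  text.each_line do |line|
    name, value = line.chomp.split(/:\s*/, 2)
    next if name.nil? || name.empty? || value.nil? # skip blank or malformed lines
    (attrs[name] ||= []) << value
  end
  attrs
end

puts parse_ldif_entry(LDIF_TEXT)['uniqueMember']
```

Every attribute is stored as an array, even single-valued ones, so callers do not need to special-case multi-valued attributes such as uniqueMember or objectClass.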

LDAP is a good answer

Now I can basically apply set theory to the management of system metadata and configuration information. At Websages, we use our LDAP directory for nearly everything and integrate it into our fact generation for Puppet, our backup schedules, our controlling IRC bots, and our broadcast SMSing while acting like idiots at the bar.

So next time you're faced with storing a bunch of host information or metadata, you might turn back to a technology that is non-relational, scales horizontally, offers extensive ACL options, and is lightweight and fast. LDAP was NoSQL before the term was coined and often loses out in today's NoSQL discussions, but its track record is proven.

When I see the term NoSQL, I am reminded of a classic Dilbert, "I assure you, it has a totally different name."

