Tuesday, 29 July 2014

Why We Started With Chef

I like to think that we are reasonably mature with our release processes at TAL. We release to production regularly (every 2 to 4 weeks, depending on the application), our code is built once and deployed many times, and it has been a while since we have had to roll back a production release.

All that said, it is still more painful than it should be. More could be automated, our outage windows could be reduced further, and our management of configuration items (web.config etc.) isn't great. We are continually improving most of these, and I have no doubt that with the introduction of Octopus Deploy (coming soon) we'll get a nice jump ahead on these issues.

One thing that isn't done well though, and that we haven't had an answer to for quite a while, is server configuration management. For base server config, patches, service packs etc., our services team do an excellent job of keeping things up to date and consistent across the fleet. But things like enabling MSMQ, adding web roles, applying permissions, and changing system default configurations are applied manually to each VM we spin up and use.
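To make that concrete, here is a rough sketch of how those manual steps might be written down as a Chef recipe. It is illustrative only and not our actual cookbook: the recipe path, the directory name, and the exact Windows feature names are assumptions (feature names differ between Windows versions and between DISM and PowerShell), and it assumes the `windows_feature` resource is available, either from the community `windows` cookbook or from a Chef version that ships it.

```ruby
# Hypothetical recipe file, e.g. cookbooks/app_server/recipes/default.rb
# All names below are assumptions for illustration.

# Enable MSMQ (DISM-style feature name; check the name on your Windows version)
windows_feature 'MSMQ-Server' do
  action :install
end

# Enable the IIS web server role (again, the feature name is an assumption)
windows_feature 'IIS-WebServerRole' do
  action :install
end

# Create a hypothetical application directory and grant the IIS worker
# group modify rights, instead of setting permissions by hand on each VM
directory 'C:/inetpub/example_app' do
  rights :modify, 'IIS_IUSRS'
  action :create
end
```

The point of a recipe like this is that it doubles as the list of customisations a server needs, and Chef only changes what is not already in the declared state, so it can be run again on an existing VM.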

This means that when we need a QA, Dev, or other environment, we spend at least a day setting up the VM: installing and configuring, attempting to run our app, realising something was missed, installing and configuring again, and repeating.

It also means that we don't have an inventory of the customisations our servers need before our apps can run successfully on them. We have a wiki page which is manually updated (when people find issues AND remember to update it), but anything manual like that inevitably ends up out of sync.

We did some proof of concept work a little while ago with both Chef and Puppet, and for us Chef came out in front. Our latest work has got us to the stage where we can get a workable VM set for one of our apps up and running in well under an hour. That is an awesome development, and we are super excited.

We do still have a few things to sort out. For starters, we are guilty of most of these anti-patterns. We are trying to move away from these and follow The Berkshelf Way, utilising Test Kitchen, ChefSpec, etc. This has led to a long list of things to learn and try out, including Vagrant and Packer. There is also a whole lot of frustration around getting all of this working with Windows, as most of the doco is very Linux focused.

We are confident that we'll find the process that suits our team soon. Even if it isn't perfect, we are still way ahead of where we were, and incremental improvement is much better than no improvement. Some people may even argue that incremental improvement is the best kind.

