The answer came: we should use both.
At Botify, the infrastructure relies on two core principles: immutability and blue / green deployment.
Immutability means that once you’ve built something, you never change it. For every deployment, we build images of our virtual machines from scratch, then deploy them. If something goes wrong on a machine, we trash it and replace it with a new one, launched from the same image.
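The "trash it and replace it" rule can be sketched as a minimal Python model. It assumes a toy setup where an image is just a name plus a tuple of baked-in packages. Image, Instance, build_image and replace_unhealthy are hypothetical names that do not come from any real tool. A frozen dataclass stands in for the fact that a built image is never modified.

```python
from dataclasses import dataclass, field, FrozenInstanceError
import itertools

_ids = itertools.count(1)  # hypothetical instance ID generator

@dataclass(frozen=True)
class Image:
    """Hypothetical machine image: frozen, so it cannot be changed after the build."""
    name: str
    packages: tuple  # everything is baked in at build time

@dataclass
class Instance:
    """Hypothetical virtual machine launched from an image."""
    image: Image
    instance_id: int = field(default_factory=lambda: next(_ids))
    healthy: bool = True

def build_image(name, packages):
    # Every deployment builds a fresh image from scratch.
    return Image(name=name, packages=tuple(sorted(packages)))

def replace_unhealthy(fleet):
    # Never repair a broken machine in place: launch a new one from the same image.
    return [inst if inst.healthy else Instance(image=inst.image) for inst in fleet]
```

Any attempt to "patch" the image (for example `image.name = "patched"`) raises `FrozenInstanceError`. The only way to change a machine is to build a new image.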
Immutability means longer builds, but it also means more consistency, no upgrade conflicts, and no forgotten virtual machines still running the old code. In short, immutability means no alarms and no surprises.
Blue / green deployment means that for every deployment, we build the new infrastructure (the green one) next to the current one (the blue one), then switch from blue to green. Blue / green deployment is very powerful because the blue infrastructure is still there, which allows you to roll back if something goes wrong.
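The switch and the rollback can be sketched in a few lines. This toy model assumes that "infrastructure" is just a version string and that switching means changing which side receives traffic. BlueGreenRouter and its methods are hypothetical and not part of any real load balancer API.

```python
class BlueGreenRouter:
    """Hypothetical router: traffic goes to exactly one of two environments."""

    def __init__(self, blue_version):
        self.envs = {"blue": blue_version, "green": None}
        self.live = "blue"

    @property
    def idle(self):
        # The side that is not currently receiving traffic.
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version):
        target = self.idle
        self.envs[target] = version  # build the new infrastructure next to the live one
        self.live = target           # then switch traffic to it

    def rollback(self):
        # The previous environment still exists, so rolling back is just a switch.
        if self.envs[self.idle] is None:
            raise RuntimeError("no previous environment to roll back to")
        self.live = self.idle

    def serving(self):
        return self.envs[self.live]
```

Deploying "v2" on a router that serves "v1" makes green live. Calling `rollback()` sends traffic back to blue, which still serves "v1". Nothing is rebuilt.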
To ensure a perfect deployment, we need to make sure that a build succeeds from start to end, and that the build virtual machines reach the desired state before we create the image.
That’s where you realize choosing between Puppet and Ansible gets tricky.
Puppet works in two modes. A daemon or a cron-launched script can query a server, the Puppet Master, for updates, or you can run Puppet standalone and apply the modules you need. Since we’re building immutable machines, using a Puppet Master is useless, as we never update a machine’s state.
Puppet is, in essence, a state machine. You describe a desired state, and it tries to reach as much of that state as possible, even if it fails here and there. To achieve this, it orders the tasks as best it can, although it may still reach an inconsistent state at some point. To avoid this, its DSL provides a dependency system (metaparameters such as require and before) that works quite well.
This is both a great feature and a real problem. Puppet won’t stop if something fails. It just skips everything that depends on what failed and moves on. Once you start automating your builds, this becomes critical: the build carries on regardless, there is no easy way to check what went wrong, and you can easily end up with inconsistent, buggy virtual machines.
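The following toy model illustrates the behaviour described above. It is not Puppet's actual engine. Resources are applied in dependency order, the dependents of a failed resource are skipped, and every other resource is applied anyway. apply_catalog and the resource names are hypothetical. Each resource is a callable that returns True on success. The sketch uses graphlib from the standard library, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

def apply_catalog(resources, deps):
    """
    Hypothetical model of a dependency-ordered run.
    resources: name -> callable returning True on success.
    deps: name -> set of resource names it requires.
    Returns name -> "ok" | "failed" | "skipped".
    """
    graph = {name: deps.get(name, set()) for name in resources}
    status = {}
    for name in TopologicalSorter(graph).static_order():
        if any(status.get(d) != "ok" for d in graph[name]):
            status[name] = "skipped"  # a dependency failed or was skipped
        elif resources[name]():
            status[name] = "ok"
        else:
            status[name] = "failed"   # record the failure and keep going
    return status
```

Suppose package:nodejs fails because the registry is down. Then file:app.conf is skipped, but package:nginx and file:nginx.conf still succeed. The run completes, and the resulting machine is half-configured.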
To avoid this, you either keep an eye on your build, hoping you won’t miss a single error, or you rely on tools like Serverspec. Serverspec looks like Ruby’s RSpec: it provides a human-readable language to test your server’s state.
Unfortunately, that sucks. Really. First, you write a complete description of the state your machine should reach with Puppet. Then you write another complete description of the state it should have reached thanks to Puppet. There’s something wrong here: you can’t rely on Puppet to reach the state you want, since it won’t stop when something goes wrong.
Then, you have Ansible.
Ansible is very different from Puppet. Ansible is not a state machine, as it has no notion of state. Ansible runs a sequential series of tasks and stops as soon as one fails. This is great in many ways, as you know for sure when something went wrong. There is no need for a dependency system, since Ansible just runs the tasks one after the other: install a package, push a file…
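Compare the dependency-ordered model with a fail-fast sequential runner. This sketch models the behaviour described above, not Ansible's code. run_tasks and the task names are hypothetical, and each task is a callable that returns True on success.

```python
def run_tasks(tasks):
    """
    Hypothetical sequential runner.
    tasks: list of (name, callable) pairs; each callable returns True on success.
    Returns (names of completed tasks, name of the failed task or None).
    """
    done = []
    for name, task in tasks:
        if not task():
            return done, name  # fail fast: nothing after this task runs
        done.append(name)
    return done, None
```

If "install nodejs" fails, run_tasks returns the tasks completed so far and the name of the failing task. "push app.conf" never runs, so the failure cannot go unnoticed.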
Instead of a master / slave architecture, Ansible works with the concept of an inventory: machines belong to groups, playbooks apply roles to those groups, and each role includes one or more tasks. Tasks run sequentially on every host of the inventory. That part is awesome when you maintain a bunch of machines, but it is totally useless when going immutable. If a task fails on a host, Ansible stops processing that host but keeps working on the others, which is also exactly what you’d expect when running immutable.
Edit: fixed the paragraph above. “Useless” applies to the inventory, not to the failure behavior when going immutable. Thank you @laserllama for noticing.
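Here is a simplified model of the inventory, group, role and task chain. It assumes a linear strategy, where each task runs on every host before the next task starts. A host that fails is dropped from the rest of the play, while the other hosts continue. run_play, the inventory and the role contents are hypothetical illustrations, not Ansible's API.

```python
def run_play(inventory, group, roles, play_roles):
    """
    Hypothetical model of one play.
    inventory: group name -> list of hosts.
    roles: role name -> list of (task name, callable(host) -> bool).
    play_roles: names of the roles applied to the group, in order.
    Returns host -> name of the task that failed on it, or None.
    """
    hosts = inventory[group]
    failed = {host: None for host in hosts}
    tasks = [task for role in play_roles for task in roles[role]]
    for task_name, task in tasks:    # one task at a time...
        for host in hosts:           # ...across every host of the group
            if failed[host] is not None:
                continue             # a failed host gets no further tasks
            if not task(host):
                failed[host] = task_name
    return failed
```

With an immutable build there is usually a single build host, so this per-host bookkeeping buys nothing. The only part that matters is that the failing host stops.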
Just like Puppet, Ansible has its good and bad sides. Since Ansible doesn’t describe a state, using Serverspec to check that the machine reached the desired state is almost mandatory. But at least you know when something went wrong.
So, why would you need both of them?
In my experience, 99.5% of Puppet errors come from package installation. Either a package no longer exists in the required version, or the Node.js package index is down once again, or PyPI times out…
Because of that, and because of Puppet’s main limitation, package installation should not be done by Puppet but by something else. Puppet is great at managing configuration when it has everything it needs. You can still make sure package XYZ is installed before running the configuration part, but you should not let Puppet install it.
Until now, I’ve mostly used Ansible for EC2 orchestration. Ansible has a bunch of nice AWS modules (I’ve contributed to some of them) that help build a new platform: start an instance, build an AMI, create a security group, a launch config or an autoscaling group…
I’m increasingly considering moving the whole install part, currently managed by Puppet, into Ansible to get that missing consistency. I’d then probably add a Docker layer somewhere to make new machine builds faster, as some parts don’t change that often. Booting a new machine would then download the Docker images it needs, further limiting the risk of errors by rebuilding only small parts of it.