Enjoying Deploying: Using Ansible & Capistrano for a Smooth Process

May 25, 2017
Development, NMC, Footer

Deployment is a messy subject. Between process managers, caching, proxies, databases, etc., trimming 'it works on my machine' down to simply 'it works' is no small task.

There are a ton of ways to go about deploying responsibly - I use git hooks for smaller personal projects, and store my database dumps in the repository itself. Since I'm the only maintainer for these sites, I make database, CMS, and code changes at the same time locally, and then can deploy with one push to get an exact live copy of what I see on my machine.

For NMC as a whole, though, this process isn't viable. When the roles of developer and content editor are decoupled, as they most often are, the two jobs are at odds. Content editors work directly against the staging or production database - being able to post content immediately is the whole point of using a content management system - therefore, production is king. There is some debate around whether developing against live data is good practice - in many cases, production data is sensitive and should be isolated from devs. Here at NMC, however, our policy is to develop as closely as possible to the live environment, and that means pulling down realistic data in cases where data sensitivity is not a concern.

So – we need to test code before we deploy to live, but we prefer testing with live data, and our testing should happen in an environment that is as close to production as possible. How do we do it?

Pairing Ansible & Capistrano in NMC's Deploy Process

Let's start with Ansible.

Ansible is all about automated environment duplication. The concept is simple but powerful: author a 'playbook' with a list of steps that you'd like automated, and let Ansible handle the boring stuff. In the context of deployment, we use Ansible to provision our local development environments and our production environments, ensuring that everything from nginx configurations to php settings to databases is created and configured repeatably and predictably. Each of our projects has a dedicated set of playbooks, which are applied whenever our VMs are provisioned or code is pushed to staging or production (it's important to note that these playbooks are idempotent by design - more on that later).
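For local development, that usually means running the playbook as part of VM provisioning. Here's a minimal sketch of what that hookup might look like with Vagrant's ansible_local provisioner - the box name and playbook path are illustrative, not our exact setup:

# Vagrantfile (excerpt)
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/xenial64"

  # Run the project's development playbook every time the VM is provisioned
  config.vm.provision "ansible_local" do |ansible|
    ansible.playbook = "playbooks/dev.yml"
  end
end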

Automating these steps gets us closer to one of our main goals - identical dev and production environments. We're trying to ensure that when we develop on a local virtual machine, we get results that are consistent across every dev's local virtual machine, as well as the staging and production servers.

Here's a development playbook at a glance:

- hosts: 127.0.0.1
  connection: local
  become: yes
  vars:
    app: "APP_NAME"
    domain: "APP_NAME.nmc"
    ssl: true
    devMode: true
    env:
      "MYSQL_HOST": 127.0.0.1
      "MYSQL_PORT": 3306
      "MYSQL_USER": "orbeck_of_vinheim"
      "MYSQL_NAME": "sorceries"
    php:
      "max_execution_time": 60
      "max_input_time": 60
      "memory_limit": 128M
      "post_max_size": 50M
      "upload_max_filesize": 50M
      "max_file_uploads": 2
      "timezone": "UTC"
      "opcache_memory": 64
      "opcache_max_files": 4000
    user: vagrant
    group: www-data
  roles:
    - mysql_db
    - website
    #- { role: wordpress }
    #- { role: craft }

Playbooks are written in YAML and are simple to follow - this one provisions our local VM.

First, we're setting variables that will be used later down the line in the various 'roles' that the playbook will run. The roles - mysql_db and website, located towards the bottom - are defined in their own YAML files. The mysql_db role does what it says on the tin: it handles the creation of both a database and a user for the project (if they don't already exist). The website role installs composer dependencies, puts the php and nginx configuration templates into place, and restarts/reloads those services as necessary. We even have dedicated roles for extra steps depending on the type of project we're creating (WordPress, Craft, etc.).
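As a rough illustration - not our exact role, and the paths and handler names below are placeholders - the website role's tasks look something like this:

---
# Install PHP dependencies for the app
- name: Install composer dependencies
  composer:
    command: install
    working_dir: "/var/apps/{{ app }}/current"

# Put the project's nginx config in place
- name: Install nginx site configuration
  template:
    src: nginx.conf.j2
    dest: "/etc/nginx/sites-available/{{ domain }}"
  notify:
    - reload nginx

# Put the project's php-fpm pool config in place
- name: Install php-fpm pool configuration
  template:
    src: php-fpm.conf.j2
    dest: "/etc/php/7.0/fpm/pool.d/{{ app }}.conf"
  notify:
    - restart php-fpm

# The 'reload nginx' and 'restart php-fpm' handlers are defined in the role's handlers file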

Ansible includes modules for common tasks, like file system manipulation and database management. Here's a snippet from our mysql_db role, for example:

---
- name: Create db
  mysql_db: name=sorceries state=present
  notify:
    - restart mysql

- name: Create db user and grant privs
  mysql_user:
    name: "{{ user }}"
    password: "{{ pw }}"
    priv: "{{ db }}.*:ALL,GRANT"
    host: "%"
    state: present
  notify:
    - restart mysql

Easy! We define a database name and its desired state, then a user along with its privileges and desired state.

Similarly, we also have playbooks for our staging and production servers. Just as the development playbooks run whenever our VM is provisioned, these run whenever code is deployed. Another benefit of this approach is that environment configuration becomes easily accessible and part of the repository. Changes to the environment become as simple, testable, and deployable as changes to the code base.
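A production playbook looks nearly identical to the development one - only a handful of variables change. A trimmed, illustrative sketch (the values here are placeholders, not a real project's config):

- hosts: 127.0.0.1
  connection: local
  become: yes
  vars:
    app: "APP_NAME"
    domain: "APP_NAME.com"
    ssl: true
    devMode: false
    user: deploy
    group: www-data
  roles:
    - mysql_db
    - website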

So now we've covered how Ansible enables us to automate our environments - but what about the actual act of deploying? And how do we tackle the challenge of developing locally, but against live data?

Enter the other half of our dynamic duo: Capistrano.

Capistrano is also useful for task automation, but we use it primarily to handle deployment and remote code execution on our staging and production servers, through simple Ruby tasks. Everything a Capistrano task does is handled over SSH. The idea is to write a set of steps to handle deployment, and then parameterize those steps for different environments (called 'stages' in Capistrano jargon).
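In practice, each stage is a small config file of its own. A minimal sketch of what a production stage file might contain (the server name, user, and roles below are placeholders):

# config/deploy/production.rb
# Which server to deploy to, as which user, and which Capistrano roles it plays
server "app.example.com", user: "deploy", roles: %w{app web db}
set :branch, "master"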

Capistrano's deploy task handles a lot under the hood - when given variables such as these:

set :application, 'application_name'
set :application_type, 'custom'
set :repo_url, "git@gitlab.com:user/application.git"
set :deploy_to, "/var/apps/#{fetch(:application)}"
set :scm, :git
set :keep_releases, 5

Capistrano will deploy to the remote server (defined elsewhere), pulling the specified repo into the :deploy_to location, under the assumption that the repo is a git repository. The last setting, :keep_releases, is part of Capistrano's secret sauce - on each deploy, Capistrano keeps track of several 'releases', i.e. previous deploys. The up-to-date, 'live' release is linked as a 'current' directory by default, and older ones are kept in adjacent directories labeled by date. This ties into another one of Capistrano's advantages - rollbacks.
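On the server, the resulting layout looks roughly like this (timestamps are illustrative):

/var/apps/application_name/
  current -> releases/20170525103000   # symlink to the live release
  releases/
    20170518091500/
    20170523164200/
    20170525103000/
  repo/        # Capistrano's copy of the git repository
  shared/      # files persisted across releases (uploads, logs, etc.)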

Deploys are done like so:

cap [stage] deploy

But what if a new release breaks in the live environment, for whatever reason? Simple -

cap [stage] deploy:rollback

This task takes care of re-linking the previous release, restoring your app to its last (presumably) working state, so downtime is minimized.

Another cool feature, and one that illustrates how Capistrano and Ansible work together, is hooks - hooks let you run additional tasks before or after a given task. For example:

after "deploy:finished", "Ansible:apply"

Straightforward, right? After a deploy happens, run the project's Ansible playbook.

Practically, what that means is that on deploy, a new release of the code is pulled from the repository, placed into a new release directory, and linked as 'current'. Then, Ansible takes care of installing composer dependencies, putting project-specific php and nginx configurations into place, creating any databases or users, and so on, as defined in the playbook.
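The Ansible:apply task itself isn't part of Capistrano - it's a few lines of our own Ruby. A minimal sketch, assuming the playbooks ship with the repository and Ansible is installed on the target server (paths, flags, and role names are illustrative):

namespace :Ansible do
  desc "Apply the project's playbook on the deploy target"
  task :apply do
    on roles(:app) do
      within release_path do
        # Run the playbook on the server we just deployed to, against localhost,
        # as in the playbook examples above
        execute :"ansible-playbook", "playbooks/production.yml", "--connection=local"
      end
    end
  end
end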

I'm merely scratching the surface here - Capistrano is extremely powerful, but it's a deep magic, especially for those unfamiliar with Ruby. Luckily, the documentation is quite good.

Beyond automating deploys, Capistrano tasks can also handle one-off tasks on remote servers. Remember that issue with needing to develop locally against live data? To tackle this, we wrote a Capistrano task that's called like so:

cap [stage] db:pull

This command does two things - first, it dumps, zips, and downloads the production database to the local project directory. Then, assuming the VM is running, it imports it into the VM's mysql server. Couldn't be simpler.
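The real task handles credentials and cleanup, but a stripped-down sketch of the approach might look like this (database names, file paths, and the vagrant-based import are placeholders for illustration):

namespace :db do
  desc "Dump the remote database and import it into the local VM"
  task :pull do
    on roles(:db) do
      # Dump and compress the database on the remote server
      execute "mysqldump sorceries | gzip > /tmp/sorceries.sql.gz"
      # Download the dump into the local project directory, then tidy up
      download! "/tmp/sorceries.sql.gz", "db/sorceries.sql.gz"
      execute :rm, "/tmp/sorceries.sql.gz"
    end

    run_locally do
      # Load the dump into the VM's mysql server
      execute "gunzip -c db/sorceries.sql.gz | vagrant ssh -c 'mysql sorceries'"
    end
  end
end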

Another example:

cap [stage] drupal:files:pull

This task pulls down Drupal uploads and puts them in place locally on our VM - we also have a similar task for pulling down WordPress uploads. Again, the idea is to simulate the production environment as closely as possible, so that our code works as expected no matter where it's running. The other bonus, of course, is that when a bug is reported on the live site, simulating the live environment this closely makes diagnosing issues much simpler and more efficient.
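Under the hood, a task like this can be little more than an rsync wrapped in Capistrano. A hedged sketch, assuming the uploads live in the shared directory and using placeholder local paths:

namespace :drupal do
  namespace :files do
    desc "Pull uploaded files from the remote server down to the local project"
    task :pull do
      roles(:app).each do |host|
        run_locally do
          # Mirror the remote uploads directory into the local project
          execute :rsync, "-avz",
            "#{host.user}@#{host.hostname}:#{shared_path}/sites/default/files/",
            "web/sites/default/files/"
        end
      end
    end
  end
end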

Closing Thoughts


Remember - the implementations above are just an illustration of how we choose to use these tools. We've got a great development pipeline here at NMC that works for our needs, but our needs aren't everyone's needs. Hopefully, these examples have whetted your appetite for some DevOps improvement or experimentation, and piqued your interest in one or both of the tools I've discussed. They're both really powerful, and can and should be applied, with or without other tools, in whatever way suits your needs. Whatever your situation, though, it'd be a mistake to overlook them.

Comments

Dee
Automation is the act of taking a set of manual tasks and writing a program to perform them in a fraction of the time. As a programmer, I believe that if you are performing the same tasks again and again then they should probably be automated by a script. At Enigma we are constantly needing to deploy new projects or updates to projects to our servers so the decision to automate these tasks seems like a no-brainer.
