Using Vagrant in Production

After presenting on 'Fearless Development with Drush, Vagrant and Aegir' at DrupalCamp Montreal, I was approached about a rather interesting project. It involved synchronizing data between a couple dozen Drupal sites running on Windows systems. To be able to better manage the Drupal components on these systems, we decided to deploy Aegir on Vagrant-based virtual machines (VMs). While Vagrant is generally considered a tool for development and testing, in this case, it made sense to use it in a limited production setting.

We're only part way into this project, so there'll likely be a follow-up post describing the details of the data synchronization component. But so far, we've implemented a fairly robust technique for managing lots of remote systems, automatically deploying new Drupal code-bases, and updating applications (sites) accordingly.

Background

The company in question maintains a fleet of a couple dozen mobile laboratories mounted in trailers. These are dispatched to client sites, which are often very remote. This can result in partial or complete interruption of Internet connectivity for days or weeks at a time. However, these satellite labs need to be able to synchronize data regularly with a central hub.

The data in question is not that produced by the lab equipment itself, which can grow well into the gigabytes, and gets transported via large-capacity removable media. Rather it is information about this data, which tends to be relatively small, on the order of a few MBs. Due to their frequent lack of connectivity, this meta-data will need to be transported via USB key, and sync'ed with the hub manually.

Both the hub and the application collecting this meta-data are built on Drupal. The hub functions essentially as an ERP for them, collecting data on the work they're doing with their various clients. One challenge we faced, however, was that the mobile labs are equipped with Windows PCs, making for a less-than-ideal hosting environment. Another challenge was that the operators of these systems, while experts in their field, could not be relied on to install or debug these applications, and so just about everything had to be fully automated.

Enter Vagrant, YAML and Puppet

While not quite as full-featured on Windows (e.g., limited availability of ssh), Vagrant provides a relatively simple command-line interface, and native support of provisioning tools such as Puppet. For those familiar with Ruby, Vagrant's configuration file, a Vagrantfile, is a great way to centralize project-specific setup. On the other hand, Ruby is fairly foreign to most Drupal developers, who are generally more comfortable with PHP.

A Detour into YAML

Thankfully, I already had quite a bit of experience wrapping Vagrantfiles in simpler configuration formats. With Drush Vagrant Integration, I had maintained a Ruby-based config file, but tried to simplify it to look more like the PHP .ini style. Even for drush-vagrant, though, I had been thinking of moving towards YAML-based config files, largely inspired by its use in Ariadne and by the Configuration Management Initiative's adoption of YAML in Drupal 8.

hostname: satellite.local
ip_address: '192.168.33.34'
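
Just for illustration, a slightly richer config for the same sort of satellite might eventually look like this (the extra keys and values here are hypothetical):

# satellite-07.yml: example values only
hostname: satellite07.local
ip_address: '192.168.33.41'
# A nested section like this could later feed arbitrary Facter facts
# into the provisioner (more on custom facts below):
facts:
  client_role: 'satellite'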

YAML is great, since it's about as straight-forward as you can get. In addition, it can be annotated or documented with inline comments, and easily extended to support more complex data structures. Config files such as these are compiled at the beginning of the Vagrantfile for this project:

require "yaml"

# Keep paths relative to the project root (the Vagrantfile's directory).
current_dir = File.dirname(__FILE__)

# Build a hash of VM names => YAML config file paths from whatever is in active/.
vms = {}
config_files = []
if File.directory?("#{current_dir}/active")
  Dir.chdir("#{current_dir}/active")
  config_files = Dir.glob("*.yml")
  config_files.each do |config_file|
    vms.update({ config_file.sub('.yml', '') => "#{current_dir}/active/#{config_file}" })
  end
end

# Fall back to sane defaults for development environments.
if config_files.empty?
  vms.update({ 'hub'       => "#{current_dir}/configs/hub.yml" })
  vms.update({ 'satellite' => "#{current_dir}/configs/satellite.yml" })
end

First, we include the YAML library, and ensure that paths are relative to the project root. Then, we scan our active/ directory for YAML config files. We initialize an empty hash onto which we push each config file we find, building up a list of the VMs we'll provision, along with the paths to their config files. Since this is intended to function both in production and in development, we provide defaults for development environments.

The entire Vagrant project is under Git version control, but the contents of the active/ directory are ignored (via a .gitignore). The idea here is that a config file for each of the satellites can be maintained in the configs/ directory. Then on each PC, a symlink (or 'shortcut', I suppose, on Windows) can be created in the active/ directory pointing to the relevant file in the configs/ directory. This lets us maintain a single repository containing the configuration for all the satellites, making it easy to build dev and test environments with any combination we like.
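
As a rough sketch of that setup (the 'satellite-07' name and paths are hypothetical, and on Windows you'd create the link differently):

# The project's .gitignore excludes the per-machine symlinks:
#   active/*

# On a given satellite, activate its config by linking it into active/:
cd ~/vagrant-project
ln -s ../configs/satellite-07.yml active/satellite-07.yml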

Back to Vagrant

The Vagrant configuration itself becomes pretty straight-forward:

Vagrant::Config.run do |config|
  # Settings common to all VMs: the base box, and where to fetch it.
  config.vm.box = "debian-LAMP-2012-03-29"
  config.vm.box_url = "http://ergonlogic.com/files/boxes/debian-LAMP-current.box"

  # Define each VM we found above, configured from its YAML file.
  vms.each do |vm, config_file|
    yml = YAML.load_file config_file
    hostname   = yml['hostname']
    ip_address = yml['ip_address']

    config.vm.define vm do |vm_config|
      vm_config.vm.host_name = hostname
      vm_config.vm.network :hostonly, ip_address

      # Provision with Puppet, using the project's manifests/ and modules/.
      vm_config.vm.provision :puppet do |puppet|
        puppet.manifests_path = "manifests"
        puppet.manifest_file  = "site.pp"
        puppet.module_path    = "modules"
        puppet.facter         = { "fqdn" => hostname }
      end
    end
  end
end

First, we add settings common to all VMs, in this case just the base box (borrowed from drush-vagrant/aegir-up). Then we iterate over the list of VMs we built above, applying VM-specific configuration loaded from our YAML config files. So far, these are only used to set the IP address and hostname, but they could be extended to support arbitrary settings.

On to Puppet

At this point, we get to the Puppet provisioning. I've used a pretty standard layout for Puppet manifests, and any Puppet modules dropped into the project's modules/ directory automatically become available within those manifests. Since we're running a Debian VM, we've dropped the 'apt' and 'common' modules into modules/, along with the 'drush' and 'aegir' modules to manage the Drupal-based application.

Note the Facter fact that sets the FQDN. This is due to an oddity in Facter's method of determining a system's fully qualified domain name. Rather than run 'hostname -f', it queries resolv.conf, and so can end up with a value other than the one set with Vagrant's vm.host_name = hostname. Facter is an incredibly powerful way of injecting what are essentially global variables into Puppet, and so it'll probably be worthwhile to add support for arbitrary custom 'Facts', as we'd done in Drush Vagrant.
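
For example, a small (hypothetical) tweak to the provisioning block in the Vagrantfile could merge a 'facts' hash from each YAML config into what we already pass to Puppet:

# Replace the 'puppet.facter' line above with something like this, assuming
# the YAML config may contain an optional 'facts:' hash (e.g. client_role).
facts = { "fqdn" => hostname }
facts.merge!(yml['facts']) if yml['facts']
puppet.facter = facts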

The base box already has a LAMP stack installed, but this could still be managed via Puppet, or a custom box could be built using Veewee. So far, though, I've been focusing on the Aegir-based Drupal site and platform management. So, the first thing to do in our Puppet manifests is to define our 'nodes'.

node "hub" {
  include client::hub
}
node default {
  include client::satellite
  client::platform {'Drupal7':
    makefile => '/vagrant/makefiles/d7.make',
  }
  client::site {'www.example.com':
    platform => 'Drupal7',
  }
}

So here we see two node definitions: one for the 'hub', and the catchall 'default'. This latter one will match any node without a specific definition, including all our satellites. We can then add client-specific settings in their YAML config files.

The hub's config should be shared by the actual production system (probably an AegirVPS git-cron deployment, or similar setup) and our dev environments. While our satellites will be running within Vagrant in both development and production, our hub will not. So, the hub's YAML config file will never be used in production, and should thus be absolutely minimal, existing only for dev environments.

We'll take a more detailed look at the custom Puppet classes, resource definitions and facts shortly. From a high level, though, we've defined a Drupal platform based on a custom makefile included in the project itself. As we'll see in the definition of client::platform, adding a Git tag of the pattern 'platform_<release>' is all that's required for Puppet to build an updated platform on its next run.
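
Cutting a new release is then as simple as (the version number here is just an example):

git tag platform_1.1
git push origin platform_1.1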

Similarly, client::site specifies a site to be built, on the platform we just added. Whenever the platform gets updated, the site itself is migrated to the new platform. This essentially allows us to tag the commit of an updated makefile, and have the applications running on all the remote systems automatically updated.

Note that we're providing a path to the makefile under the /vagrant/ directory, which is where our project directory gets shared into the VM automatically by Vagrant. A default for this is provided in client::platform, but I wanted to be explicit about it here. This wouldn't work in production for a hub, though, as it won't be running inside a Vagrant VM. However, in our Vagrantfile, we can specify other shared directories, or even different mount points for the same directory. So, instead of pointing to a makefile in '/vagrant/makefiles/', we could point it to '/etc/aegirvps/makefiles', or wherever else we like; thus allowing the same Puppet config to work in both production and dev/testing.
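
In a Vagrant context, that might be a one-liner added inside the per-VM block of the Vagrantfile (the mount point here is hypothetical):

# Share the project's makefiles/ directory at an alternate mount point,
# so the Puppet config can reference '/etc/aegirvps/makefiles' everywhere.
vm_config.vm.share_folder "makefiles", "/etc/aegirvps/makefiles", "makefiles"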

A bit of Facter & Git magic

For a closer look at how we accomplish this, we'll need to dive into the custom 'client' Puppet module. It all starts with a custom fact:

# Derive the current platform release from the latest 'platform_*' Git tag.
Facter.add("platform_tag") do
  setcode do
    Dir.chdir("/vagrant")
    all_tags = Facter::Util::Resolution.exec("git tag -l 'platform_*'")
    all_tags.split("\n").last.sub('platform_', '')
  end
end

Now, I started programming in Ruby relatively recently, but I have to say that, after the verbosity of PHP, Ruby's compactness is refreshing. Anyway, first we go to the project root, and get all the tags from the git repo. Again, for this to work on the hub in production, we'll need to change this path. Next, we split the text, which is just one big chunk, into an array, at the ends of lines. We then return the last (i.e. most recent) one, and remove the prefix. This gets us the release tag we'll then use to build our platform, and trigger site updates.

Then comes Aegir

class client {
  $aegir_version = "6.x-1.9"
  class {'aegir': }
  class {'aegir::queue_runner': }
}

class client::hub {
  include client
}

class client::satellite {
  include client
}

So far we aren't doing much here. We're installing Aegir, and the queue-runner, via the common 'client' class, and that's it. As this system matures, though, we'll probably want to move the configs from the 'default' node in manifests/nodes.pp into the client::satellite class, and possibly likewise for the hub. If we go with a common platform for satellites and the hub, then we can move client::platform into the 'client' class. Site names for satellites will likely end up being set in their YAML config file, whereas we can check against a fact (in site.pp) to determine whether we're in a production or development environment for the hub:

$hub_dev = $virtual ? { 'virtualbox' => true, default => false }
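
That flag could then gate the hub's Vagrant-only configuration, along these lines (the site name is hypothetical):

# Only manage the hub's site from these manifests when running under VirtualBox.
if $hub_dev {
  client::site {'hub.example.com':
    platform => 'Drupal7',
  }
}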

In client::platform we provide a thin wrapper around aegir::platform.

define client::platform (
  $makefile = '/vagrant/makefiles/d7.make'
) {
  aegir::platform {"${name}-${platform_tag}":
    makefile => $makefile,
    require  => Class['aegir::queue_runner'],
  }
}

We take a makefile as an argument, which we, in turn, pass on to aegir::platform. We also provide a default makefile, to simplify usage. Then we reference our platform_tag fact to construct a unique platform name. When the fact changes due to a new 'platform_XXX' tag being set, we automatically build a new platform. This is due to how aegir::platform uses a 'creates' parameter to avoid trying to rebuild the same platform. We'll see this again, directly, in client::site.

Makefiles can get very large, and by default Puppet will time out after 5 minutes without feedback from a resource. If the build times out, drush make continues to build the platform, which will then block further attempts, since the target directory will already exist (again, checked via that 'creates' parameter). And since the import of the platform from Aegir's backend into the web front-end subscribes to that 'drush make' exec resource, it won't run either. To overcome this, should it ever become a problem, we can extend the time-out, or disable it entirely, by providing a 'build_timeout' parameter to aegir::platform.
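
For instance, client::platform could pass it through (assuming, as with Puppet's own exec resources, that a value of 0 disables the time-out entirely):

aegir::platform {"${name}-${platform_tag}":
  makefile      => $makefile,
  build_timeout => 0,   # assumed to mean 'no time-out'
  require       => Class['aegir::queue_runner'],
}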

Finally, we come to the site installation and updates themselves:

define client::site ( $platform ) {
  $platform_name  = regsubst("${platform}-${platform_tag}", '[!\W]', '', 'G')
  $platform_alias = "platform_${platform_name}"

  # Defaults shared by all the exec resources in this define.
  Exec {
    path        => [ "/bin/", "/sbin/", "/usr/bin/", "/usr/sbin/" ],
    cwd         => '/var/aegir',
    user        => 'aegir',
    group       => 'aegir',
    environment => "HOME=/var/aegir",
    logoutput   => true,
  }

First off, we pass in a platform name, so we know where to install the site, and when and where to migrate it during updates. I think it'd be worthwhile to add a 'profile' parameter, to control which profile a site gets installed with, but we're hard-coding this for now. Adding an 'update' parameter would also allow us to flag whether a site should be updated automatically or not.
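
A hypothetical extension of the define's signature might look like this (not implemented yet):

define client::site (
  $platform,
  $profile = 'standard',   # install profile to pass to provision-save
  $update  = true          # whether to migrate the site automatically
) {
  # ... the existing body (shown above and below), with $profile interpolated
  # into the provision-save command, and the migration execs wrapped in an
  # "if $update { ... }" block.
}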

Next, we clean up the platform name for use as an alias, in the same manner as Aegir does in hosting_platform_insert(): the regsubst() strips out non-word characters, so a name like 'Drupal7-1.1' becomes 'Drupal711'.

Then we provide a bunch of defaults required whenever we run Drush commands as the 'aegir' user. Of particular importance is environment => "HOME=/var/aegir", since this helps Drush find all the aliases Aegir uses to store data about platforms, sites, &c. I also have logging on by default, since we're still in development. In the long run, I'd probably switch this to 'on_failure', and add verbose logging to a file throughout, by adding '--debug' and stream redirection to all the exec calls to Drush that follow.

Site Installation & Import

Aegir's backend, Provision, provides us with all of the Drush commands we need to accomplish the installation of a Drupal site:

  1. exec {"Provision-save ${name}":
  2. command => "drush --debug --context_type='site' --master_url='http://${fqdn}' --uri='${name}' --db_server='@server_localhost' --platform='@${platform_alias}' --language='en' --profile='standard' --client_name='admin' provision-save '@${name}'",
  3. creates => "/var/aegir/.drush/${name}.alias.drushrc.php",
  4. require => Client::Platform[$platform],
  5. }
  6. exec {"Provision-install ${name}":
  7. command => "drush @${name} provision-install",
  8. creates => "/var/aegir/platforms/${platform_name}/sites/${name}",
  9. refreshonly => true,
  10. subscribe => Exec["Provision-save ${name}"],
  11. }
  12. exec {"Hosting platform verify ${platform_name} (for install)":
  13. command => "drush @hostmaster hosting-task @${platform_alias} verify",
  14. refreshonly => true,
  15. subscribe => Exec["Provision-install ${name}"],
  16. }

First, we create our site context, which generates a Drush alias file used to store data about the site, including its URI and the platform on which it is to be installed. Note that we ensure it runs after the platform is created by having it require => Client::Platform[$platform]. And it won't run on subsequent Puppet runs, since it'll first check for the existence of the alias file it creates.

Next, we trigger the actual install itself, which creates a database, vhost, &c. The site is actually fully functional after this point. Here we have it checking for the existence of the site's directory, so as not to try and re-install an existing site. In addition, we've added 'refreshonly' and 'subscribe' parameters. These ensure that the installation will only be triggered following the creation of the site's context file. That is, only once.

Finally, we 'verify' the platform in the front-end (hostmaster) to trigger the import of our site. Strictly speaking, the Aegir front-end isn't required for most of this to work. However, it provides a simple(r) interface, should there be a need to intervene manually. We use the same refreshonly/subscribe technique to only run this import after initial site installation.

Site Migration & Update

  1. exec {"Provision-migrate ${name} to ${platform_name}":
  2. command => "drush @${name} provision-migrate @${platform_alias}",
  3. creates => "/var/aegir/platforms/${platform_name}/sites/${name}",
  4. require => Exec["Provision-verify ${platform_name} (for install)"],
  5. }
  6. exec {"Provision-verify ${platform_name} (for migration)":
  7. command => "drush @hostmaster hosting-task @${platform_alias} verify",
  8. refreshonly => true,
  9. subscribe => Exec["Provision-migrate ${name} to ${platform_name}"],
  10. }
  11. exec {"Hosting-import of ${name}":
  12. command => "drush @hostmaster hosting-import @${name}",
  13. refreshonly => true,
  14. subscribe => Exec["Provision-verify ${platform_name} (for migration)"],
  15. }
  16. }

Here, we migrate the site to a new platform, if there's one available. We check against the same file in the 'creates' parameter as we do with the site installation. But in this case, we don't provide a 'refreshonly' parameter, as we want it to always check for the existence of that file. Since $platform_name is built using the $platform_tag fact, a new tag will have this pointing to a file that doesn't yet exist, and will thus trigger the migration.

Finally, to have the front-end recognize this change, we need to trigger a platform verification, and re-import the site context. With that, the site is fully migrated to the new platform, and thus running on a new code-base. The plan is to build the Drupal site around Features and its install profile, as per Miguel Jacq's classic article Drupal deployments & workflows with version control, drush_make, and Aegir. The only piece missing here is a post-migrate hook to run 'drush -y features-revert all' on the site, so that it picks up any changes made to the profile's features.
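
That hook could be as simple as one more exec in client::site, subscribed to the migration (a sketch only, using the command mentioned above):

exec {"Features-revert ${name}":
  command     => "drush @${name} -y features-revert all",
  refreshonly => true,
  subscribe   => Exec["Provision-migrate ${name} to ${platform_name}"],
}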

Pulling it all together

Once we get towards running this system in production, we'll need to add some cronjobs to regularly pull down changes from the Git repo, and run 'puppet apply' to actually trigger all this. Since cron also supports automatically sending email when there's output from the commands it's running, we can essentially confirm that the updates are proceeding as expected.
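
Roughly speaking, the crontab on each satellite might contain something like this (schedules and paths are placeholders, and assume both git and Puppet run inside the VM, where the project is shared at /vagrant):

# Pull the latest configuration, then let Puppet converge the system.
0 * * * *  cd /vagrant && git pull --quiet origin master
15 * * * * puppet apply --modulepath=/vagrant/modules /vagrant/manifests/site.pp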

These systems were put in place to support a data sync'ing effort, which is the phase of the project that we're moving into now. The original plan was to have two import/export systems: one direct, for when systems are connected online, and one via USB upload. This would have involved maintaining separate mechanisms for import, which would have to be kept in sync with changes to the application itself. So instead, we're looking at Views Data Export as a way to dump these data files regularly, probably to XML. These exports can be triggered from Drush via an additional cronjob. If the system is online, then we can sync it to the hub via rsync over SSH. Otherwise, we can fall back to the USB method, and just SFTP the relevant file(s). Then we'll probably use Migrate to handle the data import.

There will sometimes be a need to 'seed' the satellites with data relevant to a client project prior to dispatching them to the field. But more regularly, we'll be looking for a way to reliably import the data exported from the remotes. This will include de-duplication of any seed data, and possibly other cleansing.

I'll post a follow-up with more details on this data synchronization component, once it's more fleshed out.