Using gems with Chef


Installing gems with Chef is relatively painless. Most of the time, you can use the gem_package resource, which behaves very similarly to the native package resource:

gem_package 'httparty'

You can even specify the gem version to install:

gem_package 'httparty' do
  version '0.12.0'
end

You may have also seen the chef_gem resource. What's the difference?

The chef_gem and gem_package resources are both used to install Ruby gems. For any machine on which the chef-client is installed, there are two instances of Ruby. One is the standard, system-wide instance of Ruby and the other is a dedicated instance that is available only to the chef-client. Use the chef_gem resource to install gems into the instance of Ruby that is dedicated to the chef-client. Use the gem_package resource to install all other gems (i.e. install gems system-wide).

In short: if you want Chef to use it, use chef_gem; otherwise, use gem_package. Most of the time, you will want to use gem_package, unless the gem you are installing will be used by Chef itself, such as gems that are used in libraries or heavy-weight resources, gems used in report handlers, or extensions to Chef (like Chef Sugar).
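If you are ever unsure which of the two Rubies a piece of code is running in, you can ask Ruby itself. The following is a minimal sketch in plain Ruby (no Chef required); the ruby_environment helper name is made up for this example. On a typical package-based Chef install, running it with the system Ruby and then from inside a Chef library should print different paths.

```ruby
require 'rbconfig'
require 'rubygems'

# Summarize which Ruby interpreter and gem directories this process uses.
# (ruby_environment is a hypothetical helper name, not part of Chef.)
def ruby_environment
  {
    interpreter: RbConfig.ruby, # absolute path to the running ruby binary
    gem_dir: Gem.dir,           # directory gems are installed into by default
    gem_path: Gem.path          # every directory searched for installed gems
  }
end

ruby_environment.each { |key, value| puts "#{key}: #{value}" }
```

A gem installed with gem_package lands in the system Ruby's gem directory, so it will not show up in the paths printed from inside the chef-client, and vice versa for chef_gem.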

It is not uncommon to require the use of a third-party gem in a Chef library or heavy-weight resource. For example, you may want to use Nokogiri to parse XML or HTTParty to easily make web requests. But installing and then using these gems is often a hassle and a "chicken-and-egg" problem. In order to use the resource, the gem must be installed on the system. In order for the gem to be installed, a particular recipe must be executed. And things get considerably more complex when the gem installation requires native extensions.

Let's say you have a very simple library that fetches and parses some HTML:

require 'nokogiri'

module Helper
  def read(url)
    Nokogiri::HTML(open(url))
  end
end

If you try to run the Chef Client, you'll get a nasty error like:

================================================================================
Recipe Compile Error in /var/chef/cache/cookbooks/bacon/libraries/helper.rb
================================================================================


LoadError
---------
cannot load such file -- nokogiri


Cookbook Trace:
---------------
  /var/chef/cache/cookbooks/bacon/libraries/helper.rb:1:in `<top (required)>'


Relevant File Content:
----------------------
/var/chef/cache/cookbooks/bacon/libraries/helper.rb:

  1>> require 'nokogiri'
  2:
  3:  module Helper
  4:    def read(url)
  5:      Nokogiri::HTML(open(url))
  6:    end
  7:  end
  8:

This is geek speak for "you haven't installed the Nokogiri gem on your system". Since this gem is going to be used inside of Chef (i.e. in a Chef library), we want to use the chef_gem resource. Logically, you would add something like the following to your default recipe:

chef_gem 'nokogiri'

You would think the default recipe would run, installing Nokogiri, and then the Chef run would execute successfully. But if you execute this Chef recipe, you'll get the same error as above. That's because Chef loads libraries before it executes recipes. If you have some experience developing Ruby applications, you probably want to move the require 'nokogiri' statement into the method body, since that delays its loading:

module Helper
  def read(url)
    require 'nokogiri'
    Nokogiri::HTML(open(url))
  end
end

This Chef Client run will execute successfully!
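To see why this works, here is a small plain-Ruby sketch of the same idea. It assumes nothing about Chef: the gem name is deliberately bogus (and hypothetical) so that the require is guaranteed to fail, and LazyHelper is just an illustrative name.

```ruby
# Hypothetical gem name, chosen so that it is certainly not installed.
MISSING_GEM = 'a_gem_that_is_not_installed_0xbacon'

module LazyHelper
  # Defining this method does not load anything; the require only
  # runs when the method is actually called.
  def self.load_dependency
    require MISSING_GEM
    :loaded
  end
end

# Reaching this line proves that the definition above did not raise.
outcome =
  begin
    LazyHelper.load_dependency
  rescue LoadError => e
    e.message
  end

puts outcome
```

The module loads without complaint, and the LoadError only appears once the method is called. In a Chef run, that call happens after the recipe has had a chance to install the gem.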

While this is a good "temporary" solution, it has some serious limitations:

  1. Having requires in your def statements is a code smell.
  2. It can easily become repetitive.
  3. It does not permit class methods or module inclusions/extensions.

Specifically, consider if you wanted to use HTTParty to make HTTP requests. You typically use the HTTParty module as follows:

require 'httparty'

class Requester
  include HTTParty
  base_uri 'api.example.com'
end

And now we are faced with another "chicken-and-egg" problem. In order to include HTTParty and call the base_uri class method, we need to require 'httparty'. But we can't actually require httparty until after our class is entirely loaded (for the same reason as with the nokogiri gem). So our earlier approach will no longer work:

class Requester
  include HTTParty
  base_uri 'api.example.com'

  def initialize
    require 'httparty'
  end
end

It will blow up like this:

================================================================================
Recipe Compile Error in /var/chef/cache/cookbooks/bacon/libraries/helper.rb
================================================================================


NameError
---------
uninitialized constant Requester::HTTParty


Cookbook Trace:
---------------
  /var/chef/cache/cookbooks/bacon/libraries/helper.rb:2:in `<class:Requester>'
  /var/chef/cache/cookbooks/bacon/libraries/helper.rb:1:in `<top (required)>'


Relevant File Content:
----------------------
/var/chef/cache/cookbooks/bacon/libraries/helper.rb:

  1:  class Requester
  2>>   include HTTParty
  3:    base_uri 'api.example.com'
  4:
  5:    def initialize
  6:      require 'httparty'
  7:    end
  8:  end
  9:
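The reason is that Ruby evaluates a class body as soon as the file is loaded, while the body of initialize only runs when an instance is created. A minimal plain-Ruby sketch of that difference follows; UndefinedMixin and EagerRequester are hypothetical names standing in for HTTParty and Requester.

```ruby
# UndefinedMixin is intentionally never defined, just as HTTParty is
# undefined before the gem has been required.
LOAD_TIME_ERROR =
  begin
    class EagerRequester
      include UndefinedMixin # evaluated immediately, while the class is defined

      def initialize
        @ready = true # never reached: the class body failed before any call
      end
    end
    nil
  rescue NameError => e
    e
  end

puts LOAD_TIME_ERROR.class
puts LOAD_TIME_ERROR.message
```

No instance was ever created, yet the NameError is raised anyway. A require placed inside initialize simply runs too late to help.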

Well this sucks. And there are some really complex solutions to work around these and other issues. I'm sure you've seen at least one recipe like this:

# Install build-essential at runtime, so we can install
# Nokogiri before our library is used.
package 'build-essential' do
  action :nothing
end.run_action(:install)

# Install Nokogiri
chef_gem 'nokogiri'
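The run_action trick in that recipe relies on Chef's two phases: declaring a resource normally just queues it for later, while calling run_action executes it on the spot. The following toy model is only a sketch of that ordering. It is not Chef's implementation, and every class and method name in it (ToyResource, ToyRun, toy_demo) is made up for illustration.

```ruby
# Toy model, NOT Chef's implementation: it only illustrates ordering.
class ToyResource
  attr_reader :name, :action

  def initialize(name, action, log)
    @name = name
    @action = action
    @log = log
  end

  # Perform an action immediately, regardless of phase.
  def run_action(action)
    @log << "#{action} #{name}"
    self
  end
end

class ToyRun
  attr_reader :log

  def initialize
    @log = []
    @collection = []
  end

  # Declaring a resource only queues it for the converge phase.
  def package(name, action: :install)
    resource = ToyResource.new(name, action, @log)
    @collection << resource
    resource
  end

  # Converge: run each queued resource's own action, skipping :nothing.
  def converge
    @log << 'converge starts'
    @collection.each do |resource|
      resource.run_action(resource.action) unless resource.action == :nothing
    end
    @log
  end
end

def toy_demo
  run = ToyRun.new
  run.package('git')                                                    # queued
  run.package('build-essential', action: :nothing).run_action(:install) # runs now
  run.converge
end

puts toy_demo
```

This prints "install build-essential", then "converge starts", then "install git". The build-essential package is installed before the converge phase begins, even though it was declared second.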

Solution #1 - Push down the stack

This solution involves a deeper understanding of the Chef internals and the Ruby programming language, but it does offer a fairly elegant solution that covers "most" use cases. It involves delaying the loading of classes and libraries until the last possible minute. It also involves more manual work on the part of users of the cookbook.

  1. Move require statements so that they immediately follow the chef_gem installations in your recipe:

    chef_gem 'httparty'
    require 'httparty'
    
  2. Convert all your classes to perform any setup in the initialize method:

    class Requester
      def initialize
        self.class.send(:include, HTTParty)
        self.class.send(:base_uri, 'api.example.com')
      end
    end
    

These are Rubyisms: we are modifying the object's own class at runtime, from inside an instance method. Normally this is a really bad idea, but it provides a fairly elegant solution in this example.

  3. Update your README/documentation

    # In order to use the foo_resource, you must include the "foo"
    # recipe in your run_list.
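
The deferred setup from step 2 can be tried out in plain Ruby. The sketch below uses a hypothetical FakeParty module as a stand-in for HTTParty (so it runs without any gems installed) and defines it after the class, mimicking a gem that only becomes available later in the run. DeferredRequester is likewise an illustrative name.

```ruby
class DeferredRequester
  def initialize
    # FakeParty is only looked up here, when an instance is created,
    # so it does not need to exist when this class is defined.
    self.class.send(:include, FakeParty)
    self.class.send(:base_uri, 'api.example.com')
  end
end

# Defined after the class, mimicking a gem that is installed and
# required later. FakeParty is a made-up stand-in, not HTTParty's code.
module FakeParty
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Set the base URI when given an argument; always return the current one.
    def base_uri(uri = nil)
      @base_uri = uri if uri
      @base_uri
    end
  end
end

DeferredRequester.new
puts DeferredRequester.base_uri
```

Note that the include and base_uri calls are repeated for every new instance, which is part of why this approach is listed as not performant below.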
    

Advantages:

  • Easy to convert
  • Follows existing patterns

Disadvantages:

  • Hack
  • Not performant
  • Relies on human process (i.e. RTFM)

Solution #2 - Bootstrapping

If you know you are going to need a particular gem on a system, you can create a custom knife bootstrap script to install that gem when Chef is installed on the system. For example:

# ... existing bootstrap
gem update --no-rdoc --no-ri
gem install ohai --no-rdoc --no-ri --verbose
gem install chef --no-rdoc --no-ri --verbose <%= bootstrap_version_string %>

# Add this
gem install nokogiri --no-rdoc --no-ri --verbose

# ... existing bootstrap

Advantages:

  • Simple
  • One-time operation

Disadvantages:

  • Not extensible
  • Only works on new systems

Solution #3 - Vendoring

To the best of my knowledge, this is an entirely new approach. I do not know of any cookbooks that currently use this pattern. I have yet to load test or fully evaluate this approach, but I think it offers the most elegant solution of all those here.

This approach involves packaging the gem inside the cookbook. In the Ruby world, this process is referred to as "vendoring a gem". Bundler, for example, vendors Thor. I recommend vendoring the gem inside of files/default/vendor, since it's semantic and is automatically distributed and packaged with the cookbook.

To install a gem, you normally run the command:

gem install GEMNAME

But the install command also accepts an optional list of arguments:

Usage: gem install GEMNAME [GEMNAME ...] [options] -- --build-flags [options]

So, to vendor a gem inside of files/default/vendor, run the following command from inside your cookbook root:

gem install --install-dir files/default/vendor --no-document GEMNAME
  1. --install-dir tells RubyGems to install the gem inside our cookbook.
  2. --no-document tells RubyGems to skip generating documentation (since we are just packaging the gem in this cookbook, documentation is unnecessary).

This will create a few files and folders inside of your cookbook. The gem you installed, as well as any required dependencies, are now packaged in your cookbook! Feel free to inspect files/default/vendor to see everything that is installed. Now we just need to expand our $LOAD_PATH to include this directory at runtime.

At the very top of your library, add the following line:

$:.unshift *Dir[File.expand_path('../../files/default/vendor/gems/**/lib', __FILE__)]
  1. $: is short for $LOAD_PATH, which is the array of paths Ruby searches when you require a file.
  2. unshift is an Array method that prepends its arguments to the front of the array.
  3. * is the splat operator; it has many uses, but in this instance, it converts the array into a list of parameters to the unshift method.
  4. Dir[] is equivalent to Dir.glob; it behaves very similarly to ls or dir and supports wildcard expansion, returning an array of matching file paths on disk.
  5. File.expand_path converts a relative path to an absolute path; the optional second argument is the location to expand from; in other words, expand the path relative from this current file on disk.
  6. __FILE__ is a Ruby shortcut for the path to the current file on disk.
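
The splat is easy to overlook, so here is a small sketch of what it changes. It uses ordinary arrays with made-up paths rather than the real $LOAD_PATH, and the two helper names are hypothetical.

```ruby
# Prepend each vendored path as its own element (what the library line does).
def prepend_paths(load_path, vendored)
  load_path.dup.unshift(*vendored)
end

# Without the splat, the whole array is prepended as ONE nested element,
# which Ruby's require cannot use as a search path.
def prepend_nested(load_path, vendored)
  load_path.dup.unshift(vendored)
end

existing = ['/existing/path']
vendored = ['/vendor/gems/a/lib', '/vendor/gems/b/lib']

p prepend_paths(existing, vendored)
p prepend_nested(existing, vendored)
p prepend_paths(existing, []) # an empty glob result leaves the array unchanged
```

With the splat, the result is a flat array of three strings with the vendored paths first. Without it, the first element is itself an array.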

So if you were to break this down step-by-step:

  1. The expression is evaluated from the inside out:

    $:.unshift *Dir[File.expand_path("../../files/default/vendor/gems/**/lib", __FILE__)]
    
  2. The path is expanded relative to the current file:

    $:.unshift *Dir["/var/chef/cache/cookbooks/bacon/files/default/vendor/gems/**/lib"]
    
  3. Dir expands the ** on disk:

    $:.unshift *[
      "/var/chef/cache/cookbooks/bacon/files/default/vendor/gems/httparty-0.12.0/lib",
      "/var/chef/cache/cookbooks/bacon/files/default/vendor/gems/json-1.8.1/lib",
      "/var/chef/cache/cookbooks/bacon/files/default/vendor/gems/multi_xml-0.5.5/lib"
    ]
    
  4. The splat operator converts the array into method parameters:

    $:.unshift(
      "/var/chef/cache/cookbooks/bacon/files/default/vendor/gems/httparty-0.12.0/lib",
      "/var/chef/cache/cookbooks/bacon/files/default/vendor/gems/json-1.8.1/lib",
      "/var/chef/cache/cookbooks/bacon/files/default/vendor/gems/multi_xml-0.5.5/lib"
    )
    
  5. $: is a shortcut for $LOAD_PATH:

    $LOAD_PATH.unshift(...)
    
  6. The $LOAD_PATH now includes our gem files at the top:

    $LOAD_PATH #=> [
      '/var/chef/cache/cookbooks/bacon/files/default/vendor/gems/httparty-0.12.0/lib',
      '/var/chef/cache/cookbooks/bacon/files/default/vendor/gems/json-1.8.1/lib',
      '/var/chef/cache/cookbooks/bacon/files/default/vendor/gems/multi_xml-0.5.5/lib',
      # Existing $LOAD_PATH
    ]
    
  7. We can now require our gem:

    require 'httparty'
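
If you want to try the whole sequence without touching a real cookbook, the following sketch builds a throwaway directory with the same layout and loads a fake gem from it. Everything here is an assumption for illustration: fake_vendored is a made-up gem, vendored_greeting is a made-up helper, and library_file stands in for what __FILE__ would be inside libraries/helper.rb.

```ruby
require 'tmpdir'
require 'fileutils'

# Build a temporary cookbook layout with one fake vendored gem, add it to
# the load path using the same expression as the library, and require it.
def vendored_greeting
  Dir.mktmpdir('cookbook') do |root|
    lib_dir = File.join(root, 'files', 'default', 'vendor', 'gems',
                        'fake_vendored-1.0.0', 'lib')
    FileUtils.mkdir_p(lib_dir)
    File.write(File.join(lib_dir, 'fake_vendored.rb'), <<~RUBY)
      module FakeVendored
        def self.greeting
          'loaded from vendor'
        end
      end
    RUBY

    # Stand-in for __FILE__ inside the cookbook's libraries directory.
    library_file = File.join(root, 'libraries', 'helper.rb')

    pattern = File.expand_path('../../files/default/vendor/gems/**/lib', library_file)
    $LOAD_PATH.unshift(*Dir[pattern])

    require 'fake_vendored'
    FakeVendored.greeting
  end
end

puts vendored_greeting
```

The temporary directory is removed when the block exits, but the module stays loaded, so the call still returns the greeting defined in the vendored file.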
    

Advantages:

  • Self-contained
  • No third-party dependencies
  • Versioned
  • No compile-time madness

Disadvantages:

  • Size (cookbooks are limited in file size)
  • Gem activation errors can still occur

About Seth

Seth Vargo is an engineer at Google. Previously he worked at HashiCorp, Chef Software, CustomInk, and some Pittsburgh-based startups. He is the author of Learning Chef and is passionate about reducing inequality in technology. When he is not writing, working on open source, teaching, or speaking at conferences, Seth advises non-profits.