The Cleanroom Pattern in Ruby
I recently gave a talk at Philly.rb, the Ruby meetup in Philadelphia, PA, entitled "The Cleanroom Pattern - More safely evaluating DSLs with Ruby". You can watch the full Cleanroom DSL video online, but I decided to also write up the contents of the talk as a blog post.
Background on DSLs
Most Ruby-based DSLs are created using a simple instance_eval. While slightly less dangerous than eval, instance_eval still opens the system up to dangerous circumstances. Consider the following DSL file, Project, which has a name attribute:
class Project
  NULL = Object.new.freeze

  def name(val = NULL)
    if val.equal?(NULL)
      @name
    else
      @name = sanitize(val)
    end
  end

  private

  def sanitize(string)
    string.gsub(/\s+/, '-').downcase
  end
end
There are a few things to take note of here:
- A new, frozen object is created to represent "NULL". While it is true that Ruby has a native nil, having a default value of nil would actually prevent the user from setting the value to nil (since that would be assumed to have passed "nothing"). You may have seen this problem when working with some Chef resources.
- There is a DSL method called name, which is essentially overloaded as two methods. When given no parameters, the method simply returns the instance variable @name. When given a value, the value is sanitized and then set on the @name instance variable.
- There is a private sanitize method that replaces each run of whitespace with a dash and downcases the result (because otherwise the world explodes!).
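To make the first point concrete, here is a minimal sketch that contrasts a nil default with a sentinel default. The class names NilDefault and SentinelDefault are made up for this illustration and sanitizing is left out to keep it short.

```ruby
# Hypothetical classes for illustration only.
class NilDefault
  def name(val = nil)
    # An explicit nil looks exactly like "no argument", so this branch
    # acts as a getter even when the caller passes nil on purpose.
    return @name if val.nil?
    @name = val
  end
end

class SentinelDefault
  NULL = Object.new.freeze

  def name(val = NULL)
    # Only the sentinel object means "no argument was given".
    return @name if val.equal?(NULL)
    @name = val
  end
end

a = NilDefault.new
a.name("hamlet")
a.name(nil) # treated as a read; the value is not cleared

b = SentinelDefault.new
b.name("hamlet")
b.name(nil) # treated as a write; the value is now nil
```

With the nil default there is no way to reset the attribute through the DSL method, while the sentinel version can tell "no argument" apart from "nil was passed".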
Inside your system, you would likely load this DSL file as such:
path = '/path/to/dsl.rb'
contents = File.read(path)
project = Project.new
project.instance_eval(contents, File.basename(path), 0)
project
So, given a DSL file like:
name "hamlet"
The loading process would result in a #<Project> object with a name of "hamlet":
project.name #=> "hamlet"
Problem 1 - Private Methods
During an instance_eval (or instance_exec), the entire instance is exposed to the user - it is just as if you were writing code directly in project.rb in a text editor. That means public, protected, and private methods are all accessible by the user:
Project.new.instance_eval do
  sanitize("String Here")
end
#=> "string-here"
This is not "terrible", since Rubyists are quite familiar with the ability to use send (or __send__) to call these methods anyway. However, it is very unclear to the DSL author what methods are public and private.
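As a small sketch of the difference, the snippet below calls the private method once with an explicit receiver and once from inside instance_eval. The Project class is trimmed down to just the private helper, and the variable names are only for illustration.

```ruby
class Project
  private

  def sanitize(string)
    string.gsub(/\s+/, '-').downcase
  end
end

project = Project.new

# An ordinary call with an explicit receiver respects visibility.
outside =
  begin
    project.sanitize("String Here")
  rescue NoMethodError
    :blocked
  end

# Inside instance_eval, self is the project, so the private method is
# called with an implicit receiver and the call succeeds.
inside = project.instance_eval { sanitize("String Here") }
```

Here outside ends up as :blocked while inside holds the sanitized string, which is exactly the visibility gap described above.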
Problem 2 - Scope Creep
The sanitize method has a pretty generic name. Since these are Ruby DSLs, it is feasible that a savvy developer may create a "helper" method to ease the development process. Consider the following DSL file:
#
# Define a +sanitize+ method that uppercases the value for ...
#
# @param [#to_s] string
# the string to parameterize
#
# @return [String]
#
def sanitize(string)
  string.to_s.upcase
end
name "Some String"
The resulting output would be:
project.name #=> "SOME STRING"
The DSL author unintentionally changed the behavior of the instance by making a simple helper method. While this is a contrived example, consider larger DSL-based projects like Chef or Omnibus which have hundreds of tiny helper methods - the possibility of collision is much higher.
Thankfully, since this is instance_eval, the change to the sanitize method is scoped to this one instance (meaning changing the method here does not change it for future evaluations). We only edited the eigenclass.
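The following sketch, which assumes the Project class from earlier and uses made-up variable names, shows that the helper defined in the DSL file only lands on the one instance it was evaluated against.

```ruby
class Project
  NULL = Object.new.freeze

  def name(val = NULL)
    return @name if val.equal?(NULL)
    @name = sanitize(val)
  end

  private

  def sanitize(string)
    string.gsub(/\s+/, '-').downcase
  end
end

# DSL source that defines its own sanitize helper before setting the name.
dsl_with_helper = <<~RUBY
  def sanitize(string)
    string.to_s.upcase
  end
  name "Some String"
RUBY

first = Project.new
first.instance_eval(dsl_with_helper)

# A second instance evaluated afterwards still uses the class's sanitize.
second = Project.new
second.instance_eval('name "Some String"')

first_name  = first.name
second_name = second.name
helper_is_singleton = first.singleton_methods.include?(:sanitize)
```

The first project ends up with the uppercased name, the second one keeps the dash-separated lowercase form, and the helper shows up only in the first instance's singleton methods.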
Problem 3 - Bypassing Validation
Consider a user who really wants to have spaces in their project name. They could easily bypass the entire system by just setting the instance variable manually:
@name = "My Custom Name"
When this file is evaluated in the context of the Project object:
project.name #=> "My Custom Name"
The user has completely circumvented our sanitize method by just accessing the instance variable directly. Worse, this is an intentional design in Ruby:
In order to set the context, the variable self is set to obj while the code is executing, giving the code access to obj's instance variables.
Problem 4 - Persisted Changes
The biggest problem with instance_eval is that it gives you access to self, an instance of the Project class in these examples. self has access to its class, so truly malicious code could permanently change the behavior of future instance_evals. A very clever developer could permanently change the behavior of sanitize for all future instances of this class (until project.rb is reloaded from disk):
Project.new.instance_eval do
  self.class.class_eval do
    def sanitize(val)
      val.upcase
    end
  end
end
Uh oh!
Project.new.sanitize("foo") #=> "FOO"
Project.new.sanitize("foo") #=> "FOO"
Project.new.sanitize("foo") #=> "FOO"
Project.new.sanitize("foo") #=> "FOO"
This code has permanently changed the behavior of the class's sanitize method (note how I am creating a new instance each time). If you are writing a Ruby application that accepts a user-given DSL or dealing with a long-running Ruby process, a malicious user could alter the underlying state of the system in memory.
Explaining the Cleanroom Pattern
The cleanroom pattern is an idiomatic way to evaluate Ruby DSLs in an isolated environment while restricting the methods and level of access a user has. I want to be clear: I did not invent the cleanroom pattern! It can be found in the Metaprogramming Ruby books, various blog posts, and popular community projects. I actually learned of the cleanroom pattern from my good friend and fellow Berkshelf-core-team member Jamie Winsor, so thanks!
The general pattern for a cleanroom looks like this:
- The class defines which methods should be exposed on its DSL instances.
- During evaluation, a new anonymous instance that has only those defined methods is created. This object is created in the top-level Object space to prevent leaking.
- The Ruby file is instance_evaled against this anonymous instance, which has very restricted access to the parent instance.
- The anonymous instance then proxies data back to the original instance using public_send.
Thus there are three areas of protection:
- The class defines which values are public within the DSL. Only those methods exist on the anonymous instance, thus preventing namespace collisions.
- The anonymous instance is created fresh, each time. Even if a malicious attacker is able to craft something to permanently modify the class, it would only persist for that anonymous instance, which is cleaned up during the next GC run.
- The anonymous instance proxies back to the "real" instance using public_send. So, even if an attacker was able to bypass all the existing mitigations, they would only be able to call public methods on the instance.
The code for creating the cleanroom object is a bit complex and meta:
def cleanroom
  Class.new(Object) do # <1>
    define_method(:initialize) do |instance| # <2>
      define_singleton_method(:__instance__) do # <3>
        unless caller[0].include?(__FILE__) # <4>
          raise Cleanroom::InaccessibleError.new(:__instance__, self)
        end

        instance # <5>
      end
    end

    exposed.each do |exposed_method| # <6>
      define_method(exposed_method) do |*args, &block|
        __instance__.public_send(exposed_method, *args, &block)
      end
    end
  end
end
1. First we create a new anonymous class inheriting from Object.
2. Next we dynamically define an #initialize method on the class which accepts an instance as the parameter. In normal Ruby:
   def initialize(instance)
     # ...
   end
3. During initialization, a new singleton method is created on the eigenclass of the instance. This is basically the same as a regular def method, but it only exists inside the context of this instance. Furthermore, we create it during initialization, thus allowing us to bind to the enclosing scope, giving us access to the given instance parameter. Basically we are doing this:
   def initialize(instance)
     @instance = instance
   end

   def __instance__
     @instance
   end
   But since this anonymous class is what gets instance_evaled, exposing the real instance as an instance variable would allow an attacker to completely bypass the system (remember, instance variables are within scope during an instance_eval)! Instead, we are creating a dynamic method at runtime that refers to the parameter given to the #initialize method. This allows us to "store" the value in a method, but not expose it in an instance variable.
4. Inside the aforementioned method, we add an extra guard that only permits the method to be called from the cleanroom's own source file. This is a major hack, but we inspect the caller array and make sure the code that called the __instance__ method lives in the file we are currently running (not a DSL file).
5. If an error was not raised, we return the instance that was given to us in the #initialize method.
6. For each exposed method (which I have just called exposed in the code snippet), we define a method on the anonymous class that forwards to __instance__ using public_send.
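To see the pieces working together, here is a simplified, self-contained variant of the cleanroom method above. It is only a sketch: CleanProject, EXPOSED, and this evaluate method are hypothetical names for the example (not the gem's implementation), and the caller-file guard from step 4 is left out for brevity.

```ruby
# Hypothetical class for illustration only.
class CleanProject
  NULL = Object.new.freeze
  EXPOSED = [:name].freeze

  def name(val = NULL)
    return @name if val.equal?(NULL)
    @name = sanitize(val)
  end

  # Evaluate DSL source against a throwaway proxy instead of self.
  def evaluate(source)
    cleanroom.new(self).instance_eval(source)
    self
  end

  private

  def sanitize(string)
    string.gsub(/\s+/, '-').downcase
  end

  def cleanroom
    exposed = EXPOSED
    Class.new(Object) do
      define_method(:initialize) do |instance|
        # The real instance lives only in this closure, never in an ivar.
        define_singleton_method(:__instance__) { instance }
      end

      exposed.each do |exposed_method|
        define_method(exposed_method) do |*args, &block|
          __instance__.public_send(exposed_method, *args, &block)
        end
      end
    end
  end
end

project = CleanProject.new
project.evaluate('name "Some String"')

# Instance variables set by the DSL land on the proxy, not on the project.
project.evaluate('@name = "My Custom Name"')

# Private helpers are not defined on the proxy, so the call fails.
blocked =
  begin
    project.evaluate('sanitize("x")')
    nil
  rescue NoMethodError => e
    e.name
  end
```

After these evaluations the project's name is still the sanitized value from the first call, and blocked holds :sanitize. Note that without the caller guard, a DSL file in this sketch could still call __instance__ directly to reach the real object, which is the gap step 4 is meant to close.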
Using the Cleanroom
Fortunately you do not need to understand all of this to utilize a cleanroom in your projects! I have wrapped all this logic, plus tests and custom RSpec matchers into the cleanroom gem. The gem is already in use in popular projects like Omnibus and Berkshelf, and you can easily use it too!
After you have added the cleanroom gem to your Gemfile and executed the bundle command to install, simply require and include the Cleanroom module in any DSL:
# my_dsl_file.rb
require 'cleanroom'
class MyDSLFile
  include Cleanroom
end
Immediately, without writing any code, you have been given access to the following methods:
- MyDSLFile.evaluate_file - evaluate a file against an instance
- MyDSLFile.evaluate - evaluate raw Ruby (as a String) or a block against an instance
- MyDSLFile#evaluate_file - evaluate a file against this instance
- MyDSLFile#evaluate - evaluate raw Ruby (as a String) or a block against this instance
For example:
dsl = MyDSLFile.new
dsl.evaluate_file('/path/to/file.rb')
dsl #=> #<MyDSLFile:0xabc123>
For each method you want to be exposed as part of the DSL API (which may be separate from the public API), simply call expose:
require 'cleanroom'
class MyDSLFile
  include Cleanroom

  def some_dsl_method
    # ...
  end
  expose :some_dsl_method
end
With just that one additional line of code for the expose method, you get all of the features and magic described before. Go ahead and try it out!
- Example cleanroom method Project#name in Omnibus
- Example cleanroom method Berksfile#extension in Berkshelf
Final Thoughts
The slides and video from my talk are linked above, but I have included them here as well:
On a final note - there is still much exploration to be done in this area. Perhaps the DSL evaluation should set Ruby's $SAFE level or guard against system or %x calls... maybe it should not. The cleanroom pattern and gem have been especially useful in my daily work, and I really hope you benefit from them as well!
About Seth
Seth Vargo is a Distinguished Software Engineer at Google. Previously he worked at HashiCorp, Chef Software, CustomInk, and some Pittsburgh-based startups. He is the author of Learning Chef and is passionate about reducing inequality in technology. When he is not writing, working on open source, teaching, or speaking at conferences, Seth advises non-profits.