Metaprogramming in Ruby

Ruby is a programming language created by Yukihiro Matsumoto (better known as Matz) as a compilation of everything he liked best about his favorite languages: Perl, Smalltalk, Eiffel, Ada, and Lisp. Matz was motivated to create a new language that balanced functional and imperative programming.

One of the first reactions people have when they start using Ruby is to say: “Wow, this is very simple!” Matz, however, states that his goal is to make Ruby natural, not simple. Matz remarks, “Ruby is simple in appearance, but is very complex inside, just like our human body.”

On its official page, Ruby is described as a “dynamic, open-source programming language with a focus on simplicity and productivity. It has an elegant syntax that is natural to read and easy to write.” There is much more that can be said about Ruby, even in an introductory fashion, but this initial description is, in my view, spot on.

When I use Ruby, I am not thinking about some of the mechanics of programming. Instead, I am mostly thinking about the result I seek to produce. Matz wanted Ruby code to be easily read by humans. Ruby code is meant to be very elegant and simple, which makes it my favorite language for prototyping.

What Is Metaprogramming?

If elegance, simplicity, and the natural aspect of its syntax are already great ingredients for prototyping, my favorite thing about Ruby is something yet more intriguing: metaprogramming!

Informally, metaprogramming is often referred to as “writing code that writes code.” If you search online, this is the most popular definition you will find. Well, I’m not too fond of this definition. The reason is straightforward. Consider the following C++ code:

// example.cpp

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
  ofstream ofs("main.cpp");
  string code = "#include <iostream>\n"
                "using namespace std;\n\n"
                "int main()\n"
                "{\n"
                "  int a = 2;\n"
                "  int b = 3;\n\n"
                "  cout << \"a = \" << a << endl;\n"
                "  cout << \"b = \" << b << endl;\n"
                "  cout << \"a + b = \" << a + b << endl;\n"
                "  cout << \"a * b = \" << a * b << endl;\n\n"
                "  return 0;\n"
                "}";
  ofs << code;

  return 0;
}

When I run g++ example.cpp -o example --std=c++11 && ./example, a new file main.cpp will be created, which contains the following code:

// main.cpp

#include <iostream>
using namespace std;

int main()
{
  int a = 2;
  int b = 3;

  cout << "a = " << a << endl;
  cout << "b = " << b << endl;
  cout << "a + b = " << a + b << endl;
  cout << "a * b = " << a * b << endl;

  return 0;
}

When I run g++ main.cpp -o main --std=c++11 && ./main, I obtain:

a = 2
b = 3
a + b = 5
a * b = 6

This is a naive example of “code that writes code.” OK, maybe too naive, but the idea here is to illustrate the limitations of this popular definition of metaprogramming. Code that writes code is not interesting in itself. How code writes code and what you can do with that is an entirely different story.

Paolo Perrotta wrote a wonderful book about metaprogramming in Ruby. Perrotta describes Ruby source code as “a world teeming with vibrant citizens including variables, classes, and methods.” These citizens are language constructs. Therefore a more technical (and much more meaningful) definition of metaprogramming is writing code that manipulates language constructs at runtime. This concept is so important that I will break it down for better visibility:

  • What: writing code that manipulates language constructs.
  • When: at runtime.

I like the second definition much better. Not every language can do that, and the way Ruby achieves this dynamic manipulation of language constructs makes it incredibly elegant and powerful.
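
To make the “what” and the “when” concrete, here is a minimal sketch in Ruby. The class Robot and its method names are hypothetical, chosen only for illustration: the program first inspects a class and then adds a method to it while it is running.

```ruby
# Hypothetical class, used only for illustration.
class Robot
  def greet
    "Hello"
  end
end

robot = Robot.new

# Inspect a language construct at runtime: which methods does Robot define?
before = Robot.instance_methods(false)   # => [:greet]

# Manipulate a language construct at runtime: add a method to the class.
Robot.define_method(:shout) { greet.upcase + "!" }

after = Robot.instance_methods(false)    # now also contains :shout
shout_result = robot.shout               # the existing object gains the method
```

Nothing here generates source text. The program changes its own classes directly, which is the distinction the second definition is trying to capture.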

All I can do in a single blog post is to scratch the surface of metaprogramming in Ruby. With that in mind, I invite you to look at five of the building blocks of metaprogramming in Ruby: Dynamic Dispatch, Dynamic Methods, Ghost Methods, Dynamic Proxy, and Blank Slate.

Language Constructs

For all the examples in this post, I used Ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-darwin19].

When we create a class in Ruby, that class inherits properties and behaviors from other default classes unless we decide otherwise. These classes are called “ancestors.” They provide fundamental functionalities for any custom class in their lineage. We can check the ancestors of a class as follows:

class MySimpleClass
end

MySimpleClass.ancestors
# => [MySimpleClass, Object, PP::ObjectMixin, Kernel, BasicObject]

We can check what each of these ancestors is by checking their associated classes:

MySimpleClass.ancestors.map(&:class)
# => [Class, Class, Module, Module, Class]

We can also list what “fundamental functionalities” are inherited when we create a class in Ruby by executing:

BasicObject.methods
# => [:allocate,  :superclass,  :subclasses,  :new,  :instance_method,  :public_instance_method,  :<=>,  :define_method,  :<=,  :>=,  :==,  :===,  :included_modules,  :include?,  :ancestors,  :attr,  :attr_reader,  :attr_writer,  :attr_accessor,  :instance_methods,  :public_instance_methods,  :protected_instance_methods,  :private_instance_methods,  :constants,  :freeze,  :inspect,  :const_set,  :const_get,  :const_source_location,  :const_defined?,  :class_variable_set,  :class_variables,  :remove_class_variable,  :class_variable_get,  :const_missing,  :class_variable_defined?,  :<,  :private_constant,  :>,  :singleton_class?,  :public_constant,  :deprecate_constant,  :prepend,  :include,  :module_exec,  :to_s,  :module_eval,  :class_exec,  :class_eval,  :pretty_print_cycle,  :remove_method,  :undef_method,  :alias_method,  :method_defined?,  :public_method_defined?,  :private_method_defined?,  :name,  :protected_method_defined?,  :public_class_method,  :private_class_method,  :autoload,  :autoload?,  :pretty_print,  :pretty_print_instance_variables,  :pretty_print_inspect,  :singleton_class,  :dup,  :itself,  :taint,  :tainted?,  :untaint,  :untrust,  :untrusted?,  :trust,  :methods,  :singleton_methods,  :protected_methods,  :private_methods,  :public_methods,  :instance_variables,  :instance_variable_get,  :instance_variable_set,  :instance_variable_defined?,  :remove_instance_variable,  :instance_of?,  :kind_of?,  :is_a?,  :display,  :hash,  :public_send,  :class,  :frozen?,  :tap,  :yield_self,  :then,  :extend,  :clone,  :method,  :public_method,  :singleton_method,  :define_singleton_method,  :=~,  :!~,  :nil?,  :eql?,  :respond_to?,  :object_id,  :send,  :to_enum,  :enum_for,  :pretty_inspect,  :__send__,  :!,  :instance_eval,  :instance_exec,  :!=,  :equal?,  :__id__]

You see a long list of methods. Note that calling methods on a class lists what the class object itself responds to, so these are the methods BasicObject has as an instance of Class. We can ask for the same list for the module Kernel:

Kernel.methods
# => [:puts,  :readline,  :readlines,  :p,  :Complex,  :Float,  :caller,  :caller_locations,  :set_trace_func,  :sprintf,  :format,  :Integer,  :String,  :Array,  :Hash,  :local_variables,  :fork,  :Pathname,  :exit,  :pp,  :warn,  :test,  :raise,  :gets,  :fail,  :global_variables,  :__method__,  :__callee__,  :__dir__,  :proc,  :lambda,  :eval,  :iterator?,  :block_given?,  :catch,  :throw,  :loop,  :sleep,  :rand,  :srand,  :trap,  :select,  :`,  :trace_var,  :untrace_var,  :load,  :at_exit,  :require_relative,  :require,  :autoload?,  :autoload,  :binding,  :Rational,  :exec,  :exit!,  :system,  :spawn,  :abort,  :syscall,  :open,  :printf,  :print,  :putc,  :instance_method,  :public_instance_method,  :<=>,  :define_method,  :<=,  :>=,  :==,  :===,  :included_modules,  :include?,  :ancestors,  :attr,  :attr_reader,  :attr_writer,  :attr_accessor,  :instance_methods,  :public_instance_methods,  :protected_instance_methods,  :private_instance_methods,  :constants,  :freeze,  :inspect,  :const_set,  :const_get,  :const_source_location,  :const_defined?,  :class_variable_set,  :class_variables,  :remove_class_variable,  :class_variable_get,  :const_missing,  :class_variable_defined?,  :<,  :private_constant,  :>,  :singleton_class?,  :public_constant,  :deprecate_constant,  :prepend,  :include,  :module_exec,  :to_s,  :module_eval,  :class_exec,  :class_eval,  :pretty_print_cycle,  :remove_method,  :undef_method,  :alias_method,  :method_defined?,  :public_method_defined?,  :private_method_defined?,  :name,  :protected_method_defined?,  :public_class_method,  :private_class_method,  :pretty_print,  :pretty_print_instance_variables,  :pretty_print_inspect,  :singleton_class,  :dup,  :itself,  :taint,  :tainted?,  :untaint,  :untrust,  :untrusted?,  :trust,  :methods,  :singleton_methods,  :protected_methods,  :private_methods,  :public_methods,  :instance_variables,  :instance_variable_get,  :instance_variable_set,  :instance_variable_defined?,  
#    :remove_instance_variable,  :instance_of?,  :kind_of?,  :is_a?,  :display,  :hash,  :public_send,  :class,  :frozen?,  :tap,  :yield_self,  :then,  :extend,  :clone,  :method,  :public_method,  :singleton_method,  :define_singleton_method,  :=~,  :!~,  :nil?,  :eql?,  :respond_to?,  :object_id,  :send,  :to_enum,  :enum_for,  :pretty_inspect,  :__send__,  :!,  :instance_eval,  :instance_exec,  :!=,  :equal?,  :__id__]

Now you see an even longer list of methods than before. We can check the sizes of these lists:

BasicObject.methods.size # => 118
Kernel.methods.size # => 175

We can also see precisely which methods belong to Kernel but not to BasicObject:

Kernel.methods - BasicObject.methods
# => [:puts,  :readline,  :readlines,  :p,  :Complex,  :Float,  :caller,  :caller_locations,  :set_trace_func,  :sprintf,  :format,  :Integer,  :String,  :Array,  :Hash,  :local_variables,  :fork,  :exit,  :pp,  :Pathname,  :warn,  :test,  :raise,  :gets,  :fail,  :global_variables,  :__method__,  :__callee__,  :__dir__,  :proc,  :lambda,  :eval,  :iterator?,  :block_given?,  :catch,  :throw,  :loop,  :sleep,  :rand,  :srand,  :trap,  :select,  :`,  :trace_var,  :untrace_var,  :load,  :at_exit,  :require_relative,  :require,  :binding,  :Rational,  :exec,  :exit!,  :system,  :spawn,  :abort,  :syscall,  :open,  :printf,  :print,  :putc]

But wait:

Object.methods.size # => 118

What is going on here? Shouldn’t Object have at least 175 methods like Kernel? Actually, no. Ruby does not support multiple inheritance. One of the methods defined in BasicObject is :superclass. Obviously, BasicObject does not have a superclass (parent class):

BasicObject.superclass # => nil

But Object has:

Object.superclass # => BasicObject

So Object inherits from BasicObject. We can even check the following:

Object.methods == BasicObject.methods # => true

What are PP::ObjectMixin and Kernel doing “above” Object when we look at the ancestors of MySimpleClass? Simple: as we saw before, PP::ObjectMixin and Kernel are modules, not classes, and we know that a class can only have one superclass. While a class can inherit from only one other class, it can include as many modules as we want, following the principle of composition. Therefore we can check:

Object.included_modules
# => [PP::ObjectMixin, Kernel]
MySimpleClass.included_modules
# => [PP::ObjectMixin, Kernel]

When I created MySimpleClass, it automatically inherited from Object, which includes the modules PP::ObjectMixin and Kernel, and therefore MySimpleClass also includes these modules. The correct way to read the ancestors list is as follows: MySimpleClass inherits from Object, Object includes PP::ObjectMixin and Kernel, and Object inherits from BasicObject.
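
The way inheritance and inclusion combine into a single chain can be sketched with a few empty constructs. The names Animal, Dog, Walkable, and Loggable below are hypothetical and exist only for this illustration.

```ruby
# Hypothetical modules and classes, used only to show how the chain is built.
module Walkable; end
module Loggable; end

class Animal; end

class Dog < Animal
  include Walkable
  include Loggable   # the most recently included module comes first
end

chain = Dog.ancestors.take(4)
# => [Dog, Loggable, Walkable, Animal]

parent = Dog.superclass
# => Animal (a module never becomes the superclass)

has_mixin = Dog.include?(Walkable)
# => true
```

Included modules are placed between the class and its superclass, which is the same arrangement we saw for Object, PP::ObjectMixin, Kernel, and BasicObject.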

How are modules included in a class? Consider the following module:

module MyModule
  def my_first_method
    puts "My first method"
  end

  def my_second_method
    puts "My second method"
  end
end

Now we create MySimpleClass and include MyModule as follows:

class MySimpleClass
  include MyModule

  def some_method
    puts "Some method"
  end
end

And we can check which instance methods are now available in MySimpleClass:

MySimpleClass.instance_methods - Object.methods
# => [:some_method, :my_first_method, :my_second_method]

In Ruby, top-level code (something similar to the scope of main in C) runs in the context of an instance of Object, and we know Object includes Kernel. Therefore, the methods defined in Kernel are available to Object and any of its descendants without the need to refer to Kernel explicitly! This includes methods we use without even thinking, such as puts, rand, raise, catch, throw, and all the others defined in Kernel. This is why we can execute:

puts "Hello!"
# => Hello!

instead of

Kernel.puts "Hello!"
# => Hello!

In fact, when you are running IRB (interactive Ruby), a REPL (Read-Eval-Print-Loop) environment, and you type

self
# => main

you get main. We never define main in Ruby. It is simply the name Ruby gives to the top-level object, which (in a language where everything is an object) is an instance of Object:

self.class
# => Object
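
As a small sketch of why puts works both with and without the Kernel prefix, the snippet below asks Ruby where puts lives. It uses only standard reflection calls. The variable names are mine.

```ruby
obj = Object.new

# puts is defined in Kernel as a private instance method...
owner = obj.method(:puts).owner                                # => Kernel
is_private = Object.private_instance_methods.include?(:puts)   # => true

# ...so it can be called without a receiver, but not with dot notation.
public_response = obj.respond_to?(:puts)                       # => false
private_response = obj.respond_to?(:puts, true)                # => true

# Kernel also answers puts at the module level, which is why Kernel.puts works.
module_level = Kernel.respond_to?(:puts)                       # => true
```

This also suggests why Kernel.methods is longer than Object.methods: the Kernel list includes these module-level copies, while Object only receives them as private instance methods.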

Manipulating Language Constructs

Let’s take a look at how Ruby allows us to interact with its “vibrant citizens,” as Perrotta describes Ruby’s language constructs. When you hear that Ruby is a dynamic language, you should know that Ruby is pretty serious about that.

As an example, consider the following class:

class AnotherClass
  attr_reader :dob   # full_name is defined explicitly below
  attr_accessor :email, :phone, :zipcode

  SEPARATOR = "-"

  def initialize(first_name, last_name, dob, email, phone, zipcode)
    @first_name = first_name
    @last_name = last_name
    @email = email
    @dob = parse_date(dob)
    @phone = parse_phone(phone)
    @zipcode = zipcode
  end

  def full_name
    name.join(" ")
  end

  def contact
    "
    #{contact_full_name}
    #{email}
    #{phone}
    #{zipcode}
    "
  end

  private

  def name
    [@first_name, @last_name]
  end

  def contact_full_name
    name.reverse.join(", ")
  end

  def parse_input(positions, input)
    # work on a copy so the caller's string is not mutated
    positions.each_with_object(input.dup) { |i, copy| copy.insert(i, SEPARATOR) }
  end

  def parse_date(date)
    parse_input([4,7],date)
  end

  def parse_phone(phone)
    parse_input([3,7],phone)
  end
end

We can now interact with Ruby’s vibrant citizens in a number of ways. First, we instantiate an object of AnotherClass:

obj = AnotherClass.new "John", "Smith", "19950223", "jsmith@domain.com", "8205550123", "501234"
# => #<AnotherClass:0x00007f98c38e08d8 @first_name="John", @last_name="Smith", @email="jsmith@domain.com", @dob="1995-02-23", @phone="820-555-0123", @zipcode="501234">
obj.full_name
# => John Smith
obj.contact
# =>    Smith, John
#       jsmith@domain.com
#       820-555-0123
#       501234

Then we can extract information from obj and AnotherClass, such as:

obj.class
# => AnotherClass
obj.class.ancestors
# => [AnotherClass, Object, PP::ObjectMixin, Kernel, BasicObject]
obj.instance_variables
# => [:@first_name, :@last_name, :@email, :@dob, :@phone, :@zipcode]
obj.public_methods - Object.public_methods
# => [:full_name, :contact, :phone=, :zipcode=, :email, :email=, :dob, :phone, :zipcode]
obj.private_methods - Object.private_methods
# => [:parse_date, :parse_phone, :contact_full_name, :parse_input, :name, :autoload, :autoload?]
# We can list the parameters of any given method, if any
[:parse_date, :parse_phone].map{|m| obj.method(m).parameters.map{|params| {method: m, params: params[1..-1]}}.flatten}
# => [[{:method=>:parse_date, :params=>[:date]}], [{:method=>:parse_phone, :params=>[:phone]}]]
AnotherClass.constants
# => [:SEPARATOR]
AnotherClass.name
# => "AnotherClass"
# We can infer what methods are setters
(AnotherClass.instance_methods - Object.instance_methods).select{|m| m.to_s.include?("=")}
# => [:phone=, :zipcode=, :email=]

The above is far from exhaustive. It is undoubtedly good to interact with the language constructs in Ruby dynamically. How we do it is even better.
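
Introspection is not limited to reading. As a minimal sketch (the class Account is hypothetical), instance variables can be both read and written from the outside without any accessor methods:

```ruby
# Hypothetical class, used only for illustration. It defines no accessors.
class Account
  def initialize(owner)
    @owner = owner
    @balance = 0
  end
end

account = Account.new("Ana")

# Read every instance variable into a hash without any reader methods.
state = account.instance_variables.to_h do |ivar|
  [ivar, account.instance_variable_get(ivar)]
end

# Write an instance variable from the outside, again without a setter.
account.instance_variable_set(:@balance, 100)
new_balance = account.instance_variable_get(:@balance)   # => 100
```

This bypasses encapsulation, so it tends to be more appropriate in tooling (serializers, debuggers, test helpers) than in everyday application code.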

Dynamic Dispatch

Dynamic Dispatch is a technique that allows us to treat a method name as an argument that can be passed to another method that handles its execution. When we create instance methods for any given class in Ruby, we typically call them using the dot notation. As an example, consider the code below:

class MySimpleClass
  def my_simple_method(string1, string2)
    string1 + " " + string2
  end
end

obj = MySimpleClass.new
obj.my_simple_method("Hello","World") # => "Hello World"

Alternatively, we can obtain the same result using the method :send as follows:

obj.send(:my_simple_method,"Hello","World") # => "Hello World"

How dynamic is Ruby? Let’s say that the last definition of MySimpleClass was the very first we created. If at some later point I write:

class MySimpleClass
  def some_other_method
    puts "Some other method"
  end
end

and I request the instance methods of MySimpleClass, I obtain:

MySimpleClass.instance_methods - Object.methods
# => [:some_method, :some_other_method, :my_first_method, :my_second_method]

And if all the earlier definitions of MySimpleClass have been loaded in the same session, then we obtain:

MySimpleClass.instance_methods - Object.methods
# => [:my_simple_method, :some_method, :some_other_method, :my_first_method, :my_second_method]

Therefore we can modify the structure of a class while a program is running.
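
A compact way to see this, sketched with a hypothetical Counter class, is to create an object first and reopen its class afterwards. The object that already exists picks up the new method.

```ruby
class Counter               # hypothetical class, first definition
  def initialize
    @count = 0
  end

  def increment
    @count += 1
  end
end

counter = Counter.new       # created before the class is reopened
counter.increment

class Counter               # reopening adds to the class, it does not replace it
  def reset
    @count = 0
  end
end

methods_now = Counter.instance_methods(false).sort   # => [:increment, :reset]
after_reset = counter.reset                          # => 0, works on the old instance
```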

To show one form of Dynamic Dispatch in action, consider the following class:

class AnotherSimpleClass
  def first_method_with_no_arguments
    puts "First method with no arguments"
  end

  def second_method_with_no_arguments
    puts "Second method with no arguments"
  end

  def third_method_with_no_arguments
    puts "Third method with no arguments"
  end

  def first_method_with_two_arguments(string1, string2)
    puts string1 + " " + string2
  end

  def second_method_with_two_argumetns(string1, string2)
    puts string1 + " => " + string2
  end
end

For any instance method in AnotherSimpleClass, we can inspect its parameters:

obj = AnotherSimpleClass.new
obj.method(:first_method_with_no_arguments).parameters # => []
obj.method(:first_method_with_two_arguments).parameters # => [[:req, :string1], [:req, :string2]]

Therefore we can manipulate these language constructs to call these methods dynamically. Let’s say that I want to call all methods with no arguments. I can do the following:

methods_no_arguments = (obj.methods - Object.methods).select{|m| obj.method(m).parameters.empty?}
# => [:first_method_with_no_arguments, :second_method_with_no_arguments, :third_method_with_no_arguments]
methods_no_arguments.each{|m| obj.send(m) }
# => First method with no arguments
# => Second method with no arguments
# => Third method with no arguments

For invoking only the methods with two arguments, I proceed as follows:

methods_two_arguments = (obj.methods - Object.methods).select{|m| obj.method(m).parameters.size == 2}
# => [:first_method_with_two_arguments, :second_method_with_two_argumetns]
methods_two_arguments.each{|m| obj.send(m, "Hello", "World")}
# => Hello World
# => Hello => World

The method send will call any method in the respective class, including private methods. If you want to confine the dynamic execution of methods to public methods, you can use public_send instead.
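
The difference between the two can be sketched as follows. The class Vault and its methods are hypothetical. The last lines show the typical use of Dynamic Dispatch, where the method name arrives as data.

```ruby
class Vault                  # hypothetical class
  def describe
    "a vault"
  end

  private

  def combination
    "12-34-56"
  end
end

vault = Vault.new

via_send = vault.send(:combination)   # => "12-34-56", privacy is bypassed

begin
  vault.public_send(:combination)
  via_public_send = :called
rescue NoMethodError
  via_public_send = :rejected         # public_send respects `private`
end

# The method name can come from data, e.g. user input or configuration.
requested = "describe"
result = vault.public_send(requested) if vault.respond_to?(requested)
```

When the method name comes from outside the program, public_send combined with a respond_to? check is usually the safer default.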

Dynamic Methods

We already saw that we could add methods to an existing class as if we were creating the class for the first time. But there is a shorter way to do that. Consider our existing AnotherSimpleClass. We can dynamically define a new method as follows:

AnotherSimpleClass.define_method :my_new_method do |arg1, arg2|
  puts "arg1 = #{arg1}"
  puts "arg2 = #{arg2}"
end

obj = AnotherSimpleClass.new
obj.my_new_method("Hello", [1,2,3,4])
# => arg1 = Hello
# => arg2 = [1, 2, 3, 4]

obj.methods - Object.methods
# => [:first_method_with_two_arguments, :second_method_with_two_argumetns, :my_new_method, :first_method_with_no_arguments, :second_method_with_no_arguments, :third_method_with_no_arguments]

When it comes to metaprogramming, the advantage of using define_method instead of def is that the new method’s name is an ordinary argument, so it can be computed at runtime and passed in the same way we pass arguments to any other method.
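
As a sketch of that advantage, the hypothetical Order class below generates one predicate method per entry in a list. The list could just as well be loaded from a file or a database at runtime.

```ruby
class Order                  # hypothetical class
  STATUSES = %w[pending shipped delivered]

  attr_reader :status

  def initialize(status)
    @status = status
  end

  # One method per status: pending?, shipped?, delivered?
  STATUSES.each do |name|
    define_method("#{name}?") { status == name }
  end
end

order = Order.new("shipped")
is_shipped = order.shipped?    # => true
is_pending = order.pending?    # => false
```

Writing the same three methods with def would mean repeating nearly identical code, and adding a status would require editing the class body.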

Ghost Methods

What happens when we call a method in Ruby? Consider the instance obj = AnotherSimpleClass.new. When we call the method :first_method_with_no_arguments, Ruby looks through the instance methods of obj’s class and its ancestors, trying to find that method. If it finds it, it will call it. If it does not find it, then it will look for an implementation of :method_missing, a private method whose default implementation lives in BasicObject:

BasicObject.private_methods.size # => 87
BasicObject.private_methods.select{|m| m.to_s.include?("missing")}
# => [:respond_to_missing?, :method_missing]

If no class in the ancestor chain overrides :method_missing (I will talk about this later), Ruby ends up in BasicObject’s default implementation, which raises NoMethodError. Let’s see this in practice:

obj = AnotherSimpleClass.new
obj.crazy
# => NoMethodError (undefined method `crazy' for #<AnotherSimpleClass:0x00007fe3be15db60>)

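The name :method_undefined may look related, but it plays no part in this lookup. It is a private hook defined on Module, which Ruby invokes when a method is undefined with undef_method. That is why it shows up among the private methods of the class object BasicObject:

BasicObject.private_methods.select{|m| m.to_s.include?("undefined")}
# => [:method_undefined, :singleton_method_undefined]

An ordinary instance such as obj does not have it, so even :send cannot reach it there: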

obj.send(:method_undefined, :crazy)
# => NoMethodError (undefined method `method_undefined' for #<AnotherSimpleClass:0x00007fe3be15db60>)
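
To keep the two names apart, here is a small sketch with hypothetical classes Plain and Tracked. The first part shows the default behavior for an unknown method. The second shows when Ruby actually invokes method_undefined.

```ruby
class Plain; end             # hypothetical class with no method_missing override

# 1. With no override, the default method_missing raises NoMethodError.
begin
  Plain.new.crazy
  outcome = :returned
rescue NoMethodError => e
  outcome = :no_method_error
  missing_name = e.name      # => :crazy
end

# 2. method_undefined is a hook on the class side: Ruby calls it when a
#    method is removed with undef_method.
class Tracked                # hypothetical class
  @undefined = []

  class << self
    attr_reader :undefined

    def method_undefined(name)
      @undefined << name
      super
    end
  end

  def temporary; end
  undef_method :temporary
end

hook_calls = Tracked.undefined   # => [:temporary]
```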

Okay, we know how Ruby calls methods and what happens when it cannot find them. But what does it mean to look for an implementation of :method_missing?

Method Missing

There is no such thing as a compiler to enforce method calls in Ruby. Crazy, right? Even crazier is the fact that Ruby allows you to call methods that don’t exist! Let me give you one example of how useful this can be. I will create a data set called data:

data = []
data << {name: "John", age: 25, gender: "M", state: "CO"}
data << {name: "Mary", age: 23, gender: "F", state: "CO"}
data << {name: "Gloria", age: 20, gender: "F", state: "FL"}
data << {name: "Paul", age: 23, gender: "M", state: "CA"}
data << {name: "Barb", age: 26, gender: "F", state: "TX"}
data << {name: "Jerry", age: 29, gender: "M", state: "TX"}
# => [{:name=>"John", :age=>25, :gender=>"M", :state=>"CO"}, {:name=>"Mary", :age=>23, :gender=>"F", :state=>"CO"}, {:name=>"Gloria", :age=>20, :gender=>"F", :state=>"FL"}, {:name=>"Paul", :age=>23, :gender=>"M", :state=>"CA"}, {:name=>"Barb", :age=>26, :gender=>"F", :state=>"TX"}, {:name=>"Jerry", :age=>29, :gender=>"M", :state=>"TX"}]

I will now create a class MyDatabase and initialize it passing data as an argument. I will also create an implementation for :method_missing so we can take advantage of dynamically creating and calling methods in Ruby:

class MyDatabase
  attr_reader :data

  def initialize data
    @data = data
  end

  def method_missing(m, *args)
    # I am looking for a pattern like part1_part2_part3 or part1_part2_part3_part4
    # the method split takes some character or string as a separator and creates an array
    parts = m.to_s.split("_")

    # In the first condition:
    # Check if there are three parts
    # Check if all the parts have content
    # Check if the array of hashes includes the informed key

    # The second condition is similar to the first, except this time:
    # Check if there are four parts
    if parts.size == 3 && parts.map{|a| !a.empty? }.uniq == [true] &&
       data[0].keys.include?(parts[2].to_sym)

      data.send(parts[0].to_sym){|d| d[parts[2].to_sym] == args[0]}
    elsif parts.size == 4 && parts.map{|a| !a.empty? }.uniq == [true] &&
          data[0].keys.include?(parts[3].to_sym)

      data.send(parts[0..1].join("_")){|d| d[parts[3].to_sym] == args[0]}
    else
      # if the conditions I specified are not met, I pass control to the
      # original implementation of method_missing, which will raise
      # NoMethodError
      super
    end
  end
end

We can instantiate MyDatabase passing the array data as an argument:

db = MyDatabase.new data
db.data
# => [{:name=>"John", :age=>25, :gender=>"M", :state=>"CO"}, {:name=>"Mary", :age=>23, :gender=>"F", :state=>"CO"}, {:name=>"Gloria", :age=>20, :gender=>"F", :state=>"FL"}, {:name=>"Paul", :age=>23, :gender=>"M", :state=>"CA"}, {:name=>"Barb", :age=>26, :gender=>"F", :state=>"TX"}, {:name=>"Jerry", :age=>29, :gender=>"M", :state=>"TX"}]

We can now do:

db.find_by_name("Gloria")
# => {:name=>"Gloria", :age=>20, :gender=>"F", :state=>"FL"}
db.find_by_state("TX")
# => {:name=>"Barb", :age=>26, :gender=>"F", :state=>"TX"}
db.find_all_by_age(23)
# => [{:name=>"Mary", :age=>23, :gender=>"F", :state=>"CO"}, {:name=>"Paul", :age=>23, :gender=>"M", :state=>"CA"}]
db.find_all_by_gender("F")
# => [{:name=>"Mary", :age=>23, :gender=>"F", :state=>"CO"}, {:name=>"Gloria", :age=>20, :gender=>"F", :state=>"FL"}, {:name=>"Barb", :age=>26, :gender=>"F", :state=>"TX"}]
db.find_by_country("US")
# => NoMethodError (undefined method `find_by_country' for #<MyDatabase:0x00007fc14e91b3d0>)
db.find_all_by_weight(180)
# => NoMethodError (undefined method `find_all_by_weight' for #<MyDatabase:0x00007fc14e91b3d0>)

This is just a toy example to show what kind of features one can build by dynamically manipulating language constructs in Ruby. Perhaps the most prominent example of dynamic method execution via implementations of :method_missing is ActiveRecord, the object-relational mapping layer in Rails. Here is one example.
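
One detail the toy example leaves out is respond_to?. Ghost methods are invisible to it unless the class also overrides respond_to_missing?. The following is a minimal sketch with a hypothetical Settings class, not a description of how ActiveRecord does it.

```ruby
class Settings               # hypothetical class
  def initialize(values)
    @values = values         # e.g. {theme: "dark", font_size: 12}
  end

  # Ghost readers: settings.theme, settings.font_size, ...
  def method_missing(name, *args)
    if @values.key?(name) && args.empty?
      @values[name]
    else
      super                  # unknown names still raise NoMethodError
    end
  end

  # Keep respond_to? and method(...) consistent with the ghost methods.
  def respond_to_missing?(name, include_private = false)
    @values.key?(name) || super
  end
end

settings = Settings.new({theme: "dark", font_size: 12})
theme = settings.theme                         # => "dark"
knows_theme = settings.respond_to?(:theme)     # => true
knows_color = settings.respond_to?(:color)     # => false
```

Defining the two methods together keeps the object honest: anything it claims to respond to can actually be called, and vice versa.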

Dynamic Proxy

In the previous example with MyDatabase, I receive whatever is passed on via a method call and try to make sense of the call using pre-defined patterns. If the conditions specified are met, a method is dynamically called, returning the associated result. A similar approach is known as Dynamic Proxy. We still use the idea of Ghost Methods, but this time we forward the call to another method (which can be in another module or class). The most significant difference between Ghost Methods and Dynamic Proxy is how responsibility is handled. When working with Ghost Methods in a particular class, you have the responsibility of implementing :method_missing and deciding when and how to give up and let Ruby raise NoMethodError. With Dynamic Proxy, you forward the responsibility to another method and treat each situation according to whatever rules are in place there.

Here is an example: we have class Person, and we want to “monitor” any call to a method :parse, but we don’t want to implement the logic. Instead, we will forward the logic to JSON.parse. So whatever rule JSON implemented for :parse will take place only when the method :parse is called for an instance of Person.

require 'json'

class Person
  attr_accessor :name, :age

  def method_missing(m, *args)
    if m == :parse
      data = JSON.parse(args[0])
      if data.keys.map{|d| self.respond_to? d}.uniq == [true]
        self.name = data["name"]
        self.age = data["age"]
        self.to_s
      end
    else
      super
    end
  end

  def to_s
    "Person => name: #{name}, age: #{age}"
  end
end

We can now call :parse on an instance of Person:

person = Person.new
person.parse('{"name": "John", "age": "25"}')
# => "Person => name: John, age: 25"

However, when we try to parse a different string, we obtain an error:

person.parse('{"name" => "John", "age" => "25"}')
# => unexpected token at '{"name" => "John", "age" => "25"}' (JSON::ParserError)

And that decision was made by JSON’s implementation of :parse.

If we try something other than parse and it is a method that is not present in the list of methods of Person’s ancestors, then we obtain the default behavior for undefined methods:

person.infuse('{"name" => "John", "age" => "25"}')
# => undefined method `infuse' for #<Person:0x00007f9ba818a380>
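
The Person example forwards a single method. A more general form of the same idea, sketched below with a hypothetical LoggingProxy class, wraps any object, records each call, and lets the wrapped object decide what to do with it.

```ruby
class LoggingProxy           # hypothetical class
  attr_reader :calls

  def initialize(target)
    @target = target
    @calls = []
  end

  def method_missing(name, *args, &block)
    if @target.respond_to?(name)
      @calls << name                            # record the call...
      @target.public_send(name, *args, &block)  # ...then let the target decide
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

list = LoggingProxy.new([3, 1, 2])
sorted = list.sort               # forwarded to Array#sort
total = list.sum { |n| n * 2 }   # blocks are forwarded too
```

After these two calls, list.calls holds [:sort, :sum]. Note that methods Object already defines (such as to_s or inspect) never reach method_missing, which is exactly the limitation the next section deals with.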

Blank Slate

Now let’s assume that for some reason, I thought that it was a great idea to implement a Dynamic Proxy for any method call starting with “display” for a new class called MyNewClass. My goal is to return just the object ID. So I create MyNewClass as follows:

class MyNewClass

  def method_missing(m, *args)
    if m.to_s.include?("display")
      "Object ID: #{self.__id__}"
    else
      super
    end
  end
end

So I try calling the method :display_info:

obj = MyNewClass.new
obj.display_info
# => Object ID: 70138108065880

Everything seems to be working nicely, but not quite. When I try calling just :display, the following happens:

obj.display
# => #<MyNewClass:0x00007ffdce0165e8>

This is not what I was expecting. This happens because MyNewClass’s parent class is Object, and Object implements an instance method :display. Therefore, when I call :display, Ruby looks for a method :display in the list of methods, including Object. Ruby will find Object’s implementation of :display, which just prints the bare object and returns nil. This is a simple example of a problem that can occur very frequently when using Dynamic Proxy, especially in larger projects: the name of a “Ghost Method” can match the name of an existing method that belongs to one of the object’s class ancestors.

Most of the time, we need a fully-featured object with all the methods defined in Object. Some other times, we need something simpler. A class with a minimum number of methods is referred to as a Blank Slate. One way to solve our problem is to modify MyNewClass to inherit from BasicObject instead of implicitly inheriting from Object.

The class Object has 58 instance methods:

Object.instance_methods
# => [:instance_variable_defined?, :remove_instance_variable, :instance_of?, :kind_of?, :is_a?, :tap, :instance_variable_get, :instance_variable_set, :instance_variables, :singleton_method, :method, :public_send, :define_singleton_method, :public_method, :extend, :to_enum, :enum_for, :<=>, :===, :=~, :!~, :eql?, :respond_to?, :freeze, :inspect, :object_id, :send, :to_s, :display, :nil?, :hash, :class, :singleton_class, :clone, :dup, :itself, :yield_self, :then, :taint, :tainted?, :untaint, :untrust, :untrusted?, :trust, :frozen?, :methods, :singleton_methods, :protected_methods, :private_methods, :public_methods, :equal?, :!, :__id__, :==, :instance_exec, :!=, :instance_eval, :__send__]

The class BasicObject has only 8 instance methods:

BasicObject.instance_methods
# => [:equal?, :!, :__id__, :==, :instance_exec, :!=, :instance_eval, :__send__]

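More importantly, BasicObject does not implement :display. So we can modify MyNewClass as follows (in a fresh session, because reopening an existing class with a different superclass raises a TypeError):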

class MyNewClass < BasicObject

  def method_missing(m, *args)
    if m.to_s.include?("display")
      "Object ID: #{self.__id__}"
    else
      super
    end
  end
end

MyNewClass now inherits from BasicObject, which makes MyNewClass a Blank Slate. Of course, we lose most of the functionality we would need for a more comprehensive class, including everything provided by Kernel. But for the sake of this illustration, with the modification above, we can now call all variations of “display”, including :display, and we will obtain the expected result:

obj = MyNewClass.new
puts obj.display
# => Object ID: 70183204985940
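
To show how much a Blank Slate lets through, here is a sketch of a hypothetical Recorder class that simply records every message it receives. Names that Object would normally answer, such as display, inspect, and to_s, all land in method_missing.

```ruby
class Recorder < BasicObject   # hypothetical blank-slate class
  def initialize
    @messages = []
  end

  # Almost every call lands here, including names such as display or
  # inspect that Object would otherwise answer.
  def method_missing(name, *args)
    @messages << name
    self                       # return self so calls can be chained
  end

  def recorded
    @messages
  end
end

recorder = Recorder.new
recorder.display.inspect.to_s
log = recorder.recorded        # => [:display, :inspect, :to_s]
```

One practical consequence of the missing Kernel: inside such a class, puts is not available and constants are not resolved through Object, so you would write ::Kernel.puts rather than puts.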

Code that Writes Code

I told you before that I don’t like the “code that writes code” definition of metaprogramming, but that doesn’t mean we can’t have fun with it. Here is a simple example of creating classes and instantiating objects of these classes dynamically. Imagine that I have two files: person.csv

name,age,gender,state
John,25,M,CO
Mary,23,F,CO
Gloria,20,F,FL
Paul,23,M,CA
Barb,26,F,TX
Jerry,29,M,TX

and product.csv

code,name,price
T252XL,Ink Cartridge,34.99
A320,Printer,299.32
532A,Monitor,345.62
9932,Mouse,32.95

I will write code that reads the content of person.csv, creates a class Person, defines its attributes based on the first line of the file, and then instantiates objects of Person with the data in the remainder of the file. In fact, the code will work for any CSV file following the same pattern, so the same will occur for product.csv.

# process_csv.rb

files = Dir["*.csv"]

database = []

files.each do |filename|
  class_name = filename.split(".")[0].capitalize
  lines = File.readlines(filename)
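  attributes = lines[0].strip.split(",").map(&:to_sym)
  data = lines[1..-1].map { |d| d.strip.split(",") }

  # Class.new takes a block (a closure), so the local variable `attributes`
  # is visible inside it and no global variable is needed.
  new_class = Class.new(Object) do
    attr_accessor(*attributes)

    # define_method also takes a block, so `attributes` stays in scope here.
    # A plain `def` would open a new scope and lose access to it.
    define_method(:initialize) do |*args|
      attributes.zip(args) do |attribute, value|
        instance_variable_set("@#{attribute}", value)
      end
    end
  end

  # Bind the anonymous class to a constant such as Person or Product.
  Object.const_set(class_name, new_class)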

  collection = data.map do |d|
    new_class.new(*d)
  end
  database << {class: new_class, data: collection}
end

database.each do |db|
  puts db[:class]
  puts "==============================================================================="
  db[:data].each do |row|
    puts row.inspect
  end
  puts ""
end

Now I can run process_csv.rb, which prints the following:

Person
===============================================================================
#<Person:0x00007fb7a403bea8 @name="John", @age="25", @gender="M", @state="CO">
#<Person:0x00007fb7a403b930 @name="Mary", @age="23", @gender="F", @state="CO">
#<Person:0x00007fb7a403b188 @name="Gloria", @age="20", @gender="F", @state="FL">
#<Person:0x00007fb7a403a648 @name="Paul", @age="23", @gender="M", @state="CA">
#<Person:0x00007fb7a4039360 @name="Barb", @age="26", @gender="F", @state="TX">
#<Person:0x00007fb7a4038e88 @name="Jerry", @age="29", @gender="M", @state="TX">

Product
===============================================================================
#<Product:0x00007fb7a48d52d0 @code="T252XL", @name="Ink Cartridge", @price="34.99">
#<Product:0x00007fb7a48d4f88 @code="A320", @name="Printer", @price="299.32">
#<Product:0x00007fb7a48d4ba0 @code="532A", @name="Monitor", @price="345.62">
#<Product:0x00007fb7a48d4768 @code="9932", @name="Mouse", @price="32.95">
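
The generated classes behave like hand-written ones. The sketch below illustrates this with an in-memory string instead of a file so that it runs on its own. The helper build_class and the class name SketchProduct are hypothetical names introduced only for this illustration, and the sample rows are copied from product.csv:

```ruby
# build_class is a hypothetical helper used only for this sketch.
def build_class(class_name, header)
  attributes = header.strip.split(",").map(&:to_sym)

  new_class = Class.new do
    attr_accessor(*attributes)

    define_method(:initialize) do |*args|
      attributes.zip(args) do |attribute, value|
        instance_variable_set("@#{attribute}", value)
      end
    end
  end

  Object.const_set(class_name, new_class)
end

csv_text = "code,name,price\nA320,Printer,299.32\n9932,Mouse,32.95\n"
header, *rows = csv_text.lines

# Named SketchProduct so it cannot collide with a Product class
# created elsewhere.
build_class("SketchProduct", header)
products = rows.map { |row| SketchProduct.new(*row.strip.split(",")) }

first_name = products.first.name   # reader generated by attr_accessor
products.last.price = "29.95"      # writer generated by attr_accessor
generated = SketchProduct.instance_methods(false).sort

puts first_name            # prints Printer
puts products.last.price   # prints 29.95
puts generated.inspect     # prints [:code, :code=, :name, :name=, :price, :price=]
```

Note that every value is stored as a String, exactly as in the output above; converting ages or prices to numbers would need an extra step that this sketch does not attempt.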

Refactoring with Metaprogramming

Now that we have seen some of the basics of metaprogramming in Ruby, let’s review a very interesting example Perrotta discusses in his book (slightly modified here for simplicity). Imagine that you are analyzing some very strange legacy Ruby code full of duplications. Your task is to improve it as much as possible. You receive two files: data_source.rb and duplicated.rb. The data_source.rb file is partially shown below:

# data_source.rb

class DS
  def initialize # ...
  def get_cpu_info(workstation_id) # ...
  def get_cpu_price(workstation_id) # ...
  def get_mouse_info(workstation_id) # ...
  def get_mouse_price(workstation_id) # ...
  def get_keyboard_info(workstation_id) # ...
  def get_keyboard_price(workstation_id) # ...
  # ... etc
end

The exact logic of DS is suppressed in the display. Just assume that when you pass a workstation_id as an argument to one of the methods in DS, DS will connect to a database and return the required information:

ds = DS.new
ds.get_cpu_info(42)     # => "2.9 Ghz quad-core"
ds.get_cpu_price(42)    # => 120
ds.get_mouse_info(42)   # => "Wireless Touch"
ds.get_mouse_price(42)  # => 60

And here is duplicated.rb:

class Computer
  def initialize(computer_id, data_source)
    @id = computer_id
    @data_source = data_source
  end

  def cpu
    info = @data_source.get_cpu_info(@id)
    price = @data_source.get_cpu_price(@id)
    "CPU: #{info} ($#{price})"
  end

  def mouse
    info = @data_source.get_mouse_info(@id)
    price = @data_source.get_mouse_price(@id)
    "Mouse: #{info} ($#{price})"
  end

  def keyboard
    info = @data_source.get_keyboard_info(@id)
    price = @data_source.get_keyboard_price(@id)
    "Keyboard: #{info} ($#{price})"
  end

  # ...
end

You know where this is going, right? You can now identify the duplications and how we can use the strategies we previously discussed to improve this code. First, we can use Dynamic Methods and Dynamic Dispatch:

class Computer
  def initialize(computer_id, data_source)
    @id = computer_id
    @data_source = data_source
    data_source.methods.grep(/^get_(.*)_info$/) { Computer.define_component $1 }
  end

  # Added explanation:
  # We only need the name of each resource, so it suffices to scan the
  # get_*_info methods; the get_*_price methods repeat the same names.
  # $1 is a special variable holding the text captured by the first group
  # of the most recent regex match, here the (.*) part of the method name.

  def self.define_component(name)
    define_method(name) do
      info = @data_source.send "get_#{name}_info", @id
      price = @data_source.send "get_#{name}_price", @id
      "#{name.capitalize}: #{info} ($#{price})"
    end
  end
end
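
To try this version without a database, one option is a stand-in data source. In the sketch below, FakeDS and its return values are assumptions made up for illustration, and the class is renamed DynamicComputer (also a hypothetical name) so the sketch runs on its own:

```ruby
# FakeDS is a hypothetical stand-in for DS; it returns canned values
# instead of querying a database.
class FakeDS
  def get_cpu_info(workstation_id)
    "2.9 Ghz quad-core"
  end

  def get_cpu_price(workstation_id)
    120
  end

  def get_mouse_info(workstation_id)
    "Wireless Touch"
  end

  def get_mouse_price(workstation_id)
    60
  end
end

# Same logic as the Computer class above, under a different name.
class DynamicComputer
  def initialize(computer_id, data_source)
    @id = computer_id
    @data_source = data_source
    # Dynamic Methods: define one method per get_*_info method found.
    data_source.methods.grep(/^get_(.*)_info$/) do
      DynamicComputer.define_component $1
    end
  end

  def self.define_component(name)
    define_method(name) do
      # Dynamic Dispatch: the method name is built at runtime.
      info = @data_source.send "get_#{name}_info", @id
      price = @data_source.send "get_#{name}_price", @id
      "#{name.capitalize}: #{info} ($#{price})"
    end
  end
end

before = DynamicComputer.instance_methods(false)
computer = DynamicComputer.new(42, FakeDS.new)
after = DynamicComputer.instance_methods(false).sort

puts before.inspect    # prints []
puts after.inspect     # prints [:cpu, :mouse]
puts computer.cpu      # prints Cpu: 2.9 Ghz quad-core ($120)
puts computer.mouse    # prints Mouse: Wireless Touch ($60)
```

Two details stand out. The component methods only exist after the first object has been created, because they are defined inside :initialize. And String#capitalize turns "cpu" into "Cpu" rather than the "CPU" of the hand-written version, so the refactored output differs slightly for acronyms.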

Second, we can use Ghost Methods, and a Dynamic Proxy that is also a Blank Slate:

class Computer < BasicObject
  def initialize(computer_id, data_source)
    @id = computer_id
    @data_source = data_source
  end

  def method_missing(name, *args)
    super if !@data_source.respond_to?("get_#{name}_info")
    info = @data_source.send "get_#{name}_info", @id
    price = @data_source.send "get_#{name}_price", @id
    "#{name.capitalize}: #{info} ($#{price})"
  end
end

And so, we used all the metaprogramming strategies that we discussed in this post. Notice the method :respond_to? in Computer’s implementation of :method_missing. When :respond_to? is called on an object, it returns true if that object responds to the method whose name is passed as an argument, and false otherwise. You could ask: “But isn’t the idea of :method_missing to dynamically implement a method that does not exist?” Correct. However, we are implementing the logic of :method_missing in Computer and checking whether an associated method exists in DS. We need that method to exist in DS to make this logic work; therefore, we first check if the method exists in DS, and if not, we call the original implementation of :method_missing, which raises a NoMethodError. Otherwise, we continue with our implementation.
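
The role of the :respond_to? guard can be seen in a small experiment. In the sketch below, StubDS is a hypothetical stand-in for DS that only knows about the CPU, and the proxy is renamed GhostComputer (also a hypothetical name) so the sketch runs on its own:

```ruby
# StubDS is a hypothetical stand-in for DS with canned values.
class StubDS
  def get_cpu_info(workstation_id)
    "2.9 Ghz quad-core"
  end

  def get_cpu_price(workstation_id)
    120
  end
end

# Same logic as the BasicObject-based Computer above, under a different name.
class GhostComputer < BasicObject
  def initialize(computer_id, data_source)
    @id = computer_id
    @data_source = data_source
  end

  def method_missing(name, *args)
    # Fall back to the default behavior when the data source cannot help.
    super if !@data_source.respond_to?("get_#{name}_info")
    info = @data_source.send "get_#{name}_info", @id
    price = @data_source.send "get_#{name}_price", @id
    "#{name.capitalize}: #{info} ($#{price})"
  end
end

ds = StubDS.new
computer = GhostComputer.new(42, ds)

cpu_known = ds.respond_to?("get_cpu_info")
keyboard_known = ds.respond_to?("get_keyboard_info")
cpu_line = computer.cpu

# StubDS has no get_keyboard_info, so the guard hands the call to super.
keyboard_failed = begin
  computer.keyboard
  false
rescue NoMethodError
  true
end

puts cpu_known         # prints true
puts keyboard_known    # prints false
puts cpu_line          # prints Cpu: 2.9 Ghz quad-core ($120)
puts keyboard_failed   # prints true
```

Without the guard, the call to computer.keyboard would still fail, but the NoMethodError would be raised by the data source for get_keyboard_info, which points the reader of the error at the wrong object.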

There is More

I briefly discussed metaprogramming strategies with Ruby in this post, such as Dynamic Dispatch, Dynamic Methods, Ghost Methods, Dynamic Proxy, and Blank Slate. Paolo Perrotta refers to these strategies as “spells.” In his book, many other spells are discussed: Around Alias, Class Extension, Class Instance Variable, Class Macro, Clean Room, Code Processor, Deferred Evaluation, Flat Scope, Hook Method, Kernel Method, Lazy Instance Method, Mimic Method, Monkey Patch, Namespace, Nil Guard, Object Extension, Open Class, Prepend Wrapper, Refinement, Refinement Wrapper, Sandbox, Scope Gate, Self Yield, Shared Scope, Singleton Method, String of Code, and Symbol to Proc. Trust me: I didn’t even scratch the surface. There is much more to metaprogramming in Ruby.

Conclusions

Ruby is a dynamic language by design. Its syntax is concise and elegant, and its constructs are available for meaningful manipulations, which takes object-oriented programming to its full potential and makes metaprogramming in Ruby a delightful experience. For this reason, I find Ruby the best language for prototyping I know.



