Ruby DSL & metaprogramming, part I

I’ve been working with Ruby for nearly a year now, which means I’m starting to feel the urge to tell people how awesome the language is. One of the most interesting aspects of Ruby to me is metaprogramming, which it seems to have quite a vocation for.

Since college I’ve had a fondness for automata and formal language theory. One of the topics I particularly like is text generation (if you haven’t already, check out the excellent SCIgen and the Dada engine), so I thought that building a text generator in the style of a context-free grammar (CFG) in Ruby would be a nice little exercise and an opportunity to use some of the language’s coolest features. I also implemented one of those in Java several years ago, and it was a mess, so I was curious how much of an improvement Ruby would offer.

Consider the following script:

dictionary 'noun', 'dog', 'bus'
dictionary 'verb', 'barked', 'parked'
dictionary 'preposition', 'at'

rule 'phrase', 'noun', 'verb', 'preposition', 'noun'

codex 'phrase'

We’d like dictionary to store some words according to their classes, and rule to define a specific ordering of words. For now let’s not worry about codex (it’s just a collection of rules).

At this point the seasoned programmer is mentally sketching some kind of text parser. It’s an okay solution, but isn’t there something nicer we can do? Well, there is: a domain-specific language (DSL)! In fact, Ruby is an excellent tool for building DSLs, and many well-known Ruby tools, such as RSpec, define one.

Conveniently enough, our little script is actually valid Ruby code (Ruby doesn’t require parentheses around method arguments or semicolons at the end of statements). So let’s define the dictionary, rule and codex methods:

#!/usr/bin/env ruby
require_relative 'grammar'
require_relative 'dictionary'
require_relative 'rule'
require_relative 'codex'

@grammar = Grammar.new

def dictionary key, *values
  puts "Read dictionary with: #{key} #{values.to_s}"
  Dictionary.instance.add key, values
end

def rule name, *keys
  puts "Read rule with: #{name} #{keys.to_s}"
  @grammar.rules[name] = Rule.new keys
end

def codex *rulenames
  puts "Read codex with: #{rulenames.to_s}"
  @grammar.codex << (Codex.new rulenames)
end

load 'script.le'
@grammar.generate

Notice the asterisks in the method definitions; they’re called splat operators (good reference here and here). Splats are useful for several things; in our case, we’d like each dictionary entry to have one key (e.g. ‘noun’) and several values: splat takes an arbitrary number of arguments and slurps them into one variable.
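To see the splat in isolation, here is a minimal sketch. The entry method is a hypothetical helper made up for this illustration; it is not part of the generator.

```ruby
# Hypothetical helper, for illustration only: the first argument is bound
# to `key`, and every remaining argument is collected into the array `values`.
def entry(key, *values)
  [key, values]
end

p entry('noun', 'dog', 'bus')   # prints ["noun", ["dog", "bus"]]
p entry('preposition')          # prints ["preposition", []]

# A splat also works in the other direction: at the call site it spreads
# an array over the method's arguments.
words = ['barked', 'parked']
p entry('verb', *words)         # prints ["verb", ["barked", "parked"]]
```

Note that values is always an array, even when no extra arguments are passed.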

So the basic structure is: a Grammar has one Dictionary and several Rules and Codices. Text is generated by running each codex, which applies each of its rules, which in turn combine words. Simple enough. Here’s what the output looks like:

lbrito@lbrito:~/Documents/ruby_textgen$ ruby lero.rb
Read dictionary with: noun ["dog", "bus"]
Read dictionary with: verb ["barked", "parked"]
Read dictionary with: preposition ["at"]
Read rule with: phrase ["noun", "verb", "preposition", "noun"]
Read codex with: ["phrase"]
Codex is applying phrase
Applying rule with key: noun
Fetching noun from dictionary
Applying rule with key: verb
Fetching verb from dictionary
Applying rule with key: preposition
Fetching preposition from dictionary
Applying rule with key: noun
Fetching noun from dictionary
Final result:
=======
bus barked at dog
=======

Nonsensical, but still pretty cool.
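The Grammar, Rule and Codex classes aren’t listed in this post, so here is a rough sketch of how the structure described above might fit together. It is an assumption, not the actual implementation: the apply methods are invented names, the logging is left out, and the dictionary is passed around as a plain Hash instead of the Dictionary singleton to keep the example self-contained.

```ruby
# A rule is an ordered list of dictionary keys.
class Rule
  def initialize(keys)
    @keys = keys
  end

  # Replace every key with a random word from that dictionary entry.
  def apply(dictionary)
    @keys.map { |key| dictionary.fetch(key).sample }.join(' ')
  end
end

# A codex is a collection of rule names, applied in order.
class Codex
  def initialize(rule_names)
    @rule_names = rule_names
  end

  def apply(rules, dictionary)
    @rule_names.map { |name| rules.fetch(name).apply(dictionary) }.join(' ')
  end
end

# A grammar owns the rules and the codices, and runs each codex in turn.
class Grammar
  attr_reader :rules, :codex

  def initialize
    @rules = {}
    @codex = []
  end

  # Returns one generated string per codex.
  def generate(dictionary)
    @codex.map { |codex| codex.apply(@rules, dictionary) }
  end
end

dictionary = {
  'noun' => %w[dog bus],
  'verb' => %w[barked parked],
  'preposition' => %w[at]
}

grammar = Grammar.new
grammar.rules['phrase'] = Rule.new(%w[noun verb preposition noun])
grammar.codex << Codex.new(['phrase'])

puts grammar.generate(dictionary)   # one possible result: bus barked at dog
```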

Let’s take a look at our DSL script for a while. It works fine, but isn’t very DRY. Wouldn’t it be nice to add some structure and be less repetitious? Let’s try to define Dictionary and Rule entries as functions:

dictionary
  noun 'dog', 'bus'
  verb 'barked', 'parked'
  preposition 'at'

rule
  phrase 'noun', 'verb', 'preposition', 'noun'

codex 'phrase'

Much better. But what if we wanted different nouns to belong in different dictionary entries, say, animal_nouns and vehicle_nouns? Are we supposed to define every possible word class as a separate method? That’s not very DRY!

Here’s where things start getting beautiful: Ruby has a method called method_missing (good material here and here), which by default raises a NoMethodError, but can be overridden to do some cool stuff. Let’s leverage the power of method_missing:

...

def dictionary
  @state = :dictionary
end

def rule
  @state = :rule
end

def codex *rulenames
  puts "Read codex with: #{rulenames.to_s}"
  @grammar.codex << (Codex.new rulenames)
end

@state = :idle

def method_missing method, *args, &block
  case @state
  when :dictionary
    puts "Read dictionary with: #{method.to_s} #{args.to_s}"
    Dictionary.instance.add method.to_s, args
  when :rule
    puts "Read rule with: #{method.to_s} #{args.to_s}"
    @grammar.rules[method.to_s] = (Rule.new args)
  when :idle
  else
    puts "Boom! Something went wrong. I don't know what to do with #{@state.to_s}."
  end
end

Now method_missing captures undefined methods and the appropriate entries are created depending on which function was last called (i.e. dictionary or rule). This relieves us from defining methods for noun, verb, etc. Pretty great, but we can do better. We still have some very trivial-looking method definitions in our DSL:
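For reference, here is method_missing on its own, outside the DSL. WordBag is a hypothetical class made up for this sketch. It also shows two habits that are generally recommended when overriding method_missing, although the script above skips them for brevity: calling super for anything you don’t handle, so genuine typos still raise NoMethodError, and overriding respond_to_missing? so that respond_to? stays consistent.

```ruby
# Hypothetical class, for illustration only.
class WordBag
  def initialize
    @entries = {}
  end

  def method_missing(name, *args, &block)
    key = name.to_s
    if args.any?
      @entries[key] = args      # bag.noun 'dog', 'bus' stores an entry
    elsif @entries.key?(key)
      @entries[key].sample      # bag.noun returns one of the stored words
    else
      super                     # anything else is still a NoMethodError
    end
  end

  # Keeps respond_to? in sync with what method_missing can answer.
  def respond_to_missing?(name, include_private = false)
    @entries.key?(name.to_s) || super
  end
end

bag = WordBag.new
bag.noun 'dog', 'bus'
bag.preposition 'at'

puts bag.preposition          # prints "at"
puts bag.respond_to?(:noun)   # prints "true"

begin
  bag.adjective
rescue NoMethodError => e
  puts "Still raised: #{e.class}"   # prints "Still raised: NoMethodError"
end
```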

def dictionary
  @state = :dictionary
end

def rule
  @state = :rule
end

What if we decide later on to add some new functionality to our DSL that works analogously to rules and dictionaries? We’d have to write more method definitions that just set the @state variable. Once more, Ruby offers us some DRYing magic: methods can be defined dynamically at runtime, generated from ordinary data such as a list of names.

...

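%w[rule dictionary].each do |keyword|
  # define_method belongs to Module, so at the top level we use
  # define_singleton_method to attach each keyword to the main object
  define_singleton_method(keyword) { @state = keyword.to_sym }
end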

The resulting methods behave just like the ones we defined statically with def. This may not seem like a great advantage over our previous code, but imagine if there were not 2 but 20 similar methods that could be defined dynamically: quite an improvement!

Dynamic method definition can take us further still. Instead of storing strings in a Dictionary, we might as well define a method that chooses an entry at random. Here’s the Dictionary class:

require 'singleton'

class Dictionary
  include Singleton


  def add key, values
    @dicts ||= {}
    @dicts[key] = values
  end

  def fetch key
    puts "Fetching #{key} from dictionary"
    @dicts[key].sample
  end

end

We can eliminate the need for that entire class by defining methods dynamically using a single line of code:

...

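def method_missing method, *args, &block
  case @state
  when :dictionary
    puts "Read dictionary with: #{method.to_s} #{args.to_s}"
    # self is the top-level main object here, which has no define_method,
    # so the method is attached with define_singleton_method instead
    define_singleton_method(method) { args.sample }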

...

If you’re wondering, sample picks an array element at random.
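Here is the same technique in a self-contained form. Lexicon is a hypothetical class made up for this sketch. It contrasts define_method, which is called on a class and creates ordinary instance methods, with define_singleton_method, which attaches a method to one particular object.

```ruby
# Hypothetical class, for illustration only.
class Lexicon
  attr_reader :state

  # Inside the class body self is the class, so define_method is available.
  # Each keyword becomes an instance method that records the current state.
  %w[rule dictionary].each do |keyword|
    define_method(keyword) { @state = keyword.to_sym }
  end

  # define_singleton_method adds a method to this one object only.
  # The block closes over `words`, so no separate storage is needed.
  def add(name, *words)
    define_singleton_method(name) { words.sample }
  end
end

lexicon = Lexicon.new
lexicon.dictionary
p lexicon.state            # prints :dictionary

lexicon.add(:preposition, 'at')
puts lexicon.preposition   # prints "at"
```

Because the block passed to define_singleton_method is a closure, the word list lives on inside the generated method, which is why a separate storage class is no longer needed.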

So that’s that: we successfully used some of Ruby’s core metaprogramming tools, method_missing and define_method, to improve our little program. Full code is hosted on GitHub - please note that it may differ slightly from what is presented here for didactic reasons.

In the next installment we’ll continue to improve our DSL using more interesting Ruby features.