Ruby DSL & metaprogramming, part II

In the previous installment we built a simple text generator using some Ruby meta-programming tricks. It was still far from being our desired context-free grammar (CFG) generator, though, since it lacked many CFG prerequisites. Most flagrantly, we had no rule recursion and only one production (rule definition) per rule. Here鈥檚 the what a script that would use both features:

dictionary
  noun 'dog', 'bus'
  verb 'barked', 'parked'
  preposition 'at'

rule 'phrase'
  opt 'The', noun, verb, preposition, 'a', noun
  opt 'Here goes some', phrase, 'recursion.'
  opt 'Meet me', preposition, 'the station.'

grammar phrase: 10

The dictionary section is just as we left it. Let鈥檚 see what changed in the rule section.

Previously we had only one production per rule, so rule definitions such as phrase were captured by the method_missing method. This design would make multiple productions difficult to handle. Here鈥檚 how we re-implemented the rule method:

def rule *args
  verbose "Read rule: #{args.to_s}"
  @last_rule = args.first.to_s
  @grammar.rules[@last_rule] = (Rule.new @last_rule)
  define_method(args.first.to_s) { @grammar.rules[args.first.to_s] }
  @state = :rule
end

Once more we use define_method to dynamically define methods. Consider the rule 'phrase' statement present in our script: this would define a method named phrase which hopefully returns the Rule object within @grammar.rules['phrase']. Note that the returned rule is not evaluated (i.e., it is still a Rule object, not a String object).

Now we keep track of the @last_rule so rule productions (options) are added to the appropriate rule. Options are captures by opt:

def opt *args
  if @state == :rule
    verbose "Read option for rule #{@last_rule}: #{args.to_s}"
    @grammar.rules[@last_rule].options << args
  end
end

Here, args is an array of Rule production symbols (both terminal and non-terminal, i.e., both Strings and Rules). The set of Rule options will ultimately be an Array of Arrays of Rule production symbols corresponding to each opt line written in the DSL script (e.g. in the example above, phrase would have 3 options).

Rules are evaluated by Grammar.generate, which receives a Hash of rules and the amount of times they should be generated (e.g. phrase: 10 in our example):

  def generate args
    text = ''
    args.each do |rulename, qty|
      (1..qty).each { text << @rules[rulename.to_s].to_s }
    end
    puts "Final result: \n========\n#{text}\n========"
  end

How does Rule recursion work, though? Let鈥檚 take a look at the to_s method in Rule:

  def to_s
    randkeys = options.sample
    randkeys.map! { |k| k.to_s }
    verbose "Applying rule #{@name} with keys: #{randkeys}."
    randkeys.join(" ")
  end

Pretty straightforward: a production is chosen at random (e.g., 1 of the 3 options in our example), and each symbol in the production is evaluated into a String and concatenated into the final result. For example, say the first rule, opt 'The', noun, verb, preposition, 'a', noun, is chosen. Then randkeys.map! would call to_s for each key in the production: 'The'.to_s, noun.to_s, etc. Recursion will happen if the key is a method that returns a Rule object (such as the phrase method we mentioned in the beginning of the post).

Let鈥檚 try out a CFG classic: well-formed parenthesis. Here鈥檚 the script:

rule 'par'
  opt '()'
  opt '(', par, ')'
  opt par, par

grammar par: 1

And here鈥檚 some sample output:

$ ruby lero.rb examples.le
Final result:
========
( ( ( ( () ) ) () ) () )
========
$ ruby lero.rb examples.le
Final result:
========
( () )
========
$ ruby lero.rb examples.le
Final result:
========
() ()
========

And now we鈥檙e done! With only 2 classes (Grammar and Rule) and 1 additional file that defines a DSL (lero.rb), we were able to build a CFG-like text generator with the most important CFG properties.

Full code is available in the same repository.