Tuesday, July 10, 2012

Ruby, Perl and Eloquence

In an attempt to make my Ruby code a bit more idiomatic I've been spending a bit of time recently with Russ Olsen's excellent Eloquent Ruby. There are many reasons to love writing Ruby code, not least of which is that Ruby deploys the same terse but expressive power of Perl while employing better overall principles of programming. The effect isn't universal; on occasion my problems with Ruby look quite a bit like my problems with Perl. Given the overall elegance of the language it seems likely that there's a "better" (or at least more idiomatic) way to accomplish my goal. And so I turn to Eloquent Ruby.

As an illustration of this tension, consider the following.

Perl has a well-deserved reputation for efficiently processing text files with regular expressions. We'll consider an example from another text I've been spending a bit of time with: Hofstadter's seminal Gödel, Escher, Bach. A simple implementation of the productions of the MIU system in Perl [1] might look like the following:

Reasonable enough, but there's a lot of magic going on here. We're relying on the "magic" variable $_ to access the current line, and to make things worse we have to obtain those lines using the INFILE identifier, which only has meaning due to a side effect of the open() call [2]. There are also those "magic" $1 and $2 variables for accessing capture groups in a regex.

The Ruby version is both slightly shorter and a bit cleaner:
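What follows is a minimal sketch of what such a version might look like, not necessarily the exact listing. The helper names miu_productions and print_productions are assumptions made for illustration, and each rule is applied only at the first (greedy) match of its regex.

```ruby
# Hypothetical helper: returns one production per applicable MIU rule.
# Capture groups are read through the "magic" globals $1 and $2.
def miu_productions(line)
  results = []
  # Rule 1: a string ending in I may have a U appended.
  results << "#{$1}IU" if line =~ /^(.*)I$/
  # Rule 2: Mx may become Mxx.
  results << "M#{$1}#{$1}" if line =~ /^M(.*)$/
  # Rule 3: III may be replaced by U.
  results << "#{$1}U#{$2}" if line =~ /^(.*)III(.*)$/
  # Rule 4: UU may be dropped.
  results << "#{$1}#{$2}" if line =~ /^(.*)UU(.*)$/
  results
end

# Hypothetical driver: File.new() plus a block gives us a scoped `line`
# instead of $_. Error handling and closing the file are ignored here.
def print_productions(path)
  File.new(path).each_line do |line|
    miu_productions(line.chomp).each { |production| puts production }
  end
end

puts miu_productions("MI")
```

Calling print_productions with the path of a file holding one MIU string per line would print the productions for each of them.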

We've made some nice strides here. The use of File.new() allows us to avoid open() and its side effects. The use of a code block allows us to remove the global $_ in favor of a scoped variable line.

But we're still stuck with $1 and $2 for those capture groups.

One can imagine an elegant object-oriented solution based on match objects. Any such implementation would have to accomplish three things:
  1. The match object will be used as the condition of an if/unless expression so nil should be returned if there's no match
  2. The match object should be bound to a variable name in scope
  3. References to capture groups in the if-clause should use the scoped variable rather than $1, $2, etc.
But remember, this exercise is only useful if we don't have to compromise on elegance. If all we're after is an explicit object-oriented solution we could go with the Python version:

That's probably not what we want. [3]

After pondering this question for a bit we realize we may not be in such a bad spot. /regex/.match(str) already returns nil if there is no match so our first requirement is satisfied. Assignment is just another expression, so our match object (or nil) will be returned to the if-expression test, helping us with our second goal. And match objects provide access to capture groups using []. So long as the assigned variable is in scope we should have everything we need. A bit of scuffling [4] brings us to the following:
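Here is a minimal sketch of that idea, again not necessarily the exact listing. The helper name miu_productions_with_match and the variable name m are assumptions made for illustration. Each if tests the result of an assignment, and the capture groups are read from the match object with [].

```ruby
# Hypothetical helper: the same four MIU rules, with no $1 or $2.
# Regexp#match returns a MatchData on success and nil on failure,
# so the assignment can double as the condition of the if.
def miu_productions_with_match(line)
  results = []
  # Rule 1: a string ending in I may have a U appended.
  if (m = /^(.*)I$/.match(line))
    results << "#{m[1]}IU"
  end
  # Rule 2: Mx may become Mxx.
  if (m = /^M(.*)$/.match(line))
    results << "M#{m[1]}#{m[1]}"
  end
  # Rule 3: III may be replaced by U.
  if (m = /^(.*)III(.*)$/.match(line))
    results << "#{m[1]}U#{m[2]}"
  end
  # Rule 4: UU may be dropped.
  if (m = /^(.*)UU(.*)$/.match(line))
    results << "#{m[1]}#{m[2]}"
  end
  results
end

puts miu_productions_with_match("MIII")
```

The parentheses around each assignment are optional, but they signal that = rather than == is intended and keep Ruby from warning about it when warnings are enabled.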

This example is free of any "magic" variables, although we have sacrificed a bit on the clarity front. It's also worth noting that we could have accomplished something very similar in Perl:

This implementation is hardly idiomatic. It's also quite a bit less clear than our earlier efforts in the language.

Where does this leave us? Do we keep in touch with our Perl roots and live with $1 in order to keep things terse and expressive? Do we sacrifice a bit of clarity and go with an object-oriented approach? Or do we do something else entirely?

Answers to this question (and others like it) are what I'm hoping to get out of Eloquent Ruby.

[1] We're ignoring nasty things like error handling and complex edge cases in order to keep the conversation focused.

[2] We could use lexical file handles here but that doesn't really change the underlying point. Even in that case we still have to call open() in order for $fh to be useful.

[3] Python does a lot of things very, very well, but this solution to this problem seems unnecessarily verbose.

[4] The requirement to declare foo in advance when using the modifier form of if was a bit surprising. Shifting to an if expression removed this requirement. The upcoming Perl version also didn't require this advance declaration when using an equivalent to the modifier form. An MRI quirk, perhaps?
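A small sketch of the behavior described in [4], with hypothetical method names. As far as I can tell this is language behavior rather than an MRI quirk: Ruby decides whether a bare name is a local variable while parsing, in source order. In the modifier form the body appears before the assignment in the condition, so the name there is read as a method call.

```ruby
# Modifier form without a prior declaration: `m` in the body is parsed
# before the assignment is seen, so it is treated as a method call.
def modifier_form(line)
  "#{m[1]}IU" if (m = /^(.*)I$/.match(line))
end

# Declaring the variable first makes `m` a known local in the body.
def modifier_form_predeclared(line)
  m = nil
  "#{m[1]}IU" if (m = /^(.*)I$/.match(line))
end

# Statement form: the assignment is parsed before the body, so no
# advance declaration is needed.
def statement_form(line)
  if (m = /^(.*)I$/.match(line))
    "#{m[1]}IU"
  end
end

begin
  modifier_form("MI")
rescue NameError => e
  puts "modifier form failed: #{e.class}"
end

puts modifier_form_predeclared("MI")
puts statement_form("MI")
```

Perl's my declarations follow different scoping rules, so the comparison with the Perl version is only a loose one.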

1 comment:

  1. I realize this is an older post, but maybe this question will be seen, anyhow. It seems to me that this code doesn't correctly handle the cases where rules three and four may have multiple applicable matches. For instance, the first perl script, when run with an input file containing just MUUIUU prints just


    but not MIUU. How could this code be modified (while maintaining the simplicity) to account for multiple matches?