Heuristic Fencepost: January 2011

Before we proceed permit me a word or two of explanation.

A few weeks ago I began working through Beautiful Code and almost immediately it became clear that this was going to be a problem. The first chapter documents an implementation of regular expressions in C, yet pondering the ideas contained within this code led to an digression on pointers and reference types in Go. That digression forms the meat of the post that follows. A quick review of the remaining chapters suggests that this probably isn't a one-time thing. This post likely represents the first of a series; thus the title. We'll see how that works out.

Thanks for listening. We now return you to your regularly scheduled blog post.

The first chapter of Oram & Wilson's work contains an analysis by Brian Kernighan (yes, that Brian Kernighan) of an elegant bit of C code. The code in question was written by Rob Pike a little over a decade ago as a simple yet useful example for The Practice of Programming. The code combines a number of C's features (strings as pointers to the first character in an array of characters, null-terminated strings, pointer arithmetic) with recursion to create a compact and elegant implementation of a simple regular expression syntax. Kernighan observes that:

"Rob's implementation itself is a superb example of beautiful code: compact, elegant, efficient and useful. It's one of the best examples of recursion that I have ever seen, and it shows the power of C pointers."

That's not hyperbole, folks.

The algorithm implemented by this code is subtle without being cryptic. In order to determine whether a given regex matches a string s with length n and consisting of characters s(0) through s(n - 1) we compare the regex to every substring of s that also ends in s(n - 1). C uses null-terminated character arrays to represent strings and a pointer to the first character of the array to refer to the string so we can enumerate all relevant substrings of s simply by incrementing the pointer to the first character via pointer arithmetic and recursively calling our comparison function with every substring generated by this process.

Go also supports pointers in order to provide call-by-reference semantics so in theory the same algorithm could be implemented in this language as well. Attempting to do so seemed like an interesting experiment, not least because Rob is somewhat involved with the Go language. Unfortunately according to the specification Go does not allow pointers to refer to individual characters in a string and it isn't clear whether pointer arithmetic is supported in any capacity [1]. All is not lost, however; Go does include a reference type for arrays and their sub-arrays in the form of slices. Using slices we can perform the processing described above in an (arguably) more explicit syntax. The results can be found on github.

A curious side effect: Python also supports slices on lists so in the end our implementation looks quite a bit like the Python implementation of this algorithm.

[1] If you don't support referencing individual array elements a lot of the arguments for pointer arithmetic don't make quite as much sense.

Okay, it's not actually alchemy. But it is pretty cool nonetheless.

A brief refresher on terminology: a single abstract method (or SAM) interface or abstract method contains exactly one method that concrete implementations must define. Java includes a number of interfaces that satisfy this property, including Runnable, Callable and Comparable.

The basic idea at play here is that a closure or anonymous function (either will do here) can be used in place of a SAM implementation if the parameters and return type of the closure match up to those of the method. That assumes your language cares about such things, of course; duck typing makes this less of an issue (as we'll see).

JRuby automatically performs this operation via closure conversion (scroll down to the bottom of the page). Stuart Sierra has recently published a macro for doing something similar in Clojure. Even Java is considering including this feature in an eventual closure implementation (see Brian Goetz's writeup for details). Why should Scala miss out on all the fun?

Let's take a look at some code to make this discussion more tangible. A simple example of closure conversion in JRuby can be found here. This is where duck typing helps us out; the only real requirement on our closure is that we actually return something (in order to clearly distinguish between Runnable and Callable). Our objective is to implement a Scala unit test that does something similar. Any such approach will be built around Scala's support for implicit conversion of types, but in this case a bit of care and feeding is required to line up the parameters and return types of the closure with that of the contents of the SAM interface. The basic approach works as follows:

The implicit conversion accepts a function with the same parameter list and return value as the lone method in the SAM interface

The conversion then returns a new concrete instance of the SAM interface. The implementation of the method doesn't need to be anything other than invoking apply() on the input function

The resulting ScalaTest class can be found here.

Heuristic Fencepost

Thursday, January 20, 2011

Beautiful Code 1: On Pointers in Go and Regular Expressions

Friday, January 7, 2011

Transmutation of Scala closures into SAM instances