Thursday, January 20, 2011

Beautiful Code 1: On Pointers in Go and Regular Expressions

Before we proceed permit me a word or two of explanation.

A few weeks ago I began working through Beautiful Code and almost immediately it became clear that this was going to be a problem. The first chapter documents an implementation of regular expressions in C, yet pondering the ideas contained within this code led to an digression on pointers and reference types in Go. That digression forms the meat of the post that follows. A quick review of the remaining chapters suggests that this probably isn't a one-time thing. This post likely represents the first of a series; thus the title. We'll see how that works out.

Thanks for listening. We now return you to your regularly scheduled blog post.

The first chapter of Oram & Wilson's work contains an analysis by Brian Kernighan (yes, that Brian Kernighan) of an elegant bit of C code. The code in question was written by Rob Pike a little over a decade ago as a simple yet useful example for The Practice of Programming. The code combines a number of C's features (strings as pointers to the first character in an array of characters, null-terminated strings, pointer arithmetic) with recursion to create a compact and elegant implementation of a simple regular expression syntax. Kernighan observes that:

"Rob's implementation itself is a superb example of beautiful code: compact, elegant, efficient and useful. It's one of the best examples of recursion that I have ever seen, and it shows the power of C pointers."


That's not hyperbole, folks.

The algorithm implemented by this code is subtle without being cryptic. In order to determine whether a given regex matches a string s with length n and consisting of characters s(0) through s(n - 1) we compare the regex to every substring of s that also ends in s(n - 1). C uses null-terminated character arrays to represent strings and a pointer to the first character of the array to refer to the string so we can enumerate all relevant substrings of s simply by incrementing the pointer to the first character via pointer arithmetic and recursively calling our comparison function with every substring generated by this process.

Go also supports pointers in order to provide call-by-reference semantics so in theory the same algorithm could be implemented in this language as well. Attempting to do so seemed like an interesting experiment, not least because Rob is somewhat involved with the Go language. Unfortunately according to the specification Go does not allow pointers to refer to individual characters in a string and it isn't clear whether pointer arithmetic is supported in any capacity [1]. All is not lost, however; Go does include a reference type for arrays and their sub-arrays in the form of slices. Using slices we can perform the processing described above in an (arguably) more explicit syntax. The results can be found on github.

A curious side effect: Python also supports slices on lists so in the end our implementation looks quite a bit like the Python implementation of this algorithm.

[1] If you don't support referencing individual array elements a lot of the arguments for pointer arithmetic don't make quite as much sense.

No comments:

Post a Comment