Tuesday, August 2, 2011

A Short Folding Note

Or maybe that's "a short note on folding"

While reading through Real World Haskell a short code snippet got me thinking about folds. Chapter 4, devoted to functional programming in general, uses a brief example built around the tails function to illustrate Haskell's as-patterns. This function makes it's home in the Data.List module and, given an input string s, will return a list of all results of tail x where x is a sublist of s. It's a bit convoluted to describe in prose, but the example offered by O'Sullivan et. al. makes it clear:

[@varese ~]$ ghci
*Main> :m +Data.List
*Main Data.List> tails "foobar"

Maybe it was the fact that this function was introduced just after a lengthy discussion on left and right folds in the same chapter. Maybe it was due to an uptick in my use of folds when working with Scala. Whatever the cause, a thought grabbed hold of me: this function should be easy to implement by folding, shouldn't it? Unfortunately my previous attempt to sit down with Haskell code didn't quite work out as expected. Throw in a new FC 15 install (and all the goodies it contains) and there was no end of temptation for the unfocused. Fortunately good sense prevailed: GHC was installed, Chicken Scheme was not and it was time to start.

My intuition turned out to be correct; a version using foldl came together fairly quickly:

The overall algorithm used here is fairly straightforward. Beginning with an empty list we move left to right and do two things with every character c we find:

  • Create a new string in our accumulator list containing c

  • Add c to every string already in the accumulator

Visualizing the "left to right" evaluation of foldl gives us something like the following:

  1. r1 = append 'f' to every string in [] and add string 'f'

  2. r2 = append 'o' to every string in r1 and add string 'o'

  3. r3 = append 'o' to every string in r2 and add string 'o'

The result is exactly what we expect:

*Main> mytailsl "foobar"

So we now have foldl... what about foldr? At this point I must offer a confession; left folds have always seemed intuitive to me while right folds most definitely have not. A left fold appears to be equivalent to a simple traversal of a list from "start" to "finish", gathering data as you move along. A right folds seems artificial somehow, like some kind of inside-out list traversal. And while a simple mental model for right folds remains somewhat elusive working through this sample did ease the pain somewhat. It took a bit of work, but after getting a handle on the order of evaluation a working implementation emerged:

This algorithm is a mirror image of the one used with foldl; since we're moving right to left we have to build member strings from back to front. For every new character c we build a new string by appending c to the largest string in the accumulator. We then return a new list with the new string for a head and the old accumulator for a tail. Note that we have to account for the empty list case when evaluating the first expression. Using this version on our test string we get:

*Main> mytailsr "foobar"

Interestingly, while struggling to come to terms with right folds a related function presented itself. Taking the algorithm used in the left fold case and applying it directly to a right fold yields a function that might be called antitails. More precisely, for an input string s this function returns a list of strings (s - t) for every t in tails s.

*Main> reverse (antitails "foobar")
*Main> mytailsl "foobar"

The logic behind this operation looks something like the following:

Finally, it should be noted that the actual implementation of tails in GHC appears to be a recursive definition built around... as-patterns.