While reading through Real World Haskell a short code snippet got me thinking about folds. Chapter 4, devoted to functional programming in general, uses a brief example built around the tails function to illustrate Haskell's as-patterns. This function makes it's home in the Data.List module and, given an input string s, will return a list of all results of tail x where x is a sublist of s. It's a bit convoluted to describe in prose, but the example offered by O'Sullivan et. al. makes it clear:
[@varese ~]$ ghci
...
*Main> :m +Data.List
*Main Data.List> tails "foobar"
["foobar","oobar","obar","bar","ar","r",""]
Maybe it was the fact that this function was introduced just after a lengthy discussion on left and right folds in the same chapter. Maybe it was due to an uptick in my use of folds when working with Scala. Whatever the cause, a thought grabbed hold of me: this function should be easy to implement by folding, shouldn't it? Unfortunately my previous attempt to sit down with Haskell code didn't quite work out as expected. Throw in a new FC 15 install (and all the goodies it contains) and there was no end of temptation for the unfocused. Fortunately good sense prevailed: GHC was installed, Chicken Scheme was not and it was time to start.
My intuition turned out to be correct; a version using foldl came together fairly quickly:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
mytailsl x = (foldl f1 [] x) ++ [""] | |
where f1 acc newchar = (map (f2 newchar) acc) ++ [[newchar]] | |
where f2 currchar currstr = reverse (currchar : (reverse currstr)) |
The overall algorithm used here is fairly straightforward. Beginning with an empty list we move left to right and do two things with every character c we find:
- Create a new string in our accumulator list containing c
- Add c to every string already in the accumulator
Visualizing the "left to right" evaluation of foldl gives us something like the following:
- r1 = append 'f' to every string in [] and add string 'f'
- r2 = append 'o' to every string in r1 and add string 'o'
- r3 = append 'o' to every string in r2 and add string 'o'
The result is exactly what we expect:
*Main> mytailsl "foobar"
["foobar","oobar","obar","bar","ar","r",""]
So we now have foldl... what about foldr? At this point I must offer a confession; left folds have always seemed intuitive to me while right folds most definitely have not. A left fold appears to be equivalent to a simple traversal of a list from "start" to "finish", gathering data as you move along. A right folds seems artificial somehow, like some kind of inside-out list traversal. And while a simple mental model for right folds remains somewhat elusive working through this sample did ease the pain somewhat. It took a bit of work, but after getting a handle on the order of evaluation a working implementation emerged:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
mytailsr x = (foldr f1 [] x) ++ [""] | |
where f1 newchar [] = [[newchar]] | |
f1 newchar acc = [newchar:(head acc)] ++ acc |
This algorithm is a mirror image of the one used with foldl; since we're moving right to left we have to build member strings from back to front. For every new character c we build a new string by appending c to the largest string in the accumulator. We then return a new list with the new string for a head and the old accumulator for a tail. Note that we have to account for the empty list case when evaluating the first expression. Using this version on our test string we get:
*Main> mytailsr "foobar"
["foobar","oobar","obar","bar","ar","r",""]
Interestingly, while struggling to come to terms with right folds a related function presented itself. Taking the algorithm used in the left fold case and applying it directly to a right fold yields a function that might be called antitails. More precisely, for an input string s this function returns a list of strings (s - t) for every t in tails s.
*Main> reverse (antitails "foobar")
["","f","fo","foo","foob","fooba","foobar"]
*Main> mytailsl "foobar"
["foobar","oobar","obar","bar","ar","r",""]
The logic behind this operation looks something like the following:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
antitails x = reverse (foldr f1 [] x) ++ [""] | |
where f1 newchar acc = [[newchar]] ++ (map (newchar:) acc) |
Finally, it should be noted that the actual implementation of tails in GHC appears to be a recursive definition built around... as-patterns.