Heuristic Fencepost: December 2010

Tuesday, December 28, 2010

base64 in Clojure: Once more, with feeling

In a previous post we sketched out a simple base64 implementation in Clojure and measured it's performance against clojure-contrib and commons-codec. Our initial implementation used the partition function to split the input into blocks and then applied a sequence of interdependent functions to translate each block into base64-encoded data. This approach worked well enough but it's not terribly efficient; it traverses the input data once to do the partitioning and then that same data is traversed again (in block form) when the encoding function is applied. There's no reason to make multiple passes if we attack the problem recursively rather than iteratively. Time for some further investigation.

Clojure optimizes recursive calls when the loop/recur constructs are used so we deploy them here. In addition we make significant use of destructuring. Each input chunk of data is routed to the appropriate function via a sequence of cond expressions. These functions are implemented as let statements which break the chunk into bytes (via destructuring) and then create bindings for the base64 characters representing these bytes by applying the relevant bit manipulations. The body of the let doesn't need to do much beyond combining these character bindings into an appropriate return value.

This refactoring offers the following benefits:

The code is a lot cleaner and much easier to read and understand

Using the same techniques we should be able to easily implement a decoding function

We should see some increase in performance

So, how'd we do? The following is fairly representative:


bartok ~/git/base64-clojure $ clojure fencepost/test/compare_base64.clj 
Encoding
Commons-codec
"Elapsed time: 85.219348 msecs"
Xm91U1RMbHRAK1ZCcWFqeEsmJEJrfkhIdSsxfStJdGBNc2txUTBkUyQrdCM7Wk9rInN2XFBtKGNTX002aw==
clojure-contrib
"Elapsed time: 896.538841 msecs"
Xm91U1RMbHRAK1ZCcWFqeEsmJEJrfkhIdSsxfStJdGBNc2txUTBkUyQrdCM7Wk9rInN2XFBtKGNTX002aw==
fencepost/base64_letfn
"Elapsed time: 315.582591 msecs"
Xm91U1RMbHRAK1ZCcWFqeEsmJEJrfkhIdSsxfStJdGBNc2txUTBkUyQrdCM7Wk9rInN2XFBtKGNTX002aw==
fencepost-recur
"Elapsed time: 215.516695 msecs"
Xm91U1RMbHRAK1ZCcWFqeEsmJEJrfkhIdSsxfStJdGBNc2txUTBkUyQrdCM7Wk9rInN2XFBtKGNTX002aw==

The recursive encoding operation is just north of a third faster. We still take about twice as long as commons-codec (an improvement on our previous performance) but we're now over four times faster than clojure-contrib.

As referenced above a decoding function was also completed using the same techniques described for the recursive encoding function. The performance increase for that operation was just as stark:


Decoding
Commons-codec
"Elapsed time: 33.148648 msecs"
^ouSTLlt@+VBqajxK&$Bk~HHu+1}+It`MskqQ0dS$+t#;ZOk"sv\Pm(cS_M6k
clojure-contrib
"Elapsed time: 1494.2707 msecs"
^ouSTLlt@+VBqajxK&$Bk~HHu+1}+It`MskqQ0dS$+t#;ZOk"sv\Pm(cS_M6k
fencepost-recur
"Elapsed time: 268.112751 msecs"
^ouSTLlt@+VBqajxK&$Bk~HHu+1}+It`MskqQ0dS$+t#;ZOk"sv\Pm(cS_M6k

Still nowhere near as fast as commons-codec but we're a bit better than five times faster than clojure-contrib.

As always code can be found on github.

Monday, December 20, 2010

A Funny Thing Happened While Writing Some Haskell....

Suppose that you've just finished Graham Hutton's solid introduction to Haskell and now want to take the language for a quick test drive. You decide to stick with something familiar; after noting that the Prelude doesn't have an equivalent to Python's range function you decide to whip one up yourself. Haskell's syntax is still a bit unfamiliar, however, so you decide to implement this function in a different "mostly functional" language that you have some experience with. You've been looking for a reason to dip your toes into Scheme again (and it seems to be a good fit here [1]) so you install Gambit Scheme and Chicken Scheme and dig in.

Perhaps you begin with a straightforward recursive definition in Scheme:


(define (range start stop step)
 (if (< start stop)
  (cons start (range (+ start step) stop step))
  '()))

This works well enough when start < stop but fails completely when start > stop and step is a negative value (a use case supported by Python's range()). To fix this problem we define two procedures via letrec and use whichever is appropriate based on the passed arguments:


(define (newrange start stop step)
  (letrec 
   ((upto (lambda (art op ep) 
     (if (< art op)
      (cons art (upto (+ art ep) op ep)) 
      '())))
    (downto (lambda (art op ep)
     (if (> art op)
      (cons art (downto (+ art ep) op ep)) 
      '()))))
 (if (< start stop)
  (upto start stop step)
  (downto start stop step))))

This function now works for both start < stop (with positive step) and start > stop (with negative step). There's still a problem, though; if we change the sign of step in either of these cases we find ourselves in an infinite loop. Python's range() returns an empty list in this case so we probably should too:


(define (newerrange start stop step)
 (letrec
  ((upto (lambda (art op ep)
    (if (< art op)
     (cons art (upto (+ art ep) op ep)) 
     '())))
   (downto (lambda (art op ep)
    (if (> art op)
     (cons art (downto (+ art ep) op ep)) 
     '()))))
    (cond ((and (< start stop) (< step 0)) '())
          ((and (> start stop) (> step 0)) '())
          ((< start stop) (upto start stop step))
          (else (downto start stop step)))))

While working on these implementations you discover that SRFI-1 includes a function iota which looks similar to Python's range(), although this function takes the number of elements in the return list rather than the start/stop/step values we're looking for. Still, both Chicken Scheme and Gambit Scheme support SFRI-1 as extensions or modules and we should be able to whip up a thin wrapper around iota to get what we need. Of course we first need to figure out how to load up these modules. And after some initial experimentation we see some curious behaviour in the iota implementation on both platforms...

But wait... weren't we supposed to be writing some Haskell code?

Oh, yeah.

Haskell's guarded equations allow for an easy expression of the same logic we were getting from cons in Scheme. We have no need to translate the recursive logic found in the Scheme version, however, since the list-related functions in the Prelude combined with a few sections gives us everything we need:


range art op ep | art < op && ep < 0 = []
                | art > op && ep > 0 = []
                | art < op = takeWhile (<op) (iterate (+ep) art)
                | art > op = takeWhile (>op) (iterate (+ep) art)

In all honesty this is a fairly elegant expression of the algorithm. And while there's clearly a lot to like in Haskell this exercise has rekindled a long-dormant romance with Scheme rather than encouraging exploration of something new.

[1] Yes, I realize Scheme isn't "purely functional". I'm not interested in taking a stand in this particular holy war, but there's little doubt that Scheme's bias against set! and it's kin justify the "mostly functional" qualifier.