Sunday, October 31, 2010

base64 in Clojure: An initial attempt

After working through most of Stuart Halloway's excellent Programming Clojure I found myself in search of a meaty example to exercise my Clojure chops. Re-implementing the Project Euler examples in another language didn't seem terribly appealing. Maybe a base64 implementation; I hadn't implemented the encoding in a while and it is a reasonable exercise when learning a new language. So what kind of support does Clojure already have?

As expected there's no base64 support in the core. clojure-contrib does include some support, although the documentation doesn't inspire a great deal of confidence:


This is mainly here as an example. It is much slower
than the Apache Commons Codec implementation or
sun.misc.BASE64Encoder.


Intriguing; looks like we have a winner!

The only real complexity in a base64 implementation is handling the one- and two-byte cases. The code must return the last two bits of the first byte (or the last four bits of the second byte) as a distinct value in those cases, while in other cases it combines those bits with the leading bits of the second and third bytes (respectively). My initial implementation was a fairly straightforward adaptation of this principle. The core consists of a letfn definition with functions for handling the one-, two- and three-byte cases. Each function returns two values: a list of all 6-bit values discovered so far as well as a "remainder" value representing the trailing bits in the one- and two-byte cases. This structure leads to better function re-use; the two-byte function, for example, can build on the one-byte function. A wrapper function dispatches to the correct function for a given group of bytes, so we need only partition our byte array into groups of three or fewer and apply the wrapper function to each group. A combination of the partition and map functions does exactly that.
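The bit manipulation itself is language-neutral. As an illustration (sketched in Go rather than Clojure, and not the code from this post), each group of one to three bytes maps onto base64 characters roughly like so:

```go
package main

import "fmt"

const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// encodeGroup converts a group of one, two or three bytes into four
// base64 characters, padding with '=' in the one- and two-byte cases.
func encodeGroup(b []byte) string {
	switch len(b) {
	case 1:
		// Six leading bits, then the trailing two bits (the
		// "remainder") shifted into the high positions.
		return string([]byte{
			alphabet[b[0]>>2],
			alphabet[(b[0]&0x03)<<4],
			'=', '='})
	case 2:
		// The first byte's remainder combines with the second
		// byte's leading four bits; the second byte's trailing
		// four bits become the new remainder.
		return string([]byte{
			alphabet[b[0]>>2],
			alphabet[(b[0]&0x03)<<4|b[1]>>4],
			alphabet[(b[1]&0x0f)<<2],
			'='})
	case 3:
		// No remainder: all 24 bits map onto four 6-bit values.
		return string([]byte{
			alphabet[b[0]>>2],
			alphabet[(b[0]&0x03)<<4|b[1]>>4],
			alphabet[(b[1]&0x0f)<<2|b[2]>>6],
			alphabet[b[2]&0x3f]})
	}
	return ""
}

func main() {
	fmt.Println(encodeGroup([]byte("Man"))) // TWFu
	fmt.Println(encodeGroup([]byte("Ma")))  // TWE=
	fmt.Println(encodeGroup([]byte("M")))   // TQ==
}
```

The three-byte case builds on the two-byte case exactly as described above, which is the structural re-use the letfn version captures.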

So how'd we do? A simple test program generates a set of random data, maps a base64 implementation onto each sample and then realizes the lazy sequence by converting it into a vector. We time this conversion, for a common set of sample data, for our implementation, the clojure-contrib implementation and commons-codec. The following run is fairly representative:


bartok ~/git/base64-clojure $ source set_classpath.sh
bartok ~/git/base64-clojure $ clojure fencepost/test/compare_base64.clj
Commons-codec
"Elapsed time: 69.074926 msecs"
RFJlYWkqIF10RCc1PXlpO3FcIDY6dkJuQkhHNWxRJkwkOFdgciFGfHwhYz5EYG15PjxWdFJ9Ml83TGFoeltHSTs+ST9mdj0rfSZrcVNIKn5oKSdTI3U5a1FqIzBvIkRIV0BkK3xCZjtrSWYuTiM1XWB9UW14W2dVLFM=
clojure-contrib
"Elapsed time: 1221.124746 msecs"
RFJlYWkqIF10RCc1PXlpO3FcIDY6dkJuQkhHNWxRJkwkOFdgciFGfHwhYz5EYG15PjxWdFJ9Ml83TGFoeltHSTs+ST9mdj0rfSZrcVNIKn5oKSdTI3U5a1FqIzBvIkRIV0BkK3xCZjtrSWYuTiM1XWB9UW14W2dVLFM=
fencepost/base64_letfn
"Elapsed time: 375.829153 msecs"
RFJlYWkqIF10RCc1PXlpO3FcIDY6dkJuQkhHNWxRJkwkOFdgciFGfHwhYz5EYG15PjxWdFJ9Ml83TGFoeltHSTs+ST9mdj0rfSZrcVNIKn5oKSdTI3U5a1FqIzBvIkRIV0BkK3xCZjtrSWYuTiM1XWB9UW14W2dVLFM=


These results confirm that the clojure-contrib implementation is very, very slow relative to commons-codec. We've cut that runtime to roughly a third, but we're still five or six times slower than commons-codec. Surely we can do better; in the next installment we'll see how.

As usual all code can be found on github.

Friday, October 22, 2010

Concurrent prime factorization in Go

For a little while now I've been looking for a problem that would let me sink my teeth into Google's Go language and see if it's as tasty as it appears to be. After coming across Alex Tkachman's request for concurrency benchmarks it was clear that I had what I needed. Alex is the driving force behind Groovy++, and his proposal was to implement a common benchmark (a concurrent version of prime factorization using the Sieve of Eratosthenes) in multiple JVM languages in order to compare and contrast what works and what doesn't. An implementation in Go won't help much in benchmarking JVM languages, but the highly concurrent nature of the problem maps naturally onto Go's feature set. The language distribution also includes an elegant implementation of the Sieve of Eratosthenes in its tests, so we already have a solid foundation to build on.

Let's begin with sieve.go. The implementation consists of chained goroutines: each time a candidate makes it through the sieve and is returned as a new prime, a new goroutine is created to check future candidates and reject any that divide evenly by that prime. The implementation cleanly demonstrates the concept but isn't useful as a standalone application since it includes no termination condition. We start by adding that in:
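The modified code isn't reproduced in the post, but a minimal sketch of what the termination might look like, built on the sieve from the Go distribution, follows; the -max flag name matches the run below, though the actual code on github may differ:

```go
package main

import (
	"flag"
	"fmt"
)

// Generate sends the sequence 2, 3, 4, ... into ch.
func Generate(ch chan<- int) {
	for i := 2; ; i++ {
		ch <- i
	}
}

// Filter forwards values from in to out, dropping multiples of prime.
func Filter(in <-chan int, out chan<- int, prime int) {
	for {
		if i := <-in; i%prime != 0 {
			out <- i
		}
	}
}

// primesUpTo runs the daisy-chained sieve until a prime exceeds max.
func primesUpTo(max int) []int {
	var primes []int
	ch := make(chan int)
	go Generate(ch)
	for {
		prime := <-ch
		if prime > max {
			return primes // the termination condition sieve.go lacks
		}
		primes = append(primes, prime)
		// Chain a new Filter goroutine for each prime found.
		ch1 := make(chan int)
		go Filter(ch, ch1, prime)
		ch = ch1
	}
}

func main() {
	max := flag.Int("max", 100, "largest candidate to consider")
	flag.Parse()
	for _, p := range primesUpTo(*max) {
		fmt.Println(p)
	}
}
```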


[varese factorization_benchmark]$ cd go
[varese go]$ ls
factor.go sieve.go
[varese go]$ gobuild sieve.go
Parsing go file(s)...
Compiling main (sieve)...
Linking sieve...
[varese go]$ ./sieve -max 20
2
3
5
7
11
13
17
19


That seemed to work nicely. Adding in the factorization code should give us everything we need:


import "fmt"
import "flag"
import vector "container/vector"
...
var target *int = flag.Int("target", 1, "Number we wish to factor")
flag.Parse()

t := *target
var rv vector.IntVector

// Retrieve a prime value and see if we can divide the target
// evenly by that prime. If so perform the division and update
// the current value.
ch := make(chan int) // Create a new channel.
go Generate(t, ch)   // Start Generate() as a subprocess.
for prime := <-ch; (prime != -1) && (t > 1); prime = <-ch {

    for t%prime == 0 {
        t = t / prime
        rv.Push(prime)
    }

    // Create a goroutine for each prime number whether we use it or
    // not. This performs the daisy chaining setup that was being
    // done by the Sieve() function in sieve.go.
    ch1 := make(chan int)
    go Filter(ch, ch1, prime)
    ch = ch1
}

fmt.Printf("Results: %s\n", fmt.Sprint(rv))


Simple enough. This leaves us with something that looks fairly robust:


[varese go]$ gobuild factor.go
Parsing go file(s)...
Compiling main (factor)...
Linking factor...
[varese go]$ ./factor -target 60
Results: [2 2 3 5]


The examples here use gobuild, a tool that has proven very useful for building and testing Go applications. Complete code for both examples can be found on github. There is some interest in comparing a solution based on goroutines and channels with one based on actors and messages, so some Scala code may join the existing Go code in the future.

Sunday, October 10, 2010

Default arguments and implicit conversions in Scala

In a previous post we went in search of an implicit conversion from Int to List[Int] such that each member of the list corresponds to the digit at the equivalent position in the input Int (e.g. 987 = List(9,8,7)). At the time we mentioned that a properly tail-recursive implementation proved to be a bit more complicated than one might expect. In this post we'll examine those complications in some detail.

A properly tail-recursive implementation of this conversion function uses an accumulator list to carry state as we recurse.


package org.fencepost.defaults

import org.scalatest.Suite

class ImplicitDefaultTest1 extends Suite {

  def int2list(arg: Int, acc: List[Int]): List[Int] = {
    val newmember = arg % 10
    if (arg <= 9)
      List(newmember) ::: acc
    else
      int2list(arg / 10, List(newmember) ::: acc)
  }

  implicit def toList(arg: Int) = int2list(arg, List())

  def testImplicit() = {
    assert(0.length == 1)
    assert(0(0) == 0)
    assert(5.length == 1)
    assert(5(0) == 5)
    assert(12345.length == 5)
    assert(12345(0) == 1)
    assert(12345(2) == 3)
    assert(12345(4) == 5)
    assert(98765432.length == 8)
    assert(98765432(4) == 5)
  }
}


The test above passes, so everything looks good so far. On a second look, however, the wrapper function toList() is less than ideal. The accumulator must be initialized to the empty list for the function to work correctly, but defining a second function just to pass in an extra arg looks like unnecessary cruft. Scala 2.8 introduced default arguments to address exactly this kind of situation; perhaps we can deploy them here to clean up our test:


package org.fencepost.defaults

import org.scalatest.Suite

class ImplicitDefaultTest2 extends Suite {

  implicit def int2list(arg: Int, acc: List[Int] = List()): List[Int] = {
    val newmember = arg % 10
    if (arg <= 9)
      List(newmember) ::: acc
    else
      int2list(arg / 10, List(newmember) ::: acc)
  }

  def testImplicit() = {
    assert(0.length == 1)
    assert(0(0) == 0)
    assert(5.length == 1)
    assert(5(0) == 5)
    assert(12345.length == 5)
    assert(12345(0) == 1)
    assert(12345(2) == 3)
    assert(12345(4) == 5)
    assert(98765432.length == 8)
    assert(98765432(4) == 5)
  }
}


When we attempt to execute the test above we're greeted rather rudely by sbt:


[info] Compiling test sources...
[error] .../src/test/scala/org/fencepost/defaults/ImplicitDefaultTest2.scala:21:
value length is not a member of Int
[error] assert(0.length == 1)
[error] ^
[error] .../src/test/scala/org/fencepost/defaults/ImplicitDefaultTest2.scala:22:
0 of type Int(0) does not take parameters
[error] assert(0(0) == 0)
...


Clearly the implicit conversion from Int to List[Int] wasn't in play when this test was compiled. But why not? Logically int2list(arg:Int, acc:List[Int] = List()) should be able to convert an Int to a List[Int] anywhere our earlier wrapper around int2list(arg:Int, acc:List[Int]) could. We can demonstrate the validity of this claim by fooling the compiler with a variation on the front-end function we used before:


package org.fencepost.defaults

import org.scalatest.Suite

class ImplicitDefaultTest3 extends Suite {

  def int2list(arg: Int, acc: List[Int] = List()): List[Int] = {
    val newmember = arg % 10
    if (arg <= 9)
      List(newmember) ::: acc
    else
      int2list(arg / 10, List(newmember) ::: acc)
  }

  implicit def toList(arg: Int) = int2list(arg)

  def testImplicit() = {
    assert(0.length == 1)
    assert(0(0) == 0)
    assert(5.length == 1)
    assert(5(0) == 5)
    assert(12345.length == 5)
    assert(12345(0) == 1)
    assert(12345(2) == 3)
    assert(12345(4) == 5)
    assert(98765432.length == 8)
    assert(98765432(4) == 5)
  }
}


As expected this test passes without issue.

My suspicion is that this issue is a side effect of the fact that default arguments apparently aren't represented in the type system. It's not surprising that int2list(arg:Int, acc:List[Int]) isn't available as an implicit conversion; there's no way for the compiler to supply the required "acc" argument for an input Int instance. That's not true of int2list(arg:Int, acc:List[Int] = List()); in that case the default value of "acc" could be used to perform the conversion. Note, however, that these two functions are reported as having the same type:


$ scala
Welcome to Scala version 2.8.0.final (OpenJDK Client VM, Java 1.6.0_18).
Type in expressions to have them evaluated.
Type :help for more information.

scala> def int2list(arg:Int, acc:List[Int]):List[Int] = {
...
int2list: (arg: Int,acc: List[Int])List[Int]

scala> def int2list2(arg:Int, acc:List[Int] = List()):List[Int] = {
...
int2list2: (arg: Int,acc: List[Int])List[Int]


If the type system is unaware that default arguments are available for all arguments other than the type to convert from then it's not at all surprising that a function would be excluded from the set of valid implicit conversion functions.

Results tested and verified on both Scala 2.8.0 and 2.8.1 RC2.