Monday, March 7, 2011

A Few More Words on Laziness

A few updates to the recent post on lazy evaluation. While most of the discussion below is directed at the Ruby implementation the same principles could also be applied to the Scala and/or Clojure versions. The implementations in these two languages are already fairly speedy, however, so for now we'll focus on bringing the Ruby version into line.

So, without further ado...

Don't blame JRuby

Of the three implementations discussed in the previous post the Ruby version was by far the worst in terms of performance. This fact should in no way be interpreted as a criticism of JRuby itself; the fault rests entirely on the code being executed (and by extension the abilities of the programmer who wrote it), not the platform it runs on. To prove the point, consider the following:

[@varese ruby]$ more Euler10Primes.rb
require 'prime.rb'

puts Prime.each(2000000).reduce(0) { |i,total| total + i }
[@varese ruby]$ time jruby --1.9 Euler10Primes.rb

real 0m27.737s
user 0m20.807s
sys 0m6.181s

Clearly we have work to do.

Remember that part about testing fewer candidates?

Our intial algorithm tested a candidate set of all odd numbers beginning with three. The underlying logic is two-fold:

  • Any even number is by definition divisible by two (a prime) and therefore not prime

  • We can exclude these numbers by generating candidates in a clever way (starting from an odd value and incrementing by two) rather than testing every candidate to see if it's even or odd

As a first optimization we consider this process for generating candidates. Is there any room for improvement here? As it turns out there is; approximately 20% of the time we're testing a candidate that could be excluded with zero computation. Permit me to explain.

Multiples of five (a prime) always end in zero or five. Anything ending in zero must be even (and therefore already excluded) but so far we haven't excluded anything ending in five. And since one out of every five of our current candidate pool ends in five we're doing exactly 20% more work than we need to. Okay, so how do we keep these values out of our set of candidates? We know that odd numbers must end with a fixed range of values (1,3,5,7 or 9). If we exclude five from that list and then add the remaining values to successive multiples of ten we'll generate the full set of potential candidate values.

This optimization alone yields some benefit:

[@varese ruby]$ time jruby Euler10.rb

real 2m42.012s
user 0m14.394s
sys 2m23.462s

There's still plenty of room for improvement.

Laziness can be fleeting

Languages that fully support lazy evaluation try not to return values until absolutely necessary. Suppose you want to return the first three items from a lazily defined sequence in Clojure. You might use "(take 3 coll)", but note that this doesn't return an array containing three values; it returns another lazy sequence. You could thus chain operations together without completely evaluating (or "realizing" as the Clojure folks are fond of saying) anything. Given a predicate "pred", for example, you could identify which of the these values satisfy pred using "(filter pred (take 3 coll))". This expression also returns a lazy sequence; if you want to return the values you need something like "(doall (filter pred (take 3 coll)))"

Ruby does not support lazy evaluation. We've been using "yield" to approximate lazy evaluation but it's only an approximation. In Ruby "coll.take 5" will return an array containing the first five items in the collection rather than another Enumerable that will generate these values as needed. As a result we're once again doing more work than we need to. And unfortunately this is exactly what we find in our initial Ruby implementation:

if @primearr.take_while {|i| i <= root }.all? {|i| @nat % i != 0}
@primearr << @nat
yield @nat

On the plus side we're only returning all primes up to the square root of our candidate. On the negative side we completely negate the fail-fast functionality of the "all?" method. Wouldn't it be better if we could exit this process as soon as we find a prime that the candidate divides evenly into?

This problem is actually a bit trickier than it looks. We can exit our test by finding a prime that our candidate divides into (indicating that the candidate isn't prime) or by finding a prime greater than the square root of the candidate (indicating that the candidate is prime). How can we distinguish between these two cases? One way to implement this test is to combine both conditions into a single predicate and find the first prime that satisfies it. If that prime is greater than the square root of the candidate we have a prime, otherwise we don't. We know this find operation will always be successful since eventually we'll find a prime greater than the square root of the candidate.

The resulting code (including the candidate generation fix mentioned above) looks like this:

So, how'd we do?

[@varese ruby]$ time jruby Euler10.rb

real 0m43.730s
user 0m33.408s
sys 0m9.040s

That's more like it! This code overtakes the original Clojure implementation to move into second place. We're still slower than the implementation using the prime number generator that comes with JRuby but we've made up quite a bit of ground.

You don't need a JVM to be lazy

If you're interested in lazy evaluation you could do a lot worse than spending some time with Haskell. The language fully embraces lazy evaluation which, according to some people, matters a great deal to the utility of functional programming.

An initial implementation of our algorithm in Haskell looks something like this:

Unfortunately the performance is dreadful:

[@varese haskell]$ ghc -o euler10 euler10.hs
[@varese haskell]$ time ./euler10

real 4m50.253s
user 0m8.976s
sys 4m35.131s

Once again it's important to realize that these numbers don't reflect on Haskell or the GHC in any way. They do reflect on my utter lack of experience and/or skill with this language. There's almost certainly a better (and probably more idiomatic) way to implement this algorithm in Haskell, although in my defense even this performance is orders of magnitude better than what we get using the Sieve of Eratosthenes implementation from Hutton's text:

[@varese haskell]$ more euler10sieve.hs
module Main where

primes = sieve [2..]
sieve(p:xs) = p:sieve[x|x <- xs, x `mod` p /= 0]

main = print (foldr (+) 0 (takeWhile (<2000000) primes))
[@varese haskell]$ ghc -o euler10sieve euler10sieve.hs
[@varese haskell]$ time ./euler10sieve

real 401m25.043s
user 0m0.069s
sys 394m13.061s

In the end, of course, this probably indicates only that the Sieve of Eratosthenes is probably not an ideal solution to this particular problem.

No comments:

Post a Comment