Tuesday, July 10, 2012

Ruby, Perl and Eloquence

In an attempt to make my Ruby code a bit more idiomatic I've been spending a bit of time recently with Russ Olsen's excellent Eloquent Ruby. There are many reasons to love writing Ruby code, not least of which is that Ruby deploys the same terse but expressive power of Perl while employing better overall principles of programming. The effect isn't universal; on occasion my problems with Ruby look quite a bit like my problems with Perl. Given the overall elegance of the language it seems likely that there's a "better" (or at least more idiomatic) way to accomplish my goal. And so I turn to Eloquent Ruby.

As an example of this tension consider the following example.

Perl has a well-deserved reputation for efficiently processing text files with regular expressions. We'll consider an example from another text I've been spending a bit of time with: Hofstadter's seminal Godel, Escher, Bach. A simple implementation of the productions of the MIU system in Perl [1] might look like the following:



Reasonable enough, but there's a lot of magic going on here. We're relying on the "magic" variable $_ to access the current line, and to make things worse we have to obtain those lines using the INFILE identifier that only has meaning due to a side effect of the open() call [2]. There's also those "magic" $1 and $2 variables for accessing capture groups in a regex.

The Ruby version is both slightly shorter and a bit cleaner:



We've made some nice strides here. The use of File.new() allows us to avoid open() and it's side effects. The use of a code block allows us to remove the global $_ in favor of a scoped variable line.

But we're still stuck with $1 and $2 for those capture groups.

One can imagine an elegant object-oriented solution based on match objects. Any such implementation would have to accomplish three things:
  1. The match object will be used as the condition of an if/unless expression so nil should be returned if there's no match
  2. The match object should be bound to a variable name in scope
  3. References to capture groups in the if-clause should use the scoped variable rather than the $1,$2, etc.
But remember, this exercise is only useful if we don't have to compromise on elegance. If all we're after is an explicit object-oriented solution we could go with the Python version:



That's probably not what we want. [3]

After pondering this question for a bit we realize we may not be in such a bad spot. /regex/.match(str) already returns nil if there is no match so our first requirement is satisfied. Assignment is just another expression, so our match object (or nil) will be returned to the if-expression test, helping us with our second goal. And match objects provide access to capture groups using []. So long as the assigned variable is in scope we should have everything we need. A bit of scuffling [4] brings us to the following:



This example is free of any "magic" variables, although we have sacrificed a bit on the clarity front. It's also worth noting that we could have accomplished something very similar in Perl:



This implementation is hardly idiomatic. It's also quite a bit less clear than our earlier efforts in the language.

Where does this leave us? Do we keep in touch with our Perl roots and live with $1 in order to keep things terse and expressive? Do we sacrifice a bit of clarity and go with an object-oriented approach? Or do we do something else entirely?

Answers to this question (and others like it) are what I'm hoping to get out of Eloquent Ruby.

[1] We're ignoring nasty things like error handling and complex edge cases in order to keep the conversation focused.

[2] We could use lexical file handles here but that doesn't really change the underlying point. Even in that case we still have to call open() in order for $fh to be useful.

[3] Python does a lot of things very, very well, but this solution to this problem seems unnecessarily verbose.

[4] The requirement to declare foo in advance when using the modifier form of if was a bit surprising. Shifting to an if expression removed this requirement. The upcoming Perl version also didn't require this advance declaration when using an equivalent to the modifier form. An MRI quirk, perhaps?

Tuesday, April 24, 2012

Small Facets of Ruby: redcar and rvm

An occasional series of brief discussions on some small aspect of Ruby and/or it's ecosystem. Some bits may turn into gems in the future, but in general topics discussed here will likely be small enough that a gem would be overkill.

I spend most of my time in the terminal, but when a graphical text editor is useful I'm a fan of redcar. Early versions were a bit rough but the project has progressed nicely and the editor now appears to be quite stable. Redcar has always been solid in terms of syntax highlighting: out-of-the-box we find support for Ruby, Clojure, Scala and Haskell (!) and many others. The only slight hiccup is that redcar requires JRuby, and while I'm often using JRuby anyway it would be nice to have redcar accessible even when working with MRI or Rubinius.

It seems like rvm should be able to help us out here. We should be able to install redcar into it's own gemset and create a script that switches to JRuby with that gemset and starts up the editor. Any such script shouldn't interfere with the Ruby interpreter in use in the current shell. A dead simple approach doesn't get very far:

[@varese ~]$ more bin/redcar
rvm use jruby-1.6.7@redcar
redcar
[@varese ~]$ bin/redcar

RVM is not a function, selecting rubies with 'rvm use ...' will not work.
You need to change your terminal settings to allow shell login.
Please visit https://rvm.io/workflow/screen/ for example.
...

Fortunately Wayne includes some excellent documentation about sourcing the necessary functions into your scripts in order to setup the environment as expected. A small change gives us the following:



A quick test verifies that we're up and running:

[@varese ~]$ which ruby
~/.rvm/rubies/ruby-1.9.3-p194/bin/ruby
[@varese ~]$ bin/redcar
Using /home/mersault/.rvm/gems/jruby-1.6.7 with gemset redcar
Redcar 0.13 ( java )
...
[@varese ~]$ which ruby
~/.rvm/rubies/ruby-1.9.3-p194/bin/ruby

Monday, February 20, 2012

Ruby and Concurrency: Maintaining Purity in a World of Actors

It's time to come clean.

The previous post on designing actors with Akka and JRuby took a bit of a shortcut. Not a big one, mind you, but a shortcut nonetheless. Fortunately the use of this shortcut provides a window into a few interesting aspects of actor design, so despite my embarrassment it's worth spending a bit of time here.

Let's start with the constraints imposed when designing with actors. We've already seen a few of these constraints: all mutable state must be contained within an actor, the outside world (including other actors) can only interact with an actor by sending messages and these messages are handled one at a time so there's no contention on access to mutable state. All this talk of message handling might lead one to wonder; exactly what can an actor do when it receives those messages?

To no one's great surprise there are some constraints here as well. Generally speaking the message handler in an actor consists of some combination of the following actions:


  • Creating other actors

  • Sending messages to other actors

  • Retrieving or updating managed state



There's certainly nothing horrible in that list either. None of these constraints seem terribly onerous. So where did we go wrong?

The problem starts with the Controller's handling of :next messages. Each candidate value is sent off in a message to every model for evaluation, in each case returning a future that can be used to access the model's response when it's provided. The first candidate that gets a positive response from all the models is our next prime. The implementation returns the expected value, but there's a big problem here: how did we observe the response from any of the models if we're still in the message handler for a :next message? Remember that messages are processed one at a time and we're not done with the current message yet. The model is an actor and actors communicate with messages; it seems reasonable to expect the response from the model to be delivered by a message sent to the controller. So how did we see that message if we're not done with the current message? If we can generalize a bit: how do futures (or a request-response messaging pattern in general) fit into an actor-based design?

The short answer appears to be that they don't really fit at all. A better approach would be for the controller to send messages to the models for the current candidate and then return. The models would then compute their answer and send a message containing that answer back to the controller. The message handler at the controller would then determine if an all responses have been received. If every model has answered, the controller computes an overall answer and takes action: if the candidate is prime it's returned to the caller, otherwise a new candidate is generated and the process starts all over again. [1]

So what does this new approach look like in Ruby? [2] Our controller is now a bit longer but after a bit of cleanup the code reads reasonably well:



The model doesn't really change much:



Complete code (including tests) can be found on github.

[1] Viktor Klang mentioned a non-blocking solution in comments on the original post. Any non-blocking implementation seemed to be constrained by a variation on the theme mentioned above; how do we preserve a communciation channel to the original caller (a caller who is not an actor) if the message handler for the :next message doesn't return a response directly? It wasn't until I came across Akka's notion of a reply channel (and the ability to preserve it as an instance variable in our actor) that this problem was resolved.

[2] I briefly considered explicitly modelling the actors as state machines using the intriguing statemachine gem but decided against it; formally defining the state machines involved didn't add much value to actors that are as lightweight as ours. Larger systems with more complex actors would very likely benefit from the use of this gem.

Thursday, January 5, 2012

Ruby and Concurrency: Design with Actors and Akka

In a semi-recent post we looked at how we might define actors in JRuby using a combination of Akka and code blocks. That post was very focused on the process of creating actors; we didn't necessarily consider how to build systems with actors and/or whether Ruby might be helpful in doing so. There are plenty of issues to ponder here, but before we dig in let's take a step back and talk briefly about some of the characteristics of actors.

Actors implement a shared-nothing model for concurrency. Mutable system state should be contained by one or more actors in the system. This state is not exposed to external entities directly; it can only be accessed or modified via messages sent to the actor. Since the actor is the only player in this drama that can access the state it contains there is no need to lock or synchronize state access. It should be fairly clear that an actor consists of some amount of mutable system state and some logic for handling incoming messages... and not much more.

Okay, sounds great... but how does it work in practice? We'll consider this question by way of an alogrithm that by now should be pretty familiar: prime number generation using the Sieve of Eratosthenes. We considered this chestnut (or perhaps you prefer "war horse" at this point) in a previous post discussing an implementation of this algorithm in the Go language. Let's review that implementation and see what else we can do with it.

The support of lightweight goroutines in Go encouraged a pipeline implementation with one goroutine per discovered prime. We can think of the "state" of this system as the total set of discovered primes. Each goroutine knows only about the prime it contains and the channel where candidates should be sent if they "pass" (i.e. do not divide evenly into it's prime number). Once the goroutine is created it's state doesn't change. New state is added by creating a new goroutine for a newly-discovered prime. State is never deleted; once a prime is discovered removing it from consideration is non-sensical. As a consequence state is completely distributed; no entity in the system knows about all discovered primes.

We also note that the algorithm described here isn't very friendly to parallel decomposition [1]. Candidate values are compared to previously discovered primes one at a time: if the candidate passes the test it's allowed to move on, otherwise no further evaluation occurs. This technique is referred to as "fail-fast" behaviour; if a value fails a test it doesn't lead to any wasteful extra work. The concurrent approach is quite different: compare the candidate to all discovered primes at once and return success only if all tests pass. Comparisons are done independently, so even if the "first" [2] comparison fails all other comparisons still execute. We lose our fail-fast behaviour but gain the ability to decompose the entire process into smaller jobs that can execute in parallel. A trade-off in engineering... surprising, I know.

We'll set out to implement the Sieve of Eratosthenes in Ruby using JRuby and Akka. Our new implementation will have more of a bias towards parallelism; this time we're okay with throwing away some work. Clearly we'll need an actor to compare a candidate to one or more discovered primes; that is, after all, why we're here. We can think of these actors as maintaining the state of our system (just like the goroutines in the Go implementation) so we'll borrow a term from MVC and call these "model" actors. Concurrently evaluating candidates against these models implies an organizing actor that is aware of all models; it seems natural to call this the "controller" actor. To keep things simple we don't want to support an unlimited number of models so we create a fixed-size "pool" and distribute system state (i.e. the set of discovered primes) between these models. When a new prime is discovered the controller will be responsible for adding that prime to an existing model.

The implementation in Ruby looks like this:



Ruby offers a number of features which help us out here. As shown in this implementation messages can be as simple as a list with a leading symbol (used to indicate the message "type") and a payload contained in the remainder of the list. This follows a similar convention found in Erlang, although that language uses tuples rather than lists. Support for destructuring/multiple assignment makes working with these messages quite simple.

Our previous work building actors from code blocks doesn't apply here due to an implementation detail so we define both the controller and model actors as distinct classes. As it turns out this change isn't much of a problem; we're able to implement both classes, a helper Enumerable and a second Enumerable wrapping the controller in less than 140 lines with comments. Full code (including tests) can be found on github.

Spend a little time with JRuby and Akka and it becomes clear that they work well together and form a good basis for building concurrent applications in Ruby.

[1] This is a bit of a loaded statement; you will get some parallelism as multiple candidates move through the "stages" (i.e. goroutines) of the pipeline. That said, this notion of parallelism applies to the system as a whole. The process of determining whether a single candidate is or is not a prime number is still very sequential.

[2] "First" here means nothing more than "whichever test happens to complete and return a value first"