Tuesday, May 24, 2011

Cassandra and Clojure: The Beginning of a Beautiful Friendship

Soon after I began working with Cassandra it became clear to me that if you were in the market for a platform for creating applications that interact with this database you could do a lot worse than Clojure. The lack of a query language [1] suggests that filtering and slicing lists of keys and columns might be a fairly common activity for apps powered by Cassandra. And while many languages support the map/filter/reduce paradigm Clojure's use of sequences throughout the core suggest a natural means to integrate this data into the rest of your application.

Cassandra itself provides an API that uses the Thrift protocol for manipulating data. We'll use this interface to implement a simple proof-of-concept application that might serve as a testbed for manipulating data managed by Cassandra in idiomatic Clojure. Note that the Clojure ecosystem already includes several open-source projects that connect Clojure to Cassandra: these include clj-cassandra and clj-hector, the latter leveraging the Hector Java client to do it's work. In order to keep things simple we choose to avoid any of these third-party libraries; it's not as if the Thrift interface imposes a heavy burden on us. Let's see how far we can get with what's already in the packaging.

That sounds great... so what exactly are we trying to do? Beginning with the database generated during our previous work with Cassandra we should be able to access sets of keys within a keyspace and a set of columns for any specific key. These structures should be available for manipulation in idiomatic Clojure as sequences. Ideally these sequences would be at least somewhat lazy and transparently support multiple datatypes. [2]

Using the Thrift interface requires working with a fair number of Java objects representing return types and/or arguments to the various exposed functions. My Clojure isn't yet solid enough to hash out Java interop code without flailing a bit so we kick things off with a Scala implementation. This approach allows us to simplify the interop problem without sacrificing the functional approach, all within a language that is by now fairly familiar.

The Scala code includes a fair number of intermediate objects but is otherwise fairly clean:



Translating this code into Clojure is more straightforward than expected:



These results seem fairly promising, although we're nowhere near done. This code assumes that all column names and values are strings, a perfectly ridiculous assumption. We also don't offer any support for nested data types, although in fairness this was a failing of our earlier work as well. Finally we haven't built in much support for lazy evaluation; we lazily convert column names to Clojure keywords but that's about it. But fear not, gentle reader; we'll revisit some or all of these points in future work.

[1] At least until CQL arrives in Cassandra 0.8
[2] We won't be able to meet these last two goals in our initial implementation, but with any luck we'll be able to revisit them in future work

No comments:

Post a Comment