Thursday, December 31, 2009

Accessing POSIX system calls from Groovy with JNA

One of the many virtues of the Groovy programming language is largely stylistic. Its syntax, typing and general feature set preserve the feel of earlier dynamically typed languages while staying close enough to Java to support easy interaction with the hefty Java/JVM ecosystem. This enables first-class access to existing Java code while still allowing me to think like a Perl programmer.

For those of us who came up writing as much Perl, Python and Lisp/Scheme as we did C++ and Java this is a non-trivial benefit. If you're used to working with both static and dynamic types you find yourself thinking about certain problems in certain ways. For certain problems you may think in terms of dynamic types. For other problems you may employ strong type systems. Often the right tool for the job is the right tool because of the workman using it. [1]

The comparison between Groovy and the other dynamic languages is hardly flawless. Some of the discrepancies are due to the platform independence required by the JVM. Take something as simple as getting the PID of the current process on a POSIX-compliant OS. In Perl I have $PID. In Python I have os.getpid(). In Groovy I got... nuthin'.

Fortunately the JNA library for Java provides access to native libs directly from Java code (no JNI coding required). It seems like access to libc via JNA plus some combination of Groovy's metaprogramming capabilities should help us out. A rough sketch of my requirements would look something like this:

  • Assume a POSIX-compliant OS and ready availability of libc. At least initially we'll constrain ourselves to POSIX system calls.
  • Any solution should stay as close to the POSIX APIs as possible. We can add layers or encapsulate results in objects later. [2] Changing error cases to throw exceptions rather than checking return values in code is an acceptable change (although I admit this is a fairly arbitrary choice).
  • There are a lot of system calls defined in POSIX. No, really; check it out if you don't believe me. I'd like to avoid having to define a function in an interface (or anywhere else) before using it. This may introduce a bit of a performance penalty [3] but I'm okay with that for now.

A simple initial approach would use methodMissing along the lines of the recommendations offered by the JNA guys for the use of their lib in dynamic languages. In our case we have to wrap the JNA NativeLibrary in a Groovy proxy in order to make it available for metaprogramming work.


  import com.sun.jna.NativeLibrary

  /* Create the lib first so that it's in scope
   * when the closure is declared. LibcException
   * (used below) is defined in the complete code. */
  def libc = NativeLibrary.getInstance("c")

  /* No need to register this meta-class; other
   * instances of NativeLibrary may wish
   * to do things differently. */
  ExpandoMetaClass emc = new ExpandoMetaClass(Proxy,false)
  emc.methodMissing = {

   String fname, fargs ->
   println "Invoking method name ${fname}, args: ${fargs}"
   def f = libc.getFunction(fname)

   synchronized (libc) {

    def rv = f.invokeInt(fargs)
    if (rv == -1) {

     def errnoptr = libc.getGlobalVariableAddress("errno")
     def errno = errnoptr.getInt(0)
     def errstr = libc.getFunction("strerror").invokeString([errno] as Object[],false)
     throw new LibcException(errno,errstr)
    }
    else { return rv }
   }
  }
  emc.initialize()

  def libcproxy = new Proxy().wrap(libc)
  libcproxy.setMetaClass(emc)

  def pid = libcproxy.getpid()
  println "PID: ${pid}"


This seems to work well enough, but upon closer examination there are a few problems:

  • We have to tie up methodMissing for the lib in order to make the lib work. One can imagine that individual apps may wish to do other things with methodMissing.
  • The class we use to make the JNA NativeLibrary into a GroovyObject (groovy.util.Proxy) provides easy access to that NativeLibrary object. This allows other callers to make syscalls using this NativeLibrary while we're in methodMissing. This in turn could easily confuse our error handling code via its reliance on the global "errno" variable.

So we make a few tweaks and wind up with a better proxy using invokeMethod:


import com.sun.jna.NativeLibrary

class GroovyLibc extends GroovyObjectSupport {

 private libc = NativeLibrary.getInstance("c")

 /* Complete hack to cover the fact that the private
  * access control modifier for properties is
  * apparently completely ignored now. Details
  * can be found at
  * http://jira.codehaus.org/browse/GROOVY-1875 */
 public Object getProperty(String name) {

  switch (name) {
   case "libc": throw new MissingPropertyException("Property ${name} unknown")
   default: return super.getProperty(name)
  }
 }

 public Object invokeMethod(String name, Object args) {

  println "Invoking method name ${name}, args: ${args}"
  def f = libc.getFunction(name)
  if (f == null) {

   /* MissingMethodException takes (name, class, args);
    * it has no message-only constructor */
   throw new MissingMethodException(name, GroovyLibc, args as Object[])
  }

  synchronized (libc) {

   def rv = f.invokeInt(args)
   if (rv == -1) {

    def errnoptr = libc.getGlobalVariableAddress("errno")
    def errno = errnoptr.getInt(0)
    def errstr = libc.getFunction("strerror").invokeString([errno] as Object[],false)
    throw new LibcException(errno,errstr)
   }
   else { return rv }
  }
 }
}


Both concerns are now addressed. As a final optimization we note that as of version 3.2.0 JNA offers direct support for throwing an exception when a syscall returns an error (according to a defined calling convention). We can make use of this support to clean up our code a bit:


import com.sun.jna.Function
import com.sun.jna.LastErrorException
import com.sun.jna.NativeLibrary

class BetterGroovyLibc extends GroovyObjectSupport {

 private libc = NativeLibrary.getInstance("c")

 /* Complete hack to cover the fact that the private
  * access control modifier for properties is
  * apparently completely ignored now. Details
  * can be found at
  * http://jira.codehaus.org/browse/GROOVY-1875 */
 public Object getProperty(String name) {

  switch (name) {
   case "libc": throw new MissingPropertyException("Property ${name} unknown")
   default: return super.getProperty(name)
  }
 }

 public Object invokeMethod(String name, Object args) {

  println "Invoking method name ${name}, args: ${args}"
  try {

   def f = libc.getFunction(name,Function.THROW_LAST_ERROR)
   if (f == null) {

    /* MissingMethodException takes (name, class, args) */
    throw new MissingMethodException(name, BetterGroovyLibc, args as Object[])
   }

   return f.invokeInt(args)
  }
  catch (UnsatisfiedLinkError ule) {

   /* MissingMethodException takes (name, class, args) */
   throw new MissingMethodException(name, BetterGroovyLibc, args as Object[])
  }
  catch (LastErrorException lee) {

   def errno = lee.errorCode
   def errstr = libc.getFunction("strerror").invokeString([errno] as Object[],false)
   throw new LibcException(errno,errstr)
  }
 }
}


Complete code (along with a few sample unit tests to verify that it works) can be found on GitHub.

[1] Ports of existing dynamic languages (e.g. Jython and JRuby) enable thinking in terms of dynamic types but don't integrate as cleanly with existing Java code.

[2] The jna-posix project has done some excellent work in a similar vein. The project started as part of the JRuby core but was later spun off into a standalone lib. The problem is that it wraps syscall results in objects rather than following conventional semantics. POSIX.lstat(String) returns a FileStat object rather than the more conventional lstat(String,stat struct). There are good arguments for this approach; it's much friendlier to object-oriented systems and it does avoid calling methods for side effects only. But if you're used to the conventional POSIX system calls a new syntax and/or object hierarchy just gets in the way. Like everything else in this article this is at least in part a matter of taste.

[3] For example this constraint explicitly prevents us from using the direct mapping features for native functions available in JNA.

Thursday, December 3, 2009

Building CouchDB on Karmic

Well, I guess it's time to get going. This blog isn't going to write itself.

Despite a continuing affinity for Debian I've shifted most of my personal work to Ubuntu in recent months. I remain something of a dinosaur by building most of my libraries and binaries myself, but an occasional well-placed package or two can really speed things up, especially for a dependency that may not be particularly interesting by itself. Striking that balance seems to be a bit easier with Ubuntu but that's a fairly subjective opinion; your mileage may vary.

So when I decided to resume some earlier experiments with CouchDB and Scala (including playing with the very intriguing Dispatch library) it only seemed natural to migrate to Ubuntu 9.10 ("Karmic Koala"). So we proceed with an installation of CouchDB 0.10.1 on a fresh install of Ubuntu 9.10. And yes, I am aware that CouchDB 0.10.0 is included as part of the Karmic install. That is good news but it's irrelevant to my purpose. There's no benefit in being forced to work around what Ubuntu does with the native CouchDB install or to worry about version conflicts if we want to upgrade. We can install a standalone copy; why wouldn't we?

So c'mon, kids, let's build a no SQL database! We start with some foundation stuff:

mkdir ~/local
gzip -dc openssl-1.0.0-beta4.tar.gz | tar xf -
cd openssl-1.0.0-beta4
./Configure --prefix=/home/hfencepost/local/openssl-1.0.0-beta4 shared linux-elf
make
make test
make install
cd ..
export LD_LIBRARY_PATH=/home/hfencepost/local/openssl-1.0.0-beta4/lib
bzip2 -dc curl-7.19.7.tar.bz2 | tar xf -
cd curl-7.19.7
./configure --prefix=/home/hfencepost/local/curl-7.19.7 --with-gnu-ld --with-ssl=/home/hfencepost/local/openssl-1.0.0-beta4/
make
make test
make install
cd ..

Setting LD_LIBRARY_PATH is probably not necessary since we're explicitly specifying the location of our OpenSSL install. Anyways, moving on...

CouchDB also makes use of the ICU library so let's take care of that now. It's a C++ lib and g++ doesn't appear to be installed by default so we need to snag it before we build anything:

sudo apt-get install g++
gzip -dc icu4c-4_2_1-src.tgz | tar xf -
cd icu/source
./runConfigureICU Linux --prefix=/home/hfencepost/local/icu-4.2.1
make
make check
make install
cd ../..

CouchDB is implemented (mostly) in Erlang so an Erlang platform might be useful. As mentioned above Karmic includes a CouchDB install by default which also means it includes an Erlang install by default. We're building our own for the same reasons we install our own CouchDB instance. Erlang requires a curses lib to build correctly so we lead off with that:

sudo apt-get install ncurses-dev
gzip -dc otp_src_R13B03.tar.gz | tar xf -
cd otp_src_R13B03/
./configure --prefix=/home/hfencepost/local/otp_R13B03 --with-ssl=/home/hfencepost/local/openssl-1.0.0-beta4
make
make install
cd ..

Finally we come to SpiderMonkey. In the past this has been the only CouchDB dependency I install as a package, but with Karmic something odd has happened:

sudo apt-get install libmozjs-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
libmozjs0d libnspr4-dev
The following packages will be REMOVED:
couchdb-bin desktopcouch evolution-couchdb evolution-documentation-en firefox firefox-3.5
firefox-3.5-branding firefox-3.5-gnome-support firefox-gnome-support gnome-user-guide
python-desktopcouch python-desktopcouch-records ubufox ubuntu-desktop ubuntu-docs
xulrunner-1.9.1 xulrunner-1.9.1-gnome-support yelp
The following NEW packages will be installed:
libmozjs-dev libmozjs0d libnspr4-dev
0 upgraded, 3 newly installed, 18 to remove and 131 not upgraded.
Need to get 844kB of archives.
After this operation, 295MB disk space will be freed.
Do you want to continue [Y/n]?


I'm not sure why installing libjs.so would result in the removal of Firefox (not to mention everything else on that list) but that's a question for another day. Clearly this won't work, so off we go to build from source.
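Before answering a prompt like that one it can help to preview what apt intends to do. A minimal sketch, assuming apt-get's -s (simulate) option and a summary line shaped like the one above; the helper name removal_count is made up for illustration:

```shell
# Hypothetical helper: read apt-get output on stdin and print the number of
# packages it plans to remove, taken from the "... N to remove ..." summary line.
removal_count() {
  sed -n 's/.* \([0-9][0-9]*\) to remove.*/\1/p'
}

# Usage (the -s flag only simulates; nothing is installed or removed):
#   apt-get -s install libmozjs-dev | removal_count
```

Anything other than 0 is a good reason to stop and read the full package list before going further.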

We download SpiderMonkey 1.7 and build based on the instructions found on the CouchDB wiki (and echoed by the SpiderMonkey docs):

gzip -dc js-1.7.0.tar.gz | tar xf -
cd js/src
make -f Makefile.ref
JS_DIST=/home/hfencepost/local/spidermonkey-1.7.0 make -f Makefile.ref export
cd ../..

That seems to have gone okay, but let's dig just a bit deeper:

ld -L/home/hfencepost/local/spidermonkey-1.7.0/lib -ljs
ld: warning: cannot find entry symbol _start; not setting start address
/home/hfencepost/local/spidermonkey-1.7.0/lib/libjs.so: undefined reference to `__umoddi3'
/home/hfencepost/local/spidermonkey-1.7.0/lib/libjs.so: undefined reference to `__udivdi3'
/home/hfencepost/local/spidermonkey-1.7.0/lib/libjs.so: undefined reference to `__divdi3'

Ugh. The references appear to be to code in the gcc runtime that isn't available to the lib. A bit of digging makes it clear that this is caused by the use of ld to actually create the shared lib rather than gcc. Moving forward with this configuration will cause the CouchDB install to fail due to an inability to find a functioning SpiderMonkey install. A simple change to config/Linux_All.mk does the job:

#MKSHLIB = $(LD) -shared $(XMKSHLIBOPTS)
MKSHLIB = $(CC) -shared $(XMKSHLIBOPTS)
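The same edit can be scripted, which is handy if the source tree gets unpacked again. A sketch, assuming the makefile contains the MKSHLIB line in the form shown above; patch_mkshlib is a name invented here:

```shell
# Hypothetical helper: on the MKSHLIB line only, swap $(LD) for $(CC) so the
# shared lib is linked through the compiler driver. A .bak copy of the
# original file is left next to it.
patch_mkshlib() {
  sed -i.bak '/^MKSHLIB[[:space:]]*=/s|\$(LD)|$(CC)|' "$1"
}

# Usage, from js/src:
#   patch_mkshlib config/Linux_All.mk
```

Unlike the manual edit this does not keep the old line as a comment; the .bak file serves that purpose instead.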

Let's clean everything out and rebuild with our new settings:

cd js/src
make -f Makefile.ref clean
rm -rf /home/hfencepost/local/spidermonkey-1.7.0/
make -f Makefile.ref
JS_DIST=/home/hfencepost/local/spidermonkey-1.7.0 make -f Makefile.ref export
cd ../..
ld -L/home/hfencepost/local/spidermonkey-1.7.0/lib -ljs

The lack of any error messages in the output of the last command fills us with joy! At last we get to build CouchDB!

gzip -dc apache-couchdb-0.10.1.tar.gz | tar xf -
cd apache-couchdb-0.10.1/
export LD_LIBRARY_PATH=$HOME/local/spidermonkey-1.7.0/lib:$HOME/local/curl-7.19.7/lib:$HOME/local/openssl-1.0.0-beta4/lib
export PATH=$HOME/local/icu-4.2.1/bin:$HOME/local/curl-7.19.7/bin:$HOME/local/otp_R13B03/bin:$PATH
./configure --prefix=/home/hfencepost/local/couchdb-0.10.1 --with-gnu-ld --with-erlang=$HOME/local/otp_R13B03/lib/erlang/erts-5.7.4/include --with-js-lib=/home/hfencepost/local/spidermonkey-1.7.0/lib --with-js-include=/home/hfencepost/local/spidermonkey-1.7.0/include
make
make install
cd ..

Well, that was fun. All we have to do now is fire this thing up and see what we have:

cd ~/local/couchdb-0.10.1/
bin/couchdb

You should see a welcome message with calming powers second only to "Don't panic":

Apache CouchDB has started. Time to relax.

You can now access your new database via Futon (details of which can be found on the wiki), curl, wget or whatever other HTTP client strikes your fancy.
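For a quick smoke test with curl, something like the following should do. It is a sketch that assumes CouchDB is listening on its default address, 127.0.0.1 port 5984; the function names, the COUCH_URL variable and the database name "scratch" are all made up here:

```shell
# Base URL of the server; 5984 is CouchDB's default port. Override COUCH_URL
# if the instance was configured to listen elsewhere.
COUCH_URL="${COUCH_URL:-http://127.0.0.1:5984}"

# Fetch the server's welcome document (a small piece of JSON).
# -s silences progress output, -f turns HTTP errors into a non-zero exit.
couch_ping() {
  curl -sf "$COUCH_URL/"
}

# Create a database with the given name via HTTP PUT.
couch_create_db() {
  curl -sf -X PUT "$COUCH_URL/$1"
}

# Usage, with the server running:
#   couch_ping
#   couch_create_db scratch
```

Futon itself lives under /_utils/ on the same address.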

Enjoy!