One of the facets of CouchDB that has always been unclear to me is how one might join two or more databases together. A bit of context first: my general understanding of a CouchDB database was that it was an unordered collection of documents of some common type. Within a database views could be used to return documents in a certain order them, select certain subsets of the documents or transform some set of documents into something else entirely. If we were to attempt to relate this model to a SQL database  then a CouchDB database is equivalent to a table while a view is equivalent to a select statement on that table.
This model leads to a simple question: if I have database A for documents of type A and database B for documents of type B how can I relate documents in A to documents in B? The natural answer would be some kind of view but views in CouchDB are defined as documents within a given database and there is no notion of a cross-database or global view.
Let's try and make this a bit more concrete. We'll store a set of data gathered from Twitter (using the brilliant Python Twitter Tools package) in a CouchDB instance and interrogate that data in an attempt to answer a few questions. We're interested in the following set of resources:
- The first 100 tweets corresponding to a search term
- Information about the set of authors for these tweets
- The set of followers for each of these authors
Information about these resources will be stored in databases named (respectively) tweets,authors and followers. Once we've obtained this data we'd like to get answers to the following questions:
- For each author: what language do they write their tweets in and how many followers do they have?
- How many tweets did each author write?
- For a given tweet author foo how many tweet authors are also followers of foo?
We'll use the nice folks at Shot of Jaq as our reference point by searching for tweets containing #shotofjaq.
With our multi-database model we can easily answer the first two questions; the first can be answered with a simple map function applied to the authors database and the second with a map and reduce function applied to the tweets database. The third question poses more of a problem. A correct answer requires data from both the authors and followers database and there is simply no way to access both databases using the standard map-reduce functionality. Our model appears to be broken.
Complete code for this example (including a simple Python script to submit temporary views to a CouchDB database) can be found at Github. Note that this repository also includes code for the correct single-database solution to this problem, so no peeking until the next post goes up!
 Yes, the CouchDB folks go to great lengths to point out that this is a silly thing to do. I employ this comparison only to illuminate the question at hand.