CouchDB and Documents

I’ve been having a pre-Book HackDay hack at home (in between cat wrangling duties!) using CouchDB to manage documents. Since it is a document-centred database, that is no surprise, but I’ve been looking at it from the perspective of creating a system that lets users create their own documentation and make notes against the document. I know, surprise: it’s NoSQL but not Redis.

I’ve used PHP and the CouchSimple PHP class from the CouchDB wiki, more as a quick hack to see how the idea shook out. CouchSimple is perhaps overly simple, but it did exactly what I needed at the time. For production purposes I’ll probably swap it for something more substantial, but for now I’m on my laptop.

Using Couch’s /_all_dbs endpoint to retrieve all the databases, I decoded the JSON response that Couch returns, iterated across the result and then used _all_docs to return the documents for each database. I wanted to do this to produce something like the Microsoft TechNet site and the way it orders the sections in the manual.
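A minimal sketch of that loop, written in Python rather than the PHP used for the hack. The server address (CouchDB's default local port) and the function names are assumptions for illustration only; the two endpoints, _all_dbs and _all_docs, are the ones described above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: CouchDB running locally on its default port.
COUCH_URL = "http://localhost:5984"


def fetch_json(path, base=COUCH_URL):
    """GET a path from CouchDB and decode the JSON body."""
    with urlopen(base + path) as response:
        return json.loads(response.read().decode("utf-8"))


def list_documents(fetch=fetch_json):
    """Map each database name to the list of document ids it holds."""
    contents = {}
    # /_all_dbs returns a JSON array of database names.
    for db in fetch("/_all_dbs"):
        # /<db>/_all_docs returns an object whose "rows" each carry an "id".
        rows = fetch("/" + quote(db, safe="") + "/_all_docs")["rows"]
        contents[db] = [row["id"] for row in rows]
    return contents
```

Passing the fetch function in as a parameter keeps the loop testable without a running server; on a real instance the result will also include system databases such as _users if they exist.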

As this is a read-only site, at least for the main document, using the revision isn’t necessary, but it might be useful in the future if a reader is allowed to alter the actual document and the revision history is then saved. This would be useful, for instance, for having ‘editions’ of a document, so that if a reader uses the information and it is later updated, they can still find the version they used. As Couch’s revision history cannot be relied upon (what happens if a database is compacted, or the system needs moving to a new server or upgrading? Old revisions would get lost), I would need to look at writing something that could be used instead, but more of that in the future if this ever tips over into real use. Of course, calling all documents throws up other issues, such as pulling in _design documents, but those should be easy to filter out.
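Both ideas are small enough to sketch in Python. Design documents can be recognised by their ids, which begin with "_design/". For editions, one option (an assumption here, not something built yet) is to save each edition as an ordinary document of its own, so it survives compaction. The id scheme and the edition_of and edition fields are hypothetical names.

```python
def content_rows(rows):
    """Drop design documents from the rows of an _all_docs response."""
    return [row for row in rows if not row["id"].startswith("_design/")]


def make_edition(doc, number):
    """Copy a document into a standalone 'edition' document.

    The id scheme and the extra fields are hypothetical. The copy carries
    no _rev, so CouchDB would treat it as a brand new document.
    """
    edition = {key: value for key, value in doc.items() if key not in ("_id", "_rev")}
    edition["_id"] = "{}:edition:{}".format(doc["_id"], number)
    edition["edition_of"] = doc["_id"]
    edition["edition"] = number
    return edition
```

Because each edition is a normal document, it is copied by replication and kept through compaction, unlike old revisions.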

The thing that needs sorting out is the actual text of the document, which would need to be transformed on its travels between systems. It would need to come from an editorial system and then be put into Couch, or into an intermediary system before Couch. It then needs transforming for the user to actually read, since XML can be a bit hard on the eyes after a while.
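A rough Python sketch of the last step, turning stored XML into something readable. The element names (section, title, para) are assumptions, since the editorial system's real schema isn't settled; only the standard library's ElementTree is used.

```python
import xml.etree.ElementTree as ET


def to_plain_text(xml_source):
    """Render a small XML document as plain text paragraphs.

    Assumes hypothetical <title> and <para> elements; anything else is
    skipped. Titles are upper-cased so they stand out in plain text.
    """
    root = ET.fromstring(xml_source)
    blocks = []
    for element in root.iter():
        # Gather text from the element and any inline children,
        # then collapse runs of whitespace.
        text = " ".join("".join(element.itertext()).split())
        if not text:
            continue
        if element.tag == "title":
            blocks.append(text.upper())
        elif element.tag == "para":
            blocks.append(text)
    return "\n\n".join(blocks)
```

For a real site the same walk could emit HTML instead of plain text; the point is only that the transformation happens between Couch and the reader.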

I’ve been having a slight play with Backbone.js as well, which is used with Underscore.js. I’ve inserted, and begun altering, the todos example created by Jérôme Gravel-Niquet to learn it from examples. I’ve started by taking in the page’s URL to be stored in a database, probably Couch again, but there is the question of the user id, which would need to be addressed to make sure that the right users get the right information. The URL would be useful for logging, and for returning to the user, where the information was retrieved from and at what time it was saved, in line with basic citation practices.
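As a sketch of what such a stored note might look like, here is the record in Python rather than Backbone's JavaScript, to show the data shape only. Every field name is an assumption, and how the user id is obtained is left open, as it is above.

```python
from datetime import datetime, timezone


def build_note(user_id, page_url, text, saved_at=None):
    """Build a note document ready to be saved to a database.

    Field names are hypothetical. The time is stored as ISO 8601 in UTC.
    """
    if saved_at is None:
        saved_at = datetime.now(timezone.utc)
    return {
        "type": "note",
        "user_id": user_id,
        "source_url": page_url,
        "text": text,
        "saved_at": saved_at.isoformat(),
    }


def cite(note):
    """Return a basic citation line: where the note came from and when."""
    return "{} (saved {})".format(note["source_url"], note["saved_at"])
```

Keeping user_id on every note means notes can later be filtered per user, so that each user only gets their own.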

Citation is another area that needs work, to provide the basic data and tools that encourage its use not only in academic work but in general.

Backbone, judging from DocumentCloud’s use of it (they developed it), provides what is needed to build really useful tools for publishers and consumers. There is going to be a fair amount to think about and to develop.

Happy days…

Of course there are other issues, such as giving notes their own URLs, access control and really thinking about the information architecture properly, but the possibilities are certainly there. Once I’ve managed to create something useful code-wise rather than a hacky (and in places cut & paste, for learning only) job, I’ll start posting it.