Weeknotes: Open Correspondence, Xapian and Linked Data

After last week’s server move, we discovered one or two things that needed to change before they could go live. The main one was the Xapian search I had been working on. The initial version kept the Xapian database on the local machine and used it to index and search the letters, but the new set-up is distributed across machines, so the code needed a brief change.

In the “one box wonder” version, where the index sits on the same machine as the application, the Python code creates (or opens) the index for writing with:

xapian.WritableDatabase(db_path, xapian.DB_CREATE_OR_OPEN)

where db_path is the path to the directory that holds the index. For searching, the same index is opened read-only with:

xapian.Database(db_path)
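To put those two calls in context, here is a minimal sketch of indexing a letter and then searching for it on a single machine. It assumes each letter is a plain-text string with an id; the helper names (id_term, index_letter, search) are hypothetical, and the "Q" unique-ID prefix is borrowed from Omega's convention rather than taken from the project's code.

```python
try:
    import xapian  # Xapian's Python bindings
except ImportError:  # keeps id_term() usable where the bindings are not installed
    xapian = None


def id_term(letter_id):
    """Unique-ID term for a letter (hypothetical helper; "Q" follows Omega's convention)."""
    return "Q%s" % letter_id


def index_letter(db_path, letter_id, text):
    """Add or update one letter in a local index at db_path."""
    db = xapian.WritableDatabase(db_path, xapian.DB_CREATE_OR_OPEN)
    termgen = xapian.TermGenerator()
    termgen.set_stemmer(xapian.Stem("english"))

    doc = xapian.Document()
    termgen.set_document(doc)
    termgen.index_text(text)   # free-text terms, stemmed for English
    doc.set_data(text)         # stored payload returned with each match
    term = id_term(letter_id)
    doc.add_boolean_term(term)

    # replace_document() with a unique term updates the letter if it is already indexed
    db.replace_document(term, doc)
    db.commit()


def search(db_path, query_string, limit=10):
    """Return (percent, stored data) pairs for the best matches."""
    db = xapian.Database(db_path)
    parser = xapian.QueryParser()
    parser.set_stemmer(xapian.Stem("english"))
    parser.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
    parser.set_database(db)

    enquire = xapian.Enquire(db)
    enquire.set_query(parser.parse_query(query_string))
    return [(m.percent, m.document.get_data()) for m in enquire.get_mset(0, limit)]
```

Using replace_document() with a unique term means re-running the indexer over the letters should update existing entries rather than duplicate them.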

Since the project uses Pylons, the controller took the database path from the .ini file loaded at runtime, so it always pointed at the correct database.
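As a rough illustration of keeping the location in configuration rather than in the controller, the sketch below reads it from the [app:main] section of a Pylons-style .ini file. The key name xapian.db_path is hypothetical, and the standard library's configparser stands in for Pylons' own config loading so the example runs on its own. %(here)s is the placeholder PasteDeploy fills in with the directory containing the .ini file.

```python
import configparser

# Hypothetical excerpt from a Pylons development.ini / production.ini
SAMPLE_INI = """
[app:main]
xapian.db_path = %(here)s/data/letters.db
"""


def xapian_db_path(ini_text, here):
    """Return the configured index location, expanding %(here)s."""
    # Supply "here" as a default so configparser's interpolation can expand it
    parser = configparser.ConfigParser(defaults={"here": here})
    parser.read_string(ini_text)
    return parser.get("app:main", "xapian.db_path")


# Example: xapian_db_path(SAMPLE_INI, "/srv/opencorrespondence")
```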

Using the documentation on the Xapian site for remote backends and the Python bindings, I was able to quickly adjust the code so that xapian.WritableDatabase is replaced by:

xapian.remote_open_writable("<host name>", <port number>)

while xapian.Database is replaced by:

xapian.remote_open("<host name>", <port number>)
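One way to make the switch between the two set-ups a configuration choice, rather than a code edit, is sketched below. The settings keys (xapian.host, xapian.port, xapian.db_path) are hypothetical; the point to note is that the port is passed to remote_open and remote_open_writable as an integer, not a quoted string.

```python
try:
    import xapian  # Xapian's Python bindings
except ImportError:  # backend_from_settings() does not need the bindings
    xapian = None


def backend_from_settings(settings):
    """Return ("remote", host, port) or ("local", path) from a settings dict."""
    host = settings.get("xapian.host")
    if host:
        # The remote functions expect the port as an integer
        return ("remote", host, int(settings["xapian.port"]))
    return ("local", settings["xapian.db_path"])


def open_index(settings, writable=False):
    """Open the letters index locally or via xapian-tcpsrv, as configured."""
    backend = backend_from_settings(settings)
    if backend[0] == "remote":
        _, host, port = backend
        if writable:
            return xapian.remote_open_writable(host, port)
        return xapian.remote_open(host, port)
    _, path = backend
    if writable:
        return xapian.WritableDatabase(path, xapian.DB_CREATE_OR_OPEN)
    return xapian.Database(path)
```

Because the object returned behaves like a local database, the indexing and search code that follows should not need to change; only the opening call differs.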

Once that is set up, all that remains is to start the Xapian TCP server, which is what I’ve been looking at. I downloaded the xapian-core tar.gz file from the Xapian site, configured and built it on Ubuntu Lucid Lynx, and then ran the following in a new terminal window:

xapian-tcpsrv --port <port number> <database name>

This let me test the connections and get them ready for going live.
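If the server is started from a script rather than by hand, a sketch like the following builds the same command. One detail worth knowing: xapian-tcpsrv only accepts updates when run with --writable (and then serves a single database), so a server used with remote_open_writable needs that flag. The function names, path and port here are placeholders.

```python
import subprocess


def tcpsrv_command(db_path, port, writable=True):
    """Build the xapian-tcpsrv command line as an argument list."""
    cmd = ["xapian-tcpsrv", "--port", str(port)]
    if writable:
        # Needed for clients that use remote_open_writable;
        # only one database may be served in this mode
        cmd.append("--writable")
    cmd.append(db_path)
    return cmd


def start_tcpsrv(db_path, port, writable=True):
    """Launch the server in the background and return the process handle."""
    return subprocess.Popen(tcpsrv_command(db_path, port, writable))


# Example (placeholder path and port):
# proc = start_tcpsrv("/srv/opencorrespondence/data/letters.db", 8100)
```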

Changes are afoot on the Open Correspondence site as well. After a conversation that included Keith Alexander of Talis, the project is going to move in a slightly more Linked Data direction, with references to the books, magazines, correspondents and so on. I’d already started down this road with the correspondent links (such as http://www.opencorrespondence.org/letters/correspondent/Miss%20Hogarth), so this is really an extension of where we need to go to connect to other resources such as DBpedia, Wikipedia and so on. The fact that 2012 is Dickens’s bicentenary gives the project an added boost.

The Linked Data approach gives us the chance to create a framework for future expansion and for linking data sources together, not only at a literary level but also socially. It also encourages me to sort out the content negotiation work that had already been started, and to follow the FAQs that the Pedantic Web group have posted, so that the site follows best practice as closely as it can and builds that into future development.
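For the content negotiation side, here is a deliberately simplified sketch: it builds correspondent URIs in the same style as the existing links and picks a representation from an HTTP Accept header. The formats offered (HTML, RDF/XML, Turtle) and all function names are assumptions, and the matching skips some finer precedence rules in the HTTP specification, such as preferring the more specific of two overlapping media ranges.

```python
from urllib.parse import quote

BASE = "http://www.opencorrespondence.org/letters/correspondent/"

# Representations the site might offer; the first is the default (assumption)
OFFERED = ["text/html", "application/rdf+xml", "text/turtle"]


def correspondent_uri(name):
    """Percent-encode a correspondent's name, e.g. a space becomes %20."""
    return BASE + quote(name)


def parse_accept(header):
    """Return (media range, q) pairs from an Accept header, highest q first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media, q, position))
    # Higher q first; ties keep the order the client listed them in
    prefs.sort(key=lambda p: (-p[1], p[2]))
    return [(media, q) for media, q, _ in prefs]


def matches(media_range, media_type):
    """True if a range such as */*, text/* or text/html covers media_type."""
    if media_range in ("*/*", media_type):
        return True
    return media_range.endswith("/*") and media_type.startswith(media_range[:-1])


def negotiate(header, offered=OFFERED):
    """Pick a representation for the Accept header, or None (a 406 case)."""
    if not header:
        return offered[0]
    prefs = parse_accept(header)
    excluded = {media for media, q in prefs if q <= 0}  # q=0 means "not acceptable"
    for media, q in prefs:
        if q <= 0:
            continue
        for candidate in offered:
            if candidate not in excluded and matches(media, candidate):
                return candidate
    return None
```

A controller could then render HTML or RDF according to the result, send a Vary: Accept header so caches keep the variants apart, and answer 406 Not Acceptable when negotiate returns None.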