Weeknotes: Open Correspondence updates

I’ve bitten the bullet and done it. I’ve uploaded the current changes to the Open Correspondence site.

The current changes are:

  • additional fields in the RDF endpoint. The JSON and XML output still needs some major work, which I hope to do for the next update.
  • a basic text search
  • a basic set of geographic data in the collection
  • better linking from the letters to the correspondent and geographical data (NB it is still incomplete)
  • a Simile timeline (which is a bit slow at the moment).

Admittedly, some of this is just exposing work that was already there but hidden. However, I’ve also been working on some Unicode fixes to the project’s underlying XML, which has meant rebuilding the tables and the Xapian indexes.

Following a request on the Open Literature mailing list, I’m looking at the idea of using Python’s NLTK to create some linguistic API wrappers around the Xapian search. It strikes me that these letters can be used to create a corpus of Dickens’s language, where you can explore the language he used in family correspondence (to his daughters and wife), to other authors (Wilkie Collins) and to readers. That is a longer-term project, though, because the relevant indexes have to be built first.
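As a very rough sketch of the per-audience comparison, the snippet below counts word frequencies for each group of recipients. It uses only the standard library rather than NLTK or Xapian, and the sample letters, the group labels and the function names are all made up for illustration; they are not the project's data or code.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample data: (recipient group, letter text) pairs.
# In the real project these would come from the letters tables.
LETTERS = [
    ("family", "My dearest Kate, I hope the children are well."),
    ("authors", "My dear Wilkie, the proofs of the new number are with the printer."),
    ("family", "My dear Mamie, the children send their love."),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequencies_by_group(letters):
    """Return a dict mapping each recipient group to a Counter of its words."""
    counts = defaultdict(Counter)
    for group, text in letters:
        counts[group].update(tokenize(text))
    return dict(counts)


freqs = frequencies_by_group(LETTERS)
print(freqs["family"].most_common(3))
```

NLTK's `FreqDist` is a `Counter` subclass, so swapping it in later for the plain `Counter` should be a small change.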

I’m also looking at the idea of clustering a collection of letters to a single correspondent and seeing what happens (for some reason, the current script is looking at Wilkie Collins). There is also a set of queries that one might run against the letters discussing books and the publication dates, to view the distribution. I’m working on these latter questions at the moment, with a release intended for later this week, though I do foresee it being delayed a while.
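One way the publication-date question might be framed is to count the letters mentioning a title by how many years they fall before or after its publication. The sketch below assumes a plain substring match on the title and a handful of invented letters; the names `PUBLICATION_YEARS`, `SAMPLE_LETTERS` and `mention_offsets` are hypothetical, and the real queries would presumably go through the Xapian search instead.

```python
from collections import Counter

# Illustrative year of first publication for one title (placeholder value).
PUBLICATION_YEARS = {"Bleak House": 1852}

# Hypothetical (year written, letter text) pairs, invented for this example.
SAMPLE_LETTERS = [
    (1851, "I am turning over ideas for Bleak House."),
    (1852, "The first number of Bleak House is out."),
    (1852, "Bleak House keeps me at my desk."),
    (1853, "Shall we dine on Tuesday?"),
]


def mention_offsets(letters, publication_years):
    """Count letters mentioning each title, keyed by years since publication.

    A negative offset means the letter was written before publication.
    """
    distribution = {title: Counter() for title in publication_years}
    for year, text in letters:
        lowered = text.lower()
        for title, pub_year in publication_years.items():
            if title.lower() in lowered:
                distribution[title][year - pub_year] += 1
    return distribution


offsets = mention_offsets(SAMPLE_LETTERS, PUBLICATION_YEARS)
for offset, count in sorted(offsets["Bleak House"].items()):
    print(offset, count)
```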