Archive for July, 2009

Mining the Letters of Charles Dickens

Tuesday, July 14th, 2009

As an aside, I’ve started a small project to visualise ways of searching the letters of Charles Dickens and to explore the Simile library which MIT has produced.

It was originally an extension to the DSpace repository tool, but Rufus Pollock used it in the Open Knowledge Foundation’s Weaving History project, to which I contributed the Milton JSON data file. Originally I’d used it just for biographical timelines but, thinking about it, I wondered how you could use it to mine datasets like the letters of Charles Dickens.

Dickens was a prolific letter writer (the Pilgrim edition extends to 12 thick volumes). I don’t have access to that data but I did download the first volume (of three) that his daughters edited.

Using Perl, I have extracted the date and recipient tags and converted the text file into JSON (as part of a larger process of converting the file into XML and using XSL to transform the data). I then created a table view of the data so that you can easily find the dates of the letters sent to particular people.
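The extraction step described above might look roughly like the following. This is only a sketch in Python rather than the Perl I actually used, and it makes assumptions that are not from the real file: that each letter starts with a recipient line ("To ...") followed by a dateline ending in a date, and that the output is a list of items in the general style of Exhibit JSON. The sample text, the field names and the function name extract_letters are all made up for illustration.

```python
import json
import re
from datetime import datetime

# ASSUMED layout (hypothetical): a recipient line such as "To A. Correspondent."
# followed by a dateline such as "Somewhere, March 5th, 1840."
RECIPIENT = re.compile(r"^To (?P<name>.+?)\.?$")
DATELINE = re.compile(
    r"(?P<month>[A-Z][a-z]+) (?P<day>\d{1,2})(?:st|nd|rd|th)?, (?P<year>\d{4})"
)

def extract_letters(text):
    """Return one dict per letter with its recipient and ISO date."""
    letters = []
    recipient = None
    for raw in text.splitlines():
        line = raw.strip()
        found = RECIPIENT.match(line)
        if found:
            recipient = found.group("name")
            continue
        dated = DATELINE.search(line)
        if dated and recipient:
            try:
                date = datetime.strptime(
                    f"{dated['month']} {dated['day']} {dated['year']}",
                    "%B %d %Y",
                ).date()
            except ValueError:
                continue  # looked like a date but was not one
            letters.append({
                "label": f"{recipient}, {date.isoformat()}",
                "type": "Letter",
                "recipient": recipient,
                "date": date.isoformat(),
            })
            recipient = None  # wait for the next recipient line
    return letters

# Placeholder text, not taken from the real edition.
SAMPLE = """\
To A. Correspondent.
Somewhere, March 5th, 1840.
My dear friend, ...

To B. Reader.
Elsewhere, July 14th, 1841.
My dear sir, ...
"""

letters = extract_letters(SAMPLE)
exhibit_json = json.dumps({"items": letters}, indent=2)
print(exhibit_json)
```

A real file would need stricter patterns, since any body line beginning with "To " would be mistaken for a recipient here.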

I’ve also used the same data set to produce a fairly basic timeline of the letters, which is being rewritten from here; it needs updating to work with the new version of Timeline.

Twittering RSS

Monday, July 13th, 2009

The slowness of RSS feeds, or their lack of real-time delivery, has reared its head again as an obstacle to getting news out quickly. Erick Schonfeld on TechCrunch wants to speed them up, and John Biggs has decided that RSS needs to RIP.

I’ve been working on Twittering RSS feeds for the JISCMail service, turning the service news feeds into tweets using Perl with XML::FeedPP and LWP::UserAgent. I’ve even got a script that reads Twitter and forwards any posts from the account to an email address, so that the helpline doesn’t need to log in constantly to keep itself updated.
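The feed-to-tweet step can be sketched along these lines. It is written in Python with the standard library rather than the Perl modules named above, and it is an illustration under assumptions, not the JISCMail script itself: the feed content, the function names and the use of the item guid to skip already-posted items are all hypothetical. The actual posting to Twitter is left out, since it depends on API details not covered here.

```python
import xml.etree.ElementTree as ET

MAX_LEN = 140  # tweet length limit at the time of writing

def feed_items(rss_xml):
    """Yield (guid, title, link) for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        guid = (item.findtext("guid") or link).strip()
        yield guid, title, link

def to_status(title, link, max_len=MAX_LEN):
    """Join title and link, shortening the title so the link stays whole.

    Assumes the link is comfortably shorter than max_len.
    """
    room = max_len - len(link) - 1  # space left for the title
    if len(title) > room:
        title = title[: max(room - 3, 0)].rstrip() + "..."
    return f"{title} {link}".strip()

def new_statuses(rss_xml, seen):
    """Return statuses for items whose guid is not yet in `seen`."""
    statuses = []
    for guid, title, link in feed_items(rss_xml):
        if guid in seen:
            continue
        seen.add(guid)
        statuses.append(to_status(title, link))
    return statuses

# Placeholder feed, not a real service feed.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Service news</title>
<item><title>Planned maintenance on Saturday</title>
<link>http://example.org/news/1</link><guid>news-1</guid></item>
<item><title>New list archive search</title>
<link>http://example.org/news/2</link><guid>news-2</guid></item>
</channel></rss>"""

seen = set()
first_run = new_statuses(SAMPLE_RSS, seen)
second_run = new_statuses(SAMPLE_RSS, seen)  # nothing new the second time
for status in first_run:
    print(status)
```

In a real script the set of seen guids would be saved between runs, for example in a small file, so that a scheduled job only posts items that have appeared since the last check.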

Clearly RSS on its own is not going to satisfy the constant stream of news that some users require. I suspect it does for most people, who are not running in real time, but messaging on the web is changing and getting faster, which perhaps demands a rethink of how silos, like Twitter and Facebook, and protocols, like RSS, work together.

I noticed that the PubSubHubbub solution that Erick points to builds on Atom and pushes updates via an IM-style solution. Andy Skelton at WordPress has developed a Jabber plugin, which I suppose goes some way to alleviating the problem, but only for WordPress.

Pushing content and transforming it into a different protocol is currently the easiest way to make sure that news or events are ported into different services and that the community can be developed. Building and updating communities has never been easier, yet it is frustrating at the same time to work out how the different services talk to each other and how to build “real-time” updates when necessary.