Category Archives: Information Retrieval

Hacking Arts Council data

I lost my hackday cherry yesterday and went to the Open Data hackathon to look at the South East arts council data. Our hosts, White October, were fantastic and welcoming (and put the kettle on as soon as I came in!) and Incuna provided the much-needed pizzas for […]

Weeknotes: Open Correspondence, Xapian and Linked Data

After last week’s server move, we discovered one or two things that needed to be changed before they could go live. The main thing was the Xapian search, which I had been working on. The initial version kept the Xapian server on the local machine and used that to index and search the letters but […]

Tweeting changes with Node.js

As a break from Open Correspondence, I’ve been looking at node.js, the server-side JavaScript runtime. I’ve been thinking about the document stuff that I’ve been working on with Milton. One of the things that I had mooted as an idea was reading tweets from Twitter and pushing them back to the document. I’ve been playing with […]

Weeknotes: Ubuntu, messaging and Open Correspondence

It has been a while since the last weeknotes. I’ve finally made the move to Linux, or at least dual booting, by installing Ubuntu, so I’m currently learning a little about the OS and getting a development environment set up for it. I’ve nearly finished the ongoing accounts project at work. The framework is up and […]

Creating bibliographic resources from web pages

Given the increasingly digital nature of research, including not only websites but blogs, forums and wikis, the (in my view) beloved Moleskine is becoming increasingly outdated. I’ve just finished writing my first book and had the joy of using Moleskine notebooks to note down URLs and make notes. I like Moleskines a lot but pen and […]

Finding a space for NoSQL

ReadWriteWeb have a post on NoSQL (again?) by Audrey Watters which is a brief overview of the area. The original post points to the Heroku blog, where Adam Wiggins outlines the uses of NoSQL. I’m not an expert by any means but use Redis on a daily basis with the Rediska PHP library. I remember having […]

Weeknotes: Redis, PHP, mail and SOAP

I’ve spent some time writing a queueing library using Redis as a backend. I started with the notion that it would need to be a FIFO queue but didn’t want to only use the in-built parts of PHP as a stack using array_pop or array_push. Whilst it might be faster, it doesn’t allow for queue […]

Weeknotes: Data mining, XML and bibliographies

It seems to have been a week of frantic completion and refactoring. The first half was spent frantically converting HTML pages into PDFs using Verypdf’s HTMLtools server product. All in all the manual is very helpful and the test server could be set up quickly. It might have helped the other end if I’d […]

Data curation in real time

Robert Scoble’s blog has this intriguing post on real-time curation which has made me think. At the moment I’m working on curating and archiving gigabytes of information at work (and usually on ways of generating more data from the systems). Whilst this is not necessarily real time, I’d like it to be or at least […]

Textcamp announced

Had dinner with Rufus Pollock and Ben O’Steen on Monday in Oxford. As part of the discussions, the notion of Textcamp was raised and Ben has created the Textcamp website with an associated blog. It is a slightly bigger concept than I had had but the approach, I think, will allow the creation of a […]