Category Archives: Information Retrieval

Full text search using PHP and MySQL

I’ve been thinking about full text searching for the letters project and looking for open source solutions. On the Open Shakespeare and Open Milton sites, we used the Xapian project, which is an excellent search engine. However, I wanted to try and find a way of getting a search running using […]

Update on the Letters of Dickens

I’ve just started on a new version of the Dickens letters, which I’m trying to improve before adding further volumes of text and other authors. I’ve refactored some of the code to remove cruft and obsolete sections. I’ve also been working on the RDF so that I can build up the RDFa links […]

Letters of Charles Dickens website

I’ve finally posted the first draft of the Dickens website here: https://austgate.co.uk/dickens/index.php?author=Dickens. The idea is that it will allow users to derive networks across a variety of Victorian authors as and when I can develop the datasets. I’ve also been developing a small text ontology to add to the Friend of a Friend (FOAF) […]

Mining the Letters of Charles Dickens

As an aside, I’ve started a small project to begin visualising ways of searching the letters of Charles Dickens and exploring the Simile library which MIT have produced. It was originally an extension to the DSpace repository tool, but Rufus Pollock used it in the Open Knowledge Foundation’s Weaving History project – to which I contributed the […]

Twittering RSS

The slowness of RSS feeds, and their lack of real-time delivery, has reared its head again as an obstacle to getting news out quickly. Erick Schonfeld on TechCrunch wants to speed them up and John Biggs has decided that RSS needs to RIP. I’ve been working on Twittering RSS feeds for the JISCMail service […]

The changing community of publishing

The New York Times ran a piece on the digital piracy of books and the contrasting views on it, which was picked up by Slashdot. Although it starts from the anti-piracy view, it does note that bestsellers are often the most pirated books, which backs up Cory Doctorow’s assertion: “I really feel like my problem isn’t piracy,…It’s obscurity.” His […]

XML in Milton and Shakespeare

As part of the Open Milton project, I’ve been thinking about the place of XML in it. Over Christmas, I wrote a small XSL transform using the Bosak XML Shakespeare files. Rufus took Antony and Cleopatra and, using LaTeX (I gather), created the Open Shakespeare Antony and Cleopatra PDF. At one level, this is yet […]

Depositing blogs – feeding repositories from blogging applications

I’ve recently been working on a plugin for WordPress that makes each post RDF-enabled using OAI-ORE and SWORD, which I presented to the Oxon SWIG on Tuesday. The Berlin Declaration on Open Access states that work should be freely available and that it should be deposited in a repository. This seems to […]

Re-use, Remix, Redistribute: Opening Knowledge

I’m going to talk to you today about opening science: some of the ways that platforms and tools are being created, and the underlying responsibilities and actions that the commons needs to take if it is to develop a truly open way of working. Technology really is a means to an end; not […]