Category Archives: Text Mining

Installing Xapian into Open Correspondence and next steps

As an aid to getting over the first (and hopefully last) seasonal cold, I’ve been implementing Xapian as a search engine, using the Python bindings. I did look at Solr as an alternative, but the setup costs outweighed the fact that Xapian is already installed on the server as part of Python. Unlike OpenMilton, […]

Tagging the revolution – exploring Edmund Burke’s Reflections on the Revolution in France

Over the weekend, I read an interesting article, “Edmund Burke: How did a long-dead Irishman become the hottest thinker of 2010?”, by Amol Rajan in the Independent on the philosopher Edmund Burke. In the past I’ve read his musings on the sublime in “A Philosophical Enquiry into the Origin of our Ideas of the Sublime […]

A change to the Letters project

During the previously blogged dinner with Ben and Rufus, we talked about the nascent work on the letters project. Both have “encouraged” me (it didn’t take too much persuasion, it must be said) to move the project to the Open Knowledge Foundation and to port it to Python with a Redis backend rather than the […]

Textcamp announced

Had dinner with Rufus Pollock and Ben O’Steen on Monday in Oxford. As part of the discussions, the notion of Textcamp was raised and Ben has created the Textcamp website with an associated blog. It is a slightly bigger concept than I had had but the approach, I think, will allow the creation of a […]

Mining data driving the web?

Just seen an article on Techcrunch by Bradford Cross of Flightcaster regarding the growth of data on the Web. He appears to argue that data and its uses will drive the Web soon, writing: the data age is less about the raw size of your data, and more about the cool stuff you can do […]

Letters of Charles Dickens website

I’ve finally posted the first draft of the Dickens website here: http://austgate.co.uk/dickens/index.php?author=Dickens. The idea is that it will allow users to derive networks across a variety of Victorian authors as and when I can develop the datasets. I’ve also been developing a small text ontology to add to the Friend of a Friend (FOAF) […]

Mining the Letters of Charles Dickens

As an aside, I’ve started a small project to begin visualising ways of searching the letters of Charles Dickens and exploring the Simile library which MIT have produced. It’s originally an extension to the D-Space repository tool but Rufus Pollock used it in the Open Knowledge Foundation’s Weaving History project – to which I contributed the […]

Rethinking the idea of the “text”

Is a text really stable? Is it an entity? In a lecture during my final year at the University of Leicester, one of the English lecturers posed a question: What is a text? After soliciting various answers from the masses, he argued that a text is anything – email, note, manuscript and so on. So […]

Building data stores

Mats Dahlstrom’s talk at the Dilemmas of Digitization conference mentioned the Deep Sharing: A Case for the Federated Digital Library paper by David Seaman. It would be great if there were a system for rapidly building small data stores from scratch to include texts and then have these with editing software components, text encoding output […]

Spelunking text data

One of the ARTFL developers presented PhiloLogic and its PhiloMine extension. Both are free text searching databases and tools. Both sets of code are designed for large data sets, which raises the question of whether it might be useful to develop a set of tools for smaller data holdings or individuals.