Category Archives: Text Mining

Strava, segments, and tracking

A few years ago, Strava visualised the GPS co-ordinates in their data and displayed the locations of secret bases. A change of privacy settings later and, apparently, all was secret again. The Guardian has just run a story on using segments and GPS locations to show individuals within the bases through re-purposing the segment function. […]

Jane Austen’s word choices

A Facebook friend had a link to an NY Times piece on Jane Austen’s word choices. Using Franco Moretti’s techniques, it begins to show how Digital Humanities can be useful. There are one or two of his books that I am waiting for before I can get into the pros and cons but I do have […]

A simple experiment in Sound and Vision for Hamlet

The aim of this hack is to explore turning the structures of the First Folio texts marked up using Text Encoding Initiative XML (TEI) into notes using the Chuck, PHP and Processing languages. I wanted to explore the processes for transforming the texts for the user and explore different ways of presenting the textual […]

Harmonising the Heterogeneous at Cultures of Knowledge

Harmonising the Heterogeneous at the Cultures of Knowledge seminar series with Eero Hyvönen. Notes are unedited. Two forms of the Web: WWW for humans, GGG (Giant Global Graph) for data. Core data set 1048 data sets and 59 billion triples. Google’s Knowledge Graph and Microsoft’s Satori – graph engines in the search giants. Why […]

Future of Editing – some reflections on Nicole Pohl on Sarah Scott

The seminar in today’s The Future of Editing series, “‘An Editor’s duty is indeed that of most danger’ (Piozzi): editing Sarah Robinson Scott”, by Nicole Pohl, which the Bodleian Digital Library Systems and Services is holding at the Oxford e-Research Centre, was a thought-provoking one in terms of the questions raised a series of points […]

Transcribing Bentham seminar notes

Melissa Terras talked about Transcribing Bentham, a collaborative project to transcribe the volumes of Bentham, at University College London at the first seminar in the Cultures of Knowledge seminars. Bentham believed in education for all who could afford it in London. UCL has 60,000 volumes and BL has 30,000. 40,000 volumes were untranscribed […]

A quick skim into mining Twitter data

This is a variant on the text prepared for a short talk at the Open Science evening at the Oxford e-Research Centre on Wednesday 27th November. Peter Murray-Rust also spoke at the event on the AMI software and the Chemical Tagger. This is a brief talk about some work that I have been doing in […]

Weeknotes – Scripting and scraping

It has been a while since I last posted a week note, so I thought I would try and get back in the habit. I’ve been involved in glueing together profiling tools to run so that I can have a vaguely generic framework to profile software at the IO level and the CPU level. Shell […]

Attending the Open Humanities Hack

I’ve just come back from a couple of excellent days of Humanities Hacking, organised by the King’s College, London Digital Humanities department and the Open Knowledge Foundation. To be fair, it went slightly differently than I thought it would. After an interesting start trying to find the room we were in, a few of us […]

Looking at mentions and users in a Twitter message

I was preparing for the recent OK Festival and discovered that the Weird Council was taking place, a conference on the awesome China Miéville. As you may guess, I am a bit of a fan. Unfortunately I was not aware that it had taken place so I watched it on Twitter. Whilst on my travels, […]