Having gone to Textcamp yesterday, I started playing with Wordle and IBM’s Many Eyes at the suggestion of Dave Flanders of the JISC. As James Harriman-Smith, the organiser and Open Literature co-ordinator for the Open Knowledge Foundation, had suggested that this year is the anniversary of the manuscript of Alexander Pope’s An Essay on Criticism, I popped the Gutenberg text into Wordle to see what it shows as a tag cloud. The dominance of “wit” is not a surprise, as wit in poetry was a prized quality for Pope and Dryden. There are some small issues, such as ‘still’ and ‘Still’ being counted separately. Perhaps this could be rectified by making everything lower case, but that presents other issues if two words are similar but the capitalisation suggests a different intonation. As I’ve blogged before, word clouds are great but not if they don’t link so, at some point in the future, I’ll sit down and actually upload a table to create a useful tag cloud.
John Levin, of Anterotesis, loaded a csv file of the recently released ECCO files. He loaded Volume Four of Defoe’s Tour of the Whole Island of Great Britain, which features Scotland.
Using the Many Eyes Word Cloud, we can see that Scotland is unsurprisingly the largest item, but Lord and Earl are also popular, suggesting that he stopped with or met the aristocracy rather than just travelling randomly. Dave Flanders and John created some cool visualisations using the tool which allow you to follow words in the text and to see, in a tree fashion, which words are most often linked to them (using bigrams, I would suppose). It is certainly something that I will be looking up later for “quick win” visualisations.
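For anyone who wants to try the counting side of this without Wordle or Many Eyes, here is a minimal Python sketch. It is not how either tool works internally (I don't know that); it simply counts words, with a switch for folding case so that ‘still’ and ‘Still’ merge, and builds a table of which words follow which, which is my guess at the bigram idea behind the word tree. The function names and the sample sentence are my own inventions for illustration, not lines from Pope or Defoe.

```python
import re
from collections import Counter, defaultdict

# Matches runs of letters, allowing internal apostrophes (e.g. "don't").
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def tokenize(text, fold_case=True):
    """Split text into words; optionally lower-case them so 'Still' and 'still' merge."""
    words = WORD.findall(text)
    return [w.lower() for w in words] if fold_case else words

def word_counts(text, fold_case=True):
    """Count how often each word occurs (the raw numbers behind a word cloud)."""
    return Counter(tokenize(text, fold_case))

def followers(text, fold_case=True):
    """Map each word to a Counter of the words that immediately follow it.

    Note: this naive version ignores punctuation, so bigrams run across
    sentence boundaries.
    """
    tokens = tokenize(text, fold_case)
    table = defaultdict(Counter)
    for first, second in zip(tokens, tokens[1:]):
        table[first][second] += 1
    return table

# Made-up sample text, purely for illustration.
sample = "Still the wit is wit. The wit still shines, still the same."

if __name__ == "__main__":
    print(word_counts(sample).most_common(3))
    print(word_counts(sample, fold_case=False)["Still"])
    print(followers(sample)["the"].most_common())
```

Leaving `fold_case` as an option rather than always lower-casing keeps the capitalised forms available when the capital letter carries meaning.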
One of the intriguing projects that was suggested was building our own DIY bookscanner using links currently stored on the Textcamp 2011 wiki pages. I think that Dave Flanders might be organising a hack weekend to actually build the machine for real use. I find it interesting, and think it would be cool to also see if one can be built at home or using iPhone / Android OSes, which would also entail a software hack unless an app already exists. That is something to explore later.
Mark MacGillivray, of OKFN and Cottage Labs, and Brian Hole, of Ubiquity Press, spoke about Open Access and making scholarship open while retaining its rigour. Using Open Access, we should be able to share the data, the ways of interpreting it and the final interpretation which is published.
The science community has been doing this for some while, and things like the Panton Principles and Science Commons are showing the way. One of the ideas was to write a handbook on how to use openness in literature, as it is something that we need to address and build on. We ought to write an open guide / manual and build on / develop the Panton Principles where necessary as a core set of principles to work with.
Days like Textcamp and Book Hackday are extremely useful for thinking about this and working on the ideas. It is easy to get into echo chambers of mailing lists and blogs. We need these events to meet new people, be challenged to explain ourselves and either build on the day or go away with ideas to test and try out. The day has got me excited about using word clouds again and doing a bit more work on them to make them a useful tool. It has also got me excited about book scanning and doing some hardware hacking (which I’ve not really done before).
Running the Pope essay through Wordle makes me excited about testing what we can do with the ECCO TEI documents that John Levin links to. Can we hyperlink to other texts, authors and events that are mentioned in them (not just with the annotator tool but in generated HTML)? Or can we use HTML 5 to embed audio links to further discussions or pronunciation? For example, there is Byron’s Don Juan, which has been argued to be pronounced “Jew-an” rather than “Hwan”, and the arguments for and against.
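As a rough idea of what the hyperlinking part might look like in generated HTML, here is a small Python sketch. It assumes we already have a hand-made lookup table from a mention (a title, author or event) to a URL. The function name `link_mentions` and the example.org addresses are placeholders of my own, and it works on plain text rather than parsing TEI, so treat it as a starting point rather than a finished approach.

```python
import html
import re

def link_mentions(text, links):
    """Return HTML in which each known mention is wrapped in an <a> tag.

    `links` is a hypothetical lookup table: mention text -> URL.
    Everything else in `text` is HTML-escaped.
    """
    if not links:
        return html.escape(text)
    # Try longer names first so "Don Juan" wins over a shorter overlapping name.
    names = sorted(links, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between the previous match and this one.
        parts.append(html.escape(text[last:match.start()]))
        url = html.escape(links[match.group(1)], quote=True)
        parts.append(f'<a href="{url}">{html.escape(match.group(1))}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)

if __name__ == "__main__":
    # Placeholder URLs, for illustration only.
    table = {
        "Byron": "https://example.org/byron",
        "Don Juan": "https://example.org/don-juan",
    }
    print(link_mentions("Byron wrote Don Juan.", table))
```

Matching is exact and case-sensitive here, which sidesteps the ‘still’ / ‘Still’ problem for proper names but would miss variant spellings.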
Perhaps that gets to one of the issues that arose in the break-out discussions in the kitchen. After the lightning talk about digital publishing, there seemed to be an argument about whether current digital publishing was really pushing the boundaries or flailing around. I do think that it has some real benefits for niche publishing but these have not been fully explored. The model will need to change and perhaps become more open in those senses, perhaps linking the raw data to the interpretation earlier to allow the relevant community to peer review the data earlier. Just a suggestion. There are two distinct communities: the top-down business layer and the grass roots layer of activists, data developers and so on. Both would appear to have broadly similar aims, but the question is how to put them together in a useful way so that both can learn. Don’t get me wrong here, as I believe I’m at the grass roots layer, but I think that the two sides can have a dialogue which could get around the issues that the music and film industries have found themselves in, i.e. confrontation. We are here to disrupt and make because we are passionate. We care about the industry. Publishing is an industry which needs to change and transform itself. Put the two together and there are ways of moving forward. My hope is that we could get some more publishers along to future events.
The other important thing is that these conversations carry on afterwards. The round table discussions were great, as were the break-out ones in the kitchen, but they need to carry on or we create our own echo chamber, which reduces the value of what happened yesterday.
Whilst I did not do as much coding as I wanted to yesterday, I met some new people and caught up with colleagues. The fact that organisations such as JISC are supporting events like this shows their underlying importance and use to the community. We’ve started, now we need to carry on by chatting, blogging, sharing and doing more of these events.
Great piece Iain. If anyone is reading this, and wants to join the discussions of those who participated at Text Camp, you can do so by visiting: http://lists.okfn.org/mailman/listinfo/tcamp11
Also, must point this out, Byron expressly rhymes “Juan” with several different words in the poem in order to bemuse his reader. Very Byronic, but also the occasion for a quick visualisation of all the rhymes of Juan perhaps?
It has been a while since I’ve read the Byron (about 15 years give or take) so memory is a bit hazy. Hmmm, that could be interesting re: the visualisation. Perhaps one for a little later in the day but gets into how we can make text mining more interesting.