Category Archives: Text Mining

Thinking about texts and communities at Textcamp

Having gone to Textcamp yesterday, I started playing with Wordle and IBM’s Many Eyes at the suggestion of Dave Flanders of the JISC. As James Harriman-Smith, the organiser and Open Literature co-ordinator for the Open Knowledge Foundation, had suggested that this year is the anniversary of the manuscript of Alexander Pope’s An Essay on Criticism, […]

Weeknotes: Open Correspondence toolkit and converting XML into JSON

I’ve been quiet for a bit, mostly because I’ve been quite busy on projects and exploring ideas. After Book Hackday, I’ve written a post about beginning to develop the Open Correspondence toolkit for the Open Knowledge Foundation’s Notebook blog. I was also contacted regarding converting the TEI XML pages into JSON, which I am […]

Marking up Open Correspondence with TEI XML

As part of the next version of Open Correspondence, I’ve been working on the XML and JSON mark-up. For the XML, I’ve been using the TEI mark-up for the letters. I once heard this described as “XML for people who don’t think XML is flexible enough”. Now I can see why. It is […]

Finding and mapping influences

The awesome Jonathan Gray posted an intriguing question on his blog about mapping influence in intellectual history. What he is trying to do is to map the possible routes of influence between people. In his case, it is philosophers; in mine, authors. One of the driving ideas behind the Open Correspondence RDF was to begin […]

Adding linguistic interfaces to Open Correspondence

I’ve been playing around with the Python NLTK package, in particular the WordNet interface. WordNet is hosted by Princeton University. I mentioned that I was going to look at this and the idea of allowing a search for lemmas of a word. It came about from a question posed on the Open Literature mailing list regarding […]

Weeknotes: Open Correspondence updates

I’ve bitten the bullet and done it. I’ve uploaded the current changes to the Open Correspondence site. The current changes are: additional fields in the RDF endpoint (I still need to do some major work to JSON and XML, which I hope to do for the next update); a basic text search; a basic set […]

Weeknotes: Arts funding, Open Correspondence

I’ve been doing some updating this week rather than anything new. I was going to spend time trying to complete the places section of the Open Correspondence website. It needs some tidying up as the endpoint has had some changes made to it. I did come across an issue which has implications for exposing other […]

Contextualising places in time

As part of the Open Correspondence project, I’ve started to look at place names and locations to build a set of temporal and spatial data for the letters to allow for geographical queries. As part of the search, I came across a reference to Sean Gillies’ useful blog post talking about modelling historical place names […]

Weeknotes: Books and places for Open Correspondence

Progress on the next version of Open Correspondence has been a bit slower than I would have liked. Sleep is, however, useful for being alert enough to write code. I’ve gone back to some of the work that I was doing for the first version of the site way back last year. As part […]

Digital Humanities and building data sets

Rob Myers reposted this New York Times link on the Open Knowledge Foundation discussion list about Digital Humanities and its growth. It mentions the Mapping the Republic of Letters project (unfortunately it does not appear to be open) and its linking together of the centres of letter production. Last night I managed to build the […]