Category Archives: Text Mining

Marking up Open Correspondence with TEI XML

As part of the next version of Open Correspondence, I’ve been working on the XML and JSON mark-up. For the XML, I’ve been using the TEI mark-up for the letters. I once heard this described as “XML for people who don’t think XML is flexible enough”. Now I can see why. It is […]

Finding and mapping influences

The awesome Jonathan Gray posted an intriguing question on his blog about mapping influence in intellectual history. What he is trying to do is to map the possible routes of influence between people. In his case, it is philosophers; in mine, authors. One of the driving ideas behind the Open Correspondence RDF was to begin […]

Adding linguistic interfaces to Open Correspondence

I’ve been playing around with the Python NLTK package, in particular the WordNet interface. WordNet is hosted by Princeton University. I mentioned that I was going to look at this and the idea of allowing a search for the lemmas of a word. It came about from a question posed on the Open Literature mailing list regarding […]

Weeknotes: Open Correspondence updates

I’ve bitten the bullet and done it. I’ve uploaded the current changes to the Open Correspondence site. The current changes are: additional fields in the RDF endpoint (I still need to do some major work to JSON and XML which I hope to do for the next update); a basic text search; a basic set […]

Weeknotes: Arts funding, Open Correspondence

I’ve been doing some updating this week rather than anything new. I was going to spend time trying to complete the places section of the Open Correspondence website. It needs some tidying up as the endpoint has had some changes made to it. I did come across an issue which has implications in exposing other […]

Contextualising places in time

As part of the Open Correspondence project, I’ve started to look at place names and locations to build a set of temporal and spatial data for the letters to allow for geographical queries. As part of the search, I came across a reference to Sean Gillies’ useful blog post talking about modelling historical place names […]

Weeknotes: Books and places for Open Correspondence

Progress on the next version of Open Correspondence has been a bit slower than I would have liked. Sleep is, however, useful for being alert enough to write code. I’ve gone back to some of the work that I was doing for the first version of the site way back last year. As part […]

Digital Humanities and building data sets

Rob Myers reposted this New York Times link on the Open Knowledge Foundation discussion list about Digital Humanities and its growth. It mentions the Mapping the Republic of Letters project (unfortunately it does not appear to be open) and its linking together of the centres of letter production. Last night I managed to build the […]

Making Milton sparql

I’ve been going over some ideas that have been bubbling in my mind for a while about using RDF to load in further details about a text in question. I’ve gone back to an old Milton file, the Areopagitica, that I created for another project but never really used. Essentially it’s part of the Burke […]

Installing Xapian into Open Correspondence and next steps

As an aid to getting over the first (and hopefully last) seasonal cold, I’ve been implementing Xapian as a search engine, using the Python bindings. I did look at Solr as an alternative but the setup costs outweighed the fact that Xapian is already installed on the server as part of Python. Unlike OpenMilton, […]