It has been a while since I’ve written a weeknote. Must get back into the habit.
Development on the Open Correspondence project has ranged from slow to stalled for a while. I have been doing bits and pieces, but sitting down with Mark McGillivray of Cottage Labs and the Open Knowledge Foundation brought some clarity. The Textus project was recently announced, and I have been talking with the developers about putting the data onto that platform. It seems to me better to pool resources and contribute where I can. There are parts of the existing project that I like and others that need more work before I'm happy with them, so now seems the right time to move onto the developing platform.
At Textcamp last September, one of the sessions covered DIY bookscanners (Austgate post on Textcamp). One of the actions listed on the Textus wiki was OCRing text. I have posted previously about playing with Tesseract, so when I saw this, I emailed the humanities-dev list to explore the possibilities. As a result, I have volunteered to work on this area and will write a blog post about it. A large amount of work already exists, so I am perhaps not developing anything new. However, I think it would be interesting to develop a stand-alone system that is flexible and downloadable. Like other OKF projects, it will be written in Python, but it will also be a hardware project that tries to extend some of the existing ones.
I’ve been working on an indexing project that seems to be coming together quite nicely. Hopefully I’ll be able to say more shortly, but that depends on a conversation that has yet to happen.
Next week, after a break, is a return to work and to data. The Dev8d conference gave me some ideas and clarity on one or two things, so it is time to put them into practice.