Transcribing Bentham seminar notes

Melissa Terras spoke about Transcribe Bentham, a collaborative project at University College London (UCL) to transcribe Jeremy Bentham's manuscripts, at the first seminar in the Cultures of Knowledge series.

[Image: Bentham's body in a cupboard]

Bentham believed in education for all in London who could afford it. UCL has 60,000 volumes of Bentham material and the British Library (BL) has 30,000.

At the time, 40,000 volumes remained untranscribed.

The project was pitched with a sense of humour: come and have a go. The handwriting is a challenge, especially as Bentham gets older, and OCR software cannot cope with it.

It took 50 years to transcribe 20 volumes of the Bentham Papers.

The Guardian's crowdsourced review of MPs' expenses influenced this project. The project saw a massive surge of volunteers at first, but the crowd then waned slightly.

The prime reason was to digitise the manuscripts. Could volunteers work to the guidelines (transcription conventions, TEI) and standards? Transcriptorium has also come out of this project. Digitisation builds on the available resources. The team used MediaWiki and wrote some plugins for it.

The interface shows the original image next to a blank page to transcribe into. Regular users picked up TEI. Editors act as a gateway, checking both the content and the encoding: if a transcript is not up to standard, it is sent back for further work; otherwise it is accepted. Nobody uses the user forums, though this depends on the nature of the task. The project is integrated with Twitter and Facebook, but it is user dedication that drives the community.

Nearly 11,000 documents have been transcribed (91% loaded), taking around 12 minutes each on average. 440 users have transcribed something, including around 25 super users. There has been an average of 133 transcripts per week since March 2014, when the BL material came online. The images are copyright UCL but available under a CC BY licence.
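The editorial gateway can be pictured as a small state machine: volunteers submit, and an editor either accepts the transcript or sends it back. The sketch below is a minimal illustration under stated assumptions, not the project's MediaWiki code. The `Status` states, the `Transcript` class and the `review` function are hypothetical, and the automated encoding check only confirms that the inline TEI-style tags are well-formed XML, standing in for the editor's fuller review.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical states for a volunteer transcript under review.
    DRAFT = "draft"
    SUBMITTED = "submitted"
    ACCEPTED = "accepted"


@dataclass
class Transcript:
    page_id: str
    text: str  # transcription with inline TEI-style tags, e.g. <del>, <add>
    status: Status = Status.DRAFT
    history: list = field(default_factory=list)

    def submit(self) -> None:
        if self.status is not Status.DRAFT:
            raise ValueError("only drafts can be submitted")
        self.status = Status.SUBMITTED
        self.history.append("submitted")


def encoding_is_well_formed(text: str) -> bool:
    """Assumed check: the tagged text parses as XML once wrapped in a root."""
    try:
        ET.fromstring(f"<root>{text}</root>")
        return True
    except ET.ParseError:
        return False


def review(t: Transcript, content_ok: bool) -> Transcript:
    """Editor gateway: accept if content and encoding pass, else send back."""
    if t.status is not Status.SUBMITTED:
        raise ValueError("editors only review submitted transcripts")
    if content_ok and encoding_is_well_formed(t.text):
        t.status = Status.ACCEPTED
        t.history.append("accepted")
    else:
        t.status = Status.DRAFT  # returned to volunteers for further work
        t.history.append("returned")
    return t
```

For example, a transcript containing `Of <del>the</del> <add>a</add> law` would be accepted if the editor is happy with the content, while one containing an unclosed `<del>` tag would be returned as a draft.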

The team looked at why participation stopped. There is a human relationship involved: transcribers are people as well.

Funding is an issue. Outreach needs to happen in person to attract and retain users.

Motivations for taking part: interest in history or philosophy, and in the collaborative aspect. Bentham is not well known outside UCL.

Reasons for stopping: lack of time, the handwriting, and the complexity of the instructions.

At the original pace, transcription would be complete in 2081, but at the current rate it would be complete within a decade. Software sustainability is an issue for long-term projects.
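Projections like these come from dividing the remaining workload by a steady weekly rate. The sketch below assumes a constant rate and 52 weeks per year. The function name and the figures in the example are hypothetical and are not the project's own numbers.

```python
import math


def projected_finish_year(remaining: int, per_week: float, start_year: int) -> int:
    """Year by which `remaining` documents would be done at a steady weekly rate."""
    if per_week <= 0:
        raise ValueError("rate must be positive")
    weeks = math.ceil(remaining / per_week)  # whole weeks of work needed
    years = weeks / 52                       # approximate: 52 weeks per year
    return start_year + math.ceil(years)     # round up to a whole year


# Hypothetical workload: the same backlog at two different rates.
print(projected_finish_year(10_000, 100, 2014))
print(projected_finish_year(10_000, 10, 2014))
```

With these made-up inputs, a tenfold drop in the weekly rate pushes completion from 2016 out to 2034, which is the kind of gap behind the contrast between 2081 and a decade.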

Will MediaWiki still be supported in 10 years' time? The code is on GitHub (eMunch). It was suggested that code reuse across projects is rare, but it does happen.

The Bentham data is being used in a collaborative project to develop a handwriting recognition system, which is achieving 96% accuracy on the Bentham material. This is useful for searching and for correcting transcriptions, rather than for generating them from scratch.
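The notes do not say how the 96% figure was measured. One common way to report handwriting recognition quality is as one minus the character error rate: the edit distance between the recognised text and a reference transcription, divided by the reference length. That edit distance is also roughly the number of corrections a volunteer would have to make. The sketch below assumes this metric; it is illustrative and is not the recognition project's evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: 1 - CER, clamped at zero."""
    if not reference:
        raise ValueError("reference must be non-empty")
    cer = levenshtein(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - cer)
```

For example, `character_accuracy("abcd", "abce")` gives 0.75, since one of four characters needs correcting.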

The project needed to build institutional support and recognition of the collection's importance. It has become involved in other Bentham activities and has raised Bentham's profile as a subject of study.