Marking up Open Correspondence with TEI XML

As part of the next version of Open Correspondence, I’ve been working on the XML and JSON mark-up.

For the XML, I’ve been using TEI mark-up for the letters. I once heard this described as “XML for people who don’t think XML is flexible enough”, and now I can see why. It is a highly flexible way of digitising texts but it can be confusing, especially when switching between versions. I believe the original model that I had been working from was P4 but the current one is P5, so I had to negotiate that change and make sure that I had the correct elements in each block. Even then, there can be two or three different versions of the same element within a section. I do wonder about the wisdom of that, rather than simplifying things down to a smaller set of extensible elements that may or may not be used. I intend to go back to the schema and really get my head around it rather than tinkering at the edges.
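To make the P5 shape concrete, here is a minimal sketch in Python using only the standard library. The sample letter is invented for illustration and the element choices (`opener`, `dateline`, `salute`, `closer`, `signed` inside a `div`) are an assumption about how a letter might be encoded, not the project's actual mark-up. The one thing it relies on from P5 itself is that the root element is `TEI` in the `http://www.tei-c.org/ns/1.0` namespace, where P4 used an un-namespaced `TEI.2` root.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Invented sample: a hypothetical letter, not taken from the project data.
SAMPLE_LETTER = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Letter to a hypothetical correspondent</title></titleStmt>
      <publicationStmt><p>Illustrative sample only.</p></publicationStmt>
      <sourceDesc><p>Invented text, not a real letter.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="letter">
        <opener>
          <dateline>London, <date when="1850-01-01">1 January 1850</date></dateline>
          <salute>My dear Sir,</salute>
        </opener>
        <p>This is placeholder body text.</p>
        <closer><signed>A. Writer</signed></closer>
      </div>
    </body>
  </text>
</TEI>"""


def summarise_letter(xml_text):
    """Return the title, ISO date and body paragraphs of a P5-style letter."""
    root = ET.fromstring(xml_text)
    # A P4 document would have an un-namespaced <TEI.2> root instead.
    if root.tag != "{%s}TEI" % TEI_NS:
        raise ValueError("not a TEI P5 document: root is %r" % root.tag)
    title = root.findtext(".//tei:titleStmt/tei:title", namespaces=NS)
    date = root.find(".//tei:opener/tei:dateline/tei:date", NS)
    # Only paragraphs that are direct children of a <div> count as body text.
    paragraphs = [
        "".join(p.itertext()).strip()
        for p in root.findall(".//tei:div/tei:p", NS)
    ]
    return {
        "title": title,
        "date": date.get("when") if date is not None else None,
        "paragraphs": paragraphs,
    }
```

Calling `summarise_letter(SAMPLE_LETTER)` gives a small dictionary for the letter, and passing a P4-style document raises a `ValueError`, which is a cheap way of catching files that have not been converted yet.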

I’ve attempted this conversion before but I think that I’ve finally got it to a point where it is nearly there. What I would really like to do is to put together some sort of tool kit as a core to the Open Correspondence project. Clearly this would be a long-term project and would need more research but it might be useful to other projects.

As well as marking up texts, it would be useful to use the XML to convert the text into other formats, such as Mobipocket or the Kindle formats, so that a user can create their own e-publication. It would also be useful to find a way of using the XML in conjunction with the psbook command to create a print version of a letter or collection. This does mean that I need to convert the XML into a PostScript file and then print it, which raises a host of questions at the moment, such as how to convert a structured format into a layout format.
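One way to approach the structured-to-layout question is to flatten the TEI into simple HTML first and let other tools take it from there. The following is only a sketch under that assumption: the `BLOCKS` tuple and the `tei_letter_to_html` function are hypothetical names, the sample letter is invented, and it does not attempt the PostScript or e-book steps themselves.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical choice of TEI elements to treat as one block of output each.
BLOCKS = ("dateline", "salute", "p", "signed")

# Invented, heavily trimmed sample (no header), for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><div type="letter">
<opener><dateline>London, <date when="1850-01-01">1 January 1850</date></dateline>
<salute>My dear Sir,</salute></opener>
<p>Placeholder text &amp; nothing more.</p>
<closer><signed>A. Writer</signed></closer>
</div></body></text></TEI>"""


def tei_letter_to_html(xml_text):
    """Flatten the body of a TEI letter into a run of HTML paragraphs."""
    root = ET.fromstring(xml_text)
    body = root.find(".//{%s}body" % TEI_NS)
    if body is None:
        raise ValueError("no TEI <body> element found")
    lines = []
    for el in body.iter():  # document order
        name = el.tag.rsplit("}", 1)[-1]  # strip the namespace prefix
        if name in BLOCKS:
            # Collapse runs of whitespace left over from the XML indentation.
            text = " ".join("".join(el.itertext()).split())
            lines.append('<p class="%s">%s</p>' % (name, escape(text)))
    return "\n".join(lines)
```

The TEI element name is kept as the HTML `class`, so the structural information is not thrown away and a stylesheet can still treat a salutation differently from a signature when it comes to layout.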

I’ve also been playing around with the correspondent collections and the way of marking up collections in TEI. I had thought of this mainly as a way of creating printable collections and making the data re-usable for printing. Equally, it might allow the data to be used to answer Jonathan Gray’s question about identifying the letters written to a particular correspondent.
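As a sketch of how the correspondent question might be answered from the mark-up, the following Python groups letter titles by recipient. It assumes that each letter's header records the recipient in a `correspDesc` block with a `correspAction` of type `received`, which is available in current TEI P5 but may not match how the project actually encodes correspondents. The template is trimmed and not schema-valid, and the names are invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Trimmed, hypothetical header: just enough to carry a title and a recipient.
TEMPLATE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader>
<fileDesc><titleStmt><title>{title}</title></titleStmt></fileDesc>
<profileDesc><correspDesc>
<correspAction type="received"><persName>{recipient}</persName></correspAction>
</correspDesc></profileDesc></teiHeader></TEI>"""


def letters_by_recipient(documents):
    """Map each recipient name to the titles of the letters sent to them."""
    index = defaultdict(list)
    for xml_text in documents:
        root = ET.fromstring(xml_text)
        title = root.findtext(
            ".//tei:titleStmt/tei:title", default="(untitled)", namespaces=NS
        )
        recipients = root.findall(
            ".//tei:correspAction[@type='received']/tei:persName", NS
        )
        for person in recipients:
            name = " ".join("".join(person.itertext()).split())
            index[name].append(title)
    return dict(index)


# Invented collection of three letters to two hypothetical correspondents.
DOCS = [
    TEMPLATE.format(title="Letter 1", recipient="Correspondent A"),
    TEMPLATE.format(title="Letter 2", recipient="Correspondent B"),
    TEMPLATE.format(title="Letter 3", recipient="Correspondent A"),
]
```

With the sample collection, `letters_by_recipient(DOCS)["Correspondent A"]` lists the first and third letters. Matching on the name string alone is fragile, so a real collection would probably want a stable identifier for each correspondent.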

Once the XML is working and validated, I’ll look at the JSON output. That would draw a line under this part of the project and allow me to move on. I’m aiming for a release towards the end of March or the middle of April, in keeping with trying to stick to a six-week schedule.

The next thing after that is to begin answering Jonathan’s questions in terms of a tool kit to identify weaknesses, and to try to write some code to re-use and re-mix the data. I would hope that this will be in the following release, towards the end of May.