As part of the Open Milton project, I’ve been thinking about where XML fits into it. Over Christmas, I wrote a small XSL transform using the Bosak XML Shakespeare files. Rufus took Antony and Cleopatra and, using LaTeX (I gather), created the Open Shakespeare Antony and Cleopatra PDF.
At one level, this is yet another version of Shakespeare. True.
But think of the possibilities. A user could generate their own version of the play (for use in a class, say) or create an annotated version for that class, without worrying much about losing the text or book, since it can be printed and shared widely. Communities of interested parties could be pointed towards a website where they could download the material either in final form or as XML to reuse.
To some extent this is also about embracing a standard and making it common outside academia and closed repositories. Texts should be easier to share and reuse if we know in advance how they are encoded, rather than having to wait for the download to complete before taking a look.
To that end, I’ve started a contribution (currently a prototype): a small parser so that we can start transforming text files into TEI (Text Encoding Initiative) Lite format. Granted, it is at an early stage, but the initial results are encouraging (well, for me at least).
As with Open Milton/Shakespeare, I’ve been using Python for this, with the minidom module (xml.dom.minidom) and regular expressions. The next step will be to split the script into a reader, a parser and a writer. I’ve been concentrating on drama, but prose and verse have their own vocabularies, so the parser will probably need to be split into three, each part concentrating on one form and calling methods from the writer as appropriate.
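To give a feel for the reader/parser/writer split, here is a rough sketch rather than the prototype itself. The input conventions are assumptions of mine for illustration: a speech starts with "NAME. text", a stage direction sits alone in square brackets, and any other line continues the current speech. The function names (read_lines, parse_drama, write_tei, to_tei) are hypothetical. The output is only a fragment using the TEI elements sp, speaker, l and stage; a real TEI Lite document would also need a header and the surrounding structure.

```python
import re
from xml.dom.minidom import Document

# Assumed plain-text conventions (illustrative, not the project's real format).
SPEAKER_RE = re.compile(r"^([A-Z][A-Z ]+)\.\s*(.*)$")  # "ANTONY. Some text"
STAGE_RE = re.compile(r"^\[(.+)\]$")                   # "[Enter a Messenger]"

SAMPLE = """\
CLEOPATRA. If it be love indeed, tell me how much.
ANTONY. There's beggary in the love that can be reckon'd.
[Enter a Messenger]
"""


def read_lines(text):
    """Reader: yield stripped, non-blank lines."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            yield line


def parse_drama(lines):
    """Parser (drama only): turn lines into (kind, value) events."""
    for line in lines:
        stage = STAGE_RE.match(line)
        speaker = SPEAKER_RE.match(line)
        if stage:
            yield ("stage", stage.group(1))
        elif speaker:
            yield ("speaker", speaker.group(1))
            if speaker.group(2):
                yield ("l", speaker.group(2))
        else:
            yield ("l", line)


def write_tei(events):
    """Writer: build a <body> fragment with minidom from parser events."""
    doc = Document()
    body = doc.createElement("body")
    doc.appendChild(body)
    speech = None
    for kind, value in events:
        if kind == "speaker":
            # A new speaker opens a new <sp> element.
            speech = doc.createElement("sp")
            body.appendChild(speech)
        # Lines and stage directions attach to the open speech, if any.
        parent = speech if speech is not None else body
        node = doc.createElement(kind)
        node.appendChild(doc.createTextNode(value))
        parent.appendChild(node)
    return doc


def to_tei(text):
    """Chain reader, parser and writer."""
    return write_tei(parse_drama(read_lines(text)))


if __name__ == "__main__":
    print(to_tei(SAMPLE).toprettyxml(indent="  "))
```

Because the parser only emits events and the writer only consumes them, a prose or verse parser could later be swapped in for parse_drama without touching the writer, which is the point of the split.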