Research Databases in the Humanities

I went to the Research Databases in the Humanities workshop organised by Sudamih. It was an excellent afternoon and time well spent. The event was Oxford-heavy, and a number of interesting directions came out of it.

First, James Wilson, project manager of Sudamih at Oxford University Computing Services, outlined the Database as a Service (DaaS) project, which I think addresses a desperately needed service. The project seeks to let researchers upload their datasets (I believe from SQL and CSV) into a MySQL or PostgreSQL infrastructure with a common front end, while keeping access control over the data itself. The idea is to keep datasets available for long-term use.

The second important point was that datasets need to stay available after researchers move on, and stay open for sharing, since the same data can be used across the field or even by different disciplines or for different funding ends. As Claire Warwick of UCL noted, resources need to be kept available for the long term, partly to meet promises made to funding bodies but also for citation purposes and re-use by future scholars. Some sites now appear moribund but could remain useful if their data were moved elsewhere or the project were kept in a service such as the DaaS concept above. Of course, sites do need funding to stay alive, and the question of sustainable business models (from free to pay-for-access) was largely skirted.

(I do wonder whether it is practical to offer or build something like this as an Open Knowledge Foundation project, as an adjunct to CKAN for smaller projects. But perhaps that is another post for another day…)

Jacob Dahl, of the Cuneiform Digital Library Initiative, was one of the few speakers who touched on the openness issue. He commented that one site draws on his open database and makes some amendments, but these amendments are not then offered openly back to the originating site or its users. Again, this leads to a "silo" mentality that prevents knowledge from being shared and developed. This is a more insidious threat to developing datasets and databases, because the knowledge cannot easily be shared. The scary thing is that rather than the website becoming moribund, the data itself does, since no community can develop around it to refresh and maintain it. Perhaps this is a more long-term threat to Digital Humanities than tired-looking websites.

On a tangent, one of the speakers mentioned Alastair Dunning's blog post on digitisation. I'm not going to summarise it, as it is fairly short, but my takeaway is that digitisation is necessary but needs to allow users to create new queries. This cannot happen unless the dataset is maintained and access is given through APIs or search. (Funny how search comes back again. I'm sure it is haunting me.)

The afternoon was rounded off with a talk about the CLAROS project, which is using Semantic Web technologies to query several major databases of Classical Art across the world. It is something I'm interested in (through the endpoints on OpenCorrespondence, though I'm not quite there yet), and it points to the future for such projects, but I do wonder whether the basics, the technological infrastructure for researchers, need to exist first. It comes back to chicken and egg, though. If the possibilities are not developed into prototypes or early working models, they remain only possibilities rather than anything useful.

Rather than describing the individual talks, I've outlined above a number of the outcomes and debates that arose. We keep coming back to the notions of openness and preservation. I think there are quite a few things that could be developed to aid researchers, and also issues to keep in mind when developing future resources.