Continuing on the path to better reproducible work

Last year I had the pleasure of helping Julia Stewart Lowndes at a Software Carpentry style workshop in Oxford. I heard about her paper, Our path to better science in less time using open data science tools [1], through various channels and made time to read it the other day.

It describes the path the Ocean Health Index community took to get to reproducible science, from workflow to tools to data. The paper places the work in its social and political context, an occasionally forgotten part of science.

At the workshop, Julia enthused about R Markdown, and I have since started looking into it.

The main thing that I am taking away from this is the need to work out my own workflow when looking at sonification in Digital Humanities. Some conversations suggest just using Jupyter notebooks, others shell scripts (or Makefiles), but each of those seems to solve only part of the reproducibility problem. I am not suggesting that Markdown is the only answer or, indeed, that there is a single answer. Version control is part of the picture as well.

The paper comments:

 We quickly realized we needed a nimble and robust approach to sharing data, methods and results within and outside our team—we needed to completely upgrade our workflow.

I come at this point from two angles. Firstly, I am writing a paper that uses some methods for reproducing the data and, secondly, I am putting together a short lecture on reproducible research for the Digital Humanities Oxford Summer School. I find myself at the same point as the Ocean Health Index (OHI) team on the science side, in between stages: I need to completely upgrade my workflow.

Questions that I should ask myself: can my future self reproduce the results? Is there enough information (technical and documentation) to do so? How and where can this be improved?
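One small way to start answering these questions is to save a record of what went into a result next to the result itself. The following is only a minimal sketch in Python using the standard library, not a method taken from the paper; the function names (sha256_of, provenance_record, write_provenance) and the file names are hypothetical and chosen purely for illustration.

```python
import hashlib
import json
import platform
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(input_files: list[Path], notes: str = "") -> dict:
    """Collect basic facts a future self would need (hypothetical layout)."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        # Checksums show later whether the input data has changed.
        "inputs": {str(p): sha256_of(p) for p in input_files},
        "notes": notes,
    }


def write_provenance(record: dict, destination: Path) -> None:
    """Write the record as readable JSON, to be kept beside the results."""
    destination.write_text(
        json.dumps(record, indent=2, sort_keys=True), encoding="utf-8"
    )


if __name__ == "__main__":
    # Demonstration with a throwaway file; real work would point at real data.
    with tempfile.TemporaryDirectory() as workdir:
        data = Path(workdir) / "example_data.csv"
        data.write_text("time,value\n0,1.0\n", encoding="utf-8")
        record = provenance_record([data], notes="first pass")
        write_provenance(record, Path(workdir) / "provenance.json")
        print(json.dumps(record, indent=2, sort_keys=True))
```

A record like this does not make work reproducible on its own, but committed to version control alongside the results it covers part of the "is there enough information?" question.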

My workflow feels like it needs to be easier, more open and more transparent. Part of this may be pulling existing codebases together and using fewer languages, as well as searching for existing open data tools.

This is a constantly iterative journey, improving on what has gone before as tools and contexts change, but the goal should be kept in mind. The paper perhaps did not open up new questions about the overall goals, but reading about a team’s experiences on the way to reproducibility is very worthwhile. The focus on iteration is key: it invites constant, critical questioning of existing practices and processes, asking whether they still support the objective, and listening to how others have tackled similar questions. I found something similar reading Hope Jahren’s Lab Girl and her iterations with teams and locations.

As a result, I’ve gone back to some other work and am making notes of things that do not necessarily work and could be improved, to support not only the work in progress but also whatever builds on it. I hope to get to something along the lines of the OHI work in time.

[1] Lowndes, J.S.S. et al. Our path to better science in less time using open data science tools. Nat. Ecol. Evol. 1, 0160 (2017)