Zobel on research methods
I was recommended this paper by Justin Zobel on research methods, which has some interesting points to make about record keeping. In particular, he makes some suggestions for record keeping around experiments that are in line with good practice and look quite practical for Computer Science.
He observes that current practice does not meet the minimum standards followed in other disciplines. In particular, it is often very hard (probably impossible) to replicate an experiment from the description found in the average computer science paper.
His key suggestions in this area were (note that the following is only partially paraphrased and is mostly direct quotes!):
1. Notebooks. These should be a guide to the experiment. They can be electronic, although they should really have some mechanism for checking their integrity to defeat attempts to falsify results (timestamps and regular dumping to backup files might be a good way to achieve this). The notebooks should record: dates; daily notes; names and locations of code, scripts, input, and other files; important references and web addresses; minutes of discussions; bug reports; locations and identifying marks of paper records; experimental parameters; and intent, outcomes, and interpretation of experiments (note that this blog covers some of this). He says that such notes can provide a “guidebook” to the experiment and should contain descriptions of ideas and show the progress of the research. Notes should be on the order of a few lines so as to make maintenance less onerous.
2. Code. At an absolute minimum, researchers should preserve the exact code used to yield any published results, and if possible the exact input. We may not need to keep every variation of the code because some changes may be very small, almost trivial. The notebook should discuss the kinds of changes that were made and why. A variation need not be kept if the changes are small enough to be quickly made by a competent programmer and are documented in the notebook.
3. Logs. These should be complete transcripts of the output of the experiment. This should be the data as reduced by some process for human consumption. Note that we should keep either all of the logs or only some of them, with the selection criteria chosen ahead of time to avoid investigative bias.
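To make the integrity idea from the notebook suggestion a little more concrete, here is a minimal sketch in Python of an electronic notebook where each entry is timestamped and chained to the previous one by a hash. This is my own illustration rather than anything from the paper: the names (append_entry, verify, sha256_of) and the entry fields are assumptions. It only detects edits to, or reordering of, existing entries; it cannot tell if the most recent entries have been dropped unless the latest digest is also stored somewhere else, such as a regular backup.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(data):
    """Fingerprint some bytes, e.g. the exact code or input used for a run."""
    return hashlib.sha256(data).hexdigest()


def entry_digest(entry):
    """Hash every field of an entry except its own digest, in a stable order."""
    payload = {k: v for k, v in entry.items() if k != "digest"}
    return sha256_of(json.dumps(payload, sort_keys=True).encode("utf-8"))


def append_entry(notebook, note, code_sha256=None, params=None):
    """Append a short timestamped note, chained to the previous entry."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "note": note,                    # a few lines: intent, outcome, interpretation
        "code_sha256": code_sha256,      # identifies the exact code that was run
        "params": params or {},          # experimental parameters
        "previous": notebook[-1]["digest"] if notebook else None,
    }
    entry["digest"] = entry_digest(entry)
    notebook.append(entry)
    return entry


def verify(notebook):
    """Return True if no stored entry has been edited or reordered."""
    previous = None
    for entry in notebook:
        if entry["previous"] != previous:
            return False
        if entry["digest"] != entry_digest(entry):
            return False
        previous = entry["digest"]
    return True


if __name__ == "__main__":
    notebook = []
    code = b"print('hypothetical experiment script')"
    append_entry(notebook, "Baseline run; intent is to check setup.",
                 code_sha256=sha256_of(code), params={"seed": 1})
    append_entry(notebook, "Outcome as expected; see log for this run.")
    print(verify(notebook))
```

Dumping the notebook list to a dated backup file after each session would cover the "regular dumping to backup files" part of the suggestion.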