Electronic Documents Give Reproducible Research a New Meaning

Jon F. Claerbout & Martin Karrenbach, 1992


Paper: DOI doi.org/10.1190/1.1822162
Presenter: Adina Wagner (mas.to/@adswa)
Chair: Michał Szczepanik

Our basic goal is reproducible research and the ability to publish it in reproducible form. The electronic document is our means to this end.

Reproducibility

Definition
"an author attaches to every figure caption a pushbutton or a name tag usable to recalculate the figure from all its data, parameters, and programs"
Aims (met in 1990)
  • A publication should be "merged" with its underlying computational analysis
  • Researchers should be able to reproduce their own research results a year or more later
  • Researchers should leave their work in a state where coworkers can reproduce it automatically
  • External sites should be able to automatically reproduce research
  • Researchers should bundle software environments so that they can take their work to their next work site
  • Collaborative work by multiple authors

    Reproducibility

    The goal of scientific publications is to teach new concepts, show the resulting implications of those concepts in an illustration, and provide enough detail to make the work reproducible.
    In real life, reproducibility is haphazard and variable [...] we rarely see a seismology PhD thesis being redone at a later date by another person.

    The original "electronic document"

      Either access to materials on Stanford's servers
      via "X-window" (today's X11), or a CD-ROM containing:

    • An electronic document based on LaTeX
    • Software executables for plotting; compilers; LaTeX
    • A cross-platform build system based on make (cake) to wipe and rebuild reproducible plots
    • Plots in the form of PostScript plot files, marked as reproducible or non-reproducible via naming conventions
    • Individual command scripts for every reproducible figure
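
    The "wipe and rebuild" cycle described above can be sketched in a few lines of Python. This is only an illustration, not the authors' cake rules: the `.rep.ps` suffix, the per-figure `<name>.py` command script, and the function names `wipe` and `rebuild` are all assumptions made for this example, since the slides do not state the actual naming convention.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical naming convention (not taken from the paper):
# reproducible plots end in ".rep.ps"; any other ".ps" file is
# treated as non-reproducible and is never deleted.
REPRODUCIBLE_SUFFIX = ".rep.ps"


def is_reproducible(plot):
    """Return True if the plot file name marks it as reproducible."""
    return plot.name.endswith(REPRODUCIBLE_SUFFIX)


def script_for(plot):
    """Map 'vel.rep.ps' to its assumed command script 'vel.py'."""
    stem = plot.name[: -len(REPRODUCIBLE_SUFFIX)]
    return plot.with_name(stem + ".py")


def wipe(figdir):
    """Delete only the reproducible plots and return the removed paths."""
    removed = []
    for plot in sorted(figdir.glob("*.ps")):
        if is_reproducible(plot):
            plot.unlink()
            removed.append(plot)
    return removed


def rebuild(figdir, plots):
    """Re-run the command script of each wiped plot inside figdir."""
    for plot in plots:
        subprocess.run(
            [sys.executable, script_for(plot).name],
            cwd=figdir,
            check=True,  # stop at the first figure that fails to rebuild
        )
```

    Calling `rebuild(figdir, wipe(figdir))` then regenerates every figure that claims to be reproducible while leaving the non-reproducible ones (for example scanned images) untouched. A figure whose script fails raises an error, which is the point of the exercise.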

    Hardware considerations

  • Magnetic tapes versus CD-ROMs
  • The price tag

    Software problems

    Remote access is slow
    Remote machines lack compatible fonts
    Build systems are incompatible across platforms

    Software problems

    Read-only file system, strict file name requirements & disk space requirements (solution: symlinks)
    Administrative rights
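
    One plausible reading of the symlink solution is that a writable working tree is populated with links into the read-only medium, so figures can be rebuilt next to the data without copying it. The sketch below illustrates that idea under this assumption; the function name `mirror_with_symlinks` and the directory layout are hypothetical and not taken from the paper.

```python
import os
from pathlib import Path


def mirror_with_symlinks(readonly_root, workdir):
    """Recreate the directory tree of readonly_root under workdir and
    symlink every file, so no data is copied. Returns the number of
    links created."""
    readonly_root = Path(readonly_root).resolve()  # absolute link targets
    workdir = Path(workdir)
    created = 0
    for dirpath, _dirnames, filenames in os.walk(readonly_root):
        relative = Path(dirpath).relative_to(readonly_root)
        target_dir = workdir / relative
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            link = target_dir / name
            # Skip names that already exist in the working tree.
            if link.exists() or link.is_symlink():
                continue
            link.symlink_to(Path(dirpath) / name)
            created += 1
    return created
```

    Directories are created for real while files are only linked, so new outputs can be written into the working tree, and the disk space used stays small because the data itself remains on the read-only side.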

    The role of open source for reproducibility

  • Open source software:
  • Open access publications

    Their conclusion and outlook

    With workstations becoming widespread and software available, the burdens imposed on the author to create reproducible results are little more than the task of filing everything systematically. We are nearing a time when it will simply be the author's choice whether to keep detailed means to results confidential with the use of traditional publication or to communicate fully by using reproducible documents.

    30 years later: Connections to current discourse on reproducibility

    Let's discuss

  • What's more difficult: reproducible science 30 years ago, or today?