Data Management
for Open & Reproducible Science



Dr. Adina Wagner
@adswa@mas.to

Institute of Neuroscience and Medicine, Brain & Behavior (INM-7)
Research Center Jülich
ReproNim/INCF fellow


Slides: files.inm7.de/adina/talks/html/dgkn24.html

DataLad

  • Domain-agnostic command-line tool (+ graphical user interface), built on top of Git & git-annex
  • 10+ year open source project (100+ contributors), available for all major OS
  • Major features:
    • Version control for arbitrarily large content
    • Version control data & software alongside code!
    • Transport mechanisms for sharing, updating & obtaining data
    • Consume & collaborate on data (analyses) like software
    • (Computationally) reproducible data analysis
    • Track and share provenance of all digital objects
    • (... and much more)
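
The version-control and provenance features above can be sketched as a short sequence of DataLad command-line calls. This is a minimal illustration, not a recommended layout: the dataset name, the file paths (`data/raw.csv`, `results/summary.csv`, `code/analyze.py`) and the helper functions are hypothetical, and the commands are only printed unless `dry_run=False` is passed on a machine where DataLad is installed.

```python
import shlex
import subprocess


def workflow_commands(dataset):
    """Build the CLI calls for a minimal create / save / run cycle.

    Returns (working_directory, argv) pairs. All file names are
    hypothetical placeholders, not paths from a real project.
    """
    return [
        # Create a new, empty dataset (a Git/git-annex repository).
        (".", ["datalad", "create", dataset]),
        # Record the current state of data and code in the dataset.
        (dataset, ["datalad", "save", "-m", "Add data and code"]),
        # Execute an analysis and record inputs, outputs and the command.
        (dataset, [
            "datalad", "run",
            "-m", "Compute summary",
            "--input", "data/raw.csv",
            "--output", "results/summary.csv",
            "python code/analyze.py",
        ]),
    ]


def execute(commands, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for cwd, argv in commands:
        print(f"(in {cwd}) $ {shlex.join(argv)}")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    execute(workflow_commands("my-analysis"))
```

Keeping the commands as plain argument lists makes the sequence easy to inspect before anything is executed; for real use, the actual data and script would need to exist inside the dataset before the `run` step.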

Examples of what DataLad can be used for:

[Screen recording: browsing OpenNeuro]

  • Publish or consume datasets via GitHub, GitLab, OSF, the European Open Science Cloud, or similar services
    [Screen recording: cloning the studyforrest data from GitHub]

  • Creating and sharing reproducible, open science: Sharing data, software, code, and provenance
    [Screen recording: cloning the REMODNAV paper dataset from GitHub]

  • Central data management and archival system

  • Data catalog for consortia

  • Scalable computing framework for reproducible science
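
Consuming a published dataset, as in the examples above, typically comes down to cloning it and then fetching only the file content that is needed. The sketch below assembles those command-line calls as text; the URL, target directory and file path are hypothetical placeholders, and nothing is executed.

```python
import shlex


def consume_commands(source_url, target, wanted_path):
    """Build the CLI calls for cloning a dataset and fetching one file.

    Returns (working_directory, argv) pairs. The arguments are
    placeholders supplied by the caller, not real dataset locations.
    """
    return [
        # Clone the dataset: file names and history arrive, file content does not.
        (".", ["datalad", "clone", source_url, target]),
        # Retrieve the content of one file (or directory) on demand.
        (target, ["datalad", "get", wanted_path]),
        # Free local disk space again; the content stays available at the source.
        (target, ["datalad", "drop", wanted_path]),
    ]


def render(commands):
    """Format each (cwd, argv) pair as a shell-like line of text."""
    return [f"(in {cwd}) $ {shlex.join(argv)}" for cwd, argv in commands]


if __name__ == "__main__":
    # Hypothetical URL and paths, for illustration only.
    demo = consume_commands(
        "https://example.org/some-dataset.git", "some-dataset", "sub-01/anat.nii.gz"
    )
    print("\n".join(render(demo)))
```

Because a clone initially carries only lightweight metadata, even very large datasets can be browsed locally before any file content is transferred.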

Basic facts about DataLad

  • Free & Open Source: Builds upon Git and git-annex
  • Comprehensive documentation & weekly open office hours
  • Command line tool + graphical user interface
  • Scalable: hundreds of TB and millions of files are not an issue
  • Interoperable: Compatible with dozens of services and most infrastructure

Further Information



Install it on your own hardware: handbook.datalad.org/r.html?install

Acknowledgements

Software
  • Joey Hess (git-annex)
  • The DataLad team & contributors


Thanks!

Questions?

(scan the QR code for slides)
[Logos: funders and collaborators]

distribits.live

  • First conference on technologies for distributed data management
  • 2-day conference plus a one-day hackathon
  • @ Haus der Universität Düsseldorf
  • Late registration (virtual or on-site) still possible