Adina Wagner
@AdinaKrik
Psychoinformatics lab,
Institute of Neuroscience and Medicine, Brain & Behavior (INM-7), Research Center Jülich
Funders
Collaborators
Comprehensive user documentation in the DataLad Handbook (handbook.datalad.org)
File viewer and terminal view of a DataLad dataset
Stay flexible:

Procedurally, version control is easy with DataLad!
datalad create creates an empty dataset.
datalad save records the dataset or file state to the history.
datalad download-url obtains web content and records its origin.
datalad status reports the current state of the dataset.
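The four commands above can be chained into a small local workflow. The following is a minimal sketch, not a prescribed procedure: the function name `basic_workflow`, the README content, and the commit messages are illustrative assumptions, and the dataset path and URL are supplied by the caller.

```shell
# Sketch of a local version-control workflow with DataLad.
# Assumes datalad is installed; $1 is the dataset path, $2 a URL to download.
basic_workflow() (
    set -e
    ds=$1
    url=$2
    datalad create "$ds"                      # create an empty dataset
    cd "$ds"
    echo "# My project" > README.md           # add some content
    datalad save -m "Add README" README.md    # record the new file state
    datalad download-url -m "Add file from the web" "$url"  # fetch and record origin
    datalad status                            # report the current dataset state
)
```

Calling `basic_workflow my-dataset <URL>` in a scratch directory should leave a dataset whose history contains one commit per step.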
$ datalad clone git@github.com:psychoinformatics-de/studyforrest-data-phase2.git
install(ok): /tmp/studyforrest-data-phase2 (dataset)
$ cd studyforrest-data-phase2 && du -sh
18M .
$ datalad get sub-01/ses-movie/func/sub-01_ses-movie_task-movie_run-1_bold.nii.gz
get(ok): /tmp/studyforrest-data-phase2/sub-01/ses-movie/func/sub-01_ses-movie_task-movie_run-1_bold.nii.gz (file) [from mddatasrc...]
# eNKI dataset (1.5TB, 34k files):
$ du -sh
1.5G .
# HCP dataset (80TB, 15 million files)
$ du -sh
48G .
| Git | git-annex |
|---|---|
| handles small files well (text, code) | handles all types and sizes of files well |
| file contents are in the Git history and will be shared upon git/datalad push | file contents are in the annex and not necessarily shared |
| Shared with every dataset clone | Can be kept private on a per-file level when sharing the dataset |
| Useful: Small, non-binary, frequently modified, need-to-be-accessible (DUA, README) files | Useful: Large files, private files |
Procedures provided with DataLad (e.g., text2git, yoda) or created and shared by users (Tutorial)
Rules in .gitattributes (e.g., based on file type, file/path name, size, ...; rules and examples)
Per command (e.g., datalad save --to-git)
adina@bulk1 in /ds/hcp/super on git:master $ datalad status --annex -r
15530572 annex'd files (77.9 TB recorded total size)
nothing to save, working tree clean
(github.com/datalad-datasets/human-connectome-project-openaccess)
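The `.gitattributes` rules mentioned above can be inspected with plain Git, since git-annex reads its `annex.largefiles` setting from there. The sketch below writes three example rules into a throwaway repository and asks Git which rule applies to each path. The rules, the file names, and the function name `show_annex_rules` are illustrative assumptions, not taken from any of the datasets shown here.

```shell
# Sketch: example annex.largefiles rules, resolved per path with git check-attr.
# Only needs git; the paths do not have to exist for the attribute lookup.
show_annex_rules() (
    set -e
    repo=$(mktemp -d)
    cd "$repo"
    git init -q .
    # Later lines override earlier ones for the same path.
    cat > .gitattributes <<'EOF'
* annex.largefiles=(largerthan=100kb)
*.md annex.largefiles=nothing
code/** annex.largefiles=nothing
EOF
    git check-attr annex.largefiles -- README.md code/analysis.py sub-01/bold.nii.gz
)
```

With these rules, `README.md` and `code/analysis.py` resolve to `nothing`, meaning they are stored in Git, while the imaging file falls under the size rule and is annexed once it exceeds 100 kB.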
datalad clone installs a dataset.
datalad get downloads file content on demand.
datalad run
datalad-container extension: datalad containers-run
datalad run records a command and its impact on the dataset.
Files specified with --input are retrieved prior to command execution.
Files specified with --output will be unlocked for modifications prior to a rerun of the command.
datalad containers-run can be used to capture the software environment as provenance.
datalad rerun can automatically re-execute run-records later.
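A provenance-tracked command might look like the following sketch. The script `code/summarize.sh`, the input and output paths, and the function name `record_analysis` are hypothetical; only the `datalad run` options (`-m`, `--input`, `--output`, and the `{inputs}`/`{outputs}` placeholders) and `datalad rerun` are DataLad's own.

```shell
# Sketch: record a command with its inputs and outputs, then re-execute it.
# Assumes $1 is an existing DataLad dataset containing the hypothetical
# files code/summarize.sh and inputs/data.csv.
record_analysis() (
    set -e
    cd "$1"
    # Inputs are retrieved first; outputs are unlocked so they can be rewritten.
    datalad run -m "Compute summary" \
        --input "inputs/data.csv" \
        --output "results/summary.txt" \
        "sh code/summarize.sh {inputs} > {outputs}"
    # Re-execute the run record stored in the most recent commit.
    datalad rerun
)
```

If the rerun produces identical results, the dataset should stay unchanged; differing results would show up as a new commit.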
Imagenette dataset
datalad save it, or use commands such as datalad download-url or datalad addurls to retrieve it from web sources
-c yoda prepares a useful structure
-c text2git keeps text files such as scripts in Git
datalad containers-run
datalad create -f can transform any directory or Git repository into a dataset
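The two ways of starting a dataset mentioned above can be sketched as follows. The function names `new_analysis_dataset` and `adopt_directory` and the commit message are illustrative assumptions; the paths are supplied by the caller.

```shell
# Sketch: two ways to start a dataset. Assumes datalad is installed.

# New dataset with the yoda layout, keeping text files such as scripts in Git.
new_analysis_dataset() (
    set -e
    datalad create -c yoda -c text2git "$1"
)

# Turn an existing directory (or Git repository) into a dataset,
# then record the content that is already there.
adopt_directory() (
    set -e
    datalad create -f "$1"
    datalad save -d "$1" -m "Record existing content"
)
```

`new_analysis_dataset myanalysis` starts from scratch, while `adopt_directory existing-project` keeps the files that are already in place.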