This is where your title goes

DataLad Publication & Collaboration Session starts in

Publishing data

Transport logistics

Share data like source code
Datasets can be cloned, pushed, and updated from and to local paths, remote hosting services, external special remotes

Flexible data access management for annexed file contents based on storage location

Interoperability

DataLad is built to maximize interoperability and use with hosting and storage technology

Interoperability

DataLad is built to maximize interoperability and use with hosting and storage technology

Publishing datasets

I have a dataset on my computer. How can I share it, or collaborate on it?

Publishing datasets

Most public datasets separate content in Git versus git-annex behind the scenes

Publishing datasets

Typical case:

Datasets are exposed via a private or public repository on a repository hosting service
Data can't be stored in the repository hosting service, but can be kept in almost any third party storage
Publication dependencies automate pushing to the correct place

Publishing datasets

$ git config --local remote.github.datalad-publish-depends gdrive

Publishing datasets

Real-life example:

a screenrecording of cloning studyforrest data from github

Publishing datasets

Special case 1: repositories with annex support

Publishing datasets

Special case 2: Special remotes with repositories

Publishing datasets

Special case 1: repositories with annex support

Publishing datasets

Special case 2: Special remotes with repositories

Cloning DataLad datasets

How does cloning dataset feel like for a consumer?

Cloning DataLad datasets

How does cloning dataset feel like for a consumer?

Cloning DataLad datasets

How does cloning dataset feel like for a consumer?

Cloning DataLad datasets

Let's take a look at the special cases:

Cloning DataLad datasets

Let's take a look at the special cases:

Data sharing using Seafile / Keeper

More info: DataLad Handbook chapter on Third party infrastructure

Keeper (Seafile): A flexible DataLad sibling

https://keeper.mpdl.mpg.de/
Keeper offers 1TB flexible storage to all Max Planck employees *

* Even if you end up not using DataLad, you might want to check Keeper out and get those 1TB of storage space!
* Data are stored on servers of the Max Planck Society - your data protection officer will be pleased!

Creating a remote repository

On GitLab (here, of the MPIB) ...

Creating a remote repository

... or on GitHub:

Our comic collection is published!

Sharing the Keeper library with your collaborators ...

... will allow them to access the annexed contents of your dataset!
(after they successfully configured the seafile rclone special remote)

$ datalad siblings -d "/Users/wittkuhn/Desktop/cartoon-collection" enable -s seafile
.: seafile(?) [git]

$ datalad siblings # just checking if all siblings are configured
.: here(+) [git]
.: seafile(+) [rclone]
.: origin(-) [https://git.mpib-berlin.mpg.de/wittkuhn/cartoon-collection.git (git)]

$ datalad get compiling.png
get(ok): compiling.png (file) [from seafile...]

Step 1: Share the special remote and the repository

🎉

But wait... I also know cool comics!

From data sharing to collaboration

Add a collaborator on GitLab

And don't forget to add your collaborators to your Seafile / Keeper library!

Collaboration!

Typical collaborative workflow:

Optional: Create an issue

It is good practice to let you collaborators know what you are working on. Creating an issue on GitLab is a good way to give them a heads-up and discuss plans

Collaboration!

Typical collaborative workflow:

Optional: Create an issue

It is good practice to let you collaborators know what you are working on. Creating an issue on GitLab is a good way to give them a heads-up and discuss plans

Collaboration!

Step 1: Create a new branch in your dataset

It is good practice to develop a new feature/add data/extend code/... in a new branch

$ git checkout -b morecomics
Switched to a new branch 'morecomics'

Collaboration!

Step 2: Make a change, and save it

$ wget https://imgs.xkcd.com/comics/interdisciplinary.png
--2020-11-15 12:23:22--  https://imgs.xkcd.com/comics/interdisciplinary.png
Resolving imgs.xkcd.com (imgs.xkcd.com)... 2a04:4e42:1b::67, 151.101.112.67
Connecting to imgs.xkcd.com (imgs.xkcd.com)|2a04:4e42:1b::67|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 27246 (27K) [image/png]
Saving to: ‘interdisciplinary.png’

interdisciplinary.p 100%[===================>]  26.61K  --.-KB/s    in 0.001s

2020-11-15 12:23:23 (29.4 MB/s) - ‘interdisciplinary.png’ saved [27246/27246]

$ datalad save -m "Add another fun comic" interdisciplinary.png
add(ok): interdisciplinary.png (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)

Collaboration!

Optional: Configure a publication dependency to seafile

With a publication dependency, datalad push pushes annexed data to seafile, and the rest to GitLab

$ datalad siblings
.: here(+) [git]
.: origin(-) [https://git.mpib-berlin.mpg.de/wittkuhn/cartoon-collection.git (git)]
.: seafile(+) [rclone]
$ git config --local remote.origin.datalad-publish-depends seafile

This translates to: When pushing anything to the sibling/remote "origin", push changes to the sibling "seafile" first!

Collaboration!

Step 4: Push your change and create a merge request

$ datalad push --to origin
Push to 'origin':  25%|████▎            | 1.00/4.00 [00:00<00:00, 11.6k Steps/s]
Username for 'https://git.mpib-berlin.mpg.de': adina.wagner
Password for 'https://adina.wagner@git.mpib-berlin.mpg.de':
copy(ok): interdisciplinary.png (file) [to seafile...]
Update availability for 'origin':  75%|▊| 3.00/4.00 [00:00<00:00, 7.48k Steps/s]
Username for 'https://git.mpib-berlin.mpg.de': adina.wagner<00:01, 1.31 Steps/s]
Password for 'https://adina.wagner@git.mpib-berlin.mpg.de':
publish(ok): . (dataset) [refs/heads/git-annex->origin:refs/heads/git-annex 048e2c4..595f30a]
publish(ok): . (dataset) [refs/heads/morecomics->origin:refs/heads/morecomics [new branch]]

Collaboration!

Step 4: Push your change and create a merge request

Collaboration!

Step 4: Push your change and create a merge request

Collaboration!

Step 4: Push your change and create a merge request

Great! 🎉

But, phewww, isn't there an "easier" way?

"My collaborators don't want to deal with DataLad" 😢

Don't worry, just let them download the dataset:

But tell them they are missing out on learning an awesome tool! 😜 (Admittedly, if you publish publicly, not everyone will consume your dataset through DataLad, so the download option is nice to have!)

Collaboration!

Overview of a typical collaborative workflow:

Optional: Create an issue

It is good practice to let you collaborators know what you are working on. Creating an issue on GitLab is a good way to give them a heads-up and discuss plans

Step 1: Create a new branch in your dataset

It is good practice to develop a new feature/data point/code in a new branch

Step 2: Make a change, and save it

Step 3: (optional) Configure a publication dependency

With a publication dependency, datalad push pushes annexed data to a special remote, and the rest to GitLab

Step 4: Push your change and create a merge request

Publishing data

Transport logistics

Interoperability

Interoperability

Publishing datasets

Publishing datasets

Publishing datasets

Publishing datasets

Publishing datasets

Publishing datasets

Publishing datasets

Publishing datasets

Publishing datasets

Publishing datasets

Publishing datasets

Cloning DataLad datasets

Cloning DataLad datasets

Cloning DataLad datasets

Cloning DataLad datasets

Cloning DataLad datasets

Data sharing using Seafile / Keeper

Keeper (Seafile): A flexible DataLad sibling

Creating a remote repository

Creating a remote repository

Our comic collection is published!

🎉

But wait... I also know cool comics!

From data sharing to collaboration

Add a collaborator on GitLab

Collaboration!

Collaboration!

Collaboration!

Collaboration!

Collaboration!

Collaboration!

Collaboration!

Collaboration!

Collaboration!

Collaboration!

"My collaborators don't want to deal with DataLad" 😢

Collaboration!

Overview of a typical collaborative workflow:

Questions!