Julian Kosciessa (@JulianKosciessa), kindly power-point-karaoke-ing with slides from Adina Wagner
Cognitive Neuromodulation Group, Donders Institute for Brain, Cognition and Behaviour
         
    
 
    Important! The Hub is a shared resource. Don't fill it up :)
        
            git config --get user.name
            git config --get user.email
        
    
    
            
                git config --global user.name "Adina Wagner"
                git config --global user.email "adina.wagner@t-online.de"
            
        
    
            
                git config --global --add datalad.extensions.load next
            
        
    
        
            datalad --version
        
        
    
    
            
                datalad --help
            
            The help may be displayed in a pager - exit it by pressing "q"
        
    For information about your system and your DataLad installation, use datalad wtf.
        Let's find out what kind of system we're on:
        
            
                datalad wtf -S system
            
        
    
        
            ipython
        
    
    
            
                import datalad.api as dl
                dl.create(path='mydataset')
            
        
    
            
                exit
            
        
    Let's create a dataset (using the yoda configuration, which adds
    a helpful structure and configuration for data analyses):
    
        
            datalad create -c yoda my-analysis
        
    
    Enter the new dataset with cd (change directory):
        
            
                cd my-analysis
            
        
    List its contents with ls:
        
            
                ls -la .
            
        
    
        
            echo "# My example DataLad dataset" >| README.md
        
    
    Check the status of the dataset:
        
            
                datalad status
            
        
    Record the change with save:
        
            
                datalad save -m "Add project title into the README"
            
        
    
            
                echo "Contains a small data analysis for my project" >> README.md
            
        
    
            
                git diff
            
        
    
            
                datalad save -m "Add information on the dataset contents to the README"
            
        
    
            
                git log
            
        
    
            
                tig
            
            (navigate with arrow keys and enter, press "q" to go back and exit the program)
        
    
            datalad download-url -m "Add an analysis script" \
  -O code/mne_time_frequency_tutorial.py \
  https://raw.githubusercontent.com/datalad-handbook/resources/master/mne_time_frequency_tutorial.py
            
        
    
            git log code/mne_time_frequency_tutorial.py
        
    Procedurally, version control is easy with DataLad!
            black code/mne_time_frequency_tutorial.py
        
    
            git diff
        
    
            git restore code/mne_time_frequency_tutorial.py
        
    
            datalad run -m "Reformat code with black" \
 "black code/mne_time_frequency_tutorial.py"
        
    
            git show
        
    
            datalad rerun
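
What does rerun actually work from? As far as I know, datalad run stores a small JSON "run record" in the commit message, between two marker lines (that's the block you saw in git show). Here is a minimal sketch of pulling that record out of a message. The sample message is hand-written for illustration and leaves out fields a real record has (such as the dataset ID), and extract_run_record is a hypothetical helper, not a DataLad function.

```python
import json

# Marker lines that delimit the run record in a commit message.
START = "=== Do not change lines below ==="
END = "^^^ Do not change lines above ^^^"


def extract_run_record(commit_message):
    """Return the JSON run record embedded in a commit message, or None."""
    if START not in commit_message or END not in commit_message:
        return None
    body = commit_message.split(START, 1)[1].split(END, 1)[0]
    return json.loads(body)


# Hand-written sample (an assumption, not copied from a real dataset).
SAMPLE_MESSAGE = """[DATALAD RUNCMD] Reformat code with black

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "black code/mne_time_frequency_tutorial.py",
 "exit": 0,
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
"""

record = extract_run_record(SAMPLE_MESSAGE)
print(record["cmd"])
```

Because the record is plain JSON in the history, the command can be found and repeated later without anyone having to remember it.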
        
    An existing dataset can be installed from a URL or path with clone.
    Either as a stand-alone entity:
    
        
            # just an example:
            datalad clone \
            https://github.com/psychoinformatics-de/studyforrest-data-phase2.git
        
    
    
        
            # just an example:
            datalad clone -d . \
            https://github.com/psychoinformatics-de/studyforrest-data-phase2.git
        
    
     
    
            datalad clone --dataset . \
   https://github.com/OpenNeuroDatasets/ds003104.git \
   input/
        
    List the registered subdatasets with the subdatasets command:
        
            
                datalad subdatasets
            
        
    Inspect the commit that registered the subdataset with git show:
        
            
                git show
            
        
    
            
                cd input
                ls
            
        
    
            
                tig
            
        
    Check how much disk space the dataset takes up (with the du disk-usage command):
        
            
                du -sh
            
        
    
            
                datalad status --annex
            
        
    Retrieve file content on demand with get:
    
        
            datalad get sub-01
        
    
    If you no longer need a file locally, drop its content:
        
            
                datalad drop sub-01
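
The same get-then-drop cycle can be wrapped around an analysis step in Python. Below is a minimal sketch using a context manager. The helper name temporarily_available is made up for this example, and the get and drop arguments are stand-in callables so the snippet runs without DataLad installed; in a real dataset you would presumably pass dl.get and dl.drop from datalad.api.

```python
from contextlib import contextmanager


@contextmanager
def temporarily_available(path, get, drop):
    """Fetch content for `path`, hand it to the caller, then drop it again."""
    get(path)
    try:
        yield path
    finally:
        # Runs even if the analysis raises, so the content is always dropped.
        drop(path)


# Stand-in callables (assumptions) that only record what was called.
calls = []
with temporarily_available(
    "input/sub-01",
    get=lambda p: calls.append(("get", p)),
    drop=lambda p: calls.append(("drop", p)),
) as subject_dir:
    calls.append(("analyse", subject_dir))

print(calls)
```

Dropping in the finally branch is a design choice: it keeps the shared Hub tidy even when the analysis fails halfway.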
        
dl.get('input/sub-01')
[really complex analysis]
dl.drop('input/sub-01')

| Git | git-annex | 
| handles small files well (text, code) | handles all types and sizes of files well | 
| file contents are in the Git history and will be shared upon git/datalad push | file contents are in the annex. Not necessarily shared | 
| Shared with every dataset clone | Can be kept private on a per-file level when sharing the dataset | 
| Useful: Small, non-binary, frequently modified, need-to-be-accessible (DUA, README) files | Useful: Large files, private files | 
The yoda configuration sets the code/ directory and the dataset descriptions (e.g., README files) to be in Git.
There are many other configurations, and you can also write your own.
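
To make the idea concrete, here is a toy version of such a rule in Python. It is only an illustration of the decision a configuration encodes. DataLad itself expresses these rules through git-annex's annex.largefiles setting in .gitattributes, and the function stored_in_git and its exact matching logic are assumptions made for this sketch.

```python
from pathlib import PurePosixPath


def stored_in_git(path):
    """Toy rule: code/ and README files go to Git, everything else is annexed."""
    p = PurePosixPath(path)
    if p.parts and p.parts[0] == "code":
        return True
    return p.name.startswith("README")


for example in [
    "code/mne_time_frequency_tutorial.py",
    "README.md",
    "figures/inter_trial_coherence.png",
]:
    target = "Git" if stored_in_git(example) else "git-annex"
    print(f"{example} -> {target}")
```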
          
cd ..
python code/mne_time_frequency_tutorial.py
datalad run can run any command in a way that links the command or script to the results it produces and the data it was computed from.
datalad rerun can take this recorded provenance and recompute the command.
datalad containers-run (from the extension "datalad-container") can capture software provenance in the form of software containers, in addition to the provenance that datalad run captures.
        With the datalad-container extension, we can add software containers
        to datasets and work with them.
        Let's add a software container with Python software to run the script
        
            
               datalad containers-add python-env \
                --url https://files.inm7.de/adina/resources/mne \
                --call-fmt "singularity exec {img} {cmd}"
            
        
    
            
                datalad containers-list
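
The --call-fmt string given to containers-add is a template: {img} stands for the container image and {cmd} for the command to run inside it. A rough sketch of that substitution in plain Python follows. This is an illustration, not DataLad's own code; expand_call_format is a hypothetical helper, and the image path is an assumption about where the image ends up in the dataset.

```python
def expand_call_format(call_fmt, img, cmd):
    """Fill the {img} and {cmd} placeholders of a call format string."""
    return call_fmt.format(img=img, cmd=cmd)


CALL_FMT = "singularity exec {img} {cmd}"
# Assumed image location inside the dataset (not stated on the slides).
IMAGE = ".datalad/environments/python-env/image"
COMMAND = "python3 code/mne_time_frequency_tutorial.py"

full_command = expand_call_format(CALL_FMT, IMAGE, COMMAND)
print(full_command)
```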
            
        
    Run the analysis inside the container with the containers-run command:
    
        
datalad containers-run -m "run classification analysis in python environment" \
  --container-name python-env \
  --input "input/sub-01/meg/sub-01_task-somato_meg.fif" \
  --output "figures/*" \
  "python3 code/mne_time_frequency_tutorial.py {inputs}"
        
    
    What changed after the containers-run command has completed?
        Find out with datalad diff (based on git diff):
        
            
                datalad diff -f HEAD~1
            
        
    
            
                git log -n 1
            
        
    
            
                ssh-keygen -t ed25519 -C "your-email"
                eval "$(ssh-agent -s)"
                ssh-add ~/.ssh/id_ed25519
            
        
    
            
                cat ~/.ssh/id_ed25519.pub
            
        
     
        DataLad can create sibling-repositories
    on various infrastructure and third party services (GitHub, GitLab, OSF, WebDAV-based services, DataVerse, ...),
    to which data can then be published with push.
    
        
            datalad create-sibling-gin example-analysis --access-protocol ssh
        
    
    List the registered siblings with the siblings command:
        
            
                datalad siblings
            
        
    
            
                datalad push --to gin
            
        
     
        cd ../
datalad clone \
   https://gin.g-node.org/adswa/example-analysis \
   myclone
    
    
            
                cd myclone
            
        
    
            
                datalad get figures/inter_trial_coherence.png
            
        
    
            
                datalad drop figures/inter_trial_coherence.png
            
        
    
            
                datalad rerun
            
        
    Data changes (errors are fixed, data is extended, naming standards change, an analysis requires only a subset of your data...)
 
     
     
     
         
         
    Provenance = "The tools and processes used to create a digital file, the responsible entity, and when and where the process events occurred"

           "In defense of decentralized Research Data Management", doi.org/10.1515/nf-2020-0037
     
         
    
/dataset
├── sample1
│   └── a001.dat
├── sample2
│   └── a001.dat
...
/dataset
├── sample1
│   ├── ps34t.dat
│   └── a001.dat
├── sample2
│   ├── ps34t.dat
│   └── a001.dat
...
/raw_dataset
├── sample1
│   └── a001.dat
├── sample2
│   └── a001.dat
...
/derived_dataset
├── sample1
│   └── ps34t.dat
├── sample2
│   └── ps34t.dat
├── ...
└── inputs
    └── raw
        ├── sample1
        │   └── a001.dat
        ├── sample2
        │   └── a001.dat
        ...
$ datalad clone --dataset . http://example.com/ds inputs/rawdata
$ git diff HEAD~1
diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000..c3370ba
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "inputs/rawdata"]
+       path = inputs/rawdata
+       url = http://example.com/ds
diff --git a/inputs/rawdata b/inputs/rawdata
new file mode 160000
index 0000000..fabf852
--- /dev/null
+++ b/inputs/rawdata
@@ -0,0 +1 @@
+Subproject commit fabf8521130a13986bd6493cb33a70e580ce8572
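
Since the link to a subdataset is just a few lines in .gitmodules, it is easy to read back with standard tools. Here is a small sketch using Python's configparser. The sample text is hand-written and modelled on the diff above, and list_subdatasets is a hypothetical helper; inside a real dataset, datalad subdatasets gives you this information directly.

```python
import configparser

# Hand-written sample modelled on the diff above (an assumption).
SAMPLE_GITMODULES = """\
[submodule "inputs/rawdata"]
    path = inputs/rawdata
    url = http://example.com/ds
"""


def list_subdatasets(gitmodules_text):
    """Return {path: url} for every submodule section in the given text."""
    # Strip indentation so keys are never mistaken for continuation lines.
    cleaned = "\n".join(line.strip() for line in gitmodules_text.splitlines())
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cleaned)
    return {
        parser[section]["path"]: parser[section]["url"]
        for section in parser.sections()
        if section.startswith("submodule ")
    }


subdatasets = list_subdatasets(SAMPLE_GITMODULES)
print(subdatasets)
```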
 
     
 
 
Women neuroscientists are underrepresented in neuroscience. You can use the Repository for Women in Neuroscience to find and recommend neuroscientists for conferences, symposia or collaborations, and help make neuroscience more open & diverse.
datalad create creates an empty dataset.
datalad save records the dataset or file state to the history.
datalad download-url obtains web content and records its origin.
datalad status reports the current state of the dataset.
datalad clone installs a dataset.
datalad get downloads file content on demand.
datalad run records a command and its impact on the dataset.
Files specified with --input are retrieved prior to command execution.
Files specified with --output will be unlocked for modifications prior to a rerun of the command.
datalad containers-run can be used to capture the software environment as provenance.
datalad rerun can automatically re-execute run-records later.