Create workflow_overview.md #83
base: master
Conversation
Thanks Nadia, this is a good start. And thanks for submitting it as a PR, that really helps!
You can read below that I think a lot of this so far duplicates content we already have. I think you are hoping to have just a single high-level process, but in technical writing it is tricky to balance detail against accuracy (too much detail slides into inaccuracy, because it makes people's eyes glaze over and they miss steps 😅).
I am thinking this doc needs to be something like:
- Make a [git repo](LINK TO GIT TUTORIAL) on GitHub (or on an internal private git server??) for your analysis code.
- Find your data on the [data server](LINK TO DATA SERVER DOC) and include it by (one of):
  - Running
    ```
    git clone -b r20230623 git@data.neuro.polymtl.ca:datasets/<DATASET> /tmp/<DATASET> && (cd /tmp/<DATASET> && git annex get)
    ```
    (or some other folder??) manually and making sure to prefix all analysis commands with that path, e.g. `sct_analyze_lesion -i /tmp/<DATASET>/path/to/file.nii.gz`
  - Writing
    ```
    git clone -b r20230623 git@data.neuro.polymtl.ca:datasets/<DATASET> && (cd <DATASET> && git annex get)
    ```
    in a top-level `analyse.py` or `analyse.sh` script and adding `<DATASET>/` to your `.gitignore`
  - Using
    ```
    git submodule add -b r20230623 git@data.neuro.polymtl.ca:datasets/<DATASET> && (cd <DATASET> && git annex get)
    ```
  - other alternatives??

  Troubleshoot problems by consulting the [git-annex](LINK TO GIT ANNEX PAGE) page.
- Run `git push` regularly to share your work and ask for feedback early.
- Test your reproducibility by having someone else `git clone` (or `git clone --recurse-submodules`) your project and run its analysis script on a different machine at a different time. If it doesn't run with just one or two command lines, fix it so it does by adding the necessary steps to your top-level driver script (a sketch of such a script follows below).
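For reference, a minimal sketch of what such a top-level driver script could look like, assuming the clone-into-a-gitignored-folder option; the dataset name, release branch, and analysis command are placeholders, not settled conventions:

```bash
#!/usr/bin/env bash
# analyse.sh -- minimal driver script sketch (placeholders throughout)
set -euo pipefail

# Fetch the exact dataset release this analysis was written against,
# unless it is already here from a previous run
if [ ! -d "<DATASET>" ]; then
    git clone -b r20230623 git@data.neuro.polymtl.ca:datasets/<DATASET>
fi
(cd <DATASET> && git annex get .)

# Run the analysis
sct_analyze_lesion -i <DATASET>/path/to/file.nii.gz
```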
On a style point, I feel this file should probably be in a subfolder. I'm not sure where though, this crosses both "Onboarding" and "Geek Tips" and "Computing Resources/Data". I notice you added an emoji to the title, which means it would already fit in with the other top-level pages table of contents. Was that your intention?
Wherever you put it, you'll need to find the nearest `toctree` and add it there in order for this page to show up, e.g.:
intranet.neuro.polymtl.ca/mri-scanning/README.md
Lines 7 to 12 in 4f32201
```{toctree}
:hidden:
unf-3t-prisma
mni-mcgill-7t-terra
mhi-7t-agilent
```
If you want it in the top-level folder, this is the `toctree` to edit:
intranet.neuro.polymtl.ca/README.md
Lines 3 to 33 in 4f32201
```{toctree}
:caption: Ressources
:hidden:
onboarding/README
agenda-and-calendar
computing-resources/README
data/README
mri-scanning/README
rf-lab/README
```

```{toctree}
:caption: Academic Life
:hidden:
courses
scholarships
conferences
bibliography/README
ideas-for-cool-projects
writing-articles
```

```{toctree}
:caption: Miscellaneous
:hidden:
practical-information/README
geek-tips/README
edi
contact
NeuroPoly Website <https://neuro.polymtl.ca>
```
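For example, if it stayed at the top level with the filename `workflow_overview.md` (just an assumption for illustration, since we haven't settled on a location), the first block would gain one line:

```{toctree}
:caption: Ressources
:hidden:
onboarding/README
agenda-and-calendar
computing-resources/README
data/README
mri-scanning/README
rf-lab/README
workflow_overview
```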
# 📄 General Overview of Project Workflow
I appreciate that you noticed and kept the emojis here! There's a small glitch with sphinx: all the top-level pages need to have their emojis wrapped like this to get them to align properly
```diff
- # 📄 General Overview of Project Workflow
+ # <span>📄</span> General Overview of Project Workflow
```
Once your [onboarding](https://intranet.neuro.polymtl.ca/onboarding/README.html) is complete, you will be ready to tackle your project!
I try to use relative links everywhere, because the domain this is on is not promised to last forever. So
```diff
- Once your [onboarding](https://intranet.neuro.polymtl.ca/onboarding/README.html) is complete, you will be ready to tackle your project!
+ Once your [onboarding](onboarding) is complete, you will be ready to tackle your project!
```
or maybe that doesn't work, tbh I forget how sphinx handles folder links. Maybe better to be explicit:
```diff
- Once your [onboarding](https://intranet.neuro.polymtl.ca/onboarding/README.html) is complete, you will be ready to tackle your project!
+ Once your [onboarding](onboarding/README.md) is complete, you will be ready to tackle your project!
```
Note that the build process -- sphinx -- is smart enough to map `.md` to `.html` in the final build, but by keeping the link as a `.md`, if you've done the relative links right, GitHub's markdown renderer will render the whole site as a fully functional backup. Take a gander at https://github.com/neuropoly/intranet.neuro.polymtl.ca/blob/4f322012f35de2f170111e325110431bda94a7d6/workflow_overview.md: you will see that in this version this link is broken. Even once this is published to the live site, I'd still count that link as broken in spirit 😲 because it'd go to an external site and so the docs aren't self-contained.
You can double-check how your work will render in the end with our theme and all the links fixed up properly. It's a bit more work, but you can just check out this branch on your computer and follow these instructions:
intranet.neuro.polymtl.ca/setup.py
Lines 2 to 7 in 4f32201
```
To build the docs:
    pip install .[sphinx]
    make html
They will end up in _build/html/
```
(these are the same commands that are used by GitHub to publish the live copy of the site)
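Concretely, checking out this branch and building might look something like this (the branch name is a placeholder for whatever this PR's branch is actually called):

```bash
# Get a local copy of the repo and switch to the PR branch
git clone https://github.com/neuropoly/intranet.neuro.polymtl.ca
cd intranet.neuro.polymtl.ca
git checkout <PR_BRANCH>

# Build the site the same way the GitHub deploy does
pip install .[sphinx]
make html

# Inspect the result locally
xdg-open _build/html/index.html   # or 'open' on macOS
```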
**Step 3.**
* Create your project working directory:
```
cd data_nvme_<POLYGRAMES_USERNAME>
```
Most systems don't have `/mnt/nvme` or `~/data_nvme_$USER`. That's really just a hack we added for `joplin` at some point, and it's already documented here:
intranet.neuro.polymtl.ca/computing-resources/neuropoly/cpus.md
Lines 15 to 17 in 58f03fe

| **Hostname** | `joplin.neuro.polymtl.ca` |

For fast I/O, use the NVMe hard drive, which is automatically available: `~/data_nvme_$USER`
I would say there's no need to specify this. Anywhere someone has access that has enough space is fine, and if you just don't mention it then most people will by default end up working in their home directories, which should work perfectly well on most systems, at least to start out. The big gotcha is that combining `duke` and `git` is a bad idea, but I think documenting that is probably out of scope of this.
```
mkdir <PROJECT_NAME>
cd <PROJECT_NAME>
```
We can help people out by being explicit here:
```diff
  mkdir <PROJECT_NAME>
  cd <PROJECT_NAME>
+ git init
```
* After adding your NeuroPoly workstation [SSH key to your Github account](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account?platform=linux), you are ready to make a local fork of that remote repository:
```
cd data_nvme_<POLYGRAMES_USERNAME>/<PROJECT_NAME>
git clone -b "<YOUR_WORKING_BRANCH>" git@github.com:<REPOSITORY>.git
```
I think we want everything we do to be tracked by git by default. Things that should not be under git, like private data or test files, can either go in `/tmp` or be added to `.gitignore` explicitly. That is to say, just have `<PROJECT_NAME>/`, not `<PROJECT_NAME>/<REPOSITORY>/`.

Then, we can either `git init` locally and `git push`, or do the reverse and click the New Repo button on GitHub (which runs `git init` on their side) and then `git clone`. Either way works, and GitHub gives helpful pointers to guide people through either workflow. Once you've done either two or three times they become obvious, and most developers don't give them a second thought.
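For the record, the "init locally, then push" direction is only a handful of commands (the GitHub URL here is a placeholder):

```bash
# Start tracking the project locally
cd <PROJECT_NAME>
git init
git add .
git commit -m "Initial commit"

# Create an empty repo on GitHub first, then connect and push
# (placeholder URL; use 'master' instead of 'main' if that is your default branch)
git remote add origin git@github.com:<GITHUB_USERNAME>/<PROJECT_NAME>.git
git push -u origin main
```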
We already have a page that's supposed to cover this:
# git & Github
and while it needs a lot of love at the moment, it'd be better to work these tips into it than repeat them here. Can we arrange it so this section is just a link to that page? Maybe that page needs to get some edits in this PR as well to bring it up to speed with our current practices and/or to slim out the advice we don't use.
```
git annex drop .
```
* Any data derivatives that you output should be added to `data.neuro:datasets/<PROJECT_DATASET>` according to the [BIDS](https://bids-specification.readthedocs.io/en/stable/) data standard! More documentation on how to version control your data on `data.neuro` can be found [here](https://intranet.neuro.polymtl.ca/data/git-datasets.html#update).
Ditto on the absolute link: better to make it relative.
* Thanks to `git annex`, the following command will copy the directory structure and some small files of your dataset on `data.neuro`:
```
cd data_nvme_<POLYGRAMES_USERNAME>/<PROJECT_NAME>
git clone git@data.neuro.polymtl.ca:datasets/<PROJECT_DATASET>
```
This is the most interesting part to me. This is the part where we might address neuropoly/data-management#136.
One option here would be
```diff
- git clone git@data.neuro.polymtl.ca:datasets/<PROJECT_DATASET>
+ git submodule add git@data.neuro.polymtl.ca:datasets/<PROJECT_DATASET>
```
This would be the Datalad YODA recommendation. They don't use the word "submodule" on that page but it is what they have in mind.
We can also do
```diff
- git clone git@data.neuro.polymtl.ca:datasets/<PROJECT_DATASET>
+ git submodule add -b v1.0.3 git@data.neuro.polymtl.ca:datasets/<PROJECT_DATASET>
```
to pick out a specific older version (in this case `v1.0.3`).
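And if we later want to bump the pin to a newer release, the usual submodule dance would be something like this (the newer tag is hypothetical):

```bash
# Move the submodule pin to a newer dataset release
cd <PROJECT_DATASET>
git fetch --tags
git checkout v1.0.4   # hypothetical newer tag
cd ..
git add <PROJECT_DATASET>
git commit -m "Bump <PROJECT_DATASET> to v1.0.4"
```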
Then when you `git push` the code to GitHub it looks like this: the dataset folder is unclickable, because it's tagged as a submodule reference, and moreover it's a submodule that lives on the private, intentionally-inaccessible storage node (screenshots omitted). But if I'm on one of our internal processing nodes with permission to the datasets, I can reproduce my entire project with `git clone --recurse-submodules`:
```
p115628@bireli:~/src$ git clone --recurse-submodules https://github.com/kousu/proj1
Cloning into 'proj1'...
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 0), reused 6 (delta 0), pack-reused 0
Receiving objects: 100% (6/6), done.
Submodule 'canproco' (git@data.neuro.polymtl.ca:datasets/canproco) registered for path 'canproco'
Cloning into '/home/GRAMES.POLYMTL.CA/p115628/src/proj1/canproco'...
remote: Enumerating objects: 51098, done.
remote: Counting objects: 100% (51098/51098), done.
remote: Compressing objects: 100% (27474/27474), done.
remote: Total 51098 (delta 22242), reused 39149 (delta 14297), pack-reused 0
Receiving objects: 100% (51098/51098), 4.44 MiB | 11.12 MiB/s, done.
Resolving deltas: 100% (22242/22242), done.
Submodule path 'canproco': checked out '42d424b0b56b9269fe0c5058130f5e3bc7c9e941'
p115628@bireli:~/src$ cd proj1/
p115628@bireli:~/src/proj1$ ls canproco/
dataset_description.json sub-cal135 sub-cal192 sub-edm086 sub-edm183 sub-mon109 sub-mon174 sub-tor031 sub-tor103 sub-van111 sub-van176
derivatives sub-cal136 sub-cal194 sub-edm087 sub-edm184 sub-mon111 sub-mon175 sub-tor032 sub-tor106 sub-van112 sub-van177
participants.json sub-cal137 sub-cal195 sub-edm088 sub-mon001 sub-mon113 sub-mon176 sub-tor033 sub-tor107 sub-van116 sub-van178
participants.tsv sub-cal138 sub-cal197 sub-edm089 sub-mon002 sub-mon118 sub-mon180 sub-tor035 sub-tor109 sub-van123 sub-van180
sub-cal056 sub-cal140 sub-cal198 sub-edm094 sub-mon003 sub-mon119 sub-mon181 sub-tor036 sub-tor110 sub-van124 sub-van181
sub-cal072 sub-cal142 sub-cal199 sub-edm095 sub-mon004 sub-mon121 sub-mon183 sub-tor037 sub-tor112 sub-van125 sub-van182
sub-cal073 sub-cal143 sub-cal200 sub-edm098 sub-mon005 sub-mon124 sub-mon185 sub-tor038 sub-tor114 sub-van129 sub-van183
sub-cal078 sub-cal144 sub-cal201 sub-edm105 sub-mon006 sub-mon125 sub-mon186 sub-tor039 sub-tor115 sub-van131 sub-van184
sub-cal080 sub-cal145 sub-cal202 sub-edm107 sub-mon007 sub-mon126 sub-mon187 sub-tor040 sub-tor118 sub-van133 sub-van185
sub-cal083 sub-cal146 sub-cal206 sub-edm113 sub-mon009 sub-mon129 sub-mon189 sub-tor041 sub-tor121 sub-van134 sub-van186
sub-cal084 sub-cal149 sub-cal207 sub-edm114 sub-mon010 sub-mon131 sub-mon190 sub-tor043 sub-tor123 sub-van135 sub-van189
sub-cal085 sub-cal150 sub-cal209 sub-edm118 sub-mon011 sub-mon132 sub-mon191 sub-tor044 sub-tor124 sub-van136 sub-van191
sub-cal088 sub-cal151 sub-cal210 sub-edm123 sub-mon013 sub-mon133 sub-mon192 sub-tor049 sub-tor125 sub-van137 sub-van192
```
which is pretty cool! And a lot more reliable than most other reproduction methods, which say "go dig up this DOI and try to find the matching dataset on Zenodo, and then the paper, and some code that went with the paper on some obscure university FTP site" (and don't say, but should, "and make sure you're running such and such a version of such and such an OS and using such and such a version of python and using such and such a version of nvidia's GPU hardware"...)
@sandrinebedard gave me a tour of how she handled https://github.com/sct-pipeline/ukbiobank-spinalcord-csa last year.
In short, she wrote a detailed process guide in the repository. While explaining, she realized some parts were left undocumented: she set up a conda environment to avoid surprise breakage (which may cause different results, since `requirements.txt` doesn't specify versions), and there is a git tag named `r20210928` attached to the precise dataset she analysed:
```
p115628@joplin:~/datasets/uk-biobank-processed$ git remote -v
origin  git@data.neuro.polymtl.ca:datasets/uk-biobank-processed (fetch)
origin  git@data.neuro.polymtl.ca:datasets/uk-biobank-processed (push)
p115628@joplin:~/datasets/uk-biobank-processed$ git log HEAD~..
commit 7ed28cd0aacaab0ce1570bccdba3bff495b5f496 (HEAD -> master, tag: show, tag: r20210928, origin/master, origin/HEAD)
Author: Sandrine Bedard <[email protected]>
Date:   Sun Sep 5 17:49:04 2021 -0400

    change suffix in derivatives from labels-manual to labels-disc-manual
```
but she hasn't documented that that's the tag to go with the project. Otherwise her procedure fits option 3: write down all the steps and expect people will read all your docs and follow all your instructions.
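For what it's worth, closing those two gaps could be as simple as committing the pins next to the code; a sketch (these commands are one way among several):

```bash
# Record the exact Python environment used for the analysis
python --version > python-version.txt
pip freeze > requirements.txt

# Record which dataset release the analysis ran against
echo "Analysed against uk-biobank-processed tag r20210928" >> README.md

git add python-version.txt requirements.txt README.md
git commit -m "Pin environment and dataset version for reproducibility"
```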
Maybe you can already tell, but I'm leaning against option 3. I think we need something more automated, because if there's one thing I know about human-computer interaction it's that people don't read instructions. I don't know if `git submodule` (option 1) is the right answer, but I think we need something automated. And we will know we have it when we can run a weekly cronjob on each of our papers that reproduces the final paper, including all figures and statistical analyses, from just source code and access to the data server.
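To sketch what that acceptance test might look like -- every path, repo, and script name below is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical /etc/cron.weekly/reproduce-papers: re-run each paper end to end
set -euo pipefail

for repo in git@github.com:<ORG>/<PAPER_1> git@github.com:<ORG>/<PAPER_2>; do
    workdir=$(mktemp -d)
    # Code plus pinned dataset submodule(s), exactly as a reader would get them
    git clone --recurse-submodules "$repo" "$workdir/paper"
    (
        cd "$workdir/paper"
        git submodule foreach 'git annex get .'  # fetch the annexed data contents
        ./analyse.sh                             # each paper's top-level driver script
    ) || echo "REPRODUCTION FAILED: $repo"
    rm -rf "$workdir"
done
```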
Oh perf, I just poked around a bit and it looks like you were trying to reproduce this project: https://github.com/NadiaBlostein/Chemical_Structure_Reconstruction
The very first thing it says is pretty much:

> Download the Mordred compound sets (1 through 3) and place them in a `data/` folder

I have no idea where to get the Mordred compound sets (1 through 3), nor what version (if any) of them I'm supposed to get, and I don't know if they assume the `data/` folder is supposed to be in the same folder as the code, or if they mean the current working directory (reading it in the unix convention, I would take it as the current working directory!), which isn't necessarily the same.
And:

> Run the main function in `src/thresholding.py`.

There is no main function in that file. As a programmer I can read the code and interpret that they must mean "run `python src/thresholding.py`", but when working with a large series of projects it would be impossible to track down and correct every small gap like that.
Their repo also depends on pandas and some other third-party python packages, but doesn't declare a `setup.py` or `requirements.txt` or a conda environment or anything. It doesn't even say what version of python it was written against. All of that is key to write down for reproducibility.
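Even a minimal pinned `requirements.txt` would have closed most of those gaps; a sketch (the package list and versions are illustrative, not the project's actual ones):

```bash
# Write a pinned requirements file alongside the code
cat > requirements.txt <<'EOF'
# Tested with Python 3.10 (illustrative)
pandas==1.5.3
numpy==1.24.2
EOF

# Anyone reproducing the work can then recreate the environment with:
pip install -r requirements.txt
```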
Those are some prime examples of the mistakes I want us to be able to avoid.
**Step 5. The data**
* It is critical to make sure that you know what data you are working with.
* Ideally, it should be in [BIDS](https://bids-specification.readthedocs.io/en/stable/) format on the [`data.neuro`](https://intranet.neuro.polymtl.ca/data/git-datasets.html) storage node: `data.neuro:datasets/<PROJECT_DATASET>`.
Ditto with the relative links. Also I'd shorten the BIDS link to be safer against link rot -- their domain will change less frequently than their subfolders, hopefully.
```diff
- * Ideally, it should be in [BIDS](https://bids-specification.readthedocs.io/en/stable/) format on the [`data.neuro`](https://intranet.neuro.polymtl.ca/data/git-datasets.html) storage node: `data.neuro:datasets/<PROJECT_DATASET>`.
+ Ideally, it should be in [BIDS](https://bids-specification.readthedocs.io) format. We have many of these on the private [`data`](data/git-datasets.md) server.
```
We should also think about the workflow for combining datasets. In principle it's easy to include multiple datasets -- each of them is just a different folder, and we can do that with `git submodule` or almost any other tool. We should think about how to leave the door open to this and avoid tricking people into thinking that there's a 1-to-1 mapping between analyses and datasets, while remembering that the 1-to-1 situation is the common case.
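With submodules, for instance, a second dataset is just a second folder (dataset names hypothetical):

```bash
# Each dataset becomes its own pinned subfolder of the analysis repo
git submodule add git@data.neuro.polymtl.ca:datasets/<DATASET_A>
git submodule add git@data.neuro.polymtl.ca:datasets/<DATASET_B>
git commit -m "Add <DATASET_A> and <DATASET_B> as inputs"
```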
**Step 1.**
* Make sure that your VPN connection is established or that you are connected to the Polytechnique wifi.

**Step 2.**
* Log in to one of the available [Neuropoly compute nodes](https://intranet.neuro.polymtl.ca/computing-resources/neuropoly/README.html):
```
ssh <POLYGRAMES_USERNAME>@<STATION>.neuro.polymtl.ca
```
This duplicates the instructions at
intranet.neuro.polymtl.ca/computing-resources/neuropoly/README.md
Lines 152 to 158 in 58f03fe
### SSH (command line)
Once the VPN connection is established, connect via ssh using the `STATION` you want:
```bash
ssh <POLYGRAMES_USERNAME>@<STATION>.neuro.polymtl.ca
```
I think those instructions could be clearer -- it would be nice if that page was broken up -- but in any case it's best not to duplicate the work.
* However, in order to save space, make sure to "undownload" those big files once you are done working with them, with:
```
git annex drop .
```
This is a good tip, but I think it would fit better as a new subsection over in https://intranet.neuro.polymtl.ca/data/git-datasets.html#drop.
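The typical get/work/drop cycle, for reference (the path is a placeholder):

```bash
# Download only what this step needs
git annex get <PATH_TO_FILES>

# ... run the analysis on those files ...

# Free the local copies when done; the server still keeps them
git annex drop <PATH_TO_FILES>
```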
By the way, Jan and I spent several hours today putting together a BIDS curation guide:

intranet.neuro.polymtl.ca/data/dataset-curation.md
Lines 1 to 167 in b89d175

We could add a link to this to the workflow. Some projects start by curating their own data, some just use pre-curated data, and others involve generating derivative datasets (which also need to be curated). There are two other documents I'm involved with that duplicate this work; none of these are finished, or even necessarily correct. I would like to get all steps of all our workflows -- collection, curation, analysis, reproduction -- down in our wiki to offer as a gold standard.
Also, I've left a blank spot in

intranet.neuro.polymtl.ca/data/dataset-curation.md
Lines 174 to 176 in 6222bd9

to link up this doc when it's ready!