Create workflow_overview.md #83

Status: Open. NadiaBlostein wants to merge 1 commit into `master`.

Conversation

NadiaBlostein (Contributor)

No description provided.

@kousu (Member) left a comment:

Thanks Nadia, this is a good start. And thanks for submitting it as a PR, that really helps!

You can read below that I think so far a lot of this duplicates content we already have. I think you are hoping to just have a single high-level process, but in technical writing it is tricky to balance detail against accuracy (and too much detail slides into inaccuracy, because it makes people's eyes glaze over and they miss steps 😅).

I am thinking this doc needs to be something like:

  1. Make a [git repo](LINK TO GIT TUTORIAL) on GitHub (or on an internal private git server??) for your analysis code

  2. Find your data on the [data server](LINK TO DATA SERVER DOC) and include it by (one of):

    1. Running `git clone -b r20230623 [email protected]:datasets/<DATASET> /tmp/<DATASET> && (cd /tmp/<DATASET> && git annex get)` (or some other folder??) manually and making sure to prefix all analysis commands with that path, e.g. `sct_analyze_lesion -i /tmp/<DATASET>/path/to/file.nii.gz`
    2. Writing `git clone -b r20230623 [email protected]:datasets/<DATASET> && (cd <DATASET> && git annex get)` in a top-level `analyse.py` or `analyse.sh` script and adding `<DATASET>/` to your `.gitignore` (a minimal sketch of such a driver script follows this list)
    3. Using `git submodule add -b r20230623 [email protected]:datasets/<DATASET> && (cd <DATASET> && git annex get)`
    4. other alternatives??

    Troubleshoot problems by consulting the [git-annex](LINK TO GIT ANNEX PAGE) page.

  3. Run `git push` regularly to share your work and ask for feedback early

  4. Test your reproducibility by having someone else `git clone` (or `git clone --recurse-submodules`) your project and run its analysis script on a different machine at a different time. If it doesn't run with just one or two command lines, fix it so it does by adding the necessary steps to your top-level driver script.
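To make option 2 above concrete, here is a minimal sketch of what such a top-level driver script could look like (dataset name, tag, and analysis command are placeholders carried over from the examples above; this is not a tested recipe):

```
#!/bin/sh
# analyse.sh -- illustrative driver script for option 2 above
set -e

DATASET="<DATASET>"   # replace with the real dataset name

# Fetch the pinned dataset release next to the code (and list <DATASET>/ in .gitignore)
if [ ! -d "$DATASET" ]; then
    git clone -b r20230623 "[email protected]:datasets/$DATASET"
    (cd "$DATASET" && git annex get)
fi

# Run the analysis, always addressing the data through $DATASET/
sct_analyze_lesion -i "$DATASET"/path/to/file.nii.gz
```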


On a style point, I feel this file should probably be in a subfolder. I'm not sure where, though: it crosses "Onboarding", "Geek Tips", and "Computing Resources/Data". I notice you added an emoji to the title, which means it would already fit in with the table of contents of the other top-level pages. Was that your intention?

Wherever you put it, you'll need to find the nearest toctree and add it there in order for this page to show up, e.g.:

```{toctree}
:hidden:
unf-3t-prisma
mni-mcgill-7t-terra
mhi-7t-agilent
```

If you want it in the top-level folder, this is the toctree to edit:

```{toctree}
:caption: Ressources
:hidden:
onboarding/README
agenda-and-calendar
computing-resources/README
data/README
mri-scanning/README
rf-lab/README
```
```{toctree}
:caption: Academic Life
:hidden:
courses
scholarships
conferences
bibliography/README
ideas-for-cool-projects
writing-articles
```
```{toctree}
:caption: Miscellaneous
:hidden:
practical-information/README
geek-tips/README
edi
contact
NeuroPoly Website <https://neuro.polymtl.ca>
```
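For example, if this page stays as `workflow_overview.md` at the repository root and we file it under Ressources (just one possible placement), the edit might look like:

```{toctree}
:caption: Ressources
:hidden:
onboarding/README
agenda-and-calendar
computing-resources/README
data/README
mri-scanning/README
rf-lab/README
workflow_overview
```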


# 📄 General Overview of Project Workflow
@kousu (Member):

I appreciate that you noticed and kept the emojis here! There's a small glitch with sphinx: all the top-level pages need to have their emojis wrapped like this to get them to align properly:

Suggested change
# 📄 General Overview of Project Workflow
# <span>📄</span> General Overview of Project Workflow


# 📄 General Overview of Project Workflow

Once your [onboarding](https://intranet.neuro.polymtl.ca/onboarding/README.html) is complete, you will be ready to tackle your project!
@kousu (Member), Dec 8, 2022:

I try to use relative links everywhere, because the domain this is on is not promised to last forever. So

Suggested change
Once your [onboarding](https://intranet.neuro.polymtl.ca/onboarding/README.html) is complete, you will be ready to tackle your project!
Once your [onboarding](onboarding) is complete, you will be ready to tackle your project!

or maybe that doesn't work, tbh I forget how sphinx handles folder links. Maybe better to be explicit:

Suggested change
Once your [onboarding](https://intranet.neuro.polymtl.ca/onboarding/README.html) is complete, you will be ready to tackle your project!
Once your [onboarding](onboarding/README.md) is complete, you will be ready to tackle your project!

Note that the build process -- sphinx -- is smart enough to map .md to .html in the final build, but by keeping the link as a .md, and if you've done the relative links right, GitHub's markdown renderer will render the whole site as a fully functional backup. Take a gander at https://github.com/neuropoly/intranet.neuro.polymtl.ca/blob/4f322012f35de2f170111e325110431bda94a7d6/workflow_overview.md; you will see that in this version this link is broken. Even once this is published to the live site, I'd still count that link as broken in spirit 😲 because it'd go to an external site, and so the docs wouldn't be self-contained.

You can double-check how your work will render in the end with our theme and all the links fixed up properly. It's a bit more work, but you can just check out this branch on your computer and follow these instructions:

To build the docs:
```
pip install .[sphinx]
make html
```
They will end up in `_build/html/`.

(these are the same commands that are used by GitHub to publish the live copy of the site)
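If you don't already have the branch locally, one way to grab it is through GitHub's pull-request refs (assuming `origin` points at this repository; the local branch name below is just illustrative):

```
git fetch origin pull/83/head:workflow-overview
git switch workflow-overview
```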

**Step 3.**
* Create your project working directory:
```
cd data_nvme_<POLYGRAMES_USERNAME>
```
@kousu (Member):

Most systems don't have `/mnt/nvme` or `~/data_nvme_$USER`. That's really just a hack we added for joplin at some point, and it's already documented here:

| **Hostname** | `joplin.neuro.polymtl.ca` |
For fast I/O, use the NVMe hard drive, which is automatically available: `~/data_nvme_$USER`

I would say there's no need to specify this. Anywhere someone has access that has enough space is fine, and if you just don't mention it then most people will by default end up working in their home directories, which should work perfectly well on most systems, at least to start out. The big gotcha is that combining duke and git is a bad idea, but I think documenting that is probably out of scope of this.

Comment on lines +21 to +22
mkdir <PROJECT_NAME>
cd <PROJECT_NAME>
@kousu (Member):

We can help people out by being explicit here:

Suggested change
mkdir <PROJECT_NAME>
cd <PROJECT_NAME>
mkdir <PROJECT_NAME>
cd <PROJECT_NAME>
git init

* After adding your NeuroPoly workstation [SSH key to your Github account](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account?platform=linux), you are ready to make a local fork of that remote repository:
```
cd data_nvme_<POLYGRAMES_USERNAME>/<PROJECT_NAME>
git clone -b "<YOUR_WORKING_BRANCH>" [email protected]:<REPOSITORY>.git
```
@kousu (Member):

I think we want everything we do to be tracked by git by default. Things that should not be under git, like private data or test files, can either go in `/tmp` or be added to `.gitignore` explicitly. That is to say, just have `<PROJECT_NAME>/`, not `<PROJECT_NAME>/<REPOSITORY>/`.

Then, we can either `git init` locally and `git push`, or do the reverse and click the New Repo button on GitHub (which runs `git init` on their side) and then `git clone`. Either way works, and GitHub gives helpful pointers to guide people through either workflow. Once you've done either two or three times they become obvious, and most developers don't give it a second thought.
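For instance, the "init locally, then push" flow is roughly the following (the repository owner and default branch name are placeholders, not a prescription):

```
cd <PROJECT_NAME>
git init
git add .
git commit -m "Initial commit"
# after creating an empty <PROJECT_NAME> repository on GitHub:
git remote add origin git@github.com:<OWNER>/<PROJECT_NAME>.git
git push -u origin main   # or master, depending on your default branch
```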

We already have a page that's supposed to cover this:

and while it needs a lot of love at the moment, it'd be better to work these tips into it than repeat them here. Can we arrange it so this section is just a link to that page? Maybe that page needs to get some edits in this PR as well to bring it up to speed with our current practices and/or to slim out the advice we don't use.

```
git annex drop .
```
* Any data derivatives that you output should be added to `data.neuro:datasets/<PROJECT_DATASET>` according to the [BIDS](https://bids-specification.readthedocs.io/en/stable/) data standard! More documentation on how to version control your data on `data.neuro` can be found [here](https://intranet.neuro.polymtl.ca/data/git-datasets.html#update).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto on the absolute link: better to make it relative.

* Thanks to `git annex`, the following command will copy the directory structure and some small files of your dataset on `data.neuro`:
```
cd data_nvme_<POLYGRAMES_USERNAME>/<PROJECT_NAME>
git clone [email protected]:datasets/<PROJECT_DATASET>
```
@kousu (Member):

This is the most interesting part to me. This is the part where we might address neuropoly/data-management#136.

One option here would be

Suggested change
git clone [email protected]:datasets/<PROJECT_DATASET>
git submodule add [email protected]:datasets/<PROJECT_DATASET>

This would be the Datalad YODA recommendation. They don't use the word "submodule" on that page but it is what they have in mind.

We can also do

Suggested change
git clone [email protected]:datasets/<PROJECT_DATASET>
git submodule add -b v1.0.3 [email protected]:datasets/<PROJECT_DATASET>

to pick out a specific older version (in this case v1.0.3).

Then when you `git push` the code to GitHub it looks like this:

[Screenshot: the kousu/proj1 repository on GitHub, with the dataset folder shown as a submodule reference]

The dataset folder is unclickable because it's tagged as a submodule reference, and moreover it's a submodule that is on the private, intentionally-inaccessible, storage node:

[Screenshot: the same repository, with the submodule reference pointing at the private storage node and therefore not clickable]

but if I'm on one of our internal processing nodes with permission to the datasets, I can reproduce my entire project with `git clone --recurse-submodules`:

```
p115628@bireli:~/src$ git clone --recurse-submodules https://github.com/kousu/proj1
Cloning into 'proj1'...
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 0), reused 6 (delta 0), pack-reused 0
Receiving objects: 100% (6/6), done.
Submodule 'canproco' ([email protected]:datasets/canproco) registered for path 'canproco'
Cloning into '/home/GRAMES.POLYMTL.CA/p115628/src/proj1/canproco'...
remote: Enumerating objects: 51098, done.
remote: Counting objects: 100% (51098/51098), done.
remote: Compressing objects: 100% (27474/27474), done.
remote: Total 51098 (delta 22242), reused 39149 (delta 14297), pack-reused 0
Receiving objects: 100% (51098/51098), 4.44 MiB | 11.12 MiB/s, done.
Resolving deltas: 100% (22242/22242), done.
Submodule path 'canproco': checked out '42d424b0b56b9269fe0c5058130f5e3bc7c9e941'
p115628@bireli:~/src$ cd proj1/
p115628@bireli:~/src/proj1$ ls canproco/
dataset_description.json  sub-cal135  sub-cal192  sub-edm086  sub-edm183  sub-mon109  sub-mon174  sub-tor031  sub-tor103  sub-van111  sub-van176
derivatives               sub-cal136  sub-cal194  sub-edm087  sub-edm184  sub-mon111  sub-mon175  sub-tor032  sub-tor106  sub-van112  sub-van177
participants.json         sub-cal137  sub-cal195  sub-edm088  sub-mon001  sub-mon113  sub-mon176  sub-tor033  sub-tor107  sub-van116  sub-van178
participants.tsv          sub-cal138  sub-cal197  sub-edm089  sub-mon002  sub-mon118  sub-mon180  sub-tor035  sub-tor109  sub-van123  sub-van180
sub-cal056                sub-cal140  sub-cal198  sub-edm094  sub-mon003  sub-mon119  sub-mon181  sub-tor036  sub-tor110  sub-van124  sub-van181
sub-cal072                sub-cal142  sub-cal199  sub-edm095  sub-mon004  sub-mon121  sub-mon183  sub-tor037  sub-tor112  sub-van125  sub-van182
sub-cal073                sub-cal143  sub-cal200  sub-edm098  sub-mon005  sub-mon124  sub-mon185  sub-tor038  sub-tor114  sub-van129  sub-van183
sub-cal078                sub-cal144  sub-cal201  sub-edm105  sub-mon006  sub-mon125  sub-mon186  sub-tor039  sub-tor115  sub-van131  sub-van184
sub-cal080                sub-cal145  sub-cal202  sub-edm107  sub-mon007  sub-mon126  sub-mon187  sub-tor040  sub-tor118  sub-van133  sub-van185
sub-cal083                sub-cal146  sub-cal206  sub-edm113  sub-mon009  sub-mon129  sub-mon189  sub-tor041  sub-tor121  sub-van134  sub-van186
sub-cal084                sub-cal149  sub-cal207  sub-edm114  sub-mon010  sub-mon131  sub-mon190  sub-tor043  sub-tor123  sub-van135  sub-van189
sub-cal085                sub-cal150  sub-cal209  sub-edm118  sub-mon011  sub-mon132  sub-mon191  sub-tor044  sub-tor124  sub-van136  sub-van191
sub-cal088                sub-cal151  sub-cal210  sub-edm123  sub-mon013  sub-mon133  sub-mon192  sub-tor049  sub-tor125  sub-van137  sub-van192
```
which is pretty cool! And a lot more reliable than most other reproduction methods, which say "go dig up this DOI and try to find the matching dataset on Zenodo, and then the paper, and some code that went with the paper on some obscure university FTP site" (and don't say, but should, "and make sure you're running such and such a version of such and such an OS and using such and such a version of python and using such and such a version of nvidia's GPU hardware"...)
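For reference, the submodule pin from the suggestion above ends up recorded in a small `.gitmodules` file at the top of the analysis repo, roughly like this (dataset name and branch are placeholders, and the URL is written the way it appears elsewhere on this page):

```
$ cat .gitmodules
[submodule "<PROJECT_DATASET>"]
    path = <PROJECT_DATASET>
    url = [email protected]:datasets/<PROJECT_DATASET>
    branch = v1.0.3
```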

@kousu (Member):

@sandrinebedard gave me a tour of how she handled https://github.com/sct-pipeline/ukbiobank-spinalcord-csa last year.

In short, she wrote a detailed process guide in

https://github.com/sct-pipeline/ukbiobank-spinalcord-csa/blob/2f2cec3b91294153a635a8f725c0fbc749d172ca/README.md?plain=1#L42-L43

https://github.com/sct-pipeline/ukbiobank-spinalcord-csa/blob/2f2cec3b91294153a635a8f725c0fbc749d172ca/README.md?plain=1#L94-L100

https://github.com/sct-pipeline/ukbiobank-spinalcord-csa/blob/2f2cec3b91294153a635a8f725c0fbc749d172ca/README.md?plain=1#L111-L120

While explaining, she realized some parts were left undocumented: she had set up a conda environment to avoid surprise breakage (and since requirements.txt doesn't pin versions, a fresh install may produce different results), and there is a git tag named "r20210928" attached to the precise dataset version she analysed:

```
p115628@joplin:~/datasets/uk-biobank-processed$ git remote -v
origin  [email protected]:datasets/uk-biobank-processed (fetch)
origin  [email protected]:datasets/uk-biobank-processed (push)
p115628@joplin:~/datasets/uk-biobank-processed$ git log HEAD~..
commit 7ed28cd0aacaab0ce1570bccdba3bff495b5f496 (HEAD -> master, tag: show, tag: r20210928, origin/master, origin/HEAD)
Author: Sandrine Bedard <[email protected]>
Date:   Sun Sep 5 17:49:04 2021 -0400

    change suffix in derivatives from labels-manual to labels-disc-manual
```

but she hasn't documented that that's the tag to go with the project. Otherwise her procedure fits option 3: write down all the steps and expect that people will read all your docs and follow all your instructions.

Maybe you can already tell, but I'm leaning against option 3. I think we need something more automated, because if there's one thing I know about human-computer interaction it's that people don't read instructions. I don't know if git submodule (option 1) is the right answer, but I think we need something automated. And we will know we have it when we can run a weekly cronjob on each of our papers that reproduces the final paper, including all figures and statistical analyses, from just the source code and access to the data server.
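As a sketch of what that automation could look like (the schedule, path, and script name are purely illustrative):

```
# crontab entry: every Monday at 03:00, re-run the full analysis from scratch
0 3 * * 1  cd "$HOME/papers/proj1" && git pull --recurse-submodules && ./analyse.sh > /tmp/proj1-weekly.log 2>&1
```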

@kousu (Member):

Oh, perfect: I just poked around a bit and I'm guessing that you were trying to reproduce this project: https://github.com/NadiaBlostein/Chemical_Structure_Reconstruction

The very first thing it says pretty much is:

  1. Download the Mordred compound sets (1 through 3) and place them in a data/ folder

I have no idea where to get the Mordred compound sets (1 through 3) or which version (if any) of them I'm supposed to get, and I don't know whether they assume the data/ folder is supposed to be in the same folder as the code, or in the current working directory (reading it with the unix convention, I would take it to mean the current working directory!), which isn't necessarily the same.

And

  1. Run the main function in src/thresholding.py.

There is no main function in that file. As a programmer I can read the code and interpret that they must mean "run `python src/thresholding.py`", but when working with a large series of projects it would be impossible to track down and correct every small gap like that.

Their repo also depends on pandas and some other third-party python packages, but doesn't declare a `setup.py` or `requirements.txt` or a conda environment or anything. It doesn't even say what version of python it was written against. All of that is key to write down for reproducibility.
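Even a minimal pinned requirements file plus a note of the interpreter version would go a long way; for example (version numbers here are made up for illustration):

```
# record the interpreter version somewhere visible, e.g. "tested with Python 3.10" in the README
python --version
# capture exact package versions, e.g. pandas==1.5.2, into requirements.txt
pip freeze > requirements.txt
```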

Those are some prime examples of the mistakes I want us to be able to avoid.


**Step 5. The data**
* It is critical to make sure that you know what data you are working with.
* Ideally, it should be in [BIDS](https://bids-specification.readthedocs.io/en/stable/) format on the [`data.neuro`](https://intranet.neuro.polymtl.ca/data/git-datasets.html) storage node: `data.neuro:datasets/<PROJECT_DATASET>`.
@kousu (Member), Dec 8, 2022:

Ditto with the relative links. Also I'd shorten the BIDS link to be safer against link rot -- their domain will change less frequently than their subfolders, hopefully.

Suggested change
* Ideally, it should be in [BIDS](https://bids-specification.readthedocs.io/en/stable/) format on the [`data.neuro`](https://intranet.neuro.polymtl.ca/data/git-datasets.html) storage node: `data.neuro:datasets/<PROJECT_DATASET>`.
Ideally, it should be in [BIDS](https://bids-specification.readthedocs.io) format. We have many of these on the private [`data`](data/git-datasets.md) server.

@kousu (Member), Dec 9, 2022:

We should also think about the workflow for combining datasets. In principle it's easy to include multiple datasets -- each of them is just a different folder, and we can do that with git submodules or almost any other tool. We should think about how to leave the door open to this and avoid tricking people into thinking that there's a 1-to-1 mapping between analyses and datasets, while remembering that the 1-to-1 situation is the common case.

Comment on lines +8 to +15
**Step 1.**
* Make sure that your VPN connection is established or that you are connected to the Polytechnique wifi.

**Step 2.**
* Log in to one of the available [Neuropoly compute nodes](https://intranet.neuro.polymtl.ca/computing-resources/neuropoly/README.html):
```
ssh <POLYGRAMES_USERNAME>@<STATION>.neuro.polymtl.ca
```
@kousu (Member):

This duplicates the instructions at

### SSH (command line)
Once the VPN connection established, connect via ssh using the `STATION` you want:
```bash
ssh <POLYGRAMES_USERNAME>@<STATION>.neuro.polymtl.ca
```

I think those instructions could be clearer -- it would be nice if that page was broken up -- but in any case it's best not to duplicate the work.

Comment on lines +53 to +56
* However, in order to save space, make sure to "undownload" those big files once you are done working with them with:
```
git annex drop .
```
@kousu (Member):

This is a good tip, but I think it would fit better as a new subsection over in https://intranet.neuro.polymtl.ca/data/git-datasets.html#drop.

@kousu (Member) commented Dec 14, 2022:

By the way, today Jan and I spent several hours putting together a BIDS curation guide:

# Dataset curation
## Subject naming convention
**Basic convention**
sub-XXX
example:
- sub-001
- sub-002
**Multi-institution/Multi-pathology convention**
sub-<site><pathology>XXX
example:
- sub-montrealDCM001
- sub-torontoHC001
## BIDS template
⚠️ Every dataset must have the following files:
```
├── README.md
├── participants.json
├── participants.tsv
├── dataset_description.json
└── code
    └── curate.py
```
For details, see [BIDS specification](https://bids-specification.readthedocs.io/en/stable/03-modality-agnostic-files.html#code).
### `README.md`
The [`README.md`](https://bids-specification.readthedocs.io/en/stable/03-modality-agnostic-files.html#readme) is a [markdown](https://markdown-guide.readthedocs.io/en/latest/index.html) file describing the dataset in more detail.
Below is a template - modify it!!!
<details><summary>README.md template:</summary>
```
# <NAME OF DATASET>
This is an <MRI/Microscopy> dataset acquired in the context of the <XYZ> project.
<IF DATASET CONTAINS DERIVATIVES>It also contains <manual segmentation/labels> of <MS lesions/tumors/etc> from <one/two/or more> expert raters located under the derivatives folder.
## contact person
Dataset shared by: <NAME AND EMAIL>
<IF THERE WAS EMAIL COMM>Email communication: <DATE OF EMAIL AND SUBJECT>
<REPOSITORY OF PROJECT/MODEL, etc>Repository: https://github.com/<organization>/<repository_name>
## <IF DATA ARE MISSING FOR SOME SUBJECT(S)>missing data
<LIST HERE MISSING SUBJECTS>
```
</details>
### `participants.tsv`
The `participants.tsv` is a TSV file and should include at least the following columns:
| participant_id | age | pathology | source_id | institution |
| ----------- | ----------- | ----------- | ----------- | ----------- |
| sub-001 | 30 | HC | 001 | montreal |
| sub-002 | 40 | MS | 005 | montreal |
| sub-003 | n/a | MS | 007 | montreal |
- `participant_id` - unique participant ID
- `age` - participant age
- `pathology` - pathology name; can take values listed in the [pathology column](https://docs.google.com/spreadsheets/d/1yjcA8Z0COn4OZxusIDHjStH2DpeXvscsj-aWE2X-_sg/edit?usp=sharing)
- `source_id` - subject ID used to identify subject in the unprocessed data
- `institution` - human-friendly institution name
- others - if available, include also demographic characteristics such as `age`, `sex`, `height`, `weight`, `researcher`, and [additional columns](https://bids-specification.readthedocs.io/en/stable/03-modality-agnostic-files.html#participants-file).
❗️Indicate missing values with `n/a` (for "not available"), not by empty cells!
### `participants.json`
The `participants.json` is a JSON file describing the column names in the `participants.tsv` and properties of their values.
<details><summary>participants.json template:</summary>
```json
{
  "participant_id": {
    "Description": "Unique Participant ID",
    "LongName": "Participant ID"
  },
  "age": {
    "Description": "Participant age",
    "LongName": "Participant age"
  },
  "pathology": {
    "Description": "Pathology",
    "LongName": "Pathology name"
  },
  "source_id": {
    "Description": "Subject ID in the unprocessed data",
    "LongName": "Subject ID in the unprocessed data"
  },
  "institution": {
    "Description": "Institution ID after conversion to BIDS",
    "LongName": "BIDS Institution ID"
  }
}
```
</details>
### `dataset_description.json`
The [`dataset_description.json`](https://bids-specification.readthedocs.io/en/stable/03-modality-agnostic-files.html#dataset_descriptionjson) is a JSON file describing the dataset.
❗The `dataset_description.json` file within the top-level dataset should include `"DatasetType": "raw"`.
<details><summary>dataset_description.json template:</summary>
```json
{
  "BIDSVersion": "BIDS X.Y.Z",
  "Name": "<dataset_name>",
  "DatasetType": "raw"
}
```
</details>
### `code/`
The data curation script(s) should be placed inside the BIDS datasets, under the `code/` folder. For more convenience, you can create a PR with a curation script that you are working on, so that others can give feedback; once the script is validated, you can simply close the PR and delete the branch without merging.
## derivatives structure
The [`derivatives`](https://bids-specification.readthedocs.io/en/stable/05-derivatives/01-introduction.html) are files generated from the top-level dataset such as segmentations or labels.
Convention for derivatives JSON metadata:
```json
{
  "Author": "Firstname Lastname",
  "Date": "YYYY-MM-DD HH:MM:SS"
}
```
NOTE: "Date" is optional. We usually include it when running the manual correction via python scripts.
❗The `derivatives` must include its own `dataset_description.json` file (with `"DatasetType": "derivative"`).
## Changelog policy
We will use `git log` rather than the [CHANGES](https://bids-specification.readthedocs.io/en/stable/03-modality-agnostic-files.html#changes) BIDS file.
Good commit message examples:
```
Add manual segmentations
```
or
```
Add new subjects from <email_address>
```

We could add a link to this to the workflow. Some projects start by curating their own data, some just use pre-curated data, and others involve generating derivative datasets (which also need to be curated).

There are two other documents I'm involved with that duplicate this work:

None of these are finished, or even necessarily correct.

I would like to get all steps of all our workflows -- collection, curation, analysis, reproduction -- down in our wiki to offer as a gold standard.

@kousu (Member) commented Dec 14, 2022:

Also I've left a blank spot in

```{note}
Analysis scripts should not be kept here. Keep them in separate repositories, usually in public on GitHub, with instructions about. See [PIPELINE-DOC](TODO-PIPELINE-DOC).
```

to link up this doc when it's ready!
