Merge pull request #225 from nextstrain/nextclade

Add Nextclade dataset workflow in `/nextclade` dir
nextstrain · Dec 8, 2023 · 0ff5ce8
2 parents af4da1d + 27fd43e
commit 0ff5ce8

Showing 65 changed files with 8,202 additions and 9 deletions.
diff --git a/.gitignore b/.gitignore
@@ -8,6 +8,7 @@ outbreak_data/
 logs/
 staging/
 benchmarks/
+bin/
 
 # Keep ingest/data directory, but ignore all files within it
 !ingest/data/

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -1,5 +1,15 @@
 exclude: '\.(tsv|fasta|gb)$|^ingest/vendored/'
 repos:
+  # Disabled for now due to bug:
+  # https://github.com/snakemake/snakefmt/issues/207
+  # - repo: https://github.com/snakemake/snakefmt
+  #   rev: v0.8.5
+  #   hooks:
+  #     - id: snakefmt
+  - repo: https://github.com/rhysd/actionlint
+    rev: v1.6.26
+    hooks:
+      - id: actionlint
   - repo: https://github.com/codespell-project/codespell
     rev: v2.2.6
     hooks:
@@ -36,15 +46,7 @@ repos:
     rev: v0.0.1
     hooks:
       - id: sync-pre-commit-deps
-  - repo: https://github.com/rhysd/actionlint
-    rev: v1.6.26
-    hooks:
-      - id: actionlint
   - repo: https://github.com/shellcheck-py/shellcheck-py
     rev: v0.9.0.6
     hooks:
       - id: shellcheck
-  - repo: https://github.com/snakemake/snakefmt
-    rev: v0.8.5
-    hooks:
-      - id: snakefmt
diff --git a/README.md b/README.md
@@ -2,10 +2,11 @@
 
 [![pre-commit.ci status](https://results.pre-commit.ci/badge/github/nextstrain/mpox/master.svg)](https://results.pre-commit.ci/latest/github/nextstrain/mpox/master)
 
-This repository contains two workflows for the analysis of mpox virus (MPXV) data:
+This repository contains three workflows for the analysis of mpox virus (MPXV) data:
 
 - [`ingest/`](./ingest) - Download data from GenBank, clean and curate it and upload it to S3
 - [`phylogenetic/`](./phylogenetic) - Make phylogenetic trees for nextstrain.org
+- [`nextclade/`](./nextclade) - Make Nextclade datasets for nextstrain/nextclade_data
 
 Each folder contains a README.md with more information.
 

diff --git a/nextclade/.gitignore b/nextclade/.gitignore
@@ -0,0 +1,4 @@
+/test_output/
+/datasets/
+/bin/
+/test/
diff --git a/nextclade/README.md b/nextclade/README.md
@@ -0,0 +1,62 @@
+# Nextclade reference tree workflow for monkeypox
+
+This README doesn't end up in the datasets, so it's a developer README, rather than a dataset user README.
+
+## Usage
+
+```bash
+snakemake
+```
+
+You need to have a `nextclade3` binary in your path. It's in the `nextstrain/docker-base` image or you can get it from <https://github.com/nextstrain/nextclade/releases/tag/3.0.0-alpha.0>.
+
+### Visualize results
+
+View results with:
+
+```bash
+nextstrain view auspice/
+```
+
+## Maintenance
+
+### Updating for new clades
+
+- [ ] Update each `config/{build}/clades.tsv` with new clades
+- [ ] Add new clades to color ordering
+- [ ] Check that clades look good, exclude problematic sequences as necessary
+
+### Creating a new dataset version
+
+- [ ] Edit CHANGELOG.md
+- [ ] Switch to the `nextclade_data` repo (datasets live under `data/nextstrain/mpox`)
+- [ ] Create branch there, copy datasets, commit, push, open PR:
+
+```bash
+cd ../../nextclade_data
+git checkout master
+git pull
+git checkout -b mpox-update
+cp -r ../monkeypox/nextclade/datasets/ data/nextstrain/mpox
+git add data/nextstrain/mpox
+git commit -m "Update mpox dataset"
+git push -u origin mpox-update
+gh pr create
+```
+
+## Configuration
+
+Builds differ in their paths; the relevant configs are pulled in through lookup.
+
+## Installation
+
+Follow the [standard installation instructions](https://docs.nextstrain.org/en/latest/install.html) for Nextstrain's suite of software tools.
+
+## Data use
+
+We gratefully acknowledge the authors, originating and submitting laboratories of the genetic
+sequences and metadata for sharing their work. Please note that although data generators have
+generously shared data in an open fashion, that does not mean there should be free license to
+publish on this data. Data generators should be cited where possible and collaborations should be
+sought in some circumstances. Please try to avoid scooping someone else's work. Reach out if
+uncertain.