Commit
Merge pull request #225 from nextstrain/nextclade
Add Nextclade dataset workflow in `/nextclade` dir
corneliusroemer authored Dec 8, 2023
2 parents af4da1d + 27fd43e commit 0ff5ce8
Showing 65 changed files with 8,202 additions and 9 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -8,6 +8,7 @@ outbreak_data/
 logs/
 staging/
 benchmarks/
+bin/

 # Keep ingest/data directory, but ignore all files within it
 !ingest/data/
18 changes: 10 additions & 8 deletions .pre-commit-config.yaml
@@ -1,5 +1,15 @@
 exclude: '\.(tsv|fasta|gb)$|^ingest/vendored/'
 repos:
+  # Disabled for now due to bug:
+  # https://github.com/snakemake/snakefmt/issues/207
+  # - repo: https://github.com/snakemake/snakefmt
+  #   rev: v0.8.5
+  #   hooks:
+  #     - id: snakefmt
+  - repo: https://github.com/rhysd/actionlint
+    rev: v1.6.26
+    hooks:
+      - id: actionlint
   - repo: https://github.com/codespell-project/codespell
     rev: v2.2.6
     hooks:
@@ -36,15 +46,7 @@ repos:
     rev: v0.0.1
     hooks:
       - id: sync-pre-commit-deps
-  - repo: https://github.com/rhysd/actionlint
-    rev: v1.6.26
-    hooks:
-      - id: actionlint
   - repo: https://github.com/shellcheck-py/shellcheck-py
     rev: v0.9.0.6
     hooks:
       - id: shellcheck
-  - repo: https://github.com/snakemake/snakefmt
-    rev: v0.8.5
-    hooks:
-      - id: snakefmt
3 changes: 2 additions & 1 deletion README.md
@@ -2,10 +2,11 @@

 [![pre-commit.ci status](https://results.pre-commit.ci/badge/github/nextstrain/mpox/master.svg)](https://results.pre-commit.ci/latest/github/nextstrain/mpox/master)

-This repository contains two workflows for the analysis of mpox virus (MPXV) data:
+This repository contains three workflows for the analysis of mpox virus (MPXV) data:

 - [`ingest/`](./ingest) - Download data from GenBank, clean and curate it and upload it to S3
 - [`phylogenetic/`](./phylogenetic) - Make phylogenetic trees for nextstrain.org
+- [`nextclade/`](./nextclade) - Make Nextclade datasets for nextstrain/nextclade_data

 Each folder contains a README.md with more information.

4 changes: 4 additions & 0 deletions nextclade/.gitignore
@@ -0,0 +1,4 @@
/test_output/
/datasets/
/bin/
/test/
62 changes: 62 additions & 0 deletions nextclade/README.md
@@ -0,0 +1,62 @@
# Nextclade reference tree workflow for monkeypox

This README is not included in the generated datasets, so it is written for developers of this workflow rather than for dataset users.

## Usage

```bash
snakemake
```

You need a `nextclade3` binary on your `PATH`. It is included in the `nextstrain/docker-base` image, or you can download it from <https://github.com/nextstrain/nextclade/releases/tag/3.0.0-alpha.0>.
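
A small preflight check can confirm this prerequisite before the workflow starts. The following is only a sketch: `require_binary` is a hypothetical helper name, not something this repository provides, and it assumes nothing beyond the binary being called `nextclade3` as stated above.

```bash
# Sketch of a preflight check: report whether a binary is reachable via PATH.
# `require_binary` is a hypothetical helper, not part of this workflow.
require_binary() {
    local name="$1"
    if command -v "$name" >/dev/null 2>&1; then
        # Print where the binary was found so the wrong copy is easy to spot.
        echo "found: $name -> $(command -v "$name")"
    else
        echo "missing: $name is not on PATH" >&2
        return 1
    fi
}

# Usage: require_binary nextclade3 && snakemake
```

Chaining with `&&` means `snakemake` starts only if the check succeeds, so a missing binary is reported up front rather than partway through a run.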

### Visualize results

View results with:

```bash
nextstrain view auspice/
```

## Maintenance

### Updating for new clades

- [ ] Update each `config/{build}/clades.tsv` with new clades
- [ ] Add new clades to color ordering
- [ ] Check that clades look good, exclude problematic sequences as necessary

### Creating a new dataset version

- [ ] Edit CHANGELOG.md
- [ ] Switch to the `nextclade_data` repo (the mpox datasets live under `data/nextstrain/mpox`)
- [ ] Create branch there, copy datasets, commit, push, open PR:

```bash
cd ../../nextclade_data
git checkout master
git pull
git checkout -b mpox-update
cp -r ../monkeypox/nextclade/datasets/ data/nextstrain/mpox
git add data/nextstrain/mpox
git commit -m "Update mpox dataset"
git push -u origin mpox-update
gh pr create
```
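
Before running the `cp` step above, it can help to see which files would actually change. The sketch below is an optional aid, not part of the documented procedure: `preview_changes` is a hypothetical helper, and the paths in the usage line are simply the ones used in the commands above.

```bash
# Sketch: list files that differ between the freshly built datasets and the
# copy already present in nextclade_data.
# `preview_changes` is a hypothetical helper, not part of this workflow.
preview_changes() {
    local built="$1" published="$2"
    # diff exits non-zero when files differ (or a path is missing); only the
    # listing is wanted here, so the exit status is deliberately ignored.
    diff -rq "$built" "$published" || true
}

# Usage (run from the nextclade_data checkout):
#   preview_changes ../monkeypox/nextclade/datasets data/nextstrain/mpox
```

An empty listing means the built datasets already match what is in `nextclade_data`, so there is nothing new to release.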

## Configuration

Builds differ in their paths; the relevant configs for each build are pulled in through lookup.

## Installation

Follow the [standard installation instructions](https://docs.nextstrain.org/en/latest/install.html) for Nextstrain's suite of software tools.

## Data use

We gratefully acknowledge the authors, originating and submitting laboratories of the genetic
sequences and metadata for sharing their work. Please note that although data generators have
generously shared data in an open fashion, that does not mean there should be free license to
publish on this data. Data generators should be cited where possible and collaborations should be
sought in some circumstances. Please try to avoid scooping someone else's work. Reach out if
uncertain.