Merge pull request #192 from nextstrain/mpox-2024-04
Update mpox datasets to include new lineages B.1.21, B.1.22, and C.1.1
corneliusroemer authored Apr 17, 2024
2 parents fb4a856 + 6016124 commit 83de556
Showing 31 changed files with 2,015 additions and 9 deletions.
5 changes: 5 additions & 0 deletions data/nextstrain/mpox/all-clades/CHANGELOG.md
@@ -1,3 +1,8 @@
## Unreleased

- New hMPXV-1 lineages B.1.21, B.1.22, and C.1.1 are now included in the dataset. For more information on these lineages, see the [hMPXV-1 lineage definitions PR](https://github.com/mpxv-lineages/lineage-designation/pull/37).
- The sequences used in the reference trees have been updated to include the latest sequences available in GenBank as of 2024-04-16.

## 2024-01-16T20:31:02Z

Initial release of this dataset. This dataset is similar to the v2 dataset [`MPXV/ancestral`](https://github.com/nextstrain/nextclade_data/tree/2023-08-17--15-51-24--UTC/data/datasets/MPXV/references/ancestral/versions/2023-08-01T12%3A00%3A00Z/files) with some differences.
2 changes: 1 addition & 1 deletion data/nextstrain/mpox/all-clades/tree.json

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions data/nextstrain/mpox/clade-iib/CHANGELOG.md
@@ -1,3 +1,8 @@
## Unreleased

- New hMPXV-1 lineages B.1.21, B.1.22, and C.1.1 are now included in the dataset. For more information on these lineages, see the [hMPXV-1 lineage definitions PR](https://github.com/mpxv-lineages/lineage-designation/pull/37).
- The sequences used in the reference trees have been updated to include the latest sequences available in GenBank as of 2024-04-16.

## 2024-01-16T20:31:02Z

Initial release of this dataset. This dataset is similar to the v2 dataset [`hMPXV/NC_063383.1`](https://github.com/nextstrain/nextclade_data/tree/2023-08-17--15-51-24--UTC/data/datasets/hMPXV/references/NC_063383.1/versions/2023-08-01T12%3A00%3A00Z/files) with some differences.
2 changes: 1 addition & 1 deletion data/nextstrain/mpox/clade-iib/tree.json

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions data/nextstrain/mpox/lineage-b.1/CHANGELOG.md
@@ -1,3 +1,8 @@
## Unreleased

- New hMPXV-1 lineages B.1.21, B.1.22, and C.1.1 are now included in the dataset. For more information on these lineages, see the [hMPXV-1 lineage definitions PR](https://github.com/mpxv-lineages/lineage-designation/pull/37).
- The sequences used in the reference trees have been updated to include the latest sequences available in GenBank as of 2024-04-16.

## 2024-01-16T20:31:02Z

Initial release of this dataset. This dataset is similar to the v2 dataset [`hMPXV_B.1/pseudo_ON563414`](https://github.com/nextstrain/nextclade_data/tree/2023-08-17--15-51-24--UTC/data/datasets/hMPXV_B.1/references/pseudo_ON563414/versions/2023-08-01T12%3A00%3A00Z/files) with some differences.
2 changes: 1 addition & 1 deletion data/nextstrain/mpox/lineage-b.1/tree.json

Large diffs are not rendered by default.

30 changes: 24 additions & 6 deletions data_output/index.json
@@ -1357,6 +1357,13 @@
]
},
"versions": [
{
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
},
{
"updatedAt": "2024-01-16T20:31:02Z",
"tag": "2024-01-16--20-31-02Z",
@@ -1367,8 +1374,7 @@
}
],
"version": {
"updatedAt": "2024-01-16T20:31:02Z",
"tag": "2024-01-16--20-31-02Z",
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
@@ -1406,6 +1412,13 @@
]
},
"versions": [
{
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
},
{
"updatedAt": "2024-01-16T20:31:02Z",
"tag": "2024-01-16--20-31-02Z",
@@ -1416,8 +1429,7 @@
}
],
"version": {
"updatedAt": "2024-01-16T20:31:02Z",
"tag": "2024-01-16--20-31-02Z",
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
@@ -1458,6 +1470,13 @@
]
},
"versions": [
{
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
},
{
"updatedAt": "2024-01-16T20:31:02Z",
"tag": "2024-01-16--20-31-02Z",
@@ -1468,8 +1487,7 @@
}
],
"version": {
"updatedAt": "2024-01-16T20:31:02Z",
"tag": "2024-01-16--20-31-02Z",
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
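The `index.json` hunks above add an `unreleased` entry (with no `updatedAt`) ahead of the dated release in each `versions` list and point `version` at it. The following is a minimal sketch of how a consumer might tell the two kinds of entries apart, assuming only the fields visible in the diff. The function names `tag_from_updated_at` and `latest_released` are hypothetical and not part of Nextclade, and the sample entries are abbreviated.

```python
from typing import Optional


def tag_from_updated_at(updated_at: str) -> str:
    """Derive a release tag from an `updatedAt` timestamp.

    Mirrors the pairing visible in the diff:
    "2024-01-16T20:31:02Z" <-> "2024-01-16--20-31-02Z".
    """
    date, time = updated_at.split("T", 1)
    return f"{date}--{time.replace(':', '-')}"


def latest_released(versions: list) -> Optional[dict]:
    """Return the newest dated entry, skipping the `unreleased` placeholder."""
    released = [
        v for v in versions
        if "updatedAt" in v and v.get("tag") != "unreleased"
    ]
    # UTC timestamps in this fixed ISO 8601 format sort correctly as strings.
    return max(released, key=lambda v: v["updatedAt"], default=None)


# Abbreviated sample built from the fields shown in the diff.
versions = [
    {
        "tag": "unreleased",
        "compatibility": {"cli": "3.0.0-alpha.0", "web": "3.0.0-alpha.0"},
    },
    {"updatedAt": "2024-01-16T20:31:02Z", "tag": "2024-01-16--20-31-02Z"},
]

newest = latest_released(versions)
print(newest["tag"] if newest else "no dated release")
```

A consumer that wants only tagged releases could filter this way rather than relying on the top-level `version` field, which after this commit points at `unreleased`.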
18 changes: 18 additions & 0 deletions data_output/nextstrain/mpox/all-clades/unreleased/CHANGELOG.md
@@ -0,0 +1,18 @@
## Unreleased

- New hMPXV-1 lineages B.1.21, B.1.22, and C.1.1 are now included in the dataset. For more information on these lineages, see the [hMPXV-1 lineage definitions PR](https://github.com/mpxv-lineages/lineage-designation/pull/37).
- The sequences used in the reference trees have been updated to include the latest sequences available in GenBank as of 2024-04-16.

## 2024-01-16T20:31:02Z

Initial release of this dataset. This dataset is similar to the v2 dataset [`MPXV/ancestral`](https://github.com/nextstrain/nextclade_data/tree/2023-08-17--15-51-24--UTC/data/datasets/MPXV/references/ancestral/versions/2023-08-01T12%3A00%3A00Z/files) with some differences.

### New and changed gene names

Some genes have been renamed and one has been added. The new annotation follows the NCBI RefSeq annotation released in November 2022, which the v2 dataset predates:

- The 4 genes in the inverted terminal repeat (ITR) segments at both ends of the genome (OPG001, OPG002, OPG003, OPG015) are now all included. The copies at the 3' end (approximately positions 190000-197000) have a `_dup` suffix appended to distinguish them.
- The gene previously named `NBT03_gp052` is now called `OPG073`
- The gene previously named `NBT03_gp174` is now called `OPG016`
- The gene previously named `NBT03_gp175` is now called `OPG015_dup`
- Gene `OPG166` has been added
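
For downstream scripts that still refer to the v2 gene names, the renames listed above can be captured in a small lookup table. This is a minimal sketch: `V2_TO_CURRENT_GENE_NAMES`, `rename_gene` and `is_itr_duplicate` are hypothetical helpers, not part of Nextclade, and only the three renames listed above are covered.

```python
# Renames taken from the list above; any other name passes through unchanged.
V2_TO_CURRENT_GENE_NAMES = {
    "NBT03_gp052": "OPG073",
    "NBT03_gp174": "OPG016",
    "NBT03_gp175": "OPG015_dup",
}


def rename_gene(name: str) -> str:
    """Map a v2 gene name to its current name, leaving unknown names as-is."""
    return V2_TO_CURRENT_GENE_NAMES.get(name, name)


def is_itr_duplicate(name: str) -> bool:
    """True for the 3'-end ITR copies, which carry the `_dup` suffix."""
    return name.endswith("_dup")


for old in ("NBT03_gp052", "NBT03_gp175", "OPG166"):
    new = rename_gene(old)
    print(old, "->", new, "(ITR duplicate)" if is_itr_duplicate(new) else "")
```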
27 changes: 27 additions & 0 deletions data_output/nextstrain/mpox/all-clades/unreleased/README.md
@@ -0,0 +1,27 @@
# Nextclade dataset for "Mpox virus (All Clades)"

| Key | Value |
| ---------------------- | --------------------------------------------------------------------------------------------------------------------- |
| authors | [Cornelius Roemer](https://neherlab.org), [Richard Neher](https://neherlab.org), [Nextstrain](https://nextstrain.org) |
| data source | GenBank |
| workflow | [github.com/nextstrain/mpox/nextclade](https://github.com/nextstrain/mpox/nextclade) |
| nextclade dataset path | nextstrain/mpox/all-clades |
| annotation | [NC_063383.1](https://www.ncbi.nlm.nih.gov/nuccore/NC_063383) |
| clade definitions | [github.com/mpxv-lineages/lineage-designation](https://github.com/mpxv-lineages/lineage-designation) |
| related datasets | Mpox virus (Clade IIb): `nextstrain/mpox/clade-iib`<br> Mpox virus (Lineage B.1): `nextstrain/mpox/lineage-b.1` |

## Scope of this dataset

This dataset covers Mpox viruses of all clades (I, IIa and IIb). For a focused analysis of clade IIb sequences, consider the more specific "Clade IIb" dataset (`nextstrain/mpox/clade-iib`). For 2022-2023 outbreak sequences (lineage B.1 and its sublineages), consider the most specific dataset, "Lineage B.1" (`nextstrain/mpox/lineage-b.1`).
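
The choice between the three datasets can be expressed as a small helper. This is a sketch only: `choose_dataset` is a hypothetical function, not part of Nextclade, and it assumes clade and lineage labels are written as in this README. Lineages that do not literally start with `B.1` (for example `C.1.1`) are not matched by the prefix check and fall through to the clade-based choice.

```python
from typing import Optional

ALL_CLADES = "nextstrain/mpox/all-clades"
CLADE_IIB = "nextstrain/mpox/clade-iib"
LINEAGE_B1 = "nextstrain/mpox/lineage-b.1"


def choose_dataset(clade: Optional[str] = None,
                   lineage: Optional[str] = None) -> str:
    """Pick the most specific dataset path for what is known about the samples."""
    # Lineage B.1 and its dotted sublineages (B.1.x) get the narrowest dataset.
    if lineage is not None and (lineage == "B.1" or lineage.startswith("B.1.")):
        return LINEAGE_B1
    if clade == "IIb":
        return CLADE_IIB
    # Clades I and IIa, or samples of unknown clade, use the broad dataset.
    return ALL_CLADES


print(choose_dataset(clade="IIb", lineage="B.1.21"))
print(choose_dataset(clade="I"))
```

Assuming Nextclade CLI v3 syntax, the returned path can then be passed to `nextclade dataset get --name <path> --output-dir <dir>`.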

## Reference sequence and reference tree

The reference used in this dataset is the clade IIb NCBI RefSeq `NC_063383.1` (isolate `MPXV-M5312_HM12_Rivers`).

Sequences for the reference tree come from NCBI GenBank and are downsampled to around 500 sequences that span the diversity of clades, lineages, countries and collection dates.
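
One generic way to downsample while preserving diversity is to group records and draw from the groups in turn. The sketch below illustrates that idea only. It is not the workflow's actual procedure, and the record fields and the `downsample` function are assumptions made for illustration.

```python
import random
from collections import defaultdict


def downsample(records, group_keys, target=500, seed=0):
    """Round-robin over groups so that rare groups are represented
    before common groups are sampled a second time."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec.get(k) for k in group_keys)].append(rec)
    for members in groups.values():
        rng.shuffle(members)

    picked = []
    while len(picked) < target and groups:
        # Iterate over a snapshot because exhausted groups are deleted.
        for key in list(groups):
            picked.append(groups[key].pop())
            if not groups[key]:
                del groups[key]
            if len(picked) == target:
                break
    return picked


# Hypothetical metadata records: ten from clade IIb, one from clade I.
records = [{"clade": "IIb", "country": "A"} for _ in range(10)]
records.append({"clade": "I", "country": "B"})
subset = downsample(records, ["clade"], target=2)
print(sorted(r["clade"] for r in subset))
```

With a target of 2, the single clade I record is kept alongside one clade IIb record, whereas uniform random sampling would most likely drop it.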

## Further reading

The lineage system used is described in [Happi et al. (2022)](https://doi.org/10.1371/journal.pbio.3001769). Lineage definitions are available at [github.com/mpxv-lineages/lineage-designation](https://github.com/mpxv-lineages/lineage-designation).

Read more about Nextclade datasets in the Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html