edit msa doc file
JeanMainguy committed Dec 19, 2023
1 parent c527a83 commit f791622
Showing 1 changed file (docs/user/MSA.md) with 21 additions and 7 deletions.
@@ -1,29 +1,43 @@
# Multiple Sequence Alignment

This command is available from version 1.1.103 onward.
It is used to call [mafft](https://mafft.cbrc.jp/alignment/software/) with default options to compute MSA of any partition of the pangenome. Using multiple CPUs is recommended as it is quite demanding in computational resources.
The `msa` command computes multiple sequence alignments (MSA) for any partition of the pangenome. The command uses [mafft](https://mafft.cbrc.jp/alignment/software/) with default options to perform the alignment. Using multiple CPUs is recommended, as multiple alignment can be quite demanding in computational resources.

This command can be used as follows:

```bash
ppanggolin msa -p pangenome.h5
```

By default, it writes the strict 'core' (genes that are present in absolutely all genomes) and removes any duplicated genes. Beware, however, that if you have many genomes (over 1000), the core will likely be very small, or even empty if the genomes are fragmented.
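
To see how small the core turned out to be, one option is to count the alignment files after a run. The sketch below is only an illustration: the helper name `count_alignments` is hypothetical, and it assumes that each family is written as a separate file ending in `.aln`, which should be checked against your own output directory.

```bash
# Count the alignment files found directly inside a directory.
# Assumption (not confirmed by this page): one file per family,
# named with a ".aln" suffix.
count_alignments() {
    msa_dir="$1"
    n=0
    for aln in "$msa_dir"/*.aln; do
        # An unmatched glob stays literal, so only count real files.
        if [ -e "$aln" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Calling `count_alignments` on the directory where the MSA files were written prints the number of families that were aligned; a result of `0` suggests an empty core.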

It will write one MSA for each family. You can then provide the directory where the MSAs are written to a tool such as [IQ-TREE](https://github.com/Cibiv/IQ-TREE) to do phylogenetic analysis.
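
As a rough illustration of the IQ-TREE step, the sketch below runs one tree inference per family alignment. It is not taken from the PPanGGOLiN documentation: the helper name `run_iqtree_on_dir` and the `IQTREE_BIN` variable are hypothetical, the `.aln` suffix is an assumption about the output file names, and `iqtree2` is assumed to be the IQ-TREE executable on your system. `-s` is the IQ-TREE option that names the input alignment.

```bash
# Run IQ-TREE once per alignment file in a directory.
# Assumptions: alignments end in ".aln"; the executable is "iqtree2"
# unless the (hypothetical) IQTREE_BIN variable says otherwise.
run_iqtree_on_dir() {
    msa_dir="$1"
    iqtree_bin="${IQTREE_BIN:-iqtree2}"
    for aln in "$msa_dir"/*.aln; do
        # Skip the literal pattern when no alignment is present.
        if [ -e "$aln" ]; then
            # -s gives IQ-TREE the alignment to build a tree from.
            "$iqtree_bin" -s "$aln"
        fi
    done
}
```

Running `run_iqtree_on_dir` on the MSA directory would produce one set of IQ-TREE output files next to each alignment. Keeping the executable name in a variable makes it easy to switch between IQ-TREE versions.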

### partitions
### Modify the partition with `--partition`

Check warning on line 15 in docs/user/MSA.md (GitHub Actions / build): Non-consecutive header level increase; H1 to H3 [myst.header]

You can change which partition is written by using the `--partition` option.
For example, `ppanggolin msa -p pangenome.h5 --partition persistent` will compute MSA for all the persistent gene families.

Supported partitions are core, persistent, shell, cloud, softcore, accessory. If you wish to have additional filters, you can raise an issue with your request, or write a PR directly; most possibilities should be quite straightforward to add.
For example, the following command will compute MSA for all the persistent gene families:

```bash
ppanggolin msa -p pangenome.h5 --partition persistent
```

Supported partitions are `core`, `persistent`, `shell`, `cloud`, `softcore`, `accessory`. If you wish to have additional filters, you can raise an issue in the [issue tracker](https://github.com/labgem/PPanGGOLiN/issues) with your request, or write a PR directly (see [here](../dev/contribute.md) for instructions on how to contribute); most possibilities should be quite straightforward to add.

### source

Check warning on line 27 in docs/user/MSA.md (GitHub Actions / build): Non-consecutive header level increase; H1 to H3 [myst.header]

You can specify whether to use DNA or protein sequences for the MSA by using `--source`. It uses protein sequences by default.

`ppanggolin msa -p pangenome.h5 --source dna`
```bash
ppanggolin msa -p pangenome.h5 --source dna
```

### phylo

Check warning on line 35 in docs/user/MSA.md (GitHub Actions / build): Non-consecutive header level increase; H1 to H3 [myst.header]

It is also possible to write a single whole-genome MSA file, which many phylogenetic software tools accept as input, by using the `--phylo` option as follows:

`ppanggolin msa -p pangenome.h5 --phylo`
```bash
ppanggolin msa -p pangenome.h5 --phylo
```

This will concatenate all of the family MSAs into a single MSA, with one sequence for each genome.
