Commit

Merge pull request #257 from fchapoton/fix_typos_in_docs
fix some typos in docs
JeanMainguy authored Aug 2, 2024
2 parents b5ab07a + 52e1456 commit 8651498
Showing 11 changed files with 15 additions and 16 deletions.
6 changes: 3 additions & 3 deletions docs/user/MSA.md
@@ -1,6 +1,6 @@
# Multiple Sequence Alignment

-The commande `msa` compute multiple sequence alignement of any partition of the pangenome. The command uses [mafft](https://mafft.cbrc.jp/alignment/software/) with default options to perform the alignment. Using multiple cpus with the `--cpu` argument is recommended as multiple alignment can be quite demanding in computational resources.
+The commande `msa` compute multiple sequence alignment of any partition of the pangenome. The command uses [mafft](https://mafft.cbrc.jp/alignment/software/) with default options to perform the alignment. Using multiple cpus with the `--cpu` argument is recommended as multiple alignment can be quite demanding in computational resources.

This command can be used as follow:

@@ -34,10 +34,10 @@ ppanggolin msa -p pangenome.h5 --source dna

### Write a single whole MSA file with `--phylo`

-It is also possible to write a single whole genome MSA file, which many phylogenetic softwares accept as input, by using the `--phylo` option as such:
+It is also possible to write a single whole genome MSA file, which many phylogenetic software accept as input, by using the `--phylo` option as such:

```bash
ppanggolin msa -p pangenome.h5 --phylo
```

-This will contatenate all of the family MSA into a single MSA, with one sequence for each genome.
+This will concatenate all of the family MSA into a single MSA, with one sequence for each genome.
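
To make the idea of concatenation concrete, here is a minimal Python sketch of joining per-family alignments into one sequence per genome. It is only an illustration of the concept, not PPanGGOLiN's implementation: the `concatenate_family_msas` helper, the genome names, and the choice to pad a genome that lacks a family with gap characters are all assumptions made for the example.

```python
from typing import Dict, List


def concatenate_family_msas(
    family_msas: List[Dict[str, str]], genomes: List[str]
) -> Dict[str, str]:
    """Concatenate per-family alignments into one aligned sequence per genome.

    Each alignment maps a genome name to its aligned sequence for one family.
    """
    parts: Dict[str, List[str]] = {genome: [] for genome in genomes}
    for msa in family_msas:
        if not msa:
            continue
        lengths = {len(sequence) for sequence in msa.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must share the same length")
        width = lengths.pop()
        for genome in genomes:
            # Assumption: a genome without this family contributes gaps only,
            # so every concatenated sequence ends up with the same length.
            parts[genome].append(msa.get(genome, "-" * width))
    return {genome: "".join(chunks) for genome, chunks in parts.items()}


# Two hypothetical family alignments over three hypothetical genomes.
family_msas = [
    {"genome_A": "ATG-C", "genome_B": "ATGGC"},
    {"genome_A": "TTA", "genome_C": "TTG"},
]
result = concatenate_family_msas(family_msas, ["genome_A", "genome_B", "genome_C"])
for genome, sequence in result.items():
    print(f">{genome}\n{sequence}")
```

Every genome ends up with a sequence of the same total length, which is what phylogenetic tools generally expect from a whole-genome alignment.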
3 changes: 1 addition & 2 deletions docs/user/Modules/moduleOutputs.md
@@ -112,7 +112,7 @@ Modules:
Number_of_modules: 380
Families_in_Modules: 2242
Partition_composition:
-Persitent: 0.27
+Persistent: 0.27
Shell: 37.69
Cloud: 62.04
Number_of_Families_per_Modules:
@@ -122,4 +122,3 @@ Modules:
mean: 5.9

```

4 changes: 2 additions & 2 deletions docs/user/Modules/modulePrediction.md
@@ -59,12 +59,12 @@ ppanggolin panmodule --fasta GENOME_LIST_FILE
```
Replace `GENOME_LIST_FILE` with a tab-separated file listing the genome names, and the fasta file path of their genomic sequences as described [here](../PangenomeAnalyses/pangenomeAnnotation.md#annotate-from-fasta-files). Alternatively, you can provide a list of GFF/GBFF files as input by using the `--anno` parameter, similar to how it is used in the workflow and annotate commands.

-The panmodule workflow predicts modules using default parameters. To fine-tune the detection, you can use the `module` command on a partioned pangenome acquired through the `workflow` for example or use a configuration file, as described [here](../practicalInformation.md#configuration-file).
+The panmodule workflow predicts modules using default parameters. To fine-tune the detection, you can use the `module` command on a partitioned pangenome acquired through the `workflow` for example or use a configuration file, as described [here](../practicalInformation.md#configuration-file).


## Predict conserved module

-The `module` command predicts conserved modules on an partioned pangenome. The command has several options for tuning the prediction. Details about each parameter are available in the related [preprint](https://www.biorxiv.org/content/10.1101/2021.12.06.471380v1).
+The `module` command predicts conserved modules on an partitioned pangenome. The command has several options for tuning the prediction. Details about each parameter are available in the related [preprint](https://www.biorxiv.org/content/10.1101/2021.12.06.471380v1).

The command can be used simply as such:

2 changes: 1 addition & 1 deletion docs/user/PangenomeAnalyses/pangenomeCluster.md
@@ -141,7 +141,7 @@ Family_C Gene_6 Gene_6
```{mermaid}
---
-title: "Pangenome gene families when specifing representative gene"
+title: "Pangenome gene families when specifying representative gene"
align: center
---
2 changes: 1 addition & 1 deletion docs/user/PangenomeAnalyses/pangenomeGraphOut.md
@@ -3,7 +3,7 @@
The pangneome graph can be given through the `.gexf` and through the `_light.gexf` files. The `_light.gexf` file will contain the gene families as nodes and the edges between gene families describing their relationship, and the `.gexf` file will contain the same things but also include more details about each gene and each relation between gene families.
We have made two different files representing the same graph because, while the non-light file is exhaustive, it can be very heavy to manipulate and most of its content is not of interest to everyone. The `_light.gexf` file should be the one you use to manipulate the pangenome graph most of the time.

-These files can be manipulated and visualized for example through a software called [Gephi](https://gephi.org/), with which we have made extensive testings, or potentially any other softwares or libraries able to read gexf files such as [networkx](https://networkx.github.io/documentation/stable/index.html) or [gexf-js](https://github.com/raphv/gexf-js) among others. Gephi also have a web version able to open small pangenome graphs [gephi-lite](https://gephi.org/gephi-lite/).
+These files can be manipulated and visualized for example through a software called [Gephi](https://gephi.org/), with which we have made extensive testings, or potentially any other software or libraries able to read gexf files such as [networkx](https://networkx.github.io/documentation/stable/index.html) or [gexf-js](https://github.com/raphv/gexf-js) among others. Gephi also have a web version able to open small pangenome graphs [gephi-lite](https://gephi.org/gephi-lite/).

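As a rough illustration of reading such a file outside Gephi, the sketch below lists the nodes and edges of a GEXF document using only the Python standard library. The `summarize_gexf` helper and the tiny sample graph are made up for the example, and real PPanGGOLiN files carry many more attributes than shown here. With networkx installed, `networkx.read_gexf` would be the more usual route.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Drop the XML namespace so the code does not depend on the GEXF version."""
    return tag.rsplit("}", 1)[-1]


def summarize_gexf(source):
    """Return ({node id: label}, [(source id, target id), ...]) for a GEXF file.

    `source` may be a file path or a binary file object.
    """
    root = ET.parse(source).getroot()
    nodes = {}
    edges = []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "node":
            nodes[element.get("id")] = element.get("label")
        elif name == "edge":
            edges.append((element.get("source"), element.get("target")))
    return nodes, edges


# Hypothetical three-family graph standing in for a `_light.gexf` file.
SAMPLE = b"""<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="undirected">
    <nodes>
      <node id="fam1" label="Family_A"/>
      <node id="fam2" label="Family_B"/>
      <node id="fam3" label="Family_C"/>
    </nodes>
    <edges>
      <edge id="0" source="fam1" target="fam2"/>
      <edge id="1" source="fam2" target="fam3"/>
    </edges>
  </graph>
</gexf>"""

nodes, edges = summarize_gexf(io.BytesIO(SAMPLE))

# Count how many edges touch each gene family.
degree = Counter()
for source_id, target_id in edges:
    degree[source_id] += 1
    degree[target_id] += 1

for node_id, label in nodes.items():
    print(label, degree[node_id])
```

Passing a file path instead of the in-memory sample works the same way, since `ET.parse` accepts both.
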
Using Gephi, the layout can be tuned as illustrated below:

2 changes: 1 addition & 1 deletion docs/user/QuickUsage/quickWorkflow.md
@@ -101,7 +101,7 @@ genome_updater.sh -d "refseq" -o "B_japonicum_genomes" -M "gtdb" -T "s__Bradyrh
```


-After the completion of the `all` command, all of your genomes have had their genes predicted, the genes have been clustered into gene families, a pangenome graph has been successfully constructed and partitioned into three distinct paritions: **persistent**, **shell**, and **cloud**. Additionally, **RGP, spots, and modules** have been detected within your pangenome.
+After the completion of the `all` command, all of your genomes have had their genes predicted, the genes have been clustered into gene families, a pangenome graph has been successfully constructed and partitioned into three distinct partitions: **persistent**, **shell**, and **cloud**. Additionally, **RGP, spots, and modules** have been detected within your pangenome.

The results of the workflow is saved in the **pangenome.h5** file, which is in the HDF-5 file format.
When you run an analysis using this file as input, the results of that analysis will be added to the file to supplement the data that are already stored in it.
2 changes: 1 addition & 1 deletion docs/user/RGP/rgpClustering.md
@@ -14,7 +14,7 @@ There are three modes available for calculating the GRR value: `min_grr`, `max_g
- `incomplete_aware_grr` (default) mode: If at least one RGP is considered incomplete, which typically happens when it is located at the border of a contig, the `min_grr` mode is used. Otherwise, the `max_grr` mode is applied. This mode is useful to correctly cluster incomplete RGP.

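The mode selection described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PPanGGOLiN's code: the definitions of `min_grr` and `max_grr` are not visible in this excerpt, so the sketch assumes that `min_grr` divides the number of shared gene families by the family count of the smaller RGP and `max_grr` by that of the larger one. The `grr` function and its parameters are hypothetical names.

```python
def grr(
    families_a: set,
    families_b: set,
    mode: str = "incomplete_aware_grr",
    a_incomplete: bool = False,
    b_incomplete: bool = False,
) -> float:
    """Gene repertoire relatedness between two RGPs, given their gene families."""
    if not families_a or not families_b:
        raise ValueError("each RGP needs at least one gene family")

    shared = len(families_a & families_b)
    smaller = min(len(families_a), len(families_b))
    larger = max(len(families_a), len(families_b))

    if mode == "incomplete_aware_grr":
        # An incomplete RGP (e.g. at a contig border) falls back to min_grr.
        mode = "min_grr" if (a_incomplete or b_incomplete) else "max_grr"

    if mode == "min_grr":
        # Assumed definition: shared families over the smaller RGP.
        return shared / smaller
    if mode == "max_grr":
        # Assumed definition: shared families over the larger RGP.
        return shared / larger
    raise ValueError(f"unknown GRR mode: {mode}")


# Hypothetical RGPs: rgp_2 is a truncated copy of rgp_1.
rgp_1 = {"fam1", "fam2", "fam3", "fam4"}
rgp_2 = {"fam1", "fam2"}

print(grr(rgp_1, rgp_2, b_incomplete=True))
print(grr(rgp_1, rgp_2))
```

Under these assumptions, flagging the shorter RGP as incomplete scores the pair against the smaller repertoire, so a fragment at a contig border is not penalized for the families it is missing.
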

-The resulting RGP clusters are stored in a tsv file with the folowing columns:
+The resulting RGP clusters are stored in a tsv file with the following columns:

| column | description |
|---------|------------------------------|
2 changes: 1 addition & 1 deletion docs/user/RGP/rgpPrediction.md
@@ -68,7 +68,7 @@ ppanggolin panrgp --fasta genomes.fasta.list
```

Just like [workflow](../PangenomeAnalyses/pangenomeAnalyses.md#workflow), this command will deal with the [annotation](../PangenomeAnalyses/pangenomeAnalyses.md#annotation), [clustering](../PangenomeAnalyses/pangenomeAnalyses.md#compute-pangenome-gene-families), [graph](../PangenomeAnalyses/pangenomeAnalyses.md#graph) and [partition](../PangenomeAnalyses/pangenomeAnalyses.md#partition) commands by itself.
-Then, the RGP detection is ran using [rgp](#rgp-detection) after the pangenome partitionning. Once all RGP have been computed, those found in similar genomic contexts in the genomes are gathered into spots of insertion using [spot](#spot-prediction).
+Then, the RGP detection is ran using [rgp](#rgp-detection) after the pangenome partitioning. Once all RGP have been computed, those found in similar genomic contexts in the genomes are gathered into spots of insertion using [spot](#spot-prediction).

If you want to tune the rgp detection, you can use the `rgp` command after the `workflow` command. If you wish to tune the spot detection, you can use the `spot` command after the `rgp` command. Additionally, you have the option to utilize a configuration file to customize each detection within the `panrgp` command.

2 changes: 1 addition & 1 deletion docs/user/align.md
@@ -24,7 +24,7 @@ By default the command creates two output files:

### 2. 'input_to_pangenome_associations.blast-tab'

-'input_to_pangenome_associations.blast-tab' is a .tsv file that follows the tabular blast format which many alignment softwares (such as blast, diamond, mmseqs etc.) use, with two additional columns: the length of query sequence which was aligned, and the length of the subject sequence which was aligned (provided with qlen and slen with the softwares I previously named). You can find a detailed description of the format in [this blog post](https://www.metagenomics.wiki/tools/blast/blastn-output-format-6) for example (and there are many other descriptions of this format on internet, if you search for 'tabular blast format'). The query are the provided sequences, and the subjet are the pangenome gene families.
+'input_to_pangenome_associations.blast-tab' is a .tsv file that follows the tabular blast format which many alignment software (such as blast, diamond, mmseqs etc.) use, with two additional columns: the length of query sequence which was aligned, and the length of the subject sequence which was aligned (provided with qlen and slen with the software I previously named). You can find a detailed description of the format in [this blog post](https://www.metagenomics.wiki/tools/blast/blastn-output-format-6) for example (and there are many other descriptions of this format on internet, if you search for 'tabular blast format'). The query are the provided sequences, and the subject are the pangenome gene families.

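A file in this format can be read with a few lines of Python. The sketch below assumes the twelve standard tabular blast columns followed by `qlen` and `slen`, in that order, with no header line; the column order of the two extra fields, the `read_blast_tab` and `query_coverage` helpers, and the sample hit are assumptions for illustration.

```python
import csv
import io

# Twelve standard tabular blast columns, plus the two assumed extra ones.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
    "qlen", "slen",
]
INT_COLUMNS = {
    "length", "mismatch", "gapopen", "qstart", "qend",
    "sstart", "send", "qlen", "slen",
}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def read_blast_tab(handle):
    """Parse tab-separated hits into a list of dicts with typed values."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        for name in INT_COLUMNS:
            hit[name] = int(hit[name])
        for name in FLOAT_COLUMNS:
            hit[name] = float(hit[name])
        hits.append(hit)
    return hits


def query_coverage(hit) -> float:
    """Fraction of the query sequence covered by the alignment."""
    aligned = abs(hit["qend"] - hit["qstart"]) + 1
    return aligned / hit["qlen"]


# One hypothetical hit: an input sequence aligned to a pangenome gene family.
sample = "query_1\tFamily_A\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t350.0\t250\t300\n"
hits = read_blast_tab(io.StringIO(sample))
for hit in hits:
    print(hit["qseqid"], hit["sseqid"], round(query_coverage(hit), 2))
```

For a real file, open it with `open(path, newline="")` and pass the handle to `read_blast_tab`.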

### 3. Optional outputs
2 changes: 1 addition & 1 deletion docs/user/practicalInformation.md
@@ -52,7 +52,7 @@ If you want, verbosity can be reduced in several ways.
First, you can specify the verbosity level with the `--verbose` option.
With `0` will show only warnings and errors, `1` will add the information (default value), and if you encounter any problem you can use the debug level with value `2`.
Then you can also remove the progress bars with the option `--disable_prog_bar`
-Finaly, you can also save PPanGGOLiN logs in a file by indicating its path with the option `--log`.
+Finally, you can also save PPanGGOLiN logs in a file by indicating its path with the option `--log`.

## Configuration file

4 changes: 2 additions & 2 deletions docs/user/projection.md
@@ -58,13 +58,13 @@ For Gene Family and Partition of Input Genes:
For RGPs and Spots:

- `plastic_regions.tsv`: This file contains information about RGPs within the input genome. Its format follows [this output](RGP/rgpOutputs.md#rgp-outputs).
-- `input_genome_rgp_to_spot.tsv`: It provides information about the association between RGPs and insertion spots in the input genome. Its format follows [this ouput](RGP/rgpOutputs.md#summarize-spots).
+- `input_genome_rgp_to_spot.tsv`: It provides information about the association between RGPs and insertion spots in the input genome. Its format follows [this output](RGP/rgpOutputs.md#summarize-spots).

Optionally, you can generate a graph of the spots using the `--spot_graph` option. This graph resembles the one produced by the `ppanggolin draw --spots` command, which is detailed [here](RGP/rgpOutputs.md#draw-spots).

For Modules:

-- `modules_in_input_genome.tsv`: This file lists the modules that have been found in the input genome. Its format follows [this ouput](Modules/moduleOutputs.md#module-outputs).
+- `modules_in_input_genome.tsv`: This file lists the modules that have been found in the input genome. Its format follows [this output](Modules/moduleOutputs.md#module-outputs).


