fix duplicated section and header size

labgem · Jan 9, 2024 · 540174e
1 parent ff80fb1
Showing 1 changed file with 5 additions and 20 deletions.
diff --git a/docs/user/PangenomeAnalyses/pangenomeStat.md b/docs/user/PangenomeAnalyses/pangenomeStat.md
@@ -81,23 +81,8 @@ The fragmentation value denotes the proportion of families containing fragmented
 
 
 
-#### Mean Persistent Duplication
-
-The `mean_persistent_duplication.tsv` is a tab-separated file that lists the gene families and their duplication ratio, their mean presence in the pangenome and whether it is considered a 'single copy marker'. A gene family is considered duplicated it is found in single copy in less than 5% of the genomes by default. This threshold can be adjusted with the parameter `--dup_margin`. And the value that has been used to generated this file is specififed as a comment line strating with a '#'. This notion of single copy marker is used to compute a contamination value for each genome in the [genome statistics table](#genome-statistics-table) described previously, where the contamination is the proportion of single copy marker found in multicopy in a specified genome. 
-
-
 
-
-To generate this file, use the following `write_pangenome` subcommand:
-
-```bash
-ppanggolin write_pangenome -p pangenome.h5 --stats
-```
-
-Executing this command will also create the `genomes_statistics.tsv` file, detailed in the section labeled [here](#mean-persistent-duplication).
-
-
-### Mean Persistent Duplication
+#### Mean Persistent Duplication
 
 The `mean_persistent_duplication.tsv` file lists the gene families along with their duplication ratios, average presence in the pangenome, and classification as 'single copy markers.' In this context, a gene family is not considered in single copy if it appears in single copy in less than 5% of the genomes by default. This default threshold can be adjusted using the `--dup_margin` parameter. The chosen threshold value for generating this file is indicated within a comment line starting with a '#'.
 
@@ -134,7 +119,7 @@ The flag `--stats` will also generate the `genomes_statistics.tsv` file desdcrib
 
 
 (gene-presence-absence)=
-### Gene Presence-Absence Matrix
+#### Gene Presence-Absence Matrix
 
 The `gene_presence_absence.Rtab` file represents a presence-absence matrix wherein columns are the genomes used to construct the pangenome, and rows correspond to gene families. Each gene family is identified by the identifier of their representative gene.
 
@@ -147,7 +132,7 @@ ppanggolin write_pangenome -p pangenome.h5 --Rtab
 ```
 
 
-### Matrix File
+#### Matrix File
 The `matrix.csv` file, formatted as a .csv file, follows a structure similar to the `gene_presence_absence.csv` file generated by [Roary](https://sanger-pathogens.github.io/Roary/). This file format is compatible with [Scoary](https://github.com/AdmiralenOla/Scoary) for performing pangenome-wide association studies.
 
 To generate this file, use the `write_pangenome` subcommand with the `--csv` flag:
@@ -158,7 +143,7 @@ ppanggolin write_pangenome -p pangenome.h5 --csv
 
 
 
-### Partitions Files
+#### Partitions Files
 
 The 'Partitions' files are stored within the `partitions` directory and are named after the specific partition they represent (e.g., 'persistent.txt' for the persistent partition). Each file contains a list of gene family identifiers corresponding to the gene families belonging to that particular partition. The format consists of one family identifier per line, facilitating their usage in downstream analysis workflows.
 
@@ -167,7 +152,7 @@ To generate these files, use the `write_pangenome` subcommand with the `--partit
 `ppanggolin write_pangenome -p pangenome.h5 --partitions`
 
 
-### Gene Families to Genes Associations
+#### Gene Families to Genes Associations
 
 The `gene_families.tsv` file mirrors the format provided by [MMseqs2](https://github.com/soedinglab/MMseqs2) through its `createtsv` subcommand. This file structure comprises three columns: the gene family name in the first column, the gene names in the second, and a third column that remains empty or contains an "F" to denote potential gene fragments instead of complete genes. This indication appears only if the [defragmentation](./pangenomeCluster.md#defragmentation) pipeline has been used.