
Genotyping Tools

Robert J. Gifford edited this page Oct 24, 2024 · 42 revisions

Overview

For DENV sequences whose serotype is known, or has been determined using the denvSerotypeRecogniser module, Dengue-GLUE provides modules that perform maximum likelihood-based clade assignment, with a separate module for each DENV serotype.

Classification follows recently established DENV lineage nomenclature.

Maximum Likelihood Clade Assignment (MLCA) in Dengue-GLUE

Dengue-GLUE uses a genotyping method called Maximum Likelihood Clade Assignment (MLCA) to assign DENV sequences to genotypes and lineages.

MLCA is based on the Evolutionary Placement Algorithm (EPA), a feature of the highly optimized RAxML software. RAxML is typically used to infer complete phylogenetic trees from multiple sequence alignments, whereas EPA places new sequences onto an existing reference tree without recalculating the entire phylogeny. Avoiding a full tree search for every query makes EPA well suited to virus sequence clade assignment, and it forms the foundation of the MLCA method integrated into GLUE.

In GLUE, the MLCA process is implemented using the maxLikelihoodGenotyper and maxLikelihoodPlacer modules.

Example Usage in Dengue-GLUE

The genotyping process in Dengue-GLUE can be executed through the command-line interface. Below is an example of using the MLCA genotyping module for DENV2:

Mode path: /project/dengue
GLUE> module denv2MaxLikelihoodGenotyper genotype file -f sources/fasta-testseqs-denv2/EU482760.1_2005.fas 

This command processes the sequences in the specified FASTA file and reports, for each sequence, the clade assigned at each level of the classification: serotype, genotype, major lineage, minor lineage, and minor sublineage.

+=================+====================+====================+=========================+=========================+============================+
|    queryName    | serotypeFinalClade | genotypeFinalClade | major_lineageFinalClade | minor_lineageFinalClade | minor_sublineageFinalClade |
+=================+====================+====================+=========================+=========================+============================+
| EU482760.1_2005 | AL_DENV_2          | AL_DENV_2III       | AL_DENV_2III_D          | AL_DENV_2III_D1         | AL_DENV_2III_D1.3          |
+=================+====================+====================+=========================+=========================+============================+

In this example, the DENV2 sequence EU482760.1_2005 is assigned to the lineage 2III_D1.3.
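The output table has one column per level of the clade hierarchy, and the short lineage name quoted above is the deepest assigned clade with its `AL_DENV_` prefix removed. The helper below is a hypothetical illustration, not part of GLUE, of reading that deepest assignment from one output row. The column names are copied from the example output; it assumes that levels without an assignment are empty or missing.

```python
from __future__ import annotations

# Hypothetical helper (not part of GLUE): find the deepest assigned clade
# in one row of the genotyping output shown above.

# Column names copied from the example output, ordered coarsest to finest.
LEVELS = [
    "serotypeFinalClade",
    "genotypeFinalClade",
    "major_lineageFinalClade",
    "minor_lineageFinalClade",
    "minor_sublineageFinalClade",
]

PREFIX = "AL_DENV_"  # prefix seen on every clade name in the example output


def deepest_clade(row: dict[str, str]) -> tuple[str, str] | None:
    """Return (column, short_name) for the finest level that has a clade.

    Assumes unassigned levels are missing or empty strings.
    """
    for level in reversed(LEVELS):
        clade = (row.get(level) or "").strip()
        if clade:
            short = clade[len(PREFIX):] if clade.startswith(PREFIX) else clade
            return level, short
    return None


row = {
    "queryName": "EU482760.1_2005",
    "serotypeFinalClade": "AL_DENV_2",
    "genotypeFinalClade": "AL_DENV_2III",
    "major_lineageFinalClade": "AL_DENV_2III_D",
    "minor_lineageFinalClade": "AL_DENV_2III_D1",
    "minor_sublineageFinalClade": "AL_DENV_2III_D1.3",
}
print(deepest_clade(row))  # ('minor_sublineageFinalClade', '2III_D1.3')
```

If a sequence could only be resolved to genotype level, the same helper would return the genotype column instead, because the finer columns would be empty.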

The MLCA Algorithm

The MLCA algorithm operates in three stages: alignment, placement, and neighbor-weighting. Together, these stages assign each query sequence to predefined clades.

  1. Alignment Stage:
    The first step aligns the query sequences to a reference alignment of DENV sequences. This is done with MAFFT using the --add and --keeplength options: --add aligns the new sequences to the existing multiple sequence alignment, and --keeplength keeps the alignment length unchanged by discarding query positions that would be insertions relative to the reference, so the original alignment's columns are not altered. Each query sequence is aligned independently, so the alignment of one query cannot influence that of another.

  2. Placement Stage:
    In the placement stage, the extended alignment from the previous step is combined with a fixed reference tree. Using RAxML's EPA subsystem, the algorithm inserts each query sequence into candidate branches of the tree, optimizing the branch lengths around each insertion point, and scores every candidate placement by the likelihood of the resulting extended tree. A small set of the highest-likelihood placements is retained for further analysis.

  3. Neighbor-Weighting Stage:
    The final stage, neighbor-weighting, summarizes the placement results by calculating clade weightings for each query sequence. Based on the retained placements, the algorithm evaluates the evolutionary distance between the query sequence and its closest neighboring reference sequences. Because these neighbors are already assigned to specific clades, their proximity provides evidence for the query sequence's clade assignment: the closer the neighbor, the stronger the evidence. The query sequence is then assigned to a clade if that clade's weight exceeds a predefined threshold.

    These distances are measured along the phylogenetic tree, where shorter branch lengths indicate closer genetic relationships. Because only nearby reference sequences contribute, each query sequence is assigned to the clade of its closest relatives in the tree.
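The alignment stage can be sketched outside GLUE as one MAFFT call per query file. This is a minimal illustration, not GLUE's internal code: the function names and file names are hypothetical, and it assumes a `mafft` executable on the PATH. It uses only MAFFT's documented `mafft --add new_sequences --keeplength existing_alignment` form, which writes the extended alignment to standard output.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def mafft_add_command(query_fasta: str, reference_alignment: str) -> list[str]:
    """Build a MAFFT command that adds query sequences to a fixed alignment.

    --add aligns the sequences in query_fasta to reference_alignment;
    --keeplength keeps the alignment length unchanged, dropping query
    positions that would be insertions relative to the reference.
    """
    return ["mafft", "--add", query_fasta, "--keeplength", reference_alignment]


def align_query(query_fasta: str, reference_alignment: str, out_path: str) -> None:
    """Run MAFFT for one query file and save the extended alignment.

    Calling this once per single-sequence FASTA file keeps each query's
    alignment independent of the others. Requires mafft on the PATH.
    """
    result = subprocess.run(
        mafft_add_command(query_fasta, reference_alignment),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if MAFFT fails
    )
    Path(out_path).write_text(result.stdout)
```

For example, `align_query("query.fas", "denv2_reference.fas", "extended.fas")` (hypothetical paths) would write the reference alignment plus the aligned query to `extended.fas`.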
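EPA scores each candidate placement of a query by its log-likelihood, and relative support among the candidates is commonly expressed as likelihood weight ratios: each placement's likelihood divided by the sum over all candidates. The sketch below shows one way of retaining "a small set of high-likelihood placements" using those ratios. The branch identifiers and the cut-offs `max_placements` and `min_cumulative_weight` are hypothetical; they are not GLUE or RAxML settings.

```python
from __future__ import annotations

import math


def likelihood_weight_ratios(log_likelihoods: list[float]) -> list[float]:
    """Convert placement log-likelihoods into weights that sum to 1.

    Subtracting the maximum before exponentiating avoids underflow, since
    whole-alignment log-likelihoods are large negative numbers.
    """
    best = max(log_likelihoods)
    scaled = [math.exp(ll - best) for ll in log_likelihoods]
    total = sum(scaled)
    return [s / total for s in scaled]


def retain_placements(
    placements: dict[str, float],
    max_placements: int = 5,  # hypothetical cut-off
    min_cumulative_weight: float = 0.99,  # hypothetical cut-off
) -> list[tuple[str, float]]:
    """Keep the best placements until enough total weight is covered.

    `placements` maps a branch identifier to the log-likelihood of the
    reference tree with the query inserted on that branch.
    """
    branches = list(placements)
    weights = likelihood_weight_ratios([placements[b] for b in branches])
    ranked = sorted(zip(branches, weights), key=lambda bw: bw[1], reverse=True)
    kept: list[tuple[str, float]] = []
    cumulative = 0.0
    for branch, weight in ranked:
        if len(kept) >= max_placements or cumulative >= min_cumulative_weight:
            break
        kept.append((branch, weight))
        cumulative += weight
    return kept
```

With log-likelihoods of -1000, -1001 and -1010 on three branches, the first two placements carry almost all of the weight, so only those two are kept.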
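The neighbor-weighting idea can be illustrated with a simple inverse-distance scheme: each nearby reference sequence votes for its own clade, its vote shrinks as its tree distance from the placed query grows, and the query is assigned only if one clade's share of the votes reaches a threshold. The inverse-distance formula, the `threshold` value and the `pseudo_distance` term are assumptions for illustration; this page does not give the exact weighting formula used by GLUE's maxLikelihoodGenotyper.

```python
from __future__ import annotations

from collections import defaultdict


def clade_weights(
    neighbours: list[tuple[str, float]],
    pseudo_distance: float = 1e-6,  # hypothetical; avoids division by zero
) -> dict[str, float]:
    """Normalised clade weights from (clade, tree_distance) neighbour pairs.

    Each neighbour contributes 1 / (distance + pseudo_distance), so closer
    reference sequences count for more. Weights are scaled to sum to 1.
    """
    raw: dict[str, float] = defaultdict(float)
    for clade, distance in neighbours:
        raw[clade] += 1.0 / (distance + pseudo_distance)
    total = sum(raw.values())
    return {clade: w / total for clade, w in raw.items()}


def assign_clade(
    neighbours: list[tuple[str, float]],
    threshold: float = 0.8,  # hypothetical assignment threshold
) -> str | None:
    """Return the best-supported clade if its weight reaches the threshold."""
    if not neighbours:
        return None
    weights = clade_weights(neighbours)
    best = max(weights, key=weights.get)
    return best if weights[best] >= threshold else None
```

For instance, two clade A neighbours at distances 0.002 and 0.004 outweigh one clade B neighbour at 0.03, so the query goes to clade A; two equidistant neighbours from different clades leave it unassigned.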

Benefits of Using MLCA for DENV Genotyping

Integrating MLCA into Dengue-GLUE provides an accurate and efficient approach to DENV genotyping. Because EPA places each query onto a fixed reference tree instead of re-inferring the whole phylogeny, the method remains computationally efficient, making it well suited to large-scale sequence analysis in both research and clinical settings.
