Genotyping Tools

Robert J. Gifford edited this page Oct 4, 2024 · 42 revisions

BLAST-Based Serotype Recognition in Dengue-GLUE

Dengue-GLUE provides functionality for rapidly assigning DENV sequences to serotypes via the dengueSerotypeRecogniser module.

The dengueSerotypeRecogniser module is a custom implementation of GLUE's blastSequenceRecogniser module. This module uses BLAST and a set of category-associated reference sequences to assign sequences to categories. The config file for the dengueSerotypeRecogniser module can be found here.

To run the recogniser, reference the module by name and then give the module-specific command you wish to execute, as shown here:

Mode path: /project/dengue
GLUE> module dengueSerotypeRecogniser recognise 

Remember that you can use tab completion to see the options for the next command element. For example, pressing Tab after the recognise command lists its input options:

Mode path: /project/dengue
GLUE> module dengueSerotypeRecogniser recognise 
fasta-document    file              sequence

Below is an example of a complete command, in which the dengueSerotypeRecogniser module is used to perform recognition on a single specific sequence contained within the Dengue-GLUE project database. The sequence is referenced via a where clause and its unique sequenceID value:

GLUE> module dengueSerotypeRecogniser recognise sequence -w "sequenceID = 'FJ639811.1_2005'"

+======================================+============+===========+
|           querySequenceId            | categoryId | direction |
+======================================+============+===========+
| fasta-testseqs-denv1/FJ639811.1_2005 | 1          | FORWARD   |
+======================================+============+===========+

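Results are shown in tabular text format. The categoryId field gives the serotype: here the value is 1, consistent with the query coming from the fasta-testseqs-denv1 source. The direction field reports the orientation in which the query matched the reference sequences.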
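
If the tabular output needs to be processed further, it can be turned into records with a few lines of Python. The following is a minimal sketch, not part of GLUE: the function name parse_glue_table is hypothetical, and it assumes that border lines start with "+", data lines start with "|", and cell values never contain a "|" character.

```python
def parse_glue_table(text):
    """Parse GLUE-style tabular text into a list of dicts keyed by column name.

    Assumption: the first '|' line is the header and cells contain no '|'.
    """
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the +====+ border lines.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Output copied from the recognise example above.
sample = """
+======================================+============+===========+
|           querySequenceId            | categoryId | direction |
+======================================+============+===========+
| fasta-testseqs-denv1/FJ639811.1_2005 | 1          | FORWARD   |
+======================================+============+===========+
"""

results = parse_glue_table(sample)
for row in results:
    print(row["querySequenceId"], "-> serotype category", row["categoryId"])
```

The same function should also work on the wider genotyping tables shown later on this page, since they use the same border and cell layout.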

Maximum Likelihood Clade Assignment (MLCA) in Dengue-GLUE

Dengue-GLUE employs a robust genotyping method called Maximum Likelihood Clade Assignment (MLCA) to assign DENV sequences to genotypes and lineages.

MLCA is based on the Evolutionary Placement Algorithm (EPA), a feature of the highly optimized RAxML software. RAxML typically generates complete phylogenetic trees from multiple sequence alignments, but EPA allows for efficient clade assignment by placing new sequences onto an existing reference tree without recalculating the entire phylogeny. This efficiency makes EPA well-suited for virus sequence clade assignment, forming the foundation of the MLCA method integrated into GLUE.

Example Usage in Dengue-GLUE

The genotyping process in Dengue-GLUE can be executed through the command-line interface. Below is an example of using the MLCA genotyping module for DENV2:

Mode path: /project/dengue
GLUE> module denv2MaxLikelihoodGenotyper genotype file -f sources/fasta-testseqs-denv2/EU482760.1_2005.fas 

This command processes the sequences in the specified FASTA file and outputs the clade assigned at each level (serotype, genotype, major lineage, minor lineage, and minor sublineage) for each sequence:

+=================+====================+====================+=========================+=========================+============================+
|    queryName    | serotypeFinalClade | genotypeFinalClade | major_lineageFinalClade | minor_lineageFinalClade | minor_sublineageFinalClade |
+=================+====================+====================+=========================+=========================+============================+
| EU482760.1_2005 | AL_DENV_2          | AL_DENV_2III       | AL_DENV_2III_D          | AL_DENV_2III_D1         | AL_DENV_2III_D1.3          |
+=================+====================+====================+=========================+=========================+============================+

In this example, the DENV2 sequence EU482760.1_2005 is assigned to serotype AL_DENV_2, genotype AL_DENV_2III, major lineage AL_DENV_2III_D, minor lineage AL_DENV_2III_D1, and minor sublineage AL_DENV_2III_D1.3.

The MLCA Algorithm

The MLCA algorithm operates in three stages: alignment, placement, and neighbor-weighting. Each stage plays a crucial role in accurately assigning query sequences to predefined clades.

  1. Alignment Stage:
    The first step aligns the query sequences to a reference set of DENV sequences. This is done with the MAFFT software, using the --add and --keeplength options, which integrate query sequences into the existing multiple sequence alignment without altering the original alignment's structure. Each query sequence is aligned independently, so the alignment of one query does not depend on the other queries being processed.

  2. Placement Stage:
    In the placement stage, the extended alignment from the previous step is combined with a fixed reference tree. For each query sequence, the algorithm identifies potential placements on the tree that maximize the likelihood of the extended tree structure. Using RAxML's EPA subsystem, the algorithm inserts the query sequence at various points on the tree, optimizing the branch lengths and positions to find the most likely placements. A small set of high-likelihood placements is retained for further analysis.

  3. Neighbor-Weighting Stage:
    The final stage of the MLCA algorithm is neighbor-weighting, which summarizes the placement results by calculating clade weightings for each query sequence. The algorithm evaluates the evolutionary distance between the query sequence and its closest neighboring reference sequences. Since these neighbors are already assigned to specific clades, their proximity provides evidence for the query sequence's clade assignment. The closer the neighbor, the stronger the evidence. The algorithm then assigns the query sequence to the clade if the calculated weight exceeds a predefined threshold.

    This neighbor-weighting mechanism relies on the evolutionary distances in the phylogenetic tree, where shorter branch lengths indicate closer genetic relationships. By focusing on nearby reference sequences, the algorithm effectively assigns query sequences to the most appropriate clades based on genetic similarity.
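
The alignment and neighbor-weighting stages described above can be sketched in Python. This is an illustration under stated assumptions, not GLUE's implementation: the function names, the inverse-distance weighting, the 0.5 threshold, the example distances, and the OTHER_CLADE label are all hypothetical, since this page does not give the exact weighting formula or threshold. Only the MAFFT options --add and --keeplength come from the text.

```python
from collections import defaultdict

EPSILON = 1e-6  # Assumed guard against division by zero for identical sequences.


def mafft_add_command(query_fasta, reference_alignment):
    """Build a MAFFT argument list that adds queries to an existing alignment.

    MAFFT writes the extended alignment to standard output, so a caller
    would redirect or capture it.
    """
    return ["mafft", "--add", query_fasta, "--keeplength", reference_alignment]


def clade_weights(neighbors):
    """Turn (clade, distance) pairs into normalized per-clade weights.

    Assumption: each neighbor contributes 1/distance, so closer reference
    sequences count as stronger evidence for their clade.
    """
    raw = defaultdict(float)
    for clade, distance in neighbors:
        raw[clade] += 1.0 / (distance + EPSILON)
    total = sum(raw.values())
    if total == 0:
        return {}
    return {clade: weight / total for clade, weight in raw.items()}


def assign_clade(neighbors, threshold=0.5):
    """Return the best-supported clade, or None if its weight is not above threshold."""
    weights = clade_weights(neighbors)
    if not weights:
        return None
    best = max(weights, key=weights.get)
    return best if weights[best] > threshold else None


# Hypothetical neighbors of one query: (clade of reference, tree distance).
example = [
    ("AL_DENV_2III", 0.01),
    ("AL_DENV_2III", 0.02),
    ("OTHER_CLADE", 0.10),
]

print(mafft_add_command("query.fasta", "reference_alignment.fasta"))
print(clade_weights(example))
print(assign_clade(example))
```

With these made-up distances, the two close AL_DENV_2III neighbors dominate the weighting, so the query is assigned to that clade. When two clades are equally supported and neither exceeds the threshold, the sketch returns None rather than forcing an assignment.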

Benefits of Using MLCA for DENV Genotyping

Because EPA places each query onto a fixed reference tree rather than recalculating the entire phylogeny, MLCA in Dengue-GLUE combines a likelihood-based, phylogenetic basis for each assignment with computational efficiency. This makes the method well-suited for large-scale sequence analysis in both research and clinical settings.