Genotyping Tools

Robert J. Gifford edited this page Sep 23, 2024 · 42 revisions

BLAST-Based Serotype Recognition in Dengue-GLUE

Dengue-GLUE provides functionality for rapidly assigning DENV sequences to serotypes, using BLAST, via the dengueSerotypeRecogniser module.

The dengueSerotypeRecogniser module is a custom implementation of GLUE's blastSequenceRecogniser module. Its config file can be found here.

To run the recogniser, reference the module by name and then give the module-specific command you wish to execute, as shown here:

Mode path: /project/dengue
GLUE> module dengueSerotypeRecogniser recognise 

Tab completion shows the options for the next command element. For example, pressing Tab after the recognise command lists the available input options, as shown below:

Mode path: /project/dengue
GLUE> module dengueSerotypeRecogniser recognise 
fasta-document    file              sequence

Below is an example of a complete command, in which the dengueSerotypeRecogniser module is used to perform recognition on a specific sequence contained within the Dengue-GLUE project database. The sequence is selected via a where clause that matches its unique sequenceID value:

GLUE> module dengueSerotypeRecogniser recognise sequence -w "sequenceID = 'FJ639811.1_2005'"

+======================================+============+===========+
|           querySequenceId            | categoryId | direction |
+======================================+============+===========+
| fasta-testseqs-denv1/FJ639811.1_2005 | 1          | FORWARD   |
+======================================+============+===========+

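The result table contains one row per query sequence. The 'categoryId' field gives the serotype (1 in this example), and the 'direction' field reports the orientation of the match (FORWARD in this example).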
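If the table output shown above is captured as text (for example, copied from the console), it can be turned into structured records for downstream use. The following is a minimal Python sketch, not part of GLUE: the function name parse_glue_table is hypothetical, and it assumes the table layout shown above with no '|' characters inside cells.

```python
def parse_glue_table(text):
    """Parse a GLUE-style ASCII table into a list of row dictionaries."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the +====+ border lines.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            # The first '|' line holds the column names.
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Table text copied from the example above.
example_output = """
+======================================+============+===========+
|           querySequenceId            | categoryId | direction |
+======================================+============+===========+
| fasta-testseqs-denv1/FJ639811.1_2005 | 1          | FORWARD   |
+======================================+============+===========+
"""

results = parse_glue_table(example_output)
for row in results:
    print(row["querySequenceId"], "-> serotype", row["categoryId"])
```

All values are kept as strings, so the serotype is the string "1" rather than an integer.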

Phylogeny-Based Genotyping in Dengue-GLUE

Dengue-GLUE supports a rapid and accurate method for phylogeny-based genotyping called Maximum Likelihood Clade Assignment (MLCA). This approach allows users to assign genotypes to query sequences based on their evolutionary placement within a predefined reference phylogeny. The MLCA method can be applied to individual sequences or in batch mode for multiple query sequences.

Overview of the MLCA Protocol

The MLCA protocol works by optimally positioning query sequences within a reference phylogeny using maximum likelihood (ML) techniques. It compares the query sequence with reference sequences that represent known clade categories, allowing for accurate classification based on phylogenetic placement.

MLCA requires the following inputs:

  1. Query Sequence: The nucleotide sequence for which the clade assignment is being determined.
  2. Reference Phylogeny: A previously generated phylogenetic tree that contains at least one representative sequence for each clade category.
  3. Multiple Sequence Alignment: The alignment used to construct the reference phylogeny.
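The requirement that the reference phylogeny contains at least one representative per clade category can be checked before running MLCA. The following is a rough Python sketch under stated assumptions, not GLUE's own validation: the function names, the toy Newick tree, and the clade labels are all hypothetical, and the leaf-name extraction only handles unquoted Newick labels.

```python
import re


def newick_leaf_names(newick):
    """Return the set of leaf labels in a Newick string (unquoted labels only)."""
    # A leaf label directly follows "(" or ","; internal node labels follow ")".
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def missing_clades(newick, clade_of_reference):
    """Return clade categories that have no representative leaf in the tree."""
    leaves = newick_leaf_names(newick)
    represented = {
        clade for name, clade in clade_of_reference.items() if name in leaves
    }
    return sorted(set(clade_of_reference.values()) - represented)


# Hypothetical toy data: refD is assigned to clade_3 but is absent from the tree.
toy_tree = "((refA:0.1,refB:0.2):0.05,refC:0.3);"
toy_clades = {
    "refA": "clade_1",
    "refB": "clade_1",
    "refC": "clade_2",
    "refD": "clade_3",
}

print(missing_clades(toy_tree, toy_clades))  # prints ['clade_3']
```

An empty list means every clade category has at least one reference sequence in the tree.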

How MLCA Works

  1. Alignment: The query sequence is first aligned to the existing multiple sequence alignment using the MAFFT program. This ensures that the query sequence is aligned in the same way as the reference sequences.

  2. Placement in Phylogeny: The extended alignment, including the newly aligned query sequence, is then passed along with the reference phylogeny to RAxML's Evolutionary Placement Algorithm (EPA). This algorithm calculates the most likely placement of the query sequence within the fixed reference tree.

  3. Clade Assignment: The output phylogeny from RAxML is then processed by the GLUE engine. Based on the query sequence's position within the phylogeny and its evolutionary distance from reference sequences, GLUE assigns the sequence to a clade.
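To make the first two steps concrete, the sketch below builds the kind of command lines that could be used to run them outside GLUE. It is an illustration only: GLUE runs these tools internally and its actual invocations may differ. The file names, run name, and substitution model are hypothetical placeholders. The MAFFT options --add and --keeplength and the RAxML options -f v, -s, -t, -m and -n are standard options of those programs. The commands are printed rather than executed.

```python
import shlex


def build_mafft_add_command(reference_alignment, query_fasta):
    """Step 1: add query sequences to an existing alignment with MAFFT."""
    # --add aligns new sequences against an existing alignment;
    # --keeplength keeps the reference alignment's column count unchanged.
    # MAFFT writes the extended alignment to standard output.
    return ["mafft", "--add", query_fasta, "--keeplength", reference_alignment]


def build_raxml_epa_command(extended_alignment, reference_tree, run_name,
                            model="GTRGAMMA"):
    """Step 2: place the queries in a fixed reference tree with RAxML's EPA."""
    # -f v selects the Evolutionary Placement Algorithm; -s is the alignment,
    # -t the reference tree, -m the substitution model, -n the run name.
    return [
        "raxmlHPC", "-f", "v",
        "-s", extended_alignment,
        "-t", reference_tree,
        "-m", model,
        "-n", run_name,
    ]


# Hypothetical file names, for illustration only.
mafft_cmd = build_mafft_add_command("reference_alignment.fasta", "query.fasta")
epa_cmd = build_raxml_epa_command(
    "extended_alignment.fasta", "reference_tree.newick", "mlca_sketch"
)

print(shlex.join(mafft_cmd), "> extended_alignment.fasta")
print(shlex.join(epa_cmd))
```

Step 3, the clade assignment itself, is performed by the GLUE engine on the RAxML output and is not reproduced here.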

Batch Processing

MLCA can be applied to a single query sequence, but it can also process a batch of query sequences in a single run.

By integrating MAFFT and RAxML, MLCA in Dengue-GLUE automates the genotyping of dengue virus sequences and applies the same clade assignment procedure to every query.