Genotyping Tools
Dengue-GLUE provides functionality for rapidly assigning DENV sequences to serotypes via the dengueSerotypeRecogniser module.
The dengueSerotypeRecogniser module is a custom implementation of GLUE's blastSequenceRecogniser module. It uses BLAST and a set of category-associated reference sequences to assign sequences to categories. The config file for the dengueSerotypeRecogniser module can be found here.
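To illustrate the general idea behind BLAST-based recognition, here is a minimal sketch that is not the dengueSerotypeRecogniser's actual logic. It assumes BLAST hits are already available as hypothetical (query, reference, bitscore) tuples and that each reference sequence maps to exactly one category, then assigns each query the category of its highest-scoring hit. The real module is driven by its config file and also reports a direction, which this sketch does not model.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical hit format: (query_id, reference_id, bitscore).
Hit = Tuple[str, str, float]


def assign_categories(
    hits: Iterable[Hit],
    ref_to_category: Dict[str, str],
) -> Dict[str, str]:
    """Assign each query the category of its best-scoring categorised hit.

    Illustrative sketch only; not GLUE's blastSequenceRecogniser implementation.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for query_id, ref_id, bitscore in hits:
        category = ref_to_category.get(ref_id)
        if category is None:
            continue  # ignore hits to references that carry no category
        current = best.get(query_id)
        if current is None or bitscore > current[0]:
            best[query_id] = (bitscore, category)
    # Queries with no categorised hit are simply absent from the result.
    return {query_id: category for query_id, (_, category) in best.items()}


hits = [
    ("q1", "ref_denv1", 950.0),
    ("q1", "ref_denv2", 610.0),
    ("q2", "ref_denv3", 880.0),
]
refs = {"ref_denv1": "1", "ref_denv2": "2", "ref_denv3": "3"}
print(assign_categories(hits, refs))
```

Here `q1` is assigned category `1` because its hit to `ref_denv1` outscores its hit to `ref_denv2`.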
To run the recogniser module, reference the module by name, followed by the module-specific command you wish to execute:
Mode path: /project/dengue
GLUE> module dengueSerotypeRecogniser recognise
Remember that you can use tab completion to see the options for the next command element. For example, using tab completion after the recognise command lists its input options:
Mode path: /project/dengue
GLUE> module dengueSerotypeRecogniser recognise
fasta-document file sequence
Below is an example of a complete command, in which the dengueSerotypeRecogniser module performs recognition on a specific sequence stored in the Dengue-GLUE project database. The sequence is selected with a where clause on its unique sequenceID value:
GLUE> module dengueSerotypeRecogniser recognise sequence -w "sequenceID = 'FJ639811.1_2005'"
+======================================+============+===========+
| querySequenceId | categoryId | direction |
+======================================+============+===========+
| fasta-testseqs-denv1/FJ639811.1_2005 | 1 | FORWARD |
+======================================+============+===========+
Results are shown in tabular text format. The categoryId field gives the assigned serotype.
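If you capture this console output for further processing, a small parser can turn the table into records. This is a sketch that assumes the layout shown above (border lines starting with `+`, cells separated by `|`, and no `|` characters inside cell values); `parse_glue_table` is a hypothetical helper, not part of GLUE.

```python
from typing import Dict, List


def parse_glue_table(text: str) -> List[Dict[str, str]]:
    """Parse a GLUE-style ASCII table into a list of row dictionaries.

    Assumes the first '|' row is the header and cell values contain no '|'.
    """
    rows: List[List[str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +====+ border lines and any surrounding text
        # Drop the outer pipes, then split on the inner separators.
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(cells)
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


table = """\
+======================================+============+===========+
| querySequenceId | categoryId | direction |
+======================================+============+===========+
| fasta-testseqs-denv1/FJ639811.1_2005 | 1 | FORWARD |
+======================================+============+===========+
"""
for record in parse_glue_table(table):
    print(record["querySequenceId"], "-> serotype", record["categoryId"])
```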
Dengue-GLUE supports a rapid and accurate method for phylogeny-based genotyping called Maximum Likelihood Clade Assignment (MLCA). This approach allows users to assign genotypes to query sequences based on their evolutionary placement within a predefined reference phylogeny. The MLCA method can be applied to individual sequences or in batch mode for multiple query sequences.
The MLCA protocol works by optimally positioning query sequences within a reference phylogeny using maximum likelihood (ML) techniques. It compares the query sequence with reference sequences that represent known clade categories, allowing for accurate classification based on phylogenetic placement.
MLCA requires the following inputs:
- Query Sequence: The nucleotide sequence for which the clade assignment is being determined.
- Reference Phylogeny: A previously generated phylogenetic tree that contains at least one representative sequence for each clade category.
- Multiple Sequence Alignment: The alignment used to construct the reference phylogeny.
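Because MLCA depends on these three inputs agreeing with one another, it can help to sanity-check them before a run. The sketch below is hypothetical: it assumes you already have the tree's leaf names, the alignment's sequence IDs, and a mapping from reference sequence to clade, and it checks two conditions implied by the list above. Every tree leaf should appear in the alignment, and every clade should have at least one representative in the tree.

```python
from typing import Dict, Iterable, List


def check_mlca_inputs(
    tree_leaves: Iterable[str],
    alignment_ids: Iterable[str],
    clade_of: Dict[str, str],
    required_clades: Iterable[str],
) -> List[str]:
    """Return a list of problems; an empty list means the inputs look consistent.

    Hypothetical pre-flight check, not a GLUE command.
    """
    leaves = set(tree_leaves)
    aligned = set(alignment_ids)
    problems: List[str] = []

    # The reference tree was built from the alignment, so its leaves should all be there.
    missing = sorted(leaves - aligned)
    if missing:
        problems.append(f"tree leaves absent from alignment: {missing}")

    # Each clade category needs at least one representative leaf in the tree.
    represented = {clade_of[leaf] for leaf in leaves if leaf in clade_of}
    for clade in sorted(set(required_clades) - represented):
        problems.append(f"no reference sequence for clade {clade}")
    return problems


print(check_mlca_inputs(["A", "B", "C"], ["A", "B", "C"],
                        {"A": "I", "B": "I", "C": "II"}, ["I", "II", "III"]))
```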
MLCA then proceeds in three steps:
- Alignment: The query sequence is first aligned to the existing multiple sequence alignment using the MAFFT program, so that it is aligned in the same way as the reference sequences.
- Placement in Phylogeny: The extended alignment, now including the query sequence, is passed along with the reference phylogeny to RAxML's Evolutionary Placement Algorithm (EPA), which calculates the most likely placement of the query sequence within the fixed reference tree.
- Clade Assignment: The GLUE engine processes the output phylogeny from RAxML and assigns the query sequence to a clade based on its position within the phylogeny and its evolutionary distance from the reference sequences.
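The clade assignment step can be illustrated with a deliberately simplified rule: give the query the clade of the reference leaf closest to it by patristic (summed branch length) distance in the tree returned by the placement step. This is a swapped-in simplification for illustration, not GLUE's actual MLCA decision procedure. The tree is a hypothetical parent-pointer map (child to parent and branch length) rather than a parsed RAxML output file.

```python
from typing import Dict, Optional, Tuple

# Hypothetical tree encoding: child -> (parent, branch length); the root has no entry.
Tree = Dict[str, Tuple[str, float]]


def distances_up(tree: Tree, node: str) -> Dict[str, float]:
    """Map node and each of its ancestors to their distance from node."""
    dist = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        dist[parent] = total
        node = parent
    return dist


def patristic(tree: Tree, a: str, b: str) -> float:
    """Path length between leaves a and b via their lowest common ancestor."""
    up_a = distances_up(tree, a)
    total = 0.0
    node = b
    while node not in up_a:  # climb from b until we reach an ancestor of a
        parent, length = tree[node]
        total += length
        node = parent
    return total + up_a[node]


def nearest_reference_clade(tree: Tree, query: str, clade_of: Dict[str, str]) -> Optional[str]:
    """Simplified rule (not GLUE's): clade of the closest reference leaf."""
    best: Optional[Tuple[float, str]] = None
    for ref, clade in clade_of.items():
        d = patristic(tree, query, ref)
        if best is None or d < best[0]:
            best = (d, clade)
    return None if best is None else best[1]


tree: Tree = {
    "n1": ("root", 0.3), "n2": ("root", 0.3),
    "refA": ("n1", 0.1), "refB": ("n1", 0.2),
    "refC": ("n2", 0.1), "query": ("n2", 0.05),  # query placed next to refC
}
clade_of = {"refA": "I", "refB": "I", "refC": "II"}
print(nearest_reference_clade(tree, "query", clade_of))
```

A production approach would typically also weigh how confidently the query sits inside a clade, not just the single nearest leaf.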
While MLCA can be used for a single query sequence, it can also process a batch of sequences in one run, which makes it practical for genotyping many sequences at once.
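Batch queries are often supplied as a single multi-FASTA file. As a minimal sketch, assuming plain FASTA input, the function below splits such a file into individual records keyed by header line, which is handy for checking how many queries a batch contains before submitting it. `read_fasta` is a hypothetical helper, not a GLUE command.

```python
from typing import Dict, List, Optional


def read_fasta(text: str) -> Dict[str, str]:
    """Parse multi-FASTA text into {header: sequence}.

    Sequence lines are joined and upper-cased; a repeated header overwrites the earlier record.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


batch = ">q1 desc\nACGT\nacgt\n\n>q2\nTTGA\n"
queries = read_fasta(batch)
print(f"{len(queries)} query sequences in batch")
```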
By integrating with tools such as MAFFT and RAxML, MLCA in Dengue-GLUE provides a robust and automated solution for genotyping dengue virus sequences, ensuring accurate and consistent clade assignment.
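For readers who want to see the external tools on their own, the sketch below builds command lines corresponding to the alignment and placement steps. It is an assumption-laden illustration, not how GLUE invokes these programs internally: `mafft --add` (adding sequences to an existing alignment) and `raxmlHPC -f v` (RAxML's EPA mode) are real options, but `--keeplength`, the `GTRGAMMA` model, and all file and run names are assumed choices.

```python
import subprocess
from typing import List


def mafft_add_command(query_fasta: str, reference_alignment: str) -> List[str]:
    # --add aligns new sequences to an existing alignment;
    # --keeplength (an assumed setting) preserves the reference alignment's columns.
    return ["mafft", "--add", query_fasta, "--keeplength", reference_alignment]


def raxml_epa_command(extended_alignment: str, reference_tree: str, run_name: str) -> List[str]:
    # -f v selects RAxML's Evolutionary Placement Algorithm on a fixed tree;
    # the substitution model here is an assumption.
    return ["raxmlHPC", "-f", "v", "-s", extended_alignment,
            "-t", reference_tree, "-m", "GTRGAMMA", "-n", run_name]


def run_placement(query_fasta: str, reference_alignment: str,
                  reference_tree: str, run_name: str) -> None:
    """Run both steps; requires mafft and raxmlHPC on PATH (not called here)."""
    extended = f"{run_name}_extended.fasta"
    # MAFFT writes the extended alignment to standard output.
    with open(extended, "w") as out:
        subprocess.run(mafft_add_command(query_fasta, reference_alignment),
                       stdout=out, check=True)
    subprocess.run(raxml_epa_command(extended, reference_tree, run_name), check=True)


print(" ".join(mafft_add_command("queries.fasta", "reference_alignment.fasta")))
print(" ".join(raxml_epa_command("run1_extended.fasta", "reference.tree", "run1")))
```

Building the commands as argument lists rather than shell strings avoids quoting problems with file names; the GLUE engine then interprets the RAxML output to make the final clade call.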