Why Use GLUE?

Robert J. Gifford edited this page Sep 18, 2024 · 1 revision

GLUE projects are as well suited to exploratory work (e.g. using virus genome data to investigate structural and functional properties of viruses) as they are to operational procedures (e.g. producing standardised reports in a public or animal health setting).

Hosting GLUE projects in an online version control system (e.g. GitHub) provides a mechanism for their stable, collaborative development.

Lentivirus-GLUE offers a number of advantages for performing comparative sequence analysis of lentiviruses:

  1. Reproducibility.
    For many reasons, bioinformatics analyses are notoriously difficult to reproduce. The GLUE framework supports the implementation of fully reproducible comparative genomics through the introduction of data standards and the use of a relational database to capture the semantic links between data items.

  2. Reusable data objects and analysis logic.
    For many, if not most, comparative genomic analyses, data preparation is nine-tenths of the battle. The GLUE framework is designed so that the work of preparing high-value data items such as multiple sequence alignments need only be done once. Hosting GLUE projects in an online version control system such as GitHub allows collaborative management of important data items and community testing of hypotheses.

  3. Validation.
    Building GLUE projects entails mapping the semantic links between data items (e.g. sequences, tabular data, multiple sequence alignments). This process provides an opportunity for cross-validation, and thereby enforces a high level of data integrity.

  4. Standardisation of the genomic coordinate space.
    GLUE projects allow all sequences to utilise the coordinate space of a chosen reference sequence. Contingencies associated with insertions and deletions (indels) are handled in a systematic way.

  5. Predefined, fully annotated reference sequences.
    This project includes fully annotated reference sequences for major lineages within the genus Lentivirus.

  6. Alignment trees.
    GLUE allows alignments constructed at distinct taxonomic levels to be linked via an "alignment tree" data structure. In the alignment tree, each alignment is constrained to a standard reference sequence, so all multiple sequence alignments are linked to one another through a standardised coordinate system.
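
The ideas in points 4 and 6 can be illustrated with a small Python sketch. It is not GLUE code and does not use GLUE's command layer; the function names (`reference_coordinates`, `lift_to_parent`) and the toy sequences are hypothetical. It also assumes that the reference sequence of a child alignment appears as a member of its parent alignment, which is what lets coordinates be carried from one level of the tree to the next.

```python
from typing import List, Optional


def reference_coordinates(ref_row: str, member_row: str) -> List[Optional[int]]:
    """Map each nucleotide of an aligned member to a 1-based reference coordinate.

    Both rows come from the same pairwise alignment, with '-' marking gaps.
    Nucleotides inserted relative to the reference have no reference
    coordinate and are reported as None; deletions simply produce no entry.
    """
    if len(ref_row) != len(member_row):
        raise ValueError("aligned rows must have equal length")
    coords: List[Optional[int]] = []
    ref_pos = 0
    for ref_char, member_char in zip(ref_row, member_row):
        if ref_char != "-":
            ref_pos += 1  # advance along the reference
        if member_char == "-":
            continue  # deletion in the member: nothing to map
        coords.append(ref_pos if ref_char != "-" else None)
    return coords


def lift_to_parent(
    child_coords: List[Optional[int]],
    child_ref_in_parent: List[Optional[int]],
) -> List[Optional[int]]:
    """Translate child-reference coordinates into parent-reference coordinates.

    child_ref_in_parent is the output of reference_coordinates() for the
    child reference as a member of the parent alignment (an assumption of
    this sketch, see the text above).
    """
    return [
        None if c is None else child_ref_in_parent[c - 1]
        for c in child_coords
    ]


# Toy two-level tree (made-up sequences, not real lentivirus data).
# Parent alignment: constrained to the parent reference; the child
# reference is a member and lacks parent position 3.
parent_ref_row = "ACGTTA"
child_ref_row = "AC-TTA"

# Child alignment: constrained to the child reference (ungapped "ACTTA").
# The query has a deletion at child position 2 and an insertion after 3.
child_ref_self = "ACT-TA"
query_row = "A-TGTA"

query_in_child = reference_coordinates(child_ref_self, query_row)
child_in_parent = reference_coordinates(parent_ref_row, child_ref_row)
query_in_parent = lift_to_parent(query_in_child, child_in_parent)

print(query_in_child)   # [1, 3, None, 4, 5]
print(query_in_parent)  # [1, 4, None, 5, 6]
```

In this toy case the inserted nucleotide stays unmapped at both levels, while the remaining query positions receive coordinates in the parent reference's space even though the query was only ever aligned to the child reference.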