Robert J. Gifford edited this page Nov 2, 2024 · 5 revisions

A3-Evolution

Welcome to the A3-Evolution wiki!

Description

This GitHub repository contains data and program logic supporting comparative and phylogenetic investigations of APOBEC3 (A3) gene evolution. This resource was generated as part of a collaborative investigation by Kei Sato, Jumpei Ito, and Rob Gifford.


Nomenclature

A3 genes contain conserved regions called Z domains, which can be used to group A3 genes into classes. There are three distinct types of Z domain, labeled Z1 to Z3, and A3 genes are modular: each consists of either a single Z domain (Z1, Z2, or Z3) or a combination of two Z domains (e.g., Z2 and Z3). In the proposed nomenclature, A3 genes are labeled according to their domain composition (e.g., A3Z2-A3Z3 for a gene comprising a Z2 and a Z3 domain).

In some genomes, one or more of the three Z domain types have been duplicated. In such cases, the duplicates are distinguished by lowercase letter suffixes (a, b, c, etc.), assigned alphabetically in the 5' to 3' direction.
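
The labeling scheme described above can be expressed as a short Python sketch. It rests on our own reading of the rules, not a confirmed specification: suffixes are assumed to be assigned per Z domain type across all A3 genes in a genome, and only to types that occur more than once. The function name `label_a3_genes` and its input format (each gene given as a list of Z domain types, genes listed 5' to 3') are hypothetical.

```python
from collections import Counter
from string import ascii_lowercase


def label_a3_genes(genes: list[list[int]]) -> list[str]:
    """Label A3 genes listed in 5'->3' genomic order (hypothetical helper).

    Each gene is given as its Z domain types in 5'->3' order, e.g. [2, 3]
    for a double-domain Z2-Z3 gene. Assumption: a Z domain type occurring
    more than once in the genome receives lowercase suffixes (a, b, c, ...)
    in 5'->3' order; types occurring only once are left unsuffixed.
    """
    # Count how many times each Z domain type occurs across the whole genome.
    totals = Counter(z for gene in genes for z in gene)
    seen: Counter = Counter()
    labels = []
    for gene in genes:
        parts = []
        for z in gene:
            if z not in (1, 2, 3):
                raise ValueError(f"unknown Z domain type: {z}")
            suffix = ""
            if totals[z] > 1:
                # This sketch does not handle more than 26 copies of one type.
                suffix = ascii_lowercase[seen[z]]
            seen[z] += 1
            parts.append(f"A3Z{z}{suffix}")
        # Double-domain genes join their domain labels with a hyphen.
        labels.append("-".join(parts))
    return labels
```

For a hypothetical genome carrying a Z1 gene, a Z2-Z3 gene, a Z2 gene, and a Z3 gene (in that order), `label_a3_genes([[1], [2, 3], [2], [3]])` returns `['A3Z1', 'A3Z2a-A3Z3a', 'A3Z2b', 'A3Z3b']`.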


Sequence data

The sequence data in this project have been organized into the following sources:

  • ncbi-refseqs: mRNA reference sequences for A3 genes from distinct species. These XML-formatted files are downloaded directly from NCBI using a GLUE module (see here) and are uniquely identified within this project by their GenBank accession numbers.

  • fasta-curated: A non-redundant set of loci showing similarity to A3 Z domains. These FASTA sequences have been curated via systematic screening of whole genome sequence (WGS) assemblies using the DIGS tool. Sequences in this source are identified by unique, arbitrarily numbered IDs.

  • fasta-refseqs: Reference sequences of A3 genes that are not available from NCBI (e.g., pseudogenes).
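
Outside of GLUE, the two file types can be inspected with the Python standard library. The sketch below assumes that the ncbi-refseqs files follow NCBI's GBSeq XML format (elements such as `GBSeq_primary-accession` and `GBSeq_sequence`) and that FASTA sequence IDs are the first word of each header line; the function names are hypothetical and are not part of this project or of GLUE.

```python
import xml.etree.ElementTree as ET


def read_gbseq_accessions(xml_text: str) -> dict[str, dict[str, str]]:
    """Map primary accession -> selected fields, assuming GBSeq XML input."""
    root = ET.fromstring(xml_text)
    records = {}
    # iter() includes the root itself, so this works whether the top-level
    # element is <GBSet> (several records) or a single <GBSeq>.
    for seq in root.iter("GBSeq"):
        acc = seq.findtext("GBSeq_primary-accession")
        if acc is None:
            raise ValueError("GBSeq record without a primary accession")
        records[acc] = {
            "organism": seq.findtext("GBSeq_organism", default=""),
            "sequence": seq.findtext("GBSeq_sequence", default="").upper(),
        }
    return records


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {id: sequence}; the ID is the first header word."""
    records: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            # IDs are expected to be unique within a source.
            if current in records:
                raise ValueError(f"duplicate sequence ID: {current}")
            records[current] = ""
        elif current is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[current] += line.upper()
    return records
```

Keying both readers by ID mirrors how the sources are described: GenBank accessions for ncbi-refseqs and arbitrary numeric IDs for fasta-curated.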


Sequence-associated data

Sequences in this project are linked to auxiliary data in tabular format, comprising:

  1. Sequence-associated metadata for the reference mRNA sequences in ncbi-refseqs.
  2. Locus data for the sequences in fasta-curated.
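
Linking a table to its sequences amounts to a join on a shared sequence ID. The following minimal sketch assumes a tab-delimited table with a header row and an ID column named `sequenceID`; both the column name and the function `link_metadata` are hypothetical and may not match the project's actual tables.

```python
import csv
import io


def link_metadata(
    sequences: dict[str, str], table_text: str, id_column: str = "sequenceID"
) -> tuple[dict[str, dict[str, str]], list[str]]:
    """Join rows of a tab-delimited table to sequences by a shared ID column.

    Returns the linked records and a sorted list of sequence IDs that have
    no metadata row. The default column name is an assumption.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    linked = {}
    for row in reader:
        seq_id = row[id_column]
        # A metadata row pointing at an unknown sequence indicates a data error.
        if seq_id not in sequences:
            raise KeyError(f"metadata row refers to unknown sequence: {seq_id}")
        linked[seq_id] = {**row, "sequence": sequences[seq_id]}
    missing = sorted(set(sequences) - set(linked))
    return linked, missing
```

Returning the unmatched IDs, rather than silently dropping them, makes gaps in the metadata easy to spot.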

Multiple sequence alignments (MSAs)

Three categories of MSA are included in this project, each representing a distinct taxonomic level:

  1. Tip MSAs: alignments of A3 alleles (mRNAs) from a single species.
  2. Internal MSAs: alignments of A3 genes (mRNAs) and A3 gene loci (genomic DNA).
  3. Root MSA: an alignment capturing homology between distinct A3 genes.
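
One way to model the three MSA levels is as a tree in which a single root alignment contains internal alignments, which in turn contain tip alignments. The sketch below is an illustrative data structure only; the class `Alignment`, its fields, and the level names used as strings are hypothetical and are not taken from the project or from GLUE.

```python
from dataclasses import dataclass, field

# Rank of each level: lower numbers sit higher in the hierarchy.
LEVEL_ORDER = {"root": 0, "internal": 1, "tip": 2}


@dataclass
class Alignment:
    """One MSA in a hypothetical root -> internal -> tip hierarchy."""

    name: str
    level: str  # "root", "internal", or "tip"
    members: list[str] = field(default_factory=list)  # sequence IDs
    children: list["Alignment"] = field(default_factory=list)

    def add_child(self, child: "Alignment") -> "Alignment":
        """Attach a child MSA, which must sit at a lower level than this one."""
        if LEVEL_ORDER[child.level] <= LEVEL_ORDER[self.level]:
            raise ValueError(f"{child.level} MSA cannot be nested under {self.level}")
        self.children.append(child)
        return child

    def all_members(self) -> set[str]:
        """Sequence IDs in this alignment and in every alignment below it."""
        ids = set(self.members)
        for child in self.children:
            ids |= child.all_members()
        return ids
```

Because `add_child` returns the child, a hierarchy can be built top-down, for example by adding an internal MSA to the root and then a tip MSA to that internal MSA.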

Scripts

Scripts used in the analysis can be found here.


Contributors


Related Publications

  • Ito J, Gifford RJ, Sato K (2019)
    Retroviruses drive the rapid evolution of mammalian APOBEC3 genes.
    PNAS.

  • Zhu H, Dennis T, Hughes J, Gifford RJ (2018)
    Database-integrated genome screening (DIGS): exploring genomes heuristically using sequence similarity search tools and a relational database.
    Preprint.


License

This project is licensed under the GNU Affero General Public License v. 3.0.
