Robert J. Gifford edited this page Nov 3, 2024 · 5 revisions

A3-Evolution

Welcome to the A3-Evolution wiki!

Description

This GitHub repository contains data and program logic supporting comparative and phylogenetic investigations of APOBEC3 evolution. This resource was generated as part of a collaborative investigation by Kei Sato, Jumpei Ito, and Rob Gifford.


Nomenclature

A3 genes contain conserved regions called Z-domains, which can be used to group the genes into classes. There are three distinct types of Z-domain, labeled Z1, Z2, and Z3. A3 genes are modular: each consists of either a single Z-domain (Z1, Z2, or Z3) or a combination of two Z-domains (e.g., Z2 and Z3). In the proposed nomenclature, A3 genes are labeled according to their Z-domain composition (e.g., A3Z2-A3Z3).

In some genomes, one or more of the three Z-domains have been duplicated. In such cases, duplicates are distinguished by appending lowercase letters (a, b, c, etc.), assigned alphabetically in the 5' to 3' direction.
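
The labeling rules above can be sketched in Python. This is only an illustration, not code from this repository: the function name label_a3_genes and the input format (one tuple of Z-domain types per gene, listed in 5' to 3' order) are hypothetical. It also assumes that a letter is appended directly after the domain number (e.g., A3Z2a) and only when that domain type occurs more than once in the genome.

```python
from collections import Counter
from string import ascii_lowercase


def label_a3_genes(genes):
    """Label A3 genes from their Z-domain composition.

    `genes` is a list of tuples of Z-domain types (1, 2, or 3), with genes
    and domains both given in 5' to 3' order. Hypothetical input format.
    """
    for gene in genes:
        if len(gene) not in (1, 2):
            raise ValueError(f"expected one or two Z-domains per gene: {gene}")
        for z in gene:
            if z not in (1, 2, 3):
                raise ValueError(f"unknown Z-domain type: {z}")

    # How often each Z-domain type occurs across the whole genome.
    totals = Counter(z for gene in genes for z in gene)
    seen = Counter()  # duplicates labeled so far, per Z-domain type

    labels = []
    for gene in genes:
        parts = []
        for z in gene:
            suffix = ""
            if totals[z] > 1:
                # Assumption: letters are assigned only to duplicated types.
                # This sketch supports at most 26 copies of one type.
                suffix = ascii_lowercase[seen[z]]
                seen[z] += 1
            parts.append(f"A3Z{z}{suffix}")
        labels.append("-".join(parts))
    return labels


# A genome with a single-domain Z2 gene followed by a Z2-Z3 gene.
print(label_a3_genes([(2,), (2, 3)]))
```

Under these assumptions, a genome containing only one double-domain gene, passed as [(2, 3)], would receive the label A3Z2-A3Z3 with no letter suffixes.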


Sequence data

The sequence data in this project have been organized into the following sources:

  • ncbi-refseqs: mRNA reference sequences for A3 genes from distinct species. These XML-formatted files are downloaded directly from NCBI using a GLUE module (see here) and are uniquely identified within this project by their GenBank accession numbers.

  • fasta-curated: A non-redundant set of loci disclosing similarity to A3 Z-domains. These FASTA sequences have been curated via systematic screening of whole genome sequence (WGS) assemblies using the DIGS tool. Sequences in this source have unique IDs based on arbitrary numbering.

  • fasta-refseqs: Reference sequences of A3 genes not included in NCBI (e.g., pseudogenes).
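
As a minimal sketch of how sequences from the FASTA-based sources might be loaded and checked for unique IDs, the Python function below parses FASTA text into a dictionary keyed by sequence ID. The function name parse_fasta is hypothetical and not part of this project or of GLUE, and the sketch assumes standard FASTA formatting in which the ID is the first whitespace-delimited token of each header line.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping sequence ID -> sequence."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header without an ID")
            # Assumption: the ID is the first token of the header line.
            current_id = header[0]
            if current_id in records:
                raise ValueError(f"duplicate sequence ID: {current_id}")
            records[current_id] = []
        elif current_id is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[current_id].append(line)
    # Join the wrapped sequence lines of each record into a single string.
    return {seq_id: "".join(chunks) for seq_id, chunks in records.items()}


example = ">1 example locus\nACGT\nTTGA\n>2\nGGCC\n"
print(parse_fasta(example))
```

Raising an error on a repeated ID reflects the statement above that sequences within a source are uniquely identified; a real pipeline could choose to handle duplicates differently.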


Sequence-associated data

Sequences included in this project are linked to auxiliary data in tabular format, which includes:

  1. Sequence-associated metadata for the reference mRNA sequences in ncbi-refseqs.
  2. Locus data for the sequences in fasta-curated.

Multiple sequence alignments (MSAs)

Several distinct categories of MSA are included in this project, each representing a distinct taxonomic level.

  1. Tip MSAs: Alignments of A3 alleles (mRNAs) from a single species.
  2. Internal MSAs: Alignments of A3 genes (mRNAs) and A3 gene loci (genomic DNA).
  3. Root MSA: An alignment capturing homology between distinct A3 genes.

Scripts

Scripts used in the analysis can be found here.


License

This project is licensed under the GNU Affero General Public License v3.0.