Home
Welcome to the A3-Evolution wiki!
This GitHub repository contains data and program logic supporting comparative and phylogenetic investigations of APOBEC3 (A3) evolution. This resource was generated as part of a collaborative investigation by Kei Sato, Jumpei Ito, and Rob Gifford.
A3 genes contain conserved regions called Z-domains, which can be used to group the genes into classes. There are three distinct types of Z-domain, labeled Z1-Z3. A3 genes are modular, consisting of either a single Z-domain (Z1, Z2, or Z3) or a combination of two Z-domains (e.g., Z2 and Z3). In the proposed nomenclature, A3 genes are named according to their Z-domain composition (e.g., A3Z2-A3Z3).
In some genomes, one or more of the three Z-domain types have been duplicated. In this case, duplicates are distinguished by appending lowercase letters (a, b, c, etc.), assigned alphabetically in a 5' to 3' direction.
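The naming scheme described above can be illustrated with a minimal Python sketch. The function names `label_domains` and `gene_name` are hypothetical and not part of this project. The exact placement of the duplicate suffix (e.g., `Z2a`) is an assumption based on the description above, not a confirmed convention.

```python
from collections import Counter
from string import ascii_lowercase


def label_domains(domain_types):
    """Label Z-domains listed in 5'->3' order.

    domain_types: Z-domain types (1, 2 or 3) in the order they occur.
    Types that occur more than once get a lowercase suffix (a, b, c, ...);
    the suffix position is an assumption made for this illustration.
    """
    for z in domain_types:
        if z not in (1, 2, 3):
            raise ValueError(f"unknown Z-domain type: {z!r}")
    totals = Counter(domain_types)  # how often each type occurs overall
    seen = Counter()                # how often each type has occurred so far
    labels = []
    for z in domain_types:
        suffix = ascii_lowercase[seen[z]] if totals[z] > 1 else ""
        labels.append(f"Z{z}{suffix}")
        seen[z] += 1
    return labels


def gene_name(domain_labels):
    """Build a gene name from one or two Z-domain labels."""
    if not 1 <= len(domain_labels) <= 2:
        raise ValueError("an A3 gene consists of one or two Z-domains")
    return "-".join(f"A3{label}" for label in domain_labels)


print(gene_name(["Z2", "Z3"]))
print(label_domains([1, 2, 2, 3]))
```

Here `gene_name` covers the single-domain and double-domain cases, while `label_domains` handles only the lettering of duplicates. How duplicated domains are grouped into genes is left to the caller.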
The sequence data in this project have been organized into the following sources:
- ncbi-refseqs: mRNA reference sequences for A3 genes from distinct species. These XML-formatted files are downloaded directly from NCBI using a GLUE module (see here) and are uniquely identified within this project by their GenBank accession numbers.
- fasta-curated: A non-redundant set of loci disclosing similarity to A3 Z-domains. These FASTA sequences have been curated via systematic screening of whole genome sequence (WGS) assemblies using the DIGS tool. Sequences in this source have unique IDs based on arbitrary numbering.
- fasta-refseqs: Reference sequences of A3 genes not included in NCBI (e.g., pseudogenes).
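Because sequences in the FASTA sources are identified by their IDs, it can be useful to confirm that IDs are unique when reading a file. The following is a minimal sketch in plain Python. The function `read_fasta` is hypothetical and is not how this project loads data; it assumes the ID is the first whitespace-delimited token of each header line.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {sequence ID: sequence}.

    Raises ValueError if an ID occurs more than once, if a header has no ID,
    or if sequence data appears before the first header.
    """
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header with no ID")
            current_id = fields[0]
            if current_id in records:
                raise ValueError(f"duplicate sequence ID: {current_id}")
            records[current_id] = []
        elif current_id is None:
            raise ValueError("sequence data before first header")
        else:
            records[current_id].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}
```

To use it on a file, read the file contents first, for example `read_fasta(open(path).read())`, where `path` points to a FASTA file of your choice.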
Sequences included in this project are linked to auxiliary data in tabular format, which includes:
- Sequence-associated metadata for the reference mRNA sequences in ncbi-refseqs.
- Locus data for the sequences in fasta-curated.
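As a rough illustration of linking sequences to tabular data, the sketch below indexes a tab-delimited table by sequence ID using Python's standard `csv` module. The function `load_table`, the column name `sequence_id`, and the tab-delimited layout are all assumptions made for this example; the actual files in this project may be organized differently.

```python
import csv
import io


def load_table(tsv_text, id_column="sequence_id"):
    """Index rows of a tab-delimited table by their sequence ID.

    id_column is a hypothetical column name; adjust it to the real header.
    Raises ValueError if the same ID appears in more than one row.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    table = {}
    for row in reader:
        seq_id = row[id_column]
        if seq_id in table:
            raise ValueError(f"duplicate row for sequence ID: {seq_id}")
        table[seq_id] = row
    return table
```

Each row can then be looked up by the same ID used for the sequence itself, for example `load_table(text)["1"]` for a sequence with ID `1`.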
Several distinct categories of multiple sequence alignment (MSA) are included in this project, each representing a distinct taxonomic level:
- Tip MSAs: Alignments of A3 alleles from a single species (mRNAs).
- Internal MSAs: Alignments of A3 genes (mRNAs) and A3 gene loci (genomic DNA).
- Root MSA: An alignment capturing homology between distinct A3 genes.
Scripts used in the analysis can be found here.
This project is licensed under the GNU Affero General Public License v. 3.0.