-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathManual_addition_of_reads_to_CRUX.txt
105 lines (95 loc) · 7.29 KB
/
Manual_addition_of_reads_to_CRUX.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
################################################
# Manually building a CRUX compatible reference library or adding reads to a preexisting library.
# Written By Emily Curd
# Updated July 27, 2018
################################################
----------------------------
All Crux libraries require a specific directory structure:
* Make the following directories and subdirectories
-> metabarcode
-> -> metabarcode_bowtie2_database
-> -> metabarcode_fasta_and_taxonomy
e.g
-> 16S
-> -> 16S_bowtie2_database
-> -> 16S_fasta_and_taxonomy
To make the directories and subdirectories run the following:
mkdir -p path_to_directory_where_you_want_your_new_Crux_library/<metabarcode>
mkdir -p path_to_directory_where_you_want_your_new_Crux_library/<metabarcode>/<metabarcode>_bowtie2_database
mkdir -p path_to_directory_where_you_want_your_new_Crux_library/<metabarcode>/<metabarcode>_fasta_and_taxonomy
----------------------------
##########################################
# To manually build a CRUX compatible reference library
################################################
----------------------------
* If you have a fasta formatted file with NCBI version accession numbers
e.g. >LC056757.1
GAAAAGAAACTAACAAGGATTCCCCTAGTAACTGCGAGTGAAGCGGGAAAAGCTCAAATTTAAAATCTGGCAGTGGTTCA
1. Add your fasta file to the <metabarcode>_fasta_and_taxonomy subdirectory
2. Rename the fasta file to the following <metabarcode>_.fasta
3. Run entrez_qiime.py to build the taxonomy list. https://github.com/bakerccm/entrez_qiime
The command is as follows:
python <path_to>/entrez_qiime.py -i <path_to>/<metabarcode>/<metabarcode>_fasta_taxonomy/<metabarcode>_example_.fasta -o <path_to>/<metabarcode>/<metabarcode>_fasta_taxonomy/<metabarcode>_taxonomy -n <path_to_ncbi_Taxo_dump_directory>/TAXO -r <path_to_ncbi_accession2taxonomy_directory>/accession2taxonomy/nucl_gb.accession2taxid -r superkingdom,phylum,class,order,family,genus,species
4. Build a bowtie2 compatible index file for the the <metabarcode>_.fasta file
The command is as follows:
bowtie2-build -f <metabarcode>_.fasta <path_to>/<metabarcode>/<metabarcode>_bowtie2_database/<metabarcode>bowtie2_index
5. Now the database is ready to run in Anacapa
----------------------------
* If you have a fasta file but no NCBI version accession numbers
e.g. >my_read_1
GAAAAGAAACTAACAAGGATTCCCCTAGTAACTGCGAGTGAAGCGGGAAAAGCTCAAATTTAAAATCTGGCAGTGGTTCA
1. Add your fasta file to the <metabarcode>_fasta_and_taxonomy subdirectory
2. Rename the fasta file to the following <metabarcode>_.fasta
3. Manually generate a taxonomy table with the following name: <metabarcode>_taxonomy.
The file structure needs to include:
name of the read superkingdom;phylum;class;order;family;genus;species
e.g.
NR_117000.1 Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Erwiniaceae;Erwinia;Erwinia aphidicola
NR_122095.1 Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Methylophilaceae;Methylophilus;Methylophilus glucosoxydans
NR_115689.1 Bacteria;Proteobacteria;Gammaproteobacteria;Chromatiales;Chromatiaceae;Arsukibacterium;Arsukibacterium ikkense
4. Build a bowtie2 compatible index file for the the <metabarcode>_.fasta file
The command is as follows:
bowtie2-build -f <metabarcode>_.fasta <path_to>/<metabarcode>/<metabarcode>_bowtie2_database/<metabarcode>bowtie2_index
5. Now the database is ready to run in Anacapa
################################################
# To add sequences to a current CRUX reference library
################################################
----------------------------
* If you have a fasta formatted file with NCBI version accession numbers and a preexisting CRUX reference database
e.g. >LC056757.1
GAAAAGAAACTAACAAGGATTCCCCTAGTAACTGCGAGTGAAGCGGGAAAAGCTCAAATTTAAAATCTGGCAGTGGTTCA
1. You can either start a new CRUX database by making new directories with unique names, or delete the contents of all subdirectories with the exception of the <metabarcode>_.fasta file
2. Concatenate your fasta file with the existing <metabarcode>_.fasta file in the <metabarcode>_fasta_and_taxonomy subdirectory
You can do this by running the following command:
cat <path_to_the_original_file>/<metabarcode>_.fasta <path_to_your_file>/myfile.fasta > <path_to_the_original_file>/<metabarcode>_.fasta
3. Make sure the fasta file has the final naming convention <metabarcode>_.fasta
4. Run entrez_qiime.py to build the taxonomy list from the file just created. https://github.com/bakerccm/entrez_qiime
The command is as follows:
python <path_to>/entrez_qiime.py -i <path_to>/<metabarcode>/<metabarcode>_fasta_taxonomy/<metabarcode>_example_.fasta -o <path_to>/<metabarcode>/<metabarcode>_fasta_taxonomy/<metabarcode>_taxonomy -n <path_to_ncbi_Taxo_dump_directory>/TAXO -r <path_to_ncbi_accession2taxonomy_directory>/accession2taxonomy/nucl_gb.accession2taxid -r superkingdom,phylum,class,order,family,genus,species
5. Build a bowtie2 compatible index file for the the <metabarcode>_.fasta file
The command is as follows:
bowtie2-build -f <metabarcode>_.fasta <path_to>/<metabarcode>/<metabarcode>_bowtie2_database/<metabarcode>bowtie2_index
6. Now the database is ready to run in Anacapa
----------------------------
* If you have a fasta file but no NCBI version accession numbers and a preexisting CRUX reference database
e.g. >my_read_1
GAAAAGAAACTAACAAGGATTCCCCTAGTAACTGCGAGTGAAGCGGGAAAAGCTCAAATTTAAAATCTGGCAGTGGTTCA
1. You can either start a new CRUX database by making new directories with unique names, or delete the contents of all subdirectories with the exception of the <metabarcode>_.fasta file and the <metabarcode>_taxonomy.txt file
2. Concatenate your fasta file with the existing <metabarcode>_.fasta file in the <metabarcode>_fasta_and_taxonomy subdirectory
You can do this by running the following command:
cat <path_to_the_original_file>/<metabarcode>_.fasta <path_to_your_file>/myfile.fasta > <path_to_the_original_file>/<metabarcode>_.fasta
3. Make sure the fasta file has the final naming convention <metabarcode>_.fasta
4. Manually generate a taxonomy table with the following name: <metabarcode>_taxonomy.
The file structure needs to include:
name of the read superkingdom;phylum;class;order;family;genus;species
e.g.
NR_117000.1 Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Erwiniaceae;Erwinia;Erwinia aphidicola
NR_122095.1 Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Methylophilaceae;Methylophilus;Methylophilus glucosoxydans
NR_115689.1 Bacteria;Proteobacteria;Gammaproteobacteria;Chromatiales;Chromatiaceae;Arsukibacterium;Arsukibacterium ikkense
5. Concatenate your taxonomy file with the existing <metabarcode>_taxonomy.txt file in the <metabarcode>_fasta_and_taxonomy subdirectory
You can do this by running the following command:
cat <path_to_the_original_file>/<metabarcode>_taxonomy.txt <path_to_your_file>/myfile_taxonomy.txt > <path_to_the_original_file>/<metabarcode>_taxonomy.txt
6. Build a bowtie2 compatible index file for the the <metabarcode>_.fasta file
The command is as follows:
bowtie2-build -f <metabarcode>_.fasta <path_to>/<metabarcode>/<metabarcode>_bowtie2_database/<metabarcode>bowtie2_index
7. Now the database is ready to run in Anacapa