GenomeTools

Tools to analyse and extract genomic data.

1. assembly_stats.pl

Perl script that reports summary statistics for an assembly or any fasta file. An R script embedded in the Perl code plots the histogram of contig lengths.

Syntax : ./assembly_stats.pl your_assembly.fasta min_contig_length_to_consider

Output :

L10-L90, N10-N90, histogram of contig/scaffold lengths (requires an installation of R), number of contigs, assembly size, longest contig length, mean contig length, total number of N's, average number of N's per contig, IUPAC bases other than ATGC (%), GC content (%).
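For readers unfamiliar with the Nx/Lx statistics listed above, here is a minimal Python sketch of the standard definition: sort contigs from longest to shortest and accumulate lengths until x% of the assembly size is reached. Nx is the length of the contig that crosses the threshold and Lx is the number of contigs needed to get there. The function name `nx_lx` and its `min_length` parameter are illustrative and are not taken from assembly_stats.pl.

```python
def nx_lx(lengths, percent, min_length=0):
    """Return (Nx, Lx) for an integer percent such as 50 for N50/L50.

    Contigs shorter than min_length are ignored, mirroring the idea of a
    minimum contig length to consider (hypothetical parameter name).
    """
    ordered = sorted((n for n in lengths if n >= min_length), reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length, count
    raise ValueError("no contigs left after filtering")


contig_lengths = [80, 70, 50, 40, 30, 20, 10]  # made-up lengths, total 300
for pct in range(10, 100, 10):
    n_x, l_x = nx_lx(contig_lengths, pct)
    print(f"N{pct} = {n_x}, L{pct} = {l_x}")
```

With these made-up lengths, half of the assembly (150 of 300 bases) is covered by the two longest contigs, so N50 is 70 and L50 is 2.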

2. get_scaffold.py

Python script to fetch a specific chromosome, scaffold, or contig from a fasta file. Requires Python 3 or later.

Syntax : python get_scaffold.py -i input.fasta -s scaffold_or_chromosome_name -o output.fasta

Output :

Fasta file with specified chromosome/ scaffold header and sequence.
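The lookup can be pictured as a single pass over the fasta file. The sketch below is not the code of get_scaffold.py; it assumes the name is matched against the record ID, meaning the text after `>` up to the first whitespace, and the function name `fetch_record` is hypothetical.

```python
def fetch_record(fasta_lines, name):
    """Return (header, sequence) for the first record whose ID equals name.

    Returns None when no record matches. Matching on the ID only (text up
    to the first whitespace) is an assumption of this sketch.
    """
    header, chunks = None, []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                break  # the next record starts, so the match is complete
            fields = line[1:].split()
            if fields and fields[0] == name:
                header = line
        elif header is not None:
            chunks.append(line.strip())
    if header is None:
        return None
    return header, "".join(chunks)


# In-memory example; a real file could be passed as open("input.fasta").
sample = [">chr1 test\n", "ACGT\n", "NNAC\n", ">chr2\n", "GGCC\n"]
print(fetch_record(sample, "chr1"))
```

Stopping at the next header avoids reading the rest of a large genome once the requested record has been collected.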

3. rev_com.py

Python script to compute the reverse, complement, or reverse complement of the sequences in a fasta file. Masked sequences are not affected and letter case is preserved. Requires Python 3 or later.

Syntax : python rev_com.py -i input.fasta -c choice -o output.fasta

(-c : 0 for reverse; 1 for complement; 2 for reverse complement)

Output :

Fasta file with desired operation performed on all the sequences in the file.
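The three operations can be sketched with a translation table that maps each base to its complement in the same case. This is an illustration rather than the code of rev_com.py; it reads "masked" as N characters and lowercase (soft-masked) bases, which is an assumption, and the function name `transform` is hypothetical. The `choice` values follow the -c option described above.

```python
# Upper- and lowercase bases map to complements of the same case;
# any other character (for example N) is left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def transform(seq, choice):
    """Apply 0 = reverse, 1 = complement, 2 = reverse complement."""
    if choice == 0:
        return seq[::-1]
    if choice == 1:
        return seq.translate(_COMPLEMENT)
    if choice == 2:
        return seq.translate(_COMPLEMENT)[::-1]
    raise ValueError("choice must be 0, 1 or 2")


for choice in (0, 1, 2):
    print(choice, transform("ACgtNN", choice))
```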

4. variant_calling_pipeline.sh

Shell script that runs the complete variant calling process. It requires the paths to the GATK and PICARD jar files and a global installation of BWA (http://bio-bwa.sourceforge.net/). It prompts for the names of the paired-end read files (fastq) and the genome file (fasta), then carries out the following steps:

Genome indexing
Quality control and trimming of reads
Alignment of reads against the genome
Sorting SAM file by coordinate and conversion to BAM
Getting sequence depth
Building index for BAM file
Creating realignment targets
Realigning indels
Variant calling
Extracting SNPs and INDELS
Filtering SNPs and INDELS
Base Quality Score Recalibration (BQSR)
Calling final variants
Extracting final SNPs and INDELS
Final filtering of SNPs and INDELS

Syntax : bash variant_calling_pipeline.sh

Output :

VCF files containing the SNPs and INDELS found in the sample relative to the genome.
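The README does not show the commands the script runs, so the following is only a sketch of how the indexing, alignment, and coordinate-sorting steps are commonly expressed with BWA and Picard. Using Picard SortSam for the sorting step is an assumption, all file names are placeholders, and the function `alignment_commands` is hypothetical. The commands are built and printed, not executed.

```python
import shlex


def alignment_commands(genome, reads_1, reads_2, picard_jar, prefix="sample"):
    """Build argument lists for the first steps of the workflow (dry run)."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        # Genome indexing
        ["bwa", "index", genome],
        # Paired-end alignment; bwa mem writes SAM to standard output,
        # which would be redirected to the file named in `sam`.
        ["bwa", "mem", genome, reads_1, reads_2],
        # Coordinate sorting and SAM-to-BAM conversion (assumed to use Picard)
        ["java", "-jar", picard_jar, "SortSam",
         f"I={sam}", f"O={bam}", "SORT_ORDER=coordinate"],
    ]


# Placeholder inputs, for illustration only.
for argv in alignment_commands("genome.fasta", "reads_1.fastq",
                               "reads_2.fastq", "picard.jar"):
    print(shlex.join(argv))
```

Building each command as a list keeps file names with spaces intact if the lists are later passed to `subprocess.run`.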

5. syn_gen.pl

Perl script to create a synthetic genome/chromosome using a random number generator. The program asks for the number of bases required and a file name to write the output to. Only the bases A, T, G and C are written to the file, in random order.

Syntax : ./syn_gen.pl

Output :

A fasta file consisting of a header followed by a randomly generated genome sequence.
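The same idea can be sketched in Python, the language of the other examples here, although syn_gen.pl itself is written in Perl. The header text, the 60-character line width, and the `seed` parameter are assumptions made for illustration and are not taken from the script.

```python
import random


def synthetic_fasta(n_bases, header="synthetic_genome", width=60, seed=None):
    """Return fasta text with one record of n_bases random A/T/G/C bases."""
    rng = random.Random(seed)  # a seed makes the output reproducible
    seq = "".join(rng.choices("ATGC", k=n_bases))
    lines = [f">{header}"]
    # Wrap the sequence to a fixed line width, as is common in fasta files.
    lines += [seq[i:i + width] for i in range(0, n_bases, width)]
    return "\n".join(lines) + "\n"


print(synthetic_fasta(130, seed=1), end="")
```

Each base is drawn uniformly, so the expected GC content of the output is about 50%.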

6. shortest_common_superstring.pl

Perl script that can :

Find all possible k-mers in a given sequence
Find all unique k-mers
Find the shortest common superstring using the unique k-mers

The program will ask for a string and the size of k-mers to compute the above.

Syntax : ./shortest_common_superstring.pl

Output :

Unique k-mers, Shortest Common Superstring
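The three tasks can be sketched as follows. Finding a truly shortest common superstring is NP-hard in general, so this sketch swaps in the common greedy heuristic that repeatedly merges the pair of strings with the largest overlap. It always returns a common superstring, but not necessarily the shortest one, and it is not claimed to be the method used in shortest_common_superstring.pl. All function names are hypothetical.

```python
def kmers(seq, k):
    """All k-mers of seq, in order of appearance (duplicates included)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def unique_kmers(seq, k):
    """Distinct k-mers of seq, keeping first-appearance order."""
    return list(dict.fromkeys(kmers(seq, k)))


def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_superstring(strings):
    """Greedy approximation: merge the best-overlapping pair until one is left."""
    pool = list(dict.fromkeys(strings))
    while len(pool) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    n = overlap(a, b)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        merged = pool[i] + pool[j][n:]
        pool = [s for idx, s in enumerate(pool) if idx not in (i, j)] + [merged]
    return pool[0] if pool else ""


sequence, k = "ATGCATG", 3
parts = unique_kmers(sequence, k)
print(parts)
print(greedy_superstring(parts))
```

For the example sequence the unique 3-mers are ATG, TGC, GCA and CAT, and the greedy merge yields a 6-base superstring, one base shorter than the input because the repeated ATG is stored only once.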
