Skip to content

Latest commit

 

History

History
40 lines (28 loc) · 1.23 KB

README.md

File metadata and controls

40 lines (28 loc) · 1.23 KB

PhyTreeSearch

The goal of this project is to develop a tool that is able to extract "interesting" subtrees of large phylogenetic trees, that contain sequences of proteins (or protein fragments) at the leaves.

The user can specify (by editing a simple properties file) what counts to be "interesting" in terms of:

  • an amino-acid pattern occuring in leaf sequences
  • a minimum threshold 'P' so that only subtrees that contain at least 'P' percent of leaves matching the pattern are returned
  • a minimum tree size in terms of leaf count or tree height

The program works by reading in a properties file that defines the inputs, outputs, and other parameters. A sample configuration file:

treeFilesDir = /home/.../level_4/
fastaFilesDir = /home/.../clusters_fasta/
outputTreeFilesDir = /home/.../level_5/weka_c50_min7_55/
seqPattern = HD
minLeafNum = 7
minPatternPercent = 55
treeColors = yes

How to compile

You'll need java and gradle (1.6 or newer).

Then you can build with:

gradle build

How to use

Just call the compiled java code, and a pass it a properties file with your parameters and file paths. For an example, see the 'runExample.sh' script.