JoanaMPereira/CASP14_high_accuracy

CASP14_high_accuracy

This repository keeps the scripts used for high-accuracy assessment by the Lupas team in CASP14.

Content

There are 2 folders:

  • Code: Contains the Python scripts used to compile and analyse data from the Prediction Center, the Jupyter notebook for data analysis, and a script that calculates the DipDiff score given two structures of the same protein
  • Data: Contains the compiled data as well as the scores calculated for individual targets, for CASP14, CASP13 and CASP12

The main scripts

1. analyse_casp.py

This script collects targets and models from the Prediction Center, finds templates from the HHsearch results provided by the Prediction Center, and computes additional scores (e.g. DipDiff). It generates the files in the "out_tables" folder inside the "Data/CASPx" folder (where x is a CASP identifier).

External tools required to run it:

Special python modules:

  • pandas
  • seaborn
  • matplotlib
  • BioPython
  • scipy
  • numpy
  • pdbe api

Please edit the paths to these tools in the header of the script.

ATTENTION: this script requires an assessor username and password. Replace USERNAME and PASSWORD with your assessor credentials on line 156 of the script:

password_mgr.add_password(None, top_level_url, 'USERNAME', 'PASSWORD')
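The add_password call above matches the standard urllib password-manager pattern. As a minimal sketch, assuming the script uses HTTP basic authentication, the credentials could be read from environment variables instead of being written into the file. The function name build_authenticated_opener and the variables CASP_USERNAME and CASP_PASSWORD are hypothetical and do not appear in the script.

```python
import os
import urllib.request


def build_authenticated_opener(top_level_url, username=None, password=None):
    """Return (opener, password_mgr) configured with basic-auth credentials.

    Assumption: the server uses HTTP basic authentication. The environment
    variable names below are hypothetical, chosen only for this sketch.
    """
    username = username or os.environ.get("CASP_USERNAME", "USERNAME")
    password = password or os.environ.get("CASP_PASSWORD", "PASSWORD")

    # Same call as in the script; realm None means "any realm".
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, top_level_url, username, password)

    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    opener = urllib.request.build_opener(handler)
    return opener, password_mgr
```

Requests made with opener.open(url) would then send the credentials for any URL below top_level_url.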

How to run:

ulimit -s unlimited; python3 analyse_casp.py -casp 14 -tmp /tmp/

The only required input is a CASP identifier (e.g. CASP14, casp14 or 14); only CASP12, 13 and 14 are supported. For parallelization, the script also accepts the n_threads parameter. It downloads all targets and models and carries out all computations in the tmp folder, because it generates many files while running. The stack size limit must be set to unlimited (ulimit -s unlimited) because of LGA.
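The accepted identifier forms can be illustrated with a small sketch. It is not the parsing code from analyse_casp.py; the function name parse_casp_identifier is hypothetical, and the sketch only shows how "CASP14", "casp14" and "14" could map to the same edition while anything outside CASP12 to CASP14 is rejected.

```python
import re

# Editions the README states are supported.
SUPPORTED_EDITIONS = {12, 13, 14}


def parse_casp_identifier(value):
    """Normalise 'CASP14', 'casp14' or '14' to the integer 14."""
    match = re.fullmatch(r"(?:casp)?(\d+)", str(value).strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Not a CASP identifier: {value!r}")
    edition = int(match.group(1))
    if edition not in SUPPORTED_EDITIONS:
        raise ValueError(f"Only CASP12, 13 and 14 are supported, got {edition}")
    return edition
```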

2. assess_casp.py

Run this script after analyse_casp.py to compute Z-scores and ranking scores and to collect method information from abstracts. It generates a single table file that can then be processed and analysed with the Jupyter notebook.

External tools required to run it:

Special python modules:

  • pandas
  • matplotlib
  • numpy

How to run:

python3 assess_casp.py -casp 14 -tmp /tmp/

This takes all the data computed by analyse_casp.py for the given CASP edition and saved in the tmp folder, computes Z-scores, and generates a single data table.
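For readers unfamiliar with Z-scores, the basic idea can be sketched as follows. This is the plain textbook definition (value minus mean, divided by the population standard deviation) applied to the scores of all models for one target. The exact procedure in assess_casp.py is not described here and may differ, for example in how outliers are handled. The function name z_scores is hypothetical.

```python
import statistics


def z_scores(values):
    """Return the plain Z-score of each value within the given list."""
    values = [float(v) for v in values]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # All models scored the same, so no model stands out.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]
```

A model with a Z-score above zero scored better than the average model for that target, provided that higher scores are better for the metric in question.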

3. DipDiff.py

This script takes two PDB files as input and computes the DipDiff score. If the mode "template" is passed, it finds residue correspondences with LGA.

External tools required to run it:

How to run:

python3 dipdiff.py -target target.pdb -model model.pdb

If the model is a template, add -mode template and run instead:

ulimit -s unlimited; python3 dipdiff.py -target target.pdb -model model.pdb -mode template
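When many target/model pairs need to be scored, the two invocations above can be assembled programmatically. The helper below is a sketch only: the name build_dipdiff_command is hypothetical, and it merely reproduces the command lines shown in this README, adding the ulimit prefix in template mode.

```python
import shlex


def build_dipdiff_command(target, model, template=False):
    """Build the shell command for one DipDiff run, as shown in the README."""
    args = ["python3", "dipdiff.py", "-target", target, "-model", model]
    if template:
        args += ["-mode", "template"]
    # Quote each argument so paths with spaces survive the shell.
    command = " ".join(shlex.quote(arg) for arg in args)
    if template:
        # Template mode runs LGA, which needs an unlimited stack size.
        command = "ulimit -s unlimited; " + command
    return command
```

Because ulimit is a shell builtin, the resulting string has to be executed through a shell, for example with subprocess.run(command, shell=True, check=True).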

