From 98bc1ce50227ca931a7825039905408ddfbb631b Mon Sep 17 00:00:00 2001
From: Dima Molodenskiy
Date: Wed, 15 Nov 2023 10:05:56 +0100
Subject: [PATCH] Use protein description instead of path to fasta. Updated instructions for developers

---
 Developing.md | 12 +++++-------
 example_3.md  | 15 ++++++++-------
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/Developing.md b/Developing.md
index 931d7cfa..3b886ac5 100644
--- a/Developing.md
+++ b/Developing.md
@@ -18,17 +18,15 @@
 1. Test your package during development using tests in ```test/```, e.g.:
 ```
 pip install pytest
-pytest
-pytest test
-python test/test_predict_structure.py
-sbatch test/test_predict_structure.sh
-python -m unittest test/test_predict_structure.
+pytest -s test/
+pytest -s test/test_predictions_slurm.py
+pytest -s test/test_features_with_templates.py::TestCreateIndividualFeaturesWithTemplates::test_1a_run_features_generation
 ```
 1. Before pushing to the remote or submitting pull request
 ```
 pip install .
-pytest test
+pytest -s test/
 ```
-to install the package and test
+to install the package and test. Pytest for predictions only work if slurm is available. Check the created log files in your current directory.

diff --git a/example_3.md b/example_3.md
index d860a345..142d1b6d 100644
--- a/example_3.md
+++ b/example_3.md
@@ -7,21 +7,22 @@ This complex can not be modeled with vanilla AlphaFold Multimer, since it is a h
 Firstly, download sequences of NS1(Uniprot: [P03496](https://www.uniprot.org/uniprotkb/P03496/entry)) and P85B(uniprot:[P23726](https://www.uniprot.org/uniprotkb/P23726/entry)) proteins. Then download the multimeric template in either pdb or mmCIF format(PDB: [3L4Q](https://www.rcsb.org/structure/3L4Q)). Create directories named "fastas" and "templates" and put the sequences and pdb/cif files in the corresponding directories.
-Finally, create a text file with features description (description.csv):
+Finally, create a text file with description for generating features (description.csv).
+**Please note**, the first column must be an exact copy of the protein description from your fasta files. Please consider shortening them in your favorite text editor for convenience. These names will be used to generate pickle files with monomeric features!
+The description.csv for the NS1-P85B complex should look like:
 ```
-P03496.fasta, 3L4Q.cif, A
-P23726.fasta, 3L4Q.cif, C
+>sp|P03496|NS1_I34A1, 3L4Q.cif, A
+>sp|P23726|P85B_BOVIN, 3L4Q.cif, C
 ```
 In this example we refer to the NS1 protein as chain A and to the P85B protein as chain C in multimeric template 3L4Q.cif.
-**Please note**, that your template will be renamed to a PDB code taken from *_entry_id*. If you use a *.pdb file instead of *.cif, AlphaPulldown will first try to parse the PDB code from the file. Then it will check if the filename is 4-letter long. If it is not, it will generate a random 4-letter code and use it instead.
-*Please also note, that currently --use_mmseqs2 flag is not supported for this mode.*
+**Please note**, that your template will be renamed to a PDB code taken from *_entry_id*. If you use a *.pdb file instead of *.cif, AlphaPulldown will first try to parse the PDB code from the file. Then it will check if the filename is 4-letter long. If it is not, it will generate a random 4-letter code and use it as the PDB code.
 Now run:
 ```bash
   create_individual_features_with_templates.py \
     --description_file=description.csv \
-    --path_to_fasta=fastas/ \
+    --fasta_paths=fastas/P03496.fasta,fastas/P23726.fasta \
     --path_to_mmt=templates/ \
     --data_dir= \
     --save_msa_files=False \
@@ -30,7 +31,7 @@ Now run:
     --max_template_date=2050-01-01 \
     --skip_existing=False --seq_index=
 ```
-
+It is also possible to combine all your fasta files into a single fasta file.
 ```create_individual_features_with_templates.py``` will compute the features similarly to the create_individual_features.py, but will utilize the provided templates instead of the PDB database.
------------------------
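
The example_3.md change above requires the first column of description.csv to be an exact copy of the protein description in the fasta files. Copying headers by hand is easy to get wrong, so here is a minimal Python sketch of one way to build those rows. The helper names `fasta_headers` and `build_description` are hypothetical and are not part of AlphaPulldown. The sketch assumes each header has already been shortened to the form shown in the example and contains no commas, and the residues below are placeholders, not the real sequences.

```python
def fasta_headers(fasta_text):
    """Return each header line (starting with '>') verbatim, minus trailing whitespace."""
    return [line.rstrip() for line in fasta_text.splitlines() if line.startswith(">")]


def build_description(headers, template, chains):
    """Pair every header with the template file and one chain ID, one row per protein."""
    if len(headers) != len(chains):
        raise ValueError("need exactly one chain ID per fasta header")
    return "".join(
        f"{header}, {template}, {chain}\n" for header, chain in zip(headers, chains)
    )


# Placeholder fasta contents (assumption): headers already shortened, dummy residues.
ns1 = ">sp|P03496|NS1_I34A1\nMAAA\n"
p85b = ">sp|P23726|P85B_BOVIN\nMCCC\n"

headers = fasta_headers(ns1) + fasta_headers(p85b)
print(build_description(headers, "3L4Q.cif", ["A", "C"]), end="")
```

In practice the two strings would be read from the files passed to `--fasta_paths`, and the printed text would be written to description.csv. The chain IDs still have to be chosen by inspecting the template.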