
True multimer #190

Merged
merged 134 commits into from
Nov 15, 2023
134 commits
7b99544
Update README.md
Sep 22, 2022
cc3a03e
Update README.md
Sep 22, 2022
dfe0887
Added alphafold just to have working version in one place
Jun 14, 2023
baf8538
True multimer 2.3.2
DimaMolod Jun 14, 2023
df08a70
Merge branch 'TrueMultimer-2.3.2' into 'master'
DimaMolod Jun 14, 2023
b987d28
New.
Jun 14, 2023
9981dfb
Merge branch 'master' of git.embl.de:grp-kosinski/AlphaPulldown
Jun 14, 2023
4425307
Garbage cleaning
Jun 16, 2023
0da6c21
AF,CB and af2plots are removed as directories to be added later as su…
Jul 3, 2023
84a19ae
Latest version of AP from github
Jul 3, 2023
5f20d21
Submodules are removed
Jul 3, 2023
5fea1d6
Added submodules, alphafold is with TM changes
Jul 3, 2023
0aee614
Update remote URL for AF submodule
Jul 3, 2023
fe5844d
Added TM features
Jul 3, 2023
2cc7edc
New.
DimaMolod Jul 3, 2023
215c47d
Reverted to Geoffrey's version
Jul 3, 2023
3223494
Merged with github
Jul 3, 2023
eccb0e5
Merging with github
Jul 3, 2023
3408173
Sync with github
DimaMolod Jul 21, 2023
d3a19ad
Sequential implementation for batch mode
DimaMolod Aug 1, 2023
2920c1d
Added new script for mmt
DimaMolod Aug 1, 2023
f803221
Parallel implementation
DimaMolod Aug 2, 2023
b2826e9
use create_db() also as a standalone script
DimaMolod Aug 2, 2023
bde8185
Multiple templates for each fasta file
DimaMolod Aug 3, 2023
68c4a0a
Keep templates with identical sequences for TrueMultimer
DimaMolod Aug 7, 2023
8f49d79
Reduce MSA depth for each next model if run with --multimeric_mode=True
DimaMolod Aug 8, 2023
3d497c9
use np.linspace() for msa depths
DimaMolod Aug 9, 2023
bb963be
c_i_f_w_t.py: Reuse functions from c_i_f.py; c_i_f.py: Remove mmt, ba…
DimaMolod Aug 10, 2023
f47546d
Remove temp files
DimaMolod Aug 14, 2023
7c6bc3d
Forking.md removed
DimaMolod Aug 14, 2023
c04827a
Commented out saving multichain mask used for debugging
DimaMolod Aug 14, 2023
086fa7f
Changed starting values for num_msa and num_extra_msa for TrueMultimer
DimaMolod Aug 14, 2023
2502017
First version of json metadata file. Added __version__
DimaMolod Aug 15, 2023
607835a
Cosmetic improvements
DimaMolod Aug 15, 2023
9fbd12a
New more detailed meta.json
DimaMolod Aug 16, 2023
4d41082
Save time to meta.json if mmseq2=True
DimaMolod Aug 17, 2023
e1a7ec6
Removed accidentally added colabfold script
DimaMolod Aug 21, 2023
252724f
Update .gitmodules
dingquanyu Aug 21, 2023
0394839
Revert file to version from main branch
DimaMolod Aug 22, 2023
8e3c1b8
Merge branch 'TrueMultimer' of github.com:KosinskiLab/AlphaPulldown i…
DimaMolod Aug 22, 2023
368f7d9
Revert files to version from main branch
DimaMolod Aug 22, 2023
4afc9d0
Update run_multimer_jobs.py
dingquanyu Aug 22, 2023
d92f8e6
Updated submodule URL
DimaMolod Aug 25, 2023
c30512c
Update to the latest AlphaFold version
DimaMolod Aug 25, 2023
c6bdbb1
Remove version for openmm and pdbfixer
DimaMolod Aug 25, 2023
cf9116b
Save only particular chain to db using biopython
DimaMolod Aug 25, 2023
06dd0ab
fix truncated file, not original
DimaMolod Aug 25, 2023
ed5a85a
metadata output is json
DimaMolod Aug 25, 2023
31bdd8e
no truncation of the templates by default
DimaMolod Aug 25, 2023
fba4baa
Remove all lines with UNK residue from the mmcif string for templates
DimaMolod Aug 28, 2023
a1bcea4
Replace entity_poly_seq table by full sequence for templates
DimaMolod Aug 31, 2023
26f2e3b
Update packages, python 3.1
DimaMolod Aug 31, 2023
b15bead
Update alphafold to the latest commit from the main branch
DimaMolod Sep 4, 2023
f2056f8
Update URL for colabfold
DimaMolod Sep 4, 2023
7852ba2
Update README.md
dingquanyu Sep 5, 2023
c6e74ae
Add a flag to model with different MSA depth. Switched off by default
DimaMolod Sep 6, 2023
627a188
Add AF version to meta.json
DimaMolod Sep 6, 2023
5e754e5
Simplified meta.json structure
DimaMolod Sep 8, 2023
19a6ee3
update the urls with https
DimaMolod Sep 11, 2023
f41ea6a
Fixed parsing seqs from cif when chain id is wrong
DimaMolod Sep 11, 2023
9a3a5bd
Unification: always pass first chain id, then seq
DimaMolod Sep 11, 2023
d28f8b5
Oops
DimaMolod Sep 11, 2023
e03e565
Modify to_mmcif() for TrueMultimer
DimaMolod Sep 13, 2023
d8a60ff
Report num_msa and num_extra_msa in gradient mode
DimaMolod Sep 13, 2023
fed7fc3
Adapt to_mmcif() for use with TrueMultimer
DimaMolod Sep 14, 2023
8d17eb5
Point to commit 768f35d
DimaMolod Sep 18, 2023
1b12c8b
Different MSA depth for different predictions
DimaMolod Sep 19, 2023
a3b9165
Use a custom class for handling biopython structure
DimaMolod Sep 19, 2023
e34b950
PEP8
DimaMolod Sep 19, 2023
0a19074
fix bug: wrong logic with author chain
DimaMolod Sep 19, 2023
e454652
pytest for create_fake_database.py
DimaMolod Sep 26, 2023
2169185
Check number of atoms and that atoms contain only label ids from poly…
DimaMolod Sep 26, 2023
900fc12
Debug printing
DimaMolod Sep 26, 2023
2fc7dfd
Check integrity of _atom_site.label_seq_id in the template mmcifs
DimaMolod Sep 27, 2023
d4b20d7
Added test for a bare pdb template
DimaMolod Sep 27, 2023
cd42cba
Remove clashing/low_plddt residues from seqres_to_structure for consi…
DimaMolod Sep 27, 2023
ec4d364
run_multimer: added model_names and msa_depth flags
DimaMolod Sep 29, 2023
d0acd9e
New.
DimaMolod Sep 29, 2023
7a237e3
run_multimer.py: update --gradient_msa_depth description
DimaMolod Oct 2, 2023
95e4250
Update colabfold
DimaMolod Oct 2, 2023
2438c92
New.
DimaMolod Oct 2, 2023
4095d95
just renamed fake to custom :-)
DimaMolod Oct 4, 2023
979b30a
Copy templates to the custom templates to avoid conflicts with files …
DimaMolod Oct 4, 2023
a22dc69
check that features are generated for mmt using subprocess
DimaMolod Oct 4, 2023
a701936
check that template seq is not None, atom positions are not all zeros
DimaMolod Oct 4, 2023
df5e165
Check non-zero coordinates for backbone atoms for non-gap residues in…
DimaMolod Oct 5, 2023
899d1f3
Parse seqres from pdb template using SeqIO
DimaMolod Oct 6, 2023
772e21e
Residues with non-zero atom coords are identical for seqatom and seqres
DimaMolod Oct 6, 2023
7eab024
benchmark models for testing truemultimer
DimaMolod Oct 9, 2023
ab94027
Only for identical number of atoms for tests
DimaMolod Oct 9, 2023
2411d57
Create uniprot_runner, otherwise test always fails
DimaMolod Oct 9, 2023
5ae2377
Remove unused imports
DimaMolod Oct 9, 2023
9781549
Added test for truemultimer
DimaMolod Oct 9, 2023
b99d416
Run TM tests on 3L4Q
DimaMolod Oct 9, 2023
bb258fc
test/check_predict_structure.py
DimaMolod Oct 10, 2023
922c7a2
Check that the best RMSD from 3 models is below 3A
DimaMolod Oct 10, 2023
89fd217
Tests for custom msa depth
DimaMolod Oct 10, 2023
60c5f9a
New
DimaMolod Oct 10, 2023
dfe6855
Bumped to v1.00.0
DimaMolod Oct 10, 2023
8441991
sync names with the main branch
DimaMolod Oct 10, 2023
3615451
Merge remote-tracking branch 'origin/main' into TrueMultimer
DimaMolod Oct 10, 2023
a3a7c61
Updated pipeline
DimaMolod Oct 16, 2023
8d24dff
Merge branch 'main' into TrueMultimer
dingquanyu Oct 16, 2023
28d9df4
Merge branch 'main' of github.com:KosinskiLab/AlphaPulldown into True…
DimaMolod Oct 16, 2023
1356255
Fix bug in create_model_runners for monomers
DimaMolod Oct 16, 2023
5b65e3f
Generate random name for pdb without header and fn>4. More tests
DimaMolod Oct 17, 2023
0f8eea2
1.Correct meta.json and 2.test for pdb with crazy filename
DimaMolod Oct 18, 2023
a9801af
Fix assert: code for pdbs is generated randomly each time
DimaMolod Oct 18, 2023
d48e0cc
All tests are run with a single command
DimaMolod Oct 19, 2023
79055cb
Added requirements for pdb filenames
DimaMolod Oct 19, 2023
966633c
Major update of dependencies and structure
DimaMolod Oct 20, 2023
49d7dd1
New. For developing
DimaMolod Oct 23, 2023
cf36ac8
temp disable falling tests
DimaMolod Oct 24, 2023
d340c4e
New. environment.yml for development, meta.yaml for installation
DimaMolod Oct 24, 2023
78b2eee
Fix wrong path to pickles in test 6
DimaMolod Oct 24, 2023
dba0f4c
Push to pypi-test on release
DimaMolod Oct 24, 2023
457aa43
New. Print docstrings upon execution
DimaMolod Oct 24, 2023
83faf52
All sub modules must be added explicitly to setup.cfg
DimaMolod Oct 25, 2023
6c3edbe
update template searching process when running mmseq2 mode and using …
dingquanyu Oct 27, 2023
cc48155
Added the failing gappy PDB custom template + minor fix in test_featu…
jkosinski Nov 5, 2023
4de40e2
Symlink individual files instead of the entire directory
jkosinski Nov 5, 2023
accd168
Merge pull request #197 from KosinskiLab/gappy_pdb
DimaMolod Nov 7, 2023
69bf6fa
Changes to meta.json
DimaMolod Nov 8, 2023
c6c9bef
revert to load_module for parsing af flags
DimaMolod Nov 13, 2023
404fd65
Fix for gappy test pdb
DimaMolod Nov 13, 2023
35c6240
Clean dynamically created files for test_features_with_templates and …
DimaMolod Nov 13, 2023
1ac7012
Use protein descriptions instead of fasta files in description.csv
DimaMolod Nov 14, 2023
98bc1ce
Use protein description instead of path to fasta. Updated instruction…
DimaMolod Nov 15, 2023
741aa0e
Fixed double removal of > in fasta description
DimaMolod Nov 15, 2023
759964e
Add example slurm script for features generation
DimaMolod Nov 15, 2023
54dd2bd
Added missing '\' to slurm scripts
DimaMolod Nov 15, 2023
aa8beb9
MmcifChainFiltered takes now pathlib, not string
DimaMolod Nov 15, 2023
768fc68
Checkout from main
DimaMolod Nov 15, 2023
0ed62cd
Merge branch 'main' into TrueMultimer
DimaMolod Nov 15, 2023
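Two commits in this list describe reducing the MSA depth for each successive model when running with `--multimeric_mode=True`, with the depths computed by `np.linspace()`. The sketch below illustrates that evenly spaced schedule under stated assumptions: the starting depths (512 for `num_msa`, 2048 for `num_extra_msa`), the floor of 16 and the five models are hypothetical values chosen for illustration, not the ones the PR uses, and `msa_depth_schedule` is a hypothetical plain-Python helper standing in for NumPy.

```python
def msa_depth_schedule(start, stop, num_models):
    """Return `num_models` evenly spaced integer MSA depths from `start` to `stop`.

    Plain-Python stand-in for the np.linspace() idea: both endpoints are
    included and intermediate values are truncated with int().
    """
    if num_models < 1:
        raise ValueError("num_models must be at least 1")
    if num_models == 1:
        return [int(start)]
    return [int(start + (stop - start) * i / (num_models - 1))
            for i in range(num_models)]


# Hypothetical values for illustration only; the PR does not state these.
NUM_MSA_START = 512
NUM_EXTRA_MSA_START = 2048
MIN_DEPTH = 16
NUM_MODELS = 5

num_msa_depths = msa_depth_schedule(NUM_MSA_START, MIN_DEPTH, NUM_MODELS)
num_extra_msa_depths = msa_depth_schedule(NUM_EXTRA_MSA_START, MIN_DEPTH, NUM_MODELS)

for model_idx, (num_msa, num_extra_msa) in enumerate(
        zip(num_msa_depths, num_extra_msa_depths)):
    print(f"model {model_idx}: num_msa={num_msa}, num_extra_msa={num_extra_msa}")
```

With these inputs the first model sees the deepest MSA and each later model a shallower one (512, 388, 264, 140, 16 for `num_msa`), ending at the floor.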
12 changes: 5 additions & 7 deletions Developing.md
@@ -18,17 +18,15 @@
1. Test your package during development using tests in ```test/```, e.g.:
```
pip install pytest
-pytest
-pytest test
-python test/test_predict_structure.py
-sbatch test/test_predict_structure.sh
-python -m unittest test/test_predict_structure.<name of the test>
+pytest -s test/
+pytest -s test/test_predictions_slurm.py
+pytest -s test/test_features_with_templates.py::TestCreateIndividualFeaturesWithTemplates::test_1a_run_features_generation
```
1. Before pushing to the remote or submitting pull request
```
pip install .
-pytest test
+pytest -s test/
```
-to install the package and test
+to install the package and test. Pytest for predictions only work if slurm is available. Check the created log files in your current directory.


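The updated Developing.md notes that the prediction tests only work when SLURM is available. One way to make that explicit is to skip such tests when `sbatch` is not on `PATH`. This is a sketch using the standard library's `unittest.skipUnless`, which pytest also honors; the class and test names are hypothetical placeholders, not the tests in `test/test_predictions_slurm.py`.

```python
import shutil
import unittest


def slurm_available():
    """Return True if the SLURM `sbatch` command can be found on PATH."""
    return shutil.which("sbatch") is not None


class TestPredictionsWithSlurm(unittest.TestCase):
    """Hypothetical placeholder for a SLURM-dependent prediction test."""

    @unittest.skipUnless(slurm_available(), "SLURM (sbatch) is not available")
    def test_slurm_is_reachable(self):
        # A real test would submit a prediction job with sbatch and then
        # inspect the log files written to the current directory.
        self.assertTrue(slurm_available())
```

Because the condition is evaluated when the module is imported, running `pytest -s test/` on a machine without SLURM reports this test as skipped rather than failed.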
3 changes: 1 addition & 2 deletions MANIFEST.in
@@ -1,2 +1 @@
-include ./alphafold/run_alphafold.py
-include stereo_chemical_props.txt
+include stereo_chemical_props.txt
3 changes: 2 additions & 1 deletion README.md
@@ -55,7 +55,8 @@ conda create -n AlphaPulldown -c omnia -c bioconda -c conda-forge python==3.10 o
**Secondly**, activate the AlphaPulldown environment and install AlphaPulldown
```bash
source activate AlphaPulldown
-python3 -m pip install alphapulldown==0.40.4
+
+python3 -m pip install alphapulldown==1.0.0
pip install jax==0.3.25 jaxlib==0.3.25+cuda11.cudnn805 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```

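The README now pins `alphapulldown==1.0.0`, and `__version__` changes from `"1.00.0"` to `"1.0.0"` to match. PEP 440 interprets each numeric release component with `int()`, so the two spellings denote the same release, but writing the normalized form keeps the string identical everywhere it appears. A minimal sketch of that normalization for plain dotted releases; `normalize_release` is a hypothetical helper that deliberately ignores pre-, post- and dev-release suffixes (real code would typically use `packaging.version.Version`).

```python
def normalize_release(version):
    """Normalize a plain dotted release such as "1.00.0" as PEP 440 does.

    Each component is passed through int() and back to str(), which drops
    leading zeros. Pre-, post- and dev-release suffixes are not handled.
    """
    parts = version.split(".")
    if not all(part.isdecimal() for part in parts):
        raise ValueError(f"not a plain release version: {version!r}")
    return ".".join(str(int(part)) for part in parts)


print(normalize_release("1.00.0"))  # 1.0.0
print(normalize_release("0.40.4"))  # 0.40.4
```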
2 changes: 1 addition & 1 deletion alphapulldown/__init__.py
@@ -1 +1 @@
-__version__ = "1.00.0"
+__version__ = "1.0.0"
73 changes: 15 additions & 58 deletions alphapulldown/create_custom_template_db.py
@@ -10,13 +10,13 @@

import os
import shutil
import sys
import random
import string
from pathlib import Path
from absl import logging, flags, app
from alphapulldown.remove_clashes_low_plddt import MmcifChainFiltered
from colabfold.batch import validate_and_fix_mmcif, convert_pdb_to_mmcif
from colabfold.batch import validate_and_fix_mmcif
from alphafold.common.protein import _from_bio_structure, to_mmcif
from Bio import SeqIO, PDB

FLAGS = flags.FLAGS

@@ -47,10 +47,11 @@ def parse_code(template):
for line in f:
if line.startswith("_entry.id"):
code = line.split()[1]
if len(code) != 4:
logging.error(f'Error for template {template}!\n'
f'Code must have 4 characters but is {code}\n')
sys.exit(1)

# Generate a random 4-character code if needed
if len(code) != 4:
code = ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(4))

return code.lower()
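The new `parse_code()` logic no longer exits when `_entry.id` is not exactly 4 characters long; it falls back to a random 4-character alphanumeric code instead. The sketch below isolates that rule. `normalize_template_code` and its optional `rng` argument are hypothetical additions for testability, and the `None` guard is an assumption that is not present in the diff.

```python
import random
import string

# Same alphabet as the diff: ASCII letters plus digits.
CODE_ALPHABET = string.ascii_letters + string.digits


def normalize_template_code(code, rng=random):
    """Return a lowercase 4-character template code.

    Keeps `code` if it is exactly 4 characters long; otherwise generates a
    random alphanumeric code, mirroring the fallback added to parse_code().
    The `code is None` check is an extra assumption, not part of the diff.
    """
    if code is None or len(code) != 4:
        code = "".join(rng.choice(CODE_ALPHABET) for _ in range(4))
    return code.lower()


print(normalize_template_code("3L4Q"))  # 3l4q
print(normalize_template_code("my_crazy_template_name"))  # a random 4-character code
```

Because the result is lowercased, the effective alphabet has 36 characters (a-z, 0-9), so a random code has 36**4 = 1,679,616 possible values and two templates can, rarely, receive the same code.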


@@ -90,40 +91,6 @@ def create_tree(pdb_mmcif_dir, mmcif_dir, seqres_dir, templates_dir):
create_dir_and_remove_files(seqres_dir, ['pdb_seqres.txt'])


def extract_seqs(template, chain_id):
"""
Extract sequences from PDB/CIF file using Bio.SeqIO.
o input_file_path - path to the input file
o chain_id - chain ID
Returns:
o sequence_atom - sequence from ATOM records
o sequence_seqres - sequence from SEQRES records
"""
file_type = template.suffix.lower()

if template.suffix.lower() != '.pdb' and template.suffix.lower() != '.cif':
raise ValueError(f"Unknown file type for {template}!")

format_types = [f"{file_type[1:]}-atom", f"{file_type[1:]}-seqres"]
# initialize the sequences
sequence_atom = None
sequence_seqres = None
# parse
for format_type in format_types:
for record in SeqIO.parse(template, format_type):
chain = record.annotations['chain']
if chain == chain_id:
if format_type.endswith('atom'):
sequence_atom = str(record.seq)
elif format_type.endswith('seqres'):
sequence_seqres = str(record.seq)
if sequence_atom is None:
logging.error(f"No atom sequence found for chain {chain_id}")
if sequence_seqres is None:
logging.warning(f"No SEQRES sequence found for chain {chain_id}")
return sequence_atom, sequence_seqres


def create_db(out_path, templates, chains, threshold_clashes, hb_allowance, plddt_threshold):
"""
Main function that creates a custom template database for AlphaFold2
@@ -146,30 +113,20 @@ def create_db(out_path, templates, chains, threshold_clashes, hb_allowance, pldd
# Process each template/chain pair
for template, chain_id in zip(templates, chains):
code = parse_code(template)
logging.info(f"Template code: {code}")
assert len(code) == 4
# Copy the template to out_path to avoid conflicts with the same file names
shutil.copyfile(template, templates_dir / Path(template).name)
template = templates_dir / Path(template).name
logging.info(f"Processing template: {template} Chain {chain_id} Code: {code}")
logging.info("Parsing SEQRES...")
atom_seq, seqres_seq = None, None
if template.suffix == '.pdb':
atom_seq, seqres_seq = extract_seqs(template, chain_id)
logging.info(f"Converting to mmCIF: {template}")
template = Path(template)
convert_pdb_to_mmcif(template)
template = template.parent.joinpath(f"{template.stem}.cif")
new_template = templates_dir / Path(code + Path(template).suffix)
shutil.copyfile(template, new_template)
template = new_template
logging.info(f"Processing template: {template} Chain {chain_id}")
# Convert to (our) mmcif object
mmcif_obj = MmcifChainFiltered(template, code, chain_id)
# Parse SEQRES
# full sequence is either SEQRES or parsed from (original) ATOMs
if mmcif_obj.sequence_seqres:
seqres = mmcif_obj.sequence_seqres
else:
seqres = mmcif_obj.sequence_atom
# if we converted from pdb, seqres is parsed from Bio.SeqIO
if seqres_seq or atom_seq:
seqres = seqres_seq
if seqres is None:
seqres = atom_seq
sqrres_path = save_seqres(code, chain_id, seqres, seqres_dir)
logging.info(f"SEQRES saved to {sqrres_path}!")
# Remove clashes and low pLDDT regions for each template
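With this change, `create_db()` appears to drop the removed `extract_seqs()` helper (Bio.SeqIO `pdb-atom`/`pdb-seqres` parsing) and the PDB-to-mmCIF conversion, and instead takes the full sequence from `MmcifChainFiltered`: SEQRES when present, otherwise the sequence derived from ATOM records. The sketch below restates that fallback rule for a PDB-format string using only the standard library. It is not the `MmcifChainFiltered` implementation: the helper names are hypothetical, column positions follow the fixed-width PDB format, only the 20 standard residues are mapped (others become `X`), and HETATM records such as selenomethionine are ignored.

```python
# Standard residue names only; anything else maps to "X".
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(resname):
    return THREE_TO_ONE.get(resname.strip().upper(), "X")


def chain_sequences(pdb_text, chain_id):
    """Return (atom_seq, seqres_seq) for one chain of a PDB-format string.

    Either element is None if the corresponding records are missing.
    Chain ID is column 12 for SEQRES and column 22 for ATOM records.
    """
    seqres, atom = [], []
    seen_residues = set()
    for line in pdb_text.splitlines():
        record = line[:6]
        if record == "SEQRES" and line[11:12] == chain_id:
            # Residue names start at column 20 and are separated by spaces.
            seqres.extend(to_one_letter(name) for name in line[19:].split())
        elif record == "ATOM  " and line[21:22] == chain_id:
            # Count each (residue number, insertion code) pair once.
            key = (line[22:26], line[26:27])
            if key not in seen_residues:
                seen_residues.add(key)
                atom.append(to_one_letter(line[17:20]))
    return "".join(atom) or None, "".join(seqres) or None


def full_sequence(atom_seq, seqres_seq):
    """Prefer the SEQRES sequence; fall back to the ATOM-derived one."""
    return seqres_seq if seqres_seq else atom_seq
```

The SEQRES sequence can be longer than the ATOM-derived one because it also covers residues that are unresolved in the coordinates, which is why it is preferred whenever it exists.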
51 changes: 7 additions & 44 deletions alphapulldown/create_individual_features.py
@@ -4,59 +4,27 @@
# This script is just to create msa and structural features for each sequences and store them in pickle
# #

import os
import pickle
import sys
from alphapulldown.objects import MonomericObject
import importlib
from absl import app
from absl import flags
from absl import logging

from alphafold.data.pipeline import DataPipeline
from alphafold.data.tools import hmmsearch
from alphafold.data import templates
import numpy as np
import os
from absl import logging, app
import numpy as np
from alphapulldown.utils import *
from alphapulldown.utils import save_meta_data, create_uniprot_runner, parse_fasta, get_flags_from_af
import contextlib
from datetime import datetime
import alphafold
from pathlib import Path
from colabfold.utils import DEFAULT_API_SERVER
import os
import sys
import pickle

@contextlib.contextmanager
def output_meta_file(file_path):
"""function that create temp file"""
with open(file_path, "w") as outfile:
yield outfile.name


def load_module(file_name, module_name):
spec = importlib.util.spec_from_file_location(module_name, file_name)
module = importlib.util.module_from_spec(spec)
sys.modules[module_name] = module
spec.loader.exec_module(module)
return module


PATH_TO_RUN_ALPHAFOLD = os.path.join(
os.path.dirname(alphafold.__file__), "run_alphafold.py"
)

try:
run_af = load_module(PATH_TO_RUN_ALPHAFOLD, "run_alphafold")
except FileNotFoundError:
PATH_TO_RUN_ALPHAFOLD = os.path.join(
os.path.dirname(os.path.dirname(alphafold.__file__)), "run_alphafold.py"
)

run_af = load_module(PATH_TO_RUN_ALPHAFOLD, "run_alphafold")


flags = run_af.flags
flags = get_flags_from_af()
flags.DEFINE_bool("save_msa_files", False, "save msa output or not")
flags.DEFINE_bool(
"skip_existing", False, "skip existing monomer feature pickles or not"
@@ -221,11 +189,7 @@ def create_and_save_monomer_objects(m, pipeline, flags_dict,use_mmseqs2=False):
else:
logging.info("running mmseq now")
m.make_mmseq_features(DEFAULT_API_SERVER=DEFAULT_API_SERVER,
pdb70_database_path=pdb70_database_path,
template_mmcif_dir=template_mmcif_dir,
max_template_date=FLAGS.max_template_date,
output_dir=FLAGS.output_dir,
obsolete_pdbs_path=FLAGS.obsolete_pdbs_path
pipeline=pipeline,output_dir=FLAGS.output_dir
)
pickle.dump(m, open(f"{FLAGS.output_dir}/{m.description}.pkl", "wb"))
del m
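Earlier in this file's diff, the hand-rolled `load_module()` helper, which imported AlphaFold's `run_alphafold.py` to reuse its absl flags and retried one directory up on `FileNotFoundError`, is replaced by `get_flags_from_af()` from `alphapulldown.utils`. For reference, the loading pattern can be sketched as below. The removed code only ran `import importlib`, which does not by itself guarantee that the `importlib.util` submodule is loaded, so this sketch imports `importlib.util` explicitly. `load_first_existing` is a hypothetical helper that checks candidate paths up front instead of catching the exception, and the demo module is a throwaway file.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_module(file_name, module_name):
    """Execute a Python source file as a module and register it in sys.modules."""
    spec = importlib.util.spec_from_file_location(module_name, file_name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot create a module spec for {file_name}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # registered before execution, as in the removed code
    spec.loader.exec_module(module)
    return module


def load_first_existing(candidates, module_name):
    """Load `module_name` from the first candidate path that is an existing file."""
    paths = [Path(candidate) for candidate in candidates]
    for path in paths:
        if path.is_file():
            return load_module(str(path), module_name)
    raise FileNotFoundError(f"none of these files exist: {paths}")


# Demo with a throwaway module: the first candidate is missing, the second exists.
with tempfile.TemporaryDirectory() as tmp_dir:
    script = Path(tmp_dir) / "demo_flags.py"
    script.write_text("ANSWER = 42\n")
    demo = load_first_existing([Path(tmp_dir) / "missing.py", script], "demo_flags")

print(demo.ANSWER)  # 42
```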
@@ -264,8 +228,7 @@ def main(argv):
)
sys.exit()
else:

pipeline=None
pipeline = create_pipeline()
uniprot_runner=None
flags_dict=FLAGS.flag_values_dict()

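At the end of `create_and_save_monomer_objects()`, each object is written with `pickle.dump(m, open(f"{FLAGS.output_dir}/{m.description}.pkl", "wb"))`, which leaves closing the file to the garbage collector. A sketch of the same save step with a context manager, so the handle is closed even if pickling fails; `save_feature_pickle` and `load_feature_pickle` are hypothetical helpers, and a plain dict stands in for the `MonomericObject`.

```python
import pickle
import tempfile
from pathlib import Path


def save_feature_pickle(obj, output_dir, description):
    """Pickle `obj` to <output_dir>/<description>.pkl and return the path.

    The `with` block closes (and flushes) the file even if pickling raises.
    """
    path = Path(output_dir) / f"{description}.pkl"
    with path.open("wb") as handle:
        pickle.dump(obj, handle)
    return path


def load_feature_pickle(path):
    """Read back an object written by save_feature_pickle()."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


# Demo: a plain dict stands in for a MonomericObject.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_feature_pickle({"description": "protein_A", "sequence": "MGS"},
                                     tmp_dir, "protein_A")
    restored = load_feature_pickle(saved_path)

print(saved_path.name)  # protein_A.pkl
print(restored["sequence"])  # MGS
```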