OpenFold Doctor is an extension of the OpenFold project that adds inspection capabilities to protein structure prediction. Through a set of export options, it gives users deeper insight into the folding process:
- Export intermediate protein structures: Capture and export every intermediate structure produced during inference.
- Folding progression movie: Generate a "movie" of the folding trajectory, giving a visual representation of the whole process.
- MSA and pair representations:
  - MSA: Visualize the Multiple Sequence Alignment (MSA) representation as heatmaps.
  - Pair representation: Export heatmaps of the pair representation (pairwise residue features) at each iteration.
- Attention visualization: Export heatmaps of row-wise and column-wise attention.
To get started with OpenFold Doctor, follow these steps:
Begin by cloning the OpenFold Doctor repository from GitHub:
git clone https://github.com/KosinskiLab/openfold-doctor.git
Move into the cloned OpenFold Doctor directory and switch to the dr-dev branch:
cd openfold-doctor
git checkout dr-dev
Then continue with the original OpenFold installation process (pl_upgrades branch), but create the conda environment from the OpenFold Doctor environment.yml file instead.
Follow all the steps in the OpenFold installation guide so that the environment is set up correctly (e.g. CUDA 12, as required by the pl_upgrades branch) together with the new dependencies introduced by OpenFold Doctor.
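The clone, checkout, and environment steps above can be collected into a small bash script. This is a minimal sketch, assuming the clone lands in a directory named `openfold-doctor` (git's default for the URL above) and that `environment.yml` sits at the repository root; `install_openfold_doctor` and `run` are hypothetical helper names, and the remaining OpenFold pl_upgrades installation steps are not reproduced. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1 (useful to review the steps first).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_openfold_doctor() {
  # Assumption: the clone directory is git's default for this repository URL.
  local repo_url="https://github.com/KosinskiLab/openfold-doctor.git"
  local repo_dir="openfold-doctor"

  run git clone "$repo_url"
  run cd "$repo_dir"
  run git checkout dr-dev
  # Build the conda environment from the OpenFold Doctor environment.yml
  # instead of the one shipped with upstream OpenFold.
  run conda env create -f environment.yml
  # The remaining OpenFold (pl_upgrades) installation steps follow here.
}
```

Run `DRY_RUN=1 install_openfold_doctor` to preview the commands, then `install_openfold_doctor` to execute them. The environment name is defined inside environment.yml, so activate it afterwards with `conda activate <name>`.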
This section explains how to use each of the extensions that OpenFold Doctor adds to the pl_upgrades branch of the official OpenFold repository.
Description: Capture and save all intermediate protein structures generated during the inference process.
How to:
When running inference, add the --intermediate_structures_export flag to export the intermediate structures in .cif format.
python run_pretrained_openfold.py [your usual openfold flags] --intermediate_structures_export
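To check what was exported, a helper like the one below lists the `.cif` files under the output directory. This README does not specify the file names or subfolder used for the intermediate structures, so this bash sketch simply searches the whole output tree; `list_intermediate_cifs` and `count_intermediate_cifs` are hypothetical helper names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print every .cif file below an OpenFold output directory,
# sorted by path. Assumption: intermediate structures are written somewhere under
# the directory passed via --output_dir; their exact names are not documented here.
list_intermediate_cifs() {
  local out_dir="$1"
  find "$out_dir" -type f -name '*.cif' | LC_ALL=C sort
}

# Count the .cif files, e.g. to see how many structures a run wrote.
count_intermediate_cifs() {
  list_intermediate_cifs "$1" | wc -l | tr -d ' '
}
```

For example, `count_intermediate_cifs "$OUTPUT_DIR"`. If --cif_output is also set, the final predicted model is a .cif file too and is included in the count.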
To generate a movie of the protein structure's evolution during inference, use the following additional flags:
- --protein_movie: enables movie generation
- --frame_duration_seconds N: duration of each frame in seconds (optional; default: 1)
- --low_res_movie: exports low-resolution PNG frames for a low-resolution movie (optional; default: False)
- --keep_movie_data: preserves all intermediate files used to build the movie, i.e. the PNG frames and PDB files (optional; default: False)
Example:
python run_pretrained_openfold.py [your usual openfold flags] --protein_movie --frame_duration_seconds 1.5 --low_res_movie --keep_movie_data
--intermediate_structures_export can be omitted when using --protein_movie.
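If you script your runs, the movie flags can be assembled in one place. A minimal bash sketch: `movie_flags` is a hypothetical helper (not part of OpenFold Doctor) that emits the flags documented above and rejects a frame duration that is not a positive number; its 1-second default mirrors the documented default.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the OpenFold Doctor movie flags on one line.
#   $1  frame duration in seconds (default: 1)
#   $2  "yes" to add --low_res_movie
#   $3  "yes" to add --keep_movie_data
movie_flags() {
  local duration="${1:-1}" low_res="${2:-no}" keep="${3:-no}"
  local num_re='^[0-9]*\.?[0-9]+$'   # plain decimal such as 1, 0.5 or 1.5
  local zero_re='^0*\.?0*$'          # any spelling of zero, e.g. 0 or 0.0

  if ! [[ $duration =~ $num_re ]] || [[ $duration =~ $zero_re ]]; then
    echo "invalid frame duration: $duration" >&2
    return 1
  fi

  local flags=(--protein_movie --frame_duration_seconds "$duration")
  if [ "$low_res" = yes ]; then flags+=(--low_res_movie); fi
  if [ "$keep" = yes ]; then flags+=(--keep_movie_data); fi
  printf '%s\n' "${flags[*]}"
}
```

For example, `python run_pretrained_openfold.py [your usual openfold flags] $(movie_flags 1.5 yes yes)` adds the movie flags with 1.5-second frames, low-resolution output, and kept intermediate files. The command substitution is left unquoted on purpose so that each flag becomes a separate argument.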
Description: Capture the MSA and pair representations before and after they are processed by the Evoformer stack, export them as heatmaps, and optionally generate movies showing their evolution during inference.
How to:
When running inference, add the --representation_export flag to export the MSA and pair representations as PNG heatmaps. The files are saved in two separate folders, msa and pair, under your main output folder.
python run_pretrained_openfold.py [your usual openfold flags] --representation_export
To generate a movie of the evolution of the MSA and pair representations during inference, use the following additional flag:
- --representations_movie: enables movie generation
The MSA and pair representation movies are saved in the msa and pair folders, respectively, alongside the PNG heatmaps.
Figure: 1D3Z (ubiquitin) MSA (top) and pair (bottom) representation heatmaps before and after the first two recycles.
Description: Export a PNG plot of the MSA coverage.
How to: Use the --plot_msa_coverage flag to save the MSA coverage plot in the alignment folder under the main output folder.
python run_pretrained_openfold.py [your usual openfold flags] --plot_msa_coverage
Figure: MSA coverage plots for CASP14 target T1082 (left) and 1D3Z (ubiquitin; right).
Description: Export heatmaps of the row-wise and column-wise attention.
How to: Use the --attention_export flag to save the plots in the attn folder under the main output folder.
python run_pretrained_openfold.py [your usual openfold flags] --attention_export
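After a run with the visualization flags, a quick way to confirm that each export worked is to count the PNG files in the msa, pair, alignment, and attn folders. A minimal bash sketch, assuming these folders sit directly under the directory passed as --output_dir, as described above; `summarize_doctor_pngs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: report how many PNG files each OpenFold Doctor output
# folder contains. A folder that does not exist (e.g. because the matching
# flag was not used) is reported as "missing".
summarize_doctor_pngs() {
  local out_dir="$1" folder count
  for folder in msa pair alignment attn; do
    if [ -d "$out_dir/$folder" ]; then
      count="$(find "$out_dir/$folder" -type f -name '*.png' | wc -l | tr -d ' ')"
      echo "$folder: $count png"
    else
      echo "$folder: missing"
    fi
  done
}
```

For example, `summarize_doctor_pngs "$OUTPUT_DIR"` prints one line per folder.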
Figure: MSA column-wise attention heatmaps for CASP14 target T1082.
Description: Export the MSA as FASTA files before and after filtering and embedding.
How to: Use the --msa_fasta_export flag to save the FASTA files in the alignment folder under the main output folder.
python run_pretrained_openfold.py [your usual openfold flags] --msa_fasta_export
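Comparing the exported FASTA files is a simple way to see how many sequences filtering removed. A minimal bash sketch, assuming the exported files end in `.fasta` or `.fa` (their names are not given here); each record is counted by its `>` header line, and `count_msa_sequences` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "<file name><TAB><sequence count>" for every
# FASTA file in <output_dir>/alignment, counting lines that start with '>'.
count_msa_sequences() {
  local align_dir="$1/alignment" file n
  if [ ! -d "$align_dir" ]; then
    echo "no alignment folder in $1" >&2
    return 1
  fi
  while IFS= read -r file; do
    # grep -c prints 0 but exits 1 when nothing matches; keep the 0.
    n="$(grep -c '^>' "$file" || true)"
    printf '%s\t%s\n' "$(basename "$file")" "$n"
  done < <(find "$align_dir" -type f \( -name '*.fasta' -o -name '*.fa' \) | LC_ALL=C sort)
}
```

For example, `count_msa_sequences "$OUTPUT_DIR"` lists each exported file with its number of sequences.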
Full example using the OpenFold Doctor options in a standard OpenFold run:
python run_pretrained_openfold.py \
$INPUT_FASTA_DIR \
$TEMPLATE_MMCIF_DIR \
--output_dir $OUTPUT_DIR \
--config_preset model_1_ptm \
--uniref90_database_path $BASE_DATA_DIR/uniref90/uniref90.fasta \
--mgnify_database_path $BASE_DATA_DIR/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path $BASE_DATA_DIR/pdb70/pdb70 \
--uniclust30_database_path $BASE_DATA_DIR/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--bfd_database_path $BASE_DATA_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--model_device "cuda:0" \
--hhblits_binary_path $HHSUITE_BASE_DIR/hhblits \
--hhsearch_binary_path $HHSUITE_BASE_DIR/hhsearch \
--kalign_binary_path $KALIGN_PATH \
--save_outputs \
--cif_output \
--skip_relaxation \
--protein_movie \
--frame_duration_seconds 0.5 \
--low_res_movie \
--keep_movie_data \
--representations_movie \
--attention_export \
--plot_msa_coverage \
--msa_fasta_export
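A misspelled flag is easy to miss in a command this long. If the script rejects unknown arguments, as Python's argparse does by default with parse_args(), the typo only surfaces when the job starts. The bash sketch below checks flag names against the script's --help text first; `check_flags` is a hypothetical helper, and it assumes each flag appears in the help output as a standalone token (argparse lists every option there).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read a script's --help text on stdin and report every
# flag passed as an argument that does not appear in it as a whole token.
# Returns 1 if at least one flag is missing.
check_flags() {
  local help_text flag missing=0
  help_text="$(cat)"
  for flag in "$@"; do
    # The flag must not be preceded or followed by a letter, digit, '_' or '-',
    # so "--low_res" does not match "--low_res_movie".
    if ! grep -qE -e "(^|[^[:alnum:]_-])${flag}([^[:alnum:]_-]|\$)" <<< "$help_text"; then
      echo "unknown flag: $flag" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `python run_pretrained_openfold.py --help | check_flags --protein_movie --frame_duration_seconds --attention_export --plot_msa_coverage --msa_fasta_export`. Passing a flag whose spelling you are unsure of shows whether your installed version accepts it.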
- OpenFold repository: https://github.com/aqlaboratory/openfold
- OpenFold Doctor releases: https://github.com/KosinskiLab/openfold/releases
- Issues and support: If you encounter any issues, feel free to open an issue on the OpenFold Doctor GitHub repository.
Contributions are welcome! Please fork the repository and submit a pull request with your enhancements or bug fixes.