
Add fetching from the AlphaFold structure database #492

Draft: wants to merge 4 commits into main

dacarlin

This PR adds the ability to fetch structures from the AlphaFold structure database by querying on UniProt accession.

  • new functionality in biotite.database.alphafold module for fetching structures via UniProt accession
  • new tests in tests/database/test_alphafold.py
  • [to do] documentation in the user guide
  • [to do] example in the examples folder
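A minimal sketch of the fetching described in the first bullet. It assumes AFDB's public file layout (https://alphafold.ebi.ac.uk/files/AF-<accession>-F1-model_v<version>.<ext>). This is not the PR's implementation, which defines its own `_fetch_url`. The name `fetch_sketch` and its `version` and `download` parameters are hypothetical. The return convention (a path when `target_path` is given, a file-like object otherwise, and a single item for a single ID) is modelled on `biotite.database.rcsb.fetch`.

```python
import io
import os
import urllib.request

# Assumption: AFDB's public download layout; the PR uses a different base URL
_FILE_URL = "https://alphafold.ebi.ac.uk/files/AF-{acc}-F1-model_v{version}.{ext}"


def fetch_sketch(ids, target_path=None, format="pdb", overwrite=False,
                 version=4, download=None):
    """Hypothetical AFDB fetcher mirroring the signature discussed in the PR."""
    if format not in ("pdb", "cif"):
        raise ValueError(f"Unsupported format '{format}'")
    if download is None:
        # Default downloader: plain HTTP GET, body decoded as text
        def download(url):
            with urllib.request.urlopen(url) as response:
                return response.read().decode("utf-8")

    single = isinstance(ids, str)
    if single:
        ids = [ids]
    results = []
    for acc in ids:
        acc = acc.upper()
        url = _FILE_URL.format(acc=acc, version=version, ext=format)
        if target_path is None:
            # No target directory: keep the file content in memory
            results.append(io.StringIO(download(url)))
            continue
        path = os.path.join(target_path, f"AF-{acc}-F1.{format}")
        # Only download if the file is missing or overwriting is requested
        if overwrite or not os.path.isfile(path):
            with open(path, "w") as file:
                file.write(download(url))
        results.append(path)
    return results[0] if single else results
```

Injecting `download` keeps the sketch testable offline. A real implementation would also need error handling for accessions without a prediction (e.g. an HTTP 404).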

Member

padix-key left a comment


Hi, thanks for implementing this long-sought feature. I know this is only a draft, but I had some time for a review, so I can already share some thoughts 😃.

@@ -0,0 +1,34 @@
name: Python Package using Conda
Member

I think this part of the PR is out of scope. Furthermore, testing a conda build is already done in the Biotite feedstock on conda-forge. I am in favor of flake8 formatting, but to me it is a separate topic; in particular, I would prefer to keep it strict and require proper flake8 formatting in all source files. This would require some effort to fix the flake8 findings across all source files.

_fetch_url = "https://alphafold.com/api/prediction"


def fetch(ids, target_path=None, format="pdb", overwrite=False, verbose=False):
Member

This is purely optional, but I think it would be nice to directly support AlphaFold identifiers as well. Furthermore, accepting the AlphaFold identifiers that the PDB assigns to its computed structure models (CSMs) would even enable synergy between the rcsb and alphafold interfaces.

While using official AlphaFold identifiers should be straightforward (only the v4 would need to be updated when a new version appears), using the identifiers from the PDB would require some string editing. From the PDB documentation:

Each CSM is assigned a specific ID in its source database and a prefix indicates the source of the model (“AF” for AlphaFold DB, "MA" for ModelArchive). AlphaFold DB identifiers are then followed by the UniProt accession number for the protein and by the fragment number (usually “F1”). However, in order to enable compatibility of the IDs with many of our services, including all of our APIs and visualization tools, we identify CSMs on RCSB.org using a modified version of the ID. This ID is used on the structure summary page, in searching for structures, in the search results page, and in various tools for 3D structure visualization and analysis. For example, for the AlphaFold structure AF-B3EWR1-F1, the RCSB.org assigned CSM ID is AF_AFB3EWR1F1 and is used in the query results page as shown in Figure 4.

Which type of identifier is present can be detected from whether it starts with AF_ (PDB), AF- (AlphaFold DB), or something else (UniProt).
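The detection rule above could look roughly like this. `to_afdb_id` is a hypothetical helper. Its patterns follow the two ID formats quoted from the PDB documentation (AF_AFB3EWR1F1 and AF-B3EWR1-F1). Mapping a bare UniProt accession to fragment F1 is an assumption based on the quote's "usually F1".

```python
import re

# RCSB CSM ID: "AF_AF" + UniProt accession + "F<fragment>", e.g. AF_AFB3EWR1F1
_RCSB_CSM_PATTERN = re.compile(r"^AF_AF(?P<acc>[A-Z0-9]+?)F(?P<frag>\d+)$")
# AlphaFold DB ID: "AF-<accession>-F<fragment>", e.g. AF-B3EWR1-F1
_AFDB_PATTERN = re.compile(r"^AF-(?P<acc>[A-Z0-9]+)-F(?P<frag>\d+)$")


def to_afdb_id(identifier):
    """Hypothetical helper: normalize PDB, AFDB or UniProt IDs to an AFDB ID."""
    identifier = identifier.strip().upper()
    if identifier.startswith("AF_"):
        match = _RCSB_CSM_PATTERN.match(identifier)
    elif identifier.startswith("AF-"):
        match = _AFDB_PATTERN.match(identifier)
    else:
        # Anything else is treated as a UniProt accession (assumed fragment 1)
        return f"AF-{identifier}-F1"
    if match is None:
        raise ValueError(f"Malformed AlphaFold identifier '{identifier}'")
    return f"AF-{match['acc']}-F{match['frag']}"
```

Normalizing everything to the AFDB form first means the download logic only has to handle a single identifier style.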

I'll leave it to you to decide whether we want this feature from the beginning.
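On the v4 that would need updating: one way to avoid hard-coding the model version is to read the download URL from the prediction endpoint the PR already targets. This sketch assumes the endpoint returns a JSON list of entries carrying `pdbUrl` and `cifUrl` fields, as AFDB's public prediction API does. `pick_model_url` is hypothetical.

```python
import json

# Assumption: each entry in the prediction response has these URL fields
_URL_KEYS = {"pdb": "pdbUrl", "cif": "cifUrl"}


def pick_model_url(response_text, format="pdb"):
    """Hypothetical helper: pick a model URL from a prediction-API response."""
    if format not in _URL_KEYS:
        raise ValueError(f"Unsupported format '{format}'")
    entries = json.loads(response_text)
    if not entries:
        raise ValueError("No prediction found for this identifier")
    # Use the first entry; the server-provided URL already encodes the version
    return entries[0][_URL_KEYS[format]]
```

The trade-off is one extra request per structure in exchange for never having to bump a version string in the code.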

Comment on lines +118 to +121




Member

Suggested change

Comment on lines +68 to +70



Member

Suggested change

padix-key marked this pull request as ready for review August 26, 2023 16:59
padix-key marked this pull request as draft August 26, 2023 17:03
padix-key
Member

Hi, are you planning to continue work on this PR? If not, I could finish the AFDB interface.

dacarlin
Author

dacarlin commented Dec 9, 2024

Would be amazing if you’d finish it up @padix-key
