No crosslinks satisfied #9
I think this may be due to crosslinks that are provided as input as both A B and B A. When I normalize B A to be A B before generating the dictionary, I am getting the expected crosslink satisfaction (I believe it's the same prediction).
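A minimal sketch of that normalization step, assuming each crosslink is held in memory as a `(chain_1, residue_1, chain_2, residue_2, fdr)` tuple. The tuple layout and the function name are hypothetical and are not AlphaLink2's own input format or code; keeping the lower FDR when a link appears in both orientations is likewise an arbitrary choice.

```python
from typing import List, Tuple

# Hypothetical in-memory representation: (chain_1, residue_1, chain_2, residue_2, fdr).
Crosslink = Tuple[str, int, str, int, float]


def normalize_crosslinks(crosslinks: List[Crosslink]) -> List[Crosslink]:
    """Put both ends of every crosslink in a canonical order and drop duplicates."""
    seen = {}
    for chain_1, res_1, chain_2, res_2, fdr in crosslinks:
        # Sort the two ends so that "B A" and "A B" map to the same key.
        end_1, end_2 = sorted([(chain_1, res_1), (chain_2, res_2)])
        key = (end_1, end_2)
        # Assumption: if a link is listed twice, keep the lower FDR.
        if key not in seen or fdr < seen[key]:
            seen[key] = fdr
    return [(a[0], a[1], b[0], b[1], fdr) for (a, b), fdr in sorted(seen.items())]
```

Running this over the input before building the crosslink dictionary leaves a single orientation per residue pair.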
I see that we can run into issues if you have a mix of A B and B A crosslinks. It should be fixed now. Thanks for reporting it! Note that at the moment we only report the inter-protein crosslink satisfaction (CA-CA). I should clarify it or report both.
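For checking a model independently, a CA-CA satisfaction measure along these lines can be sketched as follows. This is not AlphaLink2's reporting code: the data layout, the function name, and the choice to treat "inter-protein" as "different chain IDs" are all assumptions, and the 25 A default comes from the distance the networks expect according to this thread.

```python
import math
from typing import Dict, List, Tuple

Residue = Tuple[str, int]            # (chain ID, residue number)
Coord = Tuple[float, float, float]   # CA position in Angstrom


def crosslink_satisfaction(ca_coords: Dict[Residue, Coord],
                           crosslinks: List[Tuple[Residue, Residue]],
                           cutoff: float = 25.0,
                           inter_only: bool = True) -> float:
    """Return the fraction of crosslinks whose CA-CA distance is below `cutoff`."""
    considered = satisfied = 0
    for end_1, end_2 in crosslinks:
        if inter_only and end_1[0] == end_2[0]:
            continue  # assumption: same chain ID means intra-protein, so skip it
        considered += 1
        if math.dist(ca_coords[end_1], ca_coords[end_2]) < cutoff:
            satisfied += 1
    # With no links left to consider, report 0.0 rather than dividing by zero.
    return satisfied / considered if considered else 0.0
```

Calling it with `inter_only=False` reports the satisfaction over all links, which makes it easy to compare the inter-protein number against the overall one.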
Great, thanks. Yes, I also confirmed this after more testing but was slow to write back. That makes sense, as the percentage of crosslinks satisfied was still lower than expected when I measured it. (Not critical, since we will generate an XL satisfaction report so we can see the satisfaction per input XL for the best model.)
edit: Is it possible to describe how the FDR has an impact? When I used a constant value across XLs, it did not seem to have one. We are exploring using the upstream data to better assign confidence; should we expect a larger impact if the FDR varies within the set of crosslinks? And finally, is there a way to set a given crosslink as a true restraint (I take it this is what AL1 did)?
edit2: One more bug I believe I encountered: if only B-B crosslinks are provided, the output looks the same as when no crosslinks are provided.
The FDR is included as a bias to allow the network to better weigh the information. It's rather hard to determine the impact of the FDR because there are many different factors at play (e.g., the co-evolutionary information) that influence the "final" weight/likelihood of a crosslink. However, we have seen that with higher FDRs (i.e., 20%) the network gets a little more cautious, which may result in lower crosslink satisfaction.
No, it's currently not possible to enforce a constraint. The v3 network has seen a very small number of crosslinks with FDR 0, but my guess is that setting the FDR to 0 for individual links will have a rather limited effect. In AL1 it is possible to force constraints to some degree with the distogram network. One thing that you could try, which is a little hacky but worked well for me with AL1, is to poke holes into the MSA for the crosslink you want to enforce. Say you have a crosslink at A i B j 0. Zero out the MSA in A.feature.pkl.gz at (i-1) +- 2 and in B.feature.pkl.gz at (j-1) +- 2; do the same in uniprot.pkl.gz.
I fixed a bug where homomeric crosslinks with small sequence separations (< 6 AA) were skipped. Was this maybe the issue? Otherwise, it works fine for me.
Hi all! Thank you for AlphaLink2, it is a very useful tool for modeling. I have had some problems satisfying XL distances. I have a trimer system A-2B with one XL linking structured regions and two others linking an IDR with a structured region. In a first test using the complete system, I did not obtain the expected XL distances (XL distances ~ 50 A). So I simplified the system to a dimer A-B with only one XL (between structured regions). In a second test, on the dimer system, I zeroed out the MSA in the features.pkl.gz and uniprot.pkl.gz files. In a third test, I also tested the alphalink2 cut-off. In all tests, I obtained very similar interfaces, with CA-CA distances close to 50 A. I would like to know whether, even when forcing the XL by removing the MSA information for the residues in the XL, it is possible that the 25 A distance is still not satisfied. Is there something you would suggest testing? Could you help me? Thank you in advance, Gabriel
Hi Gabriel, Yes, it's possible that removing the MSA information for these particular residues doesn't force the constraint. So far I have only tested it with the distogram network in the monomer version of AlphaLink. That network was trained in a different way, which makes it possible in some cases. In general, since AlphaLink is integrative, it always takes all of the information into consideration (sequence, MSA, template, crosslinks), and the other information might simply overpower the crosslink information. To truly force constraints, they would need to be enforced in the loss during training. Do your results vary between networks? The v3 network might work a little better for forcing constraints if you supply an FDR of 0. You could also try to increase the window size for removing the MSA information, e.g., up to +- 3 residues. What do you mean by "also tested the alphalink2 cut-off"? We will hopefully release a distogram network for AlphaLink2 soon-ish, which might work better for your use case. Your expected distance is < 25 A? Removing the disordered parts was a good idea; AlphaFold, and by extension AlphaLink, struggle a lot with these.
Thank you for the quick answer!
Right! In the case of having only inter-chain XLs, when no XL is satisfied, could this suggest that the XLs have a low probability of occurring?
I performed the following tests with only one XL and the full sequence. When I zero out the MSA residues, I run the MSA, zero out the selected residues, and then run inference.py.
inference.py has an option "--cutoff". I meant that option.
Yes, I expected that. Does it make sense?
I did: in all cases the XL distance is longer than 40 A. Do you have any other suggestions? Thank you in advance
Hi, Sorry for the late response!
What does the prediction look like, just two chains floating in space? If the XLs don't have any support in the MSAs, it might be hard to satisfy them. The distogram network allows overconstraining in this case; that usually helps to bring the structures closer, but still might not be enough to build a proper interface. I will try to upload the network in the next two weeks.
This option only changes the cutoff of the satisfaction computation, but doesn't affect the actual prediction.
Yes, the networks expect < 25A.
Only to try again once the distogram network is uploaded and then maybe overconstrain with 10 A.
I ran into a similar case where no crosslink was satisfied. My question is how to poke holes in, or zero out, the MSA at specific positions in the pkl.gz files? Thanks!
Samuel
Sorry, I haven't automated it. You would need to load and manipulate the feature files (same for uniprot). E.g.,
if you have crosslink A 5 B 10 you should put gaps at these specific positions in the MSA (gap = 21). Something like this:
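A minimal sketch of such a manipulation, under stated assumptions: the feature file is a gzipped pickle holding a dict, the MSA sits under a key named `msa` (hypothetical, check your own files), and it is indexed as `msa[sequence][position]`. Whether the query row should also be masked is not stated in this thread, so the sketch leaves row 0 untouched by default. The helper names are illustrative and are not part of AlphaLink2.

```python
import gzip
import pickle

GAP = 21  # gap token, as stated above


def mask_msa_columns(msa, residue_number, window=2, keep_query=True):
    """Overwrite MSA columns around a 1-based residue number with gaps, in place.

    Works on a 2-D NumPy array or on a list of lists.
    """
    center = residue_number - 1                 # 1-based residue -> 0-based column
    lo = max(0, center - window)
    hi = min(len(msa[0]), center + window + 1)  # clip the window at the sequence ends
    first_row = 1 if keep_query else 0          # assumption: row 0 is the query sequence
    for row in msa[first_row:]:
        for column in range(lo, hi):
            row[column] = GAP
    return msa


def mask_feature_file(in_path, out_path, residue_number, window=2, msa_key="msa"):
    """Load a gzipped pickle, mask its MSA, and write the result to `out_path`."""
    with gzip.open(in_path, "rb") as handle:
        features = pickle.load(handle)
    mask_msa_columns(features[msa_key], residue_number, window)
    with gzip.open(out_path, "wb") as handle:
        pickle.dump(features, handle)
```

For the crosslink A 5 B 10, this would be called once with residue 5 on the feature file of chain A and once with residue 10 on the feature file of chain B, and again on the corresponding uniprot files. Writing to a separate `out_path` keeps the original features available for comparison.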
It's usually good to also put gaps in the surrounding positions, e.g., +- 2 residues. How many crosslinks do you have and what crosslinker are you using? It's sometimes hard to overturn the MSA if you have insufficient crosslink density.
Thanks!
I have tens of crosslinks between two subunits among four. As they did not work, I multiplied them by 9, meaning that each crosslink from i-A to j-B was expanded to 9 crosslinks, pairing each of (i-1, i, i+1)-A with each of (j-1, j, j+1)-B. However, the two subunits were still far from each other.
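The expansion described here can be sketched as follows, assuming a hypothetical `(chain_1, residue_1, chain_2, residue_2, fdr)` tuple per crosslink; this is an illustration of the procedure, not a recommended setting, since it did not bring the subunits together in this case.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical in-memory representation: (chain_1, residue_1, chain_2, residue_2, fdr).
Crosslink = Tuple[str, int, str, int, float]


def expand_crosslink(link: Crosslink, radius: int = 1) -> List[Crosslink]:
    """Pair every residue within `radius` of one end with every residue near the other end."""
    chain_1, res_1, chain_2, res_2, fdr = link
    offsets = range(-radius, radius + 1)
    return [
        (chain_1, res_1 + d1, chain_2, res_2 + d2, fdr)
        for d1, d2 in product(offsets, offsets)
        if res_1 + d1 >= 1 and res_2 + d2 >= 1  # drop positions before the first residue
    ]
```

With `radius=1` a link in the interior of both chains yields 9 links; the sketch does not check the upper chain length, so links next to a C-terminus need an extra bound.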
What's the expected distance of your crosslinks? If there is no support in the MSA, sometimes all the network can do is bring them closer to the boundary (~25 A).
The expected distances vary, from 10 A to 35 A.
Is there a difference between the v2 and v3 networks for AlphaLink2?
Hello, I explored al2 with some of the test cases in the paper (Rpoa-Rpoc) with good results. With our own XL data I do not get any crosslinks satisfied (multimer prediction, v2 network, 3 conditions, each condition 20-30 XLs). My impression was that at least some of the crosslinks would be satisfied, no? I added FDR 0.05 to all XLs.
edit: when I measure distances between xl residues of the prediction, I do see several are < 25A (I am measuring residues which may not be from Ca to Ca, but still)