No crosslinks satisfied #9

roivant-matts · 2023-07-24T14:21:19Z

Hello, I explored AL2 with some of the test cases in the paper (Rpoa-Rpoc) with good results. With our own XL data I do not get any crosslinks satisfied (multimer prediction, v2 network, 3 conditions, each condition with 20-30 XLs). My impression was that at least some of the crosslinks would be satisfied, no? I set an FDR of 0.05 for all XLs.

edit: when I measure distances between the XL residues in the prediction, I do see that several are < 25 A (my measurements may not be from CA to CA, but still)

roivant-matts · 2023-07-24T22:12:41Z

I think this may be due to crosslinks being provided as input in both A B and B A order. When I normalize B A to A B before generating the dictionary, I get the expected crosslink satisfaction (I believe it's the same prediction).
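The normalization described above can be sketched as follows. This is a minimal illustration, not the fix that went into AlphaLink2: the function name and the tuple layout `(chain_a, res_a, chain_b, res_b, fdr)` are assumptions, and keeping the lower FDR when the same link appears in both orientations is an arbitrary choice. The column order of the actual input file should be checked against the AlphaLink2 README.

```python
def normalize_crosslinks(crosslinks):
    """Return crosslinks with both ends in a canonical order, duplicates merged.

    Each crosslink is a tuple (chain_a, res_a, chain_b, res_b, fdr).
    This layout is an assumption made for the sketch.
    """
    merged = {}
    for chain_a, res_a, chain_b, res_b, fdr in crosslinks:
        # Sort the two ends so that "B A" and "A B" map to the same key.
        end_1, end_2 = sorted([(chain_a, res_a), (chain_b, res_b)])
        key = (end_1, end_2)
        # Assumption: if a link is listed twice, keep the lower FDR.
        merged[key] = min(fdr, merged.get(key, fdr))
    return [
        (end_1[0], end_1[1], end_2[0], end_2[1], fdr)
        for (end_1, end_2), fdr in sorted(merged.items())
    ]
```

Running the crosslink list through such a step before building the crosslink dictionary makes the input independent of the order in which the two chains were written.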

lhatsk · 2023-07-25T09:48:15Z

I see that we can run into issues if you have a mix of A B and B A crosslinks. It should be fixed now. Thanks for reporting it!

Note that at the moment we only report the inter-protein crosslink satisfaction (CA-CA). I should clarify it or report both.
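To illustrate what an inter-protein CA-CA satisfaction figure measures, here is a minimal sketch. It is not AlphaLink2's own code: the function name and the `ca_coords` mapping are hypothetical, treating "different chain ID" as "inter-protein" is an assumption (it would also count links between copies of the same protein), and so is counting a distance exactly at the cutoff as satisfied.

```python
import math


def interprotein_satisfaction(ca_coords, crosslinks, cutoff=25.0):
    """Fraction of inter-chain crosslinks with a CA-CA distance <= cutoff (in A).

    ca_coords maps (chain, residue_number) -> (x, y, z) of the CA atom.
    crosslinks is a list of (chain_a, res_a, chain_b, res_b) tuples.
    Returns None if there are no inter-chain crosslinks.
    """
    distances = [
        math.dist(ca_coords[(chain_a, res_a)], ca_coords[(chain_b, res_b)])
        for chain_a, res_a, chain_b, res_b in crosslinks
        if chain_a != chain_b  # links within one chain are not counted
    ]
    if not distances:
        return None
    satisfied = sum(d <= cutoff for d in distances)
    return satisfied / len(distances)
```

Because links within a single chain are skipped, a number computed this way can differ from a manual check over all crosslinks.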

roivant-matts · 2023-07-25T13:40:45Z

Great, thanks. Yes, I also confirmed this after more testing but was slow to write back. That makes sense, as the % of crosslinks satisfied was still lower than expected from my measurements. (Not critical, since we will generate an XL satisfaction report so we can see the satisfaction per input XL for the best model.)

edit: is it possible to describe how the FDR has an impact? When I used a constant value across XLs, it does not seem to have one. We are exploring using the upstream data to better assign confidence. Should we expect this to have more of an impact if the FDR varies between crosslinks?

And finally, is there a way to set a given crosslink as a true restraint (I take it this is what AL1 did)?

edit2: one more bug I believe I encountered: if only B-B crosslinks are provided, the output looks the same as when no crosslinks are provided.

lhatsk · 2023-07-25T18:18:53Z

> edit: is it possible to describe how the FDR has an impact? When I used a constant value across XLs, it does not seem to have one. We are exploring using the upstream data to better assign confidence. Should we expect this to have more of an impact if the FDR varies between crosslinks?

The FDR is included as a bias to allow the network to better weigh the information. It's rather hard to determine the impact of the FDR because there are many different factors at play (e.g., the co-evolutionary information) that influence the "final" weight/likelihood of a crosslink. However, we have seen that with higher FDRs (i.e., 20%) the network gets a little more cautious, which may result in lower crosslink satisfaction.

> And finally, is there a way to set a given crosslink as a true restraint (I take it this is what AL1 did)?

No, it's currently not possible to enforce a constraint. The v3 network has seen a very small number of crosslinks with FDR 0, but my guess is that setting the FDR to 0 for individual links will have a rather limited effect. In AL1 it is possible to force constraints to some degree with the distogram network.

One thing that you could try, which is a little hacky but worked well for me with AL1, is to poke holes into the MSA for the crosslink you want to enforce. Say you have a crosslink at A i B j 0. Zero out the MSA in A.feature.pkl.gz at (i-1) +- 2 and in B.feature.pkl.gz at (j-1) +- 2; do the same for uniprot.pkl.gz.

> edit2: one more bug I believe I encountered: if only B-B crosslinks are provided, the output looks the same as when no crosslinks are provided.

I fixed a bug where homomeric crosslinks with small sequence separations (< 6AA) were skipped. Was this maybe the issue? Otherwise, it works fine for me.

gabrieliacc · 2023-08-21T12:52:35Z

Hi all! Thank you for AlphaLink2, it is a very useful tool for modeling.

I had some problems satisfying XL distances. I have a trimer system A-2B with one XL linking structured regions and two others linking an IDR to a structured region.

In the first test, using the complete system, I did not obtain the expected XL distances (XL distances ~ 50 A). So I simplified the system to a dimer A-B with only one XL (between structured regions). In a second test on the dimer system, I zeroed out the MSA in the features.pkl.gz and uniprot.pkl.gz files. In a third test, I also tested the alphalink2 cut-off. In all the tests, I obtained very similar interfaces, with CA-CA distances close to 50 A.

I would like to know whether, even when forcing the XL by removing the MSA information for the residues in the XL, it is possible that the 25 A distance is still not satisfied. Is there something you would suggest testing? Could you help me?

Thank you in advance

Gabriel

lhatsk · 2023-08-22T08:40:29Z

Hi Gabriel,

Yes, it's possible that removing the MSA information for these particular residues doesn't force the constraint. So far I have only tested it with the distogram network in the monomer version of AlphaLink. That network was trained in a different way, which makes it possible in some cases. In general, since AlphaLink is integrative, it always takes into consideration all of the information (sequence, MSA, template, crosslinks), the other information might simply overpower the crosslink information. To truly force constraints, it would need to be enforced in the loss during training.

Do your results vary between networks? The v3 network might work a little better for forcing constraints if you supply an FDR of 0. You could also try to increase the window size for removing the MSA information, e.g., up to +- 3 residues.

What do you mean by you also tested the alphalink2 cut-off?

We will hopefully release a distogram network for AlphaLink2 soon-ish which might work better for your use case.

Your expected distance is < 25A?

Removing the disordered parts was a good idea, AlphaFold and by extension, AlphaLink struggle a lot with this.

gabrieliacc · 2023-08-24T13:55:42Z

Thank you for the quick answer!

> Yes, it's possible that removing the MSA information for these particular residues doesn't force the constraint. So far I have only tested it with the distogram network in the monomer version of AlphaLink. That network was trained in a different way, which makes it possible in some cases. In general, since AlphaLink is integrative, it always takes into consideration all of the information (sequence, MSA, template, crosslinks), the other information might simply overpower the crosslink information. To truly force constraints, it would need to be enforced in the loss during training.

Right! In the case of having only inter-chain XLs, when no XL is satisfied, could this suggest that the XLs have a low probability of occurring?

> Do your results vary between networks? The v3 network might work a little better for forcing constraints if you supply an FDR of 0. You could also try to increase the window size for removing the MSA information, e.g., up to +- 3 residues.

I performed the following tests with only one XL and the full sequence:
- v2
- v3
- v3 with zero-out up to +-3

When I zero out the MSA residues, I run the MSA generation, zero out the selected residues, and then run inference.py.

> What do you mean by you also tested the alphalink2 cut-off?

inference.py has an option "--cutoff". I meant that option.

> We will hopefully release a distogram network for AlphaLink2 soon-ish which might work better for your use case.
>
> Your expected distance is < 25A?

Yes, I expected that. Does it make sense?

> Removing the disordered parts was a good idea, AlphaFold and by extension, AlphaLink struggle a lot with this.

I did:
- v2 without IDR
- v3 without IDR
- v2 without IDR and with zero-out up to +-3
- v3 without IDR and with zero-out up to +-3

In all cases the XL distance is longer than 40 A.

Do you have any other suggestions?

Thank you in advance

lhatsk · 2023-09-11T07:37:28Z

Hi,

Sorry for the late response!

> Right! In the case of having only inter-chain XLs, when no XL is satisfied, could this suggest that the XLs have a low probability of occurring?

What does the prediction look like, just two chains floating in space? If the XLs don't have any support in the MSAs, it might be hard to satisfy them. The distogram network allows overconstraining in this case; it usually helps to bring the structures closer, but that still might not be enough to build a proper interface. I will try to upload the network in the next two weeks.

> inference.py has an option "--cutoff". I meant that option.

This option only changes the cutoff of the satisfaction computation, but doesn't affect the actual prediction.

> > We will hopefully release a distogram network for AlphaLink2 soon-ish which might work better for your use case.
> > Your expected distance is < 25A?
>
> Yes, I expected that. Does it make sense?

Yes, the networks expect < 25A.

> In all cases the XL distance is longer than 40 A.
>
> Do you have any other suggestions?

Only to try again once the distogram network is uploaded, and then maybe overconstrain with 10 A.

Samuel-gwb · 2023-12-05T14:06:54Z

I ran into a similar case where no crosslink was satisfied. My question is how to poke holes, i.e., zero out the MSA at specific positions in the pkl.gz files? Thanks!

> One thing that you could try, which is a little hacky but worked well for me with AL1, is to poke holes into the MSA for the crosslink you want to enforce. Say you have a crosslink at A i B j 0. Zero out the MSA in A.feature.pkl.gz at (i-1) +- 2 and in B.feature.pkl.gz at (j-1) +- 2; do the same for uniprot.pkl.gz.

Samuel

lhatsk · 2023-12-13T10:35:55Z

Sorry, I haven't automated it. You would need to load and manipulate the feature files (same for uniprot). E.g.,

import gzip
import pickle
A = pickle.load(gzip.open('A.feature.pkl.gz','rb'))
B = pickle.load(gzip.open('B.feature.pkl.gz','rb'))

If you have a crosslink A 5 B 10, you should put gaps at these specific positions in the MSA (gap = 21). Something like this:

# Rows 1: skip the query sequence in row 0; residue numbers are 1-based, columns 0-based
A['msa'][1:,5-1] = 21
B['msa'][1:,10-1] = 21

It is usually good to also put gaps in the surrounding positions, e.g., +- 2 residues.
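A minimal sketch of the windowed version, assuming the `msa` entry is a NumPy integer array of shape (sequences, residues) as the indexing above suggests. The helper name `gap_out_window` is hypothetical and not part of AlphaLink2. The sketch only touches the `msa` array; whether other entries in the feature file need matching changes is not covered in this thread.

```python
import numpy as np

GAP = 21  # gap token index, as given in the comment above


def gap_out_window(msa, residue, window=2, gap=GAP):
    """Set the MSA columns around a 1-based residue number to the gap token.

    Row 0 (the query sequence) is left untouched, matching the `1:` slice
    used above. The array is modified in place and also returned.
    """
    center = residue - 1  # 1-based residue number -> 0-based column
    start = max(center - window, 0)  # clamp at the N-terminus
    stop = min(center + window + 1, msa.shape[1])  # clamp at the C-terminus
    msa[1:, start:stop] = gap
    return msa
```

For the crosslink A 5 B 10 this would be called as `gap_out_window(A['msa'], 5)` and `gap_out_window(B['msa'], 10)`, and the modified dictionaries would then need to be written back with `pickle.dump` into a `gzip.open(..., 'wb')` handle before running inference.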

How many crosslinks do you have and what crosslinker are you using? It's sometimes hard to overturn the MSA if you have insufficient crosslink density.

Samuel-gwb · 2023-12-13T11:36:03Z

Thanks !

> How many crosslinks do you have and what crosslinker are you using? It's sometimes hard to overturn the MSA if you have insufficient crosslink density.

I have tens of crosslinks between two of the four subunits. As they did not work, I multiplied them by 9, meaning that a crosslink from i-A to j-B was expanded to 9 crosslinks, from each of (i-1, i, i+1)-A to each of (j-1, j, j+1)-B. However, the two subunits were still far from each other.
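The expansion described above could be written roughly as follows. The helper name and the tuple layout `(chain_a, res_a, chain_b, res_b, fdr)` are assumptions made for illustration, and the sketch does not check that the shifted residue numbers stay within the sequence.

```python
from itertools import product


def expand_crosslink(chain_a, res_a, chain_b, res_b, fdr, offset=1):
    """Expand one crosslink into all neighbor combinations within +- offset.

    With offset=1 this yields 3 x 3 = 9 crosslinks, pairing each of
    (res_a-1, res_a, res_a+1) with each of (res_b-1, res_b, res_b+1).
    """
    shifts = range(-offset, offset + 1)
    return [
        (chain_a, res_a + shift_a, chain_b, res_b + shift_b, fdr)
        for shift_a, shift_b in product(shifts, repeat=2)
    ]
```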

lhatsk · 2023-12-13T14:48:13Z

What's the expected distance of your crosslinks? If there is no support in the MSA, sometimes all the network can do is bring them closer to the boundary (~25 A).

Samuel-gwb · 2023-12-13T23:21:36Z

The expected distances vary, from 10 A to 35 A.
I previously got a model from AF2.2-multimer without crosslinks, with low confidence, that satisfies most of the crosslinks. AF2.3-multimer predicts a looser model.
Then I tried AL2 with these crosslinks to see if I could get a better model, which gave the results we are discussing.

lhatsk · 2023-12-14T10:29:16Z

Is there a difference between the v2 and v3 networks for AlphaLink2?

lhatsk mentioned this issue on Jul 10, 2024: MSA Subsampling Feature #27 (Open)
