MSA tensor format #56

Open
panganqi opened this issue Mar 31, 2021 · 5 comments

Comments

@panganqi

panganqi commented Mar 31, 2021

The Usage section includes code like
seq = torch.randint(0, 21, (1, 128)).cuda()
msa = torch.randint(0, 21, (1, 5, 64)).cuda()
If I have an a3m MSA file, how do I encode it into this tensor? And why is the seq length 128 while the msa is 5 x 64 (5 times half the length of seq)?
Could you give an example of how to use that, or how to generate that msa tensor?

@lucidrains
Owner

@panganqi Hi! I just wanted to demonstrate that the MSA and the primary sequence do not have to be the same length (although they would probably be aligned in practice).

The framework is in a good enough place that I'll start thinking about how to tackle data preprocessing! (I'd like to make it as seamless and easy as possible) How is the data laid out in your directory at the moment?

@lucidrains
Owner

@panganqi are you working with templates by any chance?

@panganqi
Author

panganqi commented Apr 1, 2021

> @panganqi Hi! I just wanted to demonstrate that the MSA and the primary sequence do not have to be the same length (although they would probably be aligned in practice)
>
> The framework is in a good enough place that I'll start thinking about how to tackle data preprocessing! (I'd like to make it as seamless and easy as possible) How is the data laid out in your directory at the moment?

I use the combined sidechainnet data, which does not contain MSAs, and we run hhblits on the CASP data to get the MSA files. I want to combine the two into a new dataset. In our data the MSA and the primary sequence have the same length.
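
For the a3m-to-tensor step asked about at the top of the thread, here is a minimal sketch and not the repository's own preprocessing. It assumes a hypothetical vocabulary (the 20 standard amino acids mapped to 0..19 in alphabetical order of their one-letter codes, with gaps and any other character mapped to 20), chosen only so the values fall in the same 0..20 range as `torch.randint(0, 21, ...)`; the token order the model actually expects may differ. The helper names `read_a3m`, `encode` and `encode_a3m` are made up for illustration. In a3m files, lowercase letters mark insertions relative to the query, so dropping them leaves every row the same length as the query sequence.

```python
import string

# Hypothetical vocabulary (an assumption, not the library's mapping):
# 20 standard amino acids -> 0..19, gap or anything else -> 20.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
GAP_INDEX = len(AMINO_ACIDS)  # 20

# Deletes lowercase letters (insertions relative to the query) and '.'.
_DROP_INSERTIONS = str.maketrans("", "", string.ascii_lowercase + ".")


def read_a3m(text):
    """Return the aligned rows of an a3m file, with insertions removed."""
    sequences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith(">"):
            # A header starts a new record; flush the previous one.
            if current:
                sequences.append("".join(current))
                current = []
        else:
            current.append(line.translate(_DROP_INSERTIONS))
    if current:
        sequences.append("".join(current))
    return sequences


def encode(sequence):
    """Map one aligned row to a list of integer tokens."""
    return [TOKEN_INDEX.get(ch, GAP_INDEX) for ch in sequence]


def encode_a3m(text, max_rows=None):
    """Encode a3m text as a (num_rows, length) nested list of tokens."""
    rows = read_a3m(text)[:max_rows]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("rows differ in length after removing insertions")
    return [encode(row) for row in rows]


example = ">query\nACDE\n>hit1\nAC-E\n>hit2\nAgC-E\n"
encoded = encode_a3m(example)
```

`torch.tensor(encoded).unsqueeze(0)` would then give an integer tensor of shape (1, num_rows, length), which matches the layout of the `(1, 5, 64)` example if that shape is read as (batch, number of aligned sequences, alignment length). The first row of an hhblits a3m is normally the query, so `encode(...)` on that row could serve as `seq` as well.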

@panganqi
Author

panganqi commented Apr 1, 2021

> @panganqi are you working with templates by any chance?

No, I'm working in Free Modelling mode.

@lucidrains
Owner

@panganqi Do you want to chat about this in Discord? We have an alphafold2 channel.
