Issue 3659 fixed loading for MOL2 and PDB edge cases #3662
base: develop
Hello @zwsmith200! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2023-11-07 19:25:28 UTC
Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on the developer mailing list so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #3662     +/-   ##
==========================================
  Coverage    93.37%   93.37%
==========================================
  Files          170      184     +14
  Lines        22295    23465   +1170
  Branches      4075     4084      +9
==========================================
+ Hits         20818    21911   +1093
- Misses         962     1036     +74
- Partials       515      518      +3

☔ View full report in Codecov by Sentry.
Welcome to contributing to MDAnalysis! Thanks for the PR. Sorry, I don't have time to review, I just wanted to ask if this PR is going to fix #3659? If so, please enter the issue number in the
Never mind, I better understand the issue now.
The scope has expanded slightly since the initial issue. Originally I was concerned that there were different behaviours when reading PDB and MOL2, namely the MOL2 format deleted the resname with a repeated resid while PDB did not. I then went to re-use In light of those findings, I wrote a squash function that is robust to both repeated resids with different attributes and non-contiguous residues. I have applied this change to the MOL2 and PDB parsers but I suspect experts in other formats may want to take a look at whether similar unit tests are needed.
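For readers following the thread, the idea of grouping atoms by unique combinations of attributes can be sketched roughly as below. This is a minimal illustration only, not the function from this PR: the name `squash_by_combination`, its signature, and the sample data are all assumptions made for the example.

```python
import numpy as np


def squash_by_combination(*attrs):
    """Group atoms by unique combinations of per-atom attribute values.

    Hypothetical helper for illustration; not the implementation in this PR.
    Returns the parent (residue) index of each atom and one array per
    attribute holding the value for each parent, in order of first appearance.
    """
    first_seen = {}  # maps an attribute combination to its parent index
    child_parent_idx = np.empty(len(attrs[0]), dtype=int)
    for i, key in enumerate(zip(*attrs)):
        # setdefault assigns the next free index the first time a key is seen
        child_parent_idx[i] = first_seen.setdefault(key, len(first_seen))
    parents = list(first_seen)  # dicts preserve insertion order
    squashed = [np.array(column) for column in zip(*parents)]
    return child_parent_idx, squashed


# Made-up input: resid 1 is reused with a different resname (LIG), and the
# ALA residue reappears later, i.e. it is non-contiguous.
resids = np.array([1, 1, 2, 1, 1])
resnames = np.array(["ALA", "ALA", "GLY", "LIG", "ALA"])
atom_to_residue, (res_ids, res_names) = squash_by_combination(resids, resnames)
```

In this sketch the repeated resid keeps its own resname because the (resid, resname) pair is the key, and the last atom is mapped back to the first ALA residue even though it is not adjacent to it.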
I need to double check this, hopefully in the next few days.
@zwsmith200 thanks for putting this together. I think it's an improvement for a lot of cases, but I can foresee cases where it changes behaviour, e.g. if the resids wrapped around the max value and the resname happened to be the same by bad luck then this would be considered the same residue. I think it might be better to allow this new residue grouping strategy via a keyword argument to Universe, which is passed to the parser.
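The wrap-around concern can be made concrete with a small sketch that compares the two grouping strategies on made-up data. The function `group_residues` and its `strategy` argument are hypothetical names used only for this illustration; they are not an existing MDAnalysis API or the keyword proposed for Universe.

```python
import numpy as np


def group_residues(resids, resnames, strategy="contiguous"):
    """Return a residue index per atom (hypothetical helper, illustration only)."""
    resids = np.asarray(resids)
    resnames = np.asarray(resnames)
    if len(resids) == 0:
        return np.empty(0, dtype=int)
    if strategy == "contiguous":
        # Start a new residue whenever resid or resname changes between neighbours
        changed = (resids[1:] != resids[:-1]) | (resnames[1:] != resnames[:-1])
        return np.concatenate(([0], np.cumsum(changed)))
    if strategy == "unique":
        # One residue per unique (resid, resname) combination, wherever it occurs
        seen = {}
        return np.array(
            [seen.setdefault(key, len(seen)) for key in zip(resids, resnames)]
        )
    raise ValueError(f"unknown strategy: {strategy!r}")


# Made-up data: the resid counter wraps from 9999 back to 1 and both
# "residue 1" entries happen to share the same resname.
resids = [1, 1, 9999, 9999, 1, 1]
resnames = ["SOL"] * 6

contiguous_idx = group_residues(resids, resnames, strategy="contiguous")
unique_idx = group_residues(resids, resnames, strategy="unique")
```

With this input the contiguous strategy yields three residues, while the unique-combination strategy merges the first and last pairs of atoms into a single residue, which is the behaviour change described above.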
package/MDAnalysis/topology/base.py
Outdated
Arguments
---------
squash_attributes - list of attribute arrays (attributes used to
    identify the parent)
*other_attributes - other arrays that need to follow the sorting of ids

Returns
-------
child_parents_idx - an array of len(child) which points to the index of
    parent
parent_combos - len(parent) of the unique combinations
squashed_attrs - len(parent) of the attributes used for squashing
*other_attrs - len(parent) of the other attributes
Can this follow the doc style a bit better? Usually it's type annotation on the same line as the variable name and description on the following line
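As a rough sketch of the layout being requested (name and type on one line, description indented on the next), the docstring could look something like the following. The parameter names are taken from the snippet under review; the types, the stub function name `squash_docstring_sketch`, and the exact wording are assumptions, not the final docstring in the PR.

```python
def squash_docstring_sketch(squash_attributes, *other_attributes):
    """Squash child attributes into parents (documentation sketch only).

    Parameters
    ----------
    squash_attributes : list of numpy.ndarray
        Attribute arrays used to identify the parent.
    *other_attributes : numpy.ndarray
        Other arrays that need to follow the sorting of ids.

    Returns
    -------
    child_parents_idx : numpy.ndarray
        Array of len(child) which points to the index of the parent.
    parent_combos : numpy.ndarray
        Array of len(parent) holding the unique combinations.
    squashed_attrs : list of numpy.ndarray
        Arrays of len(parent) with the attributes used for squashing.
    *other_attrs : numpy.ndarray
        Arrays of len(parent) with the other attributes.
    """
    # This stub only illustrates the docstring layout; it has no behaviour.
    raise NotImplementedError("documentation sketch only")
```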
This is a good point, I have actually run into an example of this using mol2 generated from pdb in my work because it lost chain information. I’ll add a test for wrapped resids.
@zwsmith200 is this still something you are looking to contribute?
Yes, I can finish cleaning this up. I got it working for my now-completed project then forgot about the PR.
37c2acd to 38a2d4b
Linter Bot Results:
Hi @zwsmith200! Thanks for making this PR. We linted your code and found the following: Some issues were found with the formatting of your code.
Please have a look at the Please note: The
I realize darker used max length of 88 and we use 79. It may be useful to add -l 79 here so that copying the command matches our requirements.
@richardjgowers @IAlibay It looks like I have addressed everything I needed to, let me know if I missed something.
Fixes #3659
Changes made in this Pull Request:
- New function squash_by_attributes that uses unique combinations of attributes to determine residues
- MOL2Parser and PDBParser now use this function to avoid issues when loading non-unique resids and non-contiguous residues (Inconsistent resnames when loading pdb and mol2 with repeat resnums #3659)
PR Checklist