'MDAnalysis.analysis.density' parallelization #4729

Merged
merged 12 commits into from
Dec 18, 2024

Member

@talagayev talagayev commented Oct 8, 2024

Fixes #4677

Changes made in this Pull Request:

  • added backends and aggregators to DensityAnalysis in analysis.density
  • added client_DensityAnalysis in conftest.py
  • added client_DensityAnalysis to the tests in test_density.py
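
As a rough illustration of what "backends and aggregators" means here, the sketch below uses a toy stand-in rather than MDAnalysis itself. The hook names `get_supported_backends` and `_get_aggregator` follow my understanding of the `AnalysisBase` parallelization hooks; `ToyAggregator`, `ToyDensityAnalysis`, `elementwise_sum`, and the backend tuple are hypothetical and only show the idea that per-worker grids are merged by summation.

```python
# Toy stand-in for illustration only; this is not MDAnalysis code.
# The hook names mirror what this PR describes; the bodies are assumptions.


def elementwise_sum(grids):
    """Add equally sized flat grids cell by cell."""
    return [sum(cells) for cells in zip(*grids)]


class ToyAggregator:
    """Maps each result key to the function that merges worker outputs."""

    def __init__(self, lookup):
        self.lookup = lookup

    def merge(self, partial_results):
        return {
            key: reduce_fn([partial[key] for partial in partial_results])
            for key, reduce_fn in self.lookup.items()
        }


class ToyDensityAnalysis:
    @classmethod
    def get_supported_backends(cls):
        # Backends this analysis claims to work with (assumed list).
        return ("serial", "multiprocessing", "dask")

    def _get_aggregator(self):
        # Histogram counts are additive, so worker grids are summed.
        return ToyAggregator(lookup={"_grid": elementwise_sum})


# Two workers each counted atoms in their own share of the frames.
worker_results = [{"_grid": [1, 0, 2]}, {"_grid": [0, 3, 1]}]
merged = ToyDensityAnalysis()._get_aggregator().merge(worker_results)
```

Summation is the natural reduction for unnormalized histogram counts; a result that is an average or a time series would need a different reduction function.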

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4729.org.readthedocs.build/en/4729/

added backend and aggregator
Added client_DensityAnalysis
Added  client_DensityAnalysis to tests
@pep8speaks

pep8speaks commented Oct 8, 2024

Hello @talagayev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 426:66: W291 trailing whitespace
Line 427:68: W291 trailing whitespace
Line 428:36: W291 trailing whitespace

Comment last updated at 2024-12-18 19:05:59 UTC


github-actions bot commented Oct 8, 2024

Linter Bot Results:

Hi @talagayev! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

Code Location | Outcome
main package  | ⚠️ Possible failure
testsuite     | ⚠️ Possible failure

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/11265261686/job/31326801780


Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

@yuxuanzhuang yuxuanzhuang self-requested a review October 10, 2024 00:07

codecov bot commented Oct 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.63%. Comparing base (a3672f2) to head (506597d).
Report is 1 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4729      +/-   ##
===========================================
- Coverage    93.65%   93.63%   -0.03%     
===========================================
  Files          177      189      +12     
  Lines        21774    22845    +1071     
  Branches      3064     3064              
===========================================
+ Hits         20393    21391     +998     
- Misses         929     1002      +73     
  Partials       452      452              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@yuxuanzhuang
Contributor

Three aspects in my fix.

@orbeckst
Member

@yuxuanzhuang could you please raise issues for the updates to the parallelization documentation (#4729 (comment))? I think it's very important to make this clear, given that we keep running into these issues. Thank you!

@marinegor
Contributor

@yuxuanzhuang thanks, the illustration is indeed wrong, as well as the accompanying text. I fixed them in #4760

Member

@orbeckst orbeckst left a comment

As we've been figuring out: data structures need to be laid out in __init__, thus the size of _grid needs to be set up before _prepare (which is executed in each worker process).

@@ -465,7 +470,7 @@ def _prepare(self):
         grid, edges = np.histogramdd(np.zeros((1, 3)), bins=bins,
                                      range=arange, density=False)
         grid *= 0.0
-        self._grid = grid
+        self.results._grid = grid
Member

The overall grid shape needs to be determined in __init__, then each worker can have its own.
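
To make the point about a shared grid layout concrete, here is a minimal pure-Python sketch that uses a 1D histogram as a stand-in for the 3D `np.histogramdd` grid. All names (`make_edges`, `histogram`, the frame data) are made up for illustration, and the counter-example shows only one way a per-worker setup could go wrong, not necessarily what happened in this PR.

```python
def make_edges(lo, hi, delta):
    """Bin edges fixed once, before any worker starts."""
    nbins = int(round((hi - lo) / delta))
    return [lo + i * delta for i in range(nbins + 1)]


def histogram(values, edges):
    """Count values into a fresh grid defined by the given edges."""
    grid = [0] * (len(edges) - 1)
    lo, delta = edges[0], edges[1] - edges[0]
    for v in values:
        i = int((v - lo) // delta)
        if 0 <= i < len(grid):
            grid[i] += 1
    return grid


# Coordinates seen in four frames (made-up numbers).
frames = [[0.5, 1.5], [1.2, 3.9], [2.5, 0.1], [3.3, 3.4]]
all_values = [v for frame in frames for v in frame]

# Shared layout, decided up front (the role __init__ plays in the PR).
edges = make_edges(0.0, 4.0, 1.0)
serial = histogram(all_values, edges)

# Each worker fills its own grid for its own frames, using the same edges.
worker_a = histogram([v for f in frames[:2] for v in f], edges)
worker_b = histogram([v for f in frames[2:] for v in f], edges)
combined = [a + b for a, b in zip(worker_a, worker_b)]

# Counter-example: a worker that derives its range from its own data
# ends up with a grid of a different size, which cannot be summed.
local_values = [v for f in frames[:1] for v in f]
local_edges = make_edges(min(local_values), max(local_values), 1.0)
local_grid = histogram(local_values, local_edges)
```

With shared edges, `combined` matches `serial` cell for cell; the locally derived grid has a different length, so no elementwise sum with the other grids is defined.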

Member Author

Ah okay, yes that makes sense.

I think I ran into this same parallelization problem in one other case;
I will try the same fix there and see if it works.

Other than that, I added the CHANGELOG entry and the docs, so this should be ready to go :)

@orbeckst orbeckst self-assigned this Dec 17, 2024
@talagayev talagayev marked this pull request as ready for review December 18, 2024 01:01
@talagayev talagayev requested a review from orbeckst December 18, 2024 01:05
Member

@orbeckst orbeckst left a comment

This looks fine.

I am adding a single comment, otherwise my comments on the code are just observations.

@@ -412,7 +420,6 @@ def __init__(self, atomgroup, delta=1.0,
         self._ydim = ydim
         self._zdim = zdim

-    def _prepare(self):
Member

This looks like the right change.

I am noticing that the necessary changes for parallelization really shift a lot of what _prepare() was doing back into __init__. I guess that's ok and it's better than introducing yet another method like _prepare_init() ... We probably need to review the docs for writing analysis code with AnalysisBase.

Member

Add a comment so that future readers know why we are doing everything here
and not in _prepare().

Suggested change
def _prepare(self):
# The grid with its dimensions has to be set up in __init__
# so that parallel analysis works correctly: each process
# needs to have a `results._grid` of the same size and the
# same `_bins`.

Member

manually added to the file, suggestions don't work on deleted lines

         assert D.results.density.grid.shape == (8, 12, 17)

-    def test_warn_userdefn_padding(self, universe):
+    def test_warn_userdefn_padding(self, universe, client_DensityAnalysis):
Member

I am not 100% sure if we need to test warnings for all parallel modalities, too, but it's the safe approach and you've already converted all tests so let's leave it in. Just something to think about in the future if our test run times are too long (again...).
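
The run-time concern can be sketched as follows, assuming that a fixture such as `client_DensityAnalysis` supplies one dictionary of `run()` keyword arguments per usable backend. The helper `run_kwargs_for`, the test count, and the exact keys (`backend`, `n_workers`) are assumptions for illustration, not the contents of conftest.py.

```python
def run_kwargs_for(supported, installed, n_workers=2):
    """Build one dict of run() keyword arguments per usable backend."""
    params = []
    for backend in supported:
        if backend not in installed:
            continue  # skip backends whose package is missing
        if backend == "serial":
            params.append({"backend": backend})
        else:
            params.append({"backend": backend, "n_workers": n_workers})
    return params


supported = ("serial", "multiprocessing", "dask")
installed = {"serial", "multiprocessing"}  # pretend dask is not installed
params = run_kwargs_for(supported, installed)

# Every test that takes the fixture runs once per entry in params.
n_tests = 30  # hypothetical number of tests in the module
n_test_runs = n_tests * len(params)
```

Under these assumptions, each additional backend adds another full pass over every test that uses the fixture, which is the trade-off raised in the comment above.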

@orbeckst orbeckst merged commit ac145ec into MDAnalysis:develop Dec 18, 2024
23 of 24 checks passed
Development

Successfully merging this pull request may close these issues.

MDAnalysis.analysis.density: Implement parallelization or mark as unparallelizable
5 participants