
Make all processes use identical FFTW plans #203

Merged
johnomotani merged 3 commits into master from fftw-plan-consistency on Apr 23, 2024

Conversation

johnomotani
Collaborator

When using flags=FFTW.MEASURE, FFTW does some run-time tests to choose the fastest version of its algorithm. This can lead to different processes using slightly different algorithms; these should only differ by machine-precision-level rounding errors, but (surprisingly!) that can matter. This PR changes the setup so that the FFTW plans are generated on the root process and then written out to an 'FFTW wisdom' file which is read by all other processes, ensuring that all processes use exactly the same algorithm.
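
A minimal sketch of this approach (the function name, wisdom-file handling and the use of `plan_fft!` here are illustrative assumptions, not the actual moment_kinetics code):

```julia
using FFTW
using MPI

# Generate the plan with run-time measurement on the root process only, export
# the resulting 'wisdom', and have every other process import that wisdom so
# that its plan re-uses the same algorithm instead of re-measuring.
function make_consistent_plan(data::Vector{ComplexF64},
                              wisdom_file::AbstractString, comm::MPI.Comm)
    if MPI.Comm_rank(comm) == 0
        # Note: FFTW.MEASURE overwrites `data` while it times candidate algorithms.
        plan = plan_fft!(data; flags=FFTW.MEASURE)
        FFTW.export_wisdom(wisdom_file)
    end
    # Wait until the wisdom file has been written before anyone reads it.
    MPI.Barrier(comm)
    if MPI.Comm_rank(comm) != 0
        FFTW.import_wisdom(wisdom_file)
        plan = plan_fft!(data; flags=FFTW.MEASURE)
    end
    return plan
end
```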

While working on the kinetic electrons, I came across a very strange bug. A simulation would run correctly on a small number of processes (e.g. 8, with the z-direction split into 8 distributed-memory blocks, so each shared-memory block only has one process), but would fail on a larger number of processes (64 or 128, so each shared-memory block has 8 or 16 processes). While trying to debug, I found out that FFTW can give slightly different results between runs when using FFTW.MEASURE. To make results easier to compare between runs, I switched to FFTW.ESTIMATE (which is the same every time) and the bug went away!
I don't understand why this fixed the bug. The different FFTW algorithms should only differ by machine-precision-level rounding errors, so I don't see how this can cause a numerical instability. My only guesses are that either the slight inconsistency between processes does somehow make a difference, or that during the run-time testing multiple instances of FFTW somehow conflict with each other and subtly corrupt something, eventually leading (after several thousand pseudo-timesteps!) to a failure to converge.
Anyway, with this 'fix' (the run-time testing is done on only one process and the result is passed to all the others, so they all use exactly the same algorithm) that bug has gone away, so I think the change is useful.

I'm not 100% sure the problem is entirely fixed yet, because even with this fix the simulation I'm testing fails to converge, although at a much later time...

@johnomotani johnomotani added the bug Something isn't working label Apr 15, 2024
@johnomotani johnomotani force-pushed the fftw-plan-consistency branch from 5a3f5fa to 2c744a9 Compare April 15, 2024 17:14
Useful to ensure that any function that needs to write into `output_dir`
is able to do so.
@johnomotani johnomotani force-pushed the fftw-plan-consistency branch from 2c744a9 to 81daf5b Compare April 15, 2024 21:53
@mrhardman
Collaborator

A much simpler solution could be to use the matrix formulation for the Chebyshev derivatives, which wouldn't require an FFT at all. At the moment this exists only for the Gauss-Lobatto elements (the Radau elements use the FFT to compute the matrix), but that could be fixed. You could try using the Gauss-Legendre grid for your physics problem to test whether the FFT is in any way connected to the loss of convergence of the simulation. If 1D interpolation is required for the Gauss-Legendre grid to be usable, I can provide that routine after a short discussion to determine where it is needed.
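
For reference, the matrix formulation here means the standard Chebyshev differentiation matrix on Gauss-Lobatto points (the Trefethen-style construction); the sketch below is that textbook recipe, not code taken from the repository:

```julia
"""
    chebyshev_diff_matrix(N)

Return the (N+1) x (N+1) differentiation matrix `D` and the Chebyshev-Gauss-Lobatto
points `x` on [-1, 1], such that `D * f.(x)` approximates f' at the points `x`.
"""
function chebyshev_diff_matrix(N::Integer)
    x = [cos(pi * j / N) for j in 0:N]
    # Boundary weights of 2, interior weights of 1, with alternating signs folded in.
    c = [(j == 0 || j == N ? 2.0 : 1.0) * (-1.0)^j for j in 0:N]
    D = zeros(N + 1, N + 1)
    for i in 1:N+1, j in 1:N+1
        if i != j
            D[i, j] = (c[i] / c[j]) / (x[i] - x[j])
        end
    end
    # "Negative sum trick": set the diagonal so each row sums to zero, which makes
    # D exact for constant functions and reduces rounding error in the diagonal.
    for i in 1:N+1
        D[i, i] = -sum(D[i, :])
    end
    return D, x
end

# A derivative is then just a dense matrix-vector product, with no FFT involved:
# D, x = chebyshev_diff_matrix(16); df = D * sin.(x)   # approximates cos.(x)
```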

...to ensure all processes have identical FFTW plans for consistency.
Makes it more convenient to change to, for example, FFTW.ESTIMATE if
that is desired. No run-time option to change the FFTW flag is provided
because there is not a convenient struct to put the option in (although
that could be done without a lot of effort).
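
A hypothetical sketch of keeping the flag in one place, as the commit message describes (the constant and function names are illustrative, not taken from the repository):

```julia
using FFTW

# Change this single constant to, e.g., FFTW.ESTIMATE to switch the planner
# behaviour everywhere plans are created.
const DEFAULT_FFTW_FLAG = FFTW.MEASURE

# All plan creation goes through helpers like this, so the flag only needs to
# be changed in one place.
make_chebyshev_fft_plan(fext::Vector{ComplexF64}) = plan_fft!(fext; flags=DEFAULT_FFTW_FLAG)
```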
@johnomotani johnomotani force-pushed the fftw-plan-consistency branch from 81daf5b to b3c5b65 Compare April 17, 2024 09:21
@johnomotani johnomotani merged commit d15d648 into master Apr 23, 2024
17 checks passed
@johnomotani johnomotani deleted the fftw-plan-consistency branch April 23, 2024 20:30
@johnomotani johnomotani mentioned this pull request Apr 24, 2024