Make all processes use identical FFTW plans #203
Merged
When using `flags=FFTW.MEASURE`, FFTW does some run-time tests to choose the fastest version of its algorithm. This can lead to different processes using slightly different algorithms - these should only differ by machine-precision-level rounding errors, but (surprisingly!) this may be important. This PR updates the code so that the FFTW plans are generated on the root process, then written out to an 'FFTW wisdom' file which is read by all other processes, ensuring that all processes use exactly the same algorithm.

While working on the kinetic electrons, I came across a very strange bug. A simulation would run correctly on a small number of processes (e.g. 8, with the z-direction split into 8 blocks using distributed memory, so each shared-memory block only has one process), but would fail on a large number of processes (64 or 128, so each shared-memory block has 8 or 16 processes). While trying to debug, I found out that FFTW can give slightly different results between runs when using `FFTW.MEASURE`. To aid comparing results between runs, I switched to `FFTW.ESTIMATE` (which is the same every time) and the bug went away!

I don't understand why this fixed the bug. The different FFTW algorithms should only differ by machine-precision-level rounding errors, so I don't understand how this can cause a numerical instability. My only guesses are that either the slight inconsistency does somehow make a difference, or that the run-time testing somehow lets multiple instances of FFTW conflict with each other and subtly corrupt something, eventually leading (after several thousand pseudo-timesteps!) to a failure to converge.
Anyway, after making this 'fix' - so that the run-time testing is done on only one process and the result is then passed to all the others, meaning they all use the exact same algorithm - that bug has gone away, so I think this fix is useful.
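For reference, a minimal sketch of the wisdom-sharing scheme described above, assuming an MPI.jl setup; the filename, array size, and overall structure here are illustrative, not the actual moment_kinetics implementation:

```julia
using MPI, FFTW

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

wisdom_file = "fftw_wisdom.dat"  # hypothetical filename
buffer = zeros(ComplexF64, 64)   # illustrative transform size

if rank == 0
    # Root does the run-time measurement once and saves the chosen
    # algorithm as 'wisdom'.
    plan_fft(buffer; flags=FFTW.MEASURE)
    FFTW.export_wisdom(wisdom_file)
end
MPI.Barrier(comm)  # ensure the wisdom file exists before others read it
if rank != 0
    # Other processes load the root's wisdom; planning with MEASURE now
    # looks up the stored algorithm instead of re-measuring, so every
    # process ends up with an identical plan.
    FFTW.import_wisdom(wisdom_file)
end
plan = plan_fft(buffer; flags=FFTW.MEASURE)

MPI.Finalize()
```

The barrier between export and import is the key ordering constraint: without it, a non-root process could attempt to read the wisdom file before the root has written it.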
I'm not 100% sure the problem is entirely fixed yet, because even with this fix the simulation I'm testing fails to converge, although at a much later time...