'Alternating direction implicit' preconditioner #297
Conversation
Cleaner, and might help compile-time.
Hopefully helps the compiler to remove unneeded branches at compile time.
Hopefully improves compile/run time.
Also provide positional-arguments-only form of `steady_state_residuals()`, which helps const-propagation and therefore type stability (the compiler knows that the function returns a Vector when `only_max_abs` is passed a const `true`).
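A minimal sketch of the idea, with a hypothetical simplified signature (the real `steady_state_residuals()` takes different arguments): passing the flag positionally as a literal `true` lets constant propagation resolve the branch and infer a concrete return type.

```julia
# Hypothetical simplified signature: `only_max_abs` as a positional argument with a
# default, rather than a keyword argument.
function steady_state_residuals(residual::Vector{Float64}, n::Vector{Float64},
                                dt::Float64, only_max_abs::Bool=false)
    scaled = residual ./ (abs.(n) .+ 1.0) ./ dt
    if only_max_abs
        return [maximum(abs.(scaled))]                  # Vector{Float64}
    else
        return Dict("max" => maximum(abs.(scaled)),     # Dict on the other branch
                    "rms" => sqrt(sum(abs2, scaled) / length(scaled)))
    end
end

# With a literal `true` the compiler can const-propagate `only_max_abs`, resolve the
# branch, and infer a concrete Vector{Float64} return type; with a runtime Bool the
# inferred return type is a Union of the two branches.
residuals = steady_state_residuals(rand(10), rand(10), 1.0e-3, true)
```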
Declare the type of the thing returned by `MPI.Bcast()` to avoid type instability.
After `@boundscheck`, we need to actually throw an error.
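For context, a hedged Julia illustration (not the project's actual code) of why an explicit throw is needed: `@boundscheck` only marks a block that `@inbounds` callers may elide; it does not raise anything by itself.

```julia
# `@boundscheck` marks the check as elidable under `@inbounds`, but the body must
# still throw explicitly, otherwise a failed check silently does nothing.
Base.@propagate_inbounds function get_element(v::AbstractVector, i::Integer)
    @boundscheck begin
        (1 <= i <= length(v)) || throw(BoundsError(v, i))
    end
    return @inbounds v[i]
end
```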
Means less hacking to do if we ever do want to change (temporarily or permanently) `mk_float` to a different type.
A variation on the 'alternating direction implicit' (ADI) method might be useful as a preconditioner. It will require split Jacobians where some 'implicit' parts only couple a subset of dimensions (i.e. z- or velocity-dimensions), while 'explicit' parts may couple all dimensions (but may also have some terms removed to make the matrix more sparse for numerical efficiency).
Avoid using more than two Jacobian-sized buffer arrays at any time in the Jacobian matrix tests. Using more than this would cause the GitHub Actions CI servers to run out of shared memory, causing an error.
Add a preconditioner for kinetic electrons using a variation on the 'alternating direction implicit' (ADI) method. Terms that couple velocity space are first solved implicitly (over vpa only for the 1V case), with terms with z-coupling treated 'explicitly' by being subtracted from the right-hand-side; then terms that couple in z are solved implicitly, with terms with v-coupling treated 'explicitly'. The two steps may be iterated more times if this is needed.
If delta_x is all-zero, then P^-1.delta_x is also all-zero, so no need to evaluate it.
One iteration of ADI preconditioning seems to be enough to make the JFNK solver for kinetic electrons converge. The number of linear (Krylov) iterations required increases slightly, but overall this should reduce the computational cost.
When not parallelising using shared memory, there is no need to split the preconditioner and the LU preconditioner should be the most efficient. Therefore use the LU precon in serial, and use ADI only when `block_size[] > 1`.
Indexing error meant that half the time the cutoff did not interpolate smoothly between grid points.
Will allow the calculation of these parameters to be reused when calculating a Jacobian matrix for the wall bc.
Where integrals over the into-the-sheath part of the distribution function are split into two parts at +/- vcut, modify the way that the split is done so that the 'part 2' integral between 0 and +/-vcut is calculated the same way as an integral over the out-from-the-sheath part of the distribution which is cut off at -/+vcut. Not sure if this is necessary, but seems nicer to be more consistent.
If the prefactor that sets the correction terms to be proportional to vpa^2 near vpa=0 is too broad, then it is hard for the correction terms to fix errors in low moments (e.g. density moment), so making it a bit narrower reduces the size of the coefficients of the correction terms.
If phi_wall=0, giving vcut=0, then it is not possible to apply moment constraints, which causes an error due to a singular matrix. This case is unphysical, so should not ever be a converged solution (vcut=0 corresponds to a fully electron-absorbing sheath). To avoid the problem, check if vcut==0, and if so set vcut to some small value (we choose the value at the next grid point after the one closest to 0). [skip ci]
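A rough sketch of the guard described above, with hypothetical names (`vpa_grid`, `vcut`); the actual grid-index logic in the code may differ.

```julia
# Hypothetical setup: a symmetric v_parallel grid, and vcut coming out of the sheath
# calculation as exactly zero (phi_wall = 0).
vpa_grid = collect(range(-6.0, 6.0; length=49))
vcut = 0.0

# vcut == 0 would make the moment-constraint matrix singular, so move vcut to the
# value at the next grid point after the one closest to zero.
if vcut == 0.0
    izero = argmin(abs.(vpa_grid))    # index of the grid point closest to v_parallel = 0
    vcut = abs(vpa_grid[izero + 1])   # small but non-zero cutoff
end
```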
By default (if `implicit_electron_ppar = true`), use LU when `block_size[] == 1` or ADI otherwise, but now `implicit_electron_ppar = "lu"` or `implicit_electron_ppar = "adi"` can be passed to pick the preconditioner type explicitly.
Hoped this might help find a bug, but did not help with that. Do not know any particular reason to update, but might as well keep up to date.
Previous scheme just interpolated using existing values in whole element containing zero. This meant that the result of the boundary condition depended on some points that are overwritten by the boundary condition. Improve on this by doing special interpolation in the element containing zero. Instead of using the usual `interpolate_to_grid_1d!()` function, use `interpolate_symmetric!()` which does an interpolation that is forced to be symmetric around v_parallel=0, as the interpolation polynomial is a polynomial in `v_parallel^2` (the interpolating polynomial is constructed using a Lagrange polynomial method). The inputs to the interpolation are now just the function values on grid points (within the element containing zero) that are not set by the boundary condition. Also optimises the interpolation of the points in the elements not containing zero by restricting the interpolation to just the points needed for output, instead of interpolating to the full reversed grid.
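A hedged sketch of the idea behind `interpolate_symmetric!()` (the function name is from the commit above, but the implementation here is illustrative, not the project's code): build a Lagrange interpolating polynomial in the variable $v_\parallel^2$ from the points that are not overwritten by the boundary condition, so the interpolant is automatically even in $v_\parallel$.

```julia
# Interpolate f, known at vpa_known (points not set by the bc), onto vpa_out,
# using a Lagrange polynomial in x = vpa^2 so the result is symmetric about vpa = 0.
function interpolate_symmetric_sketch(vpa_out::AbstractVector, vpa_known::AbstractVector,
                                      f_known::AbstractVector)
    x_known = vpa_known .^ 2
    result = zeros(length(vpa_out))
    for (i, v) in enumerate(vpa_out)
        x = v^2
        for j in eachindex(x_known)
            # Lagrange basis polynomial l_j(x), evaluated at x = v^2
            lj = 1.0
            for k in eachindex(x_known)
                k == j && continue
                lj *= (x - x_known[k]) / (x_known[j] - x_known[k])
            end
            result[i] += f_known[j] * lj
        end
    end
    return result
end

# Example: values at vpa > 0 inside the element containing zero determine the
# (even) interpolant at the reflected vpa < 0 points.
f_minus = interpolate_symmetric_sketch([-0.3, -0.1], [0.1, 0.3, 0.5],
                                       exp.(-[0.1, 0.3, 0.5] .^ 2))
```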
Skipping the single update of vcut, which was previously done even when `bc_constraints=false` was passed to `enforce_boundary_condition_on_electron_pdf!()`, makes it possible to match the result with an interpolation matrix that does not couple every point in the vpa grid (the update of vcut is an integral operation, so it would couple all points).
Reduces code duplication.
Improves convergence a bit, allowing electron solver to take larger pseudo-timesteps, in at least one case.
Should reduce allocations.
Hopefully reduce allocations. Maybe save a little compilation time by removing some unnecessary `@views`.
Several small updates. In particular need to be careful with variables that are captured by a locally-defined function like `residual_func!()`.
Input parameters that control maximum number of pseudotimesteps and maximum total pseudotime for each kinetic electron pseudotimestepping loop.
Helps if something didn't previously clean up the timers.
This PR is a draft for the moment, because I want to test whether it is possible and useful to add communication between different distributed-MPI blocks after each ADI iteration - could doing that and taking 2 or 3 ADI iterations speed up the JFNK convergence?
Initial performance testing for a 1D1V kinetic electron simulation on ARCHER2 with 32 elements in the z-direction.
This seems to agree with what we would expect. The ADI solver takes a few more iterations because it is only an approximate matrix inverse, whereas the LU is exact (within each block), but the LU solve does not scale with the number of shared-memory cores, whereas the ADI solve does - with almost perfect scaling: 6.19/0.397 = 15.6 times speed-up with 16 times more cores. I suspect the run time for the 8-core ADI case is slightly high due to a random fluctuation, as based on the iteration count we would expect more like 90 s. This might make the perfect scaling mentioned above over-optimistic, depending on whether the random slowdown happened (if it happened at all) during a call to the preconditioner or while the code was somewhere else.
This boundary condition function should act only at one `ir`, which is passed as an argument, so there should be no loop over `ir`. This bug has not impacted simulations so far, because we have only used r.n=0 for kinetic electrons.
Can (?) skip the problematic solve, which was run to update the electron shape function without updating the electron pressure, on the 'explicit first stage' of ESDIRK schemes. This might reduce the order of accuracy of the scheme somehow, as the qpar_e used for the 'explicit' calculation of the time derivative of electron_ppar is taken from the most recent implicit solve rather than from a solve updated with the new ion/electron profiles. However, the change is probably small (?), and at least the solver does run now - it is useful to have an adaptive ion timestep, as it may let the code recover by reducing the ion timestep when an electron implicit solve fails to converge.
When running kinetic electron simulations, it can cause problems to take a very short ion timestep. When writing outputs at exactly fixed output times, this can happen if the previous timestep happened to end just before the output time. To avoid the very short step, the default is now to write output at whatever time the timestep that crosses the nominal output time ends. There is an option to force the previous behaviour of a decreased timestep, so that output is written exactly at the nominal output time.
This seems to avoid compiler errors on macOS. No idea why this should help, or what the original problem was!
This test is extremely slow when run in parallel on macOS (maybe the macOS servers on GitHub Actions don't have enough memory?), so skip it in this case to avoid test failures.
This might help to remove some of the performance loss due to inverting the preconditioner separately in each distributed-MPI block.
The preconditioner within the Jacobian-free Newton-Krylov (JFNK) solver used for implicit timestepping needs to 'invert a matrix'. The initial implementation does this with an LU-factorisation method, using Julia's LinearAlgebra package. This LU solver is serial. Some parallelisation can be achieved by constructing separate preconditioners in each distributed-MPI block, at the cost of dropping the elements of the preconditioner that would couple different blocks, but our shared-memory parallelism cannot be exploited by the LU solver. This creates a bottleneck: for a 1D1V test with 32 elements in the z-direction, no speed-up is gained by using 128 cores on ARCHER2 compared to 8 cores.
An alternative, which is less accurate but more parallelisable, is based on the 'alternating direction implicit' (ADI) approach. The aim is to solve

$$J \cdot \delta x = R$$

for $\delta x$.
In the ADI approach we:
* First, identify all the terms that only couple in $w_\parallel$. Split the Jacobian matrix into an 'implicit' matrix $J_{I,w}$ containing the terms that couple in $w_\parallel$ - which is block diagonal because grid points at different $z$ values are uncoupled - and an 'explicit' matrix $J_{E,w}$ with the remainder,

  $$J = J_{I,w} + J_{E,w}.$$

  Take an initial guess $\delta x_0 = \vec{0}$ and find an updated guess $\delta\tilde{x}_1$ from

  $$J_{I,w} \cdot \delta\tilde{x}_1 = R - J_{E,w} \cdot \delta x_0.$$
* Second, identify all the terms that only couple in $z$, and do a similar thing, splitting

  $$J = J_{I,z} + J_{E,z},$$

  where $J_{I,z}$ is block diagonal; this time points at different $w_\parallel$ are uncoupled (also $g_e$ and $p_{e\parallel}$ are uncoupled). Solve for a new guess $\delta x_1$ from

  $$J_{I,z} \cdot \delta x_1 = R - J_{E,z} \cdot \delta\tilde{x}_1.$$
* Optionally iterate these two steps to get $\delta x_2$, $\delta x_3$, etc., which should be better approximations to the solution of $J \cdot \delta x = R$. Initial tests suggest that a single iteration might be enough - using more iterations in the ADI solve would slightly reduce the number of JFNK iterations, but not by the factor of 2, 3, 4, ... that would be needed to cover the cost of the extra ADI iterations. (A minimal sketch of the iteration is given after this list.)
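A minimal Julia sketch of the two-step iteration described above, using small dense/diagonal stand-ins for the Jacobian splittings (in the real solver $J_{I,w}$ and $J_{I,z}$ are block diagonal and are solved block-by-block; all names here are illustrative):

```julia
using LinearAlgebra

# One or more ADI iterations approximating the solution of J.delta_x = R, where
# J = J_I_w + J_E_w = J_I_z + J_E_z are two different implicit/explicit splittings.
function adi_solve(J_I_w, J_E_w, J_I_z, J_E_z, R; n_iterations=1)
    delta_x = zero(R)                                  # initial guess delta_x_0 = 0
    for _ in 1:n_iterations
        # Implicit in w_parallel; z-coupling terms moved to the right-hand side.
        delta_x_tilde = J_I_w \ (R - J_E_w * delta_x)
        # Implicit in z; velocity-coupling terms moved to the right-hand side.
        delta_x = J_I_z \ (R - J_E_z * delta_x_tilde)
    end
    return delta_x
end

# Tiny usage example with random matrices standing in for the Jacobian splittings.
n = 8
J = I + 0.1 * rand(n, n)
J_I_w = Diagonal(J) + 0.1 * I      # pretend 'w_parallel-coupling' part
J_E_w = J - J_I_w
J_I_z = Diagonal(J) + 0.05 * I     # pretend 'z-coupling' part
J_E_z = J - J_I_z
R = rand(n)
delta_x_approx = adi_solve(J_I_w, J_E_w, J_I_z, J_E_z, R; n_iterations=2)
```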
Todo: