Update scaling for adjoint problems #378

cpjordan · 2024-12-19T11:17:09Z

Updates scaling in the channel and Tohoku inversion examples and add the channel inversion back into testing.

Previously the scaling has no effect because construct_evaluator of the StationManager in inversion_tools.py has already been called and thus cost_function_scaling_0d does not get updated, even if cost_function_scaling does.
The scaling itself needs to be higher than previously and seems to be related to the magnitude of dJ/dm rather than J

Cannot be merged until Firedrake PR #3932 is merged.

- Previously the scaling failed because construct_evaluator of the StationManager has already been called and thus cost_function_scaling_0d does not get updated, even if cost_function_scaling does. - The scaling itself needs to be higher than previously and seems to be related to the magnitude of dJ/dm rather than J

jwallwork23 · 2025-01-09T18:10:40Z

examples/channel_inversion/inverse_problem.py

 # Define the scaling for the cost function so that J ~ O(1)
-J_scalar = domain_constant(solver_obj.dt / options.simulation_end_time, mesh2d)
+# TODO: is this right, or is it dJ/dm that needs to be ~ O(1)?
+J_scalar = sta_manager.cost_function_scaling


The order of magnitude of the gradient will change as the iterations of the optimisation routine progress, so surely this can't be fulfilled? Whereas the cost function value should stay at a similar order of magnitude if the initial guess is reasonable.

My TODO comment was not clear enough, sorry. My first objective was to re-introduce adjoint testing once a Firedrake update had fixed an issue that prevented the adjoint working with these examples. Once that was done, I was changing the scaling as I realised that none of the examples were getting past line search 0. I then realised the scaling was having no effect. Once I fixed that, I noticed that whether the optimisation actually occurred seemed to be linked to the starting value of dJ/dm, not J. If I run the channel inversion example:

For brevity I'll refer to scaling factor A where cost_function_scaling = domain_constant(A * solver_obj.dt / options.simulation_end_time, mesh2d)

Max. iters has been set to 5.
Bathymetry with scaling of A = 100000

line search 0: J=2.027e+05, dJdm=[6.6646e+00], grad_ev=1, duration 94.7 s ... line search 6: J=1.204e+03, dJdm=[1.4478e-01], grad_ev=1, duration 36.3 s

Optimal solution found.

Bathymetry with scaling of A = 10000

line search 0: J=2.027e+04, dJdm=[6.6646e-01], grad_ev=1, duration 36.0 s ... line search 3: J=5.082e+02, dJdm=[5.1814e-02], grad_ev=2, duration 70.7 s

Results are sub-optimal but process ends.

Bathymetry with scaling of A = 1000
line search 0: J=2.027e+03, dJdm=[6.6646e-02, grad_ev=1, duration 36.0 s
No iterations are performed after LS0.

Manning with scaling of A = 100

line search 0: J=2.731e+02, dJdm=[8.3082e-01], grad_ev=1, duration 36.7 s ... line search 6: J=4.291e-01, dJdm=[3.3220e-02], grad_ev=1, duration 31.5 s

Optimal solution found.

Manning with scaling of A = 10
line search 0: J=2.731e+01, dJdm=[8.3082e-02], grad_ev=1, duration 31.0 s
No iterations are performed after LS0.

InitialElev with scaling of A = 10000

line search 0: J=2.356e+05, dJdm=[7.8508e+00], grad_ev=1, duration 30.4 s ... line search 6: J=4.935e+01, dJdm=[5.7339e-02], grad_ev=1, duration 30.4 s

Optimal solution found.}

InitialElev with scaling of A = 1000

line search 0: J=2.356e+04, dJdm=[7.8508e-01], grad_ev=1, duration 24.3 s ... line search 3: J=7.643e+02, dJdm=[7.8171e-02], grad_ev=1, duration 31.1 s

Results are sub-optimal but process ends.

InitialElev with scaling of A = 100
line search 0: J=2.356e+03, dJdm=[7.8508e-02], grad_ev=1, duration 30.9 s
No iterations are performed after LS0.

So this suggests to me that it is cutting out based on the gradient starting at or reaching ~O(-2) rather than the cost function itself. However, this could just be a coincidence. I admit, I have not explored this properly, hence leaving the TODO comment!

Having thought about it some more, I think you're right that we want the gradient to start around O(1). Indeed, this is mentioned elsewhere in Thetis (e.g., tidalfarm example). Apologies for any confusion.

jwallwork23 · 2025-01-09T18:12:26Z

examples/tohoku_inversion/inverse_problem.py

@@ -73,6 +73,10 @@
 sta_manager = inversion_tools.StationObservationManager(
    mesh2d, output_directory=options.output_directory
 )
+# Define the scaling for the cost function so that J ~ O(1)
+# TODO: is this right, or is it dJ/dm that needs to be ~ O(1)?
+cost_function_scaling = domain_constant(10000000 * solver_obj.dt / options.simulation_end_time, mesh2d)


I notice that the hard-coded numbers in the domain_constant are different across the two examples. If it turns out we do indeed want to take this approach, it'd be nice if we could determine a value that could be computed from parameters related to the problem, to avoid such manual scaling.

Yes, I agree - the reasoning for the value was once again related to the starting gradient value. I think the problem is that I imagine that's what dt/T was for in the first place, which doesn't seem to work any more. I'm certainly not an expert in the adjoint based stuff so I've done this by trial and error observations as you can see.

The dt/T part is accounting for time, but I guess we also need the scaling to account for space, which is where the numbers you've entered come in.

Perhaps it's okay to just leave it as-is for now, with an appropriate TODO note.

cpjordan and others added 2 commits December 19, 2024 11:00

Merge branch 'master' into correct-adjoint-scaling

e49e8bc

cpjordan marked this pull request as ready for review January 6, 2025 11:12

Merge branch 'master' into correct-adjoint-scaling

9b4f40b

cpjordan requested a review from jwallwork23 January 7, 2025 09:26

jwallwork23 reviewed Jan 9, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update scaling for adjoint problems #378

Update scaling for adjoint problems #378

cpjordan commented Dec 19, 2024 •

edited

Loading

jwallwork23 Jan 9, 2025

cpjordan Jan 10, 2025

jwallwork23 Jan 14, 2025

jwallwork23 Jan 9, 2025

cpjordan Jan 10, 2025

jwallwork23 Jan 14, 2025

jwallwork23 Jan 14, 2025

Update scaling for adjoint problems #378

Are you sure you want to change the base?

Update scaling for adjoint problems #378

Conversation

cpjordan commented Dec 19, 2024 • edited Loading

jwallwork23 Jan 9, 2025

Choose a reason for hiding this comment

cpjordan Jan 10, 2025

Choose a reason for hiding this comment

jwallwork23 Jan 14, 2025

Choose a reason for hiding this comment

jwallwork23 Jan 9, 2025

Choose a reason for hiding this comment

cpjordan Jan 10, 2025

Choose a reason for hiding this comment

jwallwork23 Jan 14, 2025

Choose a reason for hiding this comment

jwallwork23 Jan 14, 2025

Choose a reason for hiding this comment

cpjordan commented Dec 19, 2024 •

edited

Loading