Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update scaling for adjoint problems #378

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

cpjordan
Copy link
Contributor

@cpjordan cpjordan commented Dec 19, 2024

Updates scaling in the channel and Tohoku inversion examples and add the channel inversion back into testing.

  • Previously the scaling has no effect because construct_evaluator of the StationManager in inversion_tools.py has already been called and thus cost_function_scaling_0d does not get updated, even if cost_function_scaling does.
  • The scaling itself needs to be higher than previously and seems to be related to the magnitude of dJ/dm rather than J

Cannot be merged until Firedrake PR #3932 is merged.

cpjordan and others added 2 commits December 19, 2024 11:00
- Previously the scaling failed because construct_evaluator of the StationManager has already been called and thus cost_function_scaling_0d does not get updated, even if cost_function_scaling does.
- The scaling itself needs to be higher than previously and seems to be related to the magnitude of dJ/dm rather than J
@cpjordan cpjordan marked this pull request as ready for review January 6, 2025 11:12
@cpjordan cpjordan requested a review from jwallwork23 January 7, 2025 09:26
Comment on lines 98 to +100
# Define the scaling for the cost function so that J ~ O(1)
J_scalar = domain_constant(solver_obj.dt / options.simulation_end_time, mesh2d)
# TODO: is this right, or is it dJ/dm that needs to be ~ O(1)?
J_scalar = sta_manager.cost_function_scaling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order of magnitude of the gradient will change as the iterations of the optimisation routine progress, so surely this can't be fulfilled? Whereas the cost function value should stay at a similar order of magnitude if the initial guess is reasonable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My TODO comment was not clear enough, sorry. My first objective was to re-introduce adjoint testing once a Firedrake update had fixed an issue that prevented the adjoint working with these examples. Once that was done, I was changing the scaling as I realised that none of the examples were getting past line search 0. I then realised the scaling was having no effect. Once I fixed that, I noticed that whether the optimisation actually occurred seemed to be linked to the starting value of dJ/dm, not J. If I run the channel inversion example:

For brevity I'll refer to scaling factor A where cost_function_scaling = domain_constant(A * solver_obj.dt / options.simulation_end_time, mesh2d)

Max. iters has been set to 5.
Bathymetry with scaling of A = 100000

line search  0: J=2.027e+05, dJdm=[6.6646e+00], grad_ev=1, duration 94.7 s
...
line search  6: J=1.204e+03, dJdm=[1.4478e-01], grad_ev=1, duration 36.3 s

Optimal solution found.

Bathymetry with scaling of A = 10000

line search  0: J=2.027e+04, dJdm=[6.6646e-01], grad_ev=1, duration 36.0 s
...
line search  3: J=5.082e+02, dJdm=[5.1814e-02], grad_ev=2, duration 70.7 s

Results are sub-optimal but process ends.

Bathymetry with scaling of A = 1000
line search 0: J=2.027e+03, dJdm=[6.6646e-02, grad_ev=1, duration 36.0 s
No iterations are performed after LS0.

Manning with scaling of A = 100

line search  0: J=2.731e+02, dJdm=[8.3082e-01], grad_ev=1, duration 36.7 s
...
line search  6: J=4.291e-01, dJdm=[3.3220e-02], grad_ev=1, duration 31.5 s

Optimal solution found.

Manning with scaling of A = 10
line search 0: J=2.731e+01, dJdm=[8.3082e-02], grad_ev=1, duration 31.0 s
No iterations are performed after LS0.

InitialElev with scaling of A = 10000

line search  0: J=2.356e+05, dJdm=[7.8508e+00], grad_ev=1, duration 30.4 s
...
line search  6: J=4.935e+01, dJdm=[5.7339e-02], grad_ev=1, duration 30.4 s

Optimal solution found.}

InitialElev with scaling of A = 1000

line search  0: J=2.356e+04, dJdm=[7.8508e-01], grad_ev=1, duration 24.3 s
...
line search  3: J=7.643e+02, dJdm=[7.8171e-02], grad_ev=1, duration 31.1 s

Results are sub-optimal but process ends.

InitialElev with scaling of A = 100
line search 0: J=2.356e+03, dJdm=[7.8508e-02], grad_ev=1, duration 30.9 s
No iterations are performed after LS0.

So this suggests to me that it is cutting out based on the gradient starting at or reaching ~O(-2) rather than the cost function itself. However, this could just be a coincidence. I admit, I have not explored this properly, hence leaving the TODO comment!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having thought about it some more, I think you're right that we want the gradient to start around O(1). Indeed, this is mentioned elsewhere in Thetis (e.g., tidalfarm example). Apologies for any confusion.

@@ -73,6 +73,10 @@
sta_manager = inversion_tools.StationObservationManager(
mesh2d, output_directory=options.output_directory
)
# Define the scaling for the cost function so that J ~ O(1)
# TODO: is this right, or is it dJ/dm that needs to be ~ O(1)?
cost_function_scaling = domain_constant(10000000 * solver_obj.dt / options.simulation_end_time, mesh2d)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that the hard-coded numbers in the domain_constant are different across the two examples. If it turns out we do indeed want to take this approach, it'd be nice if we could determine a value that could be computed from parameters related to the problem, to avoid such manual scaling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree - the reasoning for the value was once again related to the starting gradient value. I think the problem is that I imagine that's what dt/T was for in the first place, which doesn't seem to work any more. I'm certainly not an expert in the adjoint based stuff so I've done this by trial and error observations as you can see.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dt/T part is accounting for time, but I guess we also need the scaling to account for space, which is where the numbers you've entered come in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it's okay to just leave it as-is for now, with an appropriate TODO note.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants