-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update scaling for adjoint problems #378
base: master
Are you sure you want to change the base?
Conversation
- Previously the scaling failed because construct_evaluator of the StationManager has already been called and thus cost_function_scaling_0d does not get updated, even if cost_function_scaling does. - The scaling itself needs to be higher than previously and seems to be related to the magnitude of dJ/dm rather than J
# Define the scaling for the cost function so that J ~ O(1) | ||
J_scalar = domain_constant(solver_obj.dt / options.simulation_end_time, mesh2d) | ||
# TODO: is this right, or is it dJ/dm that needs to be ~ O(1)? | ||
J_scalar = sta_manager.cost_function_scaling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order of magnitude of the gradient will change as the iterations of the optimisation routine progress, so surely this can't be fulfilled? Whereas the cost function value should stay at a similar order of magnitude if the initial guess is reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My TODO comment was not clear enough, sorry. My first objective was to re-introduce adjoint testing once a Firedrake update had fixed an issue that prevented the adjoint working with these examples. Once that was done, I was changing the scaling as I realised that none of the examples were getting past line search 0. I then realised the scaling was having no effect. Once I fixed that, I noticed that whether the optimisation actually occurred seemed to be linked to the starting value of dJ/dm, not J. If I run the channel inversion example:
For brevity I'll refer to scaling factor A where cost_function_scaling = domain_constant(A * solver_obj.dt / options.simulation_end_time, mesh2d)
Max. iters has been set to 5.
Bathymetry with scaling of A = 100000
line search 0: J=2.027e+05, dJdm=[6.6646e+00], grad_ev=1, duration 94.7 s
...
line search 6: J=1.204e+03, dJdm=[1.4478e-01], grad_ev=1, duration 36.3 s
Optimal solution found.
Bathymetry with scaling of A = 10000
line search 0: J=2.027e+04, dJdm=[6.6646e-01], grad_ev=1, duration 36.0 s
...
line search 3: J=5.082e+02, dJdm=[5.1814e-02], grad_ev=2, duration 70.7 s
Results are sub-optimal but process ends.
Bathymetry with scaling of A = 1000
line search 0: J=2.027e+03, dJdm=[6.6646e-02, grad_ev=1, duration 36.0 s
No iterations are performed after LS0.
Manning with scaling of A = 100
line search 0: J=2.731e+02, dJdm=[8.3082e-01], grad_ev=1, duration 36.7 s
...
line search 6: J=4.291e-01, dJdm=[3.3220e-02], grad_ev=1, duration 31.5 s
Optimal solution found.
Manning with scaling of A = 10
line search 0: J=2.731e+01, dJdm=[8.3082e-02], grad_ev=1, duration 31.0 s
No iterations are performed after LS0.
InitialElev with scaling of A = 10000
line search 0: J=2.356e+05, dJdm=[7.8508e+00], grad_ev=1, duration 30.4 s
...
line search 6: J=4.935e+01, dJdm=[5.7339e-02], grad_ev=1, duration 30.4 s
Optimal solution found.}
InitialElev with scaling of A = 1000
line search 0: J=2.356e+04, dJdm=[7.8508e-01], grad_ev=1, duration 24.3 s
...
line search 3: J=7.643e+02, dJdm=[7.8171e-02], grad_ev=1, duration 31.1 s
Results are sub-optimal but process ends.
InitialElev with scaling of A = 100
line search 0: J=2.356e+03, dJdm=[7.8508e-02], grad_ev=1, duration 30.9 s
No iterations are performed after LS0.
So this suggests to me that it is cutting out based on the gradient starting at or reaching ~O(-2) rather than the cost function itself. However, this could just be a coincidence. I admit, I have not explored this properly, hence leaving the TODO comment!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having thought about it some more, I think you're right that we want the gradient to start around O(1). Indeed, this is mentioned elsewhere in Thetis (e.g., tidalfarm example). Apologies for any confusion.
@@ -73,6 +73,10 @@ | |||
sta_manager = inversion_tools.StationObservationManager( | |||
mesh2d, output_directory=options.output_directory | |||
) | |||
# Define the scaling for the cost function so that J ~ O(1) | |||
# TODO: is this right, or is it dJ/dm that needs to be ~ O(1)? | |||
cost_function_scaling = domain_constant(10000000 * solver_obj.dt / options.simulation_end_time, mesh2d) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice that the hard-coded numbers in the domain_constant
are different across the two examples. If it turns out we do indeed want to take this approach, it'd be nice if we could determine a value that could be computed from parameters related to the problem, to avoid such manual scaling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree - the reasoning for the value was once again related to the starting gradient value. I think the problem is that I imagine that's what dt/T was for in the first place, which doesn't seem to work any more. I'm certainly not an expert in the adjoint based stuff so I've done this by trial and error observations as you can see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dt/T part is accounting for time, but I guess we also need the scaling to account for space, which is where the numbers you've entered come in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it's okay to just leave it as-is for now, with an appropriate TODO note.
Updates scaling in the channel and Tohoku inversion examples and add the channel inversion back into testing.
construct_evaluator
of theStationManager
ininversion_tools.py
has already been called and thuscost_function_scaling_0d
does not get updated, even ifcost_function_scaling
does.Cannot be merged until Firedrake PR #3932 is merged.