Fix broken GPU tests for CLUBB code #1226
base: cam_development
This fix confused me for a while, but I think I can explain it. If we have the setting I am unclear on why this didn't break the ECT test though. The values of those 5 member arrays should be different on the GPU than the CPU due to the lack of correct copying, and I would expect that to change answers.
Thanks @huebleruwm for your feedback. The five variables you mentioned are initialized here, and I think it should be fine to use In addition, I guess even if we need the initial values of those variables, we should just do a
They're only initialized if it's the first restart step, otherwise we rely on the values from the previous timestep being maintained. It might be a more robust solution to remove the Right now we would need them as
Ah, I see. Thanks for the clarification; that is clear to me now. Would you like to implement the more robust solution, or move on with my fix here? For the former, you can either open a PR against my branch or open a separate PR here to replace mine. For the latter, I can simply add the Regarding the ECT test, I am also surprised that the GPU code can pass, given that this looks like a code bug. Are these five variables for diagnostic purposes only and not output to the history file? If so, they may not be used for ECT at all.
I think it's simplest to go with your fix and move Those variables aren't just diagnostic and should affect the output. I took a look at where they're used internally, though: they set the bounds of a complicated clipping routine that is rarely called (actually the same place where the initial ECT-test-breaking bug was). So I suppose it's possible that we need to run more timesteps before the initial values of those variables ever become important, and the ECT test just doesn't run enough timesteps to tease out the bug.
Thanks @huebleruwm. That sounds good. I have added the Your comments on the ECT make sense to me. If that clipping routine and the five variables are rarely called or used, the output may not differ significantly between CPU and GPU within a few time steps (though the answers are not BFB). Thus the ECT test won't treat it as an error. As you said, maybe we need to run a longer ECT test to catch this kind of bug.
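To make the reasoning above concrete, here is a toy sketch (not CLUBB code) of how a stale value in a persistent variable can stay hidden in a short run when it only feeds a rarely taken clipping branch. Everything in it is a hypothetical stand-in: the function names `run` and `first_divergence`, the growth factor, and the bounds 2.0 (the "CPU" value) and 1.0 (a stale "GPU" value) are illustrative assumptions, not values from the model.

```python
def run(n_steps, clip_bound):
    """Advance a toy state; clip_bound stands in for a persistent variable
    that is only read inside a rarely taken clipping branch."""
    x = 0.1
    history = []
    for _ in range(n_steps):
        x = x * 1.1            # toy "physics" update
        if x > clip_bound:     # rare branch: the bound matters only here
            x = clip_bound
        history.append(x)
    return history


def first_divergence(a, b):
    """Return the first step index where two runs differ, or None."""
    for i, (u, v) in enumerate(zip(a, b)):
        if u != v:
            return i
    return None


# Hypothetical bounds: 2.0 is the value the host holds, 1.0 is a stale
# value left on the device because the array was never copied over.
short = first_divergence(run(10, 2.0), run(10, 1.0))
long_ = first_divergence(run(40, 2.0), run(40, 1.0))
print("short run diverges at:", short)
print("long run diverges at:", long_)
```

In this sketch the 10-step runs are identical because the clipping branch is never reached, while the 40-step runs first differ at step index 24. That is the same qualitative behavior suggested above for the ECT test: a short run may simply never exercise the code path that reads the wrong value.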
This PR fixes the ERS tests that were broken by the recent GPU changes to the CLUBB code (PR #1175).
Note that I need a new ccs_config tag from ESMCI/ccs_config_cesm#204 to complete this PR.
Closes #1220