GPU test fails restart comparison #1220

Open
nusbaume opened this issue Jan 7, 2025 · 1 comment
Assignees
Labels: bug (Something isn't working correctly), CoupledEval3

Comments

Collaborator

nusbaume commented Jan 7, 2025

What happened?

The changes brought in by PR #1175 cause the following aux_cam GPU test to fail the restart comparison (i.e. restarting the model changes the answers relative to a no-restart run):

ERS_Ln9.ne30pg3_ne30pg3_mg17.F2000dev.derecho_nvhpc.cam-outfrq9s_gpu_default

All restart tests on CPUs pass as expected.
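
To make the failure mode concrete, here is a minimal toy sketch of what a restart comparison checks. It is not CAM or CIME code: `ToyModel`, `pdf_memory`, and `restart_comparison` are hypothetical names, and the point at which the restart is written is an arbitrary choice for illustration. The idea is only that a run which stops, writes a restart, and resumes must reproduce a continuous run bit for bit, which fails as soon as some state carried between time steps does not make it into the restart.

```python
from dataclasses import dataclass


@dataclass
class ToyModel:
    """Hypothetical two-variable model; not related to CAM internals."""
    temperature: float = 1.0  # prognostic state, always written to the restart
    pdf_memory: float = 0.5   # state carried between steps (illustrative only)

    def step(self) -> None:
        # Each step depends on the value of pdf_memory left by the previous step.
        self.pdf_memory = 0.9 * self.pdf_memory + 0.1 * self.temperature
        self.temperature = self.temperature + 0.01 * self.pdf_memory

    def write_restart(self, include_memory: bool) -> dict:
        restart = {"temperature": self.temperature}
        if include_memory:
            restart["pdf_memory"] = self.pdf_memory
        return restart

    @classmethod
    def from_restart(cls, restart: dict) -> "ToyModel":
        # A field missing from the restart falls back to its initial value.
        return cls(
            temperature=restart["temperature"],
            pdf_memory=restart.get("pdf_memory", 0.5),
        )


def restart_comparison(nsteps: int, restart_after: int, include_memory: bool) -> bool:
    """Return True if a restarted run matches a continuous run exactly."""
    reference = ToyModel()
    for _ in range(nsteps):
        reference.step()

    first_leg = ToyModel()
    for _ in range(restart_after):
        first_leg.step()
    second_leg = ToyModel.from_restart(first_leg.write_restart(include_memory))
    for _ in range(nsteps - restart_after):
        second_leg.step()

    # Exact equality on purpose: the comparison is bit for bit, not a tolerance.
    return (
        reference.temperature == second_leg.temperature
        and reference.pdf_memory == second_leg.pdf_memory
    )


if __name__ == "__main__":
    print("full state in restart:", restart_comparison(9, 5, include_memory=True))
    print("state missing from restart:", restart_comparison(9, 5, include_memory=False))
```

In this toy, the comparison passes only when every piece of state that influences later steps is present, with its current value, in what the restart captures.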

What are the steps to reproduce the bug?

Run the CAM regression tests with the nvhpc compiler option on Derecho with CAM tag cam6_4_052 or later.

What CAM tag were you using?

cam6_4_052

What machine were you running CAM on?

CISL machine (e.g. cheyenne)

What compiler were you using?

NVHPC

Path to a case directory, if applicable

No response

Will you be addressing this bug yourself?

Yes, but I will need some help

Extra info

@huebleruwm @sjsprecious I will likely need your help on this, especially given that it only occurs for the GPU test.

nusbaume added the bug (Something isn't working correctly) label Jan 7, 2025
@huebleruwm

I took a look at the code and found that the variables pdf_zm_w_1, pdf_zm_w_2, pdf_zm_varnce_w_1, pdf_zm_varnce_w_2, and pdf_zm_mixt_frac are allocated on the GPU with a create statement, which only allocates them on the device without transferring any data. They should instead be copied in and out using a copy statement.

I expected that moving line 2896 in clubb_intr.F90 to the copy section in the same clause, around line 2870, would fix the issue, but I tried this and the test still failed. I don't think there is any other restart-specific code in clubb_intr, so I'm not immediately sure what else to try. I'll have more time to look at this after next week.
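
For readers less familiar with the distinction between the two data clauses mentioned above, the following Python sketch mimics it with plain lists standing in for host and device memory. It is only an analogy under stated assumptions: it is neither GPU code nor anything taken from clubb_intr.F90, `run_region` is a hypothetical helper, and the zeros used for the "create" case stand in for contents that are really undefined on a device.

```python
def run_region(host: list, clause: str) -> list:
    """Mimic one device data region for a single array.

    clause == "copy":   device copy is initialized from the host on entry,
                        and the host array is updated from the device on exit.
    clause == "create": device copy is only allocated; nothing is transferred
                        in either direction.
    Returns the device-side values at the end of the region.
    """
    if clause == "copy":
        device = list(host)            # copy-in on entry
    elif clause == "create":
        device = [0.0] * len(host)     # allocated only; zeros stand in for undefined data
    else:
        raise ValueError(f"unknown clause: {clause}")

    # Stand-in for a kernel that updates the array on the device.
    device = [x + 1.0 for x in device]

    if clause == "copy":
        host[:] = device               # copy-out on exit
    return device


if __name__ == "__main__":
    a = [1.0, 2.0]
    run_region(a, "copy")
    print("copy:   host after region =", a)

    b = [1.0, 2.0]
    dev = run_region(b, "create")
    print("create: host after region =", b, "| device saw", dev)
```

With "copy", the host array ends up holding the updated values; with "create", the kernel never sees the host values and the host never sees the result. If the host-side values are what gets carried across time steps or written out, a variable handled this way would look stale to everything outside the region.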

Projects
Status: To Do
Development

No branches or pull requests

4 participants