Update hpc-stack and compiler versions #812
Comments
Hi @WalterKolczynski-NOAA. I encountered an issue loading met/9.1 on Orion in S2SW (coupled, forecast-only) and ATM (uncoupled, forecast-only) tests using the develop branch. The same error is produced in the gfsfcst task of the ATM case and the gfswaveinit task of the S2SW case. I was able to bypass the issue by removing line 55 from module_base.orion.lua (`load(pathJoin("met", "9.1"))`). However, when I check the modules loaded by load_fv3gfs_modules.sh when testing outside of runtime, met/9.1 still seems to load despite removing this line. I did not know whether this was worth its own issue, and thought perhaps it was related to this one. Thank you, Cameron.
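For reference, the check amounts to something like this (a minimal sketch; the relative paths assume a standard global-workflow checkout):

```shell
# Comment out the met/9.1 load (equivalent to removing the line), then verify
# whether met is still pulled in by another module. Paths are assumptions.
sed -i 's|^load(pathJoin("met", "9.1"))|-- &|' modulefiles/module_base.orion.lua
source ush/load_fv3gfs_modules.sh
module list 2>&1 | grep -iw met   # no output means met is no longer loaded
```

If met still appears after this, it is presumably being loaded as a dependency of some other module rather than by this line.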
This looks like an unrelated issue. Please open a new one.
Hello, was an issue created for the METplus problem?
hpc-stack has had a PR open since September to add it: NOAA-EMC/hpc-stack/pull/324. Unless you are talking about Cameron's issue, which was just a permissions problem in my hacked-up stopgap solution to a completely separate problem from this issue.
Thanks, Walter! I believe that is the thing.
@WalterKolczynski-NOAA I'm working on porting the global workflow and subcomponents to a Google Cloud instance (not an RDHPCS instance) using Intel 2021 compilers and hpc-stack 1.2.0. Would it be beneficial to this issue if I documented the hurdles and workarounds?
Yes, although AFAIK the big hurdle is GSI.
@WalterKolczynski-NOAA Good to know, thanks. So far, I have only built the UFS via build_ufs.sh, which just required changing the Intel versions to 2021, the hpc-stack version to 1.2.0, and …
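For reference, the environment change amounts to swapping the module versions along these lines (a sketch with illustrative version strings and paths; the exact module names depend on how the stack was installed):

```shell
# Illustrative Intel 2021 + hpc-stack 1.2.0 environment, following the usual
# hpc-stack module naming (hpc, hpc-intel, hpc-impi). Install path is assumed.
module purge
module use /path/to/hpc-stack-1.2.0/modulefiles/stack
module load hpc/1.2.0
module load hpc-intel/2021.3.0
module load hpc-impi/2021.3.0
```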
No hurdles to build the GSI. I won't be able to run RTs on the GCP as they contain restricted data. I'm guessing the issues were at runtime? Will update when I get cycling running.
The GSI and EnKF were a little tricky. The GSI in particular does not play well with Intel 2021/2022 compilers and -O3 optimization. I tried to get the regression tests to run using Intel 2021.3.0, 2022.1.2, and 2022.3.0, but all continue to crash at the same line (see NOAA-EMC/GSI#447). Turning the optimization down to -O0 does allow the regression tests to run to completion, while -O2 allows some regression tests to pass but others still fail, albeit at different lines. On the GCP, I have been able to cycle with -O2 with only one disruption: the gdasanalcalc job often (but not always) hangs when executing calc_anl.x at gsi_utils/src/netcdf_io/calc_analysis.fd/inc2anl.f90:139 while attempting to write the sfcanl netCDF file via ncio (hanging at write_vardata_code.f90:73). Rerunning the job enough times does clear this hurdle, but it is annoying. I am in the process of building a global workflow on Hera with Intel 2022.1.2 and the GSI's optimization turned down to -O2, and will see whether I can tweak the flags for the gsi_utils build as well to get cycling to run more smoothly.
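One way to drop the GSI to -O2 without patching its CMake files is to override the release flags at configure time (a sketch; assumes the GSI's build honors the standard CMake cache variables and does not append its own -O3 afterwards):

```shell
# Configure the GSI with -O2 instead of the default release optimization.
# Check the actual compile lines (e.g., `make VERBOSE=1`) to confirm that no
# project-specific flag re-adds -O3.
cmake -S . -B build \
      -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_Fortran_FLAGS_RELEASE="-O2" \
      -DCMAKE_C_FLAGS_RELEASE="-O2"
cmake --build build -j 8
```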
@WalterKolczynski-NOAA I will lose access to this GCP instance tomorrow, so I won't be able to continue this work there. I will note one other important issue that cropped up at C384: one of the EnKF recentering applications in gdasecen000-2 produces completely nonsensical values (e.g., temperatures on the order of 10^30). This is either a silent failure or possibly an issue with ncio; unfortunately, I did not have time to root it out. Is there any other information you would like me to gather before I lose access?
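A quick screen for such values (a sketch; the file and variable names here are illustrative, not the actual EnKF output names):

```shell
# Dump the temperature field and flag absurd magnitudes (exponents >= 1e+20).
# Assumes the variable is named "tmp"; adjust for the real recentered file.
ncdump -v tmp recentered_member.nc | grep -E 'e\+[2-9][0-9]' \
  && echo "suspicious values found"
```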
I don't believe so. The DA group is well aware of the compiler version problems, and we reminded them again last week and asked to have it made a higher priority as it is blocking standardization work.
Capturing a discussion by @RussTreadon-NOAA, @WalterKolczynski-NOAA, @KateFriedman-NOAA, @hu5970, and myself from the GSI upgrade to Intel 2022 (NOAA-EMC/GSI#447, NOAA-EMC/GSI#571): the miniconda versions in the GSI are being updated to newer, EPIC-managed locations. These should be updated in the global workflow as well with an upgrade to Intel 2022. Note that these locations will change again when migrating to spack-stack.
I am moving towards using spack-stack (#1868) on all systems and will bypass the newer Intel 2022 hpc-stack builds. Closing.
Description
HPCs are moving away from the Intel 2018 compiler, so we should update to one of the newer compiler versions. We should also take the opportunity to move to hpc-stack/1.2.0.
Requirements
Programs should compile and run using Intel 2021 or 2022 compilers
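A trivial smoke test of the loaded environment (a sketch; the expected versions are the targets above, not guaranteed output):

```shell
# Confirm the environment provides an Intel 2021 or 2022 toolchain.
ifort --version      # expect 2021.x or 2022.x
mpiifort --version   # MPI wrapper should report the same compiler
```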
Acceptance Criteria (Definition of Done)
(These changes may also require other library updates, depending on what is available.)
Dependencies
Workflow-controlled programs have no dependencies, but programs controlled by other components will need similar changes to move the entire system to a newer compiler. Those changes may also be needed to prevent library mismatches at runtime.