Gaea C6 support for UFSWM #2448
Conversation
cpld_control_p8 intel fails by timing out, so there's work to do tweaking the configs to better match the C6 hardware. I think there are still lots of other items to check here; this is just a placeholder for now. Please feel free to send PRs to my fork/branch to add/adjust/fix any issues.
Also, once things start falling into place, we'll need to make sure intelllvm support is available for C6.
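Once an intelllvm stack is in place, verifying it would presumably follow the usual regression-test flow. A minimal sketch, assuming rt.sh's documented `-a`/`-c`/`-n` options and a placeholder account name:

```bash
# Hypothetical check of intelllvm support on C6 via the standard RT driver.
# <account> is a placeholder; the -n syntax ("test compiler") follows rt.sh usage.
cd tests
./rt.sh -a <account> -n "control_p8 intelllvm"     # run one test against the baseline
./rt.sh -a <account> -c -n "control_p8 intelllvm"  # or create a new baseline for it
```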
@BrianCurtis-NOAA, name change suggestion:
@BrianCurtis-NOAA Shall I re-try building with these
cpld_control_p8 fails with:
and control_p8 runs to completion:
@DusanJovic-NOAA does this look ok?:
Yes.
@BrianCurtis-NOAA @jkbk2004 @FernandoAndrade-NOAA I believe EPIC now has full access to the
@BrianCurtis-NOAA can you sync up the branch? I think I am able to create a baseline on C6: /gpfs/f6/bil-fire8/world-shared/role.epic/UFS-WM_RT/NEMSfv3gfs.
We continue to see failures with various cases, showing about three different behaviors and error messages:
@ulmononian @RatkoVasic-NOAA we need troubleshooting from the library side.
@RatkoVasic-NOAA @BrianCurtis-NOAA
Any combination is OK, as long as they are the same length.
@MichaelLueken just FYI regarding the C5/C6 naming conventions. I recall there was a desire to sync the SRW CI/CD pipeline with certain Gaea C5/C6 naming conventions.
I'll be going with gaeac6 and gaeac5, FYI. I'll make those changes at some point tomorrow.
@BrianCurtis-NOAA @ulmononian @jkbk2004
Also adding in ./tests/fv3_conf/fv3_slurm.IN_gaea:
Please try what @RatkoVasic-NOAA has suggested in your job cards before fv3.exe is run: `export FI_VERBS_PREFER_XRC=0`. This is a known issue inherent to the C5 system; it may also be worth trying on C6.
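As a concrete illustration, a minimal sketch of where the export would sit in a Slurm job card (the resource requests and srun invocation below are placeholders, not the actual test configuration):

```bash
#!/bin/bash
#SBATCH --job-name=cpld_control_p8
#SBATCH --nodes=8                  # placeholder resource request
#SBATCH --time=00:30:00

# Work around the known libfabric/verbs issue on Gaea C5 (possibly C6 too)
export FI_VERBS_PREFER_XRC=0

srun ./fv3.exe                     # the export must be set before the model launches
```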
@jkbk2004 @BrianCurtis-NOAA
@BrianCurtis-NOAA @jkbk2004 @ulmononian
If more work is needed on Gaea C6, I can make a PR now. There are only 4 files that needed changes, provided here.
Let me put all of this together and update this PR.
This is not up-to-date for either CMEPS or CDEPS.
It will be interesting to see how ESMF version 8.8.0 will affect things on Gaea C6 with ESMF-managed threading. The latest feature-frozen beta for 8.8.0 is v8.8.0b10. A lot of work for 8.8.0 was in one of the areas of the framework that affects high core count ESMF-managed threading runs. @GeorgeVandenberghe-NOAA reported some positive effects (using the earlier snapshot v8.8.0b09) for large core count runs on Gaea C5.
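For context, ESMF-managed threading in the UFS is driven through ufs.configure; a minimal sketch of the relevant entries (the component list, PET range, and thread count here are illustrative values, not the configuration used in these tests):

```
# ufs.configure (excerpt, illustrative values only)
globalResourceControl:  true       # hand thread management to ESMF
EARTH_component_list:   ATM OCN ICE
ATM_model:              fv3
ATM_petlist_bounds:     0 191      # placeholder PET range
ATM_omp_num_threads:    4          # per-component threads managed by ESMF
```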
I was going to build v8.8.0b09, but should I instead just make v8.8.0b10 available everywhere I can and freeze on that for the next few months for large core count runs? I can build it on Hercules, Orion, Hera (irrelevant there), Gaea C5, and Gaea C6. I am of course forbidden from building it on WCOSS2. I build in my private stack outside of spack-stack before spack-stack is ready to include it.
@GeorgeVandenberghe-NOAA it's probably worth some coordination with the spack-stack folks on the UFS side, like @AlexanderRichert-NOAA and @RatkoVasic-NOAA. Spack-stack is moving forward with the latest ESMF beta tag v8.8.0b10: JCSDA/spack-stack#1409
FYI: WCOSS2 won't accept a beta snapshot, so if we want to get the latest ESMF onto WCOSS2, it will need an official release at some point soon. Also, since the process has typically been slow, we will want to start it as soon as there is an official release.
@BrianCurtis-NOAA The official ESMF 8.8.0 release date is planned for early/mid January.
But of course we do need the beta testing, so we understand how 8.8.0 will behave in the field.
Yep. Won't happen on WCOSS2 by policy 😡
I will likely be ahead of spack-stack in making this available. You've answered the question: I will go with v8.8.0b10. It will probably be available on Gaea C5 under my stack tomorrow, and on Orion/Hercules on Thursday.
There are some slow compile times (s2swa_32bit_pdlib_sfs_intel, s2swa_debug_intel, s2s_intel, s2swa_faster_intel); it may be worth monitoring whether they persist.
Agree with Nick, we definitely need to look into those compile times for C6.
A lot of thanks to the EPIC group for helping to get this PR to the finish line.
Is ESMF 8.8.0b10 in spack-stack on Gaea C6, or should I try to build it there? I haven't focused much on C6 because ufs-weather-model didn't work there until just now, and due to policies NCEP does not have much footprint on C6. I can cobble together build and run systems, but it's great that I will no longer need to, and I dream of being able to sunset all of my private hacks forever.
@GeorgeVandenberghe-NOAA not for now. UFS-WM is using [email protected] (with esmf 8.6.0), and the latest spack-stack (1.8.0) is installed with [email protected]. It can be added as a chained environment, as @AlexanderRichert-NOAA did on Hercules:
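For reference, spack's chained-environment mechanism works through an `upstreams` entry in the environment configuration. A minimal sketch, assuming a placeholder install-tree path and an `esmf@=8.8.0b10` spec as in JCSDA/spack-stack#1409 (neither reflects the actual Hercules or C6 layout):

```yaml
# spack.yaml (excerpt): a small environment that reuses an existing
# spack-stack installation as an upstream, so only the new esmf is rebuilt.
spack:
  specs:
  - esmf@=8.8.0b10
  upstreams:
    spack-stack-1.8.0:
      install_tree: /path/to/spack-stack-1.8.0/envs/unified-env/install  # placeholder
```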
This PR adds the following:
* converting from inp -> nml (@sbanihash)
* turning on PIO for waves for restarts (@sbanihash)
* enabling cycling for WW3, which required some updates to wave prep jobs plus changing which restarts are being saved, etc.
* changing the way CMEPS, MOM6, CICE, and WW3 write restarts to be in sync with FV3 for IAU, which required moving ufs-weather-model forward one hash to use the new flexible restart feature (UFS PR ufs-community/ufs-weather-model#2419)
* adding uglo_15km, the targeted new wave grid
* updating to the new esmf_threading ufs.configure files, which change how you toggle between ESMF threading and traditional threading (UFS PR ufs-community/ufs-weather-model#2538)

Notes on ufs-weather-model updates:

| Commit date | Commit hash / PR | Notes for g-w changes | Baseline changes |
| :------------- | :------------- | :------------- | :------------- |
| Dec 11, 2024 | ufs-community/ufs-weather-model@409bc85, ufs-community/ufs-weather-model#2419 | Enables flexible restart writes; changes included in g-w PR | none |
| Dec 16, 2024 | ufs-community/ufs-weather-model@6ec6b45, ufs-community/ufs-weather-model#2528, ufs-community/ufs-weather-model#2469 | n/a | HAFS test changes, no global changes |
| Dec 18, 2024 | ufs-community/ufs-weather-model@e119370, ufs-community/ufs-weather-model#2448 | Adds Gaea C6 support (changes in other g-w PRs, not here) | none |
| Dec 23, 2024 | ufs-community/ufs-weather-model@2950089, ufs-community/ufs-weather-model#2533, ufs-community/ufs-weather-model#2538 | Changes for ESMF vs. traditional threading | none |
| Dec 30, 2024 | ufs-community/ufs-weather-model@241dd8e, ufs-community/ufs-weather-model#2485 | n/a | Changes in conus13km, no global changes |
| Jan 3, 2025 | ufs-community/ufs-weather-model@76471dc, ufs-community/ufs-weather-model#2530 | n/a | Changes in regional tests, no global changes |

Note this PR requires the following:
* update to fix files to add uglo_15km
* staging ICs for the high-resolution test case for uglo_15km

Co-author: @sbanihash

Related Issues:
- Fixes #1457
- Fixes #3154
- Fixes #1795
- Related to #1776

Co-authored-by: Rahul Mahajan <[email protected]>
Co-authored-by: Saeideh Banihashemi <[email protected]>
Co-authored-by: David Huber <[email protected]>
Co-authored-by: Walter Kolczynski - NOAA <[email protected]>
Commit Queue Requirements:
Description:
This PR will bring in all changes necessary to provide Gaea C6 support for the UFSWM.
Commit Message:
Priority:
Git Tracking
UFSWM:
Sub component Pull Requests:
UFSWM Blocking Dependencies:
Changes
Regression Test Changes (Please commit test_changes.list):
Input data Changes:
Library Changes/Upgrades:
Testing Log: