-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support IntelLLVM compiler #353
Conversation
driver/fvGFS/atmosphere.F90
Outdated
@@ -447,8 +447,20 @@ subroutine atmosphere_init (Time_init, Time, Time_step, Grid_box, area) | |||
! First, read atmos_model_nml namelist section - this is a workaround to avoid | |||
! unnecessary additional changes to the input namelists, in anticipation of the | |||
! implementation of a generic interface for GFDL and CCPP fast physics soon | |||
#ifdef __INTEL_LLVM_COMPILER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bensonr @DusanJovic-NOAA this looks like a compiler bug since reading internal files is part of the Fortran Standard, AFAIK. Also why would it only affect atmos_model_nml
and not the other namelists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lharris4 - As you noted, this is only being implemented here and not at every place namelists are being read from an internal file. Hopefully this has been tested with the latest OneAPI (2024.2.0) and if it is still an issue a bug has been filed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just confirmed that with the updated compiler on Hera (ifx (IFX) 2023.2.0 20230622) there is no internal compiler error anymore. I am going to remove this change.
One of the ufs-weather-model debug regression tests is failing with floating-point invalid exception. Here is the backtrace:
Line 433: 428 do k=2,km+1
429 do i=is1, ie1
430 pem(i,k) = pem(i,k-1) + dm(i,k-1)
431 #ifdef USE_COND
432 ! Excluding contribution from condensates:
433 peg(i,k) = peg(i,k-1) + dm(i,k-1)*(1.-q_con(i,j,k-1))
434 #endif
435 enddo
436 enddo I found that diff --git a/model/fv_arrays.F90 b/model/fv_arrays.F90
index e6e65a6..d18ca37 100644
--- a/model/fv_arrays.F90
+++ b/model/fv_arrays.F90
@@ -1554,6 +1554,7 @@ contains
#else
allocate ( Atm%q_con(isd:isd,jsd:jsd,1) )
#endif
+ Atm%q_con = 0.0
! Notes by SJL
! Place the memory in the optimal shared mem space regression test finishes successfully. Should I add this q_con initialization to zero? |
Hi, @DusanJovic-NOAA Which specific test is failing? This may indicate that q_con is not correctly filled with the proper data in some situations. You are right that it would be best to explicitly initialize |
Test that fails is 'control_debug_p8'. |
When I initialize diff --git a/model/fv_arrays.F90 b/model/fv_arrays.F90
index e6e65a6..b2c1fc9 100644
--- a/model/fv_arrays.F90
+++ b/model/fv_arrays.F90
@@ -1566,6 +1566,9 @@ contains
Atm%va(i,j,k) = real_big
Atm%pt(i,j,k) = real_big
Atm%delp(i,j,k) = real_big
+#ifdef USE_COND
+ Atm%q_con(i,j,k) = real_big
+#endif
enddo
enddo
do j=jsd, jed+1
@@ -1616,6 +1619,9 @@ contains
Atm%ts(i,j) = 300.
Atm%phis(i,j) = real_big
+#ifndef USE_COND
+ Atm%q_con(i,j,1) = real_big
+#endif
enddo
enddo model still fails with invalid floating point exception. If I add debug prints, like: diff --git a/model/nh_utils.F90 b/model/nh_utils.F90
index 2b89aea..2a56f7e 100644
--- a/model/nh_utils.F90
+++ b/model/nh_utils.F90
@@ -430,6 +430,9 @@ CONTAINS
pem(i,k) = pem(i,k-1) + dm(i,k-1)
#ifdef USE_COND
! Excluding contribution from condensates:
+ if(i==is1 .and. j==(js-1) .and. k==2) then
+ write(0,*)i,j,k-1, q_con(i,j,k-1)
+ endif
peg(i,k) = peg(i,k-1) + dm(i,k-1)*(1.-q_con(i,j,k-1))
#endif
enddo
@@ -439,6 +442,9 @@ CONTAINS
do i=is1, ie1
dz2(i,k) = gz(i,j,k+1) - gz(i,j,k)
#ifdef USE_COND
+ if(i==is1 .and. j==(js-1) .and. k==1) then
+ write(0,*)i,j,k, peg(i,k),peg(i,k+1)
+ endif
pm2(i,k) = (peg(i,k+1)-peg(i,k))/log(peg(i,k+1)/peg(i,k))
#ifdef MOIST_CAPPA
I see that the backtrace:
Isn't better to initialize all arrays to NaN? That way model will fail as soon as NaN value is used, instead of initializing them to |
@DusanJovic-NOAA I think you are right; it is better to initialize to NaN so they can be caught immediately. Thank you for your suggestion. |
Ok. Initializing diff --git a/model/fv_arrays.F90 b/model/fv_arrays.F90
index e6e65a6..c239083 100644
--- a/model/fv_arrays.F90
+++ b/model/fv_arrays.F90
@@ -33,6 +33,12 @@ module fv_arrays_mod
public
+#ifdef OVERLOAD_R4
+ real, parameter:: real_snan=real(Z'FFBFFFFF')
+#else
+ real, parameter:: real_snan=real(Z'FFF7FFFFFFFFFFFF')
+#endif
+
integer, public, parameter :: R_GRID = r8_kind
!Several 'auxiliary' structures are introduced here. These are for
@@ -1566,6 +1572,10 @@ contains
Atm%va(i,j,k) = real_big
Atm%pt(i,j,k) = real_big
Atm%delp(i,j,k) = real_big
+#ifdef USE_COND
+ ! Atm%q_con(i,j,k) = real_big
+ Atm%q_con(i,j,k) = real_snan
+#endif
enddo
enddo
do j=jsd, jed+1
@@ -1616,6 +1626,10 @@ contains
Atm%ts(i,j) = 300.
Atm%phis(i,j) = real_big
+#ifndef USE_COND
+ ! Atm%q_con(i,j,1) = real_big
+ Atm%q_con(i,j,1) = real_snan
+#endif
enddo
enddo and then printing all values of diff --git a/model/nh_utils.F90 b/model/nh_utils.F90
index 2b89aea..2502e94 100644
--- a/model/nh_utils.F90
+++ b/model/nh_utils.F90
@@ -400,6 +400,12 @@ CONTAINS
enddo
endif
+ do j=js-1, je+1
+ do i=is-1, ie+1
+ write(0,*)i,j,q_con(i,j,1)
+ end do
+ end do
+ call flush(0)
!$OMP parallel do default(none) shared(js,je,is1,ie1,km,delp,pef,ptop,gz,rgrav,w3,pt, &
#ifdef MULTI_GASES
@@ -430,6 +436,9 @@ CONTAINS
pem(i,k) = pem(i,k-1) + dm(i,k-1)
#ifdef USE_COND
! Excluding contribution from condensates:
+ ! if(i==is1 .and. j==(js-1) .and. k==2) then
+ ! write(0,*)i,j,k-1, q_con(i,j,k-1)
+ ! endif
peg(i,k) = peg(i,k-1) + dm(i,k-1)*(1.-q_con(i,j,k-1))
#endif
enddo
@@ -439,6 +448,9 @@ CONTAINS
do i=is1, ie1
dz2(i,k) = gz(i,j,k+1) - gz(i,j,k)
#ifdef USE_COND
+ ! if(i==is1 .and. j==(js-1) .and. k==1) then
+ ! write(0,*)i,j,k, peg(i,k),peg(i,k+1)
+ ! endif
pm2(i,k) = (peg(i,k+1)-peg(i,k))/log(peg(i,k+1)/peg(i,k))
#ifdef MOIST_CAPPA I see:
There are exactly 24 NaN values, 4 per tile, and 4 values are at the corners of each tile. For example:
are lower-left, lower-right, upper-left and upper-right tile corners. The layout is (3,8) 24 PES per tile, and ranks 0, 2, 21 and 23 are corner PES of tile 1. Should't these corner (halo) points be filled from neighboring tiles? |
Now I think I understand what is happening. On the cubed sphere the values in the outermost corners are meaningless, since that is where three faces come to a point. So the column-wise calculations on these corner cells have no effect, but since they are filled with NaN an FPE is raised. To satisfy the compiler, we could initialize the corner values to 0 while the others are set to NaN; or alternately your original idea to just initialize Thanks, |
Thank you Lucas. Please see 1a8820f |
model/fv_arrays.F90
Outdated
#ifndef USE_COND | ||
do j=jsd, jed | ||
do i=isd, ied | ||
Atm%q_con(i,j,1) = 0. | ||
enddo | ||
enddo | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding another ifndef block here, you can safely put the Atm%q_con=0
, along with proper documentation, where it is allocated at line 1555. In this case, you are not initializing the memory in an OpenMP region.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making that change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DusanJovic-NOAA thank you for your work on this. @bensonr thanks also for correcting my misconception about OpenMP.
I am testing this code on my laptop with Intel 2024.2.0 and I'm getting these compile errors:
and in fv_dynamics.F90:
Previously I was testing this one on few of the RDHPCS systems like Hercules, Hera, Gaea with slightly older compilers and I did not see these errors. I do not know if this is the regression in the latest compiler, or an actual bug fixed. FYI @climbfuji |
@DusanJovic-NOAA Can you check if the error is within the "if intel" CPP branch? For background: Older Intel compilers up to 2021.1.0 had a bug that forced us to add all these #ifdef statements for Intel. GNU was following the Fortran standard for specifying or not specifying pointers in OMP directives, Intel didn't (the Intel compiler developers confirmed that). They fixed it in 2021.2.0, but kept their compilers backward compatible. We left the workaround for Intel in place, because we had (and still have ... wcoss2 ...) systems with compilers older than 2021.2.0. If you are seeing the errors in the Is that the case? |
If I simply remove all variables that are 'an associate name' from all OMP directives code compiles without errors (I did not run the code yet!). Effectively all OMP directives for the gnu and non-gnu compilers are identical. |
@DusanJovic-NOAA @climbfuji - I can't believe Intel would change anything in |
@bensonr The errors I mentioned above are with ifx, not ifort. |
That would make sense to me! Time to add more logic to those ifdefs until WCOSS2 gets an updated software stack ;-) |
I expected those errors with |
So, if I understand correctly we need two different OMP directives, one without variables that are an associate names listed as shared for GNU and IntelLLVM >= 2024.2.0, and the other one with those variables listed as shared for all Intel Classic and IntelLLVM < 2024.2.0. Is my understanding correct? |
I think what you want is |
How about this: if ( hydrostatic ) then
-#ifdef __GFORTRAN__
+#if defined(__GFORTRAN__) || __INTEL_LLVM_COMPILER >= 20240200
!$OMP parallel do default(none) shared(is,ie,js,je,isd,ied,jsd,jed,npz,zvir,nwat,q,q_con,sphum,liq_wat, &
#else
!$OMP parallel do default(none) shared(is,ie,js,je,isd,ied,jsd,jed,npz,dp1,zvir,nwat,q,q_con,sphum,liq_wat, & |
Sorry for playing devil's advocate here. So you are saying the earlier versions of |
Yes. I started seeing these compile errors only when I tried using ifx 2024.2.0, which I have on my laptop. On Ursa test system (replacement for Hera) they have 2024.1.0 and I also did not see these errors on all other RDHPCS machines I tried hera(2021.5.1), orion(2021.9.0), hercules(2021.9.0), gaea (2023.1.0). Based on this I guess the behavior changed in ifx between 2024.1.0 and 2024.2.0. It would be nice to confirm this with Intel. |
Do we need these associate names? Can't we declare local pointers, point them to GFDL_interstitial arrays:
and just use one set of OMP directives. Do 'associate names' behave differently than pointers? |
I think they are both doing the same. But whether it makes a difference for how |
In this commit b947391 I made changes to fv_mapz and fv_dynamics in an attempt to simplify the OMP directives and make them work with GNU, Intel classic and Intel LLVM compilers In f_mapz: in fv_dynamic:
I ran few tests that use threading on wcoss2, and they passed, and full test passed on Hera (gnu and intel) |
@DusanJovic-NOAA - can you check the performance. Some of the things I found online seemed to indicate pointers might be slower than an associate. I don't understand why that would be, but... |
These are some of the regression test on Hera. Lines stating with '-' is current develop branch, lines starting with '+' are this branch. Second number is square brackets it wall clock time:
|
Looks good and thanks for confirming. |
All tests are done at ufs-community/ufs-weather-model#2224. @laurenchilutti @bensonr @lharris4 @XiaqiongZhou-NOAA this pr can be merged. |
Description
This PR adds support for Intel LLVM compiler.
Changes:
cmake/compiler_flags_IntelLLVM_Fortran.cmake
andcmake/fv3_compiler_flags.cmake
to define Intel LLVM compiler flagsdriver/fvGFS/atmosphere.F90
to read namelist from a file instead from a character variable to avoid internal compiler error.Fixes # (issue)
How Has This Been Tested?
ufs-weather-model regression test has been run on Hera. See ufs-community/ufs-weather-model#2224
Checklist:
Please check all whether they apply or not