Skip to content

Commit

Permalink
Reinstate product groups (NOAA-EMC#3208)
Browse files Browse the repository at this point in the history
Testing with full-sized GEFS found that the sheer number of tasks
overloads rocoto, resulting in `rocotorun` taking over 10 min to
complete or hanging entirely. To reduce the number of tasks, product
groups are reimplemented so that multiple forecast hour are processed in
a single task. However, the implementation is a little different than
previously.

The jobs where groups are enabled (atmos_products, oceanice_products,
wavepostsbs, atmos_ensstat, and gempak) have a new variable,
`MAX_TASKS`, that controls how many groups to use. This setting is
currently *per member*. The forecast hours to be processed are then
divided into this many groups as evenly as possible without crossing
forecast segment boundaries. The walltime for those jobs is then
multiplied by the number of times in the largest group. For the gridded
wave post job, the dependencies were also updated to trigger off of
either the data being available or the appropriate segment completing
(the dependencies had not been updated when the job was initially broken
into fhrs).

A number of helper methods are added to Tasks to determine these groups
and make a standard metatask variable dict in a centralized location.
There is also a function to multiply the walltime, but this may be
better off relocated to wxflow with the other time functions.

As part of switching from a single value to a list, hours are no longer
passed by rocoto as zero-padded values. The lists are comma-delimited
(without spaces) and split apart in the job stub (`jobs/rocoto/*`), so
each j-job call is still a single forecast hour.

The offline post (upp) job is not broken into groups, since it really
isn't used outside the analysis anymore.

Resolves NOAA-EMC#2999
Resolves NOAA-EMC#3210
  • Loading branch information
WalterKolczynski-NOAA authored Jan 15, 2025
1 parent aea82a8 commit e27f5df
Show file tree
Hide file tree
Showing 22 changed files with 431 additions and 126 deletions.
21 changes: 14 additions & 7 deletions jobs/rocoto/atmos_ensstat.sh
Original file line number Diff line number Diff line change
Expand Up @@ -13,13 +13,20 @@ status=$?
if (( status != 0 )); then exit "${status}"; fi

export job="atmos_ensstat"
export jobid="${job}.$$"

export FORECAST_HOUR=$(( 10#${FHR3} ))
# shellcheck disable=SC2153
IFS=', ' read -r -a fhr_list <<< "${FHR_LIST}"

###############################################################
# Execute the JJOB
###############################################################
"${HOMEgfs}/jobs/JGLOBAL_ATMOS_ENSSTAT"
export FORECAST_HOUR jobid
for FORECAST_HOUR in "${fhr_list[@]}"; do
fhr3=$(printf '%03d' "${FORECAST_HOUR}")
jobid="${job}_f${fhr3}.$$"
###############################################################
# Execute the JJOB
###############################################################
"${HOMEgfs}/jobs/JGLOBAL_ATMOS_ENSSTAT"
status=$?
[[ ${status} -ne 0 ]] && exit "${status}"
done

exit $?
exit 0
23 changes: 14 additions & 9 deletions jobs/rocoto/atmos_products.sh
Original file line number Diff line number Diff line change
Expand Up @@ -13,15 +13,20 @@ status=$?
if (( status != 0 )); then exit "${status}"; fi

export job="atmos_products"
export jobid="${job}.$$"

# Negatation needs to be before the base
fhr3_base="10#${FHR3}"
export FORECAST_HOUR=$(( ${fhr3_base/10#-/-10#} ))
# shellcheck disable=SC2153
IFS=', ' read -r -a fhr_list <<< "${FHR_LIST}"

###############################################################
# Execute the JJOB
###############################################################
"${HOMEgfs}/jobs/JGLOBAL_ATMOS_PRODUCTS"
export FORECAST_HOUR jobid
for FORECAST_HOUR in "${fhr_list[@]}"; do
fhr3=$(printf '%03d' "${FORECAST_HOUR}")
jobid="${job}_f${fhr3}.$$"
###############################################################
# Execute the JJOB
###############################################################
"${HOMEgfs}/jobs/JGLOBAL_ATMOS_PRODUCTS"
status=$?
[[ ${status} -ne 0 ]] && exit "${status}"
done

exit $?
exit 0
19 changes: 14 additions & 5 deletions jobs/rocoto/gempak.sh
Original file line number Diff line number Diff line change
Expand Up @@ -6,11 +6,20 @@ status=$?
if (( status != 0 )); then exit "${status}"; fi

export job="gempak"
export jobid="${job}.$$"

# shellcheck disable=SC2153
IFS=', ' read -r -a fhr_list <<< "${FHR_LIST}"

# Execute the JJOB
"${HOMEgfs}/jobs/J${RUN^^}_ATMOS_GEMPAK"
export FHR3 jobid
for fhr in "${fhr_list[@]}"; do
FHR3=$(printf '%03d' "${fhr}")
jobid="${job}_f${FHR3}.$$"
###############################################################
# Execute the JJOB
###############################################################
"${HOMEgfs}/jobs/J${RUN^^}_ATMOS_GEMPAK"
status=$?
[[ ${status} -ne 0 ]] && exit "${status}"
done

status=$?
exit "${status}"
exit 0
21 changes: 14 additions & 7 deletions jobs/rocoto/oceanice_products.sh
Original file line number Diff line number Diff line change
Expand Up @@ -13,13 +13,20 @@ status=$?
if (( status != 0 )); then exit "${status}"; fi

export job="oceanice_products"
export jobid="${job}.$$"

export FORECAST_HOUR=$(( 10#${FHR3} ))
# shellcheck disable=SC2153
IFS=', ' read -r -a fhr_list <<< "${FHR_LIST}"

###############################################################
# Execute the JJOB
###############################################################
"${HOMEgfs}/jobs/JGLOBAL_OCEANICE_PRODUCTS"
export FORECAST_HOUR jobid
for FORECAST_HOUR in "${fhr_list[@]}"; do
fhr3=$(printf '%03d' "${FORECAST_HOUR}")
jobid="${job}_${COMPONENT}_f${fhr3}.$$"
###############################################################
# Execute the JJOB
###############################################################
"${HOMEgfs}/jobs/JGLOBAL_OCEANICE_PRODUCTS"
status=$?
[[ ${status} -ne 0 ]] && exit "${status}"
done

exit $?
exit 0
21 changes: 14 additions & 7 deletions jobs/rocoto/wavepostsbs.sh
Original file line number Diff line number Diff line change
Expand Up @@ -5,17 +5,24 @@ source "${HOMEgfs}/ush/preamble.sh"
###############################################################
# Source FV3GFS workflow modules
#. ${HOMEgfs}/ush/load_fv3gfs_modules.sh
. ${HOMEgfs}/ush/load_ufswm_modules.sh
source "${HOMEgfs}/ush/load_ufswm_modules.sh"
status=$?
[[ ${status} -ne 0 ]] && exit ${status}
[[ ${status} -ne 0 ]] && exit "${status}"

export job="wavepostsbs"
export jobid="${job}.$$"

###############################################################
# Execute the JJOB
${HOMEgfs}/jobs/JGLOBAL_WAVE_POST_SBS
status=$?
[[ ${status} -ne 0 ]] && exit ${status}
# shellcheck disable=SC2153
IFS=', ' read -r -a fhr_list <<< "${FHR_LIST}"

export FHR3 jobid
for FORECAST_HOUR in "${fhr_list[@]}"; do
FHR3=$(printf '%03d' "${FORECAST_HOUR}")
jobid="${job}_f${FHR3}.$$"
# Execute the JJOB
"${HOMEgfs}/jobs/JGLOBAL_WAVE_POST_SBS"
status=$?
[[ ${status} -ne 0 ]] && exit "${status}"
done

exit 0
3 changes: 3 additions & 0 deletions parm/config/gefs/config.atmos_ensstat
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,9 @@

echo "BEGIN: config.atmos_ensstat"

# Maximum number of rocoto tasks
export MAX_TASKS=25

# Get task specific resources
. "${EXPDIR}/config.resources" atmos_ensstat

Expand Down
4 changes: 2 additions & 2 deletions parm/config/gefs/config.atmos_products
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,8 @@ echo "BEGIN: config.atmos_products"
# Get task specific resources
. "${EXPDIR}/config.resources" atmos_products

# No. of forecast hours to process in a single job
export NFHRS_PER_GROUP=3
# Maximum number of rocoto tasks per member
export MAX_TASKS=25

# Scripts used by this job
export INTERP_ATMOS_MASTERSH="${USHgfs}/interp_atmos_master.sh"
Expand Down
4 changes: 2 additions & 2 deletions parm/config/gefs/config.oceanice_products
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ source "${EXPDIR}/config.resources" oceanice_products

export OCEANICEPRODUCTS_CONFIG="${PARMgfs}/post/oceanice_products_gefs.yaml"

# No. of forecast hours to process in a single job
export NFHRS_PER_GROUP=3
# Maximum number of rocoto tasks per member
export MAX_TASKS=25

echo "END: config.oceanice_products"
10 changes: 7 additions & 3 deletions parm/config/gefs/config.resources
Original file line number Diff line number Diff line change
Expand Up @@ -234,6 +234,7 @@ case ${step} in
;;

"atmos_products")
# Walltime is per forecast hour; will be multipled by group size
export walltime="00:15:00"
export ntasks=24
export threads_per_task=1
Expand All @@ -242,14 +243,16 @@ case ${step} in
;;

"atmos_ensstat")
export walltime="00:30:00"
# Walltime is per forecast hour; will be multipled by group size
export walltime="00:15:00"
export ntasks=6
export threads_per_task=1
export tasks_per_node="${ntasks}"
export is_exclusive=True
;;

"oceanice_products")
# Walltime is per forecast hour; will be multipled by group size
export walltime="00:15:00"
export ntasks=1
export tasks_per_node=1
Expand All @@ -258,7 +261,8 @@ case ${step} in
;;

"wavepostsbs")
export walltime="03:00:00"
# Walltime is per forecast hour; will be multipled by group size
export walltime="00:15:00"
export ntasks=1
export threads_per_task=1
export tasks_per_node=$(( max_tasks_per_node / threads_per_task ))
Expand Down Expand Up @@ -328,7 +332,7 @@ case ${step} in
;;

"cleanup")
export walltime="00:15:00"
export walltime="00:30:00"
export ntasks=1
export tasks_per_node=1
export threads_per_task=1
Expand Down
3 changes: 3 additions & 0 deletions parm/config/gefs/config.wavepostsbs
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@ echo "BEGIN: config.wavepostsbs"
# Get task specific resources
source "${EXPDIR}/config.resources" wavepostsbs

# Maximum number of rocoto tasks per member
export MAX_TASKS=25

# Subgrid info for grib2 encoding
export WAV_SUBGRBSRC=""
export WAV_SUBGRB=""
Expand Down
4 changes: 2 additions & 2 deletions parm/config/gfs/config.atmos_products
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,8 @@ echo "BEGIN: config.atmos_products"
# Get task specific resources
. "${EXPDIR}/config.resources" atmos_products

# No. of forecast hours to process in a single job
export NFHRS_PER_GROUP=3
## Maximum number of rocoto tasks per member
export MAX_TASKS=25

# Scripts used by this job
export INTERP_ATMOS_MASTERSH="${USHgfs}/interp_atmos_master.sh"
Expand Down
5 changes: 4 additions & 1 deletion parm/config/gfs/config.gempak
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,10 @@

echo "BEGIN: config.gempak"

# Maximum number of rocoto tasks per member
export MAX_TASKS=25

# Get task specific resources
. $EXPDIR/config.resources gempak
source "${EXPDIR}/config.resources" gempak

echo "END: config.gempak"
3 changes: 3 additions & 0 deletions parm/config/gfs/config.oceanice_products
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@ echo "BEGIN: config.oceanice_products"
# Get task specific resources
source "${EXPDIR}/config.resources" oceanice_products

# Maximum number of rocoto tasks per member
export MAX_TASKS=25

export OCEANICEPRODUCTS_CONFIG="${PARMgfs}/post/oceanice_products.yaml"

# No. of forecast hours to process in a single job
Expand Down
8 changes: 6 additions & 2 deletions parm/config/gfs/config.resources
Original file line number Diff line number Diff line change
Expand Up @@ -191,8 +191,9 @@ case ${step} in
;;

"wavepostsbs")
walltime_gdas="00:20:00"
walltime_gfs="03:00:00"
# Walltime is per forecast hour; will be multipled by group size
walltime_gdas="00:15:00"
walltime_gfs="00:15:00"
ntasks=8
threads_per_task=1
tasks_per_node=$(( max_tasks_per_node / threads_per_task ))
Expand Down Expand Up @@ -911,6 +912,7 @@ case ${step} in
;;

"oceanice_products")
# Walltime is per forecast hour; will be multipled by group size
walltime="00:15:00"
ntasks=1
tasks_per_node=1
Expand Down Expand Up @@ -949,6 +951,7 @@ case ${step} in
;;

"atmos_products")
# Walltime is per forecast hour; will be multipled by group size
walltime="00:15:00"
ntasks=24
threads_per_task=1
Expand Down Expand Up @@ -1278,6 +1281,7 @@ case ${step} in
;;

"gempak")
# Walltime is per forecast hour; will be multipled by group size
walltime="00:30:00"
ntasks_gdas=2
ntasks_gfs=28
Expand Down
3 changes: 3 additions & 0 deletions parm/config/gfs/config.wavepostsbs
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@ echo "BEGIN: config.wavepostsbs"
# Get task specific resources
source "${EXPDIR}/config.resources" wavepostsbs

# Maximum number of rocoto tasks per member
export MAX_TASKS=25

# Subgrid info for grib2 encoding
export WAV_SUBGRBSRC=""
export WAV_SUBGRB=""
Expand Down
Loading

0 comments on commit e27f5df

Please sign in to comment.