Load gsi-specific modules on WCOSS2 at runtime#4052

Merged
DavidHuber-NOAA merged 18 commits into NOAA-EMC:develop from DavidHuber-NOAA:fix/gsi
Sep 16, 2025

Conversation

@DavidHuber-NOAA DavidHuber-NOAA commented Sep 11, 2025

Member

Description

This adds a lightweight module file for GSI/EnKF jobs to prevent library clashes.

Resolves #4044
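
As a rough illustration of the idea (not the PR's actual implementation), a job wrapper could choose a module set by job name so that only GSI/EnKF jobs pick up the GSI-specific libraries. The function name, the job-name patterns, and the set names `gsi` and `default` are all hypothetical:

```shell
# Hypothetical sketch: pick a module set based on the job name.
# The patterns and the set names "gsi"/"default" are placeholders,
# not names taken from global-workflow.
select_module_set() {
  local job_name="$1"
  case "${job_name}" in
    *anal*|*enkf*) echo "gsi" ;;      # GSI/EnKF-style jobs get the lightweight GSI set
    *)             echo "default" ;;  # everything else keeps the general runtime set
  esac
}
```

A wrapper could then pass the result of `select_module_set "gdas_anal"` to whatever mechanism actually loads the modules, keeping the GSI library stack out of unrelated jobs.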

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

How has this been tested?

  • Initial C96C48_hybatmDA gdas_anal job on WCOSS2.
  • Full suite of CI tests on WCOSS2
  • C1152 test on WCOSS2

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

@DavidHuber-NOAA

Member Author

@CoryMartin-NOAA When you get a chance, would you mind testing this out? I think it may be a lighter solution that does not have to unset any Cray variables.

@RussTreadon-NOAA

Contributor

WCOSS2 gw-ci
Installed DavidHuber-NOAA:fix/gsi at 70bdfaf on Cactus and ran g-w CI with the following results:

/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C48_ATM_pr4052
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103231200        Done    Sep 11 2025 23:13:16    Sep 12 2025 02:05:54
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C48mx500_3DVarAOWCDA_pr4052
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103241800        Done    Sep 11 2025 23:13:18    Sep 11 2025 23:26:02
202103250000        Done    Sep 11 2025 23:13:18    Sep 12 2025 02:11:05
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C48mx500_hybAOWCDA_pr4052
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103241800        Done    Sep 11 2025 23:13:20    Sep 11 2025 23:26:05
202103250000        Done    Sep 11 2025 23:13:20    Sep 12 2025 02:06:00
 
SKIP C48_S2SWA_gefs_RT on wcoss2
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C48_S2SWA_gefs_pr4052
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103231200        Done    Sep 11 2025 23:13:22    Sep 12 2025 00:26:07
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C48_S2SW_extended_pr4052
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103231200        Done    Sep 11 2025 23:13:25    Sep 12 2025 03:45:46
202103231800        Done    Sep 11 2025 23:13:25    Sep 12 2025 03:56:06
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C48_S2SW_pr4052
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103231200        Done    Sep 11 2025 23:13:27    Sep 12 2025 02:06:09
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96_atm3DVar_extended_pr4052
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202112201800        Done    Sep 11 2025 23:13:29    Sep 11 2025 23:31:51
202112210000        Done    Sep 11 2025 23:13:29    Sep 12 2025 04:51:21
202112210600        Done    Sep 11 2025 23:13:29    Sep 12 2025 05:36:02
202112211200        Done    Sep 11 2025 23:36:32    Sep 12 2025 08:21:12
202112211800        Done    Sep 12 2025 04:56:04    Sep 12 2025 11:27:25
 
SKIP C96_atm3DVar on wcoss2
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96C48_hybatmDA_pr4052
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202112201800        Done    Sep 11 2025 23:13:31    Sep 11 2025 23:31:56
202112210000        Done    Sep 11 2025 23:13:31    Sep 12 2025 04:06:09
202112210600        Done    Sep 11 2025 23:13:31    Sep 12 2025 03:41:28
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96C48_hybatmsnowDA_pr4052
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202112201200        Done    Sep 11 2025 23:13:33    Sep 11 2025 23:32:02
202112201800        Done    Sep 11 2025 23:13:33    Sep 12 2025 03:56:24
202112210000        Done    Sep 11 2025 23:13:33    Sep 12 2025 03:41:31
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96C48_hybatmsoilDA_pr4052
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202205150600        Done    Sep 11 2025 23:13:35    Sep 11 2025 23:36:53
202205151200        Done    Sep 11 2025 23:13:35    Sep 12 2025 04:06:15
202205151800        Done    Sep 11 2025 23:13:35    Sep 12 2025 03:46:11
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96C48mx500_S2SW_cyc_gfs_pr4052
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202112201200        Done    Sep 11 2025 23:13:37    Sep 11 2025 23:36:59
202112201800        Done    Sep 11 2025 23:13:37    Sep 12 2025 04:06:21
202112210000        Done    Sep 11 2025 23:13:37    Sep 12 2025 04:16:20
202112211800        Done    Sep 11 2025 23:42:28    Sep 12 2025 04:11:31
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96C48_ufs_hybatmDA_pr4052
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202402231800        Done    Sep 11 2025 23:13:40    Sep 11 2025 23:37:04
202402240000        Done    Sep 11 2025 23:13:40    Sep 12 2025 04:11:34
202402240600        Done    Sep 11 2025 23:13:40    Sep 12 2025 03:51:51
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96_gcafs_cycled_noDA_pr4052
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202112201200        Done    Sep 11 2025 23:13:41    Sep 11 2025 23:37:07
202112201800        Done    Sep 11 2025 23:13:41    Sep 12 2025 03:46:24
202112210000        Done    Sep 11 2025 23:13:41    Sep 12 2025 02:11:44
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96_gcafs_cycled_pr4052
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202112201200        Done    Sep 11 2025 23:13:43    Sep 11 2025 23:37:10
202112201800        Done    Sep 11 2025 23:13:43    Sep 12 2025 03:46:27
202112210000        Done    Sep 11 2025 23:13:43    Sep 12 2025 02:26:27
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96mx100_S2S_pr4052
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
199405010000        Done    Sep 11 2025 23:13:45    Sep 12 2025 00:57:12
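
A summary like the one above can also be checked mechanically rather than by eye. The following is a minimal sketch, assuming the text is piped in on stdin in the same column layout (a 12-digit cycle followed by its state); the function name `all_cycles_done` is made up for this example:

```shell
# Sketch: succeed only if every cycle row in the piped summary is "Done".
# Assumes rows start with a 12-digit cycle (YYYYMMDDHHMM) followed by the state.
all_cycles_done() {
  awk '
    $1 ~ /^[0-9]+$/ && length($1) == 12 {
      total++
      if ($2 != "Done") { bad++; print "not done: " $1 " (" $2 ")" }
    }
    # Fail if no cycle rows were seen at all, or if any row was not Done.
    END { exit (total == 0 || bad > 0) ? 1 : 0 }
  '
}
```

Path lines, `SKIP` lines, and the column header are ignored because their first field is not a 12-digit number.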

A check of gsi.x minimizations across all cases found no occurrences of the message PCGSOI: WARNING **** Reset to steepest descent.

russ.treadon@clogin09:/lfs/h2/emc/ptmp/russ.treadon/COMROOT> grep "PCGSOI: WARNING" -r C*pr4052/logs/*/*anal*log | grep Reset | wc
      0       0       0

A check of log files from PR #4035 found 2585 occurrences of PCGSOI: WARNING **** Reset to steepest descent across all the cases run on WCOSS2.

russ.treadon@clogin09:/lfs/h2/emc/ptmp/russ.treadon/COMROOT> grep "PCGSOI: WARNING" -r C*pr4035/logs/*/*anal*log | grep Reset | wc
   2585   33600  631035
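
The two `grep` checks above give a single total. To see which analysis logs contribute the warnings, a per-log breakdown could be sketched as follows. This assumes the same `<experiment>/logs/<cycle>/*anal*log` layout used in the commands above; the function name `count_pcgsoi_resets` is hypothetical:

```shell
# Sketch: print "<count> <logfile>" for each analysis log under a COMROOT.
# $1 = COMROOT directory, $2 = experiment glob (for example 'C*pr4052').
count_pcgsoi_resets() {
  local comroot="$1" pattern="$2"
  local log n
  # ${pattern} is left unquoted on purpose so the shell expands the glob.
  for log in "${comroot}"/${pattern}/logs/*/*anal*log; do
    [[ -f "${log}" ]] || continue   # skip the unexpanded glob when nothing matches
    # grep -c exits non-zero on a zero count, so guard the assignment.
    n=$(grep "PCGSOI: WARNING" "${log}" | grep -c "Reset") || true
    printf '%s %s\n' "${n}" "${log}"
  done
}
```

For example, `count_pcgsoi_resets /lfs/h2/emc/ptmp/russ.treadon/COMROOT 'C*pr4035' | sort -rn | head` would list the logs with the most resets first.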

@DavidHuber-NOAA

Member Author

Thank you for the test, Russ! Since there is a quick fix in for the retros already, I will expand this PR to cover other GSI, GSI-utils, and GSI-monitor executables. I will open PRs to those repos shortly. Adding the blocked label to this PR while submodules are updated.

@DavidHuber-NOAA DavidHuber-NOAA added the blocked Issue is currently being blocked by another issue. Include blocking issue # in description label Sep 12, 2025
@CoryMartin-NOAA

Contributor

Thank you @DavidHuber-NOAA. One question: do we know for sure that only GSI is impacted by this LD_LIBRARY_PATH addition? Does this change model forecast results? I worry about the implications of adding it to the runtime modulefile, given that we saw GSI tests pass while producing bad results.

@DavidHuber-NOAA

DavidHuber-NOAA commented Sep 12, 2025

Member Author

BongiEmail.txt
@CoryMartin-NOAA Good question. This bugfix was identified by @bongi-NOAA as necessary for UFS-Utils. Reviewing the email thread (attached), I see that he recommended adding it at the build step. There was no specific guidance about the run step, so perhaps we should remove it from the runtime module file. I will do so in a test of this PR.

If it is needed for the UFS-Utils executables at runtime, then we should only set it for those jobs.
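
One generic way to limit such a setting to the jobs that need it is to apply it in a subshell around a single command, so it never reaches the surrounding environment. This is a sketch of that shell technique only, not what the PR does; the function name and the example directory are hypothetical:

```shell
# Sketch: run one command with an extra directory prepended to
# LD_LIBRARY_PATH, without changing the caller's environment.
run_with_extra_libpath() {
  local extra_dir="$1"
  shift
  (
    # The export is confined to this subshell and the command it runs.
    export LD_LIBRARY_PATH="${extra_dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
    "$@"
  )
}
```

A call such as `run_with_extra_libpath /hypothetical/lib ./some_executable` would see the extra directory first on its library path, while later commands in the same job would not.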

@JessicaMeixner-NOAA

Contributor

@CoryMartin-NOAA - the forecast model uses its own module load: https://github.com/NOAA-EMC/global-workflow/blob/develop/dev/jobs/fcst.sh#L9-L13 However, there could be concern for other jobs that use workflow modules, particularly if things were not carefully checked.

I can run a high res test on wcoss2 when we think we are ready for that.

@DavidHuber-NOAA DavidHuber-NOAA added needs submodule update Requires submodule PRs to be merged and removed blocked Issue is currently being blocked by another issue. Include blocking issue # in description labels Sep 12, 2025
@DavidHuber-NOAA

Member Author

All CI tests passed on WCOSS2. I will now start opening submodule PRs.

@JessicaMeixner-NOAA

Contributor

@DavidHuber-NOAA - Ready for a high res test?

@DavidHuber-NOAA

Member Author

@JessicaMeixner-NOAA Yes, please go ahead.

@RussTreadon-NOAA

Contributor

Hi-res tests are prudent before we get too far down the road.

@JessicaMeixner-NOAA

Contributor

I have a C1152 retro-like test going. It's using ICs from rt13_upd01_stream3. I'm using this branch plus a minor configuration update for a marine job.

g-w clone: /lfs/h2/emc/couple/noscrub/jessica.meixner/gwpr4052/global-workflow
expdir: /lfs/h2/emc/couple/noscrub/jessica.meixner/gwpr4052/expdir/hirest01
comroot: /lfs/h2/emc/ptmp/jessica.meixner/comroot/hirest01

We should definitely have stuff by Monday - probably not before we should all sign off for the day though.

@JessicaMeixner-NOAA

Contributor

For my high res test, the gdas analysis succeeded. @CatherineThomas-NOAA or @CoryMartin-NOAA can you check the logs as well:

/lfs/h2/emc/ptmp/jessica.meixner/comroot/hirest01/logs/2024122318/gfs_anal.log
/lfs/h2/emc/ptmp/jessica.meixner/comroot/hirest01/logs/2024122318/gdas_anal.log

There were failures related to marine DA, which I'm hoping #4048 will help with. I'm going to start up a new experiment combining those updates in as well.

@CoryMartin-NOAA

Contributor

GSI minimization looks good in both the GFS and GDAS variational analyses

@DavidHuber-NOAA

Member Author

Thanks @CoryMartin-NOAA and @JessicaMeixner-NOAA! I will go ahead and mark NOAA-EMC/GSI#931, NOAA-EMC/GSI-utils#85, and NOAA-EMC/GSI-Monitor#195 as ready for review.

@DavidHuber-NOAA

Member Author

All submodule hashes now point at authoritative repository heads. Marking ready for review.

@DavidHuber-NOAA DavidHuber-NOAA marked this pull request as ready for review September 15, 2025 18:44
@emcbot emcbot added the CI-Ursa-Ready **CM use only** PR is ready for CI testing on Ursa label Sep 15, 2025

@aerorahul aerorahul left a comment

Contributor

lgtm.

@emcbot emcbot added CI-Ursa-Building **Bot use only** CI testing is cloning/building on Ursa and removed CI-Ursa-Ready **CM use only** PR is ready for CI testing on Ursa labels Sep 15, 2025

@CoryMartin-NOAA CoryMartin-NOAA left a comment

Contributor

Thank you @DavidHuber-NOAA for helping get this sorted out

@DavidHuber-NOAA

Member Author

The Ursa test here is just a CI health check.

I think that only WCOSS2 is needed. That testing was completed before pointing to the authoritative repositories. Does WCOSS2 need to be tested again? Or any other platforms?

@JessicaMeixner-NOAA

Contributor

Maybe Gaea C6, since we're targeting that for running retros too, just to dot the i's and cross the t's? Unless nothing changed for other platforms; it looks a little like it did (maybe that's just me).

@emcbot emcbot added CI-Ursa-Running **Bot use only** CI testing on Ursa for this PR is in-progress CI-Gaeac6-Ready **CM use only** PR is ready for CI testing on Gaea C6 CI-Gaeac6-Building **Bot use only** CI testing is cloning/building on Gaea C6 CI-Gaeac6-Running CI-Gaeac6-Passed **Bot use only** CI testing on Gaea C6 for this PR has completed successfully CI-Ursa-Passed **Bot use only** CI testing on Ursa for this PR has completed successfully and removed CI-Ursa-Building **Bot use only** CI testing is cloning/building on Ursa CI-Gaeac6-Ready **CM use only** PR is ready for CI testing on Gaea C6 CI-Gaeac6-Building **Bot use only** CI testing is cloning/building on Gaea C6 CI-Gaeac6-Running CI-Ursa-Running **Bot use only** CI testing on Ursa for this PR is in-progress labels Sep 15, 2025
@DavidHuber-NOAA

Member Author

All tests passed. Merging.

@DavidHuber-NOAA DavidHuber-NOAA merged commit 0cfa8c4 into NOAA-EMC:develop Sep 16, 2025
5 checks passed
@DavidHuber-NOAA DavidHuber-NOAA deleted the fix/gsi branch September 22, 2025 13:58

Labels

CI-Gaeac6-Passed **Bot use only** CI testing on Gaea C6 for this PR has completed successfully CI-Ursa-Passed **Bot use only** CI testing on Ursa for this PR has completed successfully

Development

Successfully merging this pull request may close these issues.

GSI runtime modules are a mess on WCOSS2

6 participants