[Bug]: e3sm_diags has non-reproducible, intermittent hang #822

Description

@forsyth2

What happened?

While running the 2026-05-12 Chrysalis test of E3SM Unified 1.13.0rc10 (post-fix, zppy v3.2.0rc2), two Chrysalis cfgs each produced one hanging job: legacy_3.1.0_comprehensive_v2_e3sm_diags and legacy_3.0.0_comprehensive_v2_e3sm_diags. In both cases, the hanging job was e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1982-1983. Each job appeared to run for 45 minutes and then hang until it reached the 5-hour time limit.

However, when I re-launched the jobs, they ran to completion, so the hang appears to be intermittent.

What machine were you running on?

Chrysalis.

Environment

E3SM Unified 1.13.0rc10 (after load-script fix)

What command did you run?

zppy -c <test cfg>

Copy your cfg file

Two cfgs; see the linked discussion post.

What jobs are failing?

What stack trace are you encountering?

Metadata

Assignees

No one assigned

Labels

Non-reproducible bug: Bug that can't be reproduced consistently