
[Bug]: Issues in zppy e3sm_diags workflow variable handling for global rgn_avg section #824

Description

@zhangshixuan1987

What happened?

I encountered several related issues in the zppy [ts] and [e3sm_diags] workflow, especially with EAMxx model output.

One specific issue appears in the zppy/zppy/templates/ts.bash script, in the logic for mapping_file = glb:

{% if mapping_file == 'glb' -%}
vars={{ vars }}
# Remove U, since it is a 3D variable and thus will not work with rgn_avg
vars=${vars//,U}
{%- else %}
vars={{ vars }}
{%- endif %}

This logic removes U because it is a 3D variable and does not work with rgn_avg. However, the substitution is fundamentally problematic for two reasons:
(1) The workflow may still include other 3D variables, such as V, that are also incompatible with rgn_avg. Only U is filtered, so other incompatible variables can still be passed to the global regional-average step and cause failures.
(2) The pattern ",U" is matched as a substring, not as a whole variable name. With EAMxx model output, a variable list such as "surf_evap,U_at_10m_above_surface" is unintentionally transformed into "surf_evap_at_10m_above_surface", which triggers the following error in my test:

ncclimo: ERROR Failed to split. cmd_sbs[16] failed. Debug this:
OMP_PROC_BIND=false ncrcat -O -v surf_evap_at_10m_above_surface,area,lat --no_tmp_fl --hdr_pad=10000 -p /pscratch/sd/z/zhan391/e3smv4_project/ne256pg2_ne256pg2.F20TR-SCREAMv1.July-1.spanc800.2xauto.acc150.n0032.test2.1/post/scripts/tmp.ts_atm_monthly_glb_1995-1999-0005.52950621.2y7s 1ma_ne30pg2.AVERAGE.nmonths_x1.1995-01-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1995-02-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1995-03-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1995-04-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1995-05-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1995-06-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1995-07-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1995-08-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1995-09-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1995-10-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1995-11-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1995-12-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1996-01-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1996-02-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1996-03-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1996-04-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1996-05-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1996-06-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1996-07-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1996-08-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1996-09-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1996-10-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1996-11-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1996-12-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1997-01-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1997-02-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1997-03-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1997-04-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1997-05-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1997-06-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1997-07-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1997-08-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1997-09-01-00000.nc 
1ma_ne30pg2.AVERAGE.nmonths_x1.1997-10-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1997-11-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1997-12-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1998-01-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1998-02-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1998-03-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1998-04-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1998-05-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1998-06-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1998-07-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1998-08-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1998-09-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1998-10-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1998-11-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1998-12-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1999-01-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1999-02-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1999-03-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1999-04-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1999-05-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1999-06-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1999-07-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1999-08-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1999-09-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1999-10-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1999-11-01-00000.nc 1ma_ne30pg2.AVERAGE.nmonths_x1.1999-12-01-00000.nc output/surf_evap_at_10m_above_surface_199501_199912.nc

This suggests a broader issue in how the zppy/e3sm_diags workflow handles variables in the global-average section. A more robust treatment should be implemented here.
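As a rough illustration of point (2), the sketch below first reproduces the substring problem and then shows one possible way to drop only exact variable names from a comma-separated list. The function name remove_exact_vars is hypothetical and is not part of zppy. The sketch also does not address point (1): which variables are actually 3D still has to be decided by the caller.

```shell
#!/usr/bin/env bash
# Reproduce the problem: the pattern ",U" also matches inside longer names.
vars="surf_evap,U_at_10m_above_surface,V_at_10m_above_surface"
broken=${vars//,U}
echo "$broken"

# Hypothetical helper (an assumption, not zppy code):
# remove only list entries whose name matches one of the given names exactly.
# Usage: remove_exact_vars "<comma-separated list>" NAME [NAME ...]
remove_exact_vars() {
    local list=$1; shift
    local -a kept=() items
    local item drop skip
    # Split the list on commas into an array.
    IFS=',' read -r -a items <<< "$list"
    for item in "${items[@]}"; do
        skip=0
        for drop in "$@"; do
            [[ $item == "$drop" ]] && skip=1
        done
        (( skip )) || kept+=("$item")
    done
    # Join the surviving entries with commas.
    local IFS=','
    echo "${kept[*]}"
}

fixed=$(remove_exact_vars "surf_evap,U_at_10m_above_surface,U,V" U V)
echo "$fixed"
```

With this approach, "U_at_10m_above_surface" is left intact while the bare entries "U" and "V" are removed. Note also that the original substitution does nothing when U is the first entry in the list, since there is no leading comma to match; the exact-match loop does not depend on position.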

What machine were you running on?

I am running on Perlmutter, but the bug should not depend on the specific machine.

Environment

I am using the zppy master branch.

What command did you run?

cd /pscratch/sd/z/zhan391/e3smv4_project/ne256pg2_ne256pg2.F20TR-SCREAMv1.July-1.spanc800.2xauto.acc150.n0032.test2.1

zppy -c post.eamxx.diag.cfg

Copy your cfg file

# Directions to run:
# 1. Update <output>, <www>, <environment_commands_secondary> below.
# 2. Run with `zppy -c examples/post.v3.LR.amip.0101.cfg`.
# Direction to create stand-alone test data for zppy-interfaces:
# 3. Once the jobs finish, `cd <output>/post/scripts`.
# 4. Run `grep -n "Running a zi-pcmdi command" pcmdi_diags*.o*` to find the pcmdi_diags commands.
# 5. Then, you can run those lines stand-alone.
[default]
input = /pscratch/sd/z/zhan391/e3smv4_project/ne256pg2_ne256pg2.F20TR-SCREAMv1.July-1.spanc800.2xauto.acc150.n0032.test2.1
output = /pscratch/sd/z/zhan391/e3smv4_project/ne256pg2_ne256pg2.F20TR-SCREAMv1.July-1.spanc800.2xauto.acc150.n0032.test2.1
case = ne256pg2_ne256pg2.F20TR-SCREAMv1.July-1.spanc800.2xauto.acc150.n0032.test2.1
www = /global/cfs/cdirs/e3sm/www/zhan391/eamxx-pcmdi

partition = "debug"
account = "e3sm"
#account = "priority"
campaign = "water_cycle"
debug = False
environment_commands = "source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_pm-cpu.sh"


[climo]
active = True
walltime = "2:00:00"
years = "1995:2004:10",
# Another example of `years`:
# years = "1985:2014:30", "1985:2014:15"

  [[ atm_monthly_180x360_aave ]]
  # The following e3sm_diags sets require it:
  # "lat_lon", "zonal_mean_xy", "zonal_mean_2d", "polar", "cosp_histogram", "meridional_mean_2d", "annual_cycle_zonal_mean", "zonal_mean_2d_stratosphere" "aerosol_aeronet", "aerosol_budget"
  input_component = "eamxx"
  #cmip_plevdata = "/lcrc/group/e3sm/diagnostics/e3sm_to_cmip_data/grids/vrt_remap_plev19.nc"
  case = "1ma_ne30pg2"
  input_files = "AVERAGE.nmonths_x1"
  frequency = "monthly"
  #input_subdir = "archive/atm/hist"
  input_subdir = "run/"
  mapping_file = /global/cfs/cdirs/e3sm/diagnostics/maps/map_ne30pg2_to_cmip6_180x360_aave.20200201.nc

  [[ atm_monthly_diurnal_8xdaily_180x360_aave ]]
  # The following e3sm_diags sets require it:
  # "diurnal_cycle"
  input_component = "eamxx"
  #cmip_plevdata = "/lcrc/group/e3sm/diagnostics/e3sm_to_cmip_data/grids/vrt_remap_plev19.nc"
  case = "3ha_ne30pg2"
  input_files = "AVERAGE.nhours_x3"
  #input_subdir = "archive/atm/hist"
  input_subdir = "run/"
  frequency = "diurnal_8xdaily"
  mapping_file = /global/cfs/cdirs/e3sm/diagnostics/maps/map_ne30pg2_to_cmip6_180x360_aave.20200201.nc
  vars = "precip_liq_surf_mass_flux,precip_ice_surf_mass_flux"

  [[ land_monthly_climo ]]
  active = True
  # This subtask is a dependency for the e3sm_diags task's lnd_monthly_mvm_lnd subtask.
  # The following e3sm_diags sets require it:
  # "lat_lon_land",
  input_component = "elm"
  #note: if not specify case then the default will be used 
  frequency = "monthly"
  input_files = "elm.h0"
  #input_subdir = archive/lnd/hist
  input_subdir = "run/"
  vars = "" # Setting this as "" will tell zppy to use ALL variables

  [[ land_monthly_180x360_traave ]]
  active = True
  input_component = "elm"
  #note: if not specify case then the default will be used 
  frequency = "monthly"
  input_files = "elm.h0"
  #input_subdir = "archive/lnd/hist"
  input_subdir = "run/"
  mapping_file = "/global/cfs/cdirs/e3sm/diagnostics/maps/map_ne256pg2_to_cmip6_180x360_traave.20250301.nc"
  vars = ""

[ts]
active = True
walltime = "01:00:00"
years = "1995:2004:5"
ts_num_years=5


  [[ atm_daily_180x360_aave ]]
  active = True
  # This subtask is a dependency for the e3sm_diags task's atm_monthly_180x360 subtask.
  # The following e3sm_diags sets require it:
  # "tropical_subseasonal", "precip_pdf"
  input_component = "eamxx"
  case = "1da_ne30pg2"
  input_files = "AVERAGE.ndays_x1"
  frequency = "daily"
  #input_subdir = "archive/atm/hist"
  input_subdir = "run/"
  mapping_file = /global/cfs/cdirs/e3sm/diagnostics/maps/map_ne30pg2_to_cmip6_180x360_aave.20200201.nc
  # Needed for Wheeler Kiladis
  vars = "LW_flux_up_at_model_top,precip_liq_surf_mass_flux,precip_ice_surf_mass_flux,U_at_850hPa"
  [[ atm_monthly_glb ]]
  active = True
  # This subtask is a dependency for the global_time_series task.
  input_component = "eam"
  #input_subdir = "archive/atm/hist"
  input_subdir = "run/"
  case = "1ma_ne30pg2"
  input_files = "AVERAGE.nmonths_x1"
  frequency = "monthly"
  mapping_file = "glb"
  vars="ps,surf_radiative_T,SeaLevelPressure,IceWaterPath,qv_2m,precip_liq_surf_mass_flux,precip_ice_surf_mass_flux"
  #vars="omega_at_500hPa,omega_at_700hPa,omega_at_850hPa,T_mid_at_700hPa,T_2m,surface_upward_latent_heat_flux,surf_sens_flux,z_mid_at_700hPa,wind_speed_10m,surf_evap,U_at_10m_above_surface,V_at_10m_above_surface,LW_clrsky_flux_dn_at_model_bot,LW_clrsky_flux_up_at_model_top,LW_flux_dn_at_model_bot,LW_flux_up_at_model_bot,LW_flux_up_at_model_top,SW_clrsky_flux_dn_at_model_bot,SW_clrsky_flux_dn_at_model_top,SW_clrsky_flux_up_at_model_bot,SW_clrsky_flux_up_at_model_top,SW_flux_dn_at_model_bot,SW_flux_dn_at_model_top,SW_flux_up_at_model_bot,SW_flux_up_at_model_top,ShortwaveCloudForcing,LongwaveCloudForcing,isccp_cldtot"

  [[ land_monthly ]]
  active = True
  # This subtask is a dependency for the e3sm_to_cmip task's land_monthly subtask.
  input_component = "elm"
  frequency = "monthly"
  input_files = "elm.h0"
  #input_subdir = "archive/lnd/hist"
  input_subdir = "run/"
  mapping_file = "/global/cfs/cdirs/e3sm/diagnostics/maps/map_ne256pg2_to_cmip6_180x360_traave.20250301.nc"
  # Variables:
  #vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILICE,SOILLIQ,SOILWATER_10CM,TSA,TSOI,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
  vars = "SOILWATER_10CM"
  extra_vars = "landfrac"

  [[ lnd_monthly_glb ]]
  active = True
  # This subtask is a dependency for the global_time_series task.
  input_component = "elm"
  frequency = "monthly"
  input_files = "elm.h0"
  #input_subdir = "archive/lnd/hist"
  input_subdir = "run/"
  mapping_file = "glb"
  job_nbr = 50 # This reduces parallel processes in ncclimo time-series splitting for memory management.
  #vars = "" # This will tell zppy to use all available variables.
  vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"

  [[ land_monthly_energy ]]
  active = True
  input_component = "elm"
  frequency = "monthly"
  input_files = "elm.h0"
  #input_subdir = "archive/lnd/hist"
  input_subdir = "run/"
  mapping_file = ""
  vars = "EFLX_LH_TOT,FIRA,FLDS,FSA,FSDS,FSRND,FSRVD,FSDSND,FSDSVD,FSH,TSA"


  [[ rof_monthly ]]
  active = True
  # The following e3sm_diags sets require it:
  # "streamflow"
  input_component = "mosart"
  frequency = "monthly"
  input_files = "mosart.h0"
  #input_subdir = "archive/lnd/hist"
  input_subdir = "run/"
  mapping_file = ""
  # Variables:
  vars = "RIVER_DISCHARGE_OVER_LAND_LIQ"
  extra_vars = 'areatotal2'

[e3sm_to_cmip]
active = True
frequency = "monthly"
ts_grid = "180x360_aave"
ts_num_years=5
walltime = "00:10:00"
years = "1995:2004:5"
environment_commands = "source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_pm-cpu.sh; conda activate zi-pcmdi-diags"

  [[ atm_2d_monthly_180x360_aave ]]
  input_component = "eamxx"
  #cmip_plevdata = "/lcrc/group/e3sm/diagnostics/e3sm_to_cmip_data/grids/vrt_remap_plev19.nc"
  case = "1ma_ne30pg2"
  input_files = "AVERAGE.nmonths_x1"
  ts_subsection = "atm_2d_monthly_180x360_aave"
  vars="ps,surf_radiative_T,SeaLevelPressure,IceWaterPath,qv_2m,precip_liq_surf_mass_flux,precip_ice_surf_mass_flux,omega_at_500hPa,omega_at_700hPa,omega_at_850hPa,T_mid_at_700hPa,T_2m,surface_upward_latent_heat_flux,surf_sens_flux,z_mid_at_700hPa,wind_speed_10m,surf_evap,U_at_10m_above_surface,V_at_10m_above_surface,LW_clrsky_flux_dn_at_model_bot,LW_clrsky_flux_up_at_model_top,LW_flux_dn_at_model_bot,LW_flux_up_at_model_bot,LW_flux_up_at_model_top,SW_clrsky_flux_dn_at_model_bot,SW_clrsky_flux_dn_at_model_top,SW_clrsky_flux_up_at_model_bot,SW_clrsky_flux_up_at_model_top,SW_flux_dn_at_model_bot,SW_flux_dn_at_model_top,SW_flux_up_at_model_bot,SW_flux_up_at_model_top,ShortwaveCloudForcing,LongwaveCloudForcing,isccp_cldtot"
  cmip_vars = "pr,cltisccp,evspsbl,hfls,hfss,huss,ps,psl,rlds,rldscs,rlus,rlut,rlutcs,rsds,rsdscs,rsdt,rsus,rsuscs,rtmt,uas,vas,sfcWind,tas,ts"

  [[ atm_3d_monthly_180x360_aave ]]
  input_component = "eamxx"
  #cmip_plevdata = "/lcrc/group/e3sm/diagnostics/e3sm_to_cmip_data/grids/vrt_remap_plev19.nc"
  case = "1ma_ne30pg2"
  input_files = "AVERAGE.nmonths_x1"
  interp_vars = "U,V,T_mid,z_mid,omega,RelativeHumidity,p_mid,qv"
  ts_subsection = "atm_3d_monthly_180x360_aave"
  vars="U,V,T_mid,z_mid,omega,RelativeHumidity,p_mid,qv"
  cmip_vars = "ta,ua,va,zg"

  [[ land_monthly ]]
  active = True
  # This subtask is a dependency for the ilamb task.
  # This subtask depends on the ts task's land_monthly subtask.
  # Notice this subtask name matches a subtask in the `ts` task.
  # If it did not, then the `ts_land_subsection` parameter would be required here to tell zppy which subtask to use.
  input_component = "elm"
  ts_grid = "180x360_traave"
  ts_land_subsection = "land_monthly"
  frequency = "monthly"
  input_files = "elm.h0"
  cmip_vars = "mrsos"

[e3sm_diags]
active = True
multiprocessing = True
num_workers = 8
ref_final_yr = 1995
ref_start_yr = 2004
ts_num_years = 5
walltime = "4:00:00"
years = "1995:2004:10",
environment_commands = "source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_pm-cpu.sh"

  [[ atm_monthly_180x360_aave ]]
  # `e3sm_diags` is largely driven by which e3sm_diags sets are requested:
  sets="lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","annual_cycle_zonal_mean","enso_diags","qbo","diurnal_cycle","zonal_mean_2d_stratosphere","aerosol_aeronet","mp_partition","tropical_subseasonal","precip_pdf","tc_analysis","streamflow",
  climo_diurnal_frequency = "diurnal_8xdaily"
  climo_diurnal_subsection = "atm_monthly_diurnal_8xdaily_180x360_aave"
  ts_daily_subsection = "atm_daily_180x360_aave"
  grid = '180x360_aave'
  short_name = 'e3sm.amip.EAMXX.test2_1'

  [[ lnd_monthly_mvm_lnd ]]
  # Depends on the climo task's land_monthly_climo subtask.
  sets = "lat_lon_land",
  climo_subsection = "land_monthly_climo"
  # Other parameters:
  diff_title = "Difference"
  grid = 'native'
  # The reference_data_path should point to pre-computed climatology files from a nclimo/zppy run
  reference_data_path = "/pscratch/sd/z/zhan391/e3smv4_project/20250906.wcycl1850.ne120pg2_r025_RRSwISC6to18E3r5.test6.1.chrysalis/post/lnd/native/clim"
  ref_name = "20250906.wcycl1850.ne120pg2_r025_RRSwISC6to18E3r5.test6.1.chrysalis"
  ref_final_yr = 96
  ref_start_yr = 105
  ref_years = "96-105",
  run_type = "model_vs_model"
  short_name = "e3sm.amip.EAMXX.test2_1"
  short_ref_name = "v3.HR.piControl-test6.1"
  swap_test_ref = False
  tag = "model_vs_model"

What jobs are failing?

The failure occurred only in these two scripts:

/pscratch/sd/z/zhan391/e3smv4_project/ne256pg2_ne256pg2.F20TR-SCREAMv1.July-1.spanc800.2xauto.acc150.n0032.test2.1/post/scripts/ts_atm_monthly_glb_1995-1999-0005.bash

/pscratch/sd/z/zhan391/e3smv4_project/ne256pg2_ne256pg2.F20TR-SCREAMv1.July-1.spanc800.2xauto.acc150.n0032.test2.1/post/scripts/ts_atm_monthly_glb_2000-2004-0005.bash

Both now succeed, because I worked around the issue by removing the "vars=${vars//,U}" line.

What stack trace are you encountering?
