Bypass NoPendingSlot limit for Merge job type to prevent blocking by …#12490

Open
hassan11196 wants to merge 1 commit into master from jobsumitter-patch-merge-limit

@hassan11196
Member

This PR bypasses the NoPendingSlot limit for Merge jobs at a given site.

Fixes #12361

Status

In development

Description

This PR is a test change to bypass the NoPendingSlot limit in JobSubmitter.
For example, the following site has reached its site-wide pending slot threshold (9139 pending against 6991 pending slots total) and cannot submit any more jobs.

T2_CH_CERN - 1863 running, 9139 pending, 55934 running slots total, 6991 pending slots total, Site is Normal:
  Harvesting - 0 running, 0 pending, 1000 max running, 112 max pending, priority 5
  Merge - 0 running, 0 pending, 1000 max running, 112 max pending, priority 4
  Skim - 0 running, 0 pending, 1000 max running, 112 max pending, priority 3
  LogCollect - 0 running, 0 pending, 1000 max running, 112 max pending, priority 2
  Cleanup - 0 running, 0 pending, 1000 max running, 112 max pending, priority 1
  Processing - 0 running, 0 pending, 55934 max running, 6292 max pending, priority 0
  Production - 1863 running, 9139 pending, 55934 max running, 6292 max pending, priority 0
    1420 - /cmsunified_task_BPH-RunIISummer20UL18GEN-00297__v1_T_250903_174407_2621/BPH-RunIISummer20UL18GEN-00297_0
    8 - /cmsunified_task_EXO-RunIISummer20UL16wmLHEGEN-07759__v1_T_251219_221159_9607/EXO-RunIISummer20UL16wmLHEGEN-07759_0
    1 - /cmsunified_task_GEN-Run3Summer22EEwmLHEGS-00402__v1_T_251112_134101_6157/GEN-Run3Summer22EEwmLHEGS-00402_0
    42 - /cmsunified_task_GEN-Run3Summer22EEwmLHEGS-00578__v1_T_250902_212750_7636/GEN-Run3Summer22EEwmLHEGS-00578_0
    1 - /cmsunified_task_GEN-Run3Summer22wmLHEGS-00393__v1_T_250902_211403_894/GEN-Run3Summer22wmLHEGS-00393_0
    2 - /cmsunified_task_GEN-Run3Summer22wmLHEGS-00894__v1_T_250902_211427_7628/GEN-Run3Summer22wmLHEGS-00894_0
    24 - /cmsunified_task_GEN-Run3Summer23BPixwmLHEGS-00487__v1_T_250902_210945_8930/GEN-Run3Summer23BPixwmLHEGS-00487_0
    42 - /cmsunified_task_GEN-Run3Summer23BPixwmLHEGS-00543__v1_T_250902_213036_9637/GEN-Run3Summer23BPixwmLHEGS-00543_0
    100 - /cmsunified_task_GEN-Run3Summer23wmLHEGS-00337__v1_T_250902_211139_9857/GEN-Run3Summer23wmLHEGS-00337_0
    88 - /cmsunified_task_GEN-Run3Summer23wmLHEGS-00339__v1_T_250902_211617_3360/GEN-Run3Summer23wmLHEGS-00339_0
    28 - /cmsunified_task_GEN-Run3Summer23wmLHEGS-00519__v1_T_250902_211407_6148/GEN-Run3Summer23wmLHEGS-00519_0
    909 - /cmsunified_task_HIG-RunIII2024Summer24wmLHEGS-00943__v1_T_251209_221017_467/HIG-RunIII2024Summer24wmLHEGS-00943_0
    665 - /cmsunified_task_HIN-HINPbPbWinter24GSHIMix-00002__v1_T_250916_150033_4280/HIN-HINPbPbWinter24GSHIMix-00002_0
    4463 - /cmsunified_task_HIN-HINPbPbWinter24GSHIMix-00006__v1_T_250828_152535_8280/HIN-HINPbPbWinter24GSHIMix-00006_0
    1000 - /cmsunified_task_HIN-HINPbPbWinter24GSHIMix-00013__v1_T_250828_152620_6274/HIN-HINPbPbWinter24GSHIMix-00013_0
    33 - /cmsunified_task_HIN-HINPbPbWinter24GSHIMix-00018__v1_T_250828_152501_1462/HIN-HINPbPbWinter24GSHIMix-00018_0
    1000 - /cmsunified_task_HIN-HINPbPbWinter24GSHIMix-00019__v1_T_250828_152514_1644/HIN-HINPbPbWinter24GSHIMix-00019_0
    1050 - /cmsunified_task_HIN-HINPbPbWinter24GSHIMix-00025__v1_T_250828_152656_3412/HIN-HINPbPbWinter24GSHIMix-00025_0
    126 - /cmsunified_task_HIN-HINPbPbWinter24GSHIMix-00026__v1_T_250828_152726_5187/HIN-HINPbPbWinter24GSHIMix-00026_0

The JobSubmitter has a lot of Merge jobs waiting that cannot be submitted, even though the Merge pending limit has not been reached: in this case 0 Merge jobs are pending or running, against a limit of 112 max pending.

T2_CH_CERN : {"Merge": {"NoPendingSlot": 2678}, "LogCollect": {"NoPendingSlot": 22}, "Cleanup": {"NoPendingSlot": 139}, "Production": {"NoPendingSlot": 5045}}
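To put the skip counts above in perspective, here is a minimal sketch that totals them by reason. The JSON string is copied from the report line above; the helper name `total_skipped` is hypothetical and not part of WMCore.

```python
import json

# Skip counts copied from the T2_CH_CERN report line quoted above.
raw = (
    '{"Merge": {"NoPendingSlot": 2678}, "LogCollect": {"NoPendingSlot": 22}, '
    '"Cleanup": {"NoPendingSlot": 139}, "Production": {"NoPendingSlot": 5045}}'
)


def total_skipped(report, reason):
    """Sum the jobs skipped for one reason across all task types.

    Hypothetical helper for illustration only, not a WMCore function.
    """
    return sum(counts.get(reason, 0) for counts in report.values())


report = json.loads(raw)
skipped = total_skipped(report, "NoPendingSlot")
# Jobs held back even though their own task type is not the one filling the site.
non_production = skipped - report["Production"]["NoPendingSlot"]
print(skipped, non_production)  # 7884 2839
```

Of the 7884 jobs skipped for NoPendingSlot, 2839 are not Production jobs, and 2678 of those are Merge jobs.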

This PR bypasses the site-wide NoPendingSlot limit for Merge jobs but keeps the per-task-type NoTaskPendingSlot limit, so that Merge jobs are not held in the agents merely because Production jobs are pending.
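The intended behaviour can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual JobSubmitter code: the names `Threshold`, `skip_reason`, and `BYPASS_SITE_PENDING` are hypothetical, and the use of `>=` for the threshold comparison is assumed. The numbers come from the T2_CH_CERN status shown earlier.

```python
from dataclasses import dataclass

# Hypothetical names throughout; this is not the WMCore JobSubmitter code.
# Task types assumed to be exempt from the site-wide pending check.
BYPASS_SITE_PENDING = {"Merge"}


@dataclass
class Threshold:
    pending: int      # jobs currently pending
    max_pending: int  # pending slot limit


def skip_reason(task_type, site, task):
    """Return why a job is held back, or None if it may be submitted."""
    # Site-wide check: skipped for bypassed task types (assumed behaviour).
    if task_type not in BYPASS_SITE_PENDING and site.pending >= site.max_pending:
        return "NoPendingSlot"
    # Per-task-type check: always enforced, so Merge stays bounded.
    if task.pending >= task.max_pending:
        return "NoTaskPendingSlot"
    return None


# Values taken from the T2_CH_CERN status above.
site = Threshold(pending=9139, max_pending=6991)
merge = Threshold(pending=0, max_pending=112)
production = Threshold(pending=9139, max_pending=6292)

print(skip_reason("Merge", site, merge))            # None
print(skip_reason("Production", site, production))  # NoPendingSlot
```

With the bypass in place, Merge jobs are submitted until 112 of them are pending, at which point the per-task limit holds the rest back; Production jobs are still blocked by the site-wide limit.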

Is it backward compatible (if not, which system does it affect?)

YES

Related PRs

#12415

External dependencies / deployment changes

No

Previous PR

#12471

@hassan11196
Member Author

test this please

@dmwm-bot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 12 warnings and errors that must be fixed
    • 2 warnings
    • 36 comments to review
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/1313/artifact/artifacts/PullRequestReport.html

hassan11196 added a commit that referenced this pull request Apr 23, 2026

Development

Successfully merging this pull request may close these issues.

Relax merge job thresholds
