fix: pass explicit 60s timeout to get_background_job in poll_job_completion #1206

Merged
rasdani merged 4 commits into main from fix/poll-job-completion-timeout
Apr 20, 2026

@rasdani (Contributor) commented Apr 20, 2026

Summary

CliAgentEnv.poll_job_completion polls background-job status via self.sandbox_client.get_background_job(...). Without an explicit timeout, that call inherits the prime-sandboxes APIClient default of 30s (httpx.Timeout(30.0, connect=10.0)). When the sandbox gateway is briefly slow on the status read, httpx.ReadTimeout fires and poll_job_completion wraps it as AgentError('Agent polling failed: Read file timed out after 30s: /tmp/job_XXX.exit').
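A minimal simulation of this failure mode, using only asyncio so it runs without a sandbox. Everything in it is hypothetical: FakeSandboxClient, poll_status, and this AgentError are stand-ins rather than the prime-sandboxes or verifiers classes, asyncio.wait_for plays the role of the httpx read timeout, and all durations are scaled down 100x (30s becomes 0.3s).

```python
import asyncio

SCALE = 0.01  # 1 simulated second = 10 ms so the example runs quickly


class AgentError(Exception):
    """Hypothetical stand-in for the env's AgentError."""


class FakeSandboxClient:
    """Hypothetical client whose status read always uses a 30s client default."""

    def __init__(self, stall_s: float, default_timeout_s: float = 30.0) -> None:
        self.stall_s = stall_s  # how long the simulated gateway stalls
        self.default_timeout_s = default_timeout_s

    async def _gateway_read(self) -> str:
        await asyncio.sleep(self.stall_s * SCALE)
        return "completed"

    async def get_background_job(self) -> str:
        # No per-call override: every read inherits the client default.
        return await asyncio.wait_for(
            self._gateway_read(), timeout=self.default_timeout_s * SCALE
        )


async def poll_status(client: FakeSandboxClient) -> str:
    try:
        return await client.get_background_job()
    except asyncio.TimeoutError as exc:
        # Like the observed failure, the read timeout surfaces as an AgentError.
        raise AgentError(
            f"Agent polling failed: read timed out after {client.default_timeout_s:g}s"
        ) from exc


def run(stall_s: float) -> str:
    """Poll once against a gateway that stalls for `stall_s` simulated seconds."""
    try:
        return asyncio.run(poll_status(FakeSandboxClient(stall_s)))
    except AgentError as err:
        return str(err)


print(run(10.0))  # short stall: fits inside the 30s default
print(run(45.0))  # 45s blip: exceeds the default and is wrapped as AgentError
```

A 10s stall returns normally; a 45s stall hits the inherited 30s limit and is reported as a polling failure, even though the stall was only transient.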

Observed ~51 such failures on one 30B opencode-math run.

Change

One line, one kwarg:

status: BackgroundJobStatus = await self.sandbox_client.get_background_job(
    sandbox_id, background_job, timeout=60.0
)
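To illustrate what the one kwarg buys, here is a sketch of a client method whose per-call timeout overrides a 30s client default. It assumes simple override semantics for the new kwarg (the actual implementation lives in PrimeIntellect-ai/prime#540 and is not reproduced here); FakeSandboxClient and poll_once are hypothetical, and durations are scaled down 100x.

```python
import asyncio
from typing import Optional

SCALE = 0.01  # 1 simulated second = 10 ms


class FakeSandboxClient:
    """Hypothetical client where a per-call `timeout` overrides the 30s default."""

    def __init__(self, stall_s: float, default_timeout_s: float = 30.0) -> None:
        self.stall_s = stall_s
        self.default_timeout_s = default_timeout_s

    async def _gateway_read(self) -> str:
        await asyncio.sleep(self.stall_s * SCALE)
        return "completed"

    async def get_background_job(self, timeout: Optional[float] = None) -> str:
        # Assumed semantics: an explicit timeout wins, None falls back to the default.
        effective_s = self.default_timeout_s if timeout is None else timeout
        return await asyncio.wait_for(self._gateway_read(), timeout=effective_s * SCALE)


async def poll_once(stall_s: float, timeout: Optional[float]) -> str:
    client = FakeSandboxClient(stall_s)
    try:
        return await client.get_background_job(timeout=timeout)
    except asyncio.TimeoutError:
        return "timed out"


# Same 45s gateway blip, before and after the change.
before = asyncio.run(poll_once(45.0, timeout=None))  # inherits the 30s default
after = asyncio.run(poll_once(45.0, timeout=60.0))  # explicit timeout=60.0
print(f"before: {before}, after: {after}")
```

The 45s blip that times out under the inherited default completes once timeout=60.0 is passed, while any call that omits the kwarg keeps the 30s default.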

Why 60s (and not higher)?

  • 60s doubles the tolerance for a transient gateway blip, which is the failure mode actually observed: short stalls, not permanently dead reads.
  • A much larger value would let a truly stuck gateway wedge the rollout for a long time before the outer retry/timeout machinery gets a chance to intervene; 60s is long enough to absorb a blip, short enough to fail fast on a real hang.
  • Raising the APIClient default globally would also affect unrelated fast lookups — this keeps the change scoped to the known-slow path.
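The trade-off in these bullets reduces to simple arithmetic: with a per-call read timeout T, a stall of d seconds is absorbed only if d < T, while a read that never returns blocks for the full T before the error reaches the outer retry/timeout machinery. The sketch below tabulates this for a few candidate values; the helper names and the stall durations are illustrative assumptions, not measurements from the run above.

```python
from typing import Dict, List


def absorbed(stall_s: float, timeout_s: float) -> bool:
    """A transient stall is survived only if it ends before the read timeout fires."""
    return stall_s < timeout_s


def tradeoff(timeouts_s: List[float], stalls_s: List[float]) -> Dict[float, Dict[str, object]]:
    table: Dict[float, Dict[str, object]] = {}
    for t in timeouts_s:
        table[t] = {
            # Which of the (illustrative) stalls this timeout rides out.
            "absorbed": [d for d in stalls_s if absorbed(d, t)],
            # A truly hung read blocks for the full timeout before an error surfaces.
            "blocked_on_hang_s": t,
        }
    return table


# Hypothetical stall durations, not measurements.
for t, row in tradeoff([30.0, 60.0, 300.0], [20.0, 45.0, 90.0]).items():
    print(f"timeout={t:.0f}s absorbs stalls {row['absorbed']}, hang blocks {row['blocked_on_hang_s']:.0f}s")
```

Raising T widens the set of absorbed blips but lengthens the time a genuinely stuck gateway can hold the rollout, which is the balance the 60s choice is meant to strike.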

Dependency

Requires prime-sandboxes PR PrimeIntellect-ai/prime#540 (merged), which adds the timeout kwarg to get_background_job. Consumed via prime-sandboxes release PR PrimeIntellect-ai/prime#541 (bumps to 0.2.21).

The pyproject.toml constraint has been bumped to prime-sandboxes>=0.2.21 (see commit 874faa8f). Lockfile update is deferred until the 0.2.21 release lands on the index — uv lock currently fails to resolve because 0.2.21 is not yet published. Once PR #541 merges and the package is published, a follow-up commit will run uv lock to pin the resolved version.
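Why the minimum version matters: passing a keyword the installed client does not accept fails with TypeError at call time, before any request is made. The sketch below shows this with two hypothetical stand-in signatures (not the actual prime-sandboxes code) and a small inspect.signature helper that could feature-detect the kwarg; this PR itself relies on the prime-sandboxes>=0.2.21 constraint rather than runtime detection.

```python
import asyncio
import inspect
from typing import Any, Callable


async def get_background_job_old(sandbox_id: str, job: str) -> str:
    """Hypothetical pre-0.2.21 shape: no `timeout` parameter."""
    return "completed"


async def get_background_job_new(sandbox_id: str, job: str, timeout: float = 30.0) -> str:
    """Hypothetical 0.2.21+ shape: accepts a per-call `timeout`."""
    return "completed"


def accepts_kwarg(func: Callable[..., Any], name: str) -> bool:
    """Return True if `func` can be called with keyword argument `name`."""
    for p in inspect.signature(func).parameters.values():
        if p.kind == inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs accepts any keyword
        if p.name == name and p.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


def call_with_timeout(func: Callable[..., Any]) -> str:
    try:
        # Argument binding fails here, before any (simulated) request runs.
        return asyncio.run(func("sbx-1", "job-1", timeout=60.0))
    except TypeError as exc:
        return f"TypeError: {exc}"


print(accepts_kwarg(get_background_job_old, "timeout"), call_with_timeout(get_background_job_old))
print(accepts_kwarg(get_background_job_new, "timeout"), call_with_timeout(get_background_job_new))
```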

Test plan

  • uv run ruff check verifiers/envs/experimental/cli_agent_env.py — clean.
  • uv run ruff format --check verifiers/envs/experimental/cli_agent_env.py — clean.
  • uv run pytest tests/test_cli_agent_env.py — 17 passed. (No existing coverage of poll_job_completion; this change is a kwarg passthrough, so no new tests added.)
  • Pre-commit hooks (ruff check, ruff format, ty) passed on commit and push.
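For reference, a kwarg passthrough like this can be pinned down with a small unit test built on unittest.mock.AsyncMock. This is a sketch only: poll_job_status below is a hypothetical, simplified stand-in for the status call inside CliAgentEnv.poll_job_completion (whose full signature and loop are not shown in this PR), and the test is not part of tests/test_cli_agent_env.py.

```python
import asyncio
from typing import Any
from unittest.mock import AsyncMock, MagicMock


async def poll_job_status(sandbox_client: Any, sandbox_id: str, background_job: str) -> Any:
    """Hypothetical stand-in for the status call inside poll_job_completion."""
    return await sandbox_client.get_background_job(
        sandbox_id, background_job, timeout=60.0
    )


def test_poll_passes_explicit_timeout() -> None:
    client = MagicMock()
    client.get_background_job = AsyncMock(return_value="completed")

    result = asyncio.run(poll_job_status(client, "sbx-1", "job-1"))

    assert result == "completed"
    # Fails if the timeout kwarg is dropped or set to a different value.
    client.get_background_job.assert_awaited_once_with("sbx-1", "job-1", timeout=60.0)


test_poll_passes_explicit_timeout()
```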

🤖 Generated with Claude Code


Note

Low Risk
Small, localized change to background-job polling behavior plus a minor dependency bump; main risk is compatibility/regression if the new prime-sandboxes timeout handling behaves unexpectedly.

Overview
Reduces transient ReadTimeout failures when polling sandbox background-job status by passing an explicit timeout=60.0 to sandbox_client.get_background_job() in CliAgentEnv.poll_job_completion.

Bumps prime-sandboxes dependency to >=0.2.21 to pick up the new timeout parameter support.

Reviewed by Cursor Bugbot for commit 9f807c5. Bugbot is set up for automated code reviews on this repo.

rasdani and others added 3 commits April 20, 2026 14:11
Before: each poll iteration inherited the prime-sandboxes APIClient
default of 30s. When the sandbox gateway was briefly slow (>30s for
the background-job status read), the httpx read-timeout fired and
poll_job_completion wrapped it as AgentError — observed ~51 times on
one 30B opencode-math run.

Pass timeout=60.0 explicitly. Doubles the tolerance for a transient
gateway blip without globally bumping the APIClient default (which
would affect unrelated fast lookups).

Requires prime-sandboxes PR adding the timeout kwarg to
get_background_job.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… kwarg

Consumes prime-sandboxes PR #540 (PrimeIntellect-ai/prime#540). The
timeout=60.0 kwarg added in the previous commit requires the new
signature; this bump locks in the minimum version.

Lockfile update deferred until the prime-sandboxes 0.2.21 release
(PrimeIntellect-ai/prime#541) lands on the index.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit e136b0c.

Comment thread verifiers/envs/experimental/cli_agent_env.py
… kwarg

0.2.21 is now published to PyPI, re-applying the pin bump reverted
earlier to unblock CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rasdani rasdani merged commit c2034fa into main Apr 20, 2026
6 checks passed
@rasdani rasdani requested a review from mikasenghaas April 20, 2026 15:01