Fail fast when eval dataset builds stall by d42me · Pull Request #1124 · PrimeIntellect-ai/verifiers

d42me · 2026-04-10T23:52:07Z

Summary

add a bounded dataset-build guard so lazy dataset builders raise instead of hanging forever
wrap dataset build failures with environment context for clearer hosted eval errors
prepare the evaluation dataset before starting the env server so pre-rollout stalls are attributed correctly

Testing

uv run pytest tests/test_environment.py tests/test_environment_extra.py tests/test_run_evaluation.py
uv run ruff check verifiers/envs/environment.py verifiers/utils/eval_utils.py tests/test_environment.py tests/test_run_evaluation.py

Note

Medium Risk
Changes the evaluation startup sequence and introduces a threaded timeout guard around dataset building, which could affect environments with unusual get_eval_dataset behavior or long first-load times.

Overview
run_evaluation() now prepares the eval dataset before starting the env server, calling get_eval_dataset(n=1) to surface dataset-access failures early.
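The ordering described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `FakeEnv`, `start_server`, and `run_evaluation_sketch` are hypothetical names, and only the `get_eval_dataset(n=1)` call comes from the description.

```python
import asyncio


class FakeEnv:
    """Hypothetical stand-in for an environment; records the order of calls."""

    def __init__(self, fail: bool = False) -> None:
        self.calls: list[str] = []
        self.fail = fail

    def get_eval_dataset(self, n: int) -> list[dict]:
        self.calls.append("dataset")
        if self.fail:
            raise OSError("dataset unavailable")
        rows = [{"question": "q0"}, {"question": "q1"}]
        return rows[:n]

    async def start_server(self) -> None:
        # Hypothetical: stands in for whatever starts the env server.
        self.calls.append("server")


async def run_evaluation_sketch(env: FakeEnv) -> list[str]:
    # Touch the dataset first so access failures surface before any server exists.
    env.get_eval_dataset(n=1)
    await env.start_server()
    return env.calls
```

With this ordering, a failing dataset build raises before the server step is ever reached, so the failure is attributed to dataset preparation rather than to the rollout phase.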

This adds a timeout guard (default 5 minutes) controlled by VF_DATASET_BUILD_TIMEOUT; when enabled, dataset prep runs in a background thread and raises a RuntimeError on timeout or build failure with the env id included.
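A rough sketch of such a guard, under stated assumptions: the function names `read_timeout` and `build_with_timeout` are invented for illustration, the variable is assumed to be expressed in seconds, and the actual implementation in `verifiers` may differ. The 5-minute default, the `VF_DATASET_BUILD_TIMEOUT=0` opt-out, and the `RuntimeError` carrying the env id follow the description in this PR.

```python
import os
import threading
from typing import Any, Callable, Mapping, Optional

DEFAULT_TIMEOUT_S = 300.0  # five minutes, matching the default described above


def read_timeout(env: Optional[Mapping[str, str]] = None) -> float:
    """Read VF_DATASET_BUILD_TIMEOUT (assumed to be seconds); 0 disables the guard."""
    env = os.environ if env is None else env
    raw = env.get("VF_DATASET_BUILD_TIMEOUT")
    return DEFAULT_TIMEOUT_S if raw is None else float(raw)


def build_with_timeout(build: Callable[[], Any], env_id: str, timeout_s: float) -> Any:
    """Run build(); raise RuntimeError naming env_id on failure or timeout."""
    if timeout_s <= 0:
        # Guard disabled: run inline and only wrap ordinary exceptions.
        try:
            return build()
        except Exception as exc:
            raise RuntimeError(
                f"Failed to prepare evaluation dataset for {env_id}: {exc}"
            ) from exc

    outcome: dict[str, Any] = {}

    def worker() -> None:
        try:
            outcome["value"] = build()
        except BaseException as exc:  # captured here, re-raised on the caller's thread
            outcome["error"] = exc

    # Daemon thread so a permanently stalled build cannot block interpreter exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        raise RuntimeError(
            f"Timed out after {timeout_s:g}s preparing evaluation dataset for {env_id}"
        )
    if "error" in outcome:
        raise RuntimeError(
            f"Failed to prepare evaluation dataset for {env_id}: {outcome['error']}"
        ) from outcome["error"]
    return outcome["value"]
```

Note that a thread-based guard cannot cancel the stalled build; it only stops waiting for it. That is usually acceptable for a fail-fast check because the process exits shortly after the error is raised.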

Adds focused async tests covering call ordering (dataset before server), error wrapping, and timeout behavior, and updates docs/FAQs/skill guidance to document the new guard and troubleshooting knob.

^{Reviewed by Cursor Bugbot for commit 23e6ed0. Bugbot is set up for automated code reviews on this repo. Configure here.}

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


cursor · 2026-04-14T01:14:00Z

+        except BaseException as exc:
+            raise RuntimeError(
+                f"Failed to prepare evaluation dataset for {env_id}: {exc}"
+            ) from exc


BaseException catch swallows KeyboardInterrupt as RuntimeError

Medium Severity

In the non-threaded path (when timeout is disabled via VF_DATASET_BUILD_TIMEOUT=0), except BaseException catches KeyboardInterrupt and SystemExit and wraps them in RuntimeError. This converts a KeyboardInterrupt into an Exception subclass, which changes downstream handling — the except KeyboardInterrupt handler in run_evaluations_tui won't match it, and except Exception handlers that are not meant to catch interrupts will. Using except Exception here would let KeyboardInterrupt and SystemExit propagate naturally. The except BaseException in the threaded guarded_build_eval_dataset (line 93) is fine since KeyboardInterrupt is normally delivered to the main thread.
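The difference can be reproduced in isolation. The helpers below (`wrap_broad`, `wrap_narrow`, `classify`) are hypothetical and exist only to show which handler ends up matching; they are not functions from this repository.

```python
from typing import Any, Callable


def wrap_broad(build: Callable[[], Any]) -> Any:
    # Mirrors the flagged pattern: BaseException also covers KeyboardInterrupt.
    try:
        return build()
    except BaseException as exc:
        raise RuntimeError(f"Failed to prepare evaluation dataset: {exc}") from exc


def wrap_narrow(build: Callable[[], Any]) -> Any:
    # Suggested pattern: interrupts and SystemExit pass through untouched.
    try:
        return build()
    except Exception as exc:
        raise RuntimeError(f"Failed to prepare evaluation dataset: {exc}") from exc


def interrupted() -> None:
    raise KeyboardInterrupt  # simulates Ctrl+C arriving during the build


def classify(wrapper: Callable[[Callable[[], Any]], Any]) -> str:
    """Report which exception type a caller would actually observe."""
    try:
        wrapper(interrupted)
    except KeyboardInterrupt:
        return "KeyboardInterrupt"
    except Exception as exc:
        return type(exc).__name__
    return "no exception"
```

Here `classify(wrap_broad)` returns `"RuntimeError"` while `classify(wrap_narrow)` returns `"KeyboardInterrupt"`, which is the behavior change the comment describes.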

Fail fast when eval dataset builds stall

8ff6e43

cursor Bot reviewed Apr 10, 2026

Comment thread verifiers/envs/environment.py Outdated

Document dataset build timeout guard

b3b4536

cursor Bot reviewed Apr 11, 2026

Comment thread docs/environments.md Outdated

Simplify eval dataset preparation timeout

2989b03

cursor Bot reviewed Apr 11, 2026

Comment thread verifiers/utils/eval_utils.py

Comment thread docs/evaluation.md

Tighten eval dataset preparation errors

23e6ed0

cursor Bot reviewed Apr 14, 2026

d42me wants to merge 4 commits into PrimeIntellect-ai:main from d42me:fix/hle-dataset-build-timeout
