Fail fast when eval dataset builds stall#1124
Fail fast when eval dataset builds stall#1124d42me wants to merge 4 commits intoPrimeIntellect-ai:mainfrom
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 23e6ed0. Configure here.
| except BaseException as exc: | ||
| raise RuntimeError( | ||
| f"Failed to prepare evaluation dataset for {env_id}: {exc}" | ||
| ) from exc |
There was a problem hiding this comment.
BaseException catch swallows KeyboardInterrupt as RuntimeError
Medium Severity
In the non-threaded path (when timeout is disabled via VF_DATASET_BUILD_TIMEOUT=0), except BaseException catches KeyboardInterrupt and SystemExit and wraps them in RuntimeError. This converts a KeyboardInterrupt into an Exception subclass, which changes downstream handling — the except KeyboardInterrupt handler in run_evaluations_tui won't match it, and except Exception handlers that are not meant to catch interrupts will. Using except Exception here would let KeyboardInterrupt and SystemExit propagate naturally. The except BaseException in the threaded guarded_build_eval_dataset (line 93) is fine since KeyboardInterrupt is normally delivered to the main thread.
Reviewed by Cursor Bugbot for commit 23e6ed0. Configure here.


Summary
Testing
Note
Medium Risk
Changes the evaluation startup sequence and introduces a threaded timeout guard around dataset building, which could affect environments with unusual
get_eval_datasetbehavior or long first-load times.Overview
run_evaluation()now prepares the eval dataset before starting the env server, callingget_eval_dataset(n=1)to surface dataset-access failures early.This adds a timeout guard (default 5 minutes) controlled by
VF_DATASET_BUILD_TIMEOUT; when enabled, dataset prep runs in a background thread and raises aRuntimeErroron timeout or build failure with the env id included.Adds focused async tests covering call ordering (dataset before server), error wrapping, and timeout behavior, and updates docs/FAQs/skill guidance to document the new guard and troubleshooting knob.
Reviewed by Cursor Bugbot for commit 23e6ed0. Bugbot is set up for automated code reviews on this repo. Configure here.