refactor: propagate shared config via before validator by mikasenghaas · Pull Request #2291 · PrimeIntellect-ai/prime-rl

mikasenghaas · 2026-04-16T22:53:36Z

Summary

Reorganize all shared-config propagation on RLConfig into a single mode="before" validator (auto_setup_shared_configs), so sub-config validators see final values at construction time.

Covers model, log, ckpt, wandb, tokenizer, seq_len, max_steps, max_async_level, output_dir
Collapses eight per-field validate_shared_* after-validators into one validate_shared_configs after-validator
Removes the fix-up hack in auto_setup_tokenizer that had to correct state written by sub-config validators running before model propagation
Moves auto_setup_session_headers onto OrchestratorConfig (it only touches orchestrator state, no cross-sub-config logic)
Deletes RLConfig.max_model_len — declared as a shared field but never read or propagated anywhere

Why

Pydantic constructs nested models before running parent validators. Under the old design, auto_setup_model ran after ModelConfig had already been constructed with default values, so any sub-config validator depending on the final shared value (e.g. parser auto-resolution) saw the wrong input. Moving propagation into a mode=\"before\" validator lets sub-configs see the final shared values on their first validation pass.

This also unblocks parser auto-resolution at config time (PR #2290).

🤖 Generated with Claude Code

Note

Medium Risk
Changes Pydantic validation order for RLConfig by moving shared-field propagation to a mode="before" validator, which can subtly alter how defaults/CLI/TOML merges resolve and what sub-config validators see. Moderate risk of breaking existing config edge cases despite added unit coverage.

Overview
Refactors RLConfig shared config propagation to run in a single mode="before" validator (auto_setup_shared_configs), so model/log/ckpt/wandb/tokenizer/seq_len/max_steps/max_async_level/output_dir are filled into trainer/orchestrator/inference before nested models are constructed (with sub-config values taking precedence).

Collapses multiple per-field after-validators into one validate_shared_configs after-validator, extracts the seq_len consistency check into validate_shared_seq_len, moves X-Session-ID header auto-setup onto OrchestratorConfig, removes unused RLConfig.max_model_len, and adds focused unit tests covering propagation/precedence and CLI merge behavior.

^{Reviewed by Cursor Bugbot for commit 1077bcc. Bugbot is set up for automated code reviews on this repo. Configure here.}

Move model name/VLM propagation from auto_setup_model (after validator) to a new _propagate_shared_model (before validator). This ensures sub-config validators see the final model name when they run, which is critical for features like parser auto-resolution that depend on the model name at config construction time. The after validator auto_setup_model is simplified to only validate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Move output_dir, ckpt, wandb, tokenizer, and seq_len propagation from after-validators into the before-validator auto_setup_shared_configs, using uniform fill-if-absent semantics. Sub-config values now always take precedence; mismatches surface via the merged validate_shared_configs after-validator instead of being silently clobbered. Session header setup moves onto OrchestratorConfig since it only touches orchestrator state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… callers - Extract seq_len consistency check from rl.py into validate_shared_seq_len alongside the other validate_shared_* helpers - Remove the "skip if value is None" guard from propagate; each caller now explicitly gates on "if val is not None" which makes propagation intent visible at the call site Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit a61e1d4. Configure here.}

When RLConfig is built via `cli(RLConfig, ...)` tyro first realizes sub-configs from the TOML default, then constructs a new RLConfig with CLI overrides. The `mode=before` validator then sees `data["trainer"]` as a TrainerConfig instance rather than a dict, so fill-if-absent silently skips every shared field — `--output-dir` never reached sub-configs, leaving trainer and orchestrator writing to different directories. Dump BaseModel sub-configs with `exclude_defaults=True` before filling, recursively preserving discriminator `type` fields so union variants still resolve after re-validation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mikasenghaas · 2026-04-20T02:32:36Z

+
+        data = deepcopy(data)
+
+        # tyro may pass already-constructed sub-config instances rather than


@samsja this is the only ugliness in the pr atm which i think we can/should fix at the tyro/pydantic config level. imo we should try to avoid resolving any config more than once

mikasenghaas force-pushed the refactor/before-validator-propagation branch from 58da201 to df61dec Compare April 16, 2026 22:59

mikasenghaas force-pushed the refactor/before-validator-propagation branch from df61dec to 1f66066 Compare April 16, 2026 23:08

mikasenghaas and others added 4 commits April 17, 2026 17:48

test: add shared model name propagation test for RLConfig

9ee6cbe

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cleanup

a61e1d4

mikasenghaas changed the title ~~refactor: propagate shared model config via before validator~~ refactor: propagate shared config via before validator Apr 17, 2026

mikasenghaas marked this pull request as ready for review April 17, 2026 19:37

mikasenghaas requested a review from samsja April 17, 2026 19:37

cursor bot reviewed Apr 17, 2026

View reviewed changes

Comment thread src/prime_rl/configs/rl.py

mikasenghaas mentioned this pull request Apr 17, 2026

refactor: regex-based auto-resolution for tool call and reasoning parsers #2290

Draft

mikasenghaas commented Apr 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: propagate shared config via before validator#2291

refactor: propagate shared config via before validator#2291
mikasenghaas wants to merge 6 commits intomainfrom
refactor/before-validator-propagation

mikasenghaas commented Apr 16, 2026 •

edited by cursor bot

Loading

Uh oh!

cursor bot left a comment

Uh oh!

Uh oh!

mikasenghaas Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant


		data = deepcopy(data)

		# tyro may pass already-constructed sub-config instances rather than

Conversation

mikasenghaas commented Apr 16, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mikasenghaas Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mikasenghaas commented Apr 16, 2026 •

edited by cursor bot

Loading