Skip to content

refactor: propagate shared config via before validator#2291

Open
mikasenghaas wants to merge 6 commits intomainfrom
refactor/before-validator-propagation
Open

refactor: propagate shared config via before validator#2291
mikasenghaas wants to merge 6 commits intomainfrom
refactor/before-validator-propagation

Conversation

@mikasenghaas
Copy link
Copy Markdown
Member

@mikasenghaas mikasenghaas commented Apr 16, 2026

Summary

Reorganize all shared-config propagation on RLConfig into a single mode="before" validator (auto_setup_shared_configs), so sub-config validators see final values at construction time.

  • Covers model, log, ckpt, wandb, tokenizer, seq_len, max_steps, max_async_level, output_dir
  • Collapses eight per-field validate_shared_* after-validators into one validate_shared_configs after-validator
  • Removes the fix-up hack in auto_setup_tokenizer that had to correct state written by sub-config validators running before model propagation
  • Moves auto_setup_session_headers onto OrchestratorConfig (it only touches orchestrator state, no cross-sub-config logic)
  • Deletes RLConfig.max_model_len — declared as a shared field but never read or propagated anywhere

Why

Pydantic constructs nested models before running parent validators. Under the old design, auto_setup_model ran after ModelConfig had already been constructed with default values, so any sub-config validator depending on the final shared value (e.g. parser auto-resolution) saw the wrong input. Moving propagation into a mode=\"before\" validator lets sub-configs see the final shared values on their first validation pass.

This also unblocks parser auto-resolution at config time (PR #2290).

🤖 Generated with Claude Code


Note

Medium Risk
Changes Pydantic validation order for RLConfig by moving shared-field propagation to a mode="before" validator, which can subtly alter how defaults/CLI/TOML merges resolve and what sub-config validators see. Moderate risk of breaking existing config edge cases despite added unit coverage.

Overview
Refactors RLConfig shared config propagation to run in a single mode="before" validator (auto_setup_shared_configs), so model/log/ckpt/wandb/tokenizer/seq_len/max_steps/max_async_level/output_dir are filled into trainer/orchestrator/inference before nested models are constructed (with sub-config values taking precedence).

Collapses multiple per-field after-validators into one validate_shared_configs after-validator, extracts the seq_len consistency check into validate_shared_seq_len, moves X-Session-ID header auto-setup onto OrchestratorConfig, removes unused RLConfig.max_model_len, and adds focused unit tests covering propagation/precedence and CLI merge behavior.

Reviewed by Cursor Bugbot for commit 1077bcc. Bugbot is set up for automated code reviews on this repo. Configure here.

@mikasenghaas mikasenghaas force-pushed the refactor/before-validator-propagation branch from 58da201 to df61dec Compare April 16, 2026 22:59
Move model name/VLM propagation from auto_setup_model (after validator)
to a new _propagate_shared_model (before validator). This ensures
sub-config validators see the final model name when they run, which is
critical for features like parser auto-resolution that depend on the
model name at config construction time.

The after validator auto_setup_model is simplified to only validate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mikasenghaas mikasenghaas force-pushed the refactor/before-validator-propagation branch from df61dec to 1f66066 Compare April 16, 2026 23:08
mikasenghaas and others added 4 commits April 17, 2026 17:48
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move output_dir, ckpt, wandb, tokenizer, and seq_len propagation from
after-validators into the before-validator auto_setup_shared_configs, using
uniform fill-if-absent semantics. Sub-config values now always take precedence;
mismatches surface via the merged validate_shared_configs after-validator
instead of being silently clobbered. Session header setup moves onto
OrchestratorConfig since it only touches orchestrator state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… callers

- Extract seq_len consistency check from rl.py into validate_shared_seq_len
  alongside the other validate_shared_* helpers
- Remove the "skip if value is None" guard from propagate; each caller now
  explicitly gates on "if val is not None" which makes propagation intent
  visible at the call site

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikasenghaas mikasenghaas changed the title refactor: propagate shared model config via before validator refactor: propagate shared config via before validator Apr 17, 2026
@mikasenghaas mikasenghaas marked this pull request as ready for review April 17, 2026 19:37
@mikasenghaas mikasenghaas requested a review from samsja April 17, 2026 19:37
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit a61e1d4. Configure here.

Comment thread src/prime_rl/configs/rl.py
When RLConfig is built via `cli(RLConfig, ...)` tyro first realizes sub-configs
from the TOML default, then constructs a new RLConfig with CLI overrides. The
`mode=before` validator then sees `data["trainer"]` as a TrainerConfig instance
rather than a dict, so fill-if-absent silently skips every shared field —
`--output-dir` never reached sub-configs, leaving trainer and orchestrator
writing to different directories.

Dump BaseModel sub-configs with `exclude_defaults=True` before filling,
recursively preserving discriminator `type` fields so union variants still
resolve after re-validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

data = deepcopy(data)

# tyro may pass already-constructed sub-config instances rather than
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samsja this is the only ugliness in the pr atm which i think we can/should fix at the tyro/pydantic config level. imo we should try to avoid resolving any config more than once

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant