feat: per-turn timing + ms→s migration by mikasenghaas · Pull Request #1182 · PrimeIntellect-ai/verifiers

mikasenghaas · 2026-04-18T23:56:09Z

Summary

- Add StepTiming (llm_s, env_s, turn_s) as a NotRequired field on TrajectoryStep, instrumented in MultiTurnEnv.rollout() with backfill of env/tool time onto the previous step
- Migrate RolloutTiming from milliseconds to seconds, renaming generation_ms/scoring_ms/total_ms → generation_s/scoring_s/total_s and removing the `* 1000` and `/ 1000` conversions throughout

Test plan

- New tests/test_per_turn_timing.py covering single-turn timing, multi-turn backfill, and values-are-seconds
- All affected existing tests pass (900 passed)
- Manual verification with a real eval run

🤖 Generated with Claude Code

…conds

Add StepTiming (llm_s, env_s, turn_s) as a first-class field on TrajectoryStep, instrumented in MultiTurnEnv.rollout() with backfill for env/tool time onto the previous step. Migrate RolloutTiming from milliseconds to seconds, renaming generation_ms/scoring_ms/total_ms to generation_s/scoring_s/total_s and removing the `* 1000` and `/ 1000` conversions throughout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Make timing a required field on TrajectoryStep (no longer NotRequired)
- Add per-turn llm/env timing stats to print_timing() (non-TUI mode)
- Add per-turn timing breakdown to TUI Usage tab
- Use print_time() for human-readable formatting in both display paths
- Update rlm_env.py and all test TrajectoryStep constructions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Time setup_state() → setup_s
- Redefine generation_s as sum of per-step turn_s (llm + env time)
- Compute overhead_s = total - setup - generation - scoring
- Display setup/overhead in print_timing(), TUI, and orchestrator metrics
- Refactor print_timing() with _print_timing_row helper

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
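
The arithmetic in this commit message can be sketched as follows. This is a sketch under assumptions, not the PR's implementation: `build_rollout_timing` is a hypothetical helper, and the dictionary layout is assumed. Only the field names (`setup_s`, `generation_s`, `scoring_s`, `overhead_s`, `total_s`, `turn_s`) and the formula come from the commit message.

```python
def build_rollout_timing(total_s, setup_s, scoring_s, step_turn_s):
    """Derive generation and overhead time (all values in seconds).

    step_turn_s holds the per-step turn_s values (model + env time per step).
    """
    # generation_s is redefined as the sum of per-step turn_s.
    generation_s = sum(step_turn_s)
    # overhead_s is whatever wall-clock time the other buckets do not explain.
    overhead_s = total_s - setup_s - generation_s - scoring_s
    return {
        "setup_s": setup_s,
        "generation_s": generation_s,
        "scoring_s": scoring_s,
        "overhead_s": overhead_s,
        "total_s": total_s,
    }


timing = build_rollout_timing(
    total_s=10.0, setup_s=1.0, scoring_s=0.5, step_turn_s=[3.0, 4.0]
)
print(timing["generation_s"], timing["overhead_s"])  # 7.0 1.5
```

By construction the four buckets always sum back to `total_s`, so `overhead_s` absorbs any time not captured by the setup, per-step, or scoring timers.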

- Add avg_timing to EnvEvalState, computed from all_outputs in the progress callback
- Show timing row (setup/gen/score/overhead/total) in live panel alongside tokens
- Add "timing (avg)" panel to final summary next to "usage (avg)"
- Use print_time() for human-readable formatting throughout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
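
One plausible way to compute an average timing from a list of rollout outputs is sketched below. The function `average_timing` and the assumption that each output is a dict carrying a `timing` dict of float seconds are illustrative only; the commit message does not show how `avg_timing` is actually derived.

```python
def average_timing(all_outputs):
    """Average each timing field (seconds) across outputs that carry timing."""
    timings = [out["timing"] for out in all_outputs if "timing" in out]
    if not timings:
        return {}
    # Assumes every timing dict has the same keys as the first one.
    return {
        key: sum(t[key] for t in timings) / len(timings)
        for key in timings[0]
    }


outputs = [
    {"timing": {"generation_s": 2.0, "total_s": 3.0}},
    {"timing": {"generation_s": 4.0, "total_s": 5.0}},
    {},  # an output without timing is skipped
]
print(average_timing(outputs))  # {'generation_s': 3.0, 'total_s': 4.0}
```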

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… and side-by-side columns

- Add format_timing_line() shared formatter for compact timing breakdown
- Add section titles (metrics/usage/timing) in live display panel
- Use Columns for side-by-side panels in final summary
- Wire format_timing_line into all display paths (live, summary, TUI, non-TUI)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add section_label and value_style params to make_kv_line()
- Combine usage tokens into a single line with "usage" label
- Add "metrics" and "timing" inline labels with bold dim style
- Make timing values white to match other metric values
- Remove separate section title lines from live panel

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… dim

- Remove "total Xs (...)" wrapper; show breakdown parts directly
- Add format_timing_rich() returning styled Text with dim labels and white values
- Extract _timing_parts() for shared structured breakdown logic
- Use format_timing_rich in live panel and final summary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add llm_s/env_s to RolloutTiming (computed in _render_timing from steps)
- Simplify progress callback to read llm_s/env_s from timing directly
- Replace side-by-side Columns panels in final summary with compact inline ╰─ rows matching the live panel style

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…l section

- Rename llm_s -> model_s in StepTiming, RolloutTiming, and all references
- Rename "tools" label to "env" in timing breakdown display
- Remove metrics/usage/timing section below reward distribution in summary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mikasenghaas changed the title ~~feat: per-turn timing + RolloutTiming ms→s migration~~ feat: per-turn timing + ms→s migration Apr 18, 2026

mikasenghaas and others added 10 commits April 19, 2026 00:05

fix: type annotation for make_kv_line compatibility

369fc39

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: type annotations for ty checker

6bd4f4a

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mikasenghaas wants to merge 11 commits into main from feat/turn-timings
