feat: per-turn timing + ms→s migration#1182

Draft
mikasenghaas wants to merge 11 commits into main from feat/turn-timings

@mikasenghaas
Member

Summary

  • Add StepTiming (llm_s, env_s, turn_s) as a NotRequired field on TrajectoryStep, instrumented in MultiTurnEnv.rollout() with backfill of env/tool time onto the previous step
  • Migrate RolloutTiming from milliseconds to seconds, renaming generation_ms/scoring_ms/total_ms → generation_s/scoring_s/total_s and removing the * 1000 and / 1000 conversions throughout
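
To make the backfill idea concrete, here is a minimal sketch, assuming `StepTiming` is a `TypedDict` holding seconds and that env/tool time is measured at the start of the next turn, then written back onto the previous step. The `rollout_sketch` function and its `call_model` / `run_env` callables are hypothetical stand-ins for illustration, not the actual `MultiTurnEnv.rollout()` code. Field names follow the summary above (a later commit in this PR renames `llm_s` to `model_s`).

```python
import time
from typing import Callable, TypedDict


class StepTiming(TypedDict):
    llm_s: float   # seconds spent waiting on the model for this step
    env_s: float   # seconds of env/tool work that followed this step
    turn_s: float  # llm_s + env_s


def rollout_sketch(
    call_model: Callable[[], None],
    run_env: Callable[[], None],
    num_turns: int,
) -> list[StepTiming]:
    """Hypothetical loop: time each model call, backfill env time afterwards."""
    steps: list[StepTiming] = []
    for _ in range(num_turns):
        if steps:
            # The env/tool response to the previous step is produced here,
            # so its duration is backfilled onto that previous step.
            t0 = time.perf_counter()
            run_env()
            env_s = time.perf_counter() - t0
            prev = steps[-1]
            prev["env_s"] = env_s
            prev["turn_s"] = prev["llm_s"] + env_s

        t1 = time.perf_counter()
        call_model()
        llm_s = time.perf_counter() - t1
        # env_s stays 0.0 until (and unless) a later turn backfills it.
        steps.append({"llm_s": llm_s, "env_s": 0.0, "turn_s": llm_s})
    return steps
```

In this sketch the final step never receives env time, because no further turn follows it. All values come straight from `time.perf_counter()` differences, so they are already in seconds and need no `* 1000` conversion.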

Test plan

  • New tests/test_per_turn_timing.py — single-turn timing, multi-turn backfill, values-are-seconds
  • All affected existing tests pass (900 passed)
  • Manual verification with a real eval run

🤖 Generated with Claude Code

…conds

Add StepTiming (llm_s, env_s, turn_s) as a first-class field on
TrajectoryStep, instrumented in MultiTurnEnv.rollout() with backfill
for env/tool time onto the previous step.

Migrate RolloutTiming from milliseconds to seconds, renaming
generation_ms/scoring_ms/total_ms to generation_s/scoring_s/total_s
and removing the * 1000 / / 1000 conversions throughout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mikasenghaas changed the title from "feat: per-turn timing + RolloutTiming ms→s migration" to "feat: per-turn timing + ms→s migration" on Apr 18, 2026
mikasenghaas and others added 10 commits April 19, 2026 00:05
- Make timing a required field on TrajectoryStep (no longer NotRequired)
- Add per-turn llm/env timing stats to print_timing() (non-TUI mode)
- Add per-turn timing breakdown to TUI Usage tab
- Use print_time() for human-readable formatting in both display paths
- Update rlm_env.py and all test TrajectoryStep constructions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Time setup_state() → setup_s
- Redefine generation_s as sum of per-step turn_s (llm + env time)
- Compute overhead_s = total - setup - generation - scoring
- Display setup/overhead in print_timing(), TUI, and orchestrator metrics
- Refactor print_timing() with _print_timing_row helper

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
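
The arithmetic in this commit message can be sketched as follows. This is an illustration under stated assumptions: `rollout_breakdown` is a hypothetical helper name, and only the field names `setup_s`, `generation_s`, `scoring_s`, `overhead_s`, `total_s`, and `turn_s` come from the PR text.

```python
def rollout_breakdown(
    total_s: float,
    setup_s: float,
    scoring_s: float,
    turn_s_per_step: list[float],
) -> dict[str, float]:
    """Hypothetical helper: derive generation and overhead time in seconds."""
    # generation_s is the sum of per-step turn_s (model time + env time).
    generation_s = sum(turn_s_per_step)
    # overhead_s is whatever wall-clock time the other phases do not explain.
    overhead_s = total_s - setup_s - generation_s - scoring_s
    return {
        "setup_s": setup_s,
        "generation_s": generation_s,
        "scoring_s": scoring_s,
        "overhead_s": overhead_s,
        "total_s": total_s,
    }
```

For example, a rollout with `total_s=10.0`, `setup_s=1.0`, `scoring_s=2.0`, and two turns of 3.0 s and 2.5 s gives `generation_s=5.5` and `overhead_s=1.5`.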
- Add avg_timing to EnvEvalState, computed from all_outputs in
  the progress callback
- Show timing row (setup/gen/score/overhead/total) in live panel
  alongside tokens
- Add "timing (avg)" panel to final summary next to "usage (avg)"
- Use print_time() for human-readable formatting throughout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
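
One way the averaged timing could be computed is sketched below. It assumes each entry in `all_outputs` is a dict carrying a `"timing"` dict of second-valued fields; that layout and the `average_timing` name are assumptions for illustration, not the PR's actual progress-callback code.

```python
from statistics import fmean


def average_timing(all_outputs: list[dict]) -> dict[str, float]:
    """Hypothetical helper: per-field mean over every output that has timing."""
    timings = [out["timing"] for out in all_outputs if "timing" in out]
    if not timings:
        return {}
    # Assumes every timing dict shares the keys of the first one.
    return {key: fmean([t[key] for t in timings]) for key in timings[0]}
```

Recomputing the mean from the full list on each progress update keeps the sketch simple; a running sum would avoid the repeated pass if the list grows large.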
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… and side-by-side columns

- Add format_timing_line() shared formatter for compact timing breakdown
- Add section titles (metrics/usage/timing) in live display panel
- Use Columns for side-by-side panels in final summary
- Wire format_timing_line into all display paths (live, summary, TUI, non-TUI)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
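
A shared formatter of this kind might look like the sketch below. The output layout, the `|` separator, and both function names (`format_seconds`, `format_timing_line_sketch`) are assumptions made for illustration; the PR's own `format_timing_line()` and `print_time()` may format values differently. The labels follow the setup/gen/score/overhead/total row mentioned in the commits above.

```python
def format_seconds(seconds: float) -> str:
    """Hypothetical stand-in for a human-readable duration formatter."""
    if seconds < 1:
        return f"{seconds * 1000:.0f}ms"
    if seconds < 60:
        return f"{seconds:.1f}s"
    minutes, rem = divmod(int(round(seconds)), 60)
    return f"{minutes}m {rem}s"


def format_timing_line_sketch(timing: dict[str, float]) -> str:
    """Render a compact one-line breakdown from second-valued fields."""
    labels = [
        ("setup", "setup_s"),
        ("gen", "generation_s"),
        ("score", "scoring_s"),
        ("overhead", "overhead_s"),
        ("total", "total_s"),
    ]
    # Skip fields that are absent so partial timing dicts still render.
    parts = [
        f"{label} {format_seconds(timing[key])}"
        for label, key in labels
        if key in timing
    ]
    return " | ".join(parts)
```

Keeping the formatting in one function means the live panel, final summary, TUI, and non-TUI paths can all call it and stay visually consistent.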
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add section_label and value_style params to make_kv_line()
- Combine usage tokens into a single line with "usage" label
- Add "metrics" and "timing" inline labels with bold dim style
- Make timing values white to match other metric values
- Remove separate section title lines from live panel

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… dim

- Remove "total Xs (...)" wrapper — show breakdown parts directly
- Add format_timing_rich() returning styled Text with dim labels and white values
- Extract _timing_parts() for shared structured breakdown logic
- Use format_timing_rich in live panel and final summary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add llm_s/env_s to RolloutTiming (computed in _render_timing from steps)
- Simplify progress callback to read llm_s/env_s from timing directly
- Replace side-by-side Columns panels in final summary with compact
  inline ╰─ rows matching the live panel style

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l section

- Rename llm_s -> model_s in StepTiming, RolloutTiming, and all references
- Rename "tools" label to "env" in timing breakdown display
- Remove metrics/usage/timing section below reward distribution in summary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>