Draft
…conds
Add StepTiming (llm_s, env_s, turn_s) as a first-class field on TrajectoryStep, instrumented in MultiTurnEnv.rollout() with backfill of env/tool time onto the previous step.
Migrate RolloutTiming from milliseconds to seconds, renaming generation_ms/scoring_ms/total_ms to generation_s/scoring_s/total_s and removing the "* 1000" and "/ 1000" conversions throughout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
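For readers unfamiliar with the backfill idea, here is a minimal sketch of how per-step timing might be recorded. It is not the code from this PR: the `rollout` function, its `model_call`/`env_call`/`clock` parameters, and the loop shape are hypothetical stand-ins for `MultiTurnEnv.rollout()`. Only the field names `llm_s`, `env_s`, and `turn_s` come from the commit message. The assumption is that the env/tool response for a turn is only computed at the start of the next turn, so its duration has to be written back onto the step that was already recorded.

```python
import time
from dataclasses import dataclass


@dataclass
class StepTiming:
    # Field names follow the commit message; every value is in seconds.
    llm_s: float = 0.0
    env_s: float = 0.0
    turn_s: float = 0.0


def rollout(model_call, env_call, max_turns, clock=time.perf_counter):
    """Hypothetical rollout loop (not the real MultiTurnEnv.rollout())."""
    steps = []
    response = None
    for turn in range(max_turns):
        if steps:
            # The env/tool work for the previous response runs here, so its
            # duration is backfilled onto the previous step.
            t_env = clock()
            done = env_call(response)
            env_s = clock() - t_env
            steps[-1].env_s = env_s
            steps[-1].turn_s += env_s
            if done:
                break
        t_llm = clock()
        response = model_call(turn)
        llm_s = clock() - t_llm
        # turn_s starts as model time only; env time is added on backfill.
        steps.append(StepTiming(llm_s=llm_s, turn_s=llm_s))
    return steps
```

In this sketch a step that is never followed by an env call (for example when `max_turns` is reached) keeps `env_s == 0.0`. Injecting `clock` is a convenience for deterministic tests and is not something the PR is known to do.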
- Make timing a required field on TrajectoryStep (no longer NotRequired)
- Add per-turn llm/env timing stats to print_timing() (non-TUI mode)
- Add per-turn timing breakdown to TUI Usage tab
- Use print_time() for human-readable formatting in both display paths
- Update rlm_env.py and all test TrajectoryStep constructions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Time setup_state() → setup_s
- Redefine generation_s as sum of per-step turn_s (llm + env time)
- Compute overhead_s = total - setup - generation - scoring
- Display setup/overhead in print_timing(), TUI, and orchestrator metrics
- Refactor print_timing() with _print_timing_row helper
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
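The arithmetic in this commit can be illustrated with a small sketch. The field names mirror the commit message, but the `summarize` helper and the exact shape of `RolloutTiming` are assumptions made for illustration, not the PR's actual definitions.

```python
from dataclasses import dataclass


@dataclass
class RolloutTiming:
    # Hypothetical shape; the real RolloutTiming may carry other fields.
    setup_s: float
    generation_s: float
    scoring_s: float
    overhead_s: float
    total_s: float


def summarize(setup_s, turn_s_per_step, scoring_s, total_s):
    """Derive generation_s and overhead_s from measured values (all seconds)."""
    # generation_s is the sum of per-step turn_s (llm + env time).
    generation_s = sum(turn_s_per_step)
    # Whatever total_s does not attribute to a named phase counts as overhead.
    overhead_s = total_s - setup_s - generation_s - scoring_s
    return RolloutTiming(setup_s, generation_s, scoring_s, overhead_s, total_s)
```

For example, `summarize(0.5, [2.5, 3.0], 1.0, 7.25)` yields `generation_s == 5.5` and `overhead_s == 0.25`. The sketch does not clamp `overhead_s` at zero; whether the PR does so is not stated.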
- Add avg_timing to EnvEvalState, computed from all_outputs in the progress callback
- Show timing row (setup/gen/score/overhead/total) in live panel alongside tokens
- Add "timing (avg)" panel to final summary next to "usage (avg)"
- Use print_time() for human-readable formatting throughout
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… and side-by-side columns
- Add format_timing_line() shared formatter for compact timing breakdown
- Add section titles (metrics/usage/timing) in live display panel
- Use Columns for side-by-side panels in final summary
- Wire format_timing_line into all display paths (live, summary, TUI, non-TUI)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add section_label and value_style params to make_kv_line()
- Combine usage tokens into a single line with "usage" label
- Add "metrics" and "timing" inline labels with bold dim style
- Make timing values white to match other metric values
- Remove separate section title lines from live panel
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… dim
- Remove "total Xs (...)" wrapper; show breakdown parts directly
- Add format_timing_rich() returning styled Text with dim labels and white values
- Extract _timing_parts() for shared structured breakdown logic
- Use format_timing_rich in live panel and final summary
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
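One way to share breakdown logic between a plain-text formatter and a styled one is to have both consume the same list of (label, value) pairs. The sketch below illustrates that split under stated assumptions: `timing_parts`, `format_timing_line`, and `fmt_seconds` are hypothetical stand-ins, not the PR's `_timing_parts()`, `format_timing_line()`, or `print_time()`, whose signatures are not shown here. The label order is borrowed from the setup/gen/score/overhead row mentioned in an earlier commit.

```python
def fmt_seconds(seconds: float) -> str:
    """Hypothetical human-readable formatter (stand-in for print_time())."""
    seconds = round(seconds, 1)
    if seconds < 60:
        return f"{seconds:.1f}s"
    minutes, rem = divmod(round(seconds), 60)
    return f"{minutes}m{rem}s"


def timing_parts(timing: dict) -> list:
    """Return (label, formatted value) pairs in a fixed display order."""
    order = ("setup", "gen", "score", "overhead")
    return [(label, fmt_seconds(timing[label])) for label in order if label in timing]


def format_timing_line(timing: dict) -> str:
    """Plain-text rendering: breakdown parts only, no "total Xs (...)" wrapper."""
    return "  ".join(f"{label} {value}" for label, value in timing_parts(timing))
```

A styled variant could iterate over the same `timing_parts()` output and apply a dim style to labels and a white style to values, which keeps ordering and formatting decisions in one place.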
- Add llm_s/env_s to RolloutTiming (computed in _render_timing from steps)
- Simplify progress callback to read llm_s/env_s from timing directly
- Replace side-by-side Columns panels in final summary with compact inline ╰─ rows matching the live panel style
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l section
- Rename llm_s -> model_s in StepTiming, RolloutTiming, and all references
- Rename "tools" label to "env" in timing breakdown display
- Remove metrics/usage/timing section below reward distribution in summary
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Add StepTiming (llm_s, env_s, turn_s) as a NotRequired field on TrajectoryStep, instrumented in MultiTurnEnv.rollout() with backfill of env/tool time onto the previous step
- Migrate RolloutTiming from milliseconds to seconds, renaming generation_ms/scoring_ms/total_ms → generation_s/scoring_s/total_s and removing the "* 1000" and "/ 1000" conversions throughout

Test plan
- tests/test_per_turn_timing.py: single-turn timing, multi-turn backfill, values-are-seconds

🤖 Generated with Claude Code