3 changes: 3 additions & 0 deletions .jules/bolt.md
@@ -0,0 +1,3 @@
## 2024-04-20 - [Avoid setdefault with complex defaults in loops]
**Learning:** `d.setdefault(key, complex_default)` evaluates `complex_default` on every call, even when `key` already exists, because Python evaluates arguments before invoking the method. In a hot loop where the default is a dict or list literal, each iteration allocates a fresh object and immediately discards it, which adds avoidable overhead.
**Action:** Replace `d.setdefault(key, complex_default)` with `if key not in d: d[key] = complex_default` so the default is constructed only when the key is missing.
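A minimal sketch of the eager-evaluation behaviour described in this note. `make_default` and the `calls` counter are hypothetical helpers added only to make the construction count visible; they are not part of the PR.

```python
# Hypothetical counter: records how often the default object is built.
calls = {"setdefault": 0, "membership": 0}


def make_default(counter_key: str) -> dict[str, int]:
    """Build a fresh default and record that it was constructed."""
    calls[counter_key] += 1
    return {"program_count": 0}


keys = ["short", "long", "short", "short", "long"]

# Pattern 1: the argument is evaluated before setdefault runs,
# so the default is built on every iteration.
per_bucket_a: dict[str, dict[str, int]] = {}
for key in keys:
    state = per_bucket_a.setdefault(key, make_default("setdefault"))
    state["program_count"] += 1

# Pattern 2: the default is built only when the key is missing.
per_bucket_b: dict[str, dict[str, int]] = {}
for key in keys:
    if key not in per_bucket_b:
        per_bucket_b[key] = make_default("membership")
    per_bucket_b[key]["program_count"] += 1

print(calls)  # {'setdefault': 5, 'membership': 2}
```

Both loops produce the same mapping; only the number of default constructions differs. Whether that difference is measurable depends on how expensive the default is to build.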
9 changes: 4 additions & 5 deletions src/model/free_running_executor.py
@@ -697,14 +697,13 @@ def evaluate_free_running_programs(

 outcomes.append(outcome)
 bucket = bucket_name(outcome.program_steps)
-bucket_state = per_bucket.setdefault(
-    bucket,
-    {
+if bucket not in per_bucket:
+    per_bucket[bucket] = {
         "program_count": 0,
         "exact_trace_count": 0,
         "exact_final_state_count": 0,
-    },
-)
+    }
+bucket_state = per_bucket[bucket]
 bucket_state["program_count"] += 1
Comment on lines +700 to 707

Copilot AI Apr 20, 2026

This membership-check + indexing pattern does two dict lookups per program (`bucket in per_bucket`, then `per_bucket[bucket]`). Since the stated goal is improving hot-loop performance, consider a single-lookup approach (`get` with initialization, or `try`/`except KeyError`) to avoid the extra lookup.
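One way the `get`-with-initialization variant might look, as a sketch rather than the PR's code. `tally_outcomes` and its tuple input are hypothetical stand-ins for the loop in `evaluate_free_running_programs`; only the counter names come from the diff.

```python
def tally_outcomes(
    outcomes: list[tuple[str, bool, bool]],
) -> dict[str, dict[str, int]]:
    """Aggregate (bucket, exact_trace, exact_final_state) tuples per bucket."""
    per_bucket: dict[str, dict[str, int]] = {}
    for bucket, exact_trace, exact_final_state in outcomes:
        # Single lookup on the common path: get() returns None for new buckets.
        bucket_state = per_bucket.get(bucket)
        if bucket_state is None:
            # The default is built only here, when the bucket is first seen.
            bucket_state = per_bucket[bucket] = {
                "program_count": 0,
                "exact_trace_count": 0,
                "exact_final_state_count": 0,
            }
        bucket_state["program_count"] += 1
        bucket_state["exact_trace_count"] += int(exact_trace)
        bucket_state["exact_final_state_count"] += int(exact_final_state)
    return per_bucket
```

This keeps the lazy default from the PR while avoiding the second lookup for existing keys. It relies on `None` never being stored as a bucket state.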

Copilot uses AI. Check for mistakes.
 bucket_state["exact_trace_count"] += int(outcome.exact_trace_match)
 bucket_state["exact_final_state_count"] += int(outcome.exact_final_state_match)
13 changes: 7 additions & 6 deletions src/model/softmax_baseline.py
@@ -582,15 +582,14 @@ def evaluate_teacher_forced_model(
 total_correct += correct

 bucket = baseline_bucket_name(example.program_steps)
-bucket_state = per_bucket.setdefault(
-    bucket,
-    {
+if bucket not in per_bucket:
+    per_bucket[bucket] = {
         "example_count": 0,
         "token_count": 0,
         "correct_tokens": 0,
         "weighted_loss": 0.0,
-    },
-)
+    }
+bucket_state = per_bucket[bucket]
 bucket_state["example_count"] = int(bucket_state["example_count"]) + 1
Comment on lines +585 to 593

Copilot AI Apr 20, 2026

In a tight loop, `if bucket not in per_bucket:` followed by `per_bucket[bucket]` does two dictionary lookups per example on the common path. If this function is performance-sensitive, consider a single-lookup approach (e.g., `state = per_bucket.get(bucket)` and initialize when missing, or `try`/`except KeyError`).
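The `try`/`except KeyError` alternative mentioned here could be sketched as follows. `accumulate` and its tuple input are hypothetical, and the `weighted_loss` update (loss times token count) is an assumption, since the diff does not show how that field is updated.

```python
def accumulate(
    examples: list[tuple[str, int, int, float]],
) -> dict[str, dict[str, float]]:
    """Aggregate (bucket, token_count, correct, loss) tuples per bucket."""
    per_bucket: dict[str, dict[str, float]] = {}
    for bucket, token_count, correct, loss in examples:
        try:
            # Common path: one lookup, no default constructed.
            bucket_state = per_bucket[bucket]
        except KeyError:
            # Rare path: first time this bucket is seen.
            bucket_state = per_bucket[bucket] = {
                "example_count": 0,
                "token_count": 0,
                "correct_tokens": 0,
                "weighted_loss": 0.0,
            }
        bucket_state["example_count"] += 1
        bucket_state["token_count"] += token_count
        bucket_state["correct_tokens"] += correct
        # Assumption: loss is weighted by token count; not shown in the diff.
        bucket_state["weighted_loss"] += loss * token_count
    return per_bucket
```

Raising and catching an exception is comparatively expensive, so this form tends to suit loops like this one where there are few distinct buckets and misses are rare.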

 bucket_state["token_count"] = int(bucket_state["token_count"]) + token_count
 bucket_state["correct_tokens"] = int(bucket_state["correct_tokens"]) + correct
@@ -753,7 +752,9 @@ def evaluate_free_running_rollout(
 )

 bucket = baseline_bucket_name(example.program_steps)
-bucket_state = per_bucket.setdefault(bucket, {"example_count": 0, "exact_count": 0})
+if bucket not in per_bucket:
+    per_bucket[bucket] = {"example_count": 0, "exact_count": 0}
+bucket_state = per_bucket[bucket]
Comment on lines +755 to +757

Copilot AI Apr 20, 2026

Same as above: `if bucket not in per_bucket:` followed by indexing `per_bucket[bucket]` incurs two dict lookups per iteration. If this rollout evaluation is on a hot path, a single-lookup pattern (`get`/initialize or `try`/`except KeyError`) will shave overhead.

Suggested change
-if bucket not in per_bucket:
-    per_bucket[bucket] = {"example_count": 0, "exact_count": 0}
-bucket_state = per_bucket[bucket]
+bucket_state = per_bucket.setdefault(bucket, {"example_count": 0, "exact_count": 0})
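Since the suggested change restores `setdefault` while the PR removes it, a small micro-benchmark may help settle which pattern is faster for this workload. The sketch below is illustrative only: `KEYS`, the three candidate functions, and `benchmark` are hypothetical, and results will vary with Python version and the number of distinct buckets.

```python
import timeit

# Hypothetical workload: 10,000 iterations spread over 8 distinct buckets.
KEYS = [f"bucket_{i % 8}" for i in range(10_000)]


def with_setdefault(keys):
    per_bucket = {}
    for key in keys:
        # One lookup, but the dict literal is built on every iteration.
        state = per_bucket.setdefault(key, {"example_count": 0, "exact_count": 0})
        state["example_count"] += 1
    return per_bucket


def with_membership_check(keys):
    per_bucket = {}
    for key in keys:
        # Lazy default, but two lookups on the common path.
        if key not in per_bucket:
            per_bucket[key] = {"example_count": 0, "exact_count": 0}
        per_bucket[key]["example_count"] += 1
    return per_bucket


def with_get(keys):
    per_bucket = {}
    for key in keys:
        # Lazy default and a single lookup on the common path.
        state = per_bucket.get(key)
        if state is None:
            state = per_bucket[key] = {"example_count": 0, "exact_count": 0}
        state["example_count"] += 1
    return per_bucket


CANDIDATES = (with_setdefault, with_membership_check, with_get)


def benchmark(repeats: int = 20) -> dict[str, float]:
    """Return total seconds per candidate over `repeats` runs of KEYS."""
    return {
        fn.__name__: timeit.timeit(lambda fn=fn: fn(KEYS), number=repeats)
        for fn in CANDIDATES
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds:.4f}s")
```

All three candidates build the same mapping, so the timing is the only thing being compared. Measuring on the real evaluation loop would be more convincing than this synthetic one.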

 bucket_state["example_count"] = int(bucket_state["example_count"]) + 1
 bucket_state["exact_count"] = int(bucket_state["exact_count"]) + int(exact)

18 changes: 14 additions & 4 deletions src/model/trainable_latest_write.py
@@ -171,7 +171,9 @@ def exact_program_accuracy(scorer: TrainableLatestWriteScorer, samples: Sequence
     return 0.0
 per_program: dict[str, list[bool]] = {}
 for sample in samples:
-    per_program.setdefault(sample.program_name, []).append(scorer.predict_index(sample) == sample.target_index)
+    if sample.program_name not in per_program:
+        per_program[sample.program_name] = []
+    per_program[sample.program_name].append(scorer.predict_index(sample) == sample.target_index)
Comment on lines +174 to +176

Copilot AI Apr 20, 2026

This hot-loop pattern does two dict lookups on the common path (`if key not in ...`, then `...[key]`). Since the PR is performance-motivated, consider a single-lookup pattern (e.g., `lst = per_program.get(name)` and initialize if `None`, or `try`/`except KeyError`) to avoid the extra hash lookup per iteration.
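For list accumulation specifically, `collections.defaultdict(list)` is another option that neither the PR nor this comment proposes; it creates the list lazily and needs one lookup per iteration. The sketch below assumes a hypothetical `Sample` dataclass with a precomputed `predicted_index` in place of the real sample type and `scorer.predict_index(sample)`.

```python
from collections import defaultdict
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical stand-in for the PR's sample type."""

    program_name: str
    predicted_index: int  # stands in for scorer.predict_index(sample)
    target_index: int


def exact_program_accuracy(samples: Sequence[Sample]) -> float:
    """Fraction of programs whose samples were all predicted correctly."""
    if not samples:
        return 0.0
    # The list factory runs only when a program name is first seen.
    per_program: defaultdict[str, list[bool]] = defaultdict(list)
    for sample in samples:
        per_program[sample.program_name].append(
            sample.predicted_index == sample.target_index
        )
    exact = sum(1 for outcomes in per_program.values() if all(outcomes))
    return exact / len(per_program)
```

One caveat: reading a missing key from a `defaultdict` inserts it, so the mapping should be converted with `dict(per_program)` before being handed to code that probes keys by indexing.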

 exact = sum(1 for outcomes in per_program.values() if all(outcomes))
 return exact / len(per_program)

@@ -198,12 +200,20 @@ def evaluate_scorer(
 correct_samples += int(correct)

 bucket = bucket_name(sample.program_steps)
-bucket_state = per_bucket.setdefault(bucket, {"sample_count": 0, "sample_correct": 0, "programs": {}})
+if bucket not in per_bucket:
+    per_bucket[bucket] = {"sample_count": 0, "sample_correct": 0, "programs": {}}
+bucket_state = per_bucket[bucket]
 bucket_state["sample_count"] = int(bucket_state["sample_count"]) + 1
 bucket_state["sample_correct"] = int(bucket_state["sample_correct"]) + int(correct)
-bucket_state["programs"].setdefault(sample.program_name, []).append(correct)

-per_program.setdefault(sample.program_name, []).append(correct)
+programs: dict = bucket_state["programs"]  # type: ignore
+if sample.program_name not in programs:
+    programs[sample.program_name] = []
+programs[sample.program_name].append(correct)
Comment on lines +209 to +212

Copilot AI Apr 20, 2026

`programs: dict = bucket_state["programs"]  # type: ignore` drops type information and introduces a blanket `type: ignore`. This makes it easy to accidentally store the wrong structure and hides type errors. Prefer giving `per_bucket` a more precise type (e.g., a `TypedDict`) or using `typing.cast` to `dict[str, list[bool]]` for `bucket_state["programs"]` rather than ignoring types.
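A sketch of the `TypedDict` option, assuming the bucket state has exactly the three fields visible in the diff. `BucketState`, `new_bucket_state`, and `record` are hypothetical names introduced for illustration.

```python
from typing import TypedDict


class BucketState(TypedDict):
    """Hypothetical precise type for one entry of per_bucket."""

    sample_count: int
    sample_correct: int
    programs: dict[str, list[bool]]


def new_bucket_state() -> BucketState:
    return {"sample_count": 0, "sample_correct": 0, "programs": {}}


per_bucket: dict[str, BucketState] = {}


def record(bucket: str, program_name: str, correct: bool) -> None:
    """Record one sample outcome without casts or type: ignore."""
    state = per_bucket.get(bucket)
    if state is None:
        state = per_bucket[bucket] = new_bucket_state()
    # Field types are known to the checker, so no int(...) wrappers are needed.
    state["sample_count"] += 1
    state["sample_correct"] += int(correct)
    programs = state["programs"]  # inferred as dict[str, list[bool]]
    outcomes = programs.get(program_name)
    if outcomes is None:
        outcomes = programs[program_name] = []
    outcomes.append(correct)


record("short", "p1", True)
record("short", "p1", False)
record("long", "p2", True)
```

With the value type declared, the `int(bucket_state[...])` conversions in the surrounding code would likely become unnecessary as well, though that depends on how `per_bucket` is consumed elsewhere.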

+
+if sample.program_name not in per_program:
+    per_program[sample.program_name] = []
+per_program[sample.program_name].append(correct)
 program_steps[sample.program_name] = sample.program_steps

 exact_programs = sum(1 for outcomes in per_program.values() if all(outcomes))