4 changes: 4 additions & 0 deletions .jules/bolt.md
@@ -0,0 +1,4 @@

## 2024-05-18 - Eager instantiation with dict.setdefault
**Learning:** `dict.setdefault(key, expensive_default)` inside hot loops eagerly evaluates `expensive_default` on *every* iteration, which is a major performance bottleneck for Python loops where defaults are objects or lists.
**Action:** Always replace `setdefault` with an explicit `if key not in dict: dict[key] = ...` membership check to prevent eager evaluation.

Copilot AI Apr 17, 2026


The guidance here is written as an absolute ("Always replace `setdefault` ..."), but `setdefault` is only problematic when the default expression allocates/is expensive (and especially inside hot loops). Consider qualifying this recommendation (and/or recommending the one-lookup `d.get(key)` + initialize-on-miss pattern) to avoid encouraging unnecessary rewrites or potential slowdowns where the default is cheap/constant.

Suggested change
**Action:** Always replace `setdefault` with an explicit `if key not in dict: dict[key] = ...` membership check to prevent eager evaluation.
**Action:** When the default is expensive to construct or allocates new objects, especially in hot loops, avoid `setdefault(...)` and use lazy initialization instead, for example with `value = d.get(key)` followed by initialize-on-miss logic (or an explicit `if key not in d: d[key] = ...`). For cheap or constant defaults, `setdefault` can still be acceptable.
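
To make the eager-evaluation point concrete, here is a minimal sketch that counts how often the default is built under each pattern. The helper `make_bucket`, the `calls` counter, and the sample keys are hypothetical and exist only for illustration; they are not part of this PR.

```python
# Illustrative only: `make_bucket`, `calls`, and `keys` are hypothetical names.
calls = {"setdefault": 0, "lazy": 0}


def make_bucket(counter_name):
    """Build a fresh default bucket and record that it was constructed."""
    calls[counter_name] += 1
    return {"count": 0, "entry_indices": []}


keys = ["a", "b", "a", "a", "b"]

# setdefault: the argument expression is evaluated before the call,
# so a new bucket is built on every iteration, even for known keys.
eager = {}
for index, key in enumerate(keys):
    bucket = eager.setdefault(key, make_bucket("setdefault"))
    bucket["count"] += 1
    bucket["entry_indices"].append(index)

# get + initialize-on-miss: the bucket is built only for first-seen keys.
lazy = {}
for index, key in enumerate(keys):
    bucket = lazy.get(key)
    if bucket is None:
        bucket = make_bucket("lazy")
        lazy[key] = bucket
    bucket["count"] += 1
    bucket["entry_indices"].append(index)

print(calls)          # {'setdefault': 5, 'lazy': 2}
print(eager == lazy)  # True
```

Both loops produce the same mapping; only the number of default constructions differs. Whether that difference matters in a given loop depends on how costly the default is, which is the qualification the suggested wording adds.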

Copilot uses AI. Check for mistakes.
11 changes: 6 additions & 5 deletions src/geometry/hull_kv.py
@@ -17,7 +17,6 @@
HardmaxResult,
NumberLike,
ValueLike,
_as_fraction,
_coerce_key,
_coerce_value,
_normalize_number,
@@ -205,10 +204,12 @@ def _rebuild_if_needed(self) -> None:
total_value_sum = [Fraction(0) for _ in range(self._value_width or 0)]

for index, (key, value) in enumerate(self._entries):
bucket = aggregates.setdefault(
key,
{"value_sum": [Fraction(0) for _ in value], "count": 0, "entry_indices": []},
)
# Avoid eager instantiation of expensive default dict/list for every iteration
# by using an explicit membership check instead of setdefault.
if key not in aggregates:
aggregates[key] = {"value_sum": [Fraction(0) for _ in value], "count": 0, "entry_indices": []}
bucket = aggregates[key]
Comment on lines +208 to +211

Copilot AI Apr 17, 2026


This pattern does two dict lookups for existing keys (`if key not in aggregates` then `aggregates[key]`). In this hot loop you can keep the lazy default creation while avoiding the second lookup by using `bucket = aggregates.get(key)` and initializing only when it's None/missing.

Suggested change
# by using an explicit membership check instead of setdefault.
if key not in aggregates:
aggregates[key] = {"value_sum": [Fraction(0) for _ in value], "count": 0, "entry_indices": []}
bucket = aggregates[key]
# by initializing the bucket lazily only when the key is first seen.
bucket = aggregates.get(key)
if bucket is None:
bucket = {"value_sum": [Fraction(0) for _ in value], "count": 0, "entry_indices": []}
aggregates[key] = bucket
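
As a self-contained sketch of the suggested pattern, the snippet below aggregates value vectors per key outside the class. The function name `aggregate_by_key` and the sample entries are hypothetical, and the `count` and `entry_indices` updates are an assumption about the rest of the loop, which this hunk does not show.

```python
from fractions import Fraction


def aggregate_by_key(entries):
    """Sum value vectors per key, creating each bucket only on first sight."""
    aggregates = {}
    for index, (key, value) in enumerate(entries):
        bucket = aggregates.get(key)
        if bucket is None:
            # Built once per distinct key, not once per entry.
            bucket = {
                "value_sum": [Fraction(0) for _ in value],
                "count": 0,
                "entry_indices": [],
            }
            aggregates[key] = bucket
        for coord_index, coord in enumerate(value):
            bucket["value_sum"][coord_index] += coord
        # Assumed bookkeeping; the real updates are outside the shown hunk.
        bucket["count"] += 1
        bucket["entry_indices"].append(index)
    return aggregates


# Hypothetical sample data.
entries = [
    ("k1", (Fraction(1, 2), Fraction(1))),
    ("k2", (Fraction(3), Fraction(0))),
    ("k1", (Fraction(1, 2), Fraction(2))),
]
result = aggregate_by_key(entries)
print(result["k1"]["value_sum"])  # [Fraction(1, 1), Fraction(3, 1)]
```

The `is None` test is safe here because every stored bucket is a dict, so a missing key is the only way `get` can return `None`.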


for coord_index, coord in enumerate(value):
bucket["value_sum"][coord_index] += coord
total_value_sum[coord_index] += coord
10 changes: 5 additions & 5 deletions src/model/free_running_executor.py
@@ -697,14 +697,14 @@ def evaluate_free_running_programs(

outcomes.append(outcome)
bucket = bucket_name(outcome.program_steps)
bucket_state = per_bucket.setdefault(
bucket,
{
# Avoid eager instantiation of default dict on every iteration
if bucket not in per_bucket:
per_bucket[bucket] = {
"program_count": 0,
Comment on lines 699 to 703

Copilot AI Apr 17, 2026


This uses if bucket not in per_bucket followed by bucket_state = per_bucket[bucket], which does two dict lookups per program when the bucket already exists. Using .get() to fetch the existing bucket_state and only initializing on miss avoids the extra lookup.
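
The "two lookups" claim can be checked with a small instrumented dict. This is only a sketch: `CountingDict`, `new_state`, and the bucket names are hypothetical, and it counts Python-level read operations rather than measuring time, so profiling the real loop is still needed to judge the impact.

```python
class CountingDict(dict):
    """Hypothetical helper: a dict that counts explicit read lookups."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lookups = 0

    def __contains__(self, key):
        self.lookups += 1
        return super().__contains__(key)

    def __getitem__(self, key):
        self.lookups += 1
        return super().__getitem__(key)

    def get(self, key, default=None):
        self.lookups += 1
        return super().get(key, default)


def new_state():
    """Fresh per-bucket counters, mirroring the keys used in this hunk."""
    return {"program_count": 0, "exact_trace_count": 0, "exact_final_state_count": 0}


buckets = ["short", "short", "long", "short"]

# Pattern from the PR: membership check, then index.
membership = CountingDict()
for bucket in buckets:
    if bucket not in membership:
        membership[bucket] = new_state()
    bucket_state = membership[bucket]
    bucket_state["program_count"] += 1

# Pattern from the review: one get, initialize on miss.
with_get = CountingDict()
for bucket in buckets:
    bucket_state = with_get.get(bucket)
    if bucket_state is None:
        bucket_state = new_state()
        with_get[bucket] = bucket_state
    bucket_state["program_count"] += 1

print(membership.lookups, with_get.lookups)  # 8 4
```

Stores are not counted in either loop; both perform one store per new bucket, so the difference comes only from the extra read.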

"exact_trace_count": 0,
"exact_final_state_count": 0,
},
)
}
bucket_state = per_bucket[bucket]
bucket_state["program_count"] += 1
bucket_state["exact_trace_count"] += int(outcome.exact_trace_match)
bucket_state["exact_final_state_count"] += int(outcome.exact_final_state_match)
5 changes: 4 additions & 1 deletion src/model/induced_causal.py
@@ -455,7 +455,10 @@ def fit_transition_library(
examples = build_transition_examples(programs, interpreter=interpreter)
by_opcode: dict[Opcode, list[TransitionExample]] = {}
for example in examples:
by_opcode.setdefault(example.opcode, []).append(example)
# Avoid eager instantiation of default list on every iteration
if example.opcode not in by_opcode:
by_opcode[example.opcode] = []
by_opcode[example.opcode].append(example)
Comment on lines +458 to +461

Copilot AI Apr 17, 2026


This uses an explicit membership check and then immediately indexes by_opcode[example.opcode], which results in two dict lookups per example when the opcode is already present. Consider using examples_for_opcode = by_opcode.get(example.opcode) and initializing only when missing to reduce lookup overhead.

Suggested change
# Avoid eager instantiation of default list on every iteration
if example.opcode not in by_opcode:
by_opcode[example.opcode] = []
by_opcode[example.opcode].append(example)
examples_for_opcode = by_opcode.get(example.opcode)
if examples_for_opcode is None:
examples_for_opcode = []
by_opcode[example.opcode] = examples_for_opcode
examples_for_opcode.append(example)
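
The suggestion relies on `None` never being a stored value, which holds here because every value is a list. As a general note that goes beyond the review, a sentinel is the usual workaround when `None` can be a legitimate value. The sketch below shows both cases; `group_by`, `get_or_compute`, `_MISSING`, and the sample data are hypothetical names used only for illustration.

```python
_MISSING = object()  # hypothetical sentinel, distinct from any stored value


def group_by(items, key_of):
    """Group items into lists; `is None` is safe because values are lists."""
    groups = {}
    for item in items:
        key = key_of(item)
        group = groups.get(key)
        if group is None:
            group = []
            groups[key] = group
        group.append(item)
    return groups


def get_or_compute(cache, key, compute):
    """Lazy initialization that still works when the computed value is None."""
    value = cache.get(key, _MISSING)
    if value is _MISSING:
        value = compute(key)
        cache[key] = value
    return value


groups = group_by(["add", "sub", "and"], key_of=lambda name: name[0])
print(groups)  # {'a': ['add', 'and'], 's': ['sub']}

compute_calls = []


def compute(key):
    """Record the call and return None, a value `is None` would mistake for a miss."""
    compute_calls.append(key)
    return None


cache = {}
get_or_compute(cache, "x", compute)
get_or_compute(cache, "x", compute)
print(len(compute_calls))  # 1
```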


rules = tuple(_fit_rule_for_opcode(by_opcode[opcode], opcode) for opcode in sorted(by_opcode, key=str))
return InducedTransitionLibrary(rules=rules)
15 changes: 9 additions & 6 deletions src/model/softmax_baseline.py
@@ -582,15 +582,15 @@ def evaluate_teacher_forced_model(
total_correct += correct

bucket = baseline_bucket_name(example.program_steps)
bucket_state = per_bucket.setdefault(
bucket,
{
# Avoid eager instantiation of default dict on every iteration
if bucket not in per_bucket:
per_bucket[bucket] = {
"example_count": 0,
Comment on lines 584 to 588

Copilot AI Apr 17, 2026


This pattern does a membership check and then later indexes per_bucket[bucket], resulting in two dict lookups per iteration for existing buckets. Using bucket_state = per_bucket.get(bucket) and initializing only when missing avoids the extra lookup while keeping defaults lazy.

"token_count": 0,
"correct_tokens": 0,
"weighted_loss": 0.0,
},
)
}
bucket_state = per_bucket[bucket]
bucket_state["example_count"] = int(bucket_state["example_count"]) + 1
bucket_state["token_count"] = int(bucket_state["token_count"]) + token_count
bucket_state["correct_tokens"] = int(bucket_state["correct_tokens"]) + correct
@@ -753,7 +753,10 @@ def evaluate_free_running_rollout(
)

bucket = baseline_bucket_name(example.program_steps)
bucket_state = per_bucket.setdefault(bucket, {"example_count": 0, "exact_count": 0})
# Avoid eager instantiation of default dict on every iteration
if bucket not in per_bucket:
per_bucket[bucket] = {"example_count": 0, "exact_count": 0}
bucket_state = per_bucket[bucket]
Comment on lines 755 to +759

Copilot AI Apr 17, 2026


This does if bucket not in per_bucket followed by bucket_state = per_bucket[bucket], which costs two dict lookups per iteration when the bucket already exists. Consider using .get() to retrieve the existing state and only create the dict on miss.

bucket_state["example_count"] = int(bucket_state["example_count"]) + 1
bucket_state["exact_count"] = int(bucket_state["exact_count"]) + int(exact)

20 changes: 16 additions & 4 deletions src/model/trainable_latest_write.py
@@ -171,7 +171,10 @@ def exact_program_accuracy(scorer: TrainableLatestWriteScorer, samples: Sequence
return 0.0
per_program: dict[str, list[bool]] = {}
for sample in samples:
per_program.setdefault(sample.program_name, []).append(scorer.predict_index(sample) == sample.target_index)
# Avoid eager instantiation of default list on every iteration
if sample.program_name not in per_program:
per_program[sample.program_name] = []
per_program[sample.program_name].append(scorer.predict_index(sample) == sample.target_index)
Comment on lines +174 to +177

Copilot AI Apr 17, 2026


This uses `if program_name not in per_program` followed by `per_program[program_name]`, which performs two dictionary lookups per iteration when the key already exists. Consider using `outcomes = per_program.get(program_name)` and only allocating/assigning the list when it's missing, to keep the laziness while avoiding the extra lookup in the hot loop.
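
Since this comment carries no suggested change, here is a sketch of what the `.get()` version might look like. It is simplified to take `(program_name, correct)` pairs rather than a scorer and samples, and the function name `exact_program_accuracy_sketch` and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def exact_program_accuracy_sketch(results: Iterable[Tuple[str, bool]]) -> float:
    """Fraction of programs whose every sample was predicted correctly."""
    per_program: Dict[str, List[bool]] = {}
    for program_name, correct in results:
        outcomes = per_program.get(program_name)
        if outcomes is None:
            # Allocate the list only the first time a program is seen.
            outcomes = []
            per_program[program_name] = outcomes
        outcomes.append(correct)
    if not per_program:
        return 0.0
    exact = sum(1 for outcomes in per_program.values() if all(outcomes))
    return exact / len(per_program)


# Hypothetical sample data: p1 and p3 are fully correct, p2 is not.
results = [("p1", True), ("p1", True), ("p2", True), ("p2", False), ("p3", True)]
accuracy = exact_program_accuracy_sketch(results)
print(accuracy)
```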

exact = sum(1 for outcomes in per_program.values() if all(outcomes))
return exact / len(per_program)

@@ -198,12 +201,21 @@ def evaluate_scorer(
correct_samples += int(correct)

bucket = bucket_name(sample.program_steps)
bucket_state = per_bucket.setdefault(bucket, {"sample_count": 0, "sample_correct": 0, "programs": {}})
# Avoid eager instantiation of default dict on every iteration
if bucket not in per_bucket:
per_bucket[bucket] = {"sample_count": 0, "sample_correct": 0, "programs": {}}
bucket_state = per_bucket[bucket]
Comment on lines +205 to +207

Copilot AI Apr 17, 2026


This initialization pattern performs a membership check and then a second lookup (`per_bucket[bucket]`) each iteration. You can avoid the extra lookup by using `bucket_state = per_bucket.get(bucket)` and initializing when it's missing.

Suggested change
if bucket not in per_bucket:
per_bucket[bucket] = {"sample_count": 0, "sample_correct": 0, "programs": {}}
bucket_state = per_bucket[bucket]
bucket_state = per_bucket.get(bucket)
if bucket_state is None:
bucket_state = {"sample_count": 0, "sample_correct": 0, "programs": {}}
per_bucket[bucket] = bucket_state

bucket_state["sample_count"] = int(bucket_state["sample_count"]) + 1
bucket_state["sample_correct"] = int(bucket_state["sample_correct"]) + int(correct)
bucket_state["programs"].setdefault(sample.program_name, []).append(correct)

per_program.setdefault(sample.program_name, []).append(correct)
programs_dict = bucket_state["programs"]
if sample.program_name not in programs_dict:
programs_dict[sample.program_name] = []
programs_dict[sample.program_name].append(correct)
Comment on lines +212 to +214

Copilot AI Apr 17, 2026


programs_dict = bucket_state["programs"] is followed by a membership check and then another index lookup. If this loop is performance-sensitive, consider using lst = programs_dict.get(sample.program_name) and initializing only when missing to reduce dict lookups.

Suggested change
if sample.program_name not in programs_dict:
programs_dict[sample.program_name] = []
programs_dict[sample.program_name].append(correct)
program_outcomes = programs_dict.get(sample.program_name)
if program_outcomes is None:
program_outcomes = []
programs_dict[sample.program_name] = program_outcomes
program_outcomes.append(correct)


if sample.program_name not in per_program:
per_program[sample.program_name] = []
per_program[sample.program_name].append(correct)
Comment on lines +216 to +218

Copilot AI Apr 17, 2026


This uses if sample.program_name not in per_program followed by per_program[sample.program_name], which does two lookups for existing keys. Using .get() to fetch the list and only allocating on miss avoids the extra lookup.

program_steps[sample.program_name] = sample.program_steps

exact_programs = sum(1 for outcomes in per_program.values() if all(outcomes))