fix: recognize batch-existing.json in incremental merge (graph no longer collapses to changed files) #527

Open: rodcon wants to merge 1 commit into Egonex-AI:main from rodcon:fix/incremental-batch-existing-dropped

rodcon commented Jun 30, 2026

Problem

The incremental /understand path collapses the knowledge graph down to only the changed files.

skills/understand/SKILL.md (Phase 2, "Incremental update path") writes the pruned prior graph to batch-existing.json and states:

  1. Run the same merge script - it will combine batch-existing.json with the fresh batch-*.json files

But merge-batch-graphs.py discovers batch files and filters them with:

m = re.match(r"batch-(\d+)(?:-part-(\d+))?\.json", f.name)

batch-existing.json has no numeric index, so it does not match, is appended to unrecognized_batch_files, and is skipped at load:

unrecognized_set = set(unrecognized_batch_files)
...
if f.name in unrecognized_set:
    continue

Result: on every incremental update, all retained nodes/edges from the prior graph are dropped and the assembled graph shrinks to just the files that changed in that commit. The script prints a generic "unrecognized filenames will be DROPPED" warning, but the file it is dropping is the one the skill's own incremental path is documented to produce.
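
To make the mismatch concrete, here is a minimal sketch that applies the regex quoted above to a few filenames. The filename list and the variable names are illustrative only and are not taken from merge-batch-graphs.py.

```python
import re

# The pattern quoted above from merge-batch-graphs.py.
BATCH_RE = re.compile(r"batch-(\d+)(?:-part-(\d+))?\.json")

# Illustrative filenames (assumed for this sketch).
names = ["batch-0.json", "batch-3-part-2.json", "batch-existing.json"]

recognized = [n for n in names if BATCH_RE.match(n)]
unrecognized = [n for n in names if not BATCH_RE.match(n)]

print("recognized:", recognized)
print("unrecognized:", unrecognized)
```

Only the two numbered names match; batch-existing.json lands in the unrecognized list because the pattern requires digits right after `batch-`.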

Fix

Recognize batch-existing.json explicitly and sort it first (key = -1) so the fresh per-file batches still override it during dedup (the merge keeps the last occurrence per node id):

  • _batch_sort_key returns -1 for batch-existing.json so it loads before the numbered batches.
  • The grouping loop handles batch-existing.json as a valid batch instead of marking it unrecognized.

Minimal change, one file, no behavior change for full analysis or for the existing batch-<N>.json / batch-<N>-part-<K>.json handling.
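
The ordering idea can be sketched roughly as follows. This is not the patched `_batch_sort_key` itself: the function name `batch_sort_key`, the tuple-shaped key, and the `None` return for unrecognized names are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

BATCH_RE = re.compile(r"batch-(\d+)(?:-part-(\d+))?\.json")
EXISTING_NAME = "batch-existing.json"


def batch_sort_key(name: str) -> Optional[Tuple[int, int]]:
    """Hypothetical sort key: (batch index, part index), or None if unrecognized."""
    if name == EXISTING_NAME:
        # Sort the retained prior graph before every numbered batch.
        return (-1, 0)
    m = BATCH_RE.match(name)
    if m is None:
        return None
    # A batch without a "-part-K" suffix is treated as part 0 (assumption).
    return (int(m.group(1)), int(m.group(2) or 0))


names = [
    "batch-2.json",
    "batch-existing.json",
    "batch-0-part-1.json",
    "batch-0.json",
    "notes.json",  # not a batch file; stays unrecognized
]
ordered = sorted(
    (n for n in names if batch_sort_key(n) is not None),
    key=batch_sort_key,
)
print(ordered)
```

With this key, batch-existing.json is always first in load order, so any numbered batch that carries the same node id is processed later and wins under keep-last dedup.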

Verification

Standalone repro against the patched script:

  • batch-existing.json = files a, b, c + an imports edge (the retained prior graph)
  • batch-0.json = changed file d

|        | assembled nodes | edges |
| ------ | --------------- | ----- |
| before | file:d.py only (1) | 0 |
| after  | file:a.py, file:b.py, file:c.py, file:d.py (4) | 1 |

Dedup still works (fresh batch overrides existing for a changed file, because existing loads first).
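
The repro and the dedup claim can be approximated with the sketch below. It is a simplified stand-in, not the real merge script: the `merge_batches` helper, the node and edge field names (`id`, `source`, `target`, `type`, `summary`), and the in-memory batches are all assumptions.

```python
def merge_batches(batches):
    """Merge batches in load order, keeping the last occurrence per node id."""
    nodes_by_id = {}
    edges_by_key = {}
    for batch in batches:
        for node in batch.get("nodes", []):
            nodes_by_id[node["id"]] = node  # later batches overwrite earlier ones
        for edge in batch.get("edges", []):
            key = (edge["source"], edge["target"], edge["type"])
            edges_by_key[key] = edge
    return {
        "nodes": list(nodes_by_id.values()),
        "edges": list(edges_by_key.values()),
    }


# Stand-in for batch-existing.json: files a, b, c plus one imports edge.
existing = {
    "nodes": [
        {"id": "file:a.py", "summary": "prior"},
        {"id": "file:b.py", "summary": "prior"},
        {"id": "file:c.py", "summary": "prior"},
    ],
    "edges": [{"source": "file:a.py", "target": "file:b.py", "type": "imports"}],
}
# Stand-in for batch-0.json: the changed file d.
fresh = {"nodes": [{"id": "file:d.py", "summary": "fresh"}], "edges": []}

before = merge_batches([fresh])           # existing batch skipped
after = merge_batches([existing, fresh])  # existing batch loaded first

# Dedup check: a re-analyzed file a overrides the retained copy.
changed_a = {"nodes": [{"id": "file:a.py", "summary": "fresh"}], "edges": []}
deduped = merge_batches([existing, changed_a])

print([n["id"] for n in before["nodes"]], len(before["edges"]))
print([n["id"] for n in after["nodes"]], len(after["edges"]))
print(deduped["nodes"][0])
```

Because the existing batch is merged first, the fresh node for a changed file replaces the retained one rather than the other way around.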

Notes

Happy to add a test in whatever harness you prefer for the Python scripts (I didn't see an existing pytest setup alongside the vitest suites - let me know and I'll wire one up).

The incremental /understand path (SKILL.md Phase 2) writes the pruned prior graph
to batch-existing.json and states the merge will combine it with the fresh
batch-*.json files. But merge-batch-graphs.py filters batch files with the regex
batch-(\d+)(?:-part-(\d+))?\.json, which batch-existing.json does not match - so it
is classed 'unrecognized' and skipped at load. Every retained node/edge is dropped
and the incremental update collapses the graph down to only the changed files.

Fix: recognize batch-existing.json explicitly and sort it first (key -1) so the
fresh per-file batches still override it during dedup (keep-last-occurrence).

Verified with a standalone repro (batch-existing.json with files a,b,c + an edge,
plus batch-0.json with changed file d): before, the assembled graph has only
file:d.py; after, all of a,b,c,d and the edge are preserved.