feat: /understand-crossrepo — multi-repo interlinked knowledge graph #522

Open
Hafiz408 wants to merge 17 commits into Egonex-AI:main from Hafiz408:feat/understand-crossrepo

@Hafiz408

/understand-crossrepo — multi-repo interlinked knowledge graph

Understand Anything analyzes one repo at a time. A microservice platform is many repos that communicate at runtime (HTTP calls, an SSO/auth provider, iframe embedding, a shared data hub) — links invisible to static import analysis. This adds a new command that combines several interlinked repos into one namespaced, per-repo-layered, cross-linked knowledge graph, explorable in the existing dashboard with no dashboard changes.

Pipeline (new skills/understand-crossrepo/SKILL.md, Phases 0–7)

  1. Select repos (args or interactive) + output dir.
  2. Reuse-or-fill: reuse each repo's fresh /understand graph (git commit unchanged) or build it.
  3. Extract signals: extract-crossrepo-signals.mjs, a deterministic, stdlib-only scanner for outbound API hosts, Keycloak clients, iframe/ct_token embeds, Pub/Sub topics, GCS buckets, plus each repo's served identity.
  4. Combine: combine-graphs.py namespaces every node id (<type>:<repo>/<path>[:member]), unions + dedups, and builds one layer:<repo> + a module:<repo> anchor per repo.
  5. Link: agents/crossrepo-linker.md (LLM) matches one repo's outbound signals to another's identity and emits typed cross-repo edges (calls/authenticates_via/embeds/publishes/…) with confidence + evidence; self-grounds (no fabrication).
  6. Apply: apply-interlinks.py synthesizes shared-infra nodes (service:external/<svc> + layer:external-shared-infra), applies edges (dedup, drop-dangling, low-confidence tag), assembles, and validates.
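
The namespacing in the Combine step can be sketched roughly as follows. This is a minimal illustration, not the code in combine-graphs.py: it assumes per-repo ids already have the shape `<type>:<path>[:member]`, that a graph is a dict with `nodes` and `edges` lists, and that edges carry `source`, `target`, and `type` fields. The function names `namespace_id` and `combine` are hypothetical, and layer and module-anchor construction is left out.

```python
def namespace_id(node_id: str, repo: str) -> str:
    """Rewrite '<type>:<path>[:member]' as '<type>:<repo>/<path>[:member]'."""
    node_type, sep, rest = node_id.partition(":")
    if not sep or not rest:
        raise ValueError(f"node id has no '<type>:' prefix: {node_id!r}")
    return f"{node_type}:{repo}/{rest}"


def combine(graphs: dict[str, dict]) -> dict:
    """Union per-repo graphs (repo name -> graph) after namespacing every id."""
    nodes: dict[str, dict] = {}
    edges: dict[tuple, dict] = {}
    for repo, graph in graphs.items():
        for node in graph.get("nodes", []):
            new_id = namespace_id(node["id"], repo)
            # First occurrence wins; later duplicates are dropped.
            nodes.setdefault(new_id, {**node, "id": new_id})
        for edge in graph.get("edges", []):
            src = namespace_id(edge["source"], repo)
            dst = namespace_id(edge["target"], repo)
            key = (src, dst, edge.get("type"))
            edges.setdefault(key, {**edge, "source": src, "target": dst})
    return {"nodes": list(nodes.values()), "edges": list(edges.values())}
```

Because the repo name becomes part of every id, two repos that both contain `file:src/index.ts` end up as distinct nodes, which is the property behind a "0 id collisions" result.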

Verified

  • 31 unit tests (extractor 16 node:test, combine 7 + apply 8 pytest).
  • Live e2e on savo_pricing_ui + savo_pricing_service + savo_bridge_service: 508-node / 761-edge / 4-layer combined graph, 0 id collisions, 10 cross-repo edges (incl. pricing_ui → bridge calls and all repos → Keycloak), rendered in the dashboard with 0 graph/validation errors.

The graph is validated field-by-field against packages/core/src/schema.ts; the dashboard is untouched.

Hafiz408 and others added 16 commits June 26, 2026 19:08
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ark router ceiling

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… ceiling

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…point/service layer membership

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e + validate

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… source, bind phase2 vars

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Lock files (package-lock.json, pnpm-lock.yaml, …) are huge and full of
registry/vcs URLs (github.com) that are never real integration signals.
Found during e2e: pricing_ui emitted 260 outbound signals, ~all github.com
from package-lock.json. Skip by exact basename (their .json/.yaml extensions
slip past the existing .lock BINARY_EXTS guard). pricing_ui 260→18 signals.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
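
The lock-file guard described in this commit can be illustrated with a short sketch. The real scanner is the Node.js script extract-crossrepo-signals.mjs; the Python below only mirrors the idea. `should_scan` is a hypothetical name, the basename set holds only the two files the commit message spells out, and every entry in `BINARY_EXTS` other than `.lock` is an assumption.

```python
from pathlib import PurePosixPath

# Exact basenames to skip; only the two names given in the commit message.
LOCKFILE_BASENAMES = frozenset({"package-lock.json", "pnpm-lock.yaml"})

# Extension guard; only ".lock" is confirmed by the source, the rest are placeholders.
BINARY_EXTS = frozenset({".lock", ".png", ".jpg"})


def should_scan(path: str) -> bool:
    """Return False for lock files (by basename) and for skipped extensions."""
    p = PurePosixPath(path)
    if p.name in LOCKFILE_BASENAMES:
        # .json/.yaml lock files would pass the extension check below,
        # so they have to be matched by their full basename.
        return False
    return p.suffix.lower() not in BINARY_EXTS
```

Matching on the basename rather than the extension keeps ordinary `.json` and `.yaml` files, which may hold real integration signals, in scope.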
…yers

combine-graphs.py picked layer members by id-prefix (file/endpoint/service),
but the canonical graph validator treats config/document/pipeline/table/schema/
resource as file-level too — every such node must be in exactly one layer.
Found during e2e: 59 'File node not in any layer' validation issues (config,
document, pipeline, resource). Fix: select layer members by node.type against
the validator's full FILE_LEVEL_TYPES set (a pipeline node can carry a step:
id, so type — not id-prefix — is authoritative). Regression test covers a
config node and a pipeline node with a step: id. Final graph now validates 0 issues.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
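
The difference between selecting layer members by id prefix and by node type can be shown with a small sketch. It is not the code from combine-graphs.py: the contents of `FILE_LEVEL_TYPES` are assembled from the types named in the commit message (the validator's actual set is not shown in this PR), and both function names are hypothetical.

```python
# Assumed reconstruction of the validator's set, from the types the commit names.
FILE_LEVEL_TYPES = frozenset({
    "file", "endpoint", "service",
    "config", "document", "pipeline", "table", "schema", "resource",
})

OLD_ID_PREFIXES = frozenset({"file", "endpoint", "service"})


def layer_members_by_prefix(nodes: list[dict]) -> list[str]:
    """Old behaviour: trust the '<prefix>:' part of the id."""
    return [n["id"] for n in nodes if n["id"].split(":", 1)[0] in OLD_ID_PREFIXES]


def layer_members_by_type(nodes: list[dict]) -> list[str]:
    """Fixed behaviour: trust node['type'], whatever the id prefix says."""
    return [n["id"] for n in nodes if n.get("type") in FILE_LEVEL_TYPES]


nodes = [
    {"id": "file:svc/src/app.py", "type": "file"},
    {"id": "config:svc/app.yaml", "type": "config"},
    {"id": "step:svc/ci.yml:build", "type": "pipeline"},  # pipeline node with a step: id
    {"id": "function:svc/src/app.py:main", "type": "function"},
]
```

With these sample nodes, the prefix-based selection picks up only the `file:` node, while the type-based selection also includes the config node and the pipeline node whose id starts with `step:`.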
The assembler validated only against the inline .cjs, which doesn't check the
dashboard's stricter core/schema.ts. The e2e dashboard load dropped 270 edges/
nodes. Three real gaps, all fixed in apply-interlinks.py:
- project metadata: add analyzedAt + gitCommitHash (ProjectMetaSchema requires
  them; without them the graph fails fatally to load).
- node normalization: ensure every node has a valid complexity, and drop a
  string lineRange (some per-repo graphs store lineRange as a string, which
  fails node validation → the node is dropped → its edges cascade-drop).
- edge types: map linker types not in EdgeTypeSchema (authenticates_via, embeds)
  to depends_on, preserving the original semantic in description (which survives
  schema stripping; label does not). All other linker types already pass.
Dashboard now loads with 0 validation issues; all 4 layers + 10 cross-repo edges
render. Regression test test_dashboard_schema_requirements locks all three.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
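
A hedged sketch of the node and edge-type normalization described in this commit follows. It is an illustration under stated assumptions, not the code in apply-interlinks.py: `SCHEMA_EDGE_TYPES` is a placeholder subset of whatever EdgeTypeSchema really allows, the complexity values and the `moderate` default are assumptions, the `[original_type]` description prefix is one possible way to preserve the semantic, and both function names are hypothetical.

```python
# Placeholder subset; the real list lives in EdgeTypeSchema in core/schema.ts.
SCHEMA_EDGE_TYPES = frozenset({"calls", "publishes", "depends_on"})

# Assumed complexity vocabulary and fallback value.
VALID_COMPLEXITY = ("simple", "moderate", "complex")
DEFAULT_COMPLEXITY = "moderate"


def normalize_edge_type(edge: dict) -> dict:
    """Map a linker edge type the schema does not know to depends_on."""
    if edge.get("type") in SCHEMA_EDGE_TYPES:
        return dict(edge)
    fixed = dict(edge)
    original = fixed.get("type")
    fixed["type"] = "depends_on"
    # Keep the original semantic in description, which survives schema stripping.
    fixed["description"] = f"[{original}] {fixed.get('description', '')}".strip()
    return fixed


def normalize_node(node: dict) -> dict:
    """Give the node a valid complexity and drop a string lineRange."""
    fixed = dict(node)
    if fixed.get("complexity") not in VALID_COMPLEXITY:
        fixed["complexity"] = DEFAULT_COMPLEXITY
    if isinstance(fixed.get("lineRange"), str):
        # A string lineRange fails node validation, and the dropped node
        # would take its edges with it.
        del fixed["lineRange"]
    return fixed
```

Normalizing before assembly, rather than relying on the dashboard to repair the graph on load, is what keeps a single bad field from cascading into dropped edges.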
…r order, drop dead regexes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… label

Ponytail trims (no behavior change, both verified by tests + e2e):
- tour steps no longer compute inline `order` (the sequential renumber after
  assembly is the single source of truth); removes misleading len()+2 arithmetic.
- cross-repo edges set only `description` (kept by the dashboard schema), not a
  duplicate `label` (stripped on load, read by nothing).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Hafiz408 added a commit to Hafiz408/Understand-Anything that referenced this pull request Jun 26, 2026
…rection

QA across gemba/market_pulse/product_lens repo sets surfaced that some base
/understand graphs emit edges with weight>1 (e.g. 2) or invalid direction. The
dashboard's core/schema clamps them on load and shows an 'N auto-corrections'
banner (240/123 on gemba_ui/product_lens). The assembler already normalizes
nodes (lineRange/complexity); do the same for edges so the combined graph
renders clean on ANY source repos. Sets now validate with 0 issues (was 240/10/123).
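
The edge normalization this commit describes could look roughly like the sketch below. It assumes the schema expects `weight` in the range 0 to 1 and a `direction` drawn from a small fixed vocabulary; the listed direction values, the fallback weight of 0.5, and the `forward` default are all assumptions, and `normalize_edge` is a hypothetical name rather than the function in the assembler.

```python
# Assumed vocabulary; the authoritative list is in the dashboard's core/schema.
VALID_DIRECTIONS = ("forward", "backward", "bidirectional")
DEFAULT_WEIGHT = 0.5          # assumption, used when weight is missing or non-numeric
DEFAULT_DIRECTION = "forward"  # assumption


def normalize_edge(edge: dict) -> dict:
    """Clamp weight into [0, 1] and replace an invalid direction."""
    fixed = dict(edge)
    weight = fixed.get("weight")
    if isinstance(weight, bool) or not isinstance(weight, (int, float)):
        fixed["weight"] = DEFAULT_WEIGHT
    else:
        fixed["weight"] = min(1.0, max(0.0, float(weight)))
    if fixed.get("direction") not in VALID_DIRECTIONS:
        fixed["direction"] = DEFAULT_DIRECTION
    return fixed
```

Applying this to every edge before validation means the dashboard has nothing left to auto-correct, so no banner appears regardless of how the source graphs were produced.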