
feat(sources): source-authority scorer — P1 foundation (#111) #112

Merged: askalf merged 1 commit into master from feat/source-authority-scorer on Jun 15, 2026
Conversation

askalf (Owner) commented Jun 15, 2026

First commit of #111 (source-authority ranking). A pure, deterministic per-URL trust scorer (no LLM, no network) that answers a question orthogonal to the lexical verifier's: not "does the answer match its source?" but "is that source trustworthy?".

scoreAuthority(url) → { tier: 'primary'|'reputable'|'unknown'|'low', score, reason }

  • primary is boost-led (near-zero false positives): *.gov/*.edu/*.mil TLDs plus .ac.* academic domains; academia/standards (arxiv, aclanthology, ietf, w3.org, nature…); official docs (docs.*/developer.* prefixes + curated vendor set). reputable = wikipedia/SO/github.
  • unknown is neutral, not punished — a niche-but-legit source shouldn't be downranked for being unrecognized.
  • low = small, conservative seed denylist of content farms seen in the 2026-06-15 dogfood (precision over recall).

Pure + unit-tested (9 cases: TLDs, subdomain matching, denylist-beats-boost, gov/edu look-alike false-positive guards, unparseable input, monotonic scores). Validated on real dogfood domains: aiflashreport.com/gpt0x.com → low, aws.amazon.com/federalreserve.gov/redis.io → primary, slingacademy.com → unknown.
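
For readers who want a feel for the shape of the scorer, here is a minimal sketch of the tiering described above. It is not the code from this PR: the domain lists are a tiny illustrative subset, the numeric scores are assumed (the PR only states that they are monotonic), the reason strings are made up, and treating unparseable input as unknown is an assumption.

```typescript
type Tier = "primary" | "reputable" | "unknown" | "low";

interface AuthorityScore {
  tier: Tier;
  score: number;
  reason: string;
}

// Illustrative seed lists only; the real module's lists are not shown in this PR text.
const LOW_DOMAINS = ["aiflashreport.com", "gpt0x.com"];
const PRIMARY_DOMAINS = ["arxiv.org", "ietf.org", "w3.org", "aws.amazon.com", "redis.io"];
const REPUTABLE_DOMAINS = ["wikipedia.org", "stackoverflow.com", "github.com"];
const PRIMARY_TLDS = ["gov", "edu", "mil"];
const DOC_PREFIXES = ["docs.", "developer."];

// Assumed values: only the ordering primary > reputable > unknown > low comes from the PR.
const SCORES: Record<Tier, number> = { primary: 1.0, reputable: 0.7, unknown: 0.5, low: 0.1 };

// True when host equals domain or is a subdomain of it,
// so "evilgithub.com" does not match "github.com".
function matchesDomain(host: string, domain: string): boolean {
  return host === domain || host.endsWith("." + domain);
}

function result(tier: Tier, reason: string): AuthorityScore {
  return { tier, score: SCORES[tier], reason };
}

function scoreAuthority(url: string): AuthorityScore {
  let host: string;
  try {
    host = new URL(url).hostname.toLowerCase().replace(/^www\./, "");
  } catch {
    // Assumption: unparseable input is treated as neutral rather than punished.
    return result("unknown", "unparseable URL");
  }
  const labels = host.split(".");
  const tld = labels[labels.length - 1];

  // The denylist is checked first so that it beats any boost.
  if (LOW_DOMAINS.some((d) => matchesDomain(host, d))) return result("low", "seed denylist");
  // Only the final label counts, so "gov.example.com" gets no boost.
  if (PRIMARY_TLDS.includes(tld)) return result("primary", `.${tld} TLD`);
  // Academic second-level domains under a two-letter country code, e.g. "ox.ac.uk".
  if (labels.length >= 3 && labels[labels.length - 2] === "ac" && tld.length === 2) {
    return result("primary", ".ac.* domain");
  }
  if (PRIMARY_DOMAINS.some((d) => matchesDomain(host, d))) return result("primary", "curated primary domain");
  if (DOC_PREFIXES.some((p) => host.startsWith(p))) return result("primary", "official docs prefix");
  if (REPUTABLE_DOMAINS.some((d) => matchesDomain(host, d))) return result("reputable", "reputable source");
  return result("unknown", "unrecognized domain");
}
```

With these illustrative lists, `scoreAuthority("https://gpt0x.com/post").tier` is `"low"`, `scoreAuthority("https://www.federalreserve.gov/").tier` is `"primary"`, and `scoreAuthority("https://www.slingacademy.com/").tier` is `"unknown"`, matching the dogfood results quoted above.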

Not yet wired into the keep loop or output — this is the foundational unit. Next: keep-stage ranking (rank candidates by authority before the maxSources cut) and the source-trust signal in --json/footer. No version bump (not user-facing until wired).

Pure, deterministic per-URL trust scorer — the foundation for ranking which
fetched sources to keep and for a source-trust signal distinct from citation
support. Boost-led: gov/edu/standards/academia/official-docs = primary,
wikipedia/SO/github = reputable; unknown is neutral (niche-but-legit not
punished); low is a conservative seed denylist of observed content farms.
Not yet wired into the keep loop — next commit. Validated on the 2026-06-15
dogfood domains: content farms -> low, AWS/Fed/redis docs -> primary.
@askalf askalf merged commit 953414c into master Jun 15, 2026
5 checks passed
@askalf askalf deleted the feat/source-authority-scorer branch June 15, 2026 21:01
askalf added a commit that referenced this pull request Jun 18, 2026
…(P1 of #111) (#113)

Wires the #112 scoreAuthority module — until now a pure, unused export —
into the fetch keep-stage so authoritative/primary sources win the limited
fetch slots ahead of whatever search happened to rank first.

- source-authority.ts: add rankByAuthority(items, urlOf, mode) +
  SourceAuthorityMode. "prefer" (default) stable-sorts by authority and
  drops nothing; "strict" also drops known content farms, with a min-keep
  floor so a niche/recency round that only surfaces farms isn't zeroed out;
  "off" is identity.
- agent.ts: rank this round's candidates before the slot-limited slice.
- config.ts / cli.ts: --source-authority / DEEPDIVE_SOURCE_AUTHORITY
  (prefer|strict|off), default prefer; flag validated, env falls back.
- 5 unit tests for rankByAuthority (prefer ordering + stability, strict
  drop + min-keep floor, off identity). Full build + config/agent/cli
  suites green (117 pass / 0 fail).
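
A rough sketch of how the three modes listed above might behave, for illustration only. The `rankByAuthority(items, urlOf, mode)` signature and the mode names come from the commit message; the stand-in `tierOf` scorer, the `RANK` table, and the `MIN_KEEP` value of 2 are assumptions made so the block runs on its own (the commit mentions a min-keep floor but not its value).

```typescript
type SourceAuthorityMode = "prefer" | "strict" | "off";
type Tier = "primary" | "reputable" | "unknown" | "low";

// Stand-in for the #112 scorer; the hosts here are illustrative.
function tierOf(url: string): Tier {
  let host = "";
  try {
    host = new URL(url).hostname;
  } catch {
    return "unknown";
  }
  if (host.endsWith(".gov") || host.endsWith(".edu")) return "primary";
  if (host === "github.com" || host.endsWith(".wikipedia.org")) return "reputable";
  if (host === "aiflashreport.com" || host === "gpt0x.com") return "low";
  return "unknown";
}

// Lower rank sorts first.
const RANK: Record<Tier, number> = { primary: 0, reputable: 1, unknown: 2, low: 3 };

// Hypothetical floor value, assumed for this sketch.
const MIN_KEEP = 2;

function rankByAuthority<T>(
  items: T[],
  urlOf: (item: T) => string,
  mode: SourceAuthorityMode = "prefer",
): T[] {
  // "off" is identity: same array, same order.
  if (mode === "off") return items;

  // Sort by tier, breaking ties on the original index so the sort is stable
  // with respect to the search engine's order.
  const ranked = items
    .map((item, index) => ({ item, index, tier: tierOf(urlOf(item)) }))
    .sort((a, b) => RANK[a.tier] - RANK[b.tier] || a.index - b.index);

  // "prefer" reorders only and drops nothing.
  if (mode === "prefer") return ranked.map((r) => r.item);

  // "strict": low-tier items sort last, so keeping a prefix drops them.
  // The floor tops the result back up with farms when too little else survives.
  const nonLow = ranked.filter((r) => r.tier !== "low").length;
  const keep = Math.max(nonLow, Math.min(MIN_KEEP, ranked.length));
  return ranked.slice(0, keep).map((r) => r.item);
}
```

In this sketch, a round that surfaces only content farms still returns `MIN_KEEP` items under "strict", which is the starvation guard the commit message describes.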

Default is "prefer" per the issue — a reorder-only boost that drops
nothing, so it cannot starve a query. P2 (trust signal in output/--json)
and P3 (bench metric) follow.
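
One possible reading of "flag validated, env falls back", sketched under assumptions: an invalid --source-authority value is rejected, while a missing or invalid DEEPDIVE_SOURCE_AUTHORITY quietly falls back to the default. The helper name `resolveSourceAuthority` and the error text are hypothetical and not taken from config.ts or cli.ts.

```typescript
type SourceAuthorityMode = "prefer" | "strict" | "off";

const MODES: readonly SourceAuthorityMode[] = ["prefer", "strict", "off"];
const DEFAULT_MODE: SourceAuthorityMode = "prefer";

function isMode(value: string): value is SourceAuthorityMode {
  return (MODES as readonly string[]).includes(value);
}

// Hypothetical helper: the flag wins over the env var, and the env var wins over the default.
function resolveSourceAuthority(
  flag: string | undefined,
  env: Record<string, string | undefined>,
): SourceAuthorityMode {
  if (flag !== undefined) {
    // An explicit flag is validated strictly: a typo should fail loudly.
    if (!isMode(flag)) {
      throw new Error(`--source-authority must be one of ${MODES.join("|")}, got "${flag}"`);
    }
    return flag;
  }
  // The env var is lenient: anything unrecognized falls back to the default.
  const fromEnv = env["DEEPDIVE_SOURCE_AUTHORITY"];
  return fromEnv !== undefined && isMode(fromEnv) ? fromEnv : DEFAULT_MODE;
}
```

Passing the environment in as a parameter (for example `resolveSourceAuthority(undefined, process.env)`) keeps the helper pure and easy to unit-test.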
askalf added a commit that referenced this pull request Jun 18, 2026
Ships the source-authority layer (#112/#113/#114): deterministic, no-LLM
domain scoring (scoreAuthority), keep-stage ranking that gives authoritative
sources the limited fetch slots (--source-authority prefer|strict|off), and a
source-trust signal in the footer + --json — orthogonal to citation support,
so a fully-cited answer built on content farms is flagged instead of silently
confident. Version bump + CHANGELOG + README doctor sample only.
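
To illustrate why a trust signal is orthogonal to citation support, here is a small sketch that summarizes the tiers of an answer's sources independently of whether each claim is cited. The `summarizeTrust` helper, the `TrustSummary` shape, and the "half or more low-tier" threshold are all hypothetical; the actual footer and --json fields are not shown in this thread.

```typescript
type Tier = "primary" | "reputable" | "unknown" | "low";

interface TrustSummary {
  counts: Record<Tier, number>;
  // True when low-tier sources make up at least half of the sources used.
  flagged: boolean;
}

// Hypothetical helper: takes the tier of each source used in the answer.
function summarizeTrust(tiers: Tier[]): TrustSummary {
  const counts: Record<Tier, number> = { primary: 0, reputable: 0, unknown: 0, low: 0 };
  for (const tier of tiers) counts[tier] += 1;
  // Assumed threshold; an empty source list is never flagged.
  const flagged = tiers.length > 0 && counts.low * 2 >= tiers.length;
  return { counts, flagged };
}
```

Under this assumed threshold, an answer whose every sentence is cited but whose sources are `["low", "low", "unknown"]` is flagged, while `["primary", "unknown", "low"]` is not.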