Mandate exhaustive fact extraction in text-to-fact criteria#150

Merged
justinjoy merged 1 commit into main from extraction-exhaustiveness
Jun 26, 2026

@justinjoy (Contributor)

What

Adds a Completeness Principle (완전성 원칙) section on exhaustive extraction to
skills/factlog/references/text-to-fact.md, the authoritative criteria the
agent follows when extracting candidate facts during /factlog sync and
/factlog add.

Why

Dense tables — participant rosters, financial/registry status, budget line
items, monthly schedules, career/patent records — are the highest-density fact
source in real documents, yet the prior criteria only said "record relation
candidates." In practice the agent skimmed the visible prose and silently
dropped repeated table rows. On a real proposal, a source with ~400 extractable
facts yielded only ~90 (≈20–25% coverage) before this change.

What it changes

  • Forbid sampling of repeated items (no "just a few representative ones",
    대표 몇 개만; extract all N).
  • Table → triple mapping rule: row key → subject, column header →
    relation, cell → object; per-year/unit context → note.
  • Judge coverage by section/table sweep, not converted-file byte size
    (HWP/office conversions are inflated by HTML table markup).
  • Pre-finish self-check ("any section/table/list not yet covered?").
  • Existing PII exclusions preserved; clarifies that business identifiers
    (business and corporate registration numbers, 사업자/법인등록번호) and
    financials are in scope, while personal contact info and birthdates stay
    out.
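
The table → triple mapping rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described: the PR touches no code, and `Triple`, `table_to_triples`, `key_column`, and the sample roster are all hypothetical names and data, not part of factlog.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for one candidate fact."""
    subject: str
    relation: str
    obj: str
    note: str = ""


def table_to_triples(header, rows, key_column=0, note=""):
    """Map every row of a table to triples, with no sampling of repeated rows.

    row key -> subject, column header -> relation, cell -> object;
    shared context such as year or unit goes into `note`.
    """
    triples = []
    for row in rows:
        subject = row[key_column]
        for col, cell in enumerate(row):
            if col == key_column or not cell.strip():
                continue  # skip the key column itself and empty cells
            triples.append(Triple(subject, header[col], cell.strip(), note))
    return triples


# Made-up roster table used only for illustration.
header = ["name", "role", "affiliation"]
rows = [
    ["Researcher A", "principal investigator", "Org X"],
    ["Researcher B", "engineer", "Org Y"],
    ["Researcher C", "engineer", ""],
]
triples = table_to_triples(header, rows, note="FY2026 roster")
```

Every non-empty cell outside the key column becomes one candidate fact, so a table with N rows contributes facts for all N rows rather than a representative few.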

Docs/criteria only — no code paths touched; the file is read at extraction time
so the change is live without reinstall.
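
The section/table sweep and pre-finish self-check described above amount to comparing, per unit of the document, how many items were expected against how many facts were actually recorded. A minimal sketch, assuming each extracted fact is tagged with the section or table it came from; `uncovered_units`, the unit labels, and the counts are hypothetical and do not come from the criteria file.

```python
from collections import Counter


def uncovered_units(expected_items, fact_sources):
    """Return units whose extracted fact count is below the expected item count.

    expected_items: maps a unit label (section, table, list) to the number of
                    items it contains, e.g. the row count of a table.
    fact_sources:   one unit label per extracted fact.
    """
    found = Counter(fact_sources)  # missing units count as 0
    return {
        unit: (found[unit], expected)
        for unit, expected in expected_items.items()
        if found[unit] < expected
    }


# Made-up document outline: coverage is judged per unit, not by file size.
expected_items = {"sec-1 overview": 1, "table-2 roster": 12, "table-3 budget": 8}
fact_sources = (
    ["sec-1 overview"] * 3 + ["table-2 roster"] * 12 + ["table-3 budget"] * 2
)
gaps = uncovered_units(expected_items, fact_sources)
```

Here `gaps` flags only the budget table, where 2 of 8 rows were captured. A check of this shape is independent of converted-file byte size, which is the point of the sweep rule.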

Add a "Completeness Principle" (완전성 원칙) section so extraction sweeps every
section and table row by row instead of skimming prose. Dense tables
(participant rosters, financial/registry status, budget line items, monthly
schedules, career/patent records) were the main source of silent omissions:
narrative got captured while repeated table rows were dropped.

- forbid sampling ("just a few representative ones", 대표 몇 개만) of repeated items
- table → triple mapping rule (row key→subject, header→relation, cell→object)
- judge coverage by section/table sweep, not converted-file byte size
- pre-finish self-check, with existing PII exclusions preserved

@justinjoy merged commit fe814a8 into main on Jun 26, 2026
3 checks passed
@justinjoy deleted the extraction-exhaustiveness branch on June 26, 2026 11:30