
[codex] preserve SRS across core dictionary updates#51

Merged
harumiWeb merged 5 commits into main from fix/update-db
Apr 11, 2026

Conversation

@harumiWeb
Owner

@harumiWeb harumiWeb commented Apr 11, 2026

Summary

  • preserve SRS progress across bundled core dictionary version updates by diff-syncing core words on normalized lemma/pos
  • retire removed bundled core rows with words.is_active and exclude inactive rows from future session planning
  • abandon any active session during a bundled core version bump so resumed questions cannot drift after the underlying dictionary changes
  • roll the user-visible changes into the 0.7.0 changelog entry

Why

The previous bundled core reseed path wiped learning tables on every dict_version bump, so users lost SRS progress when bundled words were added. The follow-up review also found that keeping active sessions across a live dictionary sync was unsafe because questions are rebuilt from current words rows and current distractor candidates on resume.

User impact

Users keep their existing SRS history for matched bundled core words when the embedded core dictionary updates, see only newly added bundled words as new material, and do not get dropped into broken or drifted in-progress sessions after a dictionary update.

Validation

  • go test ./internal/store ./internal/app ./internal/quiz ./cmd/eitango
  • go test ./...
  • pre-commit goimports
  • pre-commit lint

Closes #49



Summary by CodeRabbit

  • New Features

    • Dictionary version updates now preserve SRS progress and word identity for matched core words; removed core words are kept for history but excluded from new lesson planning.
    • Counts and planning now ignore retired (inactive) core words; active-only selection drives due/new lists.
  • Bug Fixes

    • Active learning sessions present during a bundled core bump are marked abandoned so they won’t resume with drifted prompts/choices.
  • Documentation

    • Specs, ADRs, and diagnostics updated to describe non-destructive core sync and reseed behavior.

@coderabbitai

coderabbitai bot commented Apr 11, 2026

Caution

Review failed

Pull request was closed or merged during review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: be368963-1202-4e71-89dc-50c5e95ead0a

📥 Commits

Reviewing files that changed from the base of the PR and between 1481a28 and c80373c.

📒 Files selected for processing (2)
  • internal/store/word_write.go
  • tasks/lessons.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • tasks/lessons.md

📝 Walkthrough

Walkthrough

When the bundled core dictionary version changes, core words are synchronized non-destructively: matched words keep their IDs and SRS progress, new words are inserted, removed core words are marked inactive (kept for history but excluded from planning), and any session that is active at sync time is marked abandoned.
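The walkthrough above can be sketched in memory as follows. This is only an illustration, not the code in internal/store/word_write.go: the types `coreEntry`, `wordRow`, and `syncCounts`, the functions `syncKey` and `syncCore`, and the trim-plus-lowercase normalization rule are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// coreEntry is a hypothetical bundled dictionary entry.
type coreEntry struct {
	Lemma, POS, Meaning string
}

// wordRow is a hypothetical stored row; SRS progress is assumed to hang off ID.
type wordRow struct {
	ID       int
	Lemma    string
	POS      string
	Meaning  string
	IsActive bool
}

// syncCounts reports what one sync pass did.
type syncCounts struct{ Inserted, Updated, Retired int }

// syncKey builds the match key from lemma and part of speech.
// Assumption: normalization is trim + lowercase; the real rule may differ.
func syncKey(lemma, pos string) string {
	return strings.ToLower(strings.TrimSpace(lemma)) + "\x00" + strings.ToLower(strings.TrimSpace(pos))
}

// syncCore applies entries to rows without deleting anything:
// matched rows keep their ID, new entries get a fresh ID, and
// previously active rows missing from entries are retired.
func syncCore(rows []wordRow, entries []coreEntry) ([]wordRow, syncCounts) {
	var counts syncCounts
	byKey := make(map[string]int, len(rows)) // key -> index in rows
	nextID := 1
	for i, r := range rows {
		byKey[syncKey(r.Lemma, r.POS)] = i
		if r.ID >= nextID {
			nextID = r.ID + 1
		}
	}
	seen := make(map[string]bool, len(entries))
	for _, e := range entries {
		k := syncKey(e.Lemma, e.POS)
		seen[k] = true
		if i, ok := byKey[k]; ok {
			// Matched: refresh metadata, keep the ID, reactivate if needed.
			rows[i].Meaning = e.Meaning
			rows[i].IsActive = true
			counts.Updated++
			continue
		}
		rows = append(rows, wordRow{ID: nextID, Lemma: e.Lemma, POS: e.POS, Meaning: e.Meaning, IsActive: true})
		byKey[k] = len(rows) - 1
		nextID++
		counts.Inserted++
	}
	for i := range rows {
		if rows[i].IsActive && !seen[syncKey(rows[i].Lemma, rows[i].POS)] {
			rows[i].IsActive = false // retired, but kept for history
			counts.Retired++
		}
	}
	return rows, counts
}

func main() {
	rows := []wordRow{
		{ID: 1, Lemma: "run", POS: "verb", Meaning: "to move fast", IsActive: true},
		{ID: 2, Lemma: "cat", POS: "noun", Meaning: "a small animal", IsActive: true},
	}
	entries := []coreEntry{
		{Lemma: " Run ", POS: "verb", Meaning: "to move quickly"},
		{Lemma: "dog", POS: "noun", Meaning: "a loyal animal"},
	}
	rows, counts := syncCore(rows, entries)
	fmt.Printf("%+v\n", counts)
	for _, r := range rows {
		fmt.Printf("%+v\n", r)
	}
}
```

Because nothing is deleted, any progress record that references a word ID stays valid after the sync; only the `IsActive` flag decides whether a word is offered again.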

Changes

| Cohort / File(s) | Summary |
|---|---|
| Changelog & specs: CHANGELOG.md, tasks/feature_spec.md, tasks/lessons.md, tasks/todo.md | Document v0.7.0 release and specify non-destructive bundled-core sync, session-abandonment on version bump, active/inactive semantics, and related acceptance tests. |
| DB migration: assets/migrations/006_words_is_active.sql | Adds words.is_active INTEGER NOT NULL DEFAULT 1 and new indexes to support active-word filtering. |
| ADR / design: docs/adr/0002-bundled-core-dictionary-lifecycle.md | Updates ADR to define sync-by-(normalized lemma,pos), preserve word_id/progress, retire unmatched core rows (is_active=0), and retain reset --reseed as destructive. |
| Core sync & write logic: internal/store/word_write.go, internal/store/migrate.go | Implements syncCoreWordsTx, in-batch key validation, new types/helpers, upsert/insert/update now set is_active, and retire unmatched core rows. SeedWords now calls sync on version bump and abandons active sessions. |
| Query filtering & session handling: internal/store/word_repo.go | Planning/selection queries now filter w.is_active = 1. AbandonActiveSession uses explicit transaction and abandonActiveSessionsTx helper. |
| Diagnostics / doctor: internal/store/doctor.go, internal/store/doctor_test.go | Doctor inspects schema for is_active, runs metadata/quizability diagnostics against active words, reports active vs retired counts, and supports PRAGMA table_info fallback. Tests updated to cover retired-word behavior. |
| Tests (store & app): internal/store/store_test.go, internal/app/cmds_test.go | New and modified tests: preserve matched-core progress, abandon active sessions on version bump, retire removed core words from planning; test schemas and seeding updated to include is_active. |
| Store persistence changes: internal/store/word_write.go (helpers/signatures) | Added existingWordRow, syncCoreWordCounts, countWordsBySourceActive, validateEntryKeys; changed insert/update signatures to include is_active. |
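The in-batch key validation mentioned in the table could look roughly like the sketch below. It does not reproduce the real validateEntryKeys: the `entry` type, the `validateKeys` name, the error wording, and the trim-plus-lowercase normalization are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical bundled dictionary entry.
type entry struct{ Lemma, POS string }

// validateKeys rejects a batch in which an entry has an empty lemma or
// part of speech after normalization, or in which two entries collapse
// to the same normalized (lemma, pos) key.
func validateKeys(entries []entry) error {
	seen := make(map[string]int, len(entries)) // key -> index of first use
	for i, e := range entries {
		lemma := strings.ToLower(strings.TrimSpace(e.Lemma))
		pos := strings.ToLower(strings.TrimSpace(e.POS))
		if lemma == "" || pos == "" {
			return fmt.Errorf("entry %d: empty lemma or pos", i)
		}
		k := lemma + "\x00" + pos
		if first, dup := seen[k]; dup {
			return fmt.Errorf("entry %d duplicates entry %d (%s/%s)", i, first, lemma, pos)
		}
		seen[k] = i
	}
	return nil
}

func main() {
	ok := []entry{{"run", "verb"}, {"run", "noun"}}
	bad := []entry{{"run", "verb"}, {" Run", "VERB"}}
	fmt.Println(validateKeys(ok))
	fmt.Println(validateKeys(bad))
}
```

Validating before any write matters for a diff sync: if two entries shared a key, the second would silently overwrite the first match instead of creating its own row.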

Sequence Diagram

sequenceDiagram
    participant App
    participant Store
    participant SessionRepo as Session Repo
    participant WordSync as Word Sync
    participant AppMeta as App Meta

    App->>Store: SeedWords(dict.CoreWordsVersion)
    Store->>Store: Check currentVersion vs version
    alt Version changed and core words present
        Store->>SessionRepo: abandonActiveSessionsTx()
        activate SessionRepo
        SessionRepo->>SessionRepo: UPDATE sessions SET status='abandoned', finished_at=now()
        deactivate SessionRepo

        Store->>WordSync: syncCoreWordsTx(entries)
        activate WordSync
        WordSync->>WordSync: Load existing core rows keyed by normalized(lemma,pos)
        WordSync->>WordSync: UPDATE matched rows (preserve id, update metadata, is_active=1)
        WordSync->>WordSync: INSERT new core rows (is_active=1)
        WordSync->>WordSync: UPDATE unmatched prev core -> is_active=0
        WordSync->>Store: return counts (inserted/updated/retired)
        deactivate WordSync
    end
    Store->>AppMeta: Set dict_version
    Store->>Store: Commit transaction

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through rows both new and old,
I kept the IDs, their stories told.
New sprouts I planted, retired ones slept,
Active sessions gently swept—
Now learning grows, preserved and bold.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title '[codex] preserve SRS across core dictionary updates' clearly and concisely summarizes the main change: preserving spaced repetition learning progress during bundled core dictionary updates. |
| Linked Issues check | ✅ Passed | All coding requirements from issue #49 are met: is_active column added, non-destructive core sync implemented, inactive words excluded from planning queries, reset --reseed remains destructive, and regression tests validate preservation of word_id/progress. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to preserving SRS across core dictionary updates: schema migration, sync logic, planning queries, diagnostics, and comprehensive test coverage with no unrelated modifications. |


@harumiWeb harumiWeb requested a review from Copilot April 11, 2026 13:06
@codacy-production

codacy-production bot commented Apr 11, 2026

Up to standards ✅

🟢 Issues: 0 new issues

🟢 Metrics: 104 complexity · 19 duplication

@harumiWeb harumiWeb marked this pull request as ready for review April 11, 2026 13:07
Contributor

Copilot AI left a comment


Pull request overview

Preserves users’ SRS progress across bundled core dictionary (dict_version) updates by diff-syncing core words on normalized lemma/pos, retiring removed core rows via words.is_active, and preventing resumed sessions from drifting after a core update.

Changes:

  • Replace destructive core reseed-on-version-bump with transactional diff sync that preserves word_id/progress, inserts new core words, and retires removed ones (is_active=0).
  • Update session planning / candidate-selection queries and doctor diagnostics to treat only active words as eligible, while keeping historical access to retired words.
  • Add migrations + regression tests (including abandoning active sessions on version bump) and update ADR/changelog/spec notes.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| tasks/todo.md | Tracks completion of issue #49 work items. |
| tasks/lessons.md | Documents the session-drift risk and the “abandon on version bump” policy. |
| tasks/feature_spec.md | Defines required behavior/acceptance for non-destructive core sync + retirement. |
| internal/store/word_write.go | Implements core diff sync, key validation, and supports is_active in inserts/updates. |
| internal/store/word_repo.go | Filters planning queries to is_active=1; adds shared tx helper for abandoning active sessions. |
| internal/store/migrate.go | Updates SeedWords() to no-op on same version and to diff-sync + abandon sessions on version bump. |
| internal/store/store_test.go | Adds/updates regression tests for preserved progress, session abandonment, and retirement behavior. |
| internal/store/doctor.go | Makes doctor resilient to pre-006 schema and reports using active/retired core counts; ignores retired words in certain checks. |
| internal/store/doctor_test.go | Expands test coverage for retired-word behavior and legacy-schema scenarios. |
| internal/app/cmds_test.go | Updates test schema to include words.is_active. |
| docs/adr/0002-bundled-core-dictionary-lifecycle.md | Updates lifecycle ADR to reflect diff-sync + retirement + reset semantics. |
| CHANGELOG.md | Adds 0.7.0 entry describing the user-visible behavior changes. |
| assets/migrations/006_words_is_active.sql | Adds words.is_active column and supporting indexes. |
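
The effect of filtering planning queries to is_active=1 can be pictured with a small in-memory analogue. This is an illustrative sketch only: the real filtering happens in SQL inside internal/store/word_repo.go, and the `word` type, its `Reviewed` field, and `newCandidates` are hypothetical names.

```go
package main

import "fmt"

// word is a hypothetical in-memory view of a words row.
type word struct {
	ID       int
	IsActive bool // mirrors words.is_active
	Reviewed bool // assumption: true once any SRS progress exists
}

// newCandidates returns up to limit IDs of active words that have no
// progress yet, preserving input order. Retired words are skipped even
// though their rows still exist.
func newCandidates(words []word, limit int) []int {
	if limit <= 0 {
		return nil
	}
	ids := make([]int, 0, limit)
	for _, w := range words {
		if len(ids) == limit {
			break
		}
		if !w.IsActive || w.Reviewed {
			continue
		}
		ids = append(ids, w.ID)
	}
	return ids
}

func main() {
	words := []word{
		{ID: 1, IsActive: true, Reviewed: true},
		{ID: 2, IsActive: false}, // retired by a core sync
		{ID: 3, IsActive: true},
		{ID: 4, IsActive: true},
	}
	fmt.Println(newCandidates(words, 5))
}
```

Keeping the retired row in place while excluding it here is what lets history views still resolve word 2 without it ever being scheduled again.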



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tasks/lessons.md (1)

42-42: Move the version-bump behavior contract to docs/specs, keep lessons as a recurrence rule.

This line currently mixes a permanent session-resume contract with the lesson text. Keep the anti-pattern/rule here, and place the normative behavior (abandon or snapshot requirement) in docs/specs/*.

As per coding guidelines: tasks/lessons.md is the place to record recurrence-prevention rules; do not use it to hold design decisions or the specification itself.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tasks/lessons.md` at line 42, The line in tasks/lessons.md mixes a normative
session-resume contract with a recurrence rule; remove the specification text
describing how sessions must behave on a version bump (the "abandoned"
transition and the "question snapshot" requirement) from tasks/lessons.md and
instead add a new document under docs/specs (e.g., a version-bump or
session-resume spec) that normatively defines the behavior for version bumps,
including the conditions to mark a session "abandoned" or the requirement to
store a "question snapshot" / fix prompt/choices; leave tasks/lessons.md as a
short recurrence/anti-pattern note only (keep the rule that session resume
should not rely on live words rows or distractor reloads) and update the wording
to remove implementation/contract language.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fb1f039e-c206-471a-93ab-b585bb6f542b

📥 Commits

Reviewing files that changed from the base of the PR and between b8c7b39 and 35cca24.

📒 Files selected for processing (13)
  • CHANGELOG.md
  • assets/migrations/006_words_is_active.sql
  • docs/adr/0002-bundled-core-dictionary-lifecycle.md
  • internal/app/cmds_test.go
  • internal/store/doctor.go
  • internal/store/doctor_test.go
  • internal/store/migrate.go
  • internal/store/store_test.go
  • internal/store/word_repo.go
  • internal/store/word_write.go
  • tasks/feature_spec.md
  • tasks/lessons.md
  • tasks/todo.md


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal/store/doctor.go (2)

524-570: ⚠️ Potential issue | 🟡 Minor

Exclude invalid frequency_rank values from duplicate-rank checks.

Lines 442-465 already treat frequency_rank <= 0 as missing, but these new queries only exclude NULL. That means multiple rows with rank 0 now get reported twice: once as missing metadata and again as duplicate ranks. Please filter to valid positive ranks here as well.

Suggested fix
-  WHERE frequency_rank IS NOT NULL
+  WHERE frequency_rank > 0
-  WHERE is_active = 1 AND frequency_rank IS NOT NULL
+  WHERE is_active = 1 AND frequency_rank > 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/store/doctor.go` around lines 524 - 570, The duplicate-rank SQL
(duplicateRankCountQuery and duplicateRankSampleQuery) currently only excludes
NULL ranks and thus still counts non-positive ranks like 0; update the WHERE
clauses used both in the base queries and the hasIsActive branch to require
frequency_rank > 0 (e.g., replace/augment "frequency_rank IS NOT NULL" with
"frequency_rank IS NOT NULL AND frequency_rank > 0" or simply "frequency_rank >
0") so only valid positive ranks are considered when checking duplicates; ensure
both duplicateRankCountQuery and duplicateRankSampleQuery strings (including the
versions inside the hasIsActive conditional) are changed consistently.

959-968: ⚠️ Potential issue | 🟡 Minor

Use an active-specific empty-state message here.

With is_active present, wordCount is counting active rows only. If the DB contains only retired words, this now reports words table is empty, which is misleading. Please change this to something like no active words are available for quizability.

Suggested fix
 if wordCount == 0 {
-	return diagnosticCheckError("quizability", "words table is empty")
+	summary := "words table is empty"
+	if hasIsActive {
+		summary = "no active words are available for quizability"
+	}
+	return diagnosticCheckError("quizability", summary)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/store/doctor.go` around lines 959 - 968, The empty-state error
message is misleading when hasIsActive is true because wordCount then reflects
only active rows; update the branch that handles wordCount == 0 in the doctor
check to return a different diagnosticCheckError message when hasIsActive is
true (e.g., "no active words are available for quizability") and keep the
existing "words table is empty" message when hasIsActive is false; locate the
logic around wordCountQuery, hasIsActive, s.countRows and the wordCount == 0
return and swap the returned message based on hasIsActive.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a152d00e-cdc5-4746-899c-e0684fc7c3e3

📥 Commits

Reviewing files that changed from the base of the PR and between 35cca24 and 1481a28.

📒 Files selected for processing (2)
  • internal/store/doctor.go
  • tasks/lessons.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • tasks/lessons.md

@harumiWeb harumiWeb merged commit 945adc3 into main Apr 11, 2026
7 of 9 checks passed
@harumiWeb harumiWeb deleted the fix/update-db branch April 11, 2026 13:42


Development

Successfully merging this pull request may close these issues.

Preserve SRS progress across bundled core dictionary updates

2 participants