feat: add index persistence #140

Merged
stephantul merged 7 commits into main from add-persistence on May 22, 2026

@stephantul

Contributor

This PR adds index persistence for use in the CLI, which is a feature requested by several users.

This is a big PR! Sorry for that. This PR:

  • Adds a new index command to the CLI.

You can now index a repository as follows:

semble index -o "my_index"

and then search using:

semble search "where is persistency defined?" --index "my_index"

This greatly speeds up subsequent searches: loading a decently sized index takes 200-400 ms.

  • Adds persistence to the embedding backend. This was necessary because we override the basicbackend a little weirdly, which is something we can think about doing differently.
  • Adds persistence to the index itself. All components are saved in subfolders, except the model, for which we only save the model name. To facilitate saving, I added helpers to Chunk. These are thin wrappers around asdict and a dictionary expansion.
  • Removes the Encoder protocol: it no longer made sense because we use the saving and loading methods in model2vec.
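
For reviewers who want a picture of the Chunk helpers and the on-disk layout described above, here is a minimal sketch. It is not the code in this PR: the Chunk fields, the helper names (to_dict, from_dict, save_index, load_index), the file names, and the JSON format are all assumptions made for illustration.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for the project's Chunk; the real fields may differ."""

    content: str
    file_path: str
    start_line: int
    end_line: int

    def to_dict(self) -> dict[str, Any]:
        # Thin wrapper around dataclasses.asdict.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> Chunk:
        # Thin wrapper around a dictionary expansion.
        return cls(**data)


def save_index(chunks: list[Chunk], model_name: str, root: Path) -> None:
    """Write chunks into a subfolder and store only the model's name (assumed layout)."""
    chunk_dir = root / "chunks"
    chunk_dir.mkdir(parents=True, exist_ok=True)
    payload = [chunk.to_dict() for chunk in chunks]
    (chunk_dir / "chunks.json").write_text(json.dumps(payload), encoding="utf-8")
    # The model itself is not serialized; only its name is recorded.
    (root / "metadata.json").write_text(json.dumps({"model": model_name}), encoding="utf-8")


def load_index(root: Path) -> tuple[list[Chunk], str]:
    """Read back the chunks and the stored model name."""
    payload = json.loads((root / "chunks" / "chunks.json").read_text(encoding="utf-8"))
    metadata = json.loads((root / "metadata.json").read_text(encoding="utf-8"))
    return [Chunk.from_dict(item) for item in payload], metadata["model"]
```

Storing only the model name keeps the index folder small, at the cost of needing the model to be resolvable again at load time.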

The ugly part of this is that I chose to refactor a large part of the code: we no longer pass a model to the index when building it. Instead, I pass a path to the model. This path is used to load the model, and we fall back to the default model when the path is None. I think this is a more elegant construction, since it allows us to store the model path and also to cache model loading more efficiently.
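
To make the path-based construction concrete, here is a rough sketch of the idea, not the actual implementation. DEFAULT_MODEL, StubModel, and load_model are hypothetical names, and the stub stands in for the real loader (which in this PR goes through model2vec's loading methods).

```python
from __future__ import annotations

from functools import lru_cache

# Placeholder value; this is NOT the project's real default model name.
DEFAULT_MODEL = "default-model"


class StubModel:
    """Hypothetical stand-in for the real embedding model."""

    def __init__(self, path: str) -> None:
        self.path = path


@lru_cache(maxsize=4)
def _load_cached(path: str) -> StubModel:
    # The real code would load the model from `path` here.
    # Caching on the path string means repeated loads reuse one instance.
    return StubModel(path)


def load_model(model_path: str | None = None) -> StubModel:
    """Resolve None to the default model, then load through the cache."""
    return _load_cached(model_path or DEFAULT_MODEL)
```

Because the cache key is a plain string, the same value can be written into the saved index and used to restore the model later.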

Follow-up tasks:

  • Not all benchmarks work, but the most important ones (i.e., the regular one and ablations) still work.

@codecov

codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/semble/__init__.py | 100.00% <100.00%> (ø) |
| src/semble/cli.py | 100.00% <100.00%> (ø) |
| src/semble/index/create.py | 100.00% <100.00%> (ø) |
| src/semble/index/dense.py | 100.00% <100.00%> (ø) |
| src/semble/index/index.py | 100.00% <100.00%> (ø) |
| src/semble/index/types.py | 100.00% <100.00%> (ø) |
| src/semble/mcp.py | 100.00% <100.00%> (ø) |
| src/semble/search.py | 100.00% <100.00%> (ø) |
| src/semble/types.py | 100.00% <100.00%> (ø) |
| src/semble/utils.py | 100.00% <100.00%> (ø) |

@Pringled left a comment

Member

Very very nice, some minor things

Comment thread src/semble/index/index.py Outdated
Comment thread src/semble/index/index.py Outdated
Comment thread src/semble/index/index.py
Comment thread src/semble/cli.py
Comment thread src/semble/index/index.py
Comment thread src/semble/mcp.py
@stephantul stephantul mentioned this pull request May 22, 2026
@stephantul stephantul merged commit b0111ac into main May 22, 2026
15 checks passed
@stephantul stephantul deleted the add-persistence branch May 22, 2026 13:28