Skip to content

Enhance search functionality with regex, FTS5, snippets, and sorting#252

Open
jesserobbins wants to merge 4 commits intowesm:mainfrom
jesserobbins:fix-search-final
Open

Enhance search functionality with regex, FTS5, snippets, and sorting#252
jesserobbins wants to merge 4 commits intowesm:mainfrom
jesserobbins:fix-search-final

Conversation

@jesserobbins
Copy link
Copy Markdown

@jesserobbins jesserobbins commented Apr 9, 2026

Fix and Improve search accuracy and performance across full-text, fast (Parquet), and aggregate query paths with word-boundary matching, better snippet handling, and consistent result sorting.

Problem

While importing my old 1990's era email archives, I discovered a number of issues with search

  1. Substring matching causes false positives — Text search matches partial strings within words, returning irrelevant results
  2. Body text invisible in fast search — Messages where search terms only appear in the body don't show in TUI unless user switches to deep search mode (which I feel is underdocumented but that's a separate issue)
  3. Inconsistent sort behavior — Search results don't respect the user's selected sort field (Name/Count/Size)

Proposed Solution

  • Word-boundary regex (\b) for text search to match exact words, not substrings
  • Prefix matching in FTS5 (* operator) for partial word matches within the deep search path
  • Snippet inclusion in fast search conditions so body text matches appear immediately in TUI

Changes

  • internal/query/sqlite.go — FTS5 prefix matching
  • internal/query/duckdb.go — Word-boundary regex,
    snippet inclusion, sort parameter, escapeRegex helper

Testing

  • Text search with word boundaries prevents false
    substring matches
  • Parquet fast search now catches body text matches

…et support, sorting

- Use word-boundary regex (\b) for text search to match whole words only
- Add FTS5 prefix matching (*) for partial word matches in deep search
- Include snippet field in fast search so body text matches appear in TUI
- Add MessageSorting parameter to Search and SearchFast methods
- Add escapeRegex helper function for DuckDB regex pattern escaping

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 9, 2026

roborev: Combined Review (945c419)

Summary verdict: Changes are not ready to merge due to a compile-breaking API migration and an ignored sorting parameter.

High

  • Location: internal/query/engine.go:31
  • Problem: Search, SearchFast, and SearchFastWithStats now require MessageSorting, but the change is not propagated to existing call sites and implementations, including internal/mcp/handlers.go:131, internal/api/handlers.go:1196, internal/tui/model.go:498, internal/remote/engine.go:575, and internal/query/querytest/mock_engine.go:99. The package will not compile.
  • Fix: Update all callers, the remote engine, mocks, and tests to pass or accept the new MessageSorting argument, or keep the old interface until the full call chain is migrated.

Medium

  • Location: internal/query/sqlite.go:1207
  • Problem: The new sorting argument is accepted but never used. Search results still hard-code ORDER BY m.sent_at DESC, and the DuckDB search/cache paths likewise keep fixed date-desc ordering. Callers passing size, subject, or ascending sort options will silently get the wrong order.
  • Fix: Thread sorting into the query builders and page cache helpers, generate a validated ORDER BY consistent with ListMessages, and include sorting in the DuckDB search cache key if cached result order can vary.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

…et support

- Use word-boundary regex (\b) for text search to match whole words only
- Add FTS5 prefix matching (*) for partial word matches in deep search
- Include snippet field in fast search so body text matches appear in TUI
- Add escapeRegex helper function for DuckDB regex pattern escaping

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@jesserobbins
Copy link
Copy Markdown
Author

Removed the incomplete sort stuff

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 10, 2026

roborev: Combined Review (d9e2366)

High-risk search regressions found: DuckDB regex boundaries miss punctuation-leading terms, and SQLite FTS5 can misparse unquoted special characters.

High

  • Location: internal/query/sqlite.go:1171-1175
    Problem: FTS5 search terms are only double-quoted when they contain a space. Terms containing FTS5 special characters such as -, :, (, or ) can be interpreted as query syntax instead of literals, causing syntax errors or incorrect results.
    Fix: Quote all FTS terms before appending the prefix operator, e.g. ftsTerms[i] = fmt.Sprintf("\"%s\"*", term), and remove the conditional strings.Contains(term, " ") branch.

  • Location: internal/query/duckdb.go:461-462, internal/query/duckdb.go:2393-2394
    Problem: Prefixing escaped search terms with \b causes DuckDB regex searches to miss terms that start with non-word characters, such as +15551234567, @gmail.com, #bug, or punctuation-prefixed tokens. Because \b requires a word/non-word transition, it fails when both the preceding character and the first character of the term are non-word characters. This regresses cases that previously matched via ILIKE.
    Fix: Only apply the word-boundary prefix when the term starts with a word character, or fall back to substring matching / a boundary expression that handles start-of-string and non-word separators for punctuation-leading terms.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

wesm and others added 2 commits April 9, 2026 20:36
- Always quote FTS5 search terms to prevent special characters
  (-, :, (, )) from being parsed as query syntax
- Only apply \b word boundary in DuckDB regex when term starts with
  a word character — non-word-leading terms (+, @, #) skip \b since
  it requires a word/non-word transition that fails at string start
- Update stale tests that still expected ILIKE patterns

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TestSearch_WithFTS_SpecialChars: verifies FTS5 search with -, :, ()
  doesn't cause syntax errors (previously unquoted terms were parsed
  as FTS5 operators)
- TestBuildWhereClause_WordBoundaryPrefix: verifies \b is applied only
  for word-char-leading terms, and non-word-leading terms (+, @, #, ()
  skip the word boundary prefix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 10, 2026

roborev: Combined Review (e31a46d)

Verdict: No Medium, High, or Critical issues were reported.

All agents that provided findings agree the diff is clean; no actionable findings to include.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@wesm wesm marked this pull request as ready for review April 10, 2026 02:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants