Skip to content

fix: make observer/observed optional in conclusions query endpoint#492

Open
TreyDong wants to merge 1 commit intoplastic-labs:mainfrom
TreyDong:fix/conclusions-query-optional-observer-observed
Open

fix: make observer/observed optional in conclusions query endpoint#492
TreyDong wants to merge 1 commit intoplastic-labs:mainfrom
TreyDong:fix/conclusions-query-optional-observer-observed

Conversation

@TreyDong
Copy link
Copy Markdown

@TreyDong TreyDong commented Apr 5, 2026

Summary

Previously, the conclusions query endpoint required both observer and observed to be specified, raising ValidationException if either was missing.

Changes

  • query_documents: observer/observed now optional (default None)
  • _query_documents_pgvector: conditionally applies WHERE clause only when param is not None
  • query_conclusions router: removed ValidationException guard

Behavior Change

Scenario Before After
Without filters ValidationException Returns results
SDK path Works Works (unchanged)

Summary by CodeRabbit

  • New Features
    • Document and conclusion semantic search now supports optional peer filtering—users can omit observer and/or observed parameters to query across all peer relationships in the workspace.

Previously, query_conclusions required both observer and observed to be
specified in filters, raising ValidationException if either was missing.
This made the REST API awkward to use for ad-hoc queries and was
inconsistent with the SDK (which always passes these params).

Changes:
- query_documents: observer/observed are now optional (default None)
- _query_documents_pgvector: conditionally applies WHERE clause only when
  the param is not None (None = unfiltered)
- query_conclusions router: removed the ValidationException guard,
  docstring updated to explain the new behavior

Fixes the awkward UX where calling the conclusions query endpoint without
observer/observed filters returned an opaque error instead of empty or
cross-peer results.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 5, 2026

Walkthrough

The PR makes observer and observed parameters optional in document querying functions throughout the CRUD and routing layers. Conditional filtering is applied only when these values are provided, and validation requiring both parameters is removed from the router layer.

Changes

Cohort / File(s) Summary
CRUD Document Query Layer
src/crud/document.py
Updated query_documents and _query_documents_pgvector signatures to accept observer: str | None = None and observed: str | None = None. Conditional filters now append only when values are not None, allowing flexible peer filtering.
Router Conclusions Layer
src/routers/conclusions.py
Removed ValidationException handling that required both observer and observed to be present. Updated docstring to clarify optional filtering behavior; omitting both parameters returns conclusions across all peer relationships in the workspace.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Optional peers now dance with grace,
No longer bound by validation's embrace,
When None flows through, filters stay in place,
A flexible query finds its space! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main change: making observer/observed parameters optional in the conclusions query endpoint, which is the primary objective of the PR.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/crud/document.py (2)

386-394: ⚠️ Potential issue | 🔴 Critical

Critical: External vector store path will raise ValueError when observer or observed is None.

The query_documents function now accepts None for observer/observed, but when the external vector store path is taken (i.e., _uses_pgvector() returns False), it calls query_external_vector_document_ids which explicitly raises ValueError("observer and observed are required for document namespaces") per src/vector_store/__init__.py:110-112.

This creates a runtime failure for any deployment using external vector stores (LanceDB, Turbopuffer) when clients omit observer/observed filters.

Options to resolve:

  1. Add a guard in query_documents to require observer/observed when using external vector stores
  2. Update query_external_vector_document_ids to support None values (if the stores can handle workspace-wide queries)
  3. Fall back to pgvector path when observer/observed are None
🐛 Option 1: Guard against None in external vector store path
     # External vector store — network call first, DB only for the ID fetch
+    if observer is None or observed is None:
+        raise ValidationException(
+            "observer and observed filters are required when using external vector stores"
+        )
     document_ids = await query_external_vector_document_ids(
         workspace_name=workspace_name,
         observer=observer,
         observed=observed,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/crud/document.py` around lines 386 - 394, The external-vector-store path
currently calls query_external_vector_document_ids which requires non-None
observer/observed, causing runtime errors when query_documents accepts None; fix
query_documents by checking _uses_pgvector(workspace_name) before calling
query_external_vector_document_ids and if it returns False and (observer is None
or observed is None) raise a clear ValueError (or return a controlled error)
stating that observer and observed are required for external vector stores;
update the error message to reference
query_external_vector_document_ids/_uses_pgvector so it's clear where the
constraint comes from.

399-419: ⚠️ Potential issue | 🟡 Minor

Type mismatch: fetch_documents_by_ids expects non-optional observer/observed.

The fetch_documents_by_ids function signature (lines 252-254) requires observer: str and observed: str as non-optional parameters. Passing None here would cause SQLAlchemy to filter for observer IS NULL rather than skipping the filter.

Currently this is masked because the external vector store call fails first (as noted above), but if that's ever fixed, this function would also need updating for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/crud/document.py` around lines 399 - 419, fetch_documents_by_ids requires
non-optional observer: str and observed: str, but this call site may pass None;
before invoking fetch_documents_by_ids (both branches where db is provided and
inside the tracked_db block) validate that observer and observed are non-None
and of type str (or coerce to a meaningful non-empty string) and if they are
None raise a clear ValueError (or return an error) so SQLAlchemy won't
accidentally filter for IS NULL; update the calls in this file to ensure
observer and observed are validated and passed as strings to
fetch_documents_by_ids (refer to symbols: fetch_documents_by_ids, observer,
observed, db, tracked_db, managed_db).
src/routers/conclusions.py (1)

102-122: ⚠️ Potential issue | 🟠 Major

External vector store will fail with None observer/observed; docstring overpromises flexibility.

The updated docstring claims "omitting both is allowed," but this only works for pgvector deployments. When using an external vector store (Turbopuffer, LanceDB), the query_external_vector_document_ids function passes observer/observed directly to get_vector_namespace, which raises ValueError at lines 110-113 of src/vector_store/__init__.py: "observer and observed are required for document namespaces."

The pgvector path correctly handles None values via conditional checks (lines 292-295 in src/crud/document.py), but the external vector store path does not. Either restrict the API to require both fields, or update the external vector store query logic to handle optional observer/observed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/conclusions.py` around lines 102 - 122, The external vector store
path fails when observer/observed are None because
query_external_vector_document_ids passes them into get_vector_namespace which
requires both; update the logic in crud.query_documents or
query_external_vector_document_ids so external stores either (A) enforce
observer and observed are present and raise a clear error (validate in
query_documents before calling query_external_vector_document_ids), or (B)
handle optional values like the pgvector branch does (check for None and avoid
calling get_vector_namespace with None, using a workspace/global namespace or
building namespaces only for provided fields); modify get_vector_namespace only
if you choose option B to accept optional args, and reference these symbols:
crud.query_documents (or query_documents), query_external_vector_document_ids,
and get_vector_namespace when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/crud/document.py`:
- Around line 386-394: The external-vector-store path currently calls
query_external_vector_document_ids which requires non-None observer/observed,
causing runtime errors when query_documents accepts None; fix query_documents by
checking _uses_pgvector(workspace_name) before calling
query_external_vector_document_ids and if it returns False and (observer is None
or observed is None) raise a clear ValueError (or return a controlled error)
stating that observer and observed are required for external vector stores;
update the error message to reference
query_external_vector_document_ids/_uses_pgvector so it's clear where the
constraint comes from.
- Around line 399-419: fetch_documents_by_ids requires non-optional observer:
str and observed: str, but this call site may pass None; before invoking
fetch_documents_by_ids (both branches where db is provided and inside the
tracked_db block) validate that observer and observed are non-None and of type
str (or coerce to a meaningful non-empty string) and if they are None raise a
clear ValueError (or return an error) so SQLAlchemy won't accidentally filter
for IS NULL; update the calls in this file to ensure observer and observed are
validated and passed as strings to fetch_documents_by_ids (refer to symbols:
fetch_documents_by_ids, observer, observed, db, tracked_db, managed_db).

In `@src/routers/conclusions.py`:
- Around line 102-122: The external vector store path fails when
observer/observed are None because query_external_vector_document_ids passes
them into get_vector_namespace which requires both; update the logic in
crud.query_documents or query_external_vector_document_ids so external stores
either (A) enforce observer and observed are present and raise a clear error
(validate in query_documents before calling query_external_vector_document_ids),
or (B) handle optional values like the pgvector branch does (check for None and
avoid calling get_vector_namespace with None, using a workspace/global namespace
or building namespaces only for provided fields); modify get_vector_namespace
only if you choose option B to accept optional args, and reference these
symbols: crud.query_documents (or query_documents),
query_external_vector_document_ids, and get_vector_namespace when making the
change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7ebee6c1-a93c-4500-b94f-047adda7a86d

📥 Commits

Reviewing files that changed from the base of the PR and between e487358 and 427c674.

📒 Files selected for processing (2)
  • src/crud/document.py
  • src/routers/conclusions.py

@ajspig
Copy link
Copy Markdown
Contributor

ajspig commented Apr 7, 2026

Hey @TreyDong — thanks for this PR!

Currently, the /conclusions/query endpoint requires both observer and observedto be specified. Omitting them raises a ValidationException.

I opened an issue to track this as a feature (#520) since the full solution touches the external vector store path (Turbopuffer/LanceDB) in a non-trivial way — namespace hashing requires both observer and observed, so omitting them crashes on those backends. There's also a type mismatch in fetch_documents_by_ids that would
need addressing.

Feel free to weigh in there and/or try and tackle the improvements needed!

@VVoruganti
Copy link
Copy Markdown
Collaborator

Generally we want to stick to a strict boundary for conclusions queries to a singular representation. For external vector dbs such as turbopuffer or lancedb we enforce a strict collection boundary. Instead we would have to do a fanout query across multiple representations simultaneously and then rerank

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants