fix: make observer/observed optional in conclusions query endpoint#492
fix: make observer/observed optional in conclusions query endpoint#492TreyDong wants to merge 1 commit intoplastic-labs:mainfrom
Conversation
Previously, query_conclusions required both observer and observed to be specified in filters, raising ValidationException if either was missing. This made the REST API awkward to use for ad-hoc queries and was inconsistent with the SDK (which always passes these params). Changes: - query_documents: observer/observed are now optional (default None) - _query_documents_pgvector: conditionally applies WHERE clause only when the param is not None (None = unfiltered) - query_conclusions router: removed the ValidationException guard, docstring updated to explain the new behavior Fixes the awkward UX where calling the conclusions query endpoint without observer/observed filters returned an opaque error instead of empty or cross-peer results.
WalkthroughThe PR makes Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/crud/document.py (2)
386-394:⚠️ Potential issue | 🔴 CriticalCritical: External vector store path will raise
ValueErrorwhenobserverorobservedisNone.The
query_documentsfunction now acceptsNoneforobserver/observed, but when the external vector store path is taken (i.e.,_uses_pgvector()returnsFalse), it callsquery_external_vector_document_idswhich explicitly raisesValueError("observer and observed are required for document namespaces")persrc/vector_store/__init__.py:110-112.This creates a runtime failure for any deployment using external vector stores (LanceDB, Turbopuffer) when clients omit observer/observed filters.
Options to resolve:
- Add a guard in
query_documentsto require observer/observed when using external vector stores- Update
query_external_vector_document_idsto support None values (if the stores can handle workspace-wide queries)- Fall back to pgvector path when observer/observed are None
🐛 Option 1: Guard against None in external vector store path
# External vector store — network call first, DB only for the ID fetch + if observer is None or observed is None: + raise ValidationException( + "observer and observed filters are required when using external vector stores" + ) document_ids = await query_external_vector_document_ids( workspace_name=workspace_name, observer=observer, observed=observed,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/crud/document.py` around lines 386 - 394, The external-vector-store path currently calls query_external_vector_document_ids which requires non-None observer/observed, causing runtime errors when query_documents accepts None; fix query_documents by checking _uses_pgvector(workspace_name) before calling query_external_vector_document_ids and if it returns False and (observer is None or observed is None) raise a clear ValueError (or return a controlled error) stating that observer and observed are required for external vector stores; update the error message to reference query_external_vector_document_ids/_uses_pgvector so it's clear where the constraint comes from.
399-419:⚠️ Potential issue | 🟡 MinorType mismatch:
fetch_documents_by_idsexpects non-optionalobserver/observed.The
fetch_documents_by_idsfunction signature (lines 252-254) requiresobserver: strandobserved: stras non-optional parameters. PassingNonehere would cause SQLAlchemy to filter forobserver IS NULLrather than skipping the filter.Currently this is masked because the external vector store call fails first (as noted above), but if that's ever fixed, this function would also need updating for consistency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/crud/document.py` around lines 399 - 419, fetch_documents_by_ids requires non-optional observer: str and observed: str, but this call site may pass None; before invoking fetch_documents_by_ids (both branches where db is provided and inside the tracked_db block) validate that observer and observed are non-None and of type str (or coerce to a meaningful non-empty string) and if they are None raise a clear ValueError (or return an error) so SQLAlchemy won't accidentally filter for IS NULL; update the calls in this file to ensure observer and observed are validated and passed as strings to fetch_documents_by_ids (refer to symbols: fetch_documents_by_ids, observer, observed, db, tracked_db, managed_db).src/routers/conclusions.py (1)
102-122:⚠️ Potential issue | 🟠 MajorExternal vector store will fail with None observer/observed; docstring overpromises flexibility.
The updated docstring claims "omitting both is allowed," but this only works for pgvector deployments. When using an external vector store (Turbopuffer, LanceDB), the
query_external_vector_document_idsfunction passes observer/observed directly toget_vector_namespace, which raisesValueErrorat lines 110-113 ofsrc/vector_store/__init__.py: "observer and observed are required for document namespaces."The pgvector path correctly handles None values via conditional checks (lines 292-295 in
src/crud/document.py), but the external vector store path does not. Either restrict the API to require both fields, or update the external vector store query logic to handle optional observer/observed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routers/conclusions.py` around lines 102 - 122, The external vector store path fails when observer/observed are None because query_external_vector_document_ids passes them into get_vector_namespace which requires both; update the logic in crud.query_documents or query_external_vector_document_ids so external stores either (A) enforce observer and observed are present and raise a clear error (validate in query_documents before calling query_external_vector_document_ids), or (B) handle optional values like the pgvector branch does (check for None and avoid calling get_vector_namespace with None, using a workspace/global namespace or building namespaces only for provided fields); modify get_vector_namespace only if you choose option B to accept optional args, and reference these symbols: crud.query_documents (or query_documents), query_external_vector_document_ids, and get_vector_namespace when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/crud/document.py`:
- Around line 386-394: The external-vector-store path currently calls
query_external_vector_document_ids which requires non-None observer/observed,
causing runtime errors when query_documents accepts None; fix query_documents by
checking _uses_pgvector(workspace_name) before calling
query_external_vector_document_ids and if it returns False and (observer is None
or observed is None) raise a clear ValueError (or return a controlled error)
stating that observer and observed are required for external vector stores;
update the error message to reference
query_external_vector_document_ids/_uses_pgvector so it's clear where the
constraint comes from.
- Around line 399-419: fetch_documents_by_ids requires non-optional observer:
str and observed: str, but this call site may pass None; before invoking
fetch_documents_by_ids (both branches where db is provided and inside the
tracked_db block) validate that observer and observed are non-None and of type
str (or coerce to a meaningful non-empty string) and if they are None raise a
clear ValueError (or return an error) so SQLAlchemy won't accidentally filter
for IS NULL; update the calls in this file to ensure observer and observed are
validated and passed as strings to fetch_documents_by_ids (refer to symbols:
fetch_documents_by_ids, observer, observed, db, tracked_db, managed_db).
In `@src/routers/conclusions.py`:
- Around line 102-122: The external vector store path fails when
observer/observed are None because query_external_vector_document_ids passes
them into get_vector_namespace which requires both; update the logic in
crud.query_documents or query_external_vector_document_ids so external stores
either (A) enforce observer and observed are present and raise a clear error
(validate in query_documents before calling query_external_vector_document_ids),
or (B) handle optional values like the pgvector branch does (check for None and
avoid calling get_vector_namespace with None, using a workspace/global namespace
or building namespaces only for provided fields); modify get_vector_namespace
only if you choose option B to accept optional args, and reference these
symbols: crud.query_documents (or query_documents),
query_external_vector_document_ids, and get_vector_namespace when making the
change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7ebee6c1-a93c-4500-b94f-047adda7a86d
📒 Files selected for processing (2)
src/crud/document.pysrc/routers/conclusions.py
|
Hey @TreyDong — thanks for this PR! Currently, the I opened an issue to track this as a feature (#520) since the full solution touches the external vector store path (Turbopuffer/LanceDB) in a non-trivial way — namespace hashing requires both observer and observed, so omitting them crashes on those backends. There's also a type mismatch in fetch_documents_by_ids that would Feel free to weigh in there and/or try and tackle the improvements needed! |
|
Generally we want to stick to a strict boundary for conclusions queries to a singular representation. For external vector dbs such as turbopuffer or lancedb we enforce a strict |
Summary
Previously, the conclusions query endpoint required both observer and observed to be specified, raising ValidationException if either was missing.
Changes
Behavior Change
Summary by CodeRabbit
observerand/orobservedparameters to query across all peer relationships in the workspace.