Skip to content

feat: inline Google Sheets data source integration#830

Open
cmd-err wants to merge 1 commit into
juspay:releasefrom
cmd-err:feat/data-sources-inline
Open

feat: inline Google Sheets data source integration#830
cmd-err wants to merge 1 commit into
juspay:releasefrom
cmd-err:feat/data-sources-inline

Conversation

@cmd-err

@cmd-err cmd-err commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Inline DataSourceRef config stored in template.data_sources JSONB
  • No separate data_source table, CRUD endpoints, or accessor layers
  • Content fetched at call time with Redis cache (60s TTL, shared by content hash)
  • Prefetch at dispatch for cache warming (fire-and-forget)
  • inject_as="var"template_vars[{name}] variable substitution
  • inject_as="message" → prepended system message
  • Discovery routes (/data-sources/sheets/*) for sheet exploration (admin-only)
  • _ds_messages returned separately from load_template() — survives playground overrides

Changed Files (20 files, +723/-25)

  • New: services/google/sheets.py — Google Sheets API service (markdown/csv/json)
  • New: managers/data_source_prefetch.py — background cache warming
  • New: api/routers/data_sources/ — discovery routes (admin-only)
  • New: migrations/034ALTER TABLE template ADD data_sources JSONB
  • Modified: types.pyDataSourceRef with inline config + data_sources on TemplateModel
  • Modified: loader.py — Layer 5 content injection
  • Modified: flow.py_ds_messages propagation
  • Modified: worker.py — prefetch at dispatch
  • Modified: pyproject.toml — add google-api-python-client

Summary by CodeRabbit

Release Notes

New Features

  • Added Google Sheets data source integration—templates can now reference Google Sheets to populate conversation data dynamically
  • New admin APIs for discovering sheets, listing columns, and previewing data from spreadsheets
  • Automatic data caching to optimize performance when using sheet data sources
  • Support for multiple output formats (markdown, CSV, JSON) when injecting sheet data into conversations

Copilot AI review requested due to automatic review settings June 15, 2026 10:54
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fd8895a6-9c60-4eab-aec7-57715af6bde4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Walkthrough

Adds Google Sheets as an inline data source for Breeze Buddy templates. A new DataSourceRef schema attaches spreadsheet configurations to templates, persisted via a DB migration and JSONB column. A Google Sheets service fetches and formats sheet data; the template loader injects it as template variables or system messages. A prefetch manager warms Redis before calls, and three admin REST endpoints enable spreadsheet discovery.

Changes

Google Sheets Data Sources Feature

Layer / File(s) Summary
Data source types and response schemas
app/ai/voice/agents/breeze_buddy/template/types.py, app/schemas/breeze_buddy/data_source.py, app/services/data_sources.py
DataSourceFormat enum and DataSourceRef model define inline spreadsheet configs. TemplateModel, CreateTemplateRequest, and ReplaceTemplateRequest gain data_sources fields. TabsResponse, ColumnsResponse, and PreviewResponse schemas added for the discovery API. DATA_SOURCE_UNAVAILABLE sentinel constant introduced.
Google Sheets service
app/services/google/sheets.py, pyproject.toml
New service module authenticates via service account, extracts spreadsheet IDs from URLs, and provides list_tabs, get_column_headers, fetch_sheet_data, and fetch_formatted (markdown/csv/json). All Google API calls run in a thread executor with safe error fallbacks. google-api-python-client dependency added.
DB migration, queries, decoder, and accessor
app/database/migrations/034_add_data_sources_to_template.sql, app/database/queries/breeze_buddy/template.py, app/database/decoder/breeze_buddy/template.py, app/database/accessor/breeze_buddy/template.py
Migration adds a nullable JSONB data_sources column. All SELECT queries include it. create_template_query and replace_template_query accept and persist data_sources with ::jsonb casting and renumbered SQL parameters. Decoder parses JSON strings; accessors serialize via json.dumps.
Template loader injection and prefetch manager
app/ai/voice/agents/breeze_buddy/template/loader.py, app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py
FlowConfigLoader.load_template now returns _ds_messages, concurrently fetches active DataSourceRef entries via _fetch_one_data_source (Redis cache → fetch_formatted with timeout → DATA_SOURCE_UNAVAILABLE fallback), and injects content as template variables or system-role messages. prefetch_data_sources preloads all data sources into Redis (60s TTL, 5s per-fetch timeout) concurrently without raising.
Agent state, flow wiring, and dispatch prefetch trigger
app/ai/voice/agents/breeze_buddy/agent/__init__.py, app/ai/voice/agents/breeze_buddy/agent/flow.py, app/ai/voice/agents/breeze_buddy/dispatch/worker.py
Agent initializes self.ds_messages and unpacks it from load_template_config in Daily and telephony paths. build_flow_config receives ds_messages and stores it under _data_source_messages. Dispatch worker fires prefetch_data_sources as a background task when template.data_sources is set.
Admin discovery endpoints and template handler wiring
app/api/routers/breeze_buddy/data_sources/__init__.py, app/api/routers/breeze_buddy/data_sources/handlers.py, app/api/routers/breeze_buddy/__init__.py, app/api/routers/breeze_buddy/templates/handlers.py
Admin-only GET endpoints /data-sources/sheets/tabs, /columns, and /preview added, each enforcing require_admin. Handlers validate spreadsheet URLs and delegate to Sheets service functions. Template create/replace handlers forward data_sources to persistence. Router registered on main Breeze Buddy APIRouter.

Sequence Diagram(s)

sequenceDiagram
  rect rgba(135, 206, 235, 0.5)
    Note over DispatchWorker,Redis: Background prefetch at dispatch time
    DispatchWorker->>prefetch_data_sources: asyncio.create_task (template.data_sources)
    prefetch_data_sources->>fetch_formatted: per DataSourceRef (timeout 5s, concurrent)
    fetch_formatted-->>prefetch_data_sources: formatted string
    prefetch_data_sources->>Redis: SET cache_key → content (TTL 60s)
  end

  rect rgba(144, 238, 144, 0.5)
    Note over Agent,FlowConfigLoader: Template loading at call time
    Agent->>load_template_config: self.lead
    load_template_config->>FlowConfigLoader: load_template
    FlowConfigLoader->>_fetch_one_data_source: per DataSourceRef
    _fetch_one_data_source->>Redis: GET cache_key
    Redis-->>_fetch_one_data_source: cached content (or miss → fetch_formatted)
    _fetch_one_data_source-->>FlowConfigLoader: content string
    FlowConfigLoader-->>load_template_config: template, config, vars, ds_messages
    load_template_config-->>Agent: template, config, vars, ds_messages
    Agent->>build_flow_config: ds_messages=self.ds_messages
    build_flow_config-->>Agent: flow_config with _data_source_messages
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 Hop hop, the spreadsheets are near,
Redis warms the cache before each call is clear.
A DataSourceRef blooms on the template's vine,
Markdown tables and CSV rows align.
The bunny fetches data with a five-second grace,
[Data unavailable]? — a soft landing in place! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 69.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: inline Google Sheets data source integration' accurately and concisely describes the main change: adding inline Google Sheets data sources to templates without a separate table.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/api/routers/breeze_buddy/templates/handlers.py (1)

130-145: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Normalize data_sources before persistence and preserve existing values when PUT omits the field.

template_data.data_sources is a list of Pydantic models; forwarding it directly to the accessor risks serialization failure on json.dumps. Also, in replace flow, passing None when the field is omitted can unintentionally wipe existing data_sources for older clients doing unrelated edits.

Suggested fix
+def _dump_data_sources(data_sources):
+    if data_sources is None:
+        return None
+    return [
+        ds.model_dump(exclude_none=True) if hasattr(ds, "model_dump") else ds
+        for ds in data_sources
+    ]
...
-        template = await create_template(
+        create_data_sources = _dump_data_sources(template_data.data_sources)
+        template = await create_template(
             ...
-            data_sources=template_data.data_sources,
+            data_sources=create_data_sources,
             now=now,
         )
...
-        updated_template = await replace_template(
+        effective_data_sources = (
+            template_data.data_sources
+            if template_data.data_sources is not None
+            else existing_template.data_sources
+        )
+        updated_template = await replace_template(
             ...
-            data_sources=template_data.data_sources,
+            data_sources=_dump_data_sources(effective_data_sources),
             now=now,
         )

Also applies to: 526-540

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/api/routers/breeze_buddy/templates/handlers.py` around lines 130 - 145,
The `template_data.data_sources` parameter contains Pydantic models which cannot
be directly serialized by json.dumps during persistence. Normalize the
data_sources by converting each Pydantic model to a dictionary (using .dict() or
.model_dump() method) before passing it to the create_template function call.
Additionally, when the data_sources field is omitted or None in a PUT/replace
request, preserve the existing data_sources value from the current template
instead of passing None to create_template, to avoid unintentionally wiping out
the stored value for clients making unrelated edits. Apply these fixes to both
the POST handler (create_template call around lines 130-145) and the PUT handler
(update_template call around lines 526-540).
🧹 Nitpick comments (4)
app/services/google/sheets.py (1)

42-67: ⚡ Quick win

Add explicit helper return types and concrete dict typing.

Line 42 and Line 66 are missing return annotations, and Line 138 uses List[dict] instead of a concrete Dict[str, Any] shape. This weakens static checks on the service boundary.

Suggested patch
-from typing import List, Optional
+from typing import Any, Dict, List, Optional
...
-def _get_sheets_session():
+def _get_sheets_session() -> Optional[AuthorizedSession]:
...
-def _get_json(session: AuthorizedSession, url: str, params: Optional[dict] = None):
+def _get_json(
+    session: AuthorizedSession,
+    url: str,
+    params: Optional[Dict[str, str]] = None,
+) -> Dict[str, Any]:
...
-) -> List[dict]:
+) -> List[Dict[str, Any]]:

As per coding guidelines: “Add type hints on all function signatures” and “Use Optional[T], List[T], Dict[str, Any], Union for type hints.”

Also applies to: 133-139

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/services/google/sheets.py` around lines 42 - 67, Add explicit return type
annotations to the functions _get_sheets_session() and _get_json() to strengthen
static type checking. The _get_sheets_session() function should have a return
type of Optional[AuthorizedSession] since it returns either an AuthorizedSession
instance or None. The _get_json() function needs an appropriate return type
annotation based on what it returns. Additionally, update the imprecise typing
at lines 133-139 where List[dict] is used and replace it with List[Dict[str,
Any]] to provide concrete dictionary structure typing and align with the coding
guidelines for explicit type hints.

Source: Coding guidelines

app/api/routers/breeze_buddy/data_sources/__init__.py (1)

31-37: ⚡ Quick win

Add explicit return annotations on route handlers.

Line 31, Line 41, and Line 54 should declare concrete return types to keep API contracts type-checkable.

Suggested patch
 async def get_sheet_tabs(
     spreadsheet_url: str = Query(..., description="Full Google Sheets URL"),
     current_user: UserInfo = Depends(get_current_user_with_rbac),
-):
+) -> TabsResponse:
...
 async def get_sheet_columns(
...
-):
+) -> ColumnsResponse:
...
 async def preview_sheet(
...
-):
+) -> PreviewResponse:

As per coding guidelines: “Add type hints on all function signatures.”

Also applies to: 41-50, 54-65

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/api/routers/breeze_buddy/data_sources/__init__.py` around lines 31 - 37,
The route handlers get_sheet_tabs (lines 31-37), the function at lines 41-50,
and the function at lines 54-65 are missing explicit return type annotations on
their function signatures. Add a concrete return type annotation to each of
these async functions by specifying what the handler actually returns. For each
function, determine the appropriate return type based on what the handler
function (such as list_tabs_handler) returns and add it to the function
signature using the arrow syntax after the closing parenthesis.

Source: Coding guidelines

app/ai/voice/agents/breeze_buddy/agent/flow.py (1)

30-30: ⚡ Quick win

Use explicit dict key/value typing on ds_messages signatures.

Line 30 and Line 130 use List[Dict], which is too broad for this cross-module contract and reduces type-check value.

Proposed fix
 async def load_template_config(
     lead: LeadCallTracker,
-) -> tuple[TemplateModel, Optional[ConfigurationModel], Dict[str, str], List[Dict]]:
+) -> tuple[
+    TemplateModel,
+    Optional[ConfigurationModel],
+    Dict[str, str],
+    List[Dict[str, Any]],
+]:
@@
 def build_flow_config(
     flow_builder: FlowConfigBuilder,
     template: TemplateModel,
-    ds_messages: Optional[List[Dict]] = None,
+    ds_messages: Optional[List[Dict[str, Any]]] = None,
 ) -> tuple[Dict[str, Any], List, Any]:

As per coding guidelines, "Use Optional[T], List[T], Dict[str, Any], Union for type hints."

Also applies to: 130-130

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/ai/voice/agents/breeze_buddy/agent/flow.py` at line 30, Replace the
overly broad List[Dict] type hint with List[Dict[str, Any]] to provide explicit
key and value typing for the ds_messages parameter in the function signatures.
This needs to be updated in the return type annotation at line 30 and also at
line 130 where this same pattern appears, ensuring the cross-module contract has
proper type clarity as per coding guidelines.

Source: Coding guidelines

app/database/queries/breeze_buddy/template.py (1)

65-67: ⚡ Quick win

Add explicit timestamp type hints on new signature params.

Line 66, Line 67, and Line 345 introduce new defaulted parameters without type annotations, which weakens static validation for query contracts.

Proposed fix
+from datetime import datetime
 from typing import Any, Dict, List, Optional, Tuple
@@
 def create_template_query(
@@
-    created_at=None,
-    updated_at=None,
+    created_at: Optional[datetime] = None,
+    updated_at: Optional[datetime] = None,
 ) -> Tuple[str, List[Any]]:
@@
 def replace_template_query(
@@
-    updated_at=None,
+    updated_at: Optional[datetime] = None,
 ) -> Tuple[str, List[Any]]:

As per coding guidelines, "Add type hints on all function signatures."

Also applies to: 344-345

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/database/queries/breeze_buddy/template.py` around lines 65 - 67, Add
explicit type hints to the parameters introduced without type annotations. In
app/database/queries/breeze_buddy/template.py at the anchor site (lines 66-67),
add appropriate type annotations to the created_at and updated_at parameters
that currently only have default values of None. Similarly, apply the same type
hint fixes at the sibling site (lines 344-345) where the same parameters appear.
The parameters should follow the pattern of other timestamp parameters in the
codebase and use Optional typing since they default to None. This ensures
consistency with the coding guideline requiring type hints on all function
signatures and strengthens static validation for query contracts.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py`:
- Around line 51-53: The redis.setex call in the data_source_prefetch.py file is
missing the required namespace parameter, which violates the coding guidelines
and can cause key collisions across services sharing the same Redis instance.
Add the namespace parameter to the setex call alongside the existing cache_key,
content, and ttl_seconds parameters to ensure proper key isolation across
services.

In `@app/ai/voice/agents/breeze_buddy/template/loader.py`:
- Around line 317-333: Replace the direct redis.get() call with the namespaced
redis_get() helper that includes a namespace parameter to prevent key collisions
across services, passing cache_key and an appropriate namespace identifier.
Additionally, after the fetch_formatted() function successfully returns the
sheet content, immediately persist that content back to cache using the
corresponding namespaced redis_set() helper with the same namespace parameter
and cache_key to ensure the fetched data is available for subsequent cache hits.
- Around line 223-225: The issue is that malformed spreadsheet URLs can raise an
exception during the `spreadsheet_url.split("/d/")[1]` operation before the
protected fetch block, and this exception is currently skipped with `continue`
in the condition checking `isinstance(result, Exception)`. Instead of silently
skipping malformed URLs, wrap the URL parsing logic in a try-except block and
set the result to `DATA_SOURCE_UNAVAILABLE` when an exception occurs during the
split operation, ensuring that the fallback value is used rather than allowing
the variable to disappear entirely.
- Around line 303-304: The combined inline import statement `import hashlib,
json` in the `_fetch_one_data_source` function does not comply with isort
formatting standards. Split the combined import into separate import statements
on individual lines, so `hashlib` and `json` are each imported on their own
line. This will resolve the isort CI check failure.

In `@app/services/google/sheets.py`:
- Around line 75-90: The _fetch() function in the Sheets module obtains a
session from _get_sheets_session() but never closes it, causing resource leaks.
Ensure the session is properly closed after use by wrapping the session usage in
a try-finally block or using a context manager if available. Close the session
in the finally block or after the try block completes, ensuring cleanup happens
regardless of whether an exception occurs. This same fix needs to be applied to
all occurrences of _fetch() functions that follow this pattern (at the locations
mentioned in the review) to prevent socket and file descriptor leaks.

---

Outside diff comments:
In `@app/api/routers/breeze_buddy/templates/handlers.py`:
- Around line 130-145: The `template_data.data_sources` parameter contains
Pydantic models which cannot be directly serialized by json.dumps during
persistence. Normalize the data_sources by converting each Pydantic model to a
dictionary (using .dict() or .model_dump() method) before passing it to the
create_template function call. Additionally, when the data_sources field is
omitted or None in a PUT/replace request, preserve the existing data_sources
value from the current template instead of passing None to create_template, to
avoid unintentionally wiping out the stored value for clients making unrelated
edits. Apply these fixes to both the POST handler (create_template call around
lines 130-145) and the PUT handler (update_template call around lines 526-540).

---

Nitpick comments:
In `@app/ai/voice/agents/breeze_buddy/agent/flow.py`:
- Line 30: Replace the overly broad List[Dict] type hint with List[Dict[str,
Any]] to provide explicit key and value typing for the ds_messages parameter in
the function signatures. This needs to be updated in the return type annotation
at line 30 and also at line 130 where this same pattern appears, ensuring the
cross-module contract has proper type clarity as per coding guidelines.

In `@app/api/routers/breeze_buddy/data_sources/__init__.py`:
- Around line 31-37: The route handlers get_sheet_tabs (lines 31-37), the
function at lines 41-50, and the function at lines 54-65 are missing explicit
return type annotations on their function signatures. Add a concrete return type
annotation to each of these async functions by specifying what the handler
actually returns. For each function, determine the appropriate return type based
on what the handler function (such as list_tabs_handler) returns and add it to
the function signature using the arrow syntax after the closing parenthesis.

In `@app/database/queries/breeze_buddy/template.py`:
- Around line 65-67: Add explicit type hints to the parameters introduced
without type annotations. In app/database/queries/breeze_buddy/template.py at
the anchor site (lines 66-67), add appropriate type annotations to the
created_at and updated_at parameters that currently only have default values of
None. Similarly, apply the same type hint fixes at the sibling site (lines
344-345) where the same parameters appear. The parameters should follow the
pattern of other timestamp parameters in the codebase and use Optional typing
since they default to None. This ensures consistency with the coding guideline
requiring type hints on all function signatures and strengthens static
validation for query contracts.

In `@app/services/google/sheets.py`:
- Around line 42-67: Add explicit return type annotations to the functions
_get_sheets_session() and _get_json() to strengthen static type checking. The
_get_sheets_session() function should have a return type of
Optional[AuthorizedSession] since it returns either an AuthorizedSession
instance or None. The _get_json() function needs an appropriate return type
annotation based on what it returns. Additionally, update the imprecise typing
at lines 133-139 where List[dict] is used and replace it with List[Dict[str,
Any]] to provide concrete dictionary structure typing and align with the coding
guidelines for explicit type hints.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 92c67258-3944-477e-8919-32411565e2e8

📥 Commits

Reviewing files that changed from the base of the PR and between 81d67b9 and eabe99d.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • app/ai/voice/agents/breeze_buddy/agent/__init__.py
  • app/ai/voice/agents/breeze_buddy/agent/flow.py
  • app/ai/voice/agents/breeze_buddy/dispatch/worker.py
  • app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py
  • app/ai/voice/agents/breeze_buddy/template/loader.py
  • app/ai/voice/agents/breeze_buddy/template/types.py
  • app/api/routers/breeze_buddy/__init__.py
  • app/api/routers/breeze_buddy/data_sources/__init__.py
  • app/api/routers/breeze_buddy/data_sources/handlers.py
  • app/api/routers/breeze_buddy/templates/handlers.py
  • app/database/accessor/breeze_buddy/template.py
  • app/database/decoder/breeze_buddy/template.py
  • app/database/migrations/034_add_data_sources_to_template.sql
  • app/database/queries/breeze_buddy/template.py
  • app/schemas/breeze_buddy/data_source.py
  • app/services/data_sources.py
  • app/services/google/__init__.py
  • app/services/google/sheets.py
  • pyproject.toml

Comment on lines +51 to +53
redis = await get_redis_service()
await redis.setex(cache_key, content, ttl_seconds=_CACHE_TTL)
logger.info(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add Redis namespace on cache write.

Line 52 writes with setex(...) but omits namespace, which can collide keys across services sharing Redis.

Suggested patch
-        await redis.setex(cache_key, content, ttl_seconds=_CACHE_TTL)
+        await redis.setex(
+            cache_key,
+            content,
+            ttl_seconds=_CACHE_TTL,
+            namespace="breeze_buddy",
+        )

As per coding guidelines: “Always use namespace parameter in redis_get/redis_set calls to prevent key collisions across services.”

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
redis = await get_redis_service()
await redis.setex(cache_key, content, ttl_seconds=_CACHE_TTL)
logger.info(
redis = await get_redis_service()
await redis.setex(
cache_key,
content,
ttl_seconds=_CACHE_TTL,
namespace="breeze_buddy",
)
logger.info(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/ai/voice/agents/breeze_buddy/managers/data_source_prefetch.py` around
lines 51 - 53, The redis.setex call in the data_source_prefetch.py file is
missing the required namespace parameter, which violates the coding guidelines
and can cause key collisions across services sharing the same Redis instance.
Add the namespace parameter to the setex call alongside the existing cache_key,
content, and ttl_seconds parameters to ensure proper key isolation across
services.

Source: Coding guidelines

Comment on lines +223 to +225
for ref, result in zip(template_obj.data_sources, contents):
if isinstance(result, Exception) or result is None:
continue

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Malformed spreadsheet URLs can silently skip injection instead of falling back.

spreadsheet_url.split("/d/")[1] can raise before the protected fetch block. In load_template, that exception is currently skipped (continue), so the variable/message may disappear entirely rather than using DATA_SOURCE_UNAVAILABLE.

Suggested fix
-        spreadsheet_id = ref.spreadsheet_url.split("/d/")[1].split("/")[0]
+        try:
+            spreadsheet_id = ref.spreadsheet_url.split("/d/")[1].split("/")[0]
+        except Exception:
+            return DATA_SOURCE_UNAVAILABLE
...
-                if isinstance(result, Exception) or result is None:
-                    continue
+                if isinstance(result, Exception) or result is None:
+                    result = DATA_SOURCE_UNAVAILABLE

Also applies to: 308-309

🧰 Tools
🪛 Ruff (0.15.15)

[warning] 223-223: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/ai/voice/agents/breeze_buddy/template/loader.py` around lines 223 - 225,
The issue is that malformed spreadsheet URLs can raise an exception during the
`spreadsheet_url.split("/d/")[1]` operation before the protected fetch block,
and this exception is currently skipped with `continue` in the condition
checking `isinstance(result, Exception)`. Instead of silently skipping malformed
URLs, wrap the URL parsing logic in a try-except block and set the result to
`DATA_SOURCE_UNAVAILABLE` when an exception occurs during the split operation,
ensuring that the fallback value is used rather than allowing the variable to
disappear entirely.

Comment on lines +303 to +304
import hashlib, json

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix CI-blocking isort import formatting in _fetch_one_data_source.

The combined inline import fails current isort checks.

Suggested fix
-        import hashlib, json
+        import hashlib
+        import json
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/ai/voice/agents/breeze_buddy/template/loader.py` around lines 303 - 304,
The combined inline import statement `import hashlib, json` in the
`_fetch_one_data_source` function does not comply with isort formatting
standards. Split the combined import into separate import statements on
individual lines, so `hashlib` and `json` are each imported on their own line.
This will resolve the isort CI check failure.

Source: Pipeline failures

Comment on lines +317 to +333
redis = await get_redis_service()
cached = await redis.get(cache_key)
if cached:
return cached
except Exception:
pass

try:
return await asyncio.wait_for(
fetch_formatted(
spreadsheet_id=spreadsheet_id,
sheet_name=ref.sheet_name,
columns=ref.columns,
format=ref.format.value,
),
timeout=0.8,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use namespaced Redis helpers and persist fetched content on cache miss.

The current path reads Redis directly and never writes the freshly fetched sheet content back, so runtime caching can be bypassed and keys can collide across services.

As per coding guidelines: "Always use namespace parameter in redis_get/redis_set calls to prevent key collisions across services."

🧰 Tools
🪛 Ruff (0.15.15)

[error] 321-322: try-except-pass detected, consider logging the exception

(S110)


[warning] 321-321: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/ai/voice/agents/breeze_buddy/template/loader.py` around lines 317 - 333,
Replace the direct redis.get() call with the namespaced redis_get() helper that
includes a namespace parameter to prevent key collisions across services,
passing cache_key and an appropriate namespace identifier. Additionally, after
the fetch_formatted() function successfully returns the sheet content,
immediately persist that content back to cache using the corresponding
namespaced redis_set() helper with the same namespace parameter and cache_key to
ensure the fetched data is available for subsequent cache hits.

Source: Coding guidelines

Comment on lines +75 to +90
def _fetch():
session = _get_sheets_session()
if not session:
return []
try:
result = _get_json(
session,
f"https://sheets.googleapis.com/v4/spreadsheets/{spreadsheet_id}",
params={"fields": "sheets.properties.title"},
)
return [sheet["properties"]["title"] for sheet in result.get("sheets", [])]
except Exception as e:
logger.error(f"Error listing tabs for {spreadsheet_id}: {e}")
return []

return await asyncio.get_running_loop().run_in_executor(None, _fetch)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Close per-request Sheets sessions after use.

On Line 76, Line 100, and Line 142, each _fetch() allocates an authenticated HTTP session but never closes it. Over time this can leak sockets/file descriptors and degrade worker availability.

Suggested patch
     def _fetch():
         session = _get_sheets_session()
         if not session:
             return []
         try:
             ...
         except Exception as e:
             logger.error(...)
             return []
+        finally:
+            session.close()

Also applies to: 99-130, 141-185

🧰 Tools
🪛 Ruff (0.15.15)

[warning] 86-86: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/services/google/sheets.py` around lines 75 - 90, The _fetch() function in
the Sheets module obtains a session from _get_sheets_session() but never closes
it, causing resource leaks. Ensure the session is properly closed after use by
wrapping the session usage in a try-finally block or using a context manager if
available. Close the session in the finally block or after the try block
completes, ensuring cleanup happens regardless of whether an exception occurs.
This same fix needs to be applied to all occurrences of _fetch() functions that
follow this pattern (at the locations mentioned in the review) to prevent socket
and file descriptor leaks.

@cmd-err cmd-err force-pushed the feat/data-sources-inline branch 4 times, most recently from 974542d to 8ca5cd3 Compare June 15, 2026 11:36
is_active: bool = True
data_sources: Optional[List[DataSourceRef]] = Field(
default=None,
description="Inline data source configs attached to this template",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟥 [BLOCKING — test/CI regression] New data_sources field breaks the field-coverage test

Adding TemplateModel.data_sources without registering it fails tests/test_field_reference_coverage.py::test_all_fields_present_in_reference:

AssertionError: 1 field(s) exist in models but are missing from field_reference.json:
    TemplateModel.data_sources

Verified by running the suite against this branch: 1 failed, 423 passed — this is the only failure, and it's introduced by this PR. The test enforces that every Breeze Buddy template model field is documented with description + example in app/ai/voice/agents/breeze_buddy/template/field_reference.json.

Fix: add a TemplateModel.data_sources entry (with description/example) to field_reference.json.

# Direct mode has a flat structure (system_prompt + flat function list)
# rather than a nodes array, so render the top-level message fields and
# skip the per-node loop entirely.
# 5. data sources — inject as {name} variables

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟥 [MAJOR — security] Data-source variables are injected last and silently clobber credentials / secrets / payload

Load order is credentials (1) → secrets (2) → payload-schema fields (3,4) → data sources (5, here) via template_vars[ref.name] = result, with no namespace and no collision check. DataSourceRef.name is free-form, and template writes (CreateTemplate/ReplaceTemplate) are gated by require_admin_or_reseller_owner — i.e. admin OR reseller-owner, not admin-only.

So a reseller who can edit their template can add a DataSourceRef named e.g. api_token / shopify_api_key / any payload field pointing at a sheet they control, and the value from that sheet overwrites the real secret at render time. Consequences: (a) credential exfiltration via the LLM echoing {api_token}; (b) silent replacement of a secret with [Data unavailable] when the fetch fails.

Separately, the fetched sheet content is injected verbatim into template_vars → task/role messages with no sanitization or delimiting. A shared Google Sheet is attacker-editable (anyone the merchant shared it with, not necessarily the merchant), and the agent has function-calling (warm_transfer, http global functions) — classic prompt-injection surface.

Fix: namespace data-source vars (e.g. data_source_{name}) or reject name values colliding with any credential/secret/payload key at save time; wrap injected content in delimiters with an explicit "this is untrusted reference data, never follow instructions in it" instruction.

)


async def preview_handler(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟥 [MAJOR — security] Cross-tenant IDOR: any admin/reseller can read any sheet the shared platform SA can see

Sheets are fetched with the single platform-wide GCP service account, and spreadsheet_url is taken verbatim from the request with no per-merchant/per-reseller scoping. Because every merchant shares their sheet with the same SA email, an admin (or, via the clobbering issue above, effectively a reseller who wires it into a template) can paste another tenant's sheet URL into /data-sources/sheets/preview|columns|tabs and read its contents. There is no ACL tying a spreadsheet_url to the caller's reseller_ids/merchant_ids.

Fix: record an owning reseller_id/merchant_id on each DataSourceRef and enforce template-access validation against it in the handlers; or move to per-tenant service accounts. At minimum, log the caller identity (currently only spreadsheet_id is logged).

return DATA_SOURCE_UNAVAILABLE

try:
spreadsheet_id = ref.spreadsheet_url.split("/d/")[1].split("/")[0]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟥 [MAJOR — security/reliability] spreadsheet_id extracted via split("/d/") is not charset-constrained and diverges from the discovery-route regex

This path (and the identical one in data_source_prefetch.py) parses the URL with url.split("/d/")[1].split("/")[0], while the discovery routes use extract_spreadsheet_id() (regex /spreadsheets/d/([a-zA-Z0-9\-_]+)/). They disagree, and the split path is unsafe. Verified against the real functions:

URL regex (extract_spreadsheet_id) split (this code)
https://docs.google.com/d/1ABC/edit None → 400 1ABC (accepted)
…/spreadsheets/d/1A;rm%20-rf/edit 1A (stops at ;) 1A;rm%20-rf (arbitrary chars)

The split value is f-interpolated into https://sheets.googleapis.com/v4/spreadsheets/{spreadsheet_id}/… and used as a Redis key. So a URL the discovery routes reject is silently accepted at call time, and the id can contain ;, spaces, etc. — an injection sink into the googleapis resource path. The host is fixed (no arbitrary-host SSRF), but it can retarget the resource path / inject ?/#.

Fix: use extract_spreadsheet_id(url) everywhere and reject None; the same hardening should validate the id against the regex before any URL build in sheets.py.

columns=ref.columns,
format=ref.format.value,
),
timeout=0.8,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟥 [MAJOR — reliability] wait_for(..., timeout=0.8) cannot cancel the blocking executor call; thread + session leak, and 0.8s is unrealistic on a cache miss

fetch_formattedfetch_sheet_data runs its blocking HTTP via run_in_executor(None, _fetch). When wait_for times out it cancels the awaiting coroutine, but a run_in_executor task cannot be cancelled — the thread keeps running the Google HTTP call (up to the 10s socket timeout) and AuthorizedSession.close() in the _fetch finally never fires until the thread finishes, so sockets linger. Under repeated timeouts (very likely — see below) on the shared default executor, this competes with and can starve other run_in_executor users.

Also: 0.8s is far below realistic latency for a cache miss. The cold path does token exchange (often 200–500ms) plus the values request, and two round-trips when sheet_name is None (metadata then values). The prefetch uses 5.0s; the loader using 0.8s means the data source will silently fall back to [Data unavailable] for most cold fetches — the whole feature is effectively dependent on the fire-and-forget prefetch (see the GC comment on worker.py) having already succeeded.

Fix: raise to ~3–5s (match prefetch) or drop the per-call timeout in favor of a shared deadline; move _fetch to a bounded dedicated ThreadPoolExecutor; consider a single get/batchGet for metadata+values.

)
cache_key = _cache_key(ref)
redis = await get_redis_service()
await redis.setex(cache_key, content, ttl_seconds=_CACHE_TTL)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟥 [MAJOR — reliability] The error sentinel [Data unavailable] is written to Redis with a 60s TTL

fetch_formatted returns DATA_SOURCE_UNAVAILABLE on any error or empty sheet, and _prefetch_one stores it unconditionally here. The loader's own write path guards with if content != DATA_SOURCE_UNAVAILABLE — but the prefetch path does not. So a single transient blip at dispatch time (Google 5xx, a slow token exchange, a 0.8s timeout upstream) poisons the cache for the full 60s, and every lead dialed in that window renders [Data unavailable] into its prompt even if the sheet recovers seconds later.

Fix: skip setex when content == DATA_SOURCE_UNAVAILABLE; optionally cache a short (~5s) negative marker to avoid hammering Google, but never the 60s positive TTL.

if template:
template = apply_playground_overrides(locked, template)
if template.data_sources:
asyncio.create_task(prefetch_data_sources(template=template))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟧 [MAJOR — reliability] Fire-and-forget task with no reference held → silent GC

asyncio.create_task(...) returns a Task that is assigned to nothing. CPython's asyncio keeps only a weak reference to tasks, so the GC may collect it mid-flight (especially under memory pressure), emitting RuntimeWarning: coroutine 'prefetch_data_sources' was never awaited and silently aborting the prefetch. I grepped — no reference is retained anywhere. This is the entire point of the dispatch-time warming; losing it silently degrades to the 0.8s loader-time fetch (which itself usually times out). The two bugs compound: prefetch silently dropped → loader 0.8s timeout → [Data unavailable].

Fix: retain a reference, e.g. a module-level set with a done_callback that discards, or store on the Worker instance (self._prefetch_tasks.add(task); task.add_done_callback(self._prefetch_tasks.discard)).


return records
except Exception as e:
logger.error(f"Error fetching data for {spreadsheet_id}/{sheet_name}: {e}")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟥 [MAJOR — operability] Empty sheet vs. every error class are indistinguishable

fetch_sheet_data catches all exceptions and returns []: 403 (sheet not shared with the SA), 404 (deleted), network error, JSON error — and a legitimately empty sheet all collapse to [], then fetch_formatted maps that to [Data unavailable]. The merchant can't tell "I forgot to share the sheet with the SA" from "the sheet is empty" from "Google is down", and only a generic exception string is logged (no HTTP status). The sharing step is manual, so 403 will be the single most common real-world failure — and it presents as an opaque "Data unavailable" with no actionable signal.

Fix: raise/return distinct signals for auth (403), not-found (404), and genuinely-empty; surface a structured, status-coded reason in logs; for the 403-not-shared case consider a merchant-facing hint.

Comment thread app/services/google/sheets.py Outdated
return match.group(1) if match else None


def _get_sheets_session():

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟧 [MAJOR — performance/quota] A new AuthorizedSession (+ JSON credential parse + token exchange) is built on every fetch, with no token caching

Every cache miss does json.loads(GOOGLE_CREDENTIALS_JSON) + service_account.Credentials.from_service_account_info(...) + constructs a fresh AuthorizedSession, then .close()s it in finally. AuthorizedSession lazily mints an OAuth2 token on first request, so each cold fetch = (token round-trip + values round-trip), and back-to-back fetches re-mint tokens. Under the unbounded asyncio.gather concurrency (no Semaphore), many threads exchange tokens simultaneously — wasteful and capable of tripping the SA's per-minute auth quota.

Fix: build one module-level AuthorizedSession lazily and reuse it — google.auth refresh is thread-safe and the session caches the token until expiry.

return await asyncio.get_running_loop().run_in_executor(None, _fetch)


def _rows_to_markdown_table(headers: List[str], rows: List[dict]) -> str:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟧 [MAJOR — correctness] Markdown table cells/headers aren't escaped — a | or newline produces a malformed table

Verified against the real _rows_to_markdown_table: a cell value pick A | B renders as | pick A | B | yes | — a phantom third column — and a newline in a cell splits the row across lines. The LLM sees a broken table. (The CSV path handles this correctly via csv.writer.)

| q | a |
| --- | --- |
| pick A | B | yes |     <- one cell became two columns

Fix: escape |\| and \n → space (or <br>) in cell/header values in _rows_to_markdown_table.

ALTER TABLE template ADD COLUMN data_sources JSONB DEFAULT NULL;

COMMENT ON COLUMN template.data_sources IS
'Array of inline DataSourceRef: [{name, inject_as, spreadsheet_url, sheet_name?, columns?, format?, is_active}]';

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟧 [MAJOR — docs/contract] Migration comment references a non-existent inject_as field; PR description advertises features that aren't implemented

The column COMMENT documents [{name, inject_as, spreadsheet_url, ...}], but DataSourceRef has no inject_as field (fields are name, spreadsheet_url, sheet_name, columns, format, is_active). grep confirms inject_as appears only in this SQL comment.

This matches the PR description, which claims:

  • inject_as="message" → prepended system message
  • Modified: flow.py — _ds_messages propagation
  • "_ds_messages returned separately from load_template() — survives playground overrides"

None of that exists in the diff. The actual implementation only ever injects as template_vars[{name}] (the inject_as="var" path), flow.py's change is only deleted comments/docstrings, and there is no _ds_messages anywhere. So the shipped behavior is a subset of what's described.

Fix: either implement the message injection mode + _ds_messages, or correct the PR description and drop inject_as from this COMMENT so the persisted schema docs match the model.

@narsimhaReddyJuspay

Copy link
Copy Markdown
Contributor

PR #830 review — inline Google Sheets data sources

Verdict: Request changes. The design (inline data_sources JSONB, content-hash Redis cache, dispatch-time prefetch) is reasonable and the SQL layer is clean, but there is 1 CI-blocking test regression and several major reliability + security issues I'd want addressed before merge. 11 inline comments posted; summary + verified runtime results below.


✅ Verified by actually running it

  • Server boot: started FastAPI on 127.0.0.1:8137 (free port) with the merged env. Lifespan completed — Postgres pool + Redis connected, background scheduler (chat cleanup, backlog reconcile, stuck-lead reap, dispatch monitor) all running against live infra.
  • Health: /healthhealthy; /health/databaseconnected.
  • New routes wired + protected: /data-sources/sheets/{tabs,columns,preview} all return 403 without authget_current_user_with_rbac + require_admin enforced. All 3 registered in /openapi.json.
  • Unit suite: 423 passed, 1 failed, 1 xfailed. The 1 failure is introduced by this PR — see inline comment on types.py (missing TemplateModel.data_sources in field_reference.json).
  • CI gates: black --check ✅ · isort --check ✅ · autofly/autoflake ✅ · pyrefly check0 errors (CI mode, **/automatic/** excluded) · 0 errors on the new files.

🟥 Critical / blocking

  1. Field-coverage test regression (TemplateModel.data_sources undocumented) — inline on types.py.

🟥 Major — security

  1. Data-source vars clobber credentials/secrets/payload (injected last, no collision check; reseller-writable) + unsanitized sheet content → prompt injection — inline on loader.py:176.
  2. Cross-tenant IDOR via the shared platform SA (no per-tenant scoping of spreadsheet_url) — inline on handlers.py.
  3. Unsafe spreadsheet_id extraction (split accepts ;/spaces; diverges from the regex used by discovery) — inline on loader.py:237. Demonstrated against the real functions.

🟥 Major — reliability

  1. wait_for(timeout=0.8) can't cancel the executor thread → thread/session leak; 0.8s is far below cold Google API latency — inline on loader.py:269.
  2. Error sentinel [Data unavailable] cached for 60s (prefetch writes it unconditionally) — inline on data_source_prefetch.py:52.
  3. Fire-and-forget create_task with no reference → silent GC — inline on worker.py:346.

🟧 Major — operability / perf / correctness

  1. Empty sheet vs. every error (403/404/network) indistinguishable → all become [Data unavailable] — inline on sheets.py:186.
  2. Per-call credential parse + new AuthorizedSession, no token caching — inline on sheets.py:42.
  3. Markdown table doesn't escape |/newlines (phantom column; demonstrated) — inline on sheets.py:194.
  4. Stale inject_as in migration COMMENT + PR description advertises inject_as="message" / _ds_messages that aren't implemented — inline on migration 034.

🟨 Minor (not posted inline)

  • No tests added for the new feature — cache-key construction, the 3 format converters, extract_spreadsheet_id, and the empty/error→sentinel paths are all unit-testable and currently uncovered. Given the bugs above, coverage here would help.
  • Unbounded asyncio.gather over data sources (no Semaphore) — N sheets = N simultaneous executor threads + Google calls. Bound it.
  • max_rows omitted from the cache key — currently safe only because both paths use the default 500; fragile. Also the cache-key logic is duplicated between data_source_prefetch.py and loader.py (extract to one shared helper, include max_rows).
  • get_templates_list_query / get_all_templates_by_outbound_number_id_query now SELECT data_sources but neither list consumer reads it — wasted JSONB I/O on list/IVR-selection paths (which are hot). Drop it from those SELECTs.
  • Discovery router mounted under the # Public demo (unauthenticated) section comment — cosmetic, auth is enforced per-route, but move/relabel to avoid confusion.
  • A requested column not present in the sheet is silently filled with "" — log a warning so merchants notice typos.

👍 What looks good

  • Query placeholder/column counts are exactly right across all 7 query functions (no off-by-one), ::jsonb casts present, RETURNING clauses updated.
  • require_admin raises 403 (verified); redis get/setex signatures match the call sites exactly (verified).
  • GOOGLE_CREDENTIALS_JSON defaults to "" and is handled gracefully (won't crash boot); credentials are never logged.
  • decode_template JSONB handling is consistent with the existing flow/secrets columns; DEFAULT NULL migration semantics are correct for opt-in.
  • Read-only Sheets scope; URL interpolation is host-locked to sheets.googleapis.com.

Suggested pre-merge order

  1. Fix the field_reference.json test regression (unblocks CI). 2. Don't cache the error sentinel (reliability). 3. Retain the prefetch task ref + raise the loader timeout (makes the feature actually work). 4. Namespace data-source vars / block secret collisions + delimit injected content (security). 5. Route URL parsing through extract_spreadsheet_id everywhere. 6. Per-tenant scoping decision on discovery. Then minors + tests.

Review generated by checking out the PR, running the suite + type/format gates, and booting the server against live Postgres/Redis.

@cmd-err cmd-err force-pushed the feat/data-sources-inline branch 4 times, most recently from 55301fe to 46ebb66 Compare June 16, 2026 02:18
@cmd-err cmd-err force-pushed the feat/data-sources-inline branch from 46ebb66 to e60b854 Compare June 16, 2026 10:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants