Skip to content

feat(slackbot): persist thread history in a WritableFileSystem block#1342

Merged
zzstoatzz merged 1 commit into
mainfrom
feat/persist-thread-messages
May 12, 2026
Merged

feat(slackbot): persist thread history in a WritableFileSystem block#1342
zzstoatzz merged 1 commit into
mainfrom
feat/persist-thread-messages

Conversation

@zzstoatzz

Copy link
Copy Markdown
Collaborator

Summary

  • Replace the local SQLite slack_thread_messages table with a per-thread JSON archive stored in a Prefect WritableFileSystem block, configured via MARVIN_SLACKBOT_MESSAGE_STORE_BLOCK.
  • Defaults to a LocalFileSystem rooted at MARVIN_SLACKBOT_MESSAGE_STORE_LOCAL_DIR (./marvin_chat_storage by default) so local dev keeps working unchanged.
  • Thread message history now survives Cloud Run revision deploys and is readable from anywhere with workspace access (no more shelling into the container to inspect a thread).

Why

The SQLite file lived at marvin_chat.sqlite on the container's working directory — Cloud Run ephemeral disk. Every redeploy wiped it. There was also no programmatic path to retrieve what marvin said in a thread short of opening Slack or hitting the container's disk directly. Pointing the store at a workspace-owned bucket block fixes both at once.

Design

  • _internal/message_store.py wraps a WritableFileSystem block. One JSON object per thread, keyed threads/{thread_ts}.json, containing the full ModelMessage[] (the same shape we already write today, just keyed and overwritten instead of append-rows).
  • Writes do read-modify-write under a per-thread asyncio.Lock so two overlapping turns in the same thread don't clobber each other. The bot is single-instance on Cloud Run today; if we move to multi-replica we'll need a distributed primitive (CAS or external lock) — flagged in the module docstring.
  • The Database class is preserved for the ephemeral slack_thread_status table (cross-process dedup state where losing it on redeploy is fine).

Deploy steps (per environment)

This PR is safe to merge to main and deploy to stg before prd:

  1. Install a GCS backend in the runtime image. Either:

    • prefect-gcp (for a gcs-bucket/... block that references our existing gcp-credentials block), or
    • gcsfs (for a remote-file-system/... block with a gs:// basepath)

    Not added to slackbot pyproject in this PR because both transitively trip a pydantic-ai/mistralai resolution conflict on this workspace. Suggest adding via the Dockerfile or as a build-time install instead.

  2. Create the storage block in the workspace (e.g. gcs-bucket/marvin-chat-history).

  3. Set MARVIN_SLACKBOT_MESSAGE_STORE_BLOCK=<slug> on the Cloud Run service.

Without those steps the bot still runs — it falls back to LocalFileSystem (which on Cloud Run means ephemeral disk again, i.e. the current behavior). So a no-op rollout is also safe.

Test plan

  • Lint + format pass (ruff check, ruff format --check).
  • Smoke test against LocalFileSystem: empty thread returns [], single-turn round-trips with message types preserved, multi-turn accumulates, concurrent appends to the same thread serialize cleanly via the per-thread lock, separate threads are isolated. (Local script, not checked in.)
  • slackbot.api imports clean after the changes.
  • Deploy to stg, send a few mentions in a test thread, confirm history persists across a redeploy.
  • Verify gs://<bucket>/threads/<thread_ts>.json is readable from outside the bot.

🤖 Generated with Claude Code

…m block

Replaces the local SQLite slack_thread_messages table with a per-thread
JSON archive stored in a Prefect WritableFileSystem block. Configured via
MARVIN_SLACKBOT_MESSAGE_STORE_BLOCK; defaults to a LocalFileSystem for dev.

Why: the SQLite file lived on Cloud Run's ephemeral disk, so every
revision deploy wiped thread history (and ad-hoc "what did marvin say in
thread X?" required shelling into a running container). Pointing the
store at a workspace-owned bucket block survives redeploys and makes
transcripts reachable via the Prefect API.

Notes:
- thread_status SQLite table is unchanged; that's transient dedup state
  and dying with the container is fine.
- per-thread asyncio.Lock guards the read-modify-write. Single-instance
  on Cloud Run today; if we go multi-replica, this needs to become a
  distributed primitive (CAS or external lock).
- no GCS-backend dep added to slackbot pyproject in this PR — adding
  prefect-gcp/gcsfs triggers a transitive pydantic-ai resolution
  conflict. The right backend will be installed alongside block creation
  at deploy time.

Deploy steps (per environment):
1. Install backend (e.g. `prefect-gcp` or `gcsfs`) in the runtime image
2. Create a workspace block: `gcs-bucket/marvin-chat-history` (or
   `remote-file-system/...` with a `gs://` basepath)
3. Set MARVIN_SLACKBOT_MESSAGE_STORE_BLOCK=<slug> on the service

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 12, 2026 04:47
@zzstoatzz zzstoatzz merged commit 8770bd7 into main May 12, 2026
5 checks passed
@zzstoatzz zzstoatzz deleted the feat/persist-thread-messages branch May 12, 2026 04:50

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Slackbot example to persist per-thread conversation history in a Prefect WritableFileSystem block (with a LocalFileSystem fallback) instead of storing messages in a local SQLite table, improving durability across Cloud Run deploys.

Changes:

  • Added a MessageStore abstraction backed by WritableFileSystem, storing one JSON archive per Slack thread.
  • Updated the Slack message handling flow to read/append thread history via MessageStore instead of SQLite.
  • Narrowed SQLite usage to the existing ephemeral slack_thread_status dedup table and added settings to configure the message store block/local fallback dir.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
examples/slackbot/src/slackbot/settings.py Adds configuration for a WritableFileSystem-backed message store and local fallback directory.
examples/slackbot/src/slackbot/core.py Removes the SQLite message-history table logic; keeps SQLite connection for thread-status dedup state.
examples/slackbot/src/slackbot/api.py Wires MessageStore into app lifespan and uses it for thread history get/append during message handling.
examples/slackbot/src/slackbot/_internal/message_store.py Introduces the per-thread JSON message store with per-thread asyncio.Lock-guarded read/modify/write.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +54 to +56
def _thread_key(thread_ts: str) -> str:
return f"threads/{thread_ts}.json"

zzstoatzz added a commit that referenced this pull request May 12, 2026
…kerfile

Four Copilot findings, all legit:

1. **Path traversal via `thread_ts`** — `/chat` doesn't verify Slack
   signatures, so an attacker can supply `thread_ts="../etc/passwd"` and
   influence the filesystem path. Validate strictly against the real
   Slack format (`^\d+\.\d+$`) before using it as a key.

2. **Broad `ValueError` catch in `MessageStore.get`** — was swallowing
   any ValueError as "missing file" and could mask real read errors
   ("path is not a file", etc.). Narrow to the LocalFileSystem
   "does not exist" message; re-raise anything else.

3. **Unbounded per-thread `_locks` dict** — defaultdict grew monotonically
   with every new thread_ts and was never cleaned. Replaced with a
   single store-wide write lock; bot QPS is low enough that the
   serialization is free, and the leak goes away entirely.

4. **`uv pip install` could drift from `uv.lock`** — observed locally
   that the previous form upgraded `prefect` from 3.6.12 to 3.7.0
   despite the version range on prefect-gcp. Pin `prefect==3.6.12` in
   the same install command so the resolver can't move it.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
zzstoatzz added a commit that referenced this pull request May 12, 2026
…ing (#1343)

* feat(slackbot): persist thread message history in a WritableFileSystem block

Replaces the local SQLite slack_thread_messages table with a per-thread
JSON archive stored in a Prefect WritableFileSystem block. Configured via
MARVIN_SLACKBOT_MESSAGE_STORE_BLOCK; defaults to a LocalFileSystem for dev.

Why: the SQLite file lived on Cloud Run's ephemeral disk, so every
revision deploy wiped thread history (and ad-hoc "what did marvin say in
thread X?" required shelling into a running container). Pointing the
store at a workspace-owned bucket block survives redeploys and makes
transcripts reachable via the Prefect API.

Notes:
- thread_status SQLite table is unchanged; that's transient dedup state
  and dying with the container is fine.
- per-thread asyncio.Lock guards the read-modify-write. Single-instance
  on Cloud Run today; if we go multi-replica, this needs to become a
  distributed primitive (CAS or external lock).
- no GCS-backend dep added to slackbot pyproject in this PR — adding
  prefect-gcp/gcsfs triggers a transitive pydantic-ai resolution
  conflict. The right backend will be installed alongside block creation
  at deploy time.

Deploy steps (per environment):
1. Install backend (e.g. `prefect-gcp` or `gcsfs`) in the runtime image
2. Create a workspace block: `gcs-bucket/marvin-chat-history` (or
   `remote-file-system/...` with a `gs://` basepath)
3. Set MARVIN_SLACKBOT_MESSAGE_STORE_BLOCK=<slug> on the service

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

* fix(slackbot): install prefect-gcp in Docker image for GcsBucket block loading

Follow-up to #1342. The MessageStore is wired to load a workspace block
slug like `gcs-bucket/marvin-chat-history` via `Block.aload`, which
requires `prefect_gcp.cloud_storage.GcsBucket` to be importable at
runtime. Adding `prefect[gcp]` (or `prefect-gcp` or `gcsfs`) to the
slackbot pyproject re-resolves the workspace lock and hits a hard wall:
`pydantic-ai-slim`'s `[mistral]` extra requires `mistralai>=1.9.10`, but
mistralai is quarantined on PyPI right now, so any re-resolution fails.

Install the dep in the Dockerfile after `uv sync --frozen` instead — it
lands in the same venv but doesn't go through uv's universal resolver.
Pin `<0.6.17` because 0.6.17+ require `prefect>=3.6.17` / `>=3.7.0` and
the workspace is on 3.6.12; the version of `command_from_string` they
reach for doesn't exist in 3.6.12 so the import would break.

Verified locally: `uv pip install "prefect-gcp>=0.6.10,<0.6.17"` into the
synced venv, `from prefect_gcp.cloud_storage import GcsBucket` succeeds,
`slackbot.api` imports cleanly, MessageStore smoke test still passes.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

* fix(slackbot): address Copilot review on #1342 and pin prefect in Dockerfile

Four Copilot findings, all legit:

1. **Path traversal via `thread_ts`** — `/chat` doesn't verify Slack
   signatures, so an attacker can supply `thread_ts="../etc/passwd"` and
   influence the filesystem path. Validate strictly against the real
   Slack format (`^\d+\.\d+$`) before using it as a key.

2. **Broad `ValueError` catch in `MessageStore.get`** — was swallowing
   any ValueError as "missing file" and could mask real read errors
   ("path is not a file", etc.). Narrow to the LocalFileSystem
   "does not exist" message; re-raise anything else.

3. **Unbounded per-thread `_locks` dict** — defaultdict grew monotonically
   with every new thread_ts and was never cleaned. Replaced with a
   single store-wide write lock; bot QPS is low enough that the
   serialization is free, and the leak goes away entirely.

4. **`uv pip install` could drift from `uv.lock`** — observed locally
   that the previous form upgraded `prefect` from 3.6.12 to 3.7.0
   despite the version range on prefect-gcp. Pin `prefect==3.6.12` in
   the same install command so the resolver can't move it.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants