Skip to content

feat(media): pluggable storage backend + S3 implementation#1166

Draft
ilyaseverin wants to merge 3 commits into
nextlevelbuilder:mainfrom
injectinglabs:feat/media-s3-backend
Draft

feat(media): pluggable storage backend + S3 implementation#1166
ilyaseverin wants to merge 3 commits into
nextlevelbuilder:mainfrom
injectinglabs:feat/media-s3-backend

Conversation

@ilyaseverin

Copy link
Copy Markdown

Summary

Adds a media.Backend interface and an S3Backend implementation so goclaw can run on ephemeral hosts without losing session media on restart. The existing filesystem behaviour is preserved exactly — GOCLAW_MEDIA_BACKEND defaults to fs and every current call site keeps working unchanged.

Three commits, intentionally separable for review:

  1. refactor(media): introduce Backend interface, keep Store as façade — splits internal/media/store.go into backend.go (interface), fs_backend.go (the historical implementation, unchanged), and a thin store.go façade. No behaviour change.
  2. feat(media): add S3Backend + env-driven factoryS3Backend persists objects under {prefix}/{sessionHash}/{id}.{ext} with a manifest sidecar at {prefix}/_manifests/{id} so LocalPath/Open resolve a bare ID in two GETs. Credentials come from the default SDK chain only (IAM role / env), no static-key field. LocalPath lazily caches into os.TempDir()/goclaw-media-cache with atomic rename. Delete paginates the session prefix and DeleteObjects in batches of 1000.
  3. feat(media): wire env-driven backend selection + FSBackend testscmd/gateway_managed.go now goes through media.LoadConfigFromEnv / NewStoreFromConfig. Adds round-trip / mime-ext / not-found tests for the FS backend.

Why

Today media.Store is concrete and FS-only. Deployments on Fargate / K8s without persistent volumes / EC2 ASGs that rotate instances drop their session media on every restart. The workaround — pointing baseDir at EFS/EBS — is heavier than handing the same problem to S3 with a 7-day lifecycle rule.

Secondary win: the concrete *media.Store type made it impossible to substitute a fake in tests; new code can now depend on media.Backend directly.

Env vars

Var Default Purpose
GOCLAW_MEDIA_BACKEND fs fs or s3
GOCLAW_MEDIA_BASEDIR {workspace}/.media FS only
GOCLAW_MEDIA_S3_BUCKET required when backend=s3
GOCLAW_MEDIA_S3_PREFIX empty key prefix inside the bucket
GOCLAW_MEDIA_S3_REGION us-east-1
GOCLAW_MEDIA_S3_ENDPOINT for MinIO / R2 / DO Spaces
GOCLAW_MEDIA_S3_CACHE_DIR $TMPDIR/goclaw-media-cache local read cache

IAM permissions needed for the bucket: s3:PutObject, s3:GetObject, s3:DeleteObject, s3:ListBucket, s3:DeleteObjects.

Test plan

  • FSBackend unit tests (round-trip, mime ext precedence, ErrNotFound).
  • S3Backend integration: not in the test suite — gated on GOCLAW_TEST_S3_BUCKET would be the natural follow-up; happy to add in this PR if you'd like.
  • Build + existing tests on CI.
  • Smoke on my fork's staging once merged downstream (injectinglabs/goclaw).

Out of scope

  • Workspace-as-S3 (covers user-generated files under workspace/generated/...). That requires a separate workspace abstraction and is orthogonal to session media.
  • Signed download URLs for generated documents — already covered upstream via POST /v1/files/sign.

Open questions

  1. Naming: Backend vs Storage vs Driver? I went with Backend to match internal/backup. Happy to rename.
  2. Should Save take an io.Reader instead of srcPath? Current callers all have a local file, so srcPath keeps the FS backend's rename-fast-path; reader-based API would force FS to write a tempfile.
  3. Add a gated s3_backend_test.go in this PR, or as a follow-up?

🤖 Generated with Claude Code

ilyaseverin and others added 3 commits May 20, 2026 20:49
Splits internal/media into:

- backend.go: Backend interface (Save / Open / LocalPath / Delete) and a
  shared ErrNotFound sentinel.
- fs_backend.go: FSBackend, the historical filesystem implementation
  that used to live in store.go. Same key shape, same rename-or-copy
  semantics; no behaviour change.
- store.go: Store is now a thin façade over a Backend. NewStore still
  returns an FS-backed store, so every existing caller — gateway wiring,
  agent loop, http handlers, tests — keeps working without changes.
  New code that wants context-aware access can grab Store.Backend()
  directly.

ExtFromMime stays on Store so both backends agree on the extension that
ends up in the object key.

This commit is intentionally behaviour-preserving. S3Backend, the config
factory, and the bypass-site fixes land in follow-up commits on this
branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
S3Backend persists media objects under {prefix}/{sessionHash}/{id}.{ext}
plus a tiny manifest object at {prefix}/_manifests/{id} that maps the
bare media ID back to its full key. The manifest lets LocalPath / Open
resolve an ID in two GetObjects (manifest + payload) without scanning
the bucket, which would not scale.

Highlights:

- Credentials come from the default AWS SDK chain only (IAM role on
  EC2 / ECS / EKS, env vars in dev). No static-key field on Config to
  avoid copy-pasted secrets in env files.
- manager.NewUploader does multipart with 10MB parts / concurrency=3,
  same posture as internal/backup.
- LocalPath downloads into a process-local cache dir (default
  os.TempDir()/goclaw-media-cache), atomic-renames a sibling tempfile
  into place so a partial fetch never gets surfaced as a cache hit.
- Delete lists the session prefix, batch-deletes payloads in groups of
  1000, then drops the corresponding manifests (best-effort — a
  leftover manifest just wastes a few bytes).
- Endpoint override is plumbed through for MinIO / R2 / DO Spaces.

internal/media/config.go introduces Config + LoadConfigFromEnv +
NewBackend + NewStoreFromConfig so cmd/ wiring stays a one-liner.
Default backend is "fs", default baseDir is whatever the caller passes
in — existing deployments see zero behaviour change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
cmd/gateway_managed.go switches from media.NewStore(workspace+\"/.media\")
to media.NewStoreFromConfig(LoadConfigFromEnv(...)). FS remains the
default — operators with no GOCLAW_MEDIA_* env see byte-for-byte the
same behaviour they had before — but flipping a single env var now
moves session media to S3 without touching code.

internal/media/fs_backend_test.go covers the refactor:

- Save→LocalPath→Open→Delete round trip, including the source-file
  consumption contract.
- ExtFromMime taking precedence over the source's own extension.
- LocalPath returning ErrNotFound (errors.Is-matchable) for stale IDs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ilyaseverin added a commit to injectinglabs/goclaw that referenced this pull request May 20, 2026
nextlevelbuilder#1166 (#90)

* refactor(media): introduce Backend interface, keep Store as façade

Splits internal/media into:

- backend.go: Backend interface (Save / Open / LocalPath / Delete) and a
  shared ErrNotFound sentinel.
- fs_backend.go: FSBackend, the historical filesystem implementation
  that used to live in store.go. Same key shape, same rename-or-copy
  semantics; no behaviour change.
- store.go: Store is now a thin façade over a Backend. NewStore still
  returns an FS-backed store, so every existing caller — gateway wiring,
  agent loop, http handlers, tests — keeps working without changes.
  New code that wants context-aware access can grab Store.Backend()
  directly.

ExtFromMime stays on Store so both backends agree on the extension that
ends up in the object key.

This commit is intentionally behaviour-preserving. S3Backend, the config
factory, and the bypass-site fixes land in follow-up commits on this
branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(media): add S3Backend + env-driven factory

S3Backend persists media objects under {prefix}/{sessionHash}/{id}.{ext}
plus a tiny manifest object at {prefix}/_manifests/{id} that maps the
bare media ID back to its full key. The manifest lets LocalPath / Open
resolve an ID in two GetObjects (manifest + payload) without scanning
the bucket, which would not scale.

Highlights:

- Credentials come from the default AWS SDK chain only (IAM role on
  EC2 / ECS / EKS, env vars in dev). No static-key field on Config to
  avoid copy-pasted secrets in env files.
- manager.NewUploader does multipart with 10MB parts / concurrency=3,
  same posture as internal/backup.
- LocalPath downloads into a process-local cache dir (default
  os.TempDir()/goclaw-media-cache), atomic-renames a sibling tempfile
  into place so a partial fetch never gets surfaced as a cache hit.
- Delete lists the session prefix, batch-deletes payloads in groups of
  1000, then drops the corresponding manifests (best-effort — a
  leftover manifest just wastes a few bytes).
- Endpoint override is plumbed through for MinIO / R2 / DO Spaces.

internal/media/config.go introduces Config + LoadConfigFromEnv +
NewBackend + NewStoreFromConfig so cmd/ wiring stays a one-liner.
Default backend is "fs", default baseDir is whatever the caller passes
in — existing deployments see zero behaviour change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(media): wire env-driven backend selection + FSBackend tests

cmd/gateway_managed.go switches from media.NewStore(workspace+\"/.media\")
to media.NewStoreFromConfig(LoadConfigFromEnv(...)). FS remains the
default — operators with no GOCLAW_MEDIA_* env see byte-for-byte the
same behaviour they had before — but flipping a single env var now
moves session media to S3 without touching code.

internal/media/fs_backend_test.go covers the refactor:

- Save→LocalPath→Open→Delete round trip, including the source-file
  consumption contract.
- ExtFromMime taking precedence over the source's own extension.
- LocalPath returning ErrNotFound (errors.Is-matchable) for stale IDs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@mrgoonie mrgoonie added agent:github-maintain Processed by github-maintain automation maintain:deferred Deferred by maintain workflow pr:oversized PR too large for bounded cron review labels Jun 23, 2026
@mrgoonie

Copy link
Copy Markdown
Contributor

Summary: Deferred in cron-safe maintenance pass because this PR is still a draft and exceeds the bounded review budget.

Decision: deferred / needs full review when ready

Evidence:

  • PR is marked draft, so it is not merge-ready.
  • Scope is non-trivial: 7 files, +706/-88 lines, including a new S3 media backend and env-driven backend selection.
  • GitHub reports no checks for the current head.

Next step: mark ready for review after CI is available and decide whether S3 integration tests should be included in this PR or follow-up. Once it is ready, it needs a full storage/security review covering bucket key layout, cache behavior, delete semantics, env config, and regression tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent:github-maintain Processed by github-maintain automation maintain:deferred Deferred by maintain workflow pr:oversized PR too large for bounded cron review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants