
feat(agent): S3-backed Claude SDK session store #484

Description

@krokoko

Context: ROADMAP.md → S3-backed SDK session store (portable transcripts)


Component

Agent (Python runtime)

Describe the feature

Plumb the Claude Agent SDK SessionStore to S3 (dedicated bucket or prefix) with:

  • Eager flush, IAM-scoped access, conditional part creates, checksums, adaptive retries, structured logging
  • Metrics/alarms on transcript mirror failures
  • Graceful shutdown (disconnect on SIGTERM/cancel) so in-flight frames flush
  • Persist the task_id ↔ Claude session UUID mapping (taken from the first ResultMessage) so a task can resume on another worker
  • Keep agent cwd stable so SDK-derived project_key paths stay predictable
  • Plan compaction for when the part count grows large enough to threaten resume latency
  • Optional S3 Express One Zone when fleet is single-AZ
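A rough sketch of the write path described in the list above follows: one part per frame boundary (eager flush), a conditional create so an existing part is never overwritten, a SHA-256 checksum S3 verifies on upload, botocore's adaptive retry mode, and a structured log line on failure. It assumes boto3 with S3 conditional writes (`IfNoneMatch` on `PutObject`). It does not model the Claude Agent SDK's actual SessionStore interface. `TranscriptMirror`, `part_key`, the key layout and the `transcript_mirror_failed` event name are all hypothetical. IAM scoping (e.g. restricting `s3:PutObject` to the task's prefix via SessionRole) lives in the role policy and is not shown.

```python
import base64
import hashlib
import json
import logging
from dataclasses import dataclass

log = logging.getLogger("transcript_mirror")


def part_key(prefix: str, task_id: str, project_key: str, session_id: str, seq: int) -> str:
    """Deterministic key for one transcript part (hypothetical layout).

    Zero-padding seq keeps S3 listings, which are returned in UTF-8 binary
    key order, in write order so a resuming worker can replay parts in sequence.
    """
    return f"{prefix.rstrip('/')}/{task_id}/{project_key}/{session_id}/part-{seq:08d}.jsonl"


def make_s3_client():
    """S3 client using botocore's adaptive retry mode (needs boto3 when called)."""
    import boto3
    from botocore.config import Config

    return boto3.client("s3", config=Config(retries={"mode": "adaptive", "max_attempts": 10}))


@dataclass
class TranscriptMirror:
    client: object  # boto3 S3 client, or any object exposing put_object(**kwargs)
    bucket: str
    prefix: str
    task_id: str
    project_key: str
    session_id: str
    next_seq: int = 0

    def flush_frames(self, frames: list) -> str:
        """Write buffered frames as one new part; intended to run at each frame boundary."""
        body = "".join(json.dumps(f, separators=(",", ":")) + "\n" for f in frames).encode("utf-8")
        key = part_key(self.prefix, self.task_id, self.project_key, self.session_id, self.next_seq)
        digest = base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
        try:
            self.client.put_object(
                Bucket=self.bucket,
                Key=key,
                Body=body,
                ChecksumSHA256=digest,  # S3 rejects the upload if the payload does not match
                IfNoneMatch="*",  # conditional create: S3 returns 412 if the key already exists
            )
        except Exception:
            # Structured fields let a log-based metric or alarm key on mirror failures.
            log.exception(
                "transcript_mirror_failed",
                extra={"task_id": self.task_id, "s3_key": key, "seq": self.next_seq},
            )
            raise
        # Only advance the sequence once the part is durably written.
        self.next_seq += 1
        return key
```

One caveat with conditional creates: if a PUT succeeds but the response is lost, a retry can come back as 412. Comparing the stored object's checksum before treating that as a real failure is one option. Publishing a metric from the `except` branch (for the mirror-failure alarm) is left out of this sketch.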

Complements the shipped persistent session storage (/mnt/workspace FUSE caches) and the end-of-task trace upload to traces/...jsonl.gz.

Use case

In-VM caches do not survive worker migration or long pauses. Portable transcripts enable resume, debugging, and compliance without relying solely on end-of-task trace upload.

Proposed solution

  1. S3 prefix per task_id + SDK project_key; SessionRole-scoped writes.
  2. Flush on frame boundary; disconnect on shutdown.
  3. Alarms on mirror failures; structured logging.
  4. Compaction when part count threatens resume latency.
  5. Optional S3 Express One Zone when fleet is single-AZ.
  6. Map first ResultMessage session UUID to task_id in DynamoDB for cross-worker resume.
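Step 6 could be sketched as a first-writer-wins record in DynamoDB. The conditional put stores the session UUID only if the task has none yet, and a worker that loses the race adopts the existing value. It assumes a boto3 DynamoDB client and a table keyed on `task_id`. The session UUID (from the first ResultMessage) is passed in as a plain string. `SessionIndex` and the attribute names `session_id` and `created_at` are hypothetical.

```python
import time
from typing import Optional


class SessionIndex:
    """Hypothetical first-writer-wins mapping of task_id to Claude session UUID.

    `client` is assumed to be a boto3 DynamoDB client (boto3.client("dynamodb"));
    the table is assumed to use task_id as its partition key.
    """

    def __init__(self, client, table_name: str):
        self.client = client
        self.table = table_name

    def record_first_session(self, task_id: str, session_id: str) -> str:
        """Store session_id for task_id unless one is already recorded.

        Returns the authoritative session UUID for the task: ours if the write
        succeeded, otherwise the one another worker recorded first.
        """
        try:
            self.client.put_item(
                TableName=self.table,
                Item={
                    "task_id": {"S": task_id},
                    "session_id": {"S": session_id},
                    "created_at": {"N": str(int(time.time()))},
                },
                # Only the first write for a task_id succeeds; later ones fail the condition.
                ConditionExpression="attribute_not_exists(task_id)",
            )
            return session_id
        except self.client.exceptions.ConditionalCheckFailedException:
            existing = self.lookup(task_id)
            if existing is None:
                raise
            return existing

    def lookup(self, task_id: str) -> Optional[str]:
        """Return the recorded session UUID for task_id, or None (used on resume)."""
        resp = self.client.get_item(
            TableName=self.table,
            Key={"task_id": {"S": task_id}},
            ConsistentRead=True,  # read-after-write, so a resuming worker sees the latest mapping
        )
        item = resp.get("Item")
        return item["session_id"]["S"] if item else None
```

A worker picking up a migrated task would call `lookup(task_id)` and, if a UUID comes back, resume that session from its S3 prefix instead of starting a new one.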

Other information

  • Shipped: persistent session storage on /mnt/workspace; execution tracing to traces/...jsonl.gz.

  • Design context: docs/design/COMPUTE.md, docs/design/OBSERVABILITY.md.

  • This might be a breaking change

Metadata


Assignees

No one assigned

    Labels

    agent-runtime (Python agent container: pipeline, runner, hooks, prompts, tools, Dockerfile)
    enhancement (New feature or request)
    infra-cdk (CDK stacks/constructs, bootstrap, deploy topology, tags, IAM wiring, teardown)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
