Skip to content
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .claude-plugin/plugin.json
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
{
"name": "agent-fs",
"description": "Agent-first filesystem CLI — store, search, and version files backed by S3",
"version": "0.8.1",
"version": "0.9.0",
"author": {
"name": "desplega-ai"
},
Expand Down
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ agent-fs was built to power the shared filesystem in [agent-swarm](https://githu
## Key Features

- **Semantic search** — Index and search files using vector embeddings (OpenAI, Google GenAI, or local llama.cpp)
- **SQL queries** — Run sandboxed DuckDB SQL over stored documents (CSV, TSV, Parquet, Excel, JSON, NDJSON, SQLite, DuckDB) — see [docs/sql.md](./docs/sql.md)
- **Structured storage** — SQLite-backed file operations with metadata and versioning
- **S3-compatible sync** — Sync agent workspaces to any S3-compatible object store
- **Identity management** — Persistent agent identity files that evolve over time
Expand Down
41 changes: 32 additions & 9 deletions bun.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

65 changes: 64 additions & 1 deletion docs/openapi.json
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
"openapi": "3.1.0",
"info": {
"title": "agent-fs API",
"version": "0.8.1",
"version": "0.9.0",
"description": "A persistent, searchable filesystem for AI agents. agent-fs is to files what agentmail is to email.",
"license": {
"name": "MIT",
Expand Down Expand Up @@ -769,6 +769,69 @@
},
"additionalProperties": false
},
{
"type": "object",
"title": "sql",
"description": "Run a DuckDB SQL query over documents stored in the drive. Reference documents directly by path string literal (e.g. SELECT * FROM '/data/sales.csv') or bind them as named tables via `tables` ({ name: path } or { name: { path, format } } to override format detection). Supports csv, tsv, parquet, xlsx, json, ndjson/jsonl (each also .gz except parquet/xlsx), sqlite (.db/.sqlite/.sqlite3, tables exposed as name.tablename), and .duckdb files. Queries run sandboxed: no filesystem or network access beyond the bound documents. Returns { columns, rows, rowCount, truncated, files, elapsedMs }.",
"required": [
"op",
"query"
],
"properties": {
"op": {
"type": "string",
"const": "sql"
},
"driveId": {
"type": "string",
"description": "Target drive ID (optional, uses default drive)"
},
"query": {
"type": "string"
},
"tables": {
"type": "object",
"additionalProperties": {
"anyOf": [
{
"type": "string"
},
{
"type": "object",
"properties": {
"path": {
"type": "string"
},
"format": {
"type": "string",
"enum": [
"csv",
"tsv",
"parquet",
"xlsx",
"json",
"ndjson",
"sqlite",
"duckdb"
]
}
},
"required": [
"path"
],
"additionalProperties": false
}
]
}
},
"maxRows": {
"type": "integer",
"minimum": 1,
"maximum": 10000
}
},
"additionalProperties": false
},
{
"type": "object",
"title": "signed-url",
Expand Down
144 changes: 144 additions & 0 deletions docs/sql.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,144 @@
# SQL Queries

Run DuckDB SQL directly against documents stored in agent-fs — CSV, TSV, Parquet, Excel, JSON, NDJSON, SQLite databases, and DuckDB databases. Available as a CLI command (`agent-fs sql`), an API op (`sql`), and an MCP tool.

```bash
agent-fs sql "SELECT category, sum(amount) FROM '/finance/2026.csv' GROUP BY category"
```

## Supported formats

| Format | Extensions | Notes |
|--------|------------|-------|
| CSV | `.csv`, `.csv.gz` | Delimiter, header, and types auto-detected |
| TSV | `.tsv`, `.tab`, `.tsv.gz` | Tab-delimited |
| Parquet | `.parquet` | |
| Excel | `.xlsx` | First sheet; header auto-detected |
| JSON | `.json`, `.json.gz` | Arrays of objects or single objects |
| NDJSON | `.ndjson`, `.jsonl` (+ `.gz`) | Newline-delimited JSON |
| SQLite | `.db`, `.sqlite`, `.sqlite3` | All tables exposed — see below |
| DuckDB | `.duckdb` | Tables from the `main` schema |

## Referencing documents

There are two ways to make a document queryable:

### 1. Path literals (file formats)

Reference any stored CSV/TSV/Parquet/Excel/JSON/NDJSON document by its drive path as a quoted string, exactly like DuckDB's own file syntax:

```bash
agent-fs sql "SELECT * FROM '/data/sales.csv' WHERE amount > 100"
agent-fs sql "SELECT a.id FROM '/data/a.parquet' a JOIN '/data/b.ndjson' b USING (id)"
```

Only string literals that resolve to an existing document in the drive are bound — other strings are left untouched, so `WHERE name = 'report.csv'` works as expected.

### 2. Named table bindings

Bind documents to table names with `--table` (repeatable). Required for SQLite/DuckDB databases, and useful for readable multi-file queries:

```bash
agent-fs sql "SELECT s.name, t.tag FROM sales s JOIN tags t ON s.id = t.id" \
-t sales=/data/sales.csv -t tags=/data/tags.parquet
```

SQLite and DuckDB databases expose every table under the binding name as a schema:

```bash
# /backups/app.db has tables `users` and `orders`
agent-fs sql "SELECT u.email, count(*) FROM app.users u JOIN app.orders o ON o.user_id = u.id GROUP BY 1" \
-t app=/backups/app.db
```

### Making any document SQL-able (format override)

If a document doesn't have a recognized extension, override the format with a `:format` suffix (CLI) or `{ path, format }` (API/MCP):

```bash
agent-fs sql "SELECT sum(a) FROM logs" -t logs=/raw/data.txt:csv
```

Formats: `csv`, `tsv`, `parquet`, `xlsx`, `json`, `ndjson`, `sqlite`, `duckdb`.

## CLI usage

```bash
agent-fs sql [query] [options]

-t, --table <name=path[:format]> Bind a document as a named table (repeatable)
--max-rows <n> Max rows to return (default: 1000, max: 10000)
--json Output the raw result object
```

The query can also be piped via stdin:

```bash
echo "SELECT count(*) FROM '/data/events.ndjson'" | agent-fs sql
```

Multi-statement queries are supported — the result of the last statement is returned:

```bash
agent-fs sql "CREATE TABLE big AS SELECT * FROM '/data/sales.csv' WHERE amount > 100; SELECT count(*) FROM big"
```

## API usage

`sql` is a standard op on the dispatch endpoint:

```bash
curl -X POST http://localhost:7433/orgs/{orgId}/ops \
-H "Authorization: Bearer <api-key>" \
-H "Content-Type: application/json" \
-d '{
"op": "sql",
"driveId": "<driveId>",
"query": "SELECT s.name, t.tag FROM sales s JOIN tags t ON s.id = t.id",
"tables": {
"sales": "/data/sales.csv",
"tags": { "path": "/data/tags.parquet", "format": "parquet" }
},
"maxRows": 1000
}'
```

Response:

```json
{
"columns": [{ "name": "name", "type": "VARCHAR" }, { "name": "tag", "type": "VARCHAR" }],
"rows": [{ "name": "alpha", "tag": "aa" }],
"rowCount": 1,
"truncated": false,
"files": [
{ "table": "sales", "path": "/data/sales.csv", "format": "csv" },
{ "table": "tags", "path": "/data/tags.parquet", "format": "parquet" }
],
"elapsedMs": 42
}
```

Value encoding: `BIGINT`s outside JavaScript's safe integer range and timestamps/dates/UUIDs are returned as strings; decimals as numbers; lists and structs as JSON arrays/objects; blobs as `<blob N bytes>` placeholders. The op requires the `viewer` role.

The same op is exposed as the `sql` MCP tool with identical arguments.

## Web UI

The live UI has a SQL workbench at `/sql/~/{orgId}/{driveId}` — pick documents, write queries (Cmd+Enter to run), and view results as a table or quick chart. CSV/TSV/Parquet/JSON/NDJSON queries run client-side via DuckDB WASM; Excel/SQLite/DuckDB documents automatically run on the server. Open it from any queryable file via the **Query** button in the file detail view.

## Sandboxing & limits

User SQL never touches the host. Referenced documents are downloaded and materialized into an in-memory DuckDB instance during setup; before any user SQL executes, external access is disabled, the local filesystem is turned off, and the configuration is locked. SQLite files are converted in an isolated bridge instance — the instance that runs your query never loads the sqlite extension, closing its direct-I/O escape hatch. Attempts to read host paths, fetch URLs, `ATTACH`, `COPY ... TO`, or change settings fail with a `VALIDATION_ERROR`.

Server-side knobs (environment variables on the daemon/server):

| Variable | Default | Meaning |
|----------|---------|---------|
| `AGENT_FS_SQL_TIMEOUT_MS` | `30000` | Query timeout (the query is interrupted) |
| `AGENT_FS_SQL_MAX_FILE_BYTES` | `268435456` (256 MB) | Per-document size cap |
| `AGENT_FS_SQL_MEMORY_LIMIT` | `512MB` | DuckDB memory limit |

`maxRows` caps the returned rows (default 1000, max 10000); `truncated: true` signals there were more.

Note: the first query that touches an `.xlsx` or SQLite document downloads the DuckDB `excel`/`sqlite` extension on the server (cached in `~/.duckdb` afterwards), so the server needs outbound network access once.
2 changes: 2 additions & 0 deletions landing/content/markdown.ts
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,7 @@ fully self-hostable.
### Reference

- API reference: https://agent-fs.dev/docs/api-reference — HTTP endpoints, auth, MCP transport, and operation dispatch
- SQL queries: https://agent-fs.dev/docs/sql — query CSV, Parquet, Excel, JSON, and SQLite documents with DuckDB
- OpenAPI spec: https://agent-fs.dev/docs/openapi.json

### Mounting (FUSE)
Expand All @@ -125,6 +126,7 @@ prefer fetching markdown directly.
- \`agent-fs cat <path>\` — read a file
- \`agent-fs search '<query>'\` — semantic search over stored files
- \`agent-fs fts '<query>'\` — full-text search over stored files
- \`agent-fs sql "SELECT * FROM '/data/sales.csv'"\` — DuckDB SQL over stored documents (csv, parquet, xlsx, json, sqlite)
- \`agent-fs comment add <path> --content '...'\` — leave a comment
- \`agent-fs drive invite <email>\` — share a drive with another agent or human

Expand Down
Loading
Loading