docs(deployment): add Deploy to Railway guide and group nav #1880
Open
rohitg00 wants to merge 4 commits into docs/deployment-guide from docs/deploy-railway-guide
+630
−2
1201d0f docs(deployment): add Deploy to Railway guide and group nav (rohitg00)
87a6b87 docs(deployment): drop hand-rendered skill.md (generated in CI) (rohitg00)
0876397 docs(deployment): keep deployment.mdx unchanged (handled in #1872) (rohitg00)
0c25327 docs(deployment): libcap-ng0, restart policy, Railway checklist (rohitg00)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,308 @@ | ||
| --- | ||
| title: "Deploy to Railway" | ||
| description: "Run the iii engine and your workers on Railway with a clean, reusable base image." | ||
| owner: "devrel" | ||
| type: "how-to" | ||
| --- | ||
|
|
||
| This guide deploys the iii engine and your workers to [Railway](https://railway.com). | ||
| The approach is the same one used everywhere else (see | ||
| [Self-hosted deployment](./deployment)): ship a **clean engine image** and let your | ||
| `config.yaml` decide what runs. The base image contains **no** workers. You add | ||
| capabilities by declaring them, and the engine provisions them. One image stays | ||
| reusable across every app, and adding an integration is another config entry. | ||
|
|
||
| Railway differs from a bare-host deploy in three ways that shape the rest of this | ||
| guide: its private network is IPv6-only, durable state persists on a mounted | ||
| volume, and its edge terminates TLS so you do not run your own reverse proxy. | ||
|
|
||
| ## A base image worth reusing | ||
|
|
||
| Build a base that contains the engine **and** the `iii-worker` daemon (the | ||
| process that runs add-on workers), but nothing app-specific. The pre-built | ||
| distroless `iiidev/iii:latest` is engine-only and can't be extended (no shell), | ||
| so for any deployment that uses registry workers, build a small base from the | ||
| install script instead. Railway builds it from the repo's | ||
| [Dockerfile](https://docs.railway.com/builds/dockerfiles): | ||
|
|
||
| ```dockerfile | ||
| # Dockerfile: clean iii base. Contains NO workers. | ||
| FROM debian:bookworm-slim | ||
|
|
||
| # curl, ca-certificates, and jq drive the installer. libssl3 and libcap2 are | ||
| # engine runtime deps; libcap-ng0 provides libcap-ng.so.0, which the iii-worker | ||
| # daemon needs to launch binary registry workers (for example `database`). | ||
| RUN apt-get update \ | ||
| && apt-get install -y --no-install-recommends curl ca-certificates jq libssl3 libcap2 libcap-ng0 \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| # Installs both `iii` (engine) and `iii-worker` (the add-on worker daemon). | ||
| RUN curl -fsSL https://install.iii.dev/iii/main/install.sh | sh | ||
| ENV PATH="/root/.local/bin:${PATH}" | ||
|
|
||
| WORKDIR /app | ||
| COPY config.yaml /app/config.yaml | ||
|
|
||
| EXPOSE 49134 3111 3112 | ||
| CMD ["iii", "--config", "/app/config.yaml"] | ||
| ``` | ||
|
|
||
| <Note> | ||
| A ready-to-fork starter with this `Dockerfile`, a `config.yaml`, and a | ||
| `railway.json` (Dockerfile builder + restart policy) lives at | ||
| [iii-experimental/railway-template](https://github.com/iii-experimental/railway-template). | ||
| </Note> | ||
|
|
||
| ## A Railway-ready config.yaml | ||
|
|
||
| The only opinionated file is `config.yaml`. Two things make it Railway-ready: | ||
| **bind to `[::]`** (Railway private networking is IPv6-only) and **put all | ||
| durable state under `/data`** (a Railway volume): | ||
|
|
||
| ```yaml | ||
| workers: | ||
|   # The first manager entry is the engine WS port. Bind [::] so worker | ||
|   # services can reach it over Railway's IPv6 private network. | ||
|   - name: iii-worker-manager | ||
|     config: { host: "[::]", port: 49134 } | ||
|
|
||
|   - name: iii-http | ||
|     config: | ||
|       host: "[::]" | ||
|       port: 3111 | ||
|       cors: | ||
|         allowed_origins: ["*"] # tighten to your domains in production | ||
|   - name: iii-stream | ||
|     config: | ||
|       host: "[::]" | ||
|       port: 3112 | ||
|       adapter: { name: kv, config: { store_method: file_based, file_path: /data/stream_store } } | ||
|   - name: iii-state | ||
|     config: | ||
|       adapter: { name: kv, config: { store_method: file_based, file_path: /data/state_store.db } } | ||
|   - name: iii-queue | ||
|     config: | ||
|       adapter: { name: builtin, config: { store_method: file_based, file_path: /data/queue_store } } | ||
|   - name: configuration | ||
|     config: | ||
|       adapter: { name: fs, config: { directory: /data/configuration } } | ||
| ``` | ||
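
The two rules above (bind `[::]`, keep durable paths under `/data`) can be checked mechanically before a deploy. The following is a minimal sketch, assuming the config has already been parsed into a Python dict by a YAML library of your choice. The function names are hypothetical and are not part of iii; the key names it looks for (`host`, `file_path`, `directory`) are the ones used in the sample config.

```python
"""Lint a parsed iii config for the two Railway-specific rules in this guide."""

DATA_ROOT = "/data"  # volume mount point used throughout this guide


def _walk(node, path=""):
    """Yield (dotted_path, value) for every scalar in a nested dict/list."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from _walk(value, f"{path}.{key}" if path else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from _walk(value, f"{path}[{index}]")
    else:
        yield path, node


def railway_issues(config):
    """Return human-readable problems; an empty list means both rules hold."""
    issues = []
    for worker in config.get("workers", []):
        name = worker.get("name", "<unnamed>")
        for path, value in _walk(worker.get("config", {})):
            key = path.rsplit(".", 1)[-1]
            # Rule 1: every listener binds the IPv6 wildcard address.
            if key == "host" and value != "[::]":
                issues.append(f"{name}: host is {value!r}, expected '[::]'")
            # Rule 2: every persistence path lives on the mounted volume.
            if key in ("file_path", "directory") and not str(value).startswith(DATA_ROOT + "/"):
                issues.append(f"{name}: {key} {value!r} is outside {DATA_ROOT}")
    return issues
```

Running this in CI against the same `config.yaml` that is copied into the image is one way to catch a stray `127.0.0.1` or `/tmp` path before Railway builds it.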
|
|
||
| ## Add workers by declaring them | ||
|
|
||
| Each entry above is a worker. That is all there is in iii: everything is a | ||
| worker, and you run one by **declaring it** in `config.yaml`. Nothing is added | ||
| to the image itself. If a worker you declare isn't present locally, the | ||
| engine fetches it from the registry on boot. For example, add a SQLite-backed | ||
| database with nothing in the image: | ||
|
|
||
| ```yaml | ||
|   - name: database | ||
|     config: | ||
|       databases: | ||
|         primary: | ||
|           url: sqlite:/data/iii.db | ||
| ``` | ||
|
|
||
| The engine logs `Worker 'database' not found locally, checking registry...`, then | ||
| runs the worker as a plain child process and registers `database::query`, | ||
| `database::execute`, and the rest, with no image rebuild. Add any worker the same way: a `- name: <worker>` | ||
| entry plus its config schema (see the worker's page on | ||
| [workers.iii.dev](https://workers.iii.dev)). Supply secrets (database URLs, | ||
| S3/R2 credentials) from Railway service variables, covered below. | ||
|
|
||
| <Note> | ||
| Auto-provisioning downloads the worker on first boot, which adds cold-start | ||
| time and needs registry access at runtime. For reproducible, fast starts in | ||
| production, **pin it into the image**: add `RUN iii worker add <name>` to your | ||
| Dockerfile. The base stays generic; the build step is your version lock. | ||
| </Note> | ||
|
|
||
| ## Connect the services | ||
|
Contributor: Should we give readers explicit steps to create the Railway services, rather than only describing them? |
||
|
|
||
| A typical app is the engine plus one or more worker services that connect to it | ||
| over the private network: | ||
|
|
||
| | Service | What it is | Key setting | | ||
| | --- | --- | --- | | ||
| | `engine` | the base image above + your `config.yaml` | volume mounted at `/data`; `RAILWAY_RUN_UID=0` so a non-root image can write it; public domain on port `3111` | | ||
| | `api-worker` (and friends) | your Node/Python/Rust worker code | `III_URL=ws://engine.railway.internal:49134` | | ||
|
|
||
| Worker services stay private (no public domain) and reach the engine over | ||
| `engine.railway.internal`, Railway's internal DNS name for the engine service. | ||
| Railway [private networking](https://docs.railway.com/private-networking) is | ||
| IPv6-only, which is why the engine manager binds `[::]`. A worker that binds or | ||
| dials `127.0.0.1` will not find the engine across services; always use the | ||
| `.railway.internal` hostname. | ||
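
A worker service can fail fast on a misconfigured `III_URL` instead of timing out at connect time. This is a minimal sketch in Python using only the standard library. The helper names are hypothetical, the fallback to port `49134` when the URL omits a port is an assumption based on this guide's config, and how your worker SDK actually reads `III_URL` may differ.

```python
import ipaddress
import os
import socket
from urllib.parse import urlsplit


def _is_loopback(host):
    """True for loopback IP literals such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:  # not an IP literal, so it is a DNS name
        return False


def engine_endpoint(env=os.environ):
    """Parse III_URL into (host, port) and reject loopback targets."""
    url = env.get("III_URL", "")
    parts = urlsplit(url)
    if parts.scheme not in ("ws", "wss") or not parts.hostname:
        raise ValueError(f"III_URL must be a ws:// or wss:// URL, got {url!r}")
    host = parts.hostname
    port = parts.port or 49134  # assumption: the engine port used in this guide
    if host == "localhost" or _is_loopback(host):
        raise ValueError(f"{host} is loopback; use the .railway.internal hostname")
    return host, port


def resolves_over_ipv6(host, port):
    """True if DNS returns at least one IPv6 address for the engine host."""
    try:
        return bool(socket.getaddrinfo(host, port, family=socket.AF_INET6,
                                       type=socket.SOCK_STREAM))
    except socket.gaierror:
        return False
```

Calling `engine_endpoint()` at worker startup, followed by `resolves_over_ipv6(host, port)`, separates a wrong variable from a private-network DNS problem.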
|
|
||
| <Note> | ||
| Set the engine service's | ||
| [restart policy](https://docs.railway.com/deployments/restarts) to **`ALWAYS`**. The | ||
| engine exits cleanly (exit code 0) when it reloads `config.yaml`, and Railway's default | ||
| `ON_FAILURE` policy only restarts on a non-zero exit, so a clean exit would leave the | ||
| engine stopped. The `railway.json` in the | ||
| [starter template](https://github.com/iii-experimental/railway-template) is the place to | ||
| set this. | ||
| </Note> | ||
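
For reference, the restart policy and Dockerfile builder can be expressed in Railway's config-as-code file. The sketch below renders such a file from Python so the values are easy to see and test. The `build.builder`, `build.dockerfilePath`, and `deploy.restartPolicyType` keys come from Railway's config-as-code schema; the function name is hypothetical, and the `railway.json` shipped in the starter template may contain more or different settings.

```python
import json


def render_railway_json(dockerfile_path="Dockerfile"):
    """Return railway.json content: Dockerfile builder plus ALWAYS restarts."""
    config = {
        "build": {"builder": "DOCKERFILE", "dockerfilePath": dockerfile_path},
        # ALWAYS restarts the engine even after a clean (exit code 0) exit.
        "deploy": {"restartPolicyType": "ALWAYS"},
    }
    return json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    print(render_railway_json(), end="")
```

Redirecting the script's output to `railway.json` at the repo root gives Railway the same settings you would otherwise select in the service's dashboard.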
|
|
||
| ## Public domain, TLS, and routing | ||
|
Contributor: It’s not clear! |
||
|
|
||
| Railway's edge is your reverse proxy. It terminates TLS and forwards one public | ||
| domain to one port on one service, so you do not run Caddy or nginx yourself | ||
| (contrast the bare-host [Hardening](./deployment#hardening) section). See Railway's | ||
| [public networking](https://docs.railway.com/public-networking) guide for the | ||
| domain and `PORT` details. | ||
|
|
||
| - Attach the public domain to the **engine** service and target port **3111**, | ||
| the `iii-http` worker that serves your registered HTTP routes. Railway issues | ||
| and renews the certificate, so the domain serves HTTPS with no extra config. | ||
| - A custom domain works the same way: add it on the engine service and point your | ||
| DNS `CNAME` record at the target Railway shows you. | ||
| - Keep worker services private. They have no domain and are reachable only on the | ||
| internal network. | ||
| - If an external client needs the raw engine WebSocket port (`49134`), expose it | ||
| with a Railway [TCP proxy](https://docs.railway.com/networking/tcp-proxy). | ||
| Inside Railway, prefer the private network. | ||
| - Health check (optional): Railway's default check confirms the port is | ||
| listening. For an application-level check, set the service | ||
| [healthcheck](https://docs.railway.com/deployments/healthchecks) path to a | ||
| route your `iii-http` worker serves. | ||
|
|
||
| ## Secrets and environment | ||
|
|
||
| Supply every credential through Railway | ||
| [service variables](https://docs.railway.com/variables), never the image or git. | ||
| Reference them in `config.yaml` with `${VAR}` placeholders | ||
| (`${VAR:-default}` for a fallback); the engine substitutes them at boot: | ||
|
|
||
| ```yaml | ||
|   - name: database | ||
|     config: | ||
|       databases: | ||
|         primary: | ||
|           url: ${DATABASE_URL} # set DATABASE_URL on the engine service | ||
| ``` | ||
|
|
||
| Shared variables let several services read one value, which is useful when a | ||
| worker and the engine both need the same connection string. Change a variable and | ||
| redeploy (or restart) the service for the engine to pick it up. | ||
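
To make the placeholder semantics concrete, here is a minimal Python sketch of `${VAR}` and `${VAR:-default}` expansion. It follows shell conventions (the default applies when the variable is unset or empty) and raises on an unset variable with no default. Both behaviors are assumptions for illustration; the engine's own substitution may treat these edge cases differently. `AWS_REGION` in the usage note is a hypothetical variable name.

```python
import re

# ${NAME} or ${NAME:-default}; NAME follows environment-variable naming rules.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand(text, env):
    """Replace ${VAR} and ${VAR:-default} placeholders in text using env."""

    def replace(match):
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if value:  # set and non-empty
            return value
        if default is not None:  # unset or empty, but a fallback was given
            return default
        if value is None:  # assumption: an unset variable without default is an error
            raise KeyError(f"{name} is not set and has no default")
        return value  # set but empty, no default

    return _PLACEHOLDER.sub(replace, text)
```

For example, `expand("region: ${AWS_REGION:-us-east-1}", {})` yields `region: us-east-1`, while `expand("url: ${DATABASE_URL}", {})` raises `KeyError`, which is the kind of failure you want at boot rather than at first query.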
|
|
||
| ## Stateful workers and object storage | ||
|
|
||
| Attach a Railway [volume](https://docs.railway.com/volumes) to the engine | ||
| service, mount it at `/data`, and keep every worker's persistence path under it | ||
| (`sqlite:/data/iii.db`, `/data/queue_store`, `/data/state_store.db`). Railway | ||
| mounts volumes as `root`, so a non-root image needs `RAILWAY_RUN_UID=0` to write | ||
| the volume. | ||
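
A quick way to confirm the volume permissions are right is to try writing to the mount from inside the container. This is a minimal sketch with a hypothetical helper name; it is a diagnostic you could run in a Railway shell or at process start, not something the engine does for you.

```python
import os
import tempfile


def volume_is_writable(mount="/data"):
    """Return True if the current user can create and remove a file under mount."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=mount):
            return True
    except OSError:  # missing mount, permission denied, read-only filesystem
        return False


if __name__ == "__main__":
    print(f"uid={os.getuid()} writable={volume_is_writable()}")
```

If this prints `writable=False` with a non-zero `uid`, the root-owned mount is the likely cause and `RAILWAY_RUN_UID=0` is the fix described above.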
|
|
||
| For object storage, use the `storage` worker's **remote providers**, which need | ||
| no local disk: | ||
|
|
||
| ```yaml | ||
|   - name: storage | ||
|     config: | ||
|       buckets: | ||
|         uploads: { provider: s3, bucket: my-bucket, region: us-east-1 } | ||
|         avatars: { provider: r2, bucket: avatars, account_id: ${R2_ACCOUNT_ID}, | ||
|                    access_key_id: ${R2_ACCESS_KEY_ID}, secret_access_key: ${R2_SECRET_ACCESS_KEY} } | ||
| ``` | ||
|
|
||
| <Note> | ||
| The `storage` worker's `local` provider runs a `rustfs` sidecar that does not | ||
| reach a healthy state inside Railway's container (a worker-internal lifecycle | ||
| issue, not a config one). On Railway, use a remote provider (`s3`, `gcs`, `r2`) | ||
| for object storage. | ||
| </Note> | ||
|
|
||
| ## Scaling | ||
|
|
||
| - **More workers**: add another service per worker (or per language runtime). | ||
| They all dial the same `engine.railway.internal:49134`. | ||
| - **External adapters**: swap the `file_based`/`builtin` adapters for Redis and | ||
| RabbitMQ when you outgrow single-instance file storage. See | ||
| [Scale out with Redis and RabbitMQ](./deployment#scale-out-with-redis-and-rabbitmq). | ||
| Add those as Railway services (or use a managed add-on) and point the | ||
| adapter config at their private hostnames. | ||
| - **Object storage**: the `storage` worker's remote providers (`s3`, `gcs`, | ||
| `r2`) are the durable, scalable path on Railway; supply credentials from | ||
| service variables. | ||
|
|
||
| ## What cannot run on Railway | ||
|
|
||
| Railway containers do **not** expose `/dev/kvm`. Any worker that boots a | ||
| micro-VM therefore cannot run there: | ||
|
|
||
| - `iii-sandbox` and any OCI/image (managed) worker. They boot guests via | ||
| libkrun. See [Engine-managed workers (micro-VMs)](./deployment#engine-managed-workers-micro-vms). | ||
| - Bundle workers, which are dispatched through the same libkrun rails. (Note | ||
| that some workers are moving to `deploy: binary`, which **does** run on Railway as a | ||
| plain process. Check the worker's current type before assuming.) | ||
|
|
||
| Every other worker runs there, including `deploy: binary` workers, which run as | ||
| plain processes. | ||
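
The `/dev/kvm` constraint is easy to probe from any container. The sketch below uses a hypothetical helper name and only checks that the device node exists and is readable and writable by the current user; it does not prove that libkrun can boot a guest.

```python
import os


def kvm_available(device="/dev/kvm"):
    """True if the KVM device node exists and the current user can open it."""
    return os.path.exists(device) and os.access(device, os.R_OK | os.W_OK)


if __name__ == "__main__":
    print("micro-VM workers possible" if kvm_available() else "no /dev/kvm: micro-VM workers cannot run here")
```

Running it on a bare host with KVM and then inside a Railway service shows the difference that rules out `iii-sandbox` and other micro-VM workers on Railway.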
|
|
||
| ## Verify the deployment | ||
|
|
||
| Once the engine service is live, the engine log shows each declared worker | ||
| registering, including any it auto-provisioned from the registry. Then call a | ||
| route your worker registered through the public domain: | ||
|
|
||
| ```bash | ||
| curl https://<your-engine-domain>/orders -X POST \ | ||
| -H 'content-type: application/json' \ | ||
| -d '{ "sku": "abc", "qty": 1 }' | ||
| ``` | ||
|
|
||
| A response from your handler confirms the full path: Railway edge to `iii-http` | ||
| to your worker to the `database` worker on the volume. | ||
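
The same check can be scripted for a post-deploy smoke test. This is a minimal Python sketch using only the standard library; the function names are hypothetical, and `/orders` is the example route from the `curl` call above, so substitute a route your worker actually registers.

```python
import json
import urllib.request


def build_request(url, payload):
    """Build a JSON POST request equivalent to the curl call in this guide."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def post_json(url, payload, timeout=10):
    """Send the request and return (status_code, decoded_body).

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(build_request(url, payload), timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

A call such as `post_json("https://<your-engine-domain>/orders", {"sku": "abc", "qty": 1})` then exercises the same path from the Railway edge through `iii-http` to your handler.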
|
|
||
| <Note> | ||
| Redeploying the engine briefly re-registers HTTP routes, which can race a | ||
| worker that still holds the old route. Prefer `railway restart` over a full | ||
| redeploy when only restarting, and restart dependent worker services after an | ||
| engine redeploy so they reconnect and re-register. | ||
| </Note> | ||
|
|
||
| ## Railway deployment checklist | ||
|
|
||
| This is the Railway-specific layer on top of the general | ||
| [Deployment checklist](./deployment#deployment-checklist). | ||
|
|
||
| **Image and build** | ||
|
|
||
| - [ ] Clean engine base built from the install script (not the distroless image) when you | ||
| use registry workers; distroless is engine-only with no shell to extend. | ||
| - [ ] `config.yaml` baked into the image; rebuild and redeploy to change it (Railway builds | ||
| from the Dockerfile, with no live file mount). | ||
| - [ ] `libcap-ng0` installed in the image if you run binary registry workers (for example | ||
| `database`); the `iii-worker` daemon needs it to launch them. | ||
| - [ ] Registry workers pinned with `RUN iii worker add <name>` for fast, reproducible | ||
| starts and no runtime registry dependency. | ||
|
|
||
| **Networking** | ||
|
|
||
| - [ ] Engine binds `[::]`; worker services dial `ws://engine.railway.internal:49134`, never | ||
| `127.0.0.1`. | ||
| - [ ] Public domain on the engine targets port `3111`; worker services have no public | ||
| domain. | ||
| - [ ] Raw `49134` exposed (via TCP proxy) only behind an RBAC listener | ||
| ([RBAC](./deployment#rbac)); otherwise keep it on the private network. | ||
|
|
||
| **State and resiliency** | ||
|
|
||
| - [ ] Volume mounted at `/data`; every worker `file_path` lives under it. | ||
| - [ ] `RAILWAY_RUN_UID=0` set so a non-root image can write the root-owned volume. | ||
| - [ ] No `DO NOT USE IN_MEMORY` warnings in the boot logs. | ||
| - [ ] Single engine service while on `file_based` storage; move to Redis/RabbitMQ before | ||
| scaling to multiple replicas. | ||
| - [ ] Engine restart policy set to `ALWAYS`, so a clean exit from a config reload does not | ||
| leave the service stopped (Railway's default `ON_FAILURE` skips code-0 exits). | ||
|
|
||
| **Security and operations** | ||
|
|
||
| - [ ] CORS narrowed to real origins (the sample uses `*`). | ||
| - [ ] Secrets supplied through Railway variables, never the image or git. | ||
| - [ ] Dependent worker services restarted after an engine redeploy (HTTP-route | ||
| re-registration race). | ||
| - [ ] Health check path pointed at a real `iii-http` route; observability exporter chosen | ||
| deliberately ([Observability](./deployment#observability)). | ||
Can we keep yaml, for coherence?