arcagi3-worldmodel

A Gymnasium environment that wraps the ARC-AGI-3 game API, plus a DreamerV3-style world model (train.py) that learns and plans on it. You can drive ARC-AGI-3 games with the standard reset() / step() RL loop and any Gymnasium-compatible tooling (wrappers, vectorisation, RL libraries).

It follows Gymnasium's official "Create a Custom Environment" guide and talks to the ARC-AGI-3 REST API.

Install

pip install -e .             # core deps include stable-worldmodel (torch, lancedb, ...)
pip install -e ".[http]"     # + requests, for the real ARC-AGI-3 API
pip install -e ".[dev]"      # + pytest, requests

stable-worldmodel is a first-class dependency, not an optional extra: the env is swm-ready by design (see below), so swm is always available once env is imported.

The whole project is a handful of flat modules: data_models.py (wire types), client.py (HTTP + mock clients), env.py (the env), and train.py (the world model and training loop). env.py works both standalone and directly with stable-worldmodel: there is a single ArcAgi3Env and no second adapter class.

Quick start (offline, no API key)

import gymnasium as gym
import env  # noqa: F401  (importing registers the gym ids)
from env import encode_action

e = gym.make("arcagi3/ArcAgi3Mock-v0", render_mode="ansi")
obs, info = e.reset(seed=0)

obs, reward, terminated, truncated, info = e.step(encode_action(6, x=12, y=30))  # ACTION6
print(info["state"], info["available_actions"])
print(e.render())
e.close()

ArcAgi3Mock-v0 runs a self-contained toy game (no network, no key) — useful for tests, check_env, and wiring up an agent before you go online.

Quick start (online, real ARC-AGI-3 API)

cp .env.template .env      # then set ARC_API_KEY (the app loads it via load_dotenv)

import gymnasium as gym
import env  # noqa: F401

e = gym.make("arcagi3/ArcAgi3-v0", game_id="ls20")
obs, info = e.reset(seed=0)                  # POST /api/cmd/RESET
obs, reward, term, trunc, info = e.step({"id": 6, "x": 12, "y": 30})  # dict form also ok
e.close()                                    # closes the scorecard

Spaces

| | Gymnasium space | Meaning |
|---|---|---|
| Observation | `Box(0, 15, (64, 64), uint8)` | current frame, top layer; one colour id per cell |
| Action | `Discrete(4101)` = `5 + 64*64` | index `0..4` → ACTION1–5; index `5 + (y*64 + x)` → ACTION6 click at `(x, y)` |

A flat Discrete action is the canonical form (required by stable-worldmodel — see below). Use encode_action(id, x, y) / decode_action(index) to convert. For convenience step also accepts a {"id", "x", "y"} dict (plus an optional "reasoning"), so e.step({"id": 2}) works too.

ARC-AGI-3 → Gymnasium mapping

| ARC-AGI-3 | Gymnasium |
|---|---|
| `POST /api/cmd/RESET` | `env.reset()` (also restarts after `GAME_OVER`) |
| `POST /api/cmd/ACTION1..6` | `env.step(action)` |
| `frame` (64×64 grid of colours) | observation `Box` |
| `state` ∈ {`NOT_PLAYED`,`NOT_FINISHED`,`WIN`,`GAME_OVER`} | `info["state"]`; `WIN`/`GAME_OVER` ⇒ `terminated=True` |
| `levels_completed` increase | `reward` (delta), `+win_bonus` on `WIN` |
| `available_actions`, `win_levels`, `guid`, full frame stack | `info[...]` |
| scorecard open/close | done automatically in `reset()`/`close()` |

Truncation is delegated to the standard TimeLimit wrapper via max_episode_steps (80 for the online env, mirroring the reference agent's MAX_ACTIONS).
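
As a rough illustration of the reward and termination rows in the mapping above, the per-step outcome could be derived like this. It is a sketch under assumptions: `step_outcome` and `StepOutcome` are hypothetical names, and the default `win_bonus=1.0` is a placeholder, not a value taken from env.py.

```python
from dataclasses import dataclass

TERMINAL_STATES = {"WIN", "GAME_OVER"}


@dataclass
class StepOutcome:
    reward: float
    terminated: bool


def step_outcome(prev_levels: int, levels: int, state: str,
                 win_bonus: float = 1.0) -> StepOutcome:
    """Reward is the increase in levels_completed, plus a bonus on WIN."""
    reward = float(levels - prev_levels)
    if state == "WIN":
        reward += win_bonus  # placeholder default; the real bonus may differ
    return StepOutcome(reward=reward, terminated=state in TERMINAL_STATES)


print(step_outcome(0, 1, "NOT_FINISHED"))
print(step_outcome(6, 7, "WIN"))
```

Truncation is deliberately absent here, since the text above delegates it to the TimeLimit wrapper rather than to the env itself.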

Info dict

info carries everything that doesn't fit the observation Box: state, available_actions, levels_completed, win_levels, guid, game_id, card_id, action_input, and frame_stack (the full, possibly multi-layer, raw frame).
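
One use of info is to restrict a policy to the actions the game currently allows. The sketch below assumes `info["available_actions"]` is a list of integer action ids in 1..6 (the README does not state the element type), and `sample_valid_index` is a hypothetical helper that returns a flat index in the Discrete(4101) layout described under Spaces.

```python
import random

GRID = 64        # assumed grid size, matching the 64x64 observation
NUM_SIMPLE = 5   # ACTION1..ACTION5 map to flat indices 0..4


def sample_valid_index(available_actions, rng: random.Random) -> int:
    """Pick a random flat action index among the currently available ids."""
    if not available_actions:
        raise ValueError("no available actions to sample from")
    action_id = rng.choice(sorted(available_actions))
    if action_id == 6:
        # ACTION6 is a click: choose a random cell on the grid.
        x, y = rng.randrange(GRID), rng.randrange(GRID)
        return NUM_SIMPLE + y * GRID + x
    return action_id - 1


rng = random.Random(0)
print(sample_valid_index([1, 3], rng))
```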

Architecture

data_models.py  # GRID/NUM_COLORS, Color + PALETTE, GameState, FrameData (pydantic)
client.py       # ArcClient interface + HttpArcClient (real) + MockArcClient (offline)
env.py          # ArcAgi3Env, make_mock_env, encode/decode_action, register_swm
train.py        # continual learning: world model + CEM-MPC + collect->train->solve loop
tests/

The env talks to an injectable ArcClient, so the same ArcAgi3Env runs against the live API (HttpArcClient) or fully offline (MockArcClient). Pass your own client to point at a local server or to stub the API in tests:

from client import HttpArcClient
from env import ArcAgi3Env
e = ArcAgi3Env(client=HttpArcClient(root_url="http://localhost:8001"), game_id="ls20")

Use with stable-worldmodel

stable-worldmodel (swm) drives a pool of Gymnasium envs for world-model data collection, training, and MPC evaluation. ArcAgi3Env already satisfies swm's contract — register_swm() just registers that same env under the swm/ namespace.

import stable_worldmodel as swm
from env import register_swm

register_swm()              # registers swm/ArcAgi3-v0 and swm/ArcAgi3Mock-v0

world = swm.World("swm/ArcAgi3Mock-v0", num_envs=4, image_shape=(64, 64))
world.set_policy(swm.policy.RandomPolicy(seed=0))
world.collect("data/arc.lance", episodes=8, seed=0)   # -> LanceDB dataset

ds = swm.data.load_dataset("data/arc.lance", num_steps=4)  # pixels, action, reward, ...

Why ArcAgi3Env is swm-ready out of the box:

| swm requirement | how the env meets it |
|---|---|
| flat action space (`EverythingToInfoWrapper` rejects `dict` actions; `CategoricalCEMSolver` asserts `Discrete`) | the canonical action is `Discrete(4101)` (`5` simple + `64*64` clicks); see `encode_action`/`decode_action` |
| `render()` → `rgb_array` | 16-colour palette render (swm adds resized `pixels` to `info`) |
| `variation_space` (`swm.spaces.Dict`) | a minimal space exposing the empty-cell background colour |

register_swm() is also used internally by train.py for data collection (it drives a pool of envs with world.collect).
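
The palette render mentioned in the table amounts to a lookup from colour id to RGB. The sketch below shows the idea with plain lists; `PLACEHOLDER_PALETTE` is an invented greyscale ramp standing in for the real PALETTE in data_models.py, and `render_rgb` with its `scale` parameter is hypothetical (the actual env presumably returns a NumPy uint8 array).

```python
# Invented 16-entry greyscale ramp; the real colours live in data_models.py.
PLACEHOLDER_PALETTE = [(i * 17, i * 17, i * 17) for i in range(16)]


def render_rgb(grid, palette=PLACEHOLDER_PALETTE, scale=1):
    """Turn a 2-D grid of colour ids into rows of (r, g, b) tuples.

    Each cell is repeated `scale` times in both directions
    (nearest-neighbour upscaling).
    """
    image = []
    for row in grid:
        rgb_row = [palette[cell] for cell in row for _ in range(scale)]
        image.extend(list(rgb_row) for _ in range(scale))
    return image


tiny = [[0, 15], [15, 0]]
img = render_rgb(tiny, scale=2)
print(len(img), len(img[0]))
```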

Continual learning (`train.py`)

train.py is one self-contained script that closes the loop collect → train → solve → collect → … (Dreamer / Plan2Explore shape):

# Offline (mock game, no API key) — a complete working run:
python train.py --rounds 10 --episodes-per-round 16 --image-size 32 --eval-episodes 2

python train.py --resume checkpoints/mock-grid-v0.pt --rounds 5  # resume / keep learning
python train.py --rounds 5 --objective explore         # pure exploration instead

For the real API, pass a --game. The full ids look like sc25-635fd71a, but the suffix is only a version — so a bare prefix (sc25) also works: it's resolved against the games your key can see. List them, then train against one:

# set ARC_API_KEY in .env first (see "Quick start (online)" above)
python -c "from dotenv import load_dotenv; load_dotenv('.env'); \
from client import HttpArcClient; print(HttpArcClient().list_games())"

python train.py --online --game sc25 --rounds 10 --eval-episodes 2   # or the full sc25-635fd71a

Online is network-bound — every step is one HTTP call — so keep --num-envs modest and expect it to run much slower than the offline mock.
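
The prefix resolution described above (a bare `sc25` matching the full `sc25-635fd71a`) might look roughly like the following. This is a guess at the behaviour, not the repo's code: `resolve_game_id` is a hypothetical name, the list of ids is assumed to be plain strings, and raising on an ambiguous prefix is a design choice of this sketch.

```python
def resolve_game_id(name: str, available_ids: list[str]) -> str:
    """Resolve a full id or a bare prefix against the visible game ids."""
    if name in available_ids:
        return name
    # Treat everything before the first "-" as the game prefix.
    matches = [g for g in available_ids if g.split("-", 1)[0] == name]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"no game matches {name!r}")
    raise ValueError(f"{name!r} is ambiguous: {matches}")


# "ls20-00000000" is a made-up id for illustration only.
games = ["ls20-00000000", "sc25-635fd71a"]
print(resolve_game_id("sc25", games))
```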

A short verified end-to-end run against the live ls20 game (a random round, then an MPC round that actually plays the game, with per-round training + eval):

$ python train.py --online --game ls20 --rounds 2 --episodes-per-round 2 \
      --num-envs 2 --eval-episodes 1 --image-size 32 --max-episode-steps 20
INFO start: env=swm/ArcAgi3-v0 game=ls20 rounds=2 objective=reward candidates=4101 ckpt=checkpoints/ls20.pt
INFO round 1/2 [random] collecting 2 episodes over 2 env(s)...
INFO   collected: buffer=2 eps / 40 steps
INFO   training 5 epochs (lr=0.0003) on 38 clips...
INFO     epoch   5/5  loss=5.76479 (recon=0.3281 reward=3.6677 cont=0.6545 surprise=0.0145 kl=1.1000)
INFO   saved checkpoint -> checkpoints/ls20.pt
INFO   eval: win_rate=0.00 mean_levels=0.00
INFO round 2/2 [mpc] collecting 2 episodes over 2 env(s)...
INFO   scorecard 733727ec: score=0.000 levels=0/7 actions=21 envs_done=0/1
INFO   collected: buffer=4 eps / 80 steps
INFO   training 5 epochs (lr=0.0003) on 76 clips...
INFO   saved checkpoint -> checkpoints/ls20.pt
INFO   eval: win_rate=0.00 mean_levels=0.00
INFO done: final model at checkpoints/ls20.pt

win_rate=0.00 is expected here — this is a tiny model over a handful of 20-step episodes, just to show the loop runs end-to-end on a real game. Solving ARC-AGI-3 is open research (see the note below).

Monitoring. Per-round eval already prints win_rate and mean_levels. Add --record-stats to wrap the eval env in gymnasium.wrappers.RecordEpisodeStatistics (episode return/length/time; mean_len is appended to the round line), and --video-dir DIR to record each eval episode to disk via gymnasium.wrappers.RecordVideo (needs pip install moviepy):

python train.py --rounds 10 --episodes-per-round 16 --image-size 32 \
    --eval-episodes 2 --record-stats --video-dir runs/videos

train.py bundles three things:

World model: a DreamerV3-style RSSM with a CNN encoder, an nn.Embedding action encoder (for the Discrete(4101) action), a recurrent latent predictor over categorical latents (--latent-dim / --latent-classes), and a deconv decoder, plus a two-hot symlog reward head (--reward-bins), a continue head (episode end), and a learned-surprise head (novelty). The loss combines image reconstruction + reward + continue + surprise + the dynamics/representation KL (free-bits, β_dyn/β_rep).
CEM-MPC planner: rolls the model forward in latent space over a Discrete candidate set (5 simple actions, plus an ACTION6 click grid via --click-stride) and picks the best first action. The default objective is task-directed: predicted_reward + β·surprise (--explore-beta); --objective explore is pure novelty-seeking exploration.
The loop: the first round acts randomly (no model yet), fills a growing in-memory ReplayBuffer, and trains a model. Every later round acts with the CEM-MPC policy driven by the current model, appends the new experience, and fine-tunes the same model (a warm start, so learning is continual rather than from scratch). Exploration is uncertainty-driven (the learned-surprise bonus), not ε-greedy. A few greedy eval episodes per round give a progress signal.
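
For readers unfamiliar with the two-hot symlog reward target named in the world-model description, here is a minimal stdlib sketch of the general DreamerV3 technique. It is not the code in train.py: the bin range of -5 to 5, the count of 41 bins, and all function names are assumptions made for illustration.

```python
import bisect
import math


def symlog(x: float) -> float:
    return math.copysign(math.log1p(abs(x)), x)


def symexp(y: float) -> float:
    return math.copysign(math.expm1(abs(y)), y)


def make_bins(num_bins: int, low: float = -5.0, high: float = 5.0) -> list[float]:
    """Evenly spaced bin centres in symlog space (range is an assumption)."""
    step = (high - low) / (num_bins - 1)
    return [low + i * step for i in range(num_bins)]


def two_hot(value: float, bins: list[float]) -> list[float]:
    """Spread symlog(value) over its two neighbouring bins, weights sum to 1."""
    y = min(max(symlog(value), bins[0]), bins[-1])
    target = [0.0] * len(bins)
    upper = bisect.bisect_left(bins, y)  # first bin centre >= y
    if bins[upper] == y:
        target[upper] = 1.0
        return target
    lower = upper - 1
    w_upper = (y - bins[lower]) / (bins[upper] - bins[lower])
    target[lower], target[upper] = 1.0 - w_upper, w_upper
    return target


def decode(target: list[float], bins: list[float]) -> float:
    """Expected bin value, mapped back from symlog space."""
    return symexp(sum(p * b for p, b in zip(target, bins)))


bins = make_bins(41)
print(decode(two_hot(3.0, bins), bins))
```

A head trained with cross-entropy against such targets predicts a distribution over bins, and `decode` recovers a scalar reward from it.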

start: device=cuda env=swm/ArcAgi3Mock-v0 game=mock-grid-v0 rounds=10 objective=reward candidates=4101 ...
round 1/10 [random] collecting 16 episodes over 8 env(s)...
  collected: buffer=16 eps / 1280 steps
  eval: win_rate=0.00 mean_levels=0.00
round 2/10 [mpc] collecting 16 episodes over 8 env(s)...
...
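
The [mpc] rounds in the log rely on planning over discrete actions. The following is a generic categorical cross-entropy-method sketch of that idea, not train.py's planner and not swm's CategoricalCEMSolver: `cem_plan`, its parameters, and `toy_score` are all hypothetical, and `toy_score` stands in for a latent rollout that would return predicted_reward + β·surprise.

```python
import random


def cem_plan(score_fn, num_actions, horizon=3, samples=200, elites=20,
             iters=4, smoothing=0.1, seed=0):
    """Return the first action of the best plan found by categorical CEM."""
    rng = random.Random(seed)
    actions = list(range(num_actions))
    # One categorical distribution per planning step, initially uniform.
    probs = [[1.0 / num_actions] * num_actions for _ in range(horizon)]
    for _ in range(iters):
        seqs = [
            [rng.choices(actions, weights=probs[t])[0] for t in range(horizon)]
            for _ in range(samples)
        ]
        seqs.sort(key=score_fn, reverse=True)
        top = seqs[:elites]
        # Refit each step's distribution to the elite action frequencies,
        # mixed with a little uniform mass so no action dies out entirely.
        for t in range(horizon):
            counts = [0] * num_actions
            for seq in top:
                counts[seq[t]] += 1
            probs[t] = [
                (1 - smoothing) * c / elites + smoothing / num_actions
                for c in counts
            ]
    return max(actions, key=lambda a: probs[0][a])


def toy_score(seq, beta=0.1):
    """Stand-in for a model rollout: action 2 is rewarding, action 3 novel."""
    reward = sum(1.0 for a in seq if a == 2)
    surprise = sum(0.5 for a in seq if a == 3)
    return reward + beta * surprise


print(cem_plan(toy_score, num_actions=4))
```

Only the first action of the plan is executed; replanning at every step is what makes this model-predictive control rather than open-loop planning.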

The loop runs, learns dynamics + a reward model, and plans for reward; actually solving real ARC-AGI-3 games is open research (sparse reward, a tiny CNN world model, short CEM horizon). The scaffolding is complete and correct — scaling the model/horizon and reward shaping is where the research lives.

ARC-AGI-3 state lives on the game server and can't be set to an arbitrary start/goal, so this is episode-rollout planning (auto-reset), not swm's dataset-replay evaluate(_set_state/_set_goal_state) path.

Tests

pytest -q

The test suite includes gymnasium.utils.env_checker.check_env, which validates the env against the Gymnasium contract.
