Android Example Gym PWA App

A minimal todo-list PWA designed as an agent evaluation environment (RL gym). Agents interact with the UI to complete tasks and are scored by a deterministic eval harness.

Use this as a lightweight reference for:

How to build a gym-compatible PWA installable on Android as a TWA
How to pass state in and out through the gym API for automated evaluation
How to seed per-task state deterministically

Quick Start

pnpm install
pnpm dev        # http://localhost:3000

Or with Docker:

docker build -t todo-gym .
docker run -p 3000:3000 todo-gym

Install as Android TWA

The app ships with a /.well-known/assetlinks.json route and is ready for Bubblewrap or sb.pwa_install:

Bubblewrap

npm i -g @bubblewrap/cli
bubblewrap init --manifest https://<your-host>/manifest.json
bubblewrap build
adb install app-release-signed.apk

Set these env vars before deploying so the assetlinks fingerprint matches your keystore:

TWA_SHA256_FINGERPRINT=AA:BB:CC:...   # from `keytool -list -v -keystore android.keystore`
TWA_PACKAGE_NAME=com.cuaai.gymtodo
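
For orientation, the two variables map onto the standard Digital Asset Links statement that Android checks before trusting a TWA. The Python sketch below shows that general shape, assuming the route simply substitutes the two values; the app's actual route handler is not shown in this README, and `build_assetlinks` is a hypothetical name.

```python
import json
import os


def build_assetlinks(package_name: str, fingerprint: str) -> list[dict]:
    """Return a Digital Asset Links statement list for one Android app."""
    return [
        {
            # Lets the app open this site's URLs, which a TWA requires.
            "relation": ["delegate_permission/common.handle_all_urls"],
            "target": {
                "namespace": "android_app",
                "package_name": package_name,
                # Colon-separated SHA-256 fingerprint of the signing certificate.
                "sha256_cert_fingerprints": [fingerprint.strip().upper()],
            },
        }
    ]


if __name__ == "__main__":
    statement = build_assetlinks(
        os.environ.get("TWA_PACKAGE_NAME", "com.cuaai.gymtodo"),
        os.environ.get("TWA_SHA256_FINGERPRINT", "AA:BB:CC"),  # placeholder, not a real fingerprint
    )
    print(json.dumps(statement, indent=2))
```

If the fingerprint served at /.well-known/assetlinks.json does not match the keystore that signed the APK, verification fails and Android typically opens the site as a Custom Tab with the browser bar visible instead of full screen.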

Via Cua Android Image

The Cua Android image has built-in PWA installation support: one call handles keystore generation, the Bubblewrap build, assetlinks trust, and APK install.

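# 10.0.2.2 is the Android emulator's alias for the host machine's loopback interface,
# so this URL reaches a dev server running on the host at port 3000.
async with Sandbox.ephemeral(Image.android().pwa_install(
    url="http://10.0.2.2:3000",
    package_name="com.cuaai.gymtodo",
)) as sb:
    ...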

Gym API

The API has the same shape as slack-env, so it works with any cua-bench eval harness:

Method	Endpoint	Description
`GET`	`/gym/tasks`	List all tasks
`POST`	`/gym/start/:taskId`	Seed DB and start task → `{ success, prompt, task }`
`GET`	`/gym/evaluate`	Evaluate current state → `{ success, reward, message, subResults }`
`POST`	`/gym/evaluate`	Retrieval eval — body: `{ agentAnswer }`
`POST`	`/gym/reset`	Reset DB to shared seed
`POST`	`/gym/session`	Create isolated session → `{ sessionId }`
`DELETE`	`/gym/session`	Destroy session

To run isolated sessions in parallel, pass the header X-Session-Id: <id> (or the gym_session cookie) with each request.
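
The endpoints above can be driven from a small client. The following is a minimal Python sketch using only the standard library, assuming the dev server from Quick Start is running at http://localhost:3000. The helper names (`gym_request`, `send`, `run_task`) are hypothetical, and the response fields are taken from the table rather than from a verified schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # dev server from Quick Start


def gym_request(method, path, session_id=None, body=None):
    """Build (but do not send) a request for one gym endpoint."""
    headers = {}
    data = None
    if session_id is not None:
        headers["X-Session-Id"] = session_id  # selects the isolated session
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request):
    """Send a request and decode the JSON response, if there is one."""
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    return json.loads(raw) if raw else None


def run_task(task_id, agent_answer=None):
    """Create a session, start a task, evaluate it, then destroy the session."""
    session_id = send(gym_request("POST", "/gym/session"))["sessionId"]
    try:
        started = send(gym_request("POST", f"/gym/start/{task_id}", session_id))
        print("prompt:", started["prompt"])
        # ... the agent would interact with the UI here ...
        if agent_answer is not None:
            # Retrieval tasks are evaluated with a POST carrying the answer.
            return send(gym_request("POST", "/gym/evaluate", session_id, {"agentAnswer": agent_answer}))
        return send(gym_request("GET", "/gym/evaluate", session_id))
    finally:
        send(gym_request("DELETE", "/gym/session", session_id))
```

Calling `run_task("add_item")` without any agent action in between should return an evaluation result for the freshly seeded state, which is a quick way to confirm the harness wiring before plugging in an agent.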

Eval server forwarding

docker run -p 3000:3000 -e EVAL_SERVER=http://your-server:8080 todo-gym

With EVAL_SERVER set, the app posts start, evaluate, and reset events to your server, using the same payload schema as slack-env.
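
To inspect what the app forwards, a throwaway receiver is enough. The sketch below is a minimal Python example that accepts a POST on any path and records the body. It makes no assumption about the request path or payload fields, since this README does not document them; `record_event`, `EventHandler`, and `make_server` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # (path, payload) tuples kept in memory for inspection


def record_event(path, raw):
    """Decode a request body as JSON if possible and remember it."""
    try:
        payload = json.loads(raw) if raw else None
    except (json.JSONDecodeError, UnicodeDecodeError):
        payload = raw.decode("utf-8", errors="replace")  # keep non-JSON bodies as text
    received.append((path, payload))
    return payload


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        record_event(self.path, self.rfile.read(length))
        body = b'{"ok": true}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request log lines


def make_server(host="0.0.0.0", port=8080):
    """Create the receiver; call .serve_forever() on the result to run it."""
    return HTTPServer((host, port), EventHandler)
```

Running `make_server().serve_forever()` and pointing EVAL_SERVER at that host and port should populate `received` as tasks are started, evaluated, and reset.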

Tasks

tasks/
  primitives/          # atomic single-action or retrieval tasks
    add_item/
    complete_item/
    delete_item/
    edit_item/
    clear_completed/
    set_filter_active/
    set_filter_completed/
    add_three_items/
    count_items/        # retrieval — answer: number of items
  advanced/            # multi-step composed tasks
    add_and_complete/
    full_workflow/
  _shared/
    seed.sql           # shared baseline (empty list, filter=all)

Each task.json:

{
  "id": "add_item",
  "description": "Add a new todo item with the text: 'Buy groceries'.",
  "evalFunc": "check_item_exists",
  "weight": 1,
  "defaultParams": { "keywords": ["buy groceries"] }
}

Retrieval tasks also carry json_schema and expected_value fields.

Advanced tasks reference primitives by ID:

{
  "id": "add_and_complete",
  "description": "Add 'Schedule dentist appointment', then mark it completed.",
  "steps": [
    ["add_item", { "keywords": ["schedule dentist appointment"] }],
    ["complete_item", { "keywords": ["schedule dentist appointment"] }]
  ]
}
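
One way to read the steps array is as a list of (primitive ID, parameter overrides) pairs. The Python sketch below expands an advanced task into the eval calls it implies. It is an illustration, not the logic in eval.mjs: it assumes step parameters override the primitive's defaultParams, and it assumes complete_item uses check_item_done, since that task.json is not shown here. `resolve_steps` is a hypothetical name.

```python
def resolve_steps(advanced_task, primitives):
    """Expand an advanced task's steps into (eval_func, params) pairs."""
    resolved = []
    for primitive_id, overrides in advanced_task["steps"]:
        primitive = primitives[primitive_id]  # raises KeyError for an unknown ID
        # Assumption: per-step params take precedence over defaultParams.
        params = {**primitive.get("defaultParams", {}), **overrides}
        resolved.append((primitive["evalFunc"], params))
    return resolved


primitives = {
    "add_item": {
        "id": "add_item",
        "evalFunc": "check_item_exists",
        "defaultParams": {"keywords": ["buy groceries"]},
    },
    # Assumed contents; the README does not show complete_item/task.json.
    "complete_item": {
        "id": "complete_item",
        "evalFunc": "check_item_done",
        "defaultParams": {"keywords": []},
    },
}

add_and_complete = {
    "id": "add_and_complete",
    "steps": [
        ["add_item", {"keywords": ["schedule dentist appointment"]}],
        ["complete_item", {"keywords": ["schedule dentist appointment"]}],
    ],
}

for eval_func, params in resolve_steps(add_and_complete, primitives):
    print(eval_func, params)
```

Keeping the overrides in the advanced task means a primitive's eval logic is written once and reused with different keywords.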

Gym Controls UI

Desktop: press ↑ to open the floating gym panel
Mobile/Android: tap the 🏋️ FAB (bottom-left) to open the panel

The panel lets you select a task, start it (seeds the DB), optionally type an agent answer for retrieval tasks, evaluate, and reset.

Adding Tasks

Create tasks/primitives/<id>/task.json with id, description, evalFunc, defaultParams
Create tasks/primitives/<id>/seed.sql with the starting DB state
If a new eval function is needed, add it to eval.mjs
For advanced tasks, create tasks/advanced/<id>/task.json with a steps array
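
The first two steps can be scripted. The following Python sketch scaffolds a new primitive task directory, assuming the layout shown under Tasks. `scaffold_primitive` is a hypothetical helper, and the seed.sql it writes is only a placeholder because the real seed has to match schema.sql.

```python
import json
from pathlib import Path


def scaffold_primitive(tasks_root, task_id, description, eval_func, default_params):
    """Create tasks/primitives/<id>/ with task.json and a placeholder seed.sql."""
    task_dir = Path(tasks_root) / "primitives" / task_id
    task_dir.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing task
    task = {
        "id": task_id,
        "description": description,
        "evalFunc": eval_func,
        "weight": 1,
        "defaultParams": default_params,
    }
    (task_dir / "task.json").write_text(json.dumps(task, indent=2) + "\n", encoding="utf-8")
    # Placeholder only: replace with statements that set the starting DB state.
    (task_dir / "seed.sql").write_text("-- TODO: starting DB state for this task\n", encoding="utf-8")
    return task_dir
```

For example, `scaffold_primitive("tasks", "add_urgent_item", "Add a todo item with the text: 'Pay rent'.", "check_item_exists", {"keywords": ["pay rent"]})` would create a hypothetical add_urgent_item task that reuses an existing eval function.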

Eval Functions

Function	Checks
`check_item_exists`	Item with matching text exists
`check_item_done`	Item with matching text is completed
`check_item_deleted`	Item with matching text no longer exists
`check_no_completed`	No completed items remain
`check_filter`	Active filter matches expected value
`check_item_count_gte`	At least N items in the list
`check_retrieval`	Agent's answer matches `expected_value`
`compose_steps`	Orchestrates multi-step advanced tasks
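
To make the table concrete, here is a Python sketch of how a few of these checks could work over an in-memory list of items. It illustrates the idea only and is not the code in eval.mjs: the item shape (`text`, `done`), case-insensitive substring matching, and string comparison for retrieval answers are all assumptions.

```python
def check_item_exists(items, keywords):
    """True if some item's text contains every keyword (case-insensitive)."""
    return any(
        all(keyword.lower() in item["text"].lower() for keyword in keywords)
        for item in items
    )


def check_item_done(items, keywords):
    """True if a matching item exists and is marked completed."""
    return check_item_exists([item for item in items if item["done"]], keywords)


def check_item_count_gte(items, n):
    """True if the list holds at least n items."""
    return len(items) >= n


def check_retrieval(agent_answer, expected_value):
    """True if the agent's answer equals the expected value."""
    # Assumption: both sides are compared as trimmed strings.
    return str(agent_answer).strip() == str(expected_value).strip()


# Name-based dispatch, mirroring how task.json refers to checks via evalFunc.
ITEM_CHECKS = {
    "check_item_exists": check_item_exists,
    "check_item_done": check_item_done,
    "check_item_count_gte": check_item_count_gte,
}


def run_check(eval_func, items, **params):
    """Look up an item-based check by name and run it with the given params."""
    return ITEM_CHECKS[eval_func](items, **params)


items = [
    {"text": "Buy groceries", "done": False},
    {"text": "Schedule dentist appointment", "done": True},
]
print(run_check("check_item_exists", items, keywords=["buy groceries"]))
print(run_check("check_item_done", items, keywords=["buy groceries"]))
```

Because every check is a pure function of the current state and its parameters, the same state always produces the same result, which is what keeps evaluation deterministic.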
