trycua/android-example-gym-pwa-app

Android Example Gym PWA App

A minimal todo-list PWA designed as an agent evaluation environment (RL gym). Agents interact with the UI to complete tasks and are scored by a deterministic eval harness.

Use this as a lightweight reference for:

  • How to build a gym-compatible PWA installable on Android as a TWA
  • How to send state in/out via the gym API for automated evaluation
  • How to seed per-task state deterministically

Quick Start

pnpm install
pnpm dev        # http://localhost:3000

Or with Docker:

docker build -t todo-gym .
docker run -p 3000:3000 todo-gym

Install as Android TWA

The app ships with a /.well-known/assetlinks.json route and is ready for Bubblewrap or sb.pwa_install:

Bubblewrap

npm i -g @bubblewrap/cli
bubblewrap init --manifest https://<your-host>/manifest.json
bubblewrap build
adb install app-release-signed.apk

Set these env vars before deploying so the assetlinks fingerprint matches your keystore:

TWA_SHA256_FINGERPRINT=AA:BB:CC:...   # from `keytool -list -v -keystore android.keystore`
TWA_PACKAGE_NAME=com.cuaai.gymtodo
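
For reference, the Digital Asset Links statement that Android checks for a TWA has a fixed JSON shape. The Python sketch below shows how the two variables above could map onto it. `build_assetlinks` is a hypothetical helper for illustration only; the app's actual /.well-known/assetlinks.json route is not shown in this README and may differ.

```python
import json
import os


def build_assetlinks(env=os.environ):
    """Build a Digital Asset Links statement list from the TWA_* variables.

    Hypothetical helper: the app's real assetlinks route handler is not
    shown in this README and may be implemented differently.
    """
    fingerprint = env.get("TWA_SHA256_FINGERPRINT", "")
    package_name = env.get("TWA_PACKAGE_NAME", "")
    if not fingerprint or not package_name:
        return []  # nothing to assert until both values are configured
    return [
        {
            "relation": ["delegate_permission/common.handle_all_urls"],
            "target": {
                "namespace": "android_app",
                "package_name": package_name,
                "sha256_cert_fingerprints": [fingerprint],
            },
        }
    ]


if __name__ == "__main__":
    # Prints the statement for whatever is currently set in the environment.
    print(json.dumps(build_assetlinks(), indent=2))
```

If the fingerprint served here does not match the keystore that signed the APK, Android will not verify the link, which is why the variables need to be set before deploying.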

Via Cua Android Image

The Cua Android image has built-in PWA installation support — it handles keystore generation, Bubblewrap build, assetlinks trust, and APK install in one call:

async with Sandbox.ephemeral(Image.android().pwa_install(
    url="http://10.0.2.2:3000",
    package_name="com.cuaai.gymtodo",
)) as sb:
    ...

Gym API

The API has the same shape as slack-env, so it works with any cua-bench eval harness:

Method  Endpoint            Description
GET     /gym/tasks          List all tasks
POST    /gym/start/:taskId  Seed DB and start task → { success, prompt, task }
GET     /gym/evaluate       Evaluate current state → { success, reward, message, subResults }
POST    /gym/evaluate       Retrieval eval (body: { agentAnswer })
POST    /gym/reset          Reset DB to shared seed
POST    /gym/session        Create isolated session → { sessionId }
DELETE  /gym/session        Destroy session

Pass X-Session-Id: <id> (or cookie gym_session) for isolated parallel sessions.
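
A harness would typically create a session, start a task, let the agent act, and then evaluate. The Python sketch below builds those requests with the standard library only. `gym_request` and `send` are hypothetical helper names, the base URL assumes the local dev server from Quick Start, and the session ID is a placeholder for the value returned by POST /gym/session. Nothing is sent until a request is passed to `send`.

```python
import json
import urllib.request


def gym_request(base_url, method, path, session_id=None, body=None):
    """Build (but do not send) a request to a gym endpoint."""
    headers = {}
    data = None
    if session_id is not None:
        headers["X-Session-Id"] = session_id  # isolates parallel sessions
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


BASE = "http://localhost:3000"  # assumption: dev server from Quick Start
SID = "demo-session"            # placeholder for the returned sessionId

# Requests for one episode, in the order a harness would send them.
start = gym_request(BASE, "POST", "/gym/start/add_item", session_id=SID)
evaluate = gym_request(BASE, "GET", "/gym/evaluate", session_id=SID)
# Retrieval variant; the exact answer format depends on the task's json_schema.
answer = gym_request(BASE, "POST", "/gym/evaluate", session_id=SID,
                     body={"agentAnswer": 3})
```

With the app running, `send(start)` should return the `{ success, prompt, task }` object listed in the table, and `send(evaluate)` the evaluation result.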

Eval server forwarding

docker run -p 3000:3000 -e EVAL_SERVER=http://your-server:8080 todo-gym

When EVAL_SERVER is set, the app posts start, evaluate, and reset events to that server, using the same payload schema as slack-env.


Tasks

tasks/
  primitives/          # atomic single-action or retrieval tasks
    add_item/
    complete_item/
    delete_item/
    edit_item/
    clear_completed/
    set_filter_active/
    set_filter_completed/
    add_three_items/
    count_items/        # retrieval — answer: number of items
  advanced/            # multi-step composed tasks
    add_and_complete/
    full_workflow/
  _shared/
    seed.sql           # shared baseline (empty list, filter=all)
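
Given this layout, a harness-side script can enumerate tasks straight from disk. The Python sketch below is one way to do that; `discover_tasks` is a hypothetical helper, and it assumes every task directory holds a task.json with an `id` field. The demo builds a throwaway copy of the layout so it does not depend on a checkout of the repo.

```python
import json
import tempfile
from pathlib import Path


def discover_tasks(tasks_root):
    """Return {task_id: (category, task_dict)} for every task.json found.

    Only primitives/ and advanced/ are scanned, so _shared/ is skipped.
    """
    found = {}
    for category in ("primitives", "advanced"):
        for task_file in sorted(Path(tasks_root, category).glob("*/task.json")):
            task = json.loads(task_file.read_text())
            found[task["id"]] = (category, task)
    return found


# Demo against a temporary directory that mimics the tree above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp, "tasks")
    for category, task_id in [("primitives", "add_item"),
                              ("advanced", "add_and_complete")]:
        task_dir = root / category / task_id
        task_dir.mkdir(parents=True)
        (task_dir / "task.json").write_text(json.dumps({"id": task_id}))
    (root / "_shared").mkdir()
    (root / "_shared" / "seed.sql").write_text("-- shared baseline\n")
    tasks = discover_tasks(root)
```

Against a running app, GET /gym/tasks is the supported way to list tasks; reading the directory is only useful for offline tooling.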

Each task.json:

{
  "id": "add_item",
  "description": "Add a new todo item with the text: 'Buy groceries'.",
  "evalFunc": "check_item_exists",
  "weight": 1,
  "defaultParams": { "keywords": ["buy groceries"] }
}
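
To make the evalFunc and defaultParams fields concrete, here is a Python sketch of how a keyword check such as check_item_exists could behave. The real function lives in eval.mjs and is not shown in this README. The item shape (dicts with a `text` key), the case-insensitive substring match, and the result fields borrowed from the /gym/evaluate response are all assumptions.

```python
def check_item_exists(items, keywords):
    """Pass if some item's text contains every keyword, ignoring case.

    Assumed shapes: `items` is a list of dicts with a "text" key, and
    `keywords` is a list of strings as in defaultParams above.
    """
    wanted = [k.lower() for k in keywords]
    for item in items:
        text = item.get("text", "")
        if all(k in text.lower() for k in wanted):
            return {"success": True, "reward": 1.0,
                    "message": f"found item: {text}"}
    return {"success": False, "reward": 0.0,
            "message": f"no item matching {keywords}"}


# Hypothetical list state after the agent adds the item.
todos = [{"text": "Buy groceries", "done": False}]
result = check_item_exists(todos, ["buy groceries"])
```

Lowercase keywords in defaultParams suggest the real check is also case-insensitive, but that is an inference from the example rather than something the README states.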

Retrieval tasks additionally carry json_schema and expected_value fields.

Advanced tasks reference primitives by ID:

{
  "id": "add_and_complete",
  "description": "Add 'Schedule dentist appointment', then mark it completed.",
  "steps": [
    ["add_item", { "keywords": ["schedule dentist appointment"] }],
    ["complete_item", { "keywords": ["schedule dentist appointment"] }]
  ]
}
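
One way to read the steps array: each entry names a primitive and overrides its parameters, and the composed result is built from the per-step results. The Python sketch below is an assumption about how compose_steps might work, not the code in eval.mjs. In particular, scoring the reward as the fraction of passing steps, the `PRIMITIVE_CHECKS` registry, and the item shape are hypothetical.

```python
def compose_steps(steps, checks, state):
    """Run each [primitive_id, params] step against state and aggregate.

    Assumption: reward is the fraction of steps that pass, and the task
    succeeds only when every step passes.
    """
    sub_results = []
    for primitive_id, params in steps:
        passed = bool(checks[primitive_id](state, **params))
        sub_results.append({"step": primitive_id, "success": passed})
    passed_count = sum(r["success"] for r in sub_results)
    return {
        "success": bool(steps) and passed_count == len(steps),
        "reward": passed_count / len(steps) if steps else 0.0,
        "subResults": sub_results,
    }


def _matches(item, keywords):
    """True if the item's text contains every keyword, ignoring case."""
    return all(k in item["text"].lower() for k in keywords)


# Stand-in checks keyed by primitive task ID (hypothetical registry).
PRIMITIVE_CHECKS = {
    "add_item": lambda items, keywords: any(
        _matches(i, keywords) for i in items),
    "complete_item": lambda items, keywords: any(
        _matches(i, keywords) and i["done"] for i in items),
}

steps = [
    ["add_item", {"keywords": ["schedule dentist appointment"]}],
    ["complete_item", {"keywords": ["schedule dentist appointment"]}],
]
# The item was added but not yet completed, so only the first step passes.
state = [{"text": "Schedule dentist appointment", "done": False}]
outcome = compose_steps(steps, PRIMITIVE_CHECKS, state)
```

Keeping per-step results in subResults lets a harness see which part of a multi-step task the agent missed, rather than only a pass or fail for the whole task.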

Gym Controls UI

  • Desktop: use the keyboard shortcut to open the floating gym panel
  • Mobile/Android: tap the 🏋️ FAB (bottom-left) to open the panel

The panel lets you select a task, start it (seeds the DB), optionally type an agent answer for retrieval tasks, evaluate, and reset.


Adding Tasks

  1. Create tasks/primitives/<id>/task.json with id, description, evalFunc, defaultParams
  2. Create tasks/primitives/<id>/seed.sql with the starting DB state
  3. If a new eval function is needed, add it to eval.mjs
  4. For advanced tasks, create tasks/advanced/<id>/task.json with a steps array
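
Steps 1 and 2 can be scripted. The Python sketch below writes a task.json with the fields listed above, plus the weight field from the earlier example, and a placeholder seed.sql. `scaffold_primitive` and the `add_reminder` task are hypothetical, and the seed contents are left empty because the DB schema is not described in this README.

```python
import json
import tempfile
from pathlib import Path


def scaffold_primitive(root, task_id, description, eval_func, default_params):
    """Create tasks/primitives/<task_id>/ with task.json and seed.sql."""
    task_dir = Path(root) / "tasks" / "primitives" / task_id
    task_dir.mkdir(parents=True, exist_ok=True)
    task = {
        "id": task_id,
        "description": description,
        "evalFunc": eval_func,
        "weight": 1,
        "defaultParams": default_params,
    }
    (task_dir / "task.json").write_text(json.dumps(task, indent=2) + "\n")
    seed = task_dir / "seed.sql"
    if not seed.exists():
        # Placeholder only: fill in the starting DB state by hand.
        seed.write_text("-- starting DB state for this task\n")
    return task_dir


# Demo in a temporary directory; in the repo, pass "." as the root instead.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_primitive(
        tmp,
        "add_reminder",
        "Add a new todo item with the text: 'Call mom'.",
        "check_item_exists",
        {"keywords": ["call mom"]},
    )
    written = json.loads((created / "task.json").read_text())
    files = sorted(p.name for p in created.iterdir())
```

Reusing an existing eval function such as check_item_exists means step 3 can be skipped for the new task.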

Eval Functions

Function              Checks
check_item_exists     Item with matching text exists
check_item_done       Item with matching text is completed
check_item_deleted    Item with matching text no longer exists
check_no_completed    No completed items remain
check_filter          Active filter matches expected value
check_item_count_gte  At least N items in the list
check_retrieval       Agent's answer matches expected_value
compose_steps         Orchestrates multi-step advanced tasks
