---
title: EDTriage-Env
emoji: 🏥
colorFrom: red
colorTo: blue
sdk: docker
pinned: false
---

🏥 EDTriage-Env — Emergency Department Triage Simulation

An OpenEnv-compliant reinforcement learning environment that simulates a full Emergency Department. AI agents act as triage nurses — classifying patients using the ESI v4 clinical protocol, allocating scarce beds across 3 zones, managing ventilators and monitors, detecting patient deterioration, and coordinating mass casualty surge responses.

Every action has clinical consequences. Under-triage a STEMI? The patient deteriorates. Assign a sprained ankle to critical care? You waste a bed someone else needs. The dense reward signal teaches agents to prioritize the way expert ED clinicians do.

🎮 Live Demo

→ Launch the Command Center

The interactive dashboard lets judges experience the environment directly:

- Watch the GPT-4o agent make real-time clinical decisions with visible reasoning
- See the live bed board fill with color-coded patients (ESI-1 red → ESI-5 blue)
- Observe the surge alarm fire when mass casualties arrive
- Generate a clinical handoff report powered by GPT-4o-mini

⚡ Quick Start (Python Client)

```bash
pip install httpx
# Clone or copy edtriage_client/ from this repo
```

```python
from edtriage_client import EDTriageEnv

# Connect to the live Space
env = EDTriageEnv(base_url="https://nakulsinghcr7-edtriage-env.hf.space")

# Reset for Task 1 (Routine Shift)
obs = env.reset(task_id="task_1_routine_shift", seed=42)

# Run an episode: triage the first pending patient as ESI-3, otherwise wait
while not env.done:
    pending = obs.get("pending_actions", [])
    if pending:
        result = env.triage(pending[0], esi_level=3)
    else:
        result = env.wait()
    obs = result["observation"]

# Grade the episode
score = env.grade()
print(f"Score: {score['score']:.4f}")
# The improved heuristic baseline scores 0.7340 on this task with seed=42;
# this fixed ESI-3 loop is only a connectivity smoke test, not that baseline.
```

🎯 Why This Environment?

Emergency Department triage is one of the highest-stakes real-time decision-making domains there is. Triage nurses must:

- Classify patient acuity under time pressure with incomplete information
- Allocate finite beds to competing needs (only 40 beds for 52 patients in Task 3)
- Monitor deterioration across an entire waiting room simultaneously
- Make displacement decisions when critical beds fill
- Coordinate system-wide surge responses in mass casualty events

Under-triaging a heart attack is 3× worse than over-triaging a sprain. This clinical consequence asymmetry is encoded directly in the reward function, making this environment uniquely suited for training agents with medically meaningful priorities.

📐 Architecture

```text
edtriage-env/
├── app/
│   ├── main.py              # FastAPI — 15 endpoints (6 OpenEnv + 9 extended)
│   ├── env.py               # EDTriageEnv — step/reset/state
│   ├── models.py            # Pydantic v2 typed models
│   ├── patient_gen.py       # Synthetic patient generator (216 clinical cases)
│   ├── deterioration.py     # Vitals tick engine (probabilistic deterioration)
│   ├── bed_manager.py       # 3-zone bed allocation + displacement logic
│   ├── resource_mgr.py      # Scarce resource pools (ventilators, monitors)
│   ├── reward.py            # Dense reward calculator (8 components)
│   └── graders/
│       ├── task1_grader.py  # Routine shift (easy)
│       ├── task2_grader.py  # Capacity crunch (medium)
│       └── task3_grader.py  # Mass casualty (hard)
├── server/
│   └── app.py               # OpenEnv server entry point
├── edtriage_client/
│   └── env.py               # Python client library (pip-installable)
├── static/
│   └── dashboard.html       # Premium ICU command center UI
├── baseline/
│   └── run_baseline.py      # Heuristic + GPT-4o baseline agent
├── data/
│   └── cases.json           # 216 clinical cases (ESI 1-5)
├── tests/
│   └── test_env.py          # 15 smoke tests (all passing)
├── openenv.yaml             # OpenEnv spec metadata
├── pyproject.toml           # Package configuration
└── Dockerfile               # Docker SDK deployment
```

🔄 Observation Space

| Field | Type | Description |
|---|---|---|
| `current_patients` | `list[Patient]` | Active patients with vitals, ESI, wait time, resources |
| `department_state` | `DepartmentState` | Bed occupancy (3 zones), resource pools, queue, surge status |
| `pending_actions` | `list[str]` | Patient IDs requiring triage or bed assignment |
| `alerts` | `list[str]` | Deterioration warnings, surge threshold alerts |
| `sim_time` | `float` | Simulation time in minutes |

Patient Fields

Each patient includes: id, name, age, sex, chief_complaint, vitals (HR, BP, RR, SpO2, temp, pain, GCS), arrival_mode, comorbidities, deterioration_risk, assigned_esi, assigned_bed, has_deteriorated.
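The Quick Start reads the observation as a plain dict. As a minimal sketch, assuming each `current_patients` entry is a dict keyed by the field names listed above and that `deterioration_risk` is numeric (higher meaning riskier), a client could mirror a few fields and decide which pending patient to handle first. `Patient`, `parse_patients`, and `next_patient` are hypothetical helpers, not part of `edtriage_client`; the server's own models are Pydantic v2 classes in `app/models.py`.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class Patient:
    """Hypothetical client-side mirror of a subset of the Patient fields."""
    id: str
    chief_complaint: str
    deterioration_risk: float              # assumed numeric: higher means riskier
    assigned_esi: Optional[int] = None
    assigned_bed: Optional[str] = None
    has_deteriorated: bool = False
    vitals: dict = field(default_factory=dict)


def parse_patients(obs: dict) -> List[Patient]:
    """Build Patient objects from an observation dict, dropping fields not mirrored here."""
    known = {f.name for f in fields(Patient)}
    return [Patient(**{k: v for k, v in p.items() if k in known})
            for p in obs.get("current_patients", [])]


def next_patient(obs: dict) -> Optional[str]:
    """Choose which pending patient to act on first.

    Already-deteriorated patients come first, then the highest deterioration_risk.
    """
    pending = set(obs.get("pending_actions", []))
    candidates = [p for p in parse_patients(obs) if p.id in pending]
    if not candidates:
        return None
    return max(candidates, key=lambda p: (p.has_deteriorated, p.deterioration_risk)).id


obs = {
    "pending_actions": ["PT-0001", "PT-0002"],
    "current_patients": [
        {"id": "PT-0001", "chief_complaint": "ankle sprain", "deterioration_risk": 0.05, "age": 30},
        {"id": "PT-0002", "chief_complaint": "chest pain", "deterioration_risk": 0.6},
    ],
}
print(next_patient(obs))  # PT-0002
```

Sorting on the tuple `(has_deteriorated, deterioration_risk)` means a patient who has already deteriorated always outranks a merely high-risk one.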

🎮 Action Space

| Action Type | Required Fields | Description |
|---|---|---|
| `triage` | `patient_id`, `esi_level` (1-5) | Assign ESI acuity level |
| `assign_bed` | `patient_id`, `zone` | Assign to critical/observation/fast_track |
| `order_diagnostic` | `patient_id`, `diagnostic` | Order CBC, troponin, CT, ECG, etc. |
| `assign_resource` | `patient_id`, `resource` | Assign ventilator/cardiac_monitor/portable_imaging |
| `release_resource` | `patient_id`, `resource` | Free a resource |
| `discharge` | `patient_id` | Discharge patient |
| `transfer` | `patient_id` | Inter-hospital transfer |
| `activate_surge` | (none) | Activate MCI surge protocol |
| `displace_patient` | `displace_patient_id` | Move patient to free a critical bed |
| `wait` | (none) | Do nothing this step |
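Because each action type needs a different set of fields, a small client-side check can catch malformed actions before they reach `/step`. This sketch transcribes the required fields from the Action Space table and assumes the flat JSON body shown in the `/step` endpoint example; `build_action` is a hypothetical helper, and the server may enforce additional or different rules.

```python
# Required fields per action type, transcribed from the Action Space table.
REQUIRED_FIELDS = {
    "triage": ("patient_id", "esi_level"),
    "assign_bed": ("patient_id", "zone"),
    "order_diagnostic": ("patient_id", "diagnostic"),
    "assign_resource": ("patient_id", "resource"),
    "release_resource": ("patient_id", "resource"),
    "discharge": ("patient_id",),
    "transfer": ("patient_id",),
    "activate_surge": (),
    "displace_patient": ("displace_patient_id",),
    "wait": (),
}
ZONES = {"critical", "observation", "fast_track"}
RESOURCES = {"ventilator", "cardiac_monitor", "portable_imaging"}


def build_action(action_type: str, **params) -> dict:
    """Return a flat /step request body, raising ValueError for malformed actions."""
    if action_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    missing = [name for name in REQUIRED_FIELDS[action_type] if name not in params]
    if missing:
        raise ValueError(f"{action_type} is missing {missing}")
    if action_type == "triage" and not 1 <= params["esi_level"] <= 5:
        raise ValueError("esi_level must be between 1 and 5")
    if action_type == "assign_bed" and params["zone"] not in ZONES:
        raise ValueError(f"unknown zone: {params['zone']!r}")
    if action_type in ("assign_resource", "release_resource") and params["resource"] not in RESOURCES:
        raise ValueError(f"unknown resource: {params['resource']!r}")
    return {"action_type": action_type, **params}


print(build_action("triage", patient_id="PT-0001", esi_level=2))
# {'action_type': 'triage', 'patient_id': 'PT-0001', 'esi_level': 2}
```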

📋 Tasks

| Task | Difficulty | Patients | Key Challenge | Baseline Score |
|---|---|---|---|---|
| Routine ER Shift | 🟢 Easy | 40 over 8 hrs | Standard triage + 1 deterioration event | 0.7340 |
| Capacity Crunch | 🟡 Medium | 35 + 3 ESI-1 | Critical zone 10/12 full, displacement decisions | 0.5737 |
| Mass Casualty Surge | 🔴 Hard | 30 + 22 surge | 22 patients in 12 min, 4 deteriorations, surge protocol | 0.6436 |

Scores are deterministic: the same seed always produces the same results (verified across 5 independent runs).
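One way to repeat that determinism check for your own agent is to run the same seeded episode several times and compare the serialized results. `verify_determinism` and `toy_episode` are hypothetical; with the real client, `run_episode` would wrap `env.reset(task_id=..., seed=seed)`, a fixed policy, and `env.grade()`, and the comparison only holds if the policy itself is deterministic.

```python
import json
import random
from typing import Any, Callable


def verify_determinism(run_episode: Callable[[int], Any], seed: int = 42, runs: int = 5) -> bool:
    """Run the same seeded episode `runs` times; True if every result is identical.

    Results are compared through a canonical JSON dump, so nested dicts and lists work.
    """
    reference = json.dumps(run_episode(seed), sort_keys=True)
    return all(json.dumps(run_episode(seed), sort_keys=True) == reference
               for _ in range(runs - 1))


def toy_episode(seed: int) -> dict:
    """Stand-in episode: a private seeded RNG drives arrivals and a fake score."""
    rng = random.Random(seed)  # never touch the global RNG, or runs can interfere
    arrivals = [rng.randint(1, 5) for _ in range(40)]  # e.g. 40 ESI levels
    return {"arrivals": arrivals, "score": round(rng.random(), 4)}


print(verify_determinism(toy_episode))  # True
```

Using a private `random.Random(seed)` instead of the module-level generator is what keeps repeated runs independent of each other.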

💰 Reward Function (Dense, Not Sparse)

Signal at every step. Encodes clinical consequence asymmetry:

| Component | Value | Clinical Rationale |
|---|---|---|
| Correct ESI assignment | +0.40 | Accurate triage is the core skill |
| Correct zone + bed | +0.25 | Right placement prevents deterioration |
| Deterioration caught early | +0.15 | Early intervention saves lives |
| Under-triage critical patient | −0.30 × severity | Missing a STEMI is 3× worse than over-triaging |
| Missed deterioration | −0.60 | Patient could die unmonitored in queue |
| Resource allocation | ±0.10 | Critical patients need ventilators |
| Throughput efficiency | ±0.10 | Waits increase downstream risk |
| Surge protocol timing | ±0.15 | Activate too late = avoidable harm |
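The table can be read as a per-step sum. This sketch makes that concrete under several assumptions: components simply add, the under-triage `severity` multiplier ranges from 1 to 2 (inferred from the −0.30 to −0.60 range quoted under Environment Design Highlights), over-triage costs −0.10 (also from that section), and the ± components are signed scores clipped to their listed magnitude. `StepEvents` and `step_reward` are illustrative names, not the code in `app/reward.py`.

```python
from dataclasses import dataclass


@dataclass
class StepEvents:
    """Hypothetical summary of what one step did; field names are illustrative."""
    esi_correct: bool = False
    under_triage_severity: float = 0.0   # 0 if none; assumed 1.0-2.0 when it occurs
    over_triaged: bool = False
    correct_zone_and_bed: bool = False
    deterioration_caught_early: bool = False
    missed_deterioration: bool = False
    resource_score: float = 0.0          # signed quality in [-1, 1], scaled to ±0.10
    throughput_score: float = 0.0        # signed quality in [-1, 1], scaled to ±0.10
    surge_timing_score: float = 0.0      # signed quality in [-1, 1], scaled to ±0.15


def _clip(x: float) -> float:
    return max(-1.0, min(1.0, x))


def step_reward(e: StepEvents) -> float:
    """Sum the eight reward-table components, plus the over-triage penalty, for one step."""
    r = 0.0
    if e.esi_correct:
        r += 0.40
    r -= 0.30 * e.under_triage_severity      # asymmetric: up to -0.60 at severity 2
    if e.over_triaged:
        r -= 0.10                            # a third of a severity-1 under-triage
    if e.correct_zone_and_bed:
        r += 0.25
    if e.deterioration_caught_early:
        r += 0.15
    if e.missed_deterioration:
        r -= 0.60
    r += 0.10 * _clip(e.resource_score)
    r += 0.10 * _clip(e.throughput_score)
    r += 0.15 * _clip(e.surge_timing_score)
    return round(r, 4)


print(step_reward(StepEvents(esi_correct=True, correct_zone_and_bed=True)))  # 0.65
```

The asymmetry is visible directly: a severity-1 under-triage costs −0.30 while an over-triage costs −0.10, matching the 3× ratio in the table.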

🏆 Baseline Scores

Reproducible scores using the improved heuristic agent (seed=42):

```text
task_1_routine_shift:   0.7340  (ESI accuracy: 75.9%, bed accuracy: 84.9%)
task_2_capacity_crunch: 0.5737  (ESI accuracy: 50.0%, bed decisions: 96.8%)
task_3_mass_casualty:   0.6436  (ESI accuracy: 71.8%, surge timing: activated)
```

Run the baseline yourself:

```bash
python baseline/run_baseline.py
# With GPT-4o (set OPENAI_API_KEY):
OPENAI_API_KEY=sk-... python baseline/run_baseline.py
```

🌟 Key Features

🛏️ Priority Bed Allocation with Displacement

40 beds across 3 clinical zones (Critical 12, Observation 18, Fast Track 10). When the critical zone is full, the agent must choose which lower-acuity patient to displace to make room for ESI-1 arrivals: a genuinely hard, morally weighted decision that toy environments rarely capture.
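A simple displacement rule makes the trade-off concrete: once all 12 critical beds are occupied, move out the least acute stable patient who is strictly less acute than the arrival. This is a hypothetical policy for illustration, not the logic in `app/bed_manager.py`; `BedOccupant`, `choose_displacement`, the exclusion of deteriorated patients, and the tie-break on time in bed are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

CRITICAL_BEDS = 12  # critical-zone capacity from the bed description


@dataclass
class BedOccupant:
    patient_id: str
    esi: int                       # 1 = most acute, 5 = least acute
    minutes_in_bed: float
    has_deteriorated: bool = False


def choose_displacement(critical_zone: List[BedOccupant], incoming_esi: int) -> Optional[str]:
    """Return the ID of the critical-zone patient to move out, or None.

    None means either a bed is still free or nobody is less acute than the arrival.
    """
    if len(critical_zone) < CRITICAL_BEDS:
        return None
    # Only stable patients who are strictly less acute than the arrival are movable.
    movable = [p for p in critical_zone if p.esi > incoming_esi and not p.has_deteriorated]
    if not movable:
        return None
    # Least acute first (highest ESI number); ties go to the longest-stabilized patient.
    return max(movable, key=lambda p: (p.esi, p.minutes_in_bed)).patient_id


zone = [BedOccupant(f"PT-{i:04d}", esi=2, minutes_in_bed=30.0) for i in range(10)]
zone += [BedOccupant("PT-0100", esi=3, minutes_in_bed=45.0),
         BedOccupant("PT-0101", esi=3, minutes_in_bed=120.0, has_deteriorated=True)]
print(choose_displacement(zone, incoming_esi=1))  # PT-0100
```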

📉 Patient Deterioration Engine

Vitals tick every simulated minute along probabilistic deterioration curves specific to each chief complaint. An ESI-3 patient with pneumonia can flip to ESI-1 if left waiting too long, so the agent must continuously scan the queue rather than just process the next patient in line.
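A per-minute tick driven by a seeded random generator captures the idea of probabilistic, complaint-specific deterioration. The hazard rates, their growth with waiting time, and the post-deterioration vitals are invented for illustration; `WaitingPatient`, `tick`, and `HAZARD_PER_MIN` are hypothetical, and `app/deterioration.py` may model this quite differently.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical per-minute deterioration hazard by chief complaint (illustrative numbers).
HAZARD_PER_MIN = {"pneumonia": 0.004, "chest pain": 0.006, "ankle sprain": 0.0}
DEFAULT_HAZARD = 0.002


@dataclass
class WaitingPatient:
    patient_id: str
    chief_complaint: str
    esi: int
    spo2: float
    minutes_waiting: int = 0
    has_deteriorated: bool = False


def tick(patients: List[WaitingPatient], rng: random.Random) -> List[str]:
    """Advance one simulated minute; return IDs of patients who just deteriorated."""
    events = []
    for p in patients:
        p.minutes_waiting += 1
        if p.has_deteriorated:
            continue
        # Hazard grows with waiting time, so neglected patients become riskier.
        base = HAZARD_PER_MIN.get(p.chief_complaint, DEFAULT_HAZARD)
        hazard = base * (1 + p.minutes_waiting / 60)
        if rng.random() < hazard:
            p.has_deteriorated = True
            p.esi = 1                      # e.g. an ESI-3 pneumonia flips to ESI-1
            p.spo2 = min(p.spo2, 85.0)     # illustrative desaturation
            events.append(p.patient_id)
    return events


rng = random.Random(42)
queue = [WaitingPatient("PT-0001", "pneumonia", esi=3, spo2=94.0),
         WaitingPatient("PT-0002", "ankle sprain", esi=5, spo2=99.0)]
for minute in range(240):
    for pid in tick(queue, rng):
        print(f"minute {minute}: {pid} deteriorated")
```

Passing the `rng` in explicitly, rather than using the global generator, is what keeps a seeded episode reproducible.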

🚨 Mass Casualty Surge

22 patients arrive in 12 minutes in Task 3. The agent must activate the surge protocol at exactly the right threshold — too early wastes resources, too late causes preventable harm. 4 simultaneous deteriorations occur mid-surge.
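One plausible surge trigger counts arrivals inside a sliding time window. The threshold of 10 arrivals in 12 minutes is a hypothetical choice, not the environment's actual activation threshold, and `SurgeDetector` is an illustrative name; the burst in the example mirrors Task 3's 22 arrivals in 12 minutes.

```python
from collections import deque
from typing import Optional


class SurgeDetector:
    """Hypothetical trigger: activate once `threshold` arrivals fall inside `window_min` minutes."""

    def __init__(self, threshold: int = 10, window_min: float = 12.0):
        self.threshold = threshold
        self.window_min = window_min
        self.arrivals = deque()                  # arrival times still inside the window
        self.activated_at: Optional[float] = None

    def record_arrival(self, sim_time: float) -> bool:
        """Register one arrival; True only on the step the surge should be activated."""
        self.arrivals.append(sim_time)
        # Drop arrivals that have slid out of the window (the newest one never does).
        while sim_time - self.arrivals[0] > self.window_min:
            self.arrivals.popleft()
        if self.activated_at is None and len(self.arrivals) >= self.threshold:
            self.activated_at = sim_time
            return True
        return False


detector = SurgeDetector()
routine = [float(t) for t in range(0, 120, 10)]   # one arrival every 10 minutes
burst = [120 + i * 12 / 22 for i in range(22)]    # 22 arrivals in 12 minutes
fired = [detector.record_arrival(t) for t in routine + burst]
print(detector.activated_at)
```

A window-based count reacts to arrival *rate* rather than total census, so a steady routine shift never trips it while a burst does within a few minutes.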

🔧 Scarce Resource Allocation

4 ventilators, 8 cardiac monitors, 3 portable imaging units. The agent decides who gets scarce resources. Denying a critical patient a ventilator incurs a severe penalty; leaving resources unused wastes capacity.
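Those pool sizes map naturally onto a small allocator that refuses assignments once a pool is exhausted and reports idle capacity. `ResourcePool` is a hypothetical sketch, not `app/resource_mgr.py`; the resource names follow the `assign_resource` action.

```python
from typing import Dict, Set

# Pool sizes from the Scarce Resource Allocation description.
POOL_SIZES = {"ventilator": 4, "cardiac_monitor": 8, "portable_imaging": 3}


class ResourcePool:
    """Illustrative scarce-resource pool; not the repository's resource_mgr.py."""

    def __init__(self, sizes: Dict[str, int] = POOL_SIZES):
        self.capacity = dict(sizes)  # copy so the module-level default is never mutated
        self.holders: Dict[str, Set[str]] = {name: set() for name in sizes}

    def available(self, resource: str) -> int:
        return self.capacity[resource] - len(self.holders[resource])

    def assign(self, resource: str, patient_id: str) -> bool:
        """Give one unit to a patient; return False when the pool is exhausted."""
        if patient_id in self.holders[resource]:
            return True  # already holds a unit: assigning again is a no-op
        if self.available(resource) == 0:
            return False
        self.holders[resource].add(patient_id)
        return True

    def release(self, resource: str, patient_id: str) -> None:
        self.holders[resource].discard(patient_id)

    def utilization(self) -> Dict[str, float]:
        """Fraction of each pool in use; a low value flags idle capacity."""
        return {name: len(h) / self.capacity[name] for name, h in self.holders.items()}


pool = ResourcePool()
granted = [pool.assign("ventilator", f"PT-{i:04d}") for i in range(1, 6)]
print(granted)  # [True, True, True, True, False]
```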

🧠 GPT-4o Live Agent Mode

The dashboard includes a live GPT-4o agent that streams its clinical reasoning for each decision. Judges can watch a frontier model think through "should I displace this ESI-3 patient to make room for the incoming STEMI?"

📋 AI Clinical Handoff Report

At episode end, clicking "Report" generates a clinical shift handoff document powered by GPT-4o-mini — listing all patients, outcomes, deterioration events, and recommendations for the incoming team.

🔌 API Endpoints

OpenEnv Standard Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/reset` | POST | `{"task_id": "task_1_routine_shift", "seed": 42}` |
| `/step` | POST | `{"action_type": "triage", "patient_id": "PT-0001", "esi_level": 3}` |
| `/state` | GET | Full environment state snapshot |
| `/tasks` | GET | List tasks with action schema |
| `/grader` | POST | `{"task_id": "task_1_routine_shift"}` |
| `/baseline` | GET | Run heuristic baseline, return scores |
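Any HTTP client can drive these endpoints; the Quick Start uses `httpx`, but the standard library is enough. This sketch builds the same requests as the curl examples against a local server and only sends one when `send` is called. It assumes each endpoint returns a JSON object, since response shapes are not documented here; `json_post` and `send` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # the local uvicorn/Docker port used in Setup & Usage


def json_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (without sending) a JSON POST request for an environment endpoint."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a request and decode the JSON body; assumes the endpoint returns a JSON object."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


reset_req = json_post("/reset", {"task_id": "task_1_routine_shift", "seed": 42})
step_req = json_post("/step", {"action_type": "triage", "patient_id": "PT-0001", "esi_level": 3})
grade_req = json_post("/grader", {"task_id": "task_1_routine_shift"})
# With a server running: send(reset_req), then send(step_req), then send(grade_req).
```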

OpenEnv Validation Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Returns `{"status": "healthy"}` |
| `/metadata` | GET | Environment name, description, task list |
| `/schema` | GET | Action, observation, and state schemas |
| `/mcp` | POST | JSON-RPC 2.0 MCP interface (tools/list, tools/call) |
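The `/mcp` endpoint speaks JSON-RPC 2.0, so each request body is an object with `jsonrpc`, `id`, `method`, and optional `params`. This sketch builds a `tools/list` request and a `tools/call` request in the MCP shape (`name` plus `arguments`). The tool name `reset` and its arguments are hypothetical, so take the real names from the `tools/list` response; POST the serialized bodies to `/mcp` with `Content-Type: application/json`.

```python
import itertools
import json
from typing import Optional

_next_id = itertools.count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object with a fresh integer id."""
    msg = {"jsonrpc": "2.0", "id": next(_next_id), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


list_tools = jsonrpc_request("tools/list")
# Hypothetical tool name and arguments: use whatever names tools/list actually returns.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "reset", "arguments": {"task_id": "task_1_routine_shift", "seed": 42}},
)

print(json.dumps(list_tools))
# {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
```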

Extended Dashboard Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/api/dashboard` | GET | Rich structured state for UI (beds, queue, resources, alerts) |
| `/step/heuristic` | POST | One heuristic agent step |
| `/step/llm` | POST | One GPT-4o agent step with reasoning |
| `/report` | POST | Generate clinical handoff report (GPT-4o-mini) |

🚀 Setup & Usage

Local Development

```bash
git clone https://huggingface.co/spaces/NakulSinghCR7/edtriage-env
cd edtriage-env
pip install -r requirements.txt
python data/generate_cases.py
pytest tests/ -v                          # 15 tests, all passing
uvicorn app.main:app --port 7860          # Start API server
```

Docker

```bash
docker build -t edtriage-env .
docker run -p 7860:7860 edtriage-env
# With GPT-4o for AI agent mode:
docker run -p 7860:7860 -e OPENAI_API_KEY=sk-... edtriage-env
```

curl Examples

```bash
# Reset
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "task_1_routine_shift", "seed": 42}'

# Triage a patient
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "triage", "patient_id": "PT-0001", "esi_level": 2}'

# Grade episode
curl -X POST http://localhost:7860/grader \
  -H "Content-Type: application/json" \
  -d '{"task_id": "task_1_routine_shift"}'

# Run baseline
curl http://localhost:7860/baseline
```

🔬 Clinical Accuracy

This environment implements the Emergency Severity Index v4 (ESI v4) protocol as documented in the official ESI Implementation Handbook. The grading rubric follows that published clinical standard rather than an ad hoc scheme:

- ESI-1: Immediate life-saving intervention (cardiac arrest, respiratory failure)
- ESI-2: High-risk situation, confused/lethargic/severe pain, vital sign abnormality
- ESI-3: Stable but needs ≥2 resources (imaging + labs)
- ESI-4: Stable, needs 1 resource (single X-ray)
- ESI-5: No resources needed (prescription refill, suture removal)
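The five levels follow the ESI v4 decision points in order: A (needs a life-saving intervention), B (high-risk, confused/lethargic/disoriented, or severe pain/distress), C (how many resources are expected), and D (danger-zone vitals for patients needing two or more resources). A simplified adult-only sketch of that tree follows. The danger-zone thresholds (HR > 100, RR > 20, SpO2 < 92%) are the adult values as we recall them from the handbook and should be checked against it; pediatric thresholds are omitted, and `Presentation`/`esi_level` are hypothetical, not the environment's grader.

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    """Hypothetical triage inputs; defaults describe a stable adult with normal vitals."""
    needs_lifesaving_intervention: bool = False   # decision point A
    high_risk_situation: bool = False             # decision point B
    confused_lethargic_disoriented: bool = False  # decision point B
    severe_pain_or_distress: bool = False         # decision point B
    expected_resources: int = 0                   # decision point C (labs, imaging, IV fluids, ...)
    heart_rate: float = 80.0
    resp_rate: float = 16.0
    spo2: float = 98.0


def esi_level(p: Presentation) -> int:
    """Simplified adult ESI v4 decision tree: points A, B, C, then D."""
    if p.needs_lifesaving_intervention:
        return 1
    if p.high_risk_situation or p.confused_lethargic_disoriented or p.severe_pain_or_distress:
        return 2
    if p.expected_resources == 0:
        return 5
    if p.expected_resources == 1:
        return 4
    # Two or more resources: check adult danger-zone vitals. The handbook says
    # *consider* ESI-2 here; this sketch always up-triages.
    danger_zone = p.heart_rate > 100 or p.resp_rate > 20 or p.spo2 < 92
    return 2 if danger_zone else 3


print(esi_level(Presentation(expected_resources=2, heart_rate=112)))  # 2
```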

The 216-case clinical case bank covers real ED presentations: STEMI, acute stroke, sepsis, DKA, anaphylaxis, appendicitis, ruptured ectopic, opioid overdose, hip fracture, tension pneumothorax, drowning, severe burns, and more.

🏗️ Environment Design Highlights

Dense reward, not sparse: Signal at every step. Agents get immediate feedback on every clinical decision, not just at episode end. This makes the environment significantly more trainable than binary-outcome alternatives.

Clinical asymmetry encoded: Under-triaging a critical patient (−0.30 to −0.60) is penalized far more than over-triaging (−0.10). This reflects real clinical ethics — it is categorically worse to miss a heart attack than to be overcautious.

Multi-objective with real trade-offs: The agent must simultaneously optimize ESI accuracy, bed placement, resource allocation, throughput, and deterioration monitoring. These objectives genuinely conflict in constrained scenarios, forcing the agent to develop clinical prioritization strategies.

Deterministic with reproducible seeds: `env.reset(task_id=..., seed=42)` always produces the same patient sequence, enabling reproducible evaluation and fair comparison between agents.

📄 License

MIT — built for the Meta × Scaler OpenEnv Hackathon 2026.
