| title | EDTriage-Env |
|---|---|
| emoji | 🏥 |
| colorFrom | red |
| colorTo | blue |
| sdk | docker |
| pinned | false |
| tags | |

An OpenEnv-compliant reinforcement learning environment that simulates a full Emergency Department. AI agents act as triage nurses — classifying patients using the ESI v4 clinical protocol, allocating scarce beds across 3 zones, managing ventilators and monitors, detecting patient deterioration, and coordinating mass casualty surge responses.
Every action has clinical consequences. Under-triage a STEMI? The patient deteriorates. Assign a sprained ankle to critical care? You waste a bed someone else needs. The dense reward signal teaches agents to prioritize like experienced triage nurses.
The interactive dashboard lets judges experience the environment directly:
- Watch the GPT-4o agent make real-time clinical decisions with visible reasoning
- See the live bed board fill with color-coded patients (ESI-1 red → ESI-5 blue)
- Observe the surge alarm fire when mass casualties arrive
- Generate a clinical handoff report powered by GPT-4o-mini

```bash
pip install httpx
```

```python
# Clone or copy edtriage_client/ from this repo
from edtriage_client import EDTriageEnv

# Connect to the live Space
env = EDTriageEnv(base_url="https://nakulsinghcr7-edtriage-env.hf.space")

# Reset for Task 1 (Routine Shift)
obs = env.reset(task_id="task_1_routine_shift", seed=42)

# Run an episode
while not env.done:
    pending = obs.get("pending_actions", [])
    if pending:
        result = env.triage(pending[0], esi_level=3)
    else:
        result = env.wait()
    obs = result["observation"]

# Grade the episode
score = env.grade()
print(f"Score: {score['score']:.4f}")
# Score: 0.7340
```

Emergency Department triage is one of the highest-stakes real-time decision domains that exists. Triage nurses must:
- Classify patient acuity under time pressure with incomplete information
- Allocate finite beds to competing needs (only 40 beds, 52 patients in Task 3)
- Monitor deterioration across an entire waiting room simultaneously
- Make displacement decisions when critical beds fill
- Coordinate system-wide surge responses in mass casualty events
Under-triaging a heart attack is 3× worse than over-triaging a sprain. This clinical consequence asymmetry is encoded directly in the reward function, making this environment uniquely suited for training agents with medically meaningful priorities.

```
edtriage-env/
├── app/
│   ├── main.py              # FastAPI: 15 endpoints (6 OpenEnv + 9 extended)
│   ├── env.py               # EDTriageEnv: step/reset/state
│   ├── models.py            # Pydantic v2 typed models
│   ├── patient_gen.py       # Synthetic patient generator (216 clinical cases)
│   ├── deterioration.py     # Vitals tick engine (probabilistic deterioration)
│   ├── bed_manager.py       # 3-zone bed allocation + displacement logic
│   ├── resource_mgr.py      # Scarce resource pools (ventilators, monitors)
│   ├── reward.py            # Dense reward calculator (8 components)
│   └── graders/
│       ├── task1_grader.py  # Routine shift (easy)
│       ├── task2_grader.py  # Capacity crunch (medium)
│       └── task3_grader.py  # Mass casualty (hard)
├── server/
│   └── app.py               # OpenEnv server entry point
├── edtriage_client/
│   └── env.py               # Python client library (pip-installable)
├── static/
│   └── dashboard.html       # Premium ICU command center UI
├── baseline/
│   └── run_baseline.py      # Heuristic + GPT-4o baseline agent
├── data/
│   └── cases.json           # 216 clinical cases (ESI 1-5)
├── tests/
│   └── test_env.py          # 15 smoke tests (all passing)
├── openenv.yaml             # OpenEnv spec metadata
├── pyproject.toml           # Package configuration
└── Dockerfile               # Docker SDK deployment
```

| Field | Type | Description |
|---|---|---|
| `current_patients` | `list[Patient]` | Active patients with vitals, ESI, wait time, resources |
| `department_state` | `DepartmentState` | Bed occupancy (3 zones), resource pools, queue, surge status |
| `pending_actions` | `list[str]` | Patient IDs requiring triage or bed assignment |
| `alerts` | `list[str]` | Deterioration warnings, surge threshold alerts |
| `sim_time` | `float` | Simulation time in minutes |

Each patient includes: id, name, age, sex, chief_complaint, vitals (HR, BP, RR, SpO2, temp, pain, GCS), arrival_mode, comorbidities, deterioration_risk, assigned_esi, assigned_bed, has_deteriorated.
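
To make the observation shape above concrete, here is a minimal sketch using standard-library dataclasses. The repository's real models are Pydantic v2 classes in `app/models.py`; the class layout, the defaults, and the `needs_triage` helper below are assumptions for illustration, `department_state` is omitted for brevity, and only the field names come from the table.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Patient:
    # Subset of the per-patient fields listed above.
    id: str
    chief_complaint: str
    assigned_esi: Optional[int] = None   # None until the agent triages the patient
    assigned_bed: Optional[str] = None
    has_deteriorated: bool = False


@dataclass
class Observation:
    current_patients: list = field(default_factory=list)
    pending_actions: list = field(default_factory=list)
    alerts: list = field(default_factory=list)
    sim_time: float = 0.0


def needs_triage(obs: Observation) -> list:
    """Hypothetical helper: pending patient IDs that have no ESI level yet."""
    untriaged = {p.id for p in obs.current_patients if p.assigned_esi is None}
    return [pid for pid in obs.pending_actions if pid in untriaged]


obs = Observation(
    current_patients=[
        Patient(id="PT-0001", chief_complaint="chest pain"),
        Patient(id="PT-0002", chief_complaint="ankle sprain", assigned_esi=4),
    ],
    pending_actions=["PT-0001", "PT-0002"],
)
print(needs_triage(obs))
```

An agent loop could use a helper like this to decide between a `triage` action and an `assign_bed` action for each pending patient.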
| Action Type | Required Fields | Description |
|---|---|---|
| `triage` | `patient_id`, `esi_level` (1-5) | Assign ESI acuity level |
| `assign_bed` | `patient_id`, `zone` | Assign to critical/observation/fast_track |
| `order_diagnostic` | `patient_id`, `diagnostic` | Order CBC, troponin, CT, ECG, etc. |
| `assign_resource` | `patient_id`, `resource` | Assign ventilator/cardiac_monitor/portable_imaging |
| `release_resource` | `patient_id`, `resource` | Free a resource |
| `discharge` | `patient_id` | Discharge patient |
| `transfer` | `patient_id` | Inter-hospital transfer |
| `activate_surge` | (none) | Activate MCI surge protocol |
| `displace_patient` | `displace_patient_id` | Move patient to free a critical bed |
| `wait` | (none) | Do nothing this step |

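
The required fields in the action table can be checked on the client side before a payload is sent to `/step`. The sketch below is one possible way to do that; `validate_action` and its error messages are hypothetical and are not part of the repository, and the server's own validation may differ.

```python
# Required fields per action type, copied from the action table.
REQUIRED_FIELDS = {
    "triage": ("patient_id", "esi_level"),
    "assign_bed": ("patient_id", "zone"),
    "order_diagnostic": ("patient_id", "diagnostic"),
    "assign_resource": ("patient_id", "resource"),
    "release_resource": ("patient_id", "resource"),
    "discharge": ("patient_id",),
    "transfer": ("patient_id",),
    "activate_surge": (),
    "displace_patient": ("displace_patient_id",),
    "wait": (),
}
ZONES = {"critical", "observation", "fast_track"}


def validate_action(action: dict) -> list:
    """Return a list of problems; an empty list means the payload looks valid."""
    action_type = action.get("action_type")
    if action_type not in REQUIRED_FIELDS:
        return [f"unknown action_type: {action_type!r}"]
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS[action_type]
        if name not in action
    ]
    if "esi_level" in action and action["esi_level"] not in range(1, 6):
        problems.append("esi_level must be between 1 and 5")
    if "zone" in action and action["zone"] not in ZONES:
        problems.append("zone must be critical, observation, or fast_track")
    return problems


print(validate_action({"action_type": "triage", "patient_id": "PT-0001", "esi_level": 3}))
print(validate_action({"action_type": "assign_bed", "patient_id": "PT-0001"}))
```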
| Task | Difficulty | Patients | Key Challenge | Baseline Score |
|---|---|---|---|---|
| Routine ER Shift | 🟢 Easy | 40 over 8hrs | Standard triage + 1 deterioration event | 0.7340 |
| Capacity Crunch | 🟡 Medium | 35 + 3 ESI-1 | Critical zone 10/12 full, displacement decisions | 0.5737 |
| Mass Casualty Surge | 🔴 Hard | 30 + 22 surge | 22 patients in 12 min, 4 deteriorations, surge protocol | 0.6436 |
Scores are deterministic — same seed always produces the same results. Verified across 5 independent runs.
The reward gives a signal at every step and encodes clinical consequence asymmetry:
| Component | Value | Clinical Rationale |
|---|---|---|
| Correct ESI assignment | +0.40 | Accurate triage is the core skill |
| Correct zone + bed | +0.25 | Right placement prevents deterioration |
| Deterioration caught early | +0.15 | Early intervention saves lives |
| Under-triage critical patient | −0.30 × severity | Missing a STEMI is 3× worse than over-triaging |
| Missed deterioration | −0.60 | Patient could die unmonitored in queue |
| Resource allocation | ±0.10 | Critical patients need ventilators |
| Throughput efficiency | ±0.10 | Waits increase downstream risk |
| Surge protocol timing | ±0.15 | Activate too late = avoidable harm |
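
The triage rows of the reward table can be read as a small scoring function. The sketch below is an illustration, not the code in `app/reward.py`: the +0.40 and −0.30 × severity values come from the table, the −0.10 over-triage penalty is the figure this README cites elsewhere, and everything else (treating ESI 1-2 as "critical", a severity multiplier between 1.0 and 2.0, no penalty for under-triaging non-critical patients) is an assumption.

```python
def triage_reward(true_esi: int, assigned_esi: int, severity: float = 1.0) -> float:
    """Reward for one triage decision; a lower ESI number means higher acuity."""
    if assigned_esi == true_esi:
        return 0.40
    if assigned_esi > true_esi:      # under-triage: patient rated less urgent than they are
        if true_esi <= 2:            # assumption: ESI 1-2 counts as a critical patient
            return -0.30 * severity
        return 0.0                   # assumption: not modeled in this sketch
    return -0.10                     # over-triage


under = triage_reward(true_esi=1, assigned_esi=3)   # missed critical patient
over = triage_reward(true_esi=4, assigned_esi=2)    # overcautious on a minor case
print(under, over, round(under / over, 1))
```

With the default severity of 1.0 the under-triage penalty is three times the over-triage penalty, which matches the 3× asymmetry described above.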
Reproducible scores using the improved heuristic agent (seed=42):

```text
task_1_routine_shift:   0.7340  (ESI accuracy: 75.9%, bed accuracy: 84.9%)
task_2_capacity_crunch: 0.5737  (ESI accuracy: 50.0%, bed decisions: 96.8%)
task_3_mass_casualty:   0.6436  (ESI accuracy: 71.8%, surge timing: activated)
```

Run the baseline yourself:

```bash
python baseline/run_baseline.py

# With GPT-4o (set OPENAI_API_KEY):
OPENAI_API_KEY=sk-... python baseline/run_baseline.py
```

40 beds across 3 clinical zones (Critical 12, Observation 18, Fast Track 10). When the critical zone is full, the agent must choose which lower-acuity patient to displace to make room for ESI-1 arrivals. This is a hard, morally weighted decision that simpler environments rarely capture.
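
A displacement choice like the one described above could be sketched as follows. The zone capacities come from this README; the selection rule (displace the least acute occupant, but only if they are less acute than the arrival) and the `pick_displacement` name are assumptions, and the logic in `app/bed_manager.py` may differ.

```python
ZONE_CAPACITY = {"critical": 12, "observation": 18, "fast_track": 10}


def pick_displacement(critical_beds: dict, incoming_esi: int):
    """Return the patient ID to displace from a full critical zone, or None.

    critical_beds maps patient ID -> assigned ESI level.
    """
    if len(critical_beds) < ZONE_CAPACITY["critical"]:
        return None  # a bed is free, so nobody needs to move
    # Only patients with strictly lower acuity (higher ESI number) are candidates.
    candidates = {pid: esi for pid, esi in critical_beds.items() if esi > incoming_esi}
    if not candidates:
        return None
    # Pick the least acute patient; ties go to the highest patient ID for determinism.
    return max(candidates, key=lambda pid: (candidates[pid], pid))


beds = {f"PT-{i:04d}": 2 for i in range(1, 12)}  # eleven ESI-2 patients
beds["PT-0012"] = 3                              # one ESI-3 patient fills the zone
print(pick_displacement(beds, incoming_esi=1))
```

The returned ID is what an agent would pass as `displace_patient_id` in a `displace_patient` action.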
Vitals tick every simulated minute. Probabilistic deterioration curves per chief complaint. An ESI-3 patient with pneumonia can flip to ESI-1 if left waiting too long. The agent must continuously scan the queue, not just process the next patient in line.
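
The per-minute tick described above can be pictured with a small simulation. This is a sketch under stated assumptions, not the engine in `app/deterioration.py`: the linear growth of risk with waiting time, the `tick` function, and the 0.01 hazard value are all invented for illustration.

```python
import random


def tick(patients: list, hazard_per_min: dict, rng: random.Random) -> list:
    """Advance one simulated minute; return IDs of patients who deteriorated."""
    worsened = []
    for p in patients:
        if p["has_deteriorated"]:
            continue
        p["wait_min"] += 1
        # Hypothetical curve: risk grows linearly with time spent waiting.
        base = hazard_per_min.get(p["chief_complaint"], 0.0)
        risk = base * (1 + p["wait_min"] / 60)
        if rng.random() < risk:
            p["esi"], p["has_deteriorated"] = 1, True
            worsened.append(p["id"])
    return worsened


rng = random.Random(42)
queue = [{"id": "PT-0007", "chief_complaint": "pneumonia", "esi": 3,
          "wait_min": 0, "has_deteriorated": False}]
hazards = {"pneumonia": 0.01}  # illustrative value, not taken from the repository
minutes = 0
while not queue[0]["has_deteriorated"] and minutes < 600:
    tick(queue, hazards, rng)
    minutes += 1
print(minutes, queue[0]["esi"])
```

Because the risk is evaluated for every waiting patient on every tick, an agent that only looks at the head of the queue can miss a deterioration further back.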
22 patients arrive in 12 minutes in Task 3. The agent must activate the surge protocol at exactly the right threshold — too early wastes resources, too late causes preventable harm. 4 simultaneous deteriorations occur mid-surge.
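
One way an agent might decide when to send `activate_surge` is to count arrivals in a sliding time window. The sketch below assumes a 12-minute window and a threshold of 10 arrivals; both numbers and the `SurgeDetector` class are hypothetical, since this README does not state the threshold the grader uses.

```python
from collections import deque


class SurgeDetector:
    def __init__(self, window_min: float = 12.0, threshold: int = 10):
        self.window_min = window_min
        self.threshold = threshold
        self._arrivals = deque()

    def record_arrival(self, sim_time: float) -> bool:
        """Record an arrival; return True once the window holds enough arrivals."""
        self._arrivals.append(sim_time)
        # Drop arrivals that have fallen out of the sliding window.
        while self._arrivals and sim_time - self._arrivals[0] > self.window_min:
            self._arrivals.popleft()
        return len(self._arrivals) >= self.threshold


detector = SurgeDetector()
# Simulated burst: 22 arrivals, one every 30 seconds of simulation time.
flags = [detector.record_arrival(i * 0.5) for i in range(22)]
print(flags.index(True))  # index of the arrival that first crosses the threshold
```

A lower threshold triggers the protocol earlier at the cost of more false alarms, which is the trade-off the surge timing reward component is meant to capture.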
4 ventilators, 8 cardiac monitors, 3 portable imaging units. The agent decides who gets scarce resources. Denying a critical patient a ventilator incurs a severe penalty; leaving resources unused wastes capacity.
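
The resource counts above map naturally onto a fixed-capacity pool. This is a minimal sketch, assuming one unit per patient per resource type; the `ResourcePool` class is hypothetical and is not the implementation in `app/resource_mgr.py`.

```python
class ResourcePool:
    def __init__(self, capacity: dict):
        self.capacity = dict(capacity)
        self.holders = {name: set() for name in capacity}

    def available(self, resource: str) -> int:
        return self.capacity[resource] - len(self.holders[resource])

    def assign(self, patient_id: str, resource: str) -> bool:
        """Give one unit to the patient; False if none are free or already held."""
        holders = self.holders[resource]
        if patient_id in holders or len(holders) >= self.capacity[resource]:
            return False
        holders.add(patient_id)
        return True

    def release(self, patient_id: str, resource: str) -> bool:
        """Free the patient's unit; False if the patient did not hold one."""
        if patient_id not in self.holders[resource]:
            return False
        self.holders[resource].remove(patient_id)
        return True


pool = ResourcePool({"ventilator": 4, "cardiac_monitor": 8, "portable_imaging": 3})
results = [pool.assign(f"PT-{i:04d}", "ventilator") for i in range(1, 6)]
print(results, pool.available("ventilator"))
```

The fifth request fails because only four ventilators exist, so the agent has to release one or choose which patient goes without.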
The dashboard includes a live GPT-4o agent that streams its clinical reasoning for each decision. Judges can watch a frontier model think through "should I displace this ESI-3 patient to make room for the incoming STEMI?"
At episode end, clicking "Report" generates a clinical shift handoff document powered by GPT-4o-mini — listing all patients, outcomes, deterioration events, and recommendations for the incoming team.
| Endpoint | Method | Description |
|---|---|---|
| `/reset` | POST | `{"task_id": "task_1_routine_shift", "seed": 42}` |
| `/step` | POST | `{"action_type": "triage", "patient_id": "PT-0001", "esi_level": 3}` |
| `/state` | GET | Full environment state snapshot |
| `/tasks` | GET | List tasks with action schema |
| `/grader` | POST | `{"task_id": "task_1_routine_shift"}` |
| `/baseline` | GET | Run heuristic baseline, return scores |

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Returns `{"status": "healthy"}` |
| `/metadata` | GET | Environment name, description, task list |
| `/schema` | GET | Action, observation, and state schemas |
| `/mcp` | POST | JSON-RPC 2.0 MCP interface (`tools/list`, `tools/call`) |

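
For the `/mcp` endpoint, request bodies follow the JSON-RPC 2.0 envelope (`jsonrpc`, `id`, `method`, optional `params`). The sketch below only builds the payloads and sends nothing. The `jsonrpc_request` helper is hypothetical, and the tool name `triage` with its arguments is an assumption; the actual tool names should be read from the `tools/list` response.

```python
import json
from typing import Optional


def jsonrpc_request(method: str, params: Optional[dict] = None, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request body."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return json.dumps(payload)


list_tools = jsonrpc_request("tools/list")
call_tool = jsonrpc_request(
    "tools/call",
    # Assumed tool name and arguments, mirroring the /step triage action.
    {"name": "triage", "arguments": {"patient_id": "PT-0001", "esi_level": 2}},
    request_id=2,
)
print(list_tools)
print(call_tool)
```

Either string can be sent as the body of a POST to `/mcp` with `Content-Type: application/json`.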
| Endpoint | Method | Description |
|---|---|---|
| `/api/dashboard` | GET | Rich structured state for UI (beds, queue, resources, alerts) |
| `/step/heuristic` | POST | One heuristic agent step |
| `/step/llm` | POST | One GPT-4o agent step with reasoning |
| `/report` | POST | Generate clinical handoff report (GPT-4o-mini) |


```bash
git clone https://huggingface.co/spaces/NakulSinghCR7/edtriage-env
cd edtriage-env
pip install -r requirements.txt
python data/generate_cases.py
pytest tests/ -v                    # 15 tests, all passing
uvicorn app.main:app --port 7860    # Start API server
```

```bash
docker build -t edtriage-env .
docker run -p 7860:7860 edtriage-env

# With GPT-4o for AI agent mode:
docker run -p 7860:7860 -e OPENAI_API_KEY=sk-... edtriage-env
```

```bash
# Reset
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "task_1_routine_shift", "seed": 42}'

# Triage a patient
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "triage", "patient_id": "PT-0001", "esi_level": 2}'

# Grade episode
curl -X POST http://localhost:7860/grader \
  -H "Content-Type: application/json" \
  -d '{"task_id": "task_1_routine_shift"}'

# Run baseline
curl http://localhost:7860/baseline
```

This environment implements the Emergency Severity Index v4 (ESI v4) protocol as documented in the official ESI Implementation Handbook. The grading rubric is not invented; it follows the published clinical standard:
- ESI-1: Immediate life-saving intervention (cardiac arrest, respiratory failure)
- ESI-2: High-risk situation, confused/lethargic/severe pain, vital sign abnormality
- ESI-3: Stable but needs ≥2 resources (imaging + labs)
- ESI-4: Stable, needs 1 resource (single X-ray)
- ESI-5: No resources needed (prescription refill, suture removal)
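
The five levels above follow the ESI decision sequence: life-saving intervention first, then high-risk presentation, then expected resource use. The function below is a simplified sketch of that sequence and not the grader's logic. Its parameter names are invented, and it always up-triages on danger-zone vitals, whereas the handbook asks the nurse to consider doing so.

```python
def esi_level(needs_lifesaving: bool, high_risk: bool,
              expected_resources: int, danger_zone_vitals: bool = False) -> int:
    """Simplified ESI v4 decision sequence (sketch, not clinical guidance)."""
    if needs_lifesaving:
        return 1  # e.g. cardiac arrest, respiratory failure
    if high_risk:
        return 2  # high-risk situation, confused/lethargic, severe pain
    if expected_resources == 0:
        return 5
    if expected_resources == 1:
        return 4
    # Two or more resources: ESI-3 unless vitals are in the danger zone.
    return 2 if danger_zone_vitals else 3


print(esi_level(False, False, expected_resources=2))                           # imaging + labs
print(esi_level(False, False, expected_resources=2, danger_zone_vitals=True))
print(esi_level(False, False, expected_resources=0))                           # prescription refill
```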
The 216-case clinical case bank covers real ED presentations: STEMI, acute stroke, sepsis, DKA, anaphylaxis, appendicitis, ruptured ectopic, opioid overdose, hip fracture, tension pneumothorax, drowning, severe burns, and more.
Dense reward, not sparse: Signal at every step. Agents get immediate feedback on every clinical decision, not just at episode end. This makes the environment significantly more trainable than binary-outcome alternatives.
Clinical asymmetry encoded: Under-triaging a critical patient (−0.30 to −0.60) is penalized far more than over-triaging (−0.10). This reflects real clinical ethics — it is categorically worse to miss a heart attack than to be overcautious.
Multi-objective with real trade-offs: The agent must simultaneously optimize ESI accuracy, bed placement, resource allocation, throughput, and deterioration monitoring. These objectives genuinely conflict in constrained scenarios, forcing the agent to develop clinical prioritization strategies.
Deterministic with reproducible seeds: env.reset(task_id=..., seed=42) always produces the same patient sequence, enabling reproducible evaluation and fair comparison between agents.
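
Seeded reproducibility of this kind usually comes from giving the generator its own random number stream. The sketch below shows the idea with a toy arrival generator; `generate_arrivals`, the short complaint list, and the 12-minute mean gap are assumptions for illustration and are not how `app/patient_gen.py` is necessarily written.

```python
import random

# A few presentations from the case bank, used here as a toy sample space.
COMPLAINTS = ["STEMI", "acute stroke", "sepsis", "DKA", "anaphylaxis", "appendicitis"]


def generate_arrivals(seed: int, n: int = 5) -> list:
    """Return (patient_id, arrival_minute, chief_complaint) tuples for a seed."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    t = 0.0
    arrivals = []
    for i in range(n):
        t += rng.expovariate(1 / 12.0)  # hypothetical mean gap of 12 minutes
        arrivals.append((f"PT-{i + 1:04d}", round(t, 2), rng.choice(COMPLAINTS)))
    return arrivals


first = generate_arrivals(seed=42)
second = generate_arrivals(seed=42)
print(first == second)
```

Keeping the generator on a private `random.Random` instance means other code that calls the global `random` module cannot change the patient sequence for a given seed.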
MIT — built for the Meta × Scaler OpenEnv Hackathon 2026.