NakulSingh156/edtriage-env

---
title: EDTriage-Env
emoji: 🏥
colorFrom: red
colorTo: blue
sdk: docker
pinned: false
tags:
  - openenv
  - healthcare
  - reinforcement-learning
  - triage
  - simulation
---

🏥 EDTriage-Env — Emergency Department Triage Simulation

An OpenEnv-compliant reinforcement learning environment that simulates a full Emergency Department. AI agents act as triage nurses — classifying patients using the ESI v4 clinical protocol, allocating scarce beds across 3 zones, managing ventilators and monitors, detecting patient deterioration, and coordinating mass casualty surge responses.

Every action has clinical consequences. Under-triage a STEMI? The patient deteriorates. Assign a sprained ankle to critical care? You waste a bed someone else needs. The dense reward signal pushes agents toward the priorities of an experienced triage nurse.


🎮 Live Demo

→ Launch the Command Center

The interactive dashboard lets judges experience the environment directly:

  • Watch the GPT-4o agent make real-time clinical decisions with visible reasoning
  • See the live bed board fill with color-coded patients (ESI-1 red → ESI-5 blue)
  • Observe the surge alarm fire when mass casualties arrive
  • Generate a clinical handoff report powered by GPT-4o-mini

⚡ Quick Start (Python Client)

# Install the only dependency first (run in a shell): pip install httpx
# Then clone this repo, or copy edtriage_client/ from it, next to your script
from edtriage_client import EDTriageEnv

# Connect to the live Space
env = EDTriageEnv(base_url="https://nakulsinghcr7-edtriage-env.hf.space")

# Reset for Task 1 (Routine Shift)
obs = env.reset(task_id="task_1_routine_shift", seed=42)

# Run an episode
while not env.done:
    pending = obs.get("pending_actions", [])
    if pending:
        result = env.triage(pending[0], esi_level=3)
    else:
        result = env.wait()
    obs = result["observation"]

# Grade the episode
score = env.grade()
print(f"Score: {score['score']:.4f}")
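# Output format: "Score: 0.7340" (0.7340 is the heuristic baseline's Task 1 score; a fixed ESI-3 policy may score differently)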

🎯 Why This Environment?

Emergency Department triage is one of the highest-stakes real-time decision domains that exists. Triage nurses must:

  • Classify patient acuity under time pressure with incomplete information
  • Allocate finite beds to competing needs (only 40 beds, 52 patients in Task 3)
  • Monitor deterioration across an entire waiting room simultaneously
  • Make displacement decisions when critical beds fill
  • Coordinate system-wide surge responses in mass casualty events

Under-triaging a heart attack is penalized 3× as heavily as over-triaging a sprain (−0.30 versus −0.10, before severity scaling). This asymmetry is encoded directly in the reward function, so agents are trained toward medically meaningful priorities.


📐 Architecture

edtriage-env/
├── app/
│   ├── main.py              # FastAPI — 15 endpoints (6 OpenEnv + 9 extended)
│   ├── env.py               # EDTriageEnv — step/reset/state
│   ├── models.py            # Pydantic v2 typed models
│   ├── patient_gen.py       # Synthetic patient generator (216 clinical cases)
│   ├── deterioration.py     # Vitals tick engine (probabilistic deterioration)
│   ├── bed_manager.py       # 3-zone bed allocation + displacement logic
│   ├── resource_mgr.py      # Scarce resource pools (ventilators, monitors)
│   ├── reward.py            # Dense reward calculator (8 components)
│   └── graders/
│       ├── task1_grader.py  # Routine shift (easy)
│       ├── task2_grader.py  # Capacity crunch (medium)
│       └── task3_grader.py  # Mass casualty (hard)
├── server/
│   └── app.py               # OpenEnv server entry point
├── edtriage_client/
│   └── env.py               # Python client library (pip-installable)
├── static/
│   └── dashboard.html       # Premium ICU command center UI
├── baseline/
│   └── run_baseline.py      # Heuristic + GPT-4o baseline agent
├── data/
│   └── cases.json           # 216 clinical cases (ESI 1-5)
├── tests/
│   └── test_env.py          # 15 smoke tests (all passing)
├── openenv.yaml             # OpenEnv spec metadata
├── pyproject.toml           # Package configuration
└── Dockerfile               # Docker SDK deployment

🔄 Observation Space

| Field | Type | Description |
|---|---|---|
| current_patients | list[Patient] | Active patients with vitals, ESI, wait time, resources |
| department_state | DepartmentState | Bed occupancy (3 zones), resource pools, queue, surge status |
| pending_actions | list[str] | Patient IDs requiring triage or bed assignment |
| alerts | list[str] | Deterioration warnings, surge threshold alerts |
| sim_time | float | Simulation time in minutes |

Patient Fields

Each patient includes: id, name, age, sex, chief_complaint, vitals (HR, BP, RR, SpO2, temp, pain, GCS), arrival_mode, comorbidities, deterioration_risk, assigned_esi, assigned_bed, has_deteriorated.
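
As a rough illustration of reading an observation, the sketch below ranks untriaged patients by deterioration risk. It assumes the observation is a plain dict shaped like the table above, that `deterioration_risk` is a float where higher means riskier, and that an untriaged patient has `assigned_esi` set to `None`. The source does not confirm these encodings, and `riskiest_untriaged` is a hypothetical helper, not part of the client library.

```python
from typing import Any


def riskiest_untriaged(observation: dict[str, Any], limit: int = 3) -> list[str]:
    """Return IDs of untriaged patients, highest deterioration risk first."""
    patients = observation.get("current_patients", [])
    # ASSUMPTION: assigned_esi is None until the patient has been triaged.
    untriaged = [p for p in patients if p.get("assigned_esi") is None]
    untriaged.sort(key=lambda p: p.get("deterioration_risk", 0.0), reverse=True)
    return [p["id"] for p in untriaged[:limit]]


# Hand-written observation for illustration only; not real environment output.
example_obs = {
    "current_patients": [
        {"id": "PT-0001", "assigned_esi": None, "deterioration_risk": 0.10},
        {"id": "PT-0002", "assigned_esi": 2, "deterioration_risk": 0.70},
        {"id": "PT-0003", "assigned_esi": None, "deterioration_risk": 0.55},
    ],
    "pending_actions": ["PT-0001", "PT-0003"],
}
print(riskiest_untriaged(example_obs))
```

Ranking by risk rather than by arrival order reflects the point made elsewhere in this README that the agent should scan the whole queue.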


🎮 Action Space

| Action Type | Required Fields | Description |
|---|---|---|
| triage | patient_id, esi_level (1-5) | Assign ESI acuity level |
| assign_bed | patient_id, zone | Assign to critical/observation/fast_track |
| order_diagnostic | patient_id, diagnostic | Order CBC, troponin, CT, ECG, etc. |
| assign_resource | patient_id, resource | Assign ventilator/cardiac_monitor/portable_imaging |
| release_resource | patient_id, resource | Free a resource |
| discharge | patient_id | Discharge patient |
| transfer | patient_id | Inter-hospital transfer |
| activate_surge | (none) | Activate MCI surge protocol |
| displace_patient | displace_patient_id | Move patient to free a critical bed |
| wait | (none) | Do nothing this step |
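
The action table can be turned into a small client-side validator so malformed payloads fail before they reach `/step`. This is a minimal sketch: the field names come from the table, but `make_action` is a hypothetical helper, and the server may apply stricter or different validation.

```python
# Required fields per action type, copied from the action table above.
REQUIRED_FIELDS = {
    "triage": ("patient_id", "esi_level"),
    "assign_bed": ("patient_id", "zone"),
    "order_diagnostic": ("patient_id", "diagnostic"),
    "assign_resource": ("patient_id", "resource"),
    "release_resource": ("patient_id", "resource"),
    "discharge": ("patient_id",),
    "transfer": ("patient_id",),
    "activate_surge": (),
    "displace_patient": ("displace_patient_id",),
    "wait": (),
}


def make_action(action_type: str, **fields) -> dict:
    """Build a /step payload, raising ValueError on malformed input."""
    if action_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    missing = [f for f in REQUIRED_FIELDS[action_type] if f not in fields]
    if missing:
        raise ValueError(f"{action_type} is missing fields: {missing}")
    if action_type == "triage" and fields["esi_level"] not in range(1, 6):
        raise ValueError("esi_level must be an integer from 1 to 5")
    return {"action_type": action_type, **fields}


print(make_action("triage", patient_id="PT-0001", esi_level=3))
```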

📋 Tasks

| Task | Difficulty | Patients | Key Challenge | Baseline Score |
|---|---|---|---|---|
| Routine ER Shift | 🟢 Easy | 40 over 8 hrs | Standard triage + 1 deterioration event | 0.7340 |
| Capacity Crunch | 🟡 Medium | 35 + 3 ESI-1 | Critical zone 10/12 full, displacement decisions | 0.5737 |
| Mass Casualty Surge | 🔴 Hard | 30 + 22 surge | 22 patients in 12 min, 4 deteriorations, surge protocol | 0.6436 |

Scores are deterministic — same seed always produces the same results. Verified across 5 independent runs.
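
A determinism check of this kind can be sketched as a small harness that reruns one seeded episode and compares the scores. `is_deterministic` is a hypothetical helper, and `fake_episode` is a stand-in that replaces the real environment with a seeded RNG so the example runs offline; in practice `run_episode` would reset the environment with the seed, play a fixed policy, and return the graded score.

```python
import random
from typing import Callable


def is_deterministic(run_episode: Callable[[int], float], seed: int = 42, runs: int = 5) -> bool:
    """Run the same seeded episode several times and check the scores match exactly."""
    scores = [run_episode(seed) for _ in range(runs)]
    return all(score == scores[0] for score in scores)


def fake_episode(seed: int) -> float:
    # Stand-in for a real episode: all randomness flows from one seeded generator.
    rng = random.Random(seed)
    return round(sum(rng.random() for _ in range(10)) / 10, 4)


print(is_deterministic(fake_episode))
```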


💰 Reward Function (Dense, Not Sparse)

Signal at every step. Encodes clinical consequence asymmetry:

| Component | Value | Clinical Rationale |
|---|---|---|
| Correct ESI assignment | +0.40 | Accurate triage is the core skill |
| Correct zone + bed | +0.25 | Right placement prevents deterioration |
| Deterioration caught early | +0.15 | Early intervention saves lives |
| Under-triage critical patient | −0.30 × severity | Missing a STEMI is 3× worse than over-triaging |
| Missed deterioration | −0.60 | Patient could die unmonitored in queue |
| Resource allocation | ±0.10 | Critical patients need ventilators |
| Throughput efficiency | ±0.10 | Waits increase downstream risk |
| Surge protocol timing | ±0.15 | Activate too late = avoidable harm |
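
To make the asymmetry concrete, here is a sketch of the triage component alone. The +0.40 and −0.30 values come from the table, and the −0.10 over-triage penalty from "Environment Design Highlights". The source does not define "severity", so the multipliers below are assumptions chosen to land in the stated −0.30 to −0.60 range, and `triage_reward` is not the code in `app/reward.py`.

```python
CORRECT_ESI_REWARD = 0.40    # from the reward table
UNDER_TRIAGE_BASE = -0.30    # from the reward table, scaled by severity
OVER_TRIAGE_PENALTY = -0.10  # from "Environment Design Highlights"

# ASSUMPTION: the source does not define "severity"; these multipliers are illustrative.
SEVERITY = {1: 2.0, 2: 1.5}


def triage_reward(true_esi: int, assigned_esi: int) -> float:
    """Score one triage decision; a lower ESI number means a sicker patient."""
    if assigned_esi == true_esi:
        return CORRECT_ESI_REWARD
    if assigned_esi > true_esi:  # under-triage: rated less urgent than reality
        return UNDER_TRIAGE_BASE * SEVERITY.get(true_esi, 1.0)
    return OVER_TRIAGE_PENALTY   # over-triage: rated more urgent than reality


print(triage_reward(true_esi=1, assigned_esi=3))  # under-triaged ESI-1 patient
print(triage_reward(true_esi=4, assigned_esi=2))  # over-triaged ESI-4 patient
```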

🏆 Baseline Scores

Reproducible scores using the improved heuristic agent (seed=42):

task_1_routine_shift:   0.7340  (ESI accuracy: 75.9%, bed accuracy: 84.9%)
task_2_capacity_crunch: 0.5737  (ESI accuracy: 50.0%, bed decisions: 96.8%)
task_3_mass_casualty:   0.6436  (ESI accuracy: 71.8%, surge timing: activated)

Run the baseline yourself:

python baseline/run_baseline.py
# With GPT-4o (set OPENAI_API_KEY):
OPENAI_API_KEY=sk-... python baseline/run_baseline.py

🌟 Key Features

🛏️ Priority Bed Allocation with Displacement

40 beds across 3 clinical zones (Critical 12, Observation 18, Fast Track 10). When the critical zone is full, the agent must choose which lower-acuity patient to displace to make room for ESI-1 arrivals. This is a hard, ethically weighted decision that simpler environments rarely model.
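
One simple displacement policy an agent could start from is "move the least urgent patient, and only if the newcomer is strictly more urgent". The sketch below is a hypothetical baseline for the agent side, not the logic in `app/bed_manager.py`; `BedPatient` and `pick_displacement` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BedPatient:
    patient_id: str
    esi: int  # 1 = most urgent, 5 = least urgent


def pick_displacement(critical_zone: list[BedPatient], incoming_esi: int) -> Optional[str]:
    """Return the ID of the least urgent critical-zone patient, or None.

    Nobody is displaced unless the incoming patient is strictly more urgent
    than the candidate.
    """
    if not critical_zone:
        return None
    candidate = max(critical_zone, key=lambda p: p.esi)
    return candidate.patient_id if candidate.esi > incoming_esi else None


zone = [BedPatient("PT-0007", 2), BedPatient("PT-0012", 3), BedPatient("PT-0015", 1)]
print(pick_displacement(zone, incoming_esi=1))
```

A learned policy would presumably also weigh deterioration risk and resource use, which this sketch ignores.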

📉 Patient Deterioration Engine

Vitals tick every simulated minute. Probabilistic deterioration curves per chief complaint. An ESI-3 patient with pneumonia can flip to ESI-1 if left waiting too long. The agent must continuously scan the queue, not just process the next patient in line.
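
The effect of a per-minute tick can be illustrated with a tiny Monte Carlo sketch. The per-minute probabilities below are invented for illustration and are not taken from `app/deterioration.py`; the point is only that a small per-tick hazard compounds over a long wait.

```python
import random

# ASSUMPTION: per-minute deterioration probabilities are invented for illustration.
HAZARD_PER_MINUTE = {"pneumonia": 0.004, "chest pain": 0.010, "ankle sprain": 0.0002}


def minutes_until_deterioration(complaint: str, max_wait: int, rng: random.Random):
    """Simulate one patient waiting; return the minute they deteriorate, or None."""
    hazard = HAZARD_PER_MINUTE.get(complaint, 0.001)
    for minute in range(1, max_wait + 1):
        if rng.random() < hazard:
            return minute
    return None


def deterioration_rate(complaint: str, max_wait: int, trials: int = 2000, seed: int = 42) -> float:
    """Estimate the fraction of patients who deteriorate within max_wait minutes."""
    rng = random.Random(seed)
    hits = sum(
        minutes_until_deterioration(complaint, max_wait, rng) is not None
        for _ in range(trials)
    )
    return hits / trials


for wait in (30, 120):
    print(wait, deterioration_rate("pneumonia", wait))
```

Because every draw comes from one seeded `random.Random`, the estimate is reproducible for a given seed.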

🚨 Mass Casualty Surge

22 patients arrive in 12 minutes in Task 3. The agent must activate the surge protocol at exactly the right threshold — too early wastes resources, too late causes preventable harm. 4 simultaneous deteriorations occur mid-surge.
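
The source does not state the surge threshold, so the following is only a sketch of one way an agent might decide when to send `activate_surge`: count arrivals inside a sliding time window. The 12-minute window and the threshold of 10 are assumptions picked for the example, and `SurgeDetector` is a hypothetical class.

```python
from collections import deque


class SurgeDetector:
    """Flag a surge when too many patients arrive within a sliding time window."""

    def __init__(self, window_minutes: float = 12.0, threshold: int = 10):
        # ASSUMPTION: window and threshold are illustrative, not the environment's values.
        self.window_minutes = window_minutes
        self.threshold = threshold
        self._arrivals = deque()

    def record_arrival(self, sim_time: float) -> bool:
        """Record one arrival; return True if the surge threshold is now met."""
        self._arrivals.append(sim_time)
        # Drop arrivals that have fallen out of the window.
        while self._arrivals and sim_time - self._arrivals[0] > self.window_minutes:
            self._arrivals.popleft()
        return len(self._arrivals) >= self.threshold


detector = SurgeDetector()
# 22 arrivals spread evenly over 12 minutes, as in Task 3.
arrival_times = [i * 12 / 21 for i in range(22)]
first_trigger = next(t for t in arrival_times if detector.record_arrival(t))
print(f"surge threshold met at t={first_trigger:.1f} min")
```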

🔧 Scarce Resource Allocation

4 ventilators, 8 cardiac monitors, 3 portable imaging units. The agent decides who gets scarce resources. Denying a critical patient a ventilator incurs a severe penalty; leaving resources unused wastes capacity.
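
A counted pool is enough to model this kind of scarcity. The sketch below uses the resource names from the action table and the counts given above; `ResourcePool` is a hypothetical class for illustration and does not reproduce `app/resource_mgr.py`.

```python
class ResourcePool:
    """Track scarce resources; default counts are the ones listed above."""

    def __init__(self, capacity=None):
        self.capacity = capacity or {"ventilator": 4, "cardiac_monitor": 8, "portable_imaging": 3}
        self.holders = {name: set() for name in self.capacity}

    def assign(self, resource: str, patient_id: str) -> bool:
        """Give the resource to a patient; return False if none are free."""
        in_use = self.holders[resource]
        if patient_id in in_use:
            return True  # patient already holds one
        if len(in_use) >= self.capacity[resource]:
            return False
        in_use.add(patient_id)
        return True

    def release(self, resource: str, patient_id: str) -> None:
        """Free the resource; a no-op if the patient does not hold it."""
        self.holders[resource].discard(patient_id)

    def free(self, resource: str) -> int:
        return self.capacity[resource] - len(self.holders[resource])


pool = ResourcePool()
granted = [pool.assign("ventilator", f"PT-{i:04d}") for i in range(1, 6)]
print(granted, pool.free("ventilator"))
```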

🧠 GPT-4o Live Agent Mode

The dashboard includes a live GPT-4o agent that streams its clinical reasoning for each decision. Judges can watch a frontier model think through "should I displace this ESI-3 patient to make room for the incoming STEMI?"

📋 AI Clinical Handoff Report

At episode end, clicking "Report" generates a clinical shift handoff document powered by GPT-4o-mini — listing all patients, outcomes, deterioration events, and recommendations for the incoming team.


🔌 API Endpoints

OpenEnv Standard Endpoints

| Endpoint | Method | Request body or description |
|---|---|---|
| /reset | POST | {"task_id": "task_1_routine_shift", "seed": 42} |
| /step | POST | {"action_type": "triage", "patient_id": "PT-0001", "esi_level": 3} |
| /state | GET | Full environment state snapshot |
| /tasks | GET | List tasks with action schema |
| /grader | POST | {"task_id": "task_1_routine_shift"} |
| /baseline | GET | Run heuristic baseline, return scores |

OpenEnv Validation Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Returns {"status": "healthy"} |
| /metadata | GET | Environment name, description, task list |
| /schema | GET | Action, observation, and state schemas |
| /mcp | POST | JSON-RPC 2.0 MCP interface (tools/list, tools/call) |
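
For the `/mcp` endpoint, request bodies follow the JSON-RPC 2.0 envelope (`jsonrpc`, `id`, `method`, optional `params`). The sketch below only builds the bodies. The tool name `step` and its arguments are hypothetical; the real names would come from a `tools/list` response.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request body with a fresh integer id."""
    body = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        body["params"] = params
    return json.dumps(body)


print(jsonrpc_request("tools/list"))
# ASSUMPTION: the tool name "step" and its arguments are hypothetical.
print(jsonrpc_request("tools/call", {
    "name": "step",
    "arguments": {"action_type": "wait"},
}))
```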

Extended Dashboard Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /api/dashboard | GET | Rich structured state for UI (beds, queue, resources, alerts) |
| /step/heuristic | POST | One heuristic agent step |
| /step/llm | POST | One GPT-4o agent step with reasoning |
| /report | POST | Generate clinical handoff report (GPT-4o-mini) |

🚀 Setup & Usage

Local Development

git clone https://huggingface.co/spaces/NakulSinghCR7/edtriage-env
cd edtriage-env
pip install -r requirements.txt
python data/generate_cases.py
pytest tests/ -v                          # 15 tests, all passing
uvicorn app.main:app --port 7860          # Start API server

Docker

docker build -t edtriage-env .
docker run -p 7860:7860 edtriage-env
# With GPT-4o for AI agent mode:
docker run -p 7860:7860 -e OPENAI_API_KEY=sk-... edtriage-env

curl Examples

# Reset
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "task_1_routine_shift", "seed": 42}'

# Triage a patient
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "triage", "patient_id": "PT-0001", "esi_level": 2}'

# Grade episode
curl -X POST http://localhost:7860/grader \
  -H "Content-Type: application/json" \
  -d '{"task_id": "task_1_routine_shift"}'

# Run baseline
curl http://localhost:7860/baseline
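
The same calls can be made from Python without extra dependencies. This sketch uses the standard library's `urllib.request` instead of the `httpx` client shown in the Quick Start, and it only prepares the requests so it runs without a server; `build_post` and `send` are hypothetical helpers, and the response shape is whatever the server returns.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request for the local server."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


reset_req = build_post("/reset", {"task_id": "task_1_routine_shift", "seed": 42})
step_req = build_post("/step", {"action_type": "triage", "patient_id": "PT-0001", "esi_level": 2})
grade_req = build_post("/grader", {"task_id": "task_1_routine_shift"})
print(reset_req.get_method(), reset_req.full_url)
# With the server running: state = send(reset_req)
```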

🔬 Clinical Accuracy

This environment implements the Emergency Severity Index v4 (ESI v4) protocol as documented in the official ESI Implementation Handbook. The grading rubric is not invented — it follows the published clinical standard:

  • ESI-1: Immediate life-saving intervention (cardiac arrest, respiratory failure)
  • ESI-2: High-risk situation, confused/lethargic/severe pain, vital sign abnormality
  • ESI-3: Stable but needs ≥2 resources (imaging + labs)
  • ESI-4: Stable, needs 1 resource (single X-ray)
  • ESI-5: No resources needed (prescription refill, suture removal)
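
The levels above can be read as a short decision procedure. The sketch below is a simplified illustration, not the environment's grader: `Presentation` and its fields are hypothetical, the danger-zone cutoffs are the adult values (HR > 100, RR > 20, SpO2 < 92%) and should be checked against the handbook before use, and the handbook asks the nurse to consider an upgrade to ESI-2 whereas this code always applies it.

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    needs_lifesaving_intervention: bool
    high_risk_or_altered_or_severe_pain: bool
    expected_resources: int
    heart_rate: int
    resp_rate: int
    spo2: int


def danger_zone_vitals(p: Presentation) -> bool:
    # Adult cutoffs; pediatric thresholds differ and are not modeled here.
    return p.heart_rate > 100 or p.resp_rate > 20 or p.spo2 < 92


def esi_level(p: Presentation) -> int:
    """Walk the ESI decision points in order and return a level from 1 to 5."""
    if p.needs_lifesaving_intervention:
        return 1
    if p.high_risk_or_altered_or_severe_pain:
        return 2
    if p.expected_resources == 0:
        return 5
    if p.expected_resources == 1:
        return 4
    # Two or more resources: ESI-3, upgraded when vitals are in the danger zone.
    # ASSUMPTION: the upgrade is a clinical judgment call; this sketch always applies it.
    return 2 if danger_zone_vitals(p) else 3


stable_abdominal_pain = Presentation(False, False, 2, 88, 16, 98)
print(esi_level(stable_abdominal_pain))
```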

The 216-case clinical case bank covers real ED presentations: STEMI, acute stroke, sepsis, DKA, anaphylaxis, appendicitis, ruptured ectopic, opioid overdose, hip fracture, tension pneumothorax, drowning, severe burns, and more.


🏗️ Environment Design Highlights

Dense reward, not sparse: Signal at every step. Agents get immediate feedback on every clinical decision, not just at episode end. This makes the environment significantly more trainable than binary-outcome alternatives.

Clinical asymmetry encoded: Under-triaging a critical patient (−0.30 to −0.60) is penalized far more than over-triaging (−0.10). This reflects real clinical ethics — it is categorically worse to miss a heart attack than to be overcautious.

Multi-objective with real trade-offs: The agent must simultaneously optimize ESI accuracy, bed placement, resource allocation, throughput, and deterioration monitoring. These objectives genuinely conflict in constrained scenarios, forcing the agent to develop clinical prioritization strategies.

Deterministic with reproducible seeds: env.reset(task_id=..., seed=42) always produces the same patient sequence, enabling reproducible evaluation and fair comparison between agents.


📄 License

MIT — built for the Meta × Scaler OpenEnv Hackathon 2026.
