Wenting Chen1,* Shengyuan Liu2,* Boyun Zheng2 Jipeng Zhang3 Wenxuan Wang3 Dejun Fan4 Raymond Shing Yan Tang5 Yuen Tung Lam6 Shannon Melissa Chan7 Lei Xing1 Jiancong Hu4,† Yixuan Yuan2,†
1 Department of Radiation Oncology, Stanford University, CA, USA
2 Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
3 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
4 The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
5 Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong SAR, China
6 The Nethersole School of Nursing, The Chinese University of Hong Kong, Hong Kong SAR, China
7 Department of Surgery, The Chinese University of Hong Kong, Hong Kong SAR, China
* These authors contributed equally.
† Correspondence to Yixuan Yuan and Jiancong Hu.
Gastrointestinal diseases affect 2.86 billion people globally, with capsule endoscopy (CE) providing crucial diagnostics but requiring manual review of over 60,000 frames per examination, a process associated with a 17.4% disease miss rate. While artificial intelligence shows promise for CE analysis, existing endoscopic vision-language models (VLMs) lack multi-video understanding capability and cannot replicate the systematic multi-evidence reasoning by which gastroenterologists integrate findings across anatomical regions to synthesize cohesive diagnoses. Here we introduce CE-R1, an adaptive foundation model with evidence-based clinical reasoning capabilities specifically designed for gastroenterology. CE-R1 incorporates a dynamic router that assesses query complexity and selectively routes cases to either a lightweight model for straightforward questions or a deep reasoning model that generates transparent, step-by-step diagnostic thought processes. To enable this capability, we construct CE-Bench, the first large-scale multimodal CE dataset, comprising 502,066 visual question-answering pairs with chain-of-thought reasoning annotations, spanning 70 fine-grained clinical sub-tasks across five core diagnostic categories: anatomy identification, endoscopic findings recognition, disease diagnosis, treatment planning, and medical report generation. Comprehensive evaluation on both in-distribution and out-of-distribution datasets from four independent hospitals demonstrates that CE-R1 achieves 86.7% overall accuracy, substantially outperforming state-of-the-art VLMs (best baseline: 24.6%) and surpassing average physician performance (39.9%) by 21.1%. CE-R1 maintains superior generalization across external validation sets (65.1–81.9% accuracy).
Critically, the multi-evidence clinical reasoning capability delivers substantial performance gains on complex diagnostic tasks: CE-R1 surpasses its non-reasoning counterpart by 8.5% in disease diagnosis, demonstrating the clinical value of transparent, step-by-step diagnostic processes. These results establish CE-R1 as a robust foundation model for comprehensive CE analysis with immediate applications in clinical decision support and medical education.
Install the requirements:

```shell
conda env create -f environment.yml
pip install git+https://github.com/huggingface/transformers.git@v4.49.0
pip install -e .
pip install -e ".[torch,metrics]"
```

Please download the public datasets and our pre-trained models from this repository (https://huggingface.co/datasets/Valentina007/CE_R1_data/).
Please make sure this folder (CE_R1_data) is under the same directory as the current folder (CE_R1):

```shell
hf download Valentina007/CE_R1_data
```

Directory structure of this folder (CE_R1_data):
./anno and ./data contain part of the data in CE-Bench, including the public kid-v1, kid-v2, and kvasir-capsule datasets.
./models contains the pre-trained models of CE-R1.
```
├── anno
│   ├── kid-v1-image_test.json
│   ├── kid-v2-image_test.json
│   ├── kvasir-capsule-image_test.json
│   └── kvasir-capsule-videoclip_test.json
├── data
│   ├── kid-dataset-1
│   ├── kid-dataset-2
│   ├── kvasir-capsule-labelled_images
│   └── video_clips_v1
└── models
    ├── deep
    ├── lite
    └── router_models
```
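As a quick sanity check before running any tests, you can verify that the downloaded folder matches the layout above. This is a minimal sketch we provide for convenience; the helper `missing_entries` is not part of the released scripts.

```python
from pathlib import Path

# Expected entries under CE_R1_data, taken from the directory tree above.
EXPECTED = [
    "anno/kid-v1-image_test.json",
    "anno/kid-v2-image_test.json",
    "anno/kvasir-capsule-image_test.json",
    "anno/kvasir-capsule-videoclip_test.json",
    "data/kid-dataset-1",
    "data/kid-dataset-2",
    "data/kvasir-capsule-labelled_images",
    "data/video_clips_v1",
    "models/deep",
    "models/lite",
    "models/router_models",
]

def missing_entries(root):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]

if __name__ == "__main__":
    # CE_R1_data is expected to sit next to the CE_R1 folder.
    missing = missing_entries("../CE_R1_data")
    if missing:
        print("Missing entries:", missing)
    else:
        print("CE_R1_data layout looks complete.")
```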
```shell
INPUT_PATH_IMG="/path/to/input_image.png"
QUESTION_IMG="Your question can be put here."
python test_single.py --path "$INPUT_PATH_IMG" --question "$QUESTION_IMG"
```
All the results will be saved at ./results/model_output.
In ./results/model_output/final_results.json, you will get output in the following form:
```json
{
  "input_path": "/path/to/input_image.png",
  "question": "Your question",
  "probability": 0.3223,
  "model_version": "lite",
  "model_type": "lite",
  "media_type": "image",
  "generated_response": "Final output from CE-R1"
}
```

This result contains the input to CE-R1, the probability from the router, the model type used, and the output of CE-R1. When the probability from the router is larger than 0.5, CE-R1-Deep is used; otherwise, CE-R1-Lite is used.
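The routing rule itself is a simple threshold on the router's score. A minimal sketch (the function name `select_model` is ours, not from the released code):

```python
def select_model(probability):
    """Route a query based on the router's complexity score.

    Scores above 0.5 go to the deep reasoning model (CE-R1-Deep);
    everything else is handled by the lightweight CE-R1-Lite.
    """
    return "deep" if probability > 0.5 else "lite"

# The example output above had probability 0.3223, hence the lite model.
print(select_model(0.3223))  # lite
```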
Here, we provide an example with a WCE image or video as input:

```shell
sh ./lanuch/test_img_single.sh
```
Please contact me if you have any questions (wentchen AT stanford dot edu).

