ashwin2912/med-compass

Clinical NLP Pipeline

A natural language processing pipeline for clinical text analysis and healthcare data processing.

Features

  • Clinical text preprocessing and validation
  • Named Entity Recognition (NER) with hybrid models
  • Topic modeling and clustering
  • Medical concept mapping
  • Risk scoring algorithms
  • Negation detection
  • Text summarization
  • CPT code extraction and mapping
  • Interactive Streamlit web interface
  • Docker containerization support
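To give a feel for what a feature such as CPT code extraction could involve, here is a minimal sketch. It is not taken from this repository: the names `CPT_PATTERN` and `extract_cpt_codes` are hypothetical, and the approach (a regular expression that only accepts codes following a literal "CPT" cue) is an assumption chosen for illustration. It relies on the general shape of CPT codes: five digits for Category I, or four digits followed by `F` or `T` for Categories II and III.

```python
import re

# Hypothetical pattern, not the repository's implementation.
# Requires a literal "CPT" cue (optionally "CPT code" / "CPT codes" and a colon)
# so that unrelated five-digit numbers such as ZIP codes are not picked up.
CPT_PATTERN = re.compile(
    r"\bCPT(?:\s+codes?)?\s*:?\s*(\d{4}[0-9FT])\b",
    re.IGNORECASE,
)


def extract_cpt_codes(text: str) -> list[str]:
    """Return CPT-like codes that follow a "CPT" cue, in order, without duplicates."""
    seen: list[str] = []
    for match in CPT_PATTERN.finditer(text):
        code = match.group(1).upper()
        if code not in seen:
            seen.append(code)
    return seen


note = "Procedure billed under CPT 99213. Follow-up: CPT code: 0042T. Zip 10001."
print(extract_cpt_codes(note))  # ['99213', '0042T']
```

A pattern like this only checks the shape of a code, not whether the code exists; a real pipeline would presumably validate matches against a licensed CPT code list.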

Installation

poetry install

Usage

Run the Streamlit app:

streamlit run streamlit_app.py

Or use Docker:

docker-compose up

Data Requirements

This project requires clinical text data for NLP processing. See data/README.md for detailed instructions.

Quick Start with Sample Data

Sample data is provided in data/sample/ for immediate testing.

Production Data

For production use, obtain access to the MIMIC-III Clinical Database:

  1. Request access at https://physionet.org/content/mimiciii/
  2. Complete the required training and sign the data use agreement
  3. Download and place data in data/mimic-iii-clinical-database-demo-1.4/

Important: Never commit real clinical data. All patient data must stay on your local machine.

Development

Run tests:

pytest

Future Scope

  • Apache Airflow orchestration for batch processing
  • Real-time streaming with Kafka integration
  • Advanced ML model deployment with MLflow
  • FHIR data integration
  • Multi-language support for clinical texts
  • Advanced privacy-preserving techniques (differential privacy)
  • Integration with EHR systems
  • Scalable distributed processing with Spark
  • Model monitoring and drift detection

Requirements

  • Python 3.10+
  • Poetry
  • Docker (optional)
