DocuRAG Agent

A document-based RAG (Retrieval-Augmented Generation) system for question answering, with a FastAPI backend and a Gradio UI. Requires Python 3.9 or higher.

✨ Features

🔍 Advanced Retrieval: Multiple retrieval strategies including dense, sparse, and hybrid search
🧠 Intelligent Reranking: Cross-encoder models for improved relevance
🔄 Query Rewriting: HyDE (Hypothetical Document Embeddings) for better retrieval
📊 Comprehensive Evaluation: Built-in metrics for retrieval and QA performance
🚀 Production Ready: FastAPI backend with Gradio UI
🛠️ Modern Tooling: Uses uv, ruff, black, and mypy for development
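
As an illustration of hybrid search, the sketch below merges a dense ranking and a sparse ranking with reciprocal rank fusion (RRF). RRF is one common fusion method and is used here as an assumption; this README does not state which fusion strategy DocuRAG applies. The function name, the document IDs, and the constant k=60 are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense (embedding) and a sparse (keyword) retriever.
dense_ranking = ["doc_b", "doc_a", "doc_c"]
sparse_ranking = ["doc_a", "doc_d", "doc_b"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever are still kept.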

🚀 Quick Start

Prerequisites

Python 3.9 or higher
uv (recommended) or pip

Installation

Using uv (recommended)

# Clone the repository
git clone https://github.com/rucwhx/docurag-agent.git
cd docurag-agent

# Install with uv
uv pip install -e .

# For development
uv pip install -e ".[dev]"

# For GPU support
uv pip install -e ".[gpu]"

Using pip

git clone https://github.com/rucwhx/docurag-agent.git
cd docurag-agent

pip install -e .

# For development
pip install -e ".[dev]"

Running the Application

Web UI (Gradio)

# Using the installed script
docurag-ui

# Or directly
python -m docurag.app.ui

API Server (FastAPI)

# Using the installed script
docurag-serve

# Or directly
python -m docurag.app.api

Basic Usage Example

from docurag import make_chunks, build_index, retriever, generator

# Process documents
chunks = make_chunks("path/to/documents.csv")

# Build search index
index = build_index(chunks)

# Retrieve relevant documents
results = retriever.search("What is machine learning?", index)

# Generate answer
answer = generator.generate(query="What is machine learning?", context=results)
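
For intuition about the chunking step, here is a minimal sketch of fixed-size word chunks with overlap. It is not the implementation behind make_chunks, whose behavior this README does not describe; chunk_words and its parameters are hypothetical names chosen for illustration.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 5, overlap: int = 2) -> List[str]:
    """Split text into chunks of `chunk_size` words that share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks


example_chunks = chunk_words("a b c d e f g h", chunk_size=5, overlap=2)
print(example_chunks)
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either neighboring chunk, at the cost of a somewhat larger index.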

📁 Project Structure

docurag-agent/
├── src/                    # Source code
│   ├── chunk/             # Document chunking
│   ├── embed/             # Embedding & indexing
│   ├── retrieve/          # Document retrieval
│   ├── rerank/            # Result reranking
│   ├── generate/          # Answer generation
│   ├── query_rewrite/     # Query enhancement
│   └── eval/              # Evaluation metrics
├── app/                   # Web applications
│   ├── api.py            # FastAPI backend
│   └── ui.py             # Gradio frontend
├── data/                  # Sample data
├── scripts/               # Utility scripts
├── tests/                 # Test suite
├── pyproject.toml         # Project configuration
└── README.md             # This file

🛠️ Development

Setup Development Environment

# Clone and install for development
git clone https://github.com/rucwhx/docurag-agent.git
cd docurag-agent
uv pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Code Quality Tools

Code quality is checked with black and ruff (formatting), ruff (linting), mypy (type checking), and pytest (tests):

# Format code
black src/ app/ tests/
ruff format src/ app/ tests/

# Lint code
ruff check src/ app/ tests/

# Type checking
mypy src/ app/

# Run tests
pytest

# Run tests with coverage
pytest --cov=src --cov-report=html

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=src

# Run specific test file
pytest tests/test_retrieval.py

# Run with verbose output
pytest -v

📊 Evaluation

Retrieval and QA performance are evaluated with separate modules:

# Run retrieval evaluation
python -m docurag.eval.retrieval_eval

# Run QA evaluation
python -m docurag.eval.qa_eval
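
To make the retrieval side concrete, the sketch below computes two widely used retrieval metrics, recall@k and mean reciprocal rank (MRR). Which metrics the evaluation modules actually report is not stated in this README, so treat the choice of metrics, the function names, and the sample data as assumptions.

```python
from typing import List, Set, Tuple


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant documents found in the top-k retrieved results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Hypothetical results for two queries.
runs = [
    (["d1", "d2", "d3"], {"d2", "d4"}),  # first relevant hit at rank 2
    (["d5", "d6"], {"d5"}),              # first relevant hit at rank 1
]
print(recall_at_k(runs[0][0], runs[0][1], k=2))
print(mean_reciprocal_rank(runs))
```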

🤝 Contributing

We welcome contributions! Please see the Contributing Guidelines (CONTRIBUTING.md) for details.

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Make your changes
4. Run tests and linting
5. Commit your changes (git commit -m 'Add amazing feature')
6. Push to the branch (git push origin feature/amazing-feature)
7. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built with Sentence Transformers
Uses FAISS for efficient similarity search
Powered by Transformers
UI built with Gradio
API built with FastAPI

📈 Roadmap

Support for more document formats (PDF, DOCX, etc.)
Advanced query expansion techniques
Multi-language support
Vector database integrations (Pinecone, Weaviate, etc.)
Cloud deployment guides
Performance optimizations
