A modern, production-ready document-based RAG (Retrieval-Augmented Generation) system for intelligent question answering. Built with Python 3.9+ and modern tooling.
- 🔍 Advanced Retrieval: Multiple retrieval strategies including dense, sparse, and hybrid search
- 🧠 Intelligent Reranking: Cross-encoder models for improved relevance
- 🔄 Query Rewriting: HyDE (Hypothetical Document Embeddings) for better retrieval
- 📊 Comprehensive Evaluation: Built-in metrics for retrieval and QA performance
- 🚀 Production Ready: FastAPI backend with Gradio UI
- 🛠️ Modern Tooling: Uses uv, ruff, black, and mypy for development
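As an illustration of the hybrid search idea in the feature list, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a dense ranking with a sparse one. This is not necessarily how this project combines results. The function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists, best match first.
dense_hits = ["doc3", "doc1", "doc7"]   # e.g. from embedding similarity
sparse_hits = ["doc1", "doc9", "doc3"]  # e.g. from keyword matching

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['doc1', 'doc3', 'doc9', 'doc7']
```

Documents that rank well in both lists (`doc1`, `doc3`) rise to the top, while documents found by only one retriever are kept but ranked lower.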
- Python 3.9 or higher
- uv (recommended) or pip
# Clone the repository
git clone https://github.com/rucwhx/docurag-agent.git
cd docurag-agent
# Install with uv
uv pip install -e .
# For development
uv pip install -e ".[dev]"
# For GPU support
uv pip install -e ".[gpu]"
# Alternatively, install with pip
git clone https://github.com/rucwhx/docurag-agent.git
cd docurag-agent
pip install -e .
# For development
pip install -e ".[dev]"
# Launch the Gradio UI using the installed script
docurag-ui
# Or directly
python -m docurag.app.ui
# Start the FastAPI server using the installed script
docurag-serve
# Or directly
python -m docurag.app.api
from docurag import make_chunks, build_index, retriever, generator
# Process documents
chunks = make_chunks("path/to/documents.csv")
# Build search index
index = build_index(chunks)
# Retrieve relevant documents
results = retriever.search("What is machine learning?", index)
# Generate answer
answer = generator.generate(query="What is machine learning?", context=results)
docurag-agent/
├── src/ # Source code
│ ├── chunk/ # Document chunking
│ ├── embed/ # Embedding & indexing
│ ├── retrieve/ # Document retrieval
│ ├── rerank/ # Result reranking
│ ├── generate/ # Answer generation
│ ├── query_rewrite/ # Query enhancement
│ └── eval/ # Evaluation metrics
├── app/ # Web applications
│ ├── api.py # FastAPI backend
│ └── ui.py # Gradio frontend
├── data/ # Sample data
├── scripts/ # Utility scripts
├── tests/ # Test suite
├── pyproject.toml # Project configuration
└── README.md # This file
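To make the chunk, embed, and retrieve stages in the layout above concrete, here is a self-contained toy sketch of the same flow using only the standard library. It does not use or reproduce the docurag API. The names `toy_chunks`, `toy_embed`, `toy_index`, and `toy_search` are hypothetical, the "embedding" is a plain bag-of-words count rather than a neural model, and the generation step is left out.

```python
import math
import re
from collections import Counter


def toy_chunks(text, max_words=40):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text):
    """Toy stand-in for an embedding: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def toy_index(chunks):
    """Pair every chunk with its vector; a real index would use FAISS or similar."""
    return [(chunk, toy_embed(chunk)) for chunk in chunks]


def toy_search(query, index, top_k=2):
    """Return the top_k chunks most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# Hypothetical sample documents.
docs = [
    "Machine learning is a field of study that gives computers the ability to learn from data.",
    "FAISS is a library for efficient similarity search over dense vectors.",
    "Gradio makes it easy to build a web UI for a Python function.",
]
index = toy_index([chunk for doc in docs for chunk in toy_chunks(doc)])
results = toy_search("What is machine learning?", index, top_k=1)
print(results[0])
```

In a full pipeline the retrieved chunks would then be reranked and passed to a generator as context.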
# Clone and install for development
git clone https://github.com/rucwhx/docurag-agent.git
cd docurag-agent
uv pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
We use modern Python tooling for code quality:
# Format code
black src/ app/ tests/
ruff format src/ app/ tests/
# Lint code
ruff check src/ app/ tests/
# Type checking
mypy src/ app/
# Run tests
pytest
# Run tests with coverage
pytest --cov=src --cov-report=html
# Run all tests
pytest
# Run with coverage
pytest --cov=src
# Run specific test file
pytest tests/test_retrieval.py
# Run with verbose output
pytest -v
The system includes comprehensive evaluation metrics:
# Run retrieval evaluation
python -m docurag.eval.retrieval_eval
# Run QA evaluation
python -m docurag.eval.qa_eval
We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run tests and linting
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with Sentence Transformers
- Uses FAISS for efficient similarity search
- Powered by Transformers
- UI built with Gradio
- API built with FastAPI
Planned improvements:
- Support for more document formats (PDF, DOCX, etc.)
- Advanced query expansion techniques
- Multi-language support
- Vector database integrations (Pinecone, Weaviate, etc.)
- Cloud deployment guides
- Performance optimizations