multi-agent-rag is a modular, extensible framework for building Retrieval-Augmented Generation (RAG) systems driven by multiple collaborating AI agents. It combines LLM orchestration, vector search, and local model serving so that a query is retrieved against, reranked, reasoned over, and answered with citations.
The repository builds on the following stack:
- LlamaIndex – For structured data ingestion and retrieval pipelines.
- ChromaDB – As a lightweight vector database.
- Ollama – For running the local LLMs dedicated to planning, reasoning, and critiquing.
- CrossEncoder – To score and re-rank retrieved text chunks.
- pdfplumber – For text and table extraction from PDFs.
- Typer – For building the CLI.
- Rich – For enhanced terminal output and easier debugging.
- Multi-Agent Collaboration: Employs specialized, collaborating agents to handle complex retrieval and reasoning tasks.
- Modular & Extensible: Built from the ground up to let you easily swap models, vector databases, and orchestration tools.
- Local & Scalable: Integrated with local model serving and optimized vector search for secure, high-performance, and cost-effective deployment.
multi-agent-rag/
│
├── main.py
├── ingest.py
├── query.py
├── settings.py
├── embedding.py
├── llm.py
├── skill_registry.py
├── server.js
├── package.json
├── package-lock.json
├── public/
│   ├── index.html
│   ├── tifo.txt
├── agents/
│   ├── router.py
│   ├── retriever.py
│   ├── reranker.py
│   ├── reasoner.py
│   ├── planner.py
├── rag/
│   ├── index.py
│   ├── loader.py
├── vector_database/
├── files/
├── skills/
│   ├── rag_context_critic.md
│   ├── rag_context_qa.md
├── LICENSE
├── .gitignore
├── requirements.txt
└── README.md
Query Input
↓
Retriever.retrieve() ---- (Vector, Keyword, or Hybrid)
↓
Top Vector Chunks
↓
Deduplication ---- (Removes redundant context early)
↓
Cross-Encoder Reranker ---- (Computes high-quality relevance)
↓
Top Reranked Chunks
↓
Prompt Construction ---- (Context formatting & system prompts)
↓
LLM Grounded Generation
↓
Optional Critic
↓
Answer & Citations
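
The middle of the flow above (deduplication, reranking, prompt construction) can be sketched in plain Python. This is a minimal illustration, not the repository's code: `Chunk`, `deduplicate`, `rerank`, `overlap_score`, and `build_prompt` are hypothetical names, and `overlap_score` is a toy token-overlap scorer standing in for a real cross-encoder so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Chunk:
    """A retrieved text chunk and the source it came from (hypothetical shape)."""
    text: str
    source: str


def deduplicate(chunks: List[Chunk]) -> List[Chunk]:
    """Drop chunks whose case- and whitespace-normalized text was already seen."""
    seen = set()
    unique = []
    for chunk in chunks:
        key = " ".join(chunk.text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique


def overlap_score(query: str, text: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query tokens found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def rerank(query: str, chunks: List[Chunk],
           score: Callable[[str, str], float], top_k: int) -> List[Chunk]:
    """Sort chunks by score(query, chunk.text), highest first, and keep top_k."""
    ranked = sorted(chunks, key=lambda c: score(query, c.text), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, chunks: List[Chunk]) -> str:
    """Number each chunk so the generated answer can cite it as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Small demo: two near-duplicate chunks and one unrelated chunk.
retrieved = [
    Chunk("ChromaDB stores embeddings.", "a.md"),
    Chunk("chromadb   stores embeddings.", "b.md"),
    Chunk("Ollama serves local models.", "c.md"),
]
query = "what stores embeddings"
top = rerank(query, deduplicate(retrieved), overlap_score, top_k=1)
print(build_prompt(query, top))
```

Deduplicating before reranking keeps the expensive scoring step from running on redundant chunks, which is the ordering the diagram shows. In a real setup the `score` argument would wrap a cross-encoder model instead of `overlap_score`.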
+----------+----------------+---------+-------------------------------------+-----------------------------------+
| AGENT | MODEL | SIZE | STRENGTHS | WEAKNESSES |
+----------+----------------+---------+-------------------------------------+-----------------------------------+
| Planner | Llama 3 (8B) | 4.7 GB | - Excellent instruction following | - Can be verbose |
| | | | - Great at JSON/structured output | - Adds unnecessary conversational |
| | | | - Strong decomposition skills | pre-amble |
+----------+----------------+---------+-------------------------------------+-----------------------------------+
| Reasoner | Mistral (7B) | 4.1 GB | - Dense and efficient | - Can be overly succinct |
| | | | - High logical "snappiness" | - Might lack nuance in very |
| | | | - Excellent context management | complex logic |
+----------+----------------+---------+-------------------------------------+-----------------------------------+
| Critic | Qwen 3 (8B) | 5.2 GB | - Exceptional fact-checking | - Formatting defaults can vary |
| | | | - High logic performance | - Different prompt sensitivity |
| | | | - Diverse training data perspective | than Llama/Mistral |
+----------+----------------+---------+-------------------------------------+-----------------------------------+
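
The role-to-model assignment in the table can be captured as a small mapping. The sketch below pairs it with a request to Ollama's local `/api/generate` endpoint using only the standard library. How the repository's `llm.py` actually talks to Ollama is not stated here, so `AGENT_MODELS`, `build_request`, and `generate` are hypothetical names, and the default address `http://localhost:11434` is an assumption about the local setup.

```python
import json
from urllib import request

# Role-to-model mapping taken from the table above; the tags match
# the `ollama pull` commands used during setup.
AGENT_MODELS = {
    "planner": "llama3",
    "reasoner": "mistral",
    "critic": "qwen3:8b",
}

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(role: str, prompt: str) -> request.Request:
    """Build (but do not send) a non-streaming generate request for an agent role."""
    if role not in AGENT_MODELS:
        raise KeyError(f"unknown agent role: {role!r}")
    payload = {"model": AGENT_MODELS[role], "prompt": prompt, "stream": False}
    return request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(role: str, prompt: str) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    with request.urlopen(build_request(role, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Keeping the mapping in one place means swapping a model for a given role is a one-line change, for example pointing `"critic"` at a different pulled tag.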
- Clone the repository:

      git clone https://github.com/FlyingMatrix/multi-agent-rag.git
      cd ./multi-agent-rag

- Create and activate a virtual environment (optional):

      python -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`

- Install the required dependencies:

      pip install -r requirements.txt

- Pull the Llama 3 (8B), Mistral (7B), and Qwen 3 (8B) models locally:

      ollama pull llama3
      ollama pull mistral
      ollama pull qwen3:8b
      ollama list

- Copy the documents to be ingested into the folders below:

      ./files/pdf
      ./files/markdown

- Ingest documents from the files folder and its subfolders to generate the vector database:

      python main.py ingest ./files/pdfs

- Query the multi-agent RAG system from the command line:

      python main.py query "<your query>"

- Alternatively, access the multi-agent RAG system via a web-based UI:

      node server.js
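
Before running the ingestion step, it can help to check which files would be picked up from the files folder and its subfolders. The following is a minimal sketch of recursive document discovery using only the standard library. It does not reproduce the repository's `ingest.py` or `rag/loader.py`: the `discover_documents` helper, the `SUPPORTED` mapping, and the choice of `.pdf` and `.md` extensions are assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, List

# Assumed extension-to-kind mapping, mirroring the pdf and markdown folders.
SUPPORTED = {".pdf": "pdf", ".md": "markdown"}


def discover_documents(root: Path) -> Dict[str, List[Path]]:
    """Walk root recursively and group supported files by document kind."""
    found: Dict[str, List[Path]] = {kind: [] for kind in SUPPORTED.values()}
    for path in sorted(root.rglob("*")):
        kind = SUPPORTED.get(path.suffix.lower())
        if kind and path.is_file():
            found[kind].append(path)
    return found


if __name__ == "__main__":
    # Print how many files of each kind sit under ./files.
    for kind, paths in discover_documents(Path("./files")).items():
        print(f"{kind}: {len(paths)} file(s)")
```

Matching on the lowercased suffix means files such as `report.PDF` are still counted, and unsupported files are ignored rather than raising an error.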



