Multi-agent RAG

🎯 About This Repo

multi-agent-rag is a modular, extensible framework for building Retrieval-Augmented Generation (RAG) systems in which several specialized AI agents collaborate. It combines LLM orchestration, vector search, and local model serving so that a pipeline can retrieve relevant context, reason over it, and generate grounded answers with citations.

🛠️ Built With

The repository builds on the following tools:

  • LlamaIndex – For structured data ingestion and advanced retrieval pipelines.

  • ChromaDB – As a lightweight, high-performance vector database.

  • Ollama – For running the local LLMs that handle planning, reasoning, and critique.

  • CrossEncoder – To score and re-rank retrieved text chunks by relevance to the query.

  • pdfplumber – For text and table extraction from PDFs.

  • Typer – For building the command-line interface.

  • Rich – For formatted terminal output that is easier to read and debug.

🚀 Key Features

  • Multi-Agent Collaboration: Splits the work across specialized agents (router, planner, retriever, reranker, reasoner, and an optional critic) to handle complex retrieval and reasoning tasks.

  • Modular & Extensible: Built from the ground up to let you easily swap models, vector databases, and orchestration tools.

  • Local & Scalable: Integrated with local model serving and optimized vector search for secure, high-performance, and cost-effective deployment.

🧩 Repository Structure

multi-agent-rag/
│
├── main.py
├── ingest.py
├── query.py
├── settings.py
├── embedding.py
├── llm.py
├── skill_registry.py
├── server.js
├── package.json
├── package-lock.json
├── public/
│   ├── index.html
│   ├── tifo.txt
├── agents/
│   ├── router.py
│   ├── retriever.py
│   ├── reranker.py
│   ├── reasoner.py
│   ├── planner.py
├── rag/
│   ├── index.py
│   ├── loader.py
├── vector_database/
├── files/
├── skills/
│   ├── rag_context_critic.md
│   ├── rag_context_qa.md
├── LICENSE
├── .gitignore
├── requirements.txt
└── README.md

🔮 Multi-agent RAG Pipeline

    Query Input
            ↓
    Retriever.retrieve()    ----    (Vector, Keyword, or Hybrid)
            ↓
    Top Vector Chunks
            ↓
    Deduplication           ----    (Removes redundant context early)
            ↓
    Cross-Encoder Reranker  ----    (Computes high-quality relevance)
            ↓
    Top Reranked Chunks
            ↓
    Prompt Construction     ----    (Context formatting & system prompts)
            ↓
    LLM Grounded Generation
            ↓
    Optional Critic
            ↓
    Answer & Citations
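
The stages between retrieval and generation can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's implementation: the `Chunk` type, the function names, and the word-overlap `overlap_score` are hypothetical stand-ins. In the pipeline above, the chunks come from `Retriever.retrieve()` and the scoring step is done by a cross-encoder.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Chunk:
    """Hypothetical container for a retrieved text chunk and its source file."""
    text: str
    source: str


def deduplicate(chunks: list[Chunk]) -> list[Chunk]:
    """Drop chunks whose case- and whitespace-normalized text was already seen."""
    seen: set[str] = set()
    unique: list[Chunk] = []
    for chunk in chunks:
        key = " ".join(chunk.text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that appear in the text.

    Stand-in only. A cross-encoder would score each (query, chunk) pair jointly.
    """
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(query: str, chunks: list[Chunk],
           score: Callable[[str, str], float], top_k: int = 3) -> list[Chunk]:
    """Sort chunks by descending score and keep the best top_k."""
    ranked = sorted(chunks, key=lambda c: score(query, c.text), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, chunks: list[Chunk]) -> str:
    """Format numbered context so the model can cite sources as [n]."""
    context = "\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered context and cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Example run with made-up chunks; the second is a near-duplicate of the first.
retrieved = [
    Chunk("ChromaDB stores embeddings.", "a.pdf"),
    Chunk("chromadb  stores embeddings.", "b.pdf"),
    Chunk("Typer builds CLIs.", "c.md"),
]
query = "what stores embeddings"
top = rerank(query, deduplicate(retrieved), overlap_score, top_k=1)
prompt = build_prompt(query, top)
```

Deduplicating before reranking keeps the more expensive scoring step from spending time on repeated context, which matches the order shown in the diagram.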

🔭 Agent Model Configuration

+----------+----------------+---------+-------------------------------------+-----------------------------------+
|  AGENT   |  MODEL         |  SIZE   |  STRENGTHS                          |  WEAKNESSES                       |
+----------+----------------+---------+-------------------------------------+-----------------------------------+
| Planner  | Llama 3 (8B)   | 4.7 GB  | - Excellent instruction following   | - Can be verbose                  |
|          |                |         | - Great at JSON/structured output   | - Adds unnecessary conversational |
|          |                |         | - Strong decomposition skills       |   preamble                        |
+----------+----------------+---------+-------------------------------------+-----------------------------------+
| Reasoner | Mistral (7B)   | 4.1 GB  | - Dense and efficient               | - Can be overly succinct          |
|          |                |         | - High logical "snappiness"         | - Might lack nuance in very       |
|          |                |         | - Excellent context management      |   complex logic                   |
+----------+----------------+---------+-------------------------------------+-----------------------------------+
| Critic   | Qwen 3 (8B)    | 5.2 GB  | - Exceptional fact-checking         | - Formatting defaults can vary    |
|          |                |         | - High logic performance            | - Different prompt sensitivity    |
|          |                |         | - Diverse training data perspective |   than Llama/Mistral              |
+----------+----------------+---------+-------------------------------------+-----------------------------------+
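
One way to keep this assignment in code is a small role-to-model mapping. The sketch below is an assumption about how such a configuration could look, not the contents of `settings.py`; `AgentModel`, `AGENT_MODELS`, and `model_for` are hypothetical names. The Ollama tags are the ones pulled in the installation steps, and the sizes are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentModel:
    """Hypothetical record tying an agent role to a local Ollama model."""
    role: str
    ollama_tag: str  # tag passed to `ollama pull`
    size_gb: float   # size listed in the table above


AGENT_MODELS = {
    "planner": AgentModel("planner", "llama3", 4.7),
    "reasoner": AgentModel("reasoner", "mistral", 4.1),
    "critic": AgentModel("critic", "qwen3:8b", 5.2),
}


def model_for(role: str) -> str:
    """Return the Ollama tag for a role, failing loudly on unknown roles."""
    try:
        return AGENT_MODELS[role.lower()].ollama_tag
    except KeyError:
        raise ValueError(f"unknown agent role: {role!r}") from None


# Combined size of the three models, useful for a rough disk-space check.
total_gb = sum(model.size_gb for model in AGENT_MODELS.values())
```

Keeping the mapping in one place means swapping the model behind a role is a one-line change.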

💻 Installation and Usage

  1. Clone the repository:

    git clone https://github.com/FlyingMatrix/multi-agent-rag.git
    cd ./multi-agent-rag
  2. Create and activate a virtual environment (optional):

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install the required dependencies:

    pip install -r requirements.txt
  4. Pull the Llama 3 (8B), Mistral (7B), and Qwen 3 (8B) models locally with Ollama, then list them to confirm:

    ollama pull llama3
    ollama pull mistral
    ollama pull qwen3:8b
    ollama list
  5. Copy the documents you want to ingest into the folders below:

    ./files/pdf
    ./files/markdown
  6. Ingest documents from a folder and its subfolders to build the vector database:

    python main.py ingest ./files/pdf
  7. Query the multi-agent RAG system from the command line to generate answers:

    python main.py query "<your query>"

  8. Alternatively, access the multi-agent RAG system via a web-based UI:

    node server.js
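
Before ingesting or querying, it can help to confirm that the three models from step 4 are installed. The following is a minimal sketch, assuming `ollama list` prints a header row followed by one model per line with the model name in the first column, and that a bare name such as `llama3` is stored with the default `latest` tag. `REQUIRED`, `missing_models`, and `installed_listing` are hypothetical helpers, not part of this repository.

```python
import subprocess

# Tags taken from the `ollama pull` commands in step 4.
REQUIRED = ("llama3", "mistral", "qwen3:8b")


def missing_models(listing: str, required=REQUIRED) -> list[str]:
    """Return the required models that do not appear in `ollama list`-style output."""
    installed = set()
    for line in listing.splitlines()[1:]:  # skip the header row
        if line.strip():
            installed.add(line.split()[0])  # first column is the model name
    missing = []
    for name in required:
        # Assumption: an untagged name is listed as "<name>:latest".
        candidates = {name} if ":" in name else {name, f"{name}:latest"}
        if not candidates & installed:
            missing.append(name)
    return missing


def installed_listing() -> str:
    """Run `ollama list` and return its text output (requires Ollama on PATH)."""
    result = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `missing_models(installed_listing())` returns an empty list when all three models are available, and otherwise names the ones that still need to be pulled.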
