This repository contains the code for the paper "Efficient Federated Search for Retrieval-Augmented Generation using Lightweight Routing". RAGRoute enables intelligent routing across federated data sources to improve retrieval-augmented generation (RAG) performance.
The system processes a query by routing it to relevant data sources, retrieving documents in parallel, and using them to generate an answer with an LLM.
RAGRoute consists of the following components, each implemented as a separate process.
At the core of RAGRoute is a main process containing a coordinator and an HTTP server.
- The HTTP server receives incoming requests from users and returns responses once the query is processed.
- When a request is received, the query is forwarded to the coordinator.
- The coordinator manages communication between the different components.
The coordinator first forwards the query to the routing process (step 3).
- This process has the relevant embedding models loaded in memory and hosts the RAGRoute router model.
- After generating the query embeddings, the routing process forwards them to the router model (step 4).
- The router outputs a list of relevant data sources.
- The identifiers of these data sources and the embeddings are returned to the coordinator (step 5).
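The paper describes the router model in detail; as a rough illustration only, a lightweight per-source relevance classifier over query embeddings could look like the following sketch (class and parameter names are hypothetical, not the repository's actual router):

```python
import numpy as np

class LightweightRouter:
    """Illustrative per-source relevance scorer over a query embedding.

    One weight vector per data source; a source is selected when its
    sigmoid score reaches the threshold. Shapes and names are assumptions
    for this sketch, not RAGRoute's real implementation.
    """

    def __init__(self, weights, bias, threshold=0.5):
        self.weights = weights      # shape: (num_sources, embedding_dim)
        self.bias = bias            # shape: (num_sources,)
        self.threshold = threshold

    def route(self, query_embedding):
        logits = self.weights @ query_embedding + self.bias
        scores = 1.0 / (1.0 + np.exp(-logits))  # sigmoid score per source
        return [i for i, s in enumerate(scores) if s >= self.threshold]

# Toy usage: 3 sources, 4-dimensional embeddings.
rng = np.random.default_rng(0)
router = LightweightRouter(rng.normal(size=(3, 4)), np.zeros(3))
selected = router.route(rng.normal(size=4))
```

A model this small keeps routing overhead negligible compared to retrieval and generation, which is the point of lightweight routing.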
Next, the coordinator sends each selected data source the embedding compatible with that source's embedding model, querying all selected sources in parallel.
- Each data source retrieves the top-`k_ret` relevant documents.
- These documents are returned to the coordinator.
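The parallel fan-out can be sketched with `asyncio`. In the real system each data source runs as a separate process (so the calls go over inter-process transport); here each source is mocked as an async function returning its top-`k_ret` documents, and all names are illustrative:

```python
import asyncio

async def query_source(source_id, embedding, k_ret=3):
    """Mocked data source: stands in for network transfer + vector search."""
    await asyncio.sleep(0.01)
    return [f"{source_id}-doc{i}" for i in range(k_ret)]

async def fan_out(selected_sources, embedding):
    """Query all selected sources concurrently and collect their results."""
    results = await asyncio.gather(
        *(query_source(s, embedding) for s in selected_sources)
    )
    return dict(zip(selected_sources, results))

docs = asyncio.run(fan_out(["medrag", "feb4rag"], embedding=None))
```

Because `asyncio.gather` awaits all sources concurrently, end-to-end retrieval latency is bounded by the slowest selected source rather than the sum of all sources.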
After receiving all responses:
- The coordinator reranks and filters the documents, producing a final top-`k` list of relevant document chunks.
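A minimal sketch of this merge-and-filter step, assuming each source returns (chunk, score) pairs; the actual scoring in `ragroute/rerank.py` may differ (e.g., similarity against the query embedding):

```python
def rerank(per_source_results, k=5):
    """Pool scored chunks from all sources and keep the global top-k.

    per_source_results: dict mapping source id -> list of (chunk, score).
    Scores are assumed comparable across sources for this illustration.
    """
    pooled = [
        (chunk, score)
        for chunks in per_source_results.values()
        for chunk, score in chunks
    ]
    pooled.sort(key=lambda cs: cs[1], reverse=True)  # highest score first
    return [chunk for chunk, _ in pooled[:k]]

# Toy usage: two sources, global top-2.
top_chunks = rerank(
    {"a": [("a1", 0.9), ("a2", 0.4)], "b": [("b1", 0.7)]},
    k=2,
)
```

Note the assumption that scores are comparable across sources; with heterogeneous retrievers, a real reranker would normalize or re-score before merging.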
Finally, the coordinator constructs the prompt sent to the LLM engine.
- The prompt contains:
- The user query.
- The retrieved documents.
- The LLM returns a response to the coordinator (step 9).
- The coordinator sends the final reply back to the user (steps 10 and 11).
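The prompt-construction step above can be sketched as follows; the repository defines its own template, so this layout is purely illustrative:

```python
def build_prompt(query, chunks):
    """Assemble an LLM prompt from the user query and retrieved chunks.

    The numbered-context layout is an assumption for this sketch, not
    the template RAGRoute actually uses.
    """
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_prompt("What is RAG?", ["chunk one", "chunk two"])
```

The coordinator would send a prompt like this to the LLM engine and relay the generated answer back to the user.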
- `main.py`: Launches the RAGRoute server and router logic.
- `run_benchmark.py`: Sends benchmark queries asynchronously to evaluate the system.
- `ragroute/`: Core logic, including routing, the HTTP server, LLM handling, data sources, and configuration.
- `data/`: Benchmark datasets, output files, and logs.
Make sure you're using Python 3.8+ and run:

```shell
pip install -r requirements.txt
```

Also ensure Ollama is installed and running:

```shell
ollama serve
```

In a terminal:

```shell
python3 main.py --dataset <dataset> --routing <routing>
```

Arguments:

- `--dataset`: `medrag` or `feb4rag`
- `--routing`: `ragroute`, `random`, `all`, or `none`
Example:

```shell
python3 main.py --dataset feb4rag --routing ragroute
```

This will:
- Launch the HTTP server
- Initialize data source clients
Keep this terminal running.
In a separate terminal, run:
```shell
python3 run_benchmark.py --benchmark <benchmark> --routing <routing> --parallel <n>
```

Arguments:

- `--benchmark`: `FeB4RAG` or `MIRAGE`
- `--routing`: Match the routing method used by the server
- `--parallel`: Number of parallel queries (default: 1)
Example:

```shell
python3 run_benchmark.py --benchmark FeB4RAG --routing ragroute --parallel 1
```

Benchmark results are saved to the `data/` folder:

- `benchmark_<benchmark>_<routing>.csv`: Per-query performance metrics
- `answers_<benchmark>_<routing>.jsonl`: Raw LLM responses
- `ds_stats_<benchmark>_<routing>.csv`: Data source latency and message sizes
For `main.py`:

- `--dataset`: Dataset to use (`medrag` or `feb4rag`)
- `--routing`: Routing strategy (`ragroute`, `random`, `all`, or `none`)
- `--disable-llm`: Skip the LLM call (retrieval only)
- `--simulate`: Add artificial delay
- `--model`: LLM model to use (must be in `SUPPORTED_MODELS`)

For `run_benchmark.py`:

- `--benchmark`: Benchmark name (`FeB4RAG` or `MIRAGE`)
- `--routing`: Routing strategy used
- `--parallel`: Number of concurrent queries to send
- `--questions`: (Optional) Specific question set (e.g., `medqa`)

Notes:

- Ollama must be running in the background (`ollama serve`) before launching the server.
- Ensure the ports required by the system (e.g., 8000, 5555–5560) are available.
- Add new data sources in `ragroute/config.py`
- Create custom routing logic in `ragroute/router/`
- Add new benchmarks under `data/benchmark/`
- Customize reranking in `ragroute/rerank.py`
If you use this code, please cite the associated paper.
