This project develops a web application that enables private corporations to retrieve documentation. Through question answering powered by Large Language Models (LLMs), users can obtain information about variables or techniques used in their technologies.
This research was conducted as part of the EU Horizon project SEUS – Smart European Shipbuilding (Grant Agreement No. 101096224), funded by the European Union.
Follow these steps to set up the project locally:
- Clone the repository (you can use HTTPS too):

```shell
git clone git@github.com:TurkuNLP/RAG-web-app.git
```

- Create a virtual environment:

```shell
python3 -m venv env
```

Note: replace `env` with your preferred environment name.

- Activate the virtual environment:

```shell
source env/bin/activate
```

- Install dependencies:

```shell
pip install -r /path/to/requirements.txt
```
Use run.py to select which Flask app to run with APP_NAME. Optional PORT, HOST, and DEBUG environment variables control runtime.
Supported apps (`APP_NAME`):

- `local`
- `seus`
- `arch-ru`
- `arch-en`
- `news`
- `law`
Examples:

```shell
APP_NAME=local PORT=5000 python run.py
APP_NAME=law PORT=8080 python run.py
APP_NAME=arch-en python run.py
```

Build the image:
```shell
docker build -t rag-web-app .
```

Populate the database (example with the `local` config):

```shell
docker run --rm \
  --env-file .env \
  -v "$PWD:/app" \
  -w /app \
  rag-web-app \
  python python_script/populate_database.py --config local
```

Run a selected app:

```shell
docker run --rm \
  --env-file .env \
  -v "$PWD/data:/app/data" \
  -p 8000:8000 \
  -e APP_NAME=local \
  -e PORT=8000 \
  rag-web-app
```

Using Docker Compose:

```shell
docker compose up --build
```

Change the app by editing `APP_NAME` in `docker-compose.yml` or by passing `-e APP_NAME=law` to `docker run`.
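As a rough sketch, a Compose file consistent with the `docker run` flags above might look like the following. The service name, port, and volume mapping here are assumptions for illustration; the repository's actual `docker-compose.yml` may differ.

```yaml
services:
  rag-web-app:
    build: .
    env_file: .env          # loads API keys such as OPENAI_API_KEY
    environment:
      APP_NAME: local       # change to seus, arch-ru, arch-en, news, or law
      PORT: "8000"
    ports:
      - "8000:8000"
    volumes:
      - ./data:/app/data    # persist data outside the container
```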
If you use Conda, create an environment and install Python dependencies with pip:

```shell
conda create -n rag-web-app python=3.11
conda activate rag-web-app
pip install -r requirements.txt
```

`config.json` stores configuration settings for different environments or use cases. Each configuration specifies:

- `data_path`: path to the data folder.
- `chroma_root_path`: path where the Chroma database will be stored.
- `embedding_model`: name of the model used for embeddings.
- `llm_model`: name of the language model to be used.
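For illustration, a `config.json` entry using the fields above could look like this. The configuration name, paths, and model names are placeholders, not the repository's actual values.

```json
{
  "local": {
    "data_path": "data/local",
    "chroma_root_path": "chroma/local",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "llm_model": "gpt-4o-mini"
  }
}
```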
API keys are loaded from a `.env` file via `python_script/parameters.py`. Common keys:

- `OPENAI_API_KEY`: required when using OpenAI models.
- `HF_API_TOKEN`: required when using Hugging Face hosted models.
- `VOYAGE_API_KEY`: required when using VoyageAI models (if enabled).
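A minimal `.env` sketch with placeholder values (set only the keys you need, and never commit real keys to version control):

```shell
OPENAI_API_KEY=your-openai-key
HF_API_TOKEN=your-huggingface-token
VOYAGE_API_KEY=your-voyage-key
```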
Runtime settings:

- `APP_NAME`: which app to run (see Supported apps above).
- `HOST`: bind address (default `0.0.0.0`).
- `PORT`: port to listen on (defaults to the app's configured port).
- `DEBUG`: set to `true`/`1` to enable Flask debug mode (local development only).
The following files in `python_script/` are related to database setup and management:

- `parameters.py`: loads parameters from `config.json`.
- `get_embedding_function.py`: loads the embedding model.
- `populate_database.py`: creates, resets, or clears the database.
The main() function is the CLI entry point for database management.
Arguments:

- `--config` (str): configuration name in `config.json`.
- `--reset`: clear the database subfolder for the config, then repopulate.
- `--clear`: clear the database (optionally scoped to the config).

Functionality:

- Populate: `--config` only loads documents, splits them, and adds them to Chroma.
- Reset: `--config --reset` clears the config's subfolder, then repopulates.
- Clear: `--clear` removes data; if `--config` is provided, it targets the config's subfolder (based on `EMBEDDING_MODEL`).
Example usage:
```shell
python populate_database.py --config CONFIG_NAME
python populate_database.py --config CONFIG_NAME --reset
python populate_database.py --config CONFIG_NAME --clear
```
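The argument handling described above can be sketched with `argparse` as follows. This is an illustrative reconstruction of the documented flags; the real `populate_database.py` may define them differently.

```python
import argparse


def build_parser():
    # Sketch of the CLI flags documented above; the actual
    # populate_database.py may differ in details.
    parser = argparse.ArgumentParser(description="Manage the Chroma database.")
    parser.add_argument("--config", type=str,
                        help="configuration name in config.json")
    parser.add_argument("--reset", action="store_true",
                        help="clear the config's subfolder, then repopulate")
    parser.add_argument("--clear", action="store_true",
                        help="clear the database (optionally scoped to --config)")
    return parser
```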