# llm-chatbot-backend

A Retrieval-Augmented Generation (RAG) system for scraping website data, embedding text, and answering questions via an LLM.

Dependencies are declared in `requirements.txt` and `pyproject.toml` for pip installation.
## Installation

```bash
git clone https://github.com/DeliciousBoy/llm-chatbot-backend.git
cd llm-chatbot-backend
```

This project uses uv to manage virtual environments and dependencies across Python versions. You can install uv by running:

```bash
curl -Ls https://astral.sh/uv/install.sh | sh
```

Or follow the instructions in the official GitHub repository: https://github.com/astral-sh/uv

Once installed, set up the environment with:
```bash
uv venv
source .venv/bin/activate   # Or .venv/Scripts/activate on Windows
uv pip install -r requirements.txt
uv pip install -e ".[dev,docs]"
```

If you prefer not to use uv, you can fall back to pip (see below).
This is not recommended, as it may lead to dependency conflicts, especially if you work across different Python versions.

```bash
python -m venv .venv
source .venv/bin/activate   # Or .venv/Scripts/activate on Windows
pip install -r requirements.txt
pip install -e ".[dev,docs]"
```

## Pipelines

This project uses Kedro to organize data workflows into modular pipelines.
| Pipeline Name | Description |
|---|---|
| `data_processing` | Cleans and embeds text data into vectors |
| `web_scraping` | Asynchronously scrapes web content and stores it as raw data |
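To give a feel for what the `data_processing` stage does, here is a toy, dependency-free sketch of cleaning and vectorizing text. The real pipeline presumably calls an actual embedding model; the function names below are illustrative, not the project's API:

```python
import math
import re
from collections import Counter

def clean_text(raw: str) -> str:
    """Collapse whitespace and lowercase, as a stand-in for real cleaning."""
    return re.sub(r"\s+", " ", raw).strip().lower()

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy bag-of-words vector, L2-normalized.
    A real pipeline would call an embedding model here instead."""
    counts = Counter(clean_text(text).split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```

In the actual project, the resulting vectors would land in the vector store under `data/` rather than being returned as plain lists.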
Each pipeline is defined in `src/llm_chatbot_backend/pipelines/` and can be run individually or as a group. You can also run specific nodes within a pipeline.
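Conceptually, a Kedro pipeline wires node functions together by dataset name, resolving inputs and outputs against the catalog. This dependency-free sketch mimics that idea (the real API lives in `kedro.pipeline`; the node names here are made up):

```python
def run_pipeline(nodes, catalog):
    """Execute (func, input_names, output_name) triples in order,
    reading and writing a dict that stands in for Kedro's data catalog."""
    for func, input_names, output_name in nodes:
        catalog[output_name] = func(*[catalog[name] for name in input_names])
    return catalog

# Two toy "nodes": strip the raw text, then tokenize it.
nodes = [
    (str.strip, ["raw_text"], "clean_text"),
    (str.split, ["clean_text"], "tokens"),
]
```

Kedro's real runner additionally topologically sorts nodes by their dataset dependencies, so declaration order does not matter there.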
```bash
kedro run                               # Run all pipelines
kedro run --pipeline=web_scraping       # Run the web scraping pipeline
kedro run --pipeline=data_processing    # Run the data processing pipeline
```

You can visualize the pipelines with Kedro's built-in visualization tool, which generates a graph of the pipeline nodes and their dependencies:
```bash
kedro viz run --autoreload
```

## Scheduler

This project includes a scheduler built on APScheduler to automate periodic tasks such as scraping data, generating embeddings, or updating indexes.
To start the scheduler, run:

```bash
python scheduler.py
```

## Testing

This project uses pytest for its test cases. Run the tests with:
```bash
pytest
```

## Streamlit App

This project includes a Streamlit app for interacting with the chatbot. You can run it with:
```bash
streamlit run main.py
```

To run the app locally, make sure the virtual environment is activated and the dependencies are installed.
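At answer time, a RAG app embeds the user's question, retrieves the most similar stored chunks, and passes them to the LLM as context. A minimal cosine-similarity retriever illustrates the retrieval step (the real app presumably queries its vector store instead; these names are illustrative):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a)) or 1.0
    norm_b = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], docs: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """docs is a list of (text, vector). Return the k texts most similar to the query."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The retrieved texts would then be interpolated into the LLM prompt before generating the answer shown in the Streamlit UI.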
## Project Structure

This project follows the Kedro project layout, with additional components for web scraping, vector embeddings, and an LLM chatbot interface via Streamlit.
```
📁 llm-chatbot-backend/
├── 📁 conf/                      # Kedro configuration files
│   └── 📁 base/
│       ├── 📄 catalog.yml        # Dataset definitions (inputs/outputs for pipelines)
│       └── 📄 parameters.yml     # Project-level parameters for nodes/pipelines
├── 📁 data/                      # raw / cleaned / embedded / chromadb
├── 📁 src/                       # Source code (Kedro pipelines, modules)
│   └── 📁 llm_chatbot_backend/
│       ├── 📁 datasets/          # Custom Kedro dataset classes
│       │   └── 📄 utf8_json.py   # Custom JSON dataset
│       └── 📁 pipelines/         # All Kedro pipelines
│           ├── 📁 data_processing/
│           │   ├── 📄 nodes.py       # Data cleaning / embedding logic
│           │   └── 📄 pipeline.py    # Defines the data_processing pipeline
│           └── 📁 web_scraping/
│               ├── 📄 nodes.py       # Async scraping logic
│               └── 📄 pipeline.py    # Defines the web_scraping pipeline
├── 📁 tests/                     # Pytest test cases
│   └── 📁 pipelines/
│       ├── 📁 data_processing/
│       │   └── 📄 test_pipeline.py
│       └── 📁 web_scraping/
│           └── 📄 test_pipeline.py
├── 📄 main.py                    # Streamlit chat interface
├── 📄 scheduler.py               # Automates the web scraping task
├── 📄 pyproject.toml             # Project config & dependencies
├── 📄 requirements.txt           # Pip requirements
├── 📄 uv.lock                    # uv dependency lockfile
└── 📄 .env                       # Environment variables
```