1. Issue
We currently have a SCIP textbook bot that answers student questions about textbook pages only. We would like to extend it to also answer questions about tutorials, recitations, past-year papers (PYPs), and lecture content. This document proposes a method to store this content and use retrieval-augmented generation (RAG) to respond.
2. Summary
We keep a map of every document we have (course, year, description). An initial prompt lets the AI pick which documents are relevant, the backend fetches those documents, and a second prompt sends them to GPT-5 together with the student's question.
Scope
In: past-year exams, lecture slides, tutorial sheets, and recitation sheets, served through the web chat interface.
3. How It Works
The whole flow is two LLM calls with a simple backend fetch in between.
Step 1 — The AI picks the documents
When a student asks a question, our Elixir backend prepends the document map to the prompt. This map is a (JSON) list of every document we have: its title, course, year, type (exam/lecture/etc.), a one-line description, and an S3 link. The AI returns a JSON array of links to the documents it thinks are relevant.
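The Step 1 flow can be sketched as follows. This is an illustrative Python sketch (the real backend is Elixir); the document-map entry and field names follow the schema in section 4, and the S3 URL is a made-up example.

```python
import json

# Hypothetical document-map entries; in production these come from Postgres.
DOC_MAP = [
    {
        "title": "CS1101S Final 2023",
        "doc_type": "exam",
        "year": 2023,
        "description": "Final exam covering streams and the metacircular evaluator.",
        "s3_original_url": "s3://scip-content/exams/cs1101s-final-2023.pdf",
    },
]

def build_routing_prompt(question: str, doc_map: list) -> str:
    """Prepend the full document map (as JSON) to the student's question."""
    return (
        "Here is a JSON list of every document we have:\n"
        f"{json.dumps(doc_map, indent=2)}\n\n"
        "Return ONLY a JSON array of s3_original_url values for the "
        "documents relevant to this question:\n"
        f"{question}"
    )

def parse_routing_reply(reply: str) -> list:
    """Parse the model's reply into a list of S3 links; fail closed on bad JSON."""
    try:
        links = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(links, list):
        return []
    return [link for link in links if isinstance(link, str)]
```

Failing closed on malformed JSON matters here: if the routing model returns prose instead of an array, the backend should fall back gracefully rather than crash mid-request.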
Step 2 — We fetch the documents
The backend takes that JSON array of S3 links and fetches the actual document content. We extract the text from the files and bundle it for the next step.
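The fetch-and-bundle step could look like the sketch below. The S3 call itself is injected as a function (the Elixir backend would use ex_aws for this), so only the bundling logic is shown; the `=== SOURCE: … ===` labelling is an assumption, chosen so Step 3 can tie citations back to documents.

```python
def bundle_documents(links, fetch):
    """Fetch each selected document's extracted text and concatenate it,
    labelling each chunk with its source link for citation in Step 3.

    `fetch` is a callable taking an S3 link and returning extracted text;
    it stands in for the real S3 client call.
    """
    parts = []
    for link in links:
        text = fetch(link)
        parts.append(f"=== SOURCE: {link} ===\n{text}")
    return "\n\n".join(parts)
```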
Step 3 — The AI answers the question
We send a second prompt to GPT-5 containing the original student question plus the selected documents. The prompt tells the model to answer using only the provided materials and to cite its sources (document name, year, page). If the documents don't contain enough information, it says so instead of making something up.
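A minimal sketch of the Step 3 prompt, with the grounding and citation instructions described above (exact wording is an assumption):

```python
def build_answer_prompt(question: str, bundled_docs: str) -> str:
    """Second prompt: the original question plus the selected documents,
    instructing the model to answer only from the provided materials."""
    return (
        "Answer the student's question using ONLY the materials below. "
        "Cite every claim with (document name, year, page). If the materials "
        "do not contain enough information to answer, say so explicitly "
        "instead of guessing.\n\n"
        f"MATERIALS:\n{bundled_docs}\n\n"
        f"QUESTION:\n{question}"
    )
```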
4. What We Store
S3 Buckets
We will add one new S3 bucket (or a new route in the existing bucket):
- Content – the uploaded files (PDF, PPTX, DOCX) for download/reference.
JSON/ Postgres (document map)
One table that serves as the "index" the AI reads in Step 1. This gives the AI enough context to return a list of documents we need to fetch.
The schema below is written as PostgreSQL columns, but may change to a JSON file stored in the backend.
| Column | Type | What it's for |
| --- | --- | --- |
| id | UUID | Primary key. |
| title | TEXT | Human-readable name (e.g. "CS1101S Final 2023"). |
| description | TEXT | One-line summary the AI reads to decide relevance. |
| doc_type | TEXT | 'exam', 'lecture', 'tutorial', or 'recitation'. |
| year | INT | Academic year. |
| week | INT | Week number (null for lectures/tutorials). |
| s3_original_url | TEXT | Link to the original file in S3. |
At query time, we load this table into the Step 1 prompt as a JSON array.
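One way to express the table above as Postgres DDL, should we stay with the database option. The table name and the NOT NULL/CHECK constraints are assumptions; the columns and types mirror the schema above.

```sql
CREATE TABLE documents (
    id              UUID PRIMARY KEY,
    title           TEXT NOT NULL,
    description     TEXT NOT NULL,
    doc_type        TEXT NOT NULL
                    CHECK (doc_type IN ('exam', 'lecture', 'tutorial', 'recitation')),
    year            INT  NOT NULL,
    week            INT,            -- nullable, per the table above
    s3_original_url TEXT NOT NULL
);
```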
5. Tech Stack Needed
| Layer | Choice | Notes |
| --- | --- | --- |
| Database | PostgreSQL | Stores the document map (may instead be a JSON file in the backend). |
| Storage | AWS S3 (3 buckets) | Originals, extracted text, and assets. Accessed via ex_aws. |
| Routing LLM | GPT-5 | Cheap and fast for reading the map and returning a JSON array of links. |
| Generation LLM | GPT-5 | Handles the actual answer generation with full document context. |