[Feature] Proposal for Louis chatbot - expand functionality from textbook to all course content #1344

@yiilinzhang

Description

1. Issue

We currently have a SICP textbook bot that answers student questions about textbook pages only. However, we would like to extend it to handle questions about tutorials, recitations, past-year papers (PYPs), and lecture content as well. We propose a method to store this content and use retrieval-augmented generation (RAG) to respond.

2. Summary

We keep a map of every document we have (course, year, description), send an initial prompt that lets the AI pick which documents are relevant, fetch those documents, and then send a second prompt to the model together with the fetched content.

Scope

In: Past-year exams, lecture slides, tutorial sheets, recitation sheets. Web chat interface.

3. How It Works

The whole flow is two LLM calls with a simple backend fetch in between.

Step 1 — The AI picks the documents

When a student asks a question, our Elixir backend prepends the document map to the prompt. This map is a (JSON) list of every document we have: its title, course, year, type (exam/lecture/etc.), a one-line description, and an S3 link. The AI returns a JSON array of links to the documents it thinks are relevant.
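The routing prompt from Step 1 can be sketched as follows. This is a language-agnostic Python sketch (the real backend is Elixir); the document-map entries and the wording of the instructions are illustrative assumptions, not the final prompt.

```python
import json

# Hypothetical document map; in the real system this is loaded from
# Postgres (or a backend JSON file) by the Elixir backend.
DOCUMENT_MAP = [
    {
        "title": "CS1101S Final 2023",
        "doc_type": "exam",
        "year": 2023,
        "description": "Final exam covering streams and the environment model.",
        "s3_original_url": "https://example-bucket.s3.amazonaws.com/cs1101s/final-2023.pdf",
    },
    {
        "title": "Lecture 5: Higher-Order Functions",
        "doc_type": "lecture",
        "year": 2024,
        "description": "Slides on map, filter, accumulate and function composition.",
        "s3_original_url": "https://example-bucket.s3.amazonaws.com/cs1101s/lec05.pdf",
    },
]

def build_routing_prompt(question: str, document_map: list[dict]) -> str:
    """Prepend the document map to the student's question (Step 1)."""
    return (
        "You are a document router for a course chatbot.\n"
        "Here is a JSON list of every document available:\n"
        f"{json.dumps(document_map, indent=2)}\n\n"
        f"Student question: {question}\n"
        "Return ONLY a JSON array of s3_original_url values for the "
        "documents relevant to the question. Return [] if none apply."
    )

prompt = build_routing_prompt("What came out in the 2023 final?", DOCUMENT_MAP)
```

Asking for a bare JSON array keeps the routing call cheap and makes the reply trivially machine-parseable in Step 2.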

Step 2 — We fetch the documents

The backend takes that JSON array of S3 links and fetches the actual document content. We extract the text from the files and bundle it up for the next step.
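The parse-and-fetch step might look like this sketch. The `fetch` stand-in below is an assumption: in the real system it would be an ex_aws S3 download plus a PDF/PPTX text extractor in the Elixir backend.

```python
import json

def parse_link_array(model_reply: str) -> list[str]:
    """Extract the JSON array of S3 links from the Step 1 reply.
    Tolerates a reply wrapped in a markdown code fence."""
    text = model_reply.strip()
    if text.startswith("```"):
        text = text.strip("`")
        # drop the language tag line, e.g. "json"
        text = text.split("\n", 1)[1] if "\n" in text else text
    links = json.loads(text)
    if not isinstance(links, list):
        raise ValueError("expected a JSON array of links")
    return links

def fetch_documents(links: list[str], fetch=None) -> list[dict]:
    """Download and extract each document. `fetch` is a hypothetical
    stand-in for the real S3 download + text extraction."""
    fetch = fetch or (lambda url: f"<contents of {url}>")
    return [{"url": url, "text": fetch(url)} for url in links]

reply = '["https://example-bucket.s3.amazonaws.com/cs1101s/final-2023.pdf"]'
docs = fetch_documents(parse_link_array(reply))
```

Tolerating a code-fenced reply is a small defensive measure, since models sometimes wrap JSON output in fences even when told not to.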

Step 3 — The AI answers the question

We send a second prompt to GPT-5 containing the original student question plus the selected documents. The prompt tells the model to answer using only the provided materials and to cite its sources (document name, year, page). If the documents don't contain enough information, it says so instead of making something up.
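The second prompt can be assembled as below. This is a sketch only; the exact grounding and citation wording is an assumption, and the document fields mirror the index table rather than any finalized schema.

```python
def build_answer_prompt(question: str, docs: list[dict]) -> str:
    """Assemble the Step 3 prompt: the student's question plus the
    selected documents, with instructions to answer only from the
    provided materials, cite sources, and refuse rather than guess."""
    sources = "\n\n".join(
        f"--- {d['title']} ({d['year']}) ---\n{d['text']}" for d in docs
    )
    return (
        "Answer the student's question using ONLY the materials below. "
        "Cite each claim as (document name, year, page). If the materials "
        "do not contain enough information, say so instead of guessing.\n\n"
        f"Materials:\n{sources}\n\n"
        f"Question: {question}"
    )

p = build_answer_prompt(
    "What is a stream?",
    [{"title": "Lecture 10", "year": 2024,
      "text": "A stream is a lazily evaluated list."}],
)
```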

4. What We Store

S3 Buckets

We will add one new S3 bucket (or a new route on the existing one):

  • Content – the uploaded files (PDF, PPTX, DOCX) for download/reference.

JSON/ Postgres (document map)

One table that serves as the "index" the AI reads in Step 1. This gives the AI enough context to return a list of documents we need to fetch.

The schema below is written as PostgreSQL columns, but it may change to a JSON file stored in the backend.

| Column | Type | What it's for |
| --- | --- | --- |
| id | UUID | Primary key. |
| title | TEXT | Human-readable name (e.g. "CS1101S Final 2023"). |
| description | TEXT | One-line summary the AI reads to decide relevance. |
| doc_type | TEXT | 'exam', 'lecture', 'tutorial', or 'recitation'. |
| year | INT | Academic year. |
| week | INT | Week number (null for lectures/tutorials). |
| s3_original_url | TEXT | Link to the original file in S3. |

At query time, we load this table into the Step 1 prompt as a JSON array.
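That serialization step can be sketched as follows. The row shape mirrors the table above; dropping the internal `id` before embedding is a suggested design choice, not something the proposal specifies.

```python
import json
import uuid

# One row shaped like the index table above; values are illustrative.
rows = [
    {
        "id": str(uuid.uuid4()),
        "title": "CS1101S Final 2023",
        "description": "Final exam; streams and the environment model.",
        "doc_type": "exam",
        "year": 2023,
        "week": None,
        "s3_original_url": "s3://content/cs1101s/final-2023.pdf",
    },
]

def document_map_json(rows: list[dict]) -> str:
    """Serialise the index table into the JSON array embedded in the
    Step 1 prompt. The internal primary key is dropped: the AI only
    needs the fields that help it judge relevance, plus the link it
    should return."""
    keep = ("title", "description", "doc_type", "year", "week",
            "s3_original_url")
    return json.dumps([{k: r[k] for k in keep} for r in rows], indent=2)

doc_map = document_map_json(rows)
```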

5. Tech Stack Needed

| Layer | Choice | Notes |
| --- | --- | --- |
| Database | PostgreSQL | Stores the document map (may instead be a JSON file in the backend). |
| Storage | AWS S3 (3 buckets) | Originals, extracted text, and assets. Accessed via ex_aws. |
| Routing LLM | GPT-5 | Cheap and fast for reading the map and returning a JSON array of links. |
| Generation LLM | GPT-5 | Handles the actual answer generation with full document context. |

Metadata

Assignees: No one assigned
Labels: Enhancement (New feature or request), Proposal (Proposing a feature, please discuss)
Status: In Progress
Milestone: No milestone