kupakwash/Scripture_Live


✝ Scripture Live

Real-Time Semantic Scripture Inference from Spoken Sermons

Master's Research Project — Artificial Intelligence and Machine Learning


Abstract

Scripture Live is a real-time AI inference system that processes continuous spoken sermon audio and autonomously retrieves semantically relevant Bible passages for display on a congregation projection screen. The system addresses a non-trivial challenge in applied NLP: distinguishing intentional scripture citation from incidental theological language, then performing sub-second vector retrieval over a corpus of 31,100 verses without degrading inference quality. The architecture combines automatic speech recognition (Whisper ASR), a two-stage intent classification pipeline (rule-based pre-filter followed by LLaMA 3.1 8B zero-shot classification), a deterministic famous-verse priority index for sub-100ms retrieval of high-frequency passages, ChromaDB-backed dense vector search using sentence-transformer embeddings, and LLaMA 3.3 70B reranking with theological explanation generation. The system achieves 100% accuracy on an 8-case ground-truth benchmark with a mean end-to-end latency of 297ms (~522ms on the full semantic retrieve-and-rerank path), operating entirely on consumer-grade hardware (NVIDIA RTX 3050, 4GB VRAM).




Demo

Media Dashboard (operator view) — controls ASR, shows live transcript, displays matched scripture with confidence score and theological explanation.

┌─────────────────────────────────────────────────────────┐
│  Scripture Live — Media Dashboard           [● LIVE]    │
│─────────────────────────────────────────────────────────│
│  Transcript:  "...for God so loved the world that he   │
│               gave his only begotten Son..."            │
│─────────────────────────────────────────────────────────│
│  ✝ John 3:16 (KJV)                    Score: 0.97      │  
│  "For God so loved the world, that he gave his only     │
│   begotten Son, that whosoever believeth in him         │
│   should not perish, but have everlasting life."        │
│                                                         │
│  Why this verse: The preacher is directly quoting the   │
│  central verse of Christian soteriology, emphasising    │
│  divine love as the motivation for the Incarnation.     │
└─────────────────────────────────────────────────────────┘

Projection Screen (congregation view) — full-screen, high-contrast verse display for the auditorium.

Screenshot placeholder — add docs/screenshots/ images here


System Architecture

┌────────────────────────────────────────────────────────────────┐
│                        INPUT LAYER                             │
│                                                                │
│   🎙  Live Microphone (16kHz, mono)                           │
│         │                                                      │
│         ▼                                                      │
│   Whisper ASR  (Groq Large v3 Turbo / local faster-whisper)   │
│         │   ~800ms · 3s chunks · silence-gated                │
│         ▼                                                      │
│   Sentence Buffer  ←── accumulates until punctuation boundary  │
└────────────────────────────┬───────────────────────────────────┘
                             │
┌────────────────────────────▼───────────────────────────────────┐
│                     INTENT LAYER                               │
│                                                                │
│   Rule-Based Pre-filter  ──── hallelujah / good morning → SKIP │
│         │                                                      │
│         ▼                                                      │
│   LLaMA 3.1 8B (zero-shot)                                     │
│         │                                                      │
│   EXPLICIT            SEMANTIC            NONE                 │
│  "John 3:16"      "God loves us"      "good morning"          │
│      │                  │                  │                   │
│      │                  │                 SKIP                 │
└──────┼──────────────────┼───────────────────────────────────── ┘
       │                  │
┌──────▼──────────────────▼───────────────────────────────────── ┐
│                    RETRIEVAL LAYER                             │
│                                                                │
│   ┌──────────────────┐    ┌──────────────────────────────────┐ │
│   │  Famous Verse    │    │  ChromaDB Semantic Search        │ │
│   │  Priority Index  │    │                                  │ │
│   │  200+ passages   │    │  31,100 KJV verses               │ │
│   │  Word-overlap    │    │  MiniLM-L6-v2 (384-dim)          │ │
│   │  ~80ms           │    │  Top-10 candidates               │ │
│   └────────┬─────────┘    └──────────────┬───────────────────┘ │
│            │                             │                     │
│            └─────────────┬───────────────┘                     │
│                          ▼                                     │
│               LLaMA 3.3 70B Reranker                          │
│               + Theological Explanation                        │
│               ~300ms · Top-3 results                          │
└──────────────────────────┬─────────────────────────────────────┘
                           │
┌──────────────────────────▼─────────────────────────────────────┐
│                      OUTPUT LAYER                              │
│                                                                │
│   FastAPI Backend  (:8000)  ──  REST /api/infer endpoint       │
│         │                                                      │
│   ┌─────┴──────────────────────────────────┐                   │
│   │                                        │                   │
│   ▼                                        ▼                   │
│  Media Dashboard (:8501)        Projection Screen (:8502)      │
│  Operator controls, transcript,  Full-screen congregation      │
│  confidence scores, session log  verse display (TV/projector)  │
└────────────────────────────────────────────────────────────────┘
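The intent layer above can be sketched as a two-stage function: a cheap rule-based pre-filter, then classification into EXPLICIT / SEMANTIC / NONE. This is an illustrative sketch only (the real implementation lives in `pipeline/intent.py` and prompts LLaMA 3.1 8B for stage two; here a trivial regex heuristic stands in so the example is runnable):

```python
import re

# Stage 1: ambient sermon language the rule-based pre-filter skips outright.
NOISE_PATTERNS = [
    r"\bgood (morning|afternoon|evening)\b",
    r"\bhallelujah\b",
    r"\bpraise the lord\b",
]

def prefilter(text: str) -> bool:
    """Return True if the utterance is ambient noise (SKIP, no LLM call)."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in NOISE_PATTERNS)

def classify_intent(text: str) -> str:
    """Stage 2: label the utterance EXPLICIT / SEMANTIC / NONE.

    The real system asks LLaMA 3.1 8B zero-shot; this stand-in only
    spots explicit "Book chapter:verse" citations by pattern.
    """
    if prefilter(text):
        return "NONE"
    if re.search(r"\b[1-3]?\s?[A-Z][a-z]+\s\d+:\d+\b", text):
        return "EXPLICIT"   # e.g. "John 3:16"
    return "SEMANTIC"
```

Only utterances labelled EXPLICIT or SEMANTIC proceed to the retrieval layer, which is how the system keeps spurious retrieval calls off the projection screen.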

Key Features

| Feature | Description |
|---|---|
| Real-Time ASR | Groq Whisper Large v3 Turbo via cloud API, or local faster-whisper on CUDA — silence-gated chunking with sentence-boundary buffering |
| Intent-Aware Retrieval | Two-stage classifier prevents false positives: rule-based noise filter followed by LLaMA 3.1 8B zero-shot classification into EXPLICIT / SEMANTIC / NONE |
| Famous Verse Priority Index | Hand-curated index of 200+ high-frequency passages with paraphrase variants; regex word-overlap matching returns results in ~80ms with no LLM call |
| ChromaDB Vector Store | 31,100 KJV verses indexed with all-MiniLM-L6-v2 (384-dim); persistent local store with sub-100ms ANN search |
| LLaMA 3.3 70B Reranking | Groq-hosted reranker selects the best 3 candidates and generates a brief theological explanation of relevance for each |
| Query Expansion | Semantic query augmented with 3 LLM-generated paraphrases and an anchor-term map; candidates merged and deduplicated before reranking |
| Sermon Context Accumulation | Rolling context window maintains full sermon understanding; retrieval is context-aware, not chunk-isolated |
| Confidence Gating | Results below 0.60 cosine similarity are suppressed; 0.60–0.70 are flagged as uncertain — prevents low-confidence hallucinations from reaching the screen |
| Two-Screen UI | Operator-facing media dashboard (controls, transcript, scores) plus full-screen congregation projection screen on a second display |
| Session Audit Trail | Every inference saved to an HTML report and JSON log for post-service review and academic evaluation |
| Multi-Model Benchmark Framework | One command evaluates all four embedding models with Precision@3, Recall@5, and MRR on a shared ground-truth test set |
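The confidence-gating rule from the table can be made concrete. A minimal sketch, using the 0.60 and 0.70 thresholds stated above (function and constant names are illustrative, not the project's actual API):

```python
from typing import Optional

SUPPRESS_BELOW = 0.60   # below this cosine similarity, nothing is shown
UNCERTAIN_BELOW = 0.70  # the 0.60–0.70 band is shown but flagged

def gate(reference: str, score: float) -> Optional[dict]:
    """Apply the confidence gate to one (verse reference, similarity) candidate.

    Returns None when the match is suppressed, otherwise a display record
    with an `uncertain` flag for the 0.60–0.70 band.
    """
    if score < SUPPRESS_BELOW:
        return None  # suppressed: never reaches the projection screen
    return {
        "reference": reference,
        "score": score,
        "uncertain": score < UNCERTAIN_BELOW,
    }
```

In the dashboard, a suppressed result simply leaves the previous verse on screen, while an uncertain result can be rendered with a visual warning for the operator.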

Research Contributions

  • Intent Classification Layer — Introduces a dedicated pre-retrieval classifier that distinguishes scripture-seeking speech from ambient sermon language, reducing spurious retrieval calls and improving system precision. To the author's knowledge, this is not addressed in prior retrieval-augmented generation literature applied to religious texts.

  • Hybrid Retrieval Architecture — Combines a deterministic famous-verse index (zero neural overhead) with dense semantic RAG in a single pipeline. The priority index provides sub-100ms coverage for the highest-frequency passages while the vector search handles open-domain semantic queries.

  • Multi-Model Embedding Benchmark for Biblical Text — Systematic comparison of four sentence-transformer architectures (MiniLM, MPNet, MultiQA, BGE) specifically on theological retrieval tasks, providing, to the author's knowledge, the first reported evaluation of these models on biblical corpus retrieval with Precision@3, Recall@5, and MRR metrics.

  • Real-Time Sermon Context Accumulation — A rolling sermon context window that accumulates thematic understanding over the full service duration, enabling retrieval that is sensitive to the overarching message rather than isolated 3-second audio chunks.

  • Practical Deployment for Resource-Constrained Environments — Demonstrates that production-quality theological RAG is achievable on consumer GPU hardware (RTX 3050, 4GB VRAM) with sub-second latency suitable for live congregation use.
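The deterministic famous-verse index described above can be illustrated with a minimal word-overlap matcher. This is a sketch only — the real index in `pipeline/famous_verses.py` holds 200+ passages with paraphrase variants, and the two entries and 0.5 threshold here are illustrative:

```python
# Tiny stand-in for the famous-verse priority index (reference -> normalized text).
FAMOUS = {
    "John 3:16": "for god so loved the world that he gave his only begotten son",
    "Philippians 4:13": "i can do all things through christ which strengtheneth me",
}

def famous_lookup(utterance: str, threshold: float = 0.5):
    """Return (reference, overlap) for the best word-overlap match, or None.

    Overlap = |query words ∩ verse words| / |query words|. Pure string
    work with no neural inference, which is why this path answers in
    roughly 80ms.
    """
    words = set(utterance.lower().split())
    best, best_score = None, 0.0
    for ref, text in FAMOUS.items():
        overlap = len(words & set(text.split())) / max(len(words), 1)
        if overlap > best_score:
            best, best_score = ref, overlap
    return (best, best_score) if best_score >= threshold else None
```

When the lookup misses (score below threshold), the query falls through to the dense ChromaDB search, giving the hybrid pipeline its fast path without sacrificing open-domain coverage.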


Model Benchmarks

Evaluated on a 20-query ground-truth test set covering explicit verse citation, indirect allusion, and open theological concept queries.

| Model | Dimensions | Precision@3 | Recall@5 | MRR | Avg Latency | Index Size |
|---|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | 0.83 | 0.91 | 0.87 | 38ms | 47 MB |
| paraphrase-mpnet-base-v2 | 768 | 0.79 | 0.88 | 0.83 | 61ms | 89 MB |
| multi-qa-mpnet-base-dot-v1 | 768 | 0.81 | 0.89 | 0.85 | 58ms | 89 MB |
| BAAI/bge-large-en-v1.5 | 1024 | 0.80 | 0.87 | 0.84 | 94ms | 118 MB |

Selected model: all-MiniLM-L6-v2 — highest Precision@3, Recall@5, and MRR at the lowest latency. BGE trails MiniLM on every quality metric while requiring roughly 2.5× the latency and index footprint.
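For reference, the three metrics in the table can be computed as follows. This is a generic sketch of the standard definitions, not the project's `scripts/benchmark_models.py`:

```python
def precision_at_k(ranked, relevant, k=3):
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(1 for r in ranked[:k] if r in relevant) / k

def recall_at_k(ranked, relevant, k=5):
    """Fraction of all relevant items that appear in the top-k."""
    return sum(1 for r in ranked[:k] if r in relevant) / len(relevant)

def mrr(queries):
    """Mean reciprocal rank of the first relevant item across queries.

    `queries` is a list of (ranked_results, relevant_set) pairs.
    """
    total = 0.0
    for ranked, relevant in queries:
        for i, r in enumerate(ranked, start=1):
            if r in relevant:
                total += 1.0 / i
                break
    return total / len(queries)
```

Each metric is averaged over the 20-query ground-truth set to produce the table above.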


Setup & Installation

Prerequisites

  • Windows 10 / 11 (64-bit)
  • Python 3.10
  • NVIDIA GPU with CUDA 12.x (RTX 3050 or better recommended)
  • Groq API key — used for Whisper ASR, intent classification, and LLM reranking; free tier sufficient for development
  • Grok (xAI) API key — if configured in .env (see below)

1 — Clone the repository

git clone https://github.com/YOUR_USERNAME/scripture-live.git
cd scripture-live

2 — Run the installer

install.bat

This creates a virtual environment, installs PyTorch with CUDA support, and installs all Python dependencies.

3 — Configure environment variables

cp .env.example .env

Edit .env and set your API keys:

GROK_API_KEY=your_grok_key_here        # https://console.x.ai
GROQ_API_KEY=your_groq_key_here        # https://console.groq.com

ASR_PROVIDER=groq                       # groq | local
ACTIVE_EMBEDDING_MODEL=minilm           # minilm | mpnet | multiqa | bge

4 — Download Bible data

venv\Scripts\activate
python scripts/download_bibles.py

Downloads KJV, ASV, WEB, and BBE in JSON format (~25 MB total). All texts are public domain.

5 — Build the vector index

python scripts/build_index.py

Ingests 31,100 verses into ChromaDB with MiniLM embeddings. Takes approximately 5–10 minutes on first run.

# Optional: build indexes for all 4 models (for benchmarking)
python scripts/build_index.py --all
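Conceptually, build_index.py stores one 384-dim MiniLM embedding per verse, and retrieval then reduces to nearest-neighbour search by cosine similarity. A dependency-free sketch of that core idea (ChromaDB handles the actual ANN indexing and persistence):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, index, k=10):
    """index: list of (verse_ref, embedding) pairs. Return top-k by similarity.

    ChromaDB replaces this linear scan with an approximate nearest-neighbour
    structure, which is how 31,100 verses are searched in ~35ms.
    """
    scored = [(ref, cosine(query_vec, vec)) for ref, vec in index]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]
```

The top-10 candidates from this search are what the LLaMA 3.3 70B stage reranks down to three.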

6 — Launch the system

run_service.bat

This starts three processes:

  • Backend (FastAPI) on http://localhost:8000
  • Media Dashboard (Streamlit) on http://localhost:8501
  • Projection Screen (Streamlit) on http://localhost:8502

Browsers open automatically. Move the projection window to your second display or projector.
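With the backend running, the REST endpoint can also be exercised directly. A minimal stdlib-only sketch — note the request schema is an assumption (a JSON body with a single `text` field); check `app/backend.py` for the actual contract:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/infer"  # FastAPI backend from run_service.bat

def infer(text: str) -> dict:
    """POST an utterance to the backend and return the parsed JSON response.

    Assumes a {"text": ...} request body, which may differ from the real
    endpoint's schema.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

This is handy for scripted smoke tests of the inference path without the microphone or either Streamlit UI.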


Usage

Media Dashboard (localhost:8501)

The operator screen used by the AV technician or service leader.

  1. Confirm Backend: ONLINE status indicator is green
  2. Click Start Listening — ASR begins capturing from the default microphone
  3. Transcribed speech appears in the transcript panel in real time
  4. When a scripture-seeking utterance is detected, matched verses appear with:
    • Verse text and reference
    • Cosine similarity confidence score
    • LLM-generated theological explanation
  5. Click Stop Listening to pause ASR
  6. Use the Manual Input chat bar to test any text without microphone
  7. Click Generate Session Report to export an HTML audit log

Projection Screen (localhost:8502)

Full-screen display for the congregation. Place on projector or secondary monitor.

  • Updates automatically whenever the media dashboard surfaces a new scripture
  • Displays verse text in large, high-contrast typography
  • No user interaction required

Testing the pipeline without the UI

from pipeline.intent import classify_intent
from pipeline.retriever import Retriever
from pipeline.context import SermonContext

r = Retriever()        # loads the ChromaDB index and embedding model
ctx = SermonContext()  # rolling sermon context window (empty at start)

# Classify the utterance (EXPLICIT / SEMANTIC / NONE), then retrieve
# with the accumulated sermon context.
intent = classify_intent("God so loved the world he gave his only begotten son")
result = r.retrieve("God so loved the world", intent, ctx.get_context())
print(result)
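A minimal version of the rolling context accumulator used above might look like this. It is an illustrative sketch with an assumed window size — the real class lives in `pipeline/context.py`:

```python
from collections import deque

class SermonContext:
    """Rolling window over recent transcript sentences.

    Keeping the last `maxlen` sentences lets retrieval resolve references
    like "as Paul said earlier about love" against the wider sermon rather
    than an isolated 3-second audio chunk.
    """

    def __init__(self, maxlen: int = 20):
        self.sentences = deque(maxlen=maxlen)  # old sentences drop off automatically

    def add(self, sentence: str) -> None:
        self.sentences.append(sentence.strip())

    def get_context(self) -> str:
        return " ".join(self.sentences)
```

Because `deque(maxlen=...)` evicts the oldest entry on overflow, the context stays bounded no matter how long the service runs.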

Project Structure

scripture-live/
│
├── .env.example                  ← environment variable template
├── .env                          ← your secrets (never committed)
├── install.bat                   ← one-click Windows installer
├── run_service.bat               ← starts all three services
├── requirements.txt
├── CLAUDE.md                     ← AI assistant instructions
│
├── pipeline/                     ← core inference pipeline
│   ├── __init__.py
│   ├── asr.py                    ← ASR providers (Groq Whisper / local)
│   ├── intent.py                 ← intent classifier (LLaMA 3.1 8B)
│   ├── embedder.py               ← sentence-transformer embedding
│   ├── ingest.py                 ← ChromaDB ingestion
│   ├── retriever.py              ← RAG retrieval + reranking
│   ├── famous_verses.py          ← priority index (200+ passages)
│   └── context.py                ← sermon context accumulator
│
├── app/                          ← Streamlit + FastAPI applications
│   ├── __init__.py
│   ├── backend.py                ← FastAPI REST backend (:8000)
│   ├── media_dashboard.py        ← operator UI (:8501)
│   └── projection.py             ← congregation screen (:8502)
│
├── scripts/                      ← data and evaluation utilities
│   ├── __init__.py
│   ├── download_bibles.py        ← downloads all Bible versions
│   ├── build_index.py            ← builds ChromaDB vector store
│   ├── benchmark_models.py       ← runs model comparison evaluation
│   └── mic_test.py               ← microphone + ASR sanity check
│
├── data/
│   ├── raw/                      ← Bible JSON files (gitignored)
│   └── processed/                ← ChromaDB store (gitignored)
│
└── evaluation/
    └── results/                  ← benchmark CSVs and plots (gitignored)

Evaluation Results

End-to-End Benchmark (8-case ground-truth set)

| Test Case | Input | Expected | Result | Latency |
|---|---|---|---|---|
| Famous verse — exact | "For God so loved the world" | John 3:16 | ✅ John 3:16 | 82ms |
| Famous verse — paraphrase | "I can do all things through Christ" | Phil 4:13 | ✅ Phil 4:13 | 79ms |
| Semantic — allusion | "Even though I walk through the valley of death" | Psalm 23:4 | ✅ Psalm 23:4 | 534ms |
| Semantic — concept | "Faith is the substance of things hoped for" | Heb 11:1 | ✅ Heb 11:1 | 489ms |
| Semantic — indirect | "His grace is sufficient for us" | 2 Cor 12:9 | ✅ 2 Cor 12:9 | 611ms |
| Noise suppression | "Good morning everyone, welcome" | None | ✅ No match | 12ms |
| Noise suppression | "Hallelujah, praise the Lord" | None | ✅ No match | 8ms |
| Context-aware | "as Paul said earlier about love" | 1 Cor 13 | ✅ 1 Cor 13 | 558ms |

Accuracy: 8/8 (100%) · Mean latency: 297ms · Max latency: 611ms

Latency Breakdown

| Stage | Typical Latency |
|---|---|
| ASR (Groq Whisper Large v3 Turbo) | ~800ms per audio chunk |
| Intent classification (LLaMA 3.1 8B) | ~200ms |
| Famous verse index lookup | ~80ms |
| ChromaDB vector search (31,100 verses) | ~35ms |
| LLaMA 3.3 70B reranking | ~300ms |
| Total (famous-verse path) | ~80ms |
| Total (semantic + rerank path) | ~522ms |

Bible Data

All Bible texts are public domain and freely distributable.

| Version | Full Name | Year | Verses |
|---|---|---|---|
| KJV | King James Version | 1769 | 31,100 |
| ASV | American Standard Version | 1901 | 31,100 |
| WEB | World English Bible | 2000 | 31,100 |
| BBE | Bible in Basic English | 1949 | 31,100 |

The primary retrieval index is built on KJV. Multi-version display is supported in the projection screen.


Tech Stack

| Component | Technology | Purpose |
|---|---|---|
| Speech Recognition | Groq Whisper Large v3 Turbo | Cloud ASR with accent robustness |
| Speech Recognition (fallback) | faster-whisper medium (CUDA) | Offline ASR on GPU |
| Intent Classification | LLaMA 3.1 8B Instant (Groq) | Zero-shot sermon intent labelling |
| Embeddings | all-MiniLM-L6-v2 (sentence-transformers) | Dense verse representations |
| Vector Database | ChromaDB (persistent) | ANN search over 31,100 verses |
| Reranking & Explanation | LLaMA 3.3 70B Versatile (Groq) | Theological relevance scoring |
| Famous Verse Index | Custom Python (regex word-overlap) | Sub-100ms high-frequency retrieval |
| Backend API | FastAPI + Uvicorn | REST inference endpoint |
| Operator UI | Streamlit | Media dashboard |
| Projection UI | Streamlit | Full-screen congregation display |
| GPU Runtime | CUDA 12.x / RTX 3050 4GB | Embedding inference acceleration |
| Language | Python 3.10 | Core runtime |

Author

Kupakwashe T. Mapuranga
Master's Programme in Artificial Intelligence and Machine Learning


License & Academic Use

This project is made available for academic and research purposes.

  • Bible texts (KJV, ASV, WEB, BBE) are in the public domain
  • Source code is available for academic review and non-commercial research use
  • If you use this work in academic writing, please cite accordingly

This system was developed as part of a Master's research programme in AI/ML. It demonstrates a practical application of retrieval-augmented generation, intent classification, and real-time NLP to a domain-specific problem in religious technology.

About

An AI system that listens to a speaker/preacher in real time and infers scripture references from the spoken words.
