Master's Research Project — Artificial Intelligence and Machine Learning
Scripture Live is a real-time AI inference system that processes continuous spoken sermon audio and autonomously retrieves semantically relevant Bible passages for display on a congregation projection screen. The system addresses a non-trivial challenge in applied NLP: distinguishing intentional scripture citation from incidental theological language, then performing sub-second vector retrieval over a corpus of 31,100 verses without degrading inference quality. The architecture combines automatic speech recognition (Whisper ASR), a two-stage intent classification pipeline (rule-based pre-filter followed by LLaMA 3.1 8B zero-shot classification), a deterministic famous-verse priority index for sub-100ms retrieval of high-frequency passages, ChromaDB-backed dense vector search using sentence-transformer embeddings, and LLaMA 3.3 70B reranking with theological explanation generation. The system achieves 100% accuracy on an 8-case ground-truth benchmark with a mean end-to-end latency of 297ms (~522ms on the semantic retrieval path), operating entirely on consumer-grade hardware (NVIDIA RTX 3050, 4GB VRAM).
Media Dashboard (operator view) — controls ASR, shows live transcript, displays matched scripture with confidence score and theological explanation.
┌─────────────────────────────────────────────────────────┐
│ Scripture Live — Media Dashboard [● LIVE] │
│─────────────────────────────────────────────────────────│
│ Transcript: "...for God so loved the world that he │
│ gave his only begotten Son..." │
│─────────────────────────────────────────────────────────│
│ ✝ John 3:16 (KJV) Score: 0.97 │
│ "For God so loved the world, that he gave his only │
│ begotten Son, that whosoever believeth in him │
│ should not perish, but have everlasting life." │
│ │
│ Why this verse: The preacher is directly quoting the │
│ central verse of Christian soteriology, emphasising │
│ divine love as the motivation for the Incarnation. │
└─────────────────────────────────────────────────────────┘
Projection Screen (congregation view) — full-screen, high-contrast verse display for the auditorium.
Screenshot placeholder — add docs/screenshots/ images here
┌────────────────────────────────────────────────────────────────┐
│ INPUT LAYER │
│ │
│ 🎙 Live Microphone (16kHz, mono) │
│ │ │
│ ▼ │
│ Whisper ASR (Groq Large v3 Turbo / local faster-whisper) │
│ │ ~800ms · 3s chunks · silence-gated │
│ ▼ │
│ Sentence Buffer ←── accumulates until punctuation boundary │
└────────────────────────────┬───────────────────────────────────┘
│
┌────────────────────────────▼───────────────────────────────────┐
│ INTENT LAYER │
│ │
│ Rule-Based Pre-filter ──── hallelujah / good morning → SKIP │
│ │ │
│ ▼ │
│ LLaMA 3.1 8B (zero-shot) │
│ │ │
│ EXPLICIT SEMANTIC NONE │
│ "John 3:16" "God loves us" "good morning" │
│ │ │ │ │
│ │ │ SKIP │
│      │                  │                                      │
└──────┼──────────────────┼──────────────────────────────────────┘
│ │
┌──────▼──────────────────▼──────────────────────────────────────┐
│ RETRIEVAL LAYER │
│ │
│ ┌──────────────────┐ ┌──────────────────────────────────┐ │
│ │ Famous Verse │ │ ChromaDB Semantic Search │ │
│ │ Priority Index │ │ │ │
│ │ 200+ passages │ │ 31,100 KJV verses │ │
│ │ Word-overlap │ │ MiniLM-L6-v2 (384-dim) │ │
│ │ ~80ms │ │ Top-10 candidates │ │
│ └────────┬─────────┘ └──────────────┬───────────────────┘ │
│ │ │ │
│ └─────────────┬───────────────┘ │
│ ▼ │
│ LLaMA 3.3 70B Reranker │
│ + Theological Explanation │
│ ~300ms · Top-3 results │
└──────────────────────────┬─────────────────────────────────────┘
│
┌──────────────────────────▼─────────────────────────────────────┐
│ OUTPUT LAYER │
│ │
│ FastAPI Backend (:8000) ── REST /api/infer endpoint │
│ │ │
│ ┌─────┴──────────────────────────────────┐ │
│ │ │ │
│ ▼ ▼ │
│ Media Dashboard (:8501) Projection Screen (:8502) │
│ Operator controls, transcript, Full-screen congregation │
│ confidence scores, session log verse display (TV/projector) │
└────────────────────────────────────────────────────────────────┘
| Feature | Description |
|---|---|
| Real-Time ASR | Groq Whisper Large v3 Turbo via cloud API, or local faster-whisper on CUDA — silence-gated chunking with sentence-boundary buffering |
| Intent-Aware Retrieval | Two-stage classifier prevents false positives: rule-based noise filter followed by LLaMA 3.1 8B zero-shot classification into EXPLICIT / SEMANTIC / NONE |
| Famous Verse Priority Index | Hand-curated index of 200+ high-frequency passages with paraphrase variants; regex word-overlap matching returns results in ~80ms with no LLM call |
| ChromaDB Vector Store | 31,100 KJV verses indexed with all-MiniLM-L6-v2 (384-dim); persistent local store with sub-100ms ANN search |
| LLaMA 3.3 70B Reranking | Groq-hosted reranker selects the best 3 candidates and generates a brief theological explanation of relevance for each |
| Query Expansion | Semantic query augmented with 3 LLM-generated paraphrases and an anchor-term map; candidates merged and deduplicated before reranking |
| Sermon Context Accumulation | Rolling context window maintains full sermon understanding; retrieval is context-aware, not chunk-isolated |
| Confidence Gating | Results below 0.60 cosine similarity are suppressed; 0.60–0.70 are flagged as uncertain — prevents low-confidence hallucinations from reaching the screen |
| Two-Screen UI | Operator-facing media dashboard (controls, transcript, scores) + full-screen congregation projection screen on a second display |
| Session Audit Trail | Every inference saved to HTML report and JSON log for post-service review and academic evaluation |
| Multi-Model Benchmark Framework | One command evaluates all 4 embedding models with Precision@3, Recall@5, and MRR on a shared ground-truth test set |
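The confidence-gating rule in the table is simple enough to sketch directly. The 0.60 / 0.70 thresholds come from the feature description above; the function name and result shape are illustrative, not the project's actual code:

```python
def gate(results, suppress_below=0.60, uncertain_below=0.70):
    """Filter reranked candidates by cosine-similarity confidence.

    Scores below 0.60 are suppressed entirely; scores in 0.60-0.70
    are kept but flagged so the dashboard can mark them uncertain.
    `results` is a list of (reference, score) pairs.
    """
    gated = []
    for ref, score in results:
        if score < suppress_below:
            continue  # never reaches the projection screen
        gated.append({
            "ref": ref,
            "score": score,
            "uncertain": score < uncertain_below,
        })
    return gated


# John 3:16 passes cleanly, Psalm 23:4 is flagged, Gen 1:1 is suppressed
for match in gate([("John 3:16", 0.97), ("Psalm 23:4", 0.65), ("Gen 1:1", 0.41)]):
    print(match)
```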
- Intent Classification Layer — Introduces a dedicated pre-retrieval classifier that distinguishes scripture-seeking speech from ambient sermon language, reducing spurious retrieval calls and improving system precision. To our knowledge, this is not addressed in prior retrieval-augmented generation work applied to religious texts.
- Hybrid Retrieval Architecture — Combines a deterministic famous-verse index (zero neural overhead) with dense semantic RAG in a single pipeline. The priority index provides sub-100ms coverage for the highest-frequency passages while the vector search handles open-domain semantic queries.
- Multi-Model Embedding Benchmark for Biblical Text — Systematic comparison of four sentence-transformer architectures (MiniLM, MPNet, MultiQA, BGE) on theological retrieval tasks, providing what we believe is the first reported evaluation of these models on biblical corpus retrieval using Precision@3, Recall@5, and MRR.
- Real-Time Sermon Context Accumulation — A rolling sermon context window that accumulates thematic understanding over the full service duration, enabling retrieval that is sensitive to the overarching message rather than to isolated 3-second audio chunks.
- Practical Deployment for Resource-Constrained Environments — Demonstrates that production-quality theological RAG is achievable on consumer GPU hardware (RTX 3050, 4GB VRAM) with sub-second latency suitable for live congregation use.
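To illustrate the deterministic half of the hybrid architecture, here is a minimal word-overlap matcher in the spirit of the famous-verse priority index. The FAMOUS table, the 0.6 threshold, and the function name are invented for this sketch; the real index covers 200+ passages with paraphrase variants:

```python
import re

# Hypothetical mini version of the priority index: each reference
# maps to a set of trigger phrases, including common paraphrases.
FAMOUS = {
    "John 3:16": ["for god so loved the world",
                  "god so loved the world he gave his only son"],
    "Philippians 4:13": ["i can do all things through christ"],
}

def famous_lookup(utterance, min_overlap=0.6):
    """Return the best famous-verse match by word-overlap ratio,
    or None if nothing clears the threshold. No LLM call involved."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    best_ref, best_score = None, 0.0
    for ref, phrases in FAMOUS.items():
        for phrase in phrases:
            phrase_words = set(phrase.split())
            overlap = len(words & phrase_words) / len(phrase_words)
            if overlap > best_score:
                best_ref, best_score = ref, overlap
    return best_ref if best_score >= min_overlap else None


print(famous_lookup("For God so loved the world that he gave..."))  # John 3:16
print(famous_lookup("Good morning everyone, welcome"))              # None
```

Because the match is pure set arithmetic over a small table, this path avoids every neural component, which is what makes the ~80ms famous-verse latency possible.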
Evaluated on a 20-query ground-truth test set covering explicit verse citation, indirect allusion, and open theological concept queries.
| Model | Dimensions | Precision@3 | Recall@5 | MRR | Avg Latency | Index Size |
|---|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | 0.83 | 0.91 | 0.87 | 38ms | 47 MB |
| paraphrase-mpnet-base-v2 | 768 | 0.79 | 0.88 | 0.83 | 61ms | 89 MB |
| multi-qa-mpnet-base-dot-v1 | 768 | 0.81 | 0.89 | 0.85 | 58ms | 89 MB |
| BAAI/bge-large-en-v1.5 | 1024 | 0.80 | 0.87 | 0.84 | 94ms | 118 MB |

Selected model: all-MiniLM-L6-v2 — highest Precision@3, Recall@5, and MRR at the lowest latency. BGE incurs roughly 2.5× the latency and index footprint without improving quality on this test set.
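For reference, the ranking metrics in the table can be computed as follows. This is a generic sketch of Precision@k and MRR, not the project's benchmark script:

```python
def precision_at_k(ranked, relevant, k=3):
    """Fraction of the top-k ranked references that are relevant."""
    return sum(1 for r in ranked[:k] if r in relevant) / k

def mrr(queries):
    """Mean Reciprocal Rank over (ranked_results, relevant_set) pairs:
    for each query, take 1/rank of the first relevant hit (0 if none)."""
    total = 0.0
    for ranked, relevant in queries:
        for rank, ref in enumerate(ranked, start=1):
            if ref in relevant:
                total += 1 / rank
                break
    return total / len(queries)


queries = [
    (["John 3:16", "Rom 5:8", "1 John 4:9"], {"John 3:16"}),   # hit at rank 1
    (["Ps 23:1", "Ps 23:4", "Ps 91:1"],      {"Ps 23:4"}),     # hit at rank 2
]
print(mrr(queries))  # (1/1 + 1/2) / 2 = 0.75
```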
- Windows 10 / 11 (64-bit)
- Python 3.10
- NVIDIA GPU with CUDA 12.x (RTX 3050 or better recommended)
- Groq API key — powers ASR, intent classification, and LLaMA 3.3 70B reranking; free tier sufficient for development
- Grok API key (console.x.ai) — set in .env alongside the Groq key
```
git clone https://github.com/YOUR_USERNAME/scripture-live.git
cd scripture-live
install.bat
```

This creates a virtual environment, installs PyTorch with CUDA support, and installs all Python dependencies.

```
cp .env.example .env
```

Edit .env and set your API keys:

```
GROK_API_KEY=your_grok_key_here    # https://console.x.ai
GROQ_API_KEY=your_groq_key_here    # https://console.groq.com
ASR_PROVIDER=groq                  # groq | local
ACTIVE_EMBEDDING_MODEL=minilm      # minilm | mpnet | multiqa | bge
```

```
venv\Scripts\activate
python scripts/download_bibles.py
```

Downloads KJV, ASV, WEB, and BBE in JSON format (~25 MB total). All texts are public domain.

```
python scripts/build_index.py
```

Ingests 31,100 verses into ChromaDB with MiniLM embeddings. Takes approximately 5–10 minutes on first run.

```
# Optional: build indexes for all 4 models (for benchmarking)
python scripts/build_index.py --all
```

```
run_service.bat
```

This starts three processes:

- Backend (FastAPI) on http://localhost:8000
- Media Dashboard (Streamlit) on http://localhost:8501
- Projection Screen (Streamlit) on http://localhost:8502
Browsers open automatically. Move the projection window to your second display or projector.
The operator screen used by the AV technician or service leader.
- Confirm Backend: ONLINE status indicator is green
- Click Start Listening — ASR begins capturing from the default microphone
- Transcribed speech appears in the transcript panel in real time
- When a scripture-seeking utterance is detected, matched verses appear with:
- Verse text and reference
- Cosine similarity confidence score
- LLM-generated theological explanation
- Click Stop Listening to pause ASR
- Use the Manual Input chat bar to test any text without microphone
- Click Generate Session Report to export an HTML audit log
Full-screen display for the congregation. Place on projector or secondary monitor.
- Updates automatically whenever the media dashboard surfaces a new scripture
- Displays verse text in large, high-contrast typography
- No user interaction required
```python
from pipeline.intent import classify_intent
from pipeline.retriever import Retriever
from pipeline.context import SermonContext

r = Retriever()
ctx = SermonContext()

intent = classify_intent("God so loved the world he gave his only begotten son")
result = r.retrieve("God so loved the world", intent, ctx.get_context())
print(result)
```

scripture-live/
│
├── .env.example ← environment variable template
├── .env ← your secrets (never committed)
├── install.bat ← one-click Windows installer
├── run_service.bat ← starts all three services
├── requirements.txt
├── CLAUDE.md ← AI assistant instructions
│
├── pipeline/ ← core inference pipeline
│ ├── __init__.py
│ ├── asr.py ← ASR providers (Groq Whisper / local)
│ ├── intent.py ← intent classifier (LLaMA 3.1 8B)
│ ├── embedder.py ← sentence-transformer embedding
│ ├── ingest.py ← ChromaDB ingestion
│ ├── retriever.py ← RAG retrieval + reranking
│ ├── famous_verses.py ← priority index (200+ passages)
│ └── context.py ← sermon context accumulator
│
├── app/ ← Streamlit + FastAPI applications
│ ├── __init__.py
│ ├── backend.py ← FastAPI REST backend (:8000)
│ ├── media_dashboard.py ← operator UI (:8501)
│ └── projection.py ← congregation screen (:8502)
│
├── scripts/ ← data and evaluation utilities
│ ├── __init__.py
│ ├── download_bibles.py ← downloads all Bible versions
│ ├── build_index.py ← builds ChromaDB vector store
│ ├── benchmark_models.py ← runs model comparison evaluation
│ └── mic_test.py ← microphone + ASR sanity check
│
├── data/
│ ├── raw/ ← Bible JSON files (gitignored)
│ └── processed/ ← ChromaDB store (gitignored)
│
└── evaluation/
└── results/ ← benchmark CSVs and plots (gitignored)
| Test Case | Input | Expected | Result | Latency |
|---|---|---|---|---|
| Famous verse — exact | "For God so loved the world" | John 3:16 | ✅ John 3:16 | 82ms |
| Famous verse — paraphrase | "I can do all things through Christ" | Phil 4:13 | ✅ Phil 4:13 | 79ms |
| Semantic — allusion | "Even though I walk through the valley of death" | Psalm 23:4 | ✅ Psalm 23:4 | 534ms |
| Semantic — concept | "Faith is the substance of things hoped for" | Heb 11:1 | ✅ Heb 11:1 | 489ms |
| Semantic — indirect | "His grace is sufficient for us" | 2 Cor 12:9 | ✅ 2 Cor 12:9 | 611ms |
| Noise suppression | "Good morning everyone, welcome" | None | ✅ No match | 12ms |
| Noise suppression | "Hallelujah, praise the Lord" | None | ✅ No match | 8ms |
| Context-aware | "as Paul said earlier about love" | 1 Cor 13 | ✅ 1 Cor 13 | 558ms |
Accuracy: 8/8 (100%) · Mean latency: 297ms · Max latency: 611ms
| Stage | Typical Latency |
|---|---|
| ASR (Groq Whisper Large v3 Turbo) | ~800ms per audio chunk |
| Intent classification (LLaMA 3.1 8B) | ~200ms |
| Famous verse index lookup | ~80ms |
| ChromaDB vector search (31,100 verses) | ~35ms |
| LLaMA 3.3 70B reranking | ~300ms |
| Total (famous verse path) | ~80ms |
| Total (semantic + rerank path) | ~522ms |
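As a rough sanity check, the per-stage figures compose as expected, assuming the semantic path chains intent classification, vector search, and reranking, while the famous-verse path skips every LLM call:

```python
# Typical per-stage latencies (ms) from the table above
stages = {"intent": 200, "famous_lookup": 80, "vector_search": 35, "rerank": 300}

famous_path = stages["famous_lookup"]          # deterministic, no LLM calls
semantic_path = stages["intent"] + stages["vector_search"] + stages["rerank"]

print(famous_path)    # 80
print(semantic_path)  # 535 — within ~3% of the observed 522ms semantic-path mean
```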
All Bible texts are public domain and freely distributable.
| Version | Full Name | Year | Verses |
|---|---|---|---|
| KJV | King James Version | 1769 | 31,100 |
| ASV | American Standard Version | 1901 | 31,100 |
| WEB | World English Bible | 2000 | 31,100 |
| BBE | Bible in Basic English | 1949 | 31,100 |
The primary retrieval index is built on KJV. Multi-version display is supported in the projection screen.
| Component | Technology | Purpose |
|---|---|---|
| Speech Recognition | Groq Whisper Large v3 Turbo | Cloud ASR with accent robustness |
| Speech Recognition (fallback) | faster-whisper medium (CUDA) | Offline ASR on GPU |
| Intent Classification | LLaMA 3.1 8B Instant (Groq) | Zero-shot sermon intent labelling |
| Embeddings | all-MiniLM-L6-v2 (sentence-transformers) | Dense verse representations |
| Vector Database | ChromaDB (persistent) | ANN search over 31,100 verses |
| Reranking & Explanation | LLaMA 3.3 70B Versatile (Groq) | Theological relevance scoring |
| Famous Verse Index | Custom Python (regex word-overlap) | Sub-100ms high-frequency retrieval |
| Backend API | FastAPI + Uvicorn | REST inference endpoint |
| Operator UI | Streamlit | Media dashboard |
| Projection UI | Streamlit | Full-screen congregation display |
| GPU Runtime | CUDA 12.x / RTX 3050 4GB | Embedding inference acceleration |
| Language | Python 3.10 | Core runtime |
Kupakwashe T. Mapuranga
Master's Programme in Artificial Intelligence and Machine Learning
Deep Learning
This project is made available for academic and research purposes.
- Bible texts (KJV, ASV, WEB, BBE) are in the public domain
- Source code is available for academic review and non-commercial research use
- If you use this work in academic writing, please cite accordingly
This system was developed as part of a Master's research programme in AI/ML. It demonstrates a practical application of retrieval-augmented generation, intent classification, and real-time NLP to a domain-specific problem in religious technology.