A transcription pipeline for Bitcoin conference talks, podcasts, and technical content. Ingests YouTube videos, transcribes audio via multiple STT providers, corrects transcripts with LLM, extracts metadata, generates summaries, and stores everything in PostgreSQL.
YouTube Video URL
|
[Preprocess] --> Download video, extract audio (FFmpeg)
|
[Transcribe] --> STT (Whisper / Deepgram / SmallestAI)
|
[Metadata Extraction] --> Gemini LLM (speakers, conference, topics)
|
[Correction] --> Gemini LLM (fix ASR errors, technical terms)
|
[Summarization] --> Gemini LLM (structured summary)
|
[Postprocess] --> Export to Markdown, save to PostgreSQL
youtube_channels (DB)
|
[ChannelScanner] --> YouTube Data API v3, discover new videos
|
[ContentClassifier] --> Gemini LLM, filter technical content
|
[IngestionService] --> Queue approved videos for transcription
| Provider | Type | Best For |
|---|---|---|
| Whisper | Local (OpenAI) | Offline, privacy-sensitive |
| Deepgram | Cloud API | Fast, accurate, diarization |
| SmallestAI | Cloud API | Multi-speaker, emotion detection |
All LLM services use google-genai SDK with gemini-3-flash-preview and include retry logic with exponential backoff for 503/429 errors.
| Service | Purpose | Chunk Size |
|---|---|---|
| MetadataExtractor | Extract speakers, conference, topics from video metadata | Single call |
| CorrectionService | Fix ASR errors, technical terminology | 5000 chars/chunk |
| SummarizerService | Generate structured summaries | 30000 chars/chunk |
| ContentClassifier | Classify videos as technical/non-technical | Single call |
- Backend: FastAPI (Python)
- Database: AWS RDS PostgreSQL
- Deployment: AWS EC2 (t3.small)
- Frontend: React + TypeScript + Vite (separate repo)
- Frontend Backend: Express.js (reads from RDS)
- Frontend Hosting: GitHub Pages
- HTTPS: Cloudflare Tunnel
- Linting: Ruff
# Clone the repo
git clone https://github.com/staru09/transcription_engine.git
cd transcription_engine
# Create venv and install deps
uv venv
uv pip install -r requirements.txt
# Or with pip
python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
pip install -r requirements.txtcp env.example .envRequired environment variables:
| Variable | Purpose |
|---|---|
DATABASE_URL |
PostgreSQL connection string (local Docker or AWS RDS) |
GOOGLE_API_KEY |
Gemini API for correction, summarization, classification, metadata extraction |
YOUTUBE_API_KEY |
YouTube Data API v3 for channel scanning |
DEEPGRAM_API_KEY |
Deepgram STT (if using Deepgram) |
SMALLEST_API_KEY |
SmallestAI STT (if using SmallestAI) |
Optional:
| Variable | Purpose |
|---|---|
TRANSCRIPTION_SERVER_URL |
Override transcription server URL (default: http://localhost:8000) |
You can run PostgreSQL either locally via Docker or against AWS RDS. The app picks whichever DATABASE_URL points at.
Option A — Local PostgreSQL via Docker (no AWS needed):
# Start just the postgres container
docker compose up -d postgres
# Point the app at it (already the default in env.example)
# DATABASE_URL=postgresql://bitcoin:bitcoin@localhost:5432/transcription_engine
# Create the schema
tstbtc db initData persists in the postgres_data Docker volume. To wipe it: docker compose down -v.
Option B — AWS RDS (or any remote Postgres):
Set DATABASE_URL in .env to your RDS connection string, then run tstbtc db init once to create tables.
Check connectivity anytime with tstbtc db check.
Pipeline settings are in config.ini:
[DEFAULT]
deepgram = True
diarize = True
summarize = False
llm_provider = google
llm_correction_model = gemini-3-flash-preview
llm_summary_model = gemini-3-flash-preview
smallestai = False
classification_model = gemini-3-flash-preview
classification_min_duration = 600
classification_max_duration = 3000The project provides both a CLI (tstbtc / transcriber.py) and an HTTP API. The CLI auto-starts the server in the background, so for most workflows you only need the CLI.
Transcribe a YouTube video with Deepgram:
tstbtc transcribe "https://www.youtube.com/watch?v=VIDEO_ID" \
--deepgram \
--diarize \
--markdown \
--summarize \
--correct \
--llm-provider google \
--loc "tabconf" \
--username "your_name"With SmallestAI instead:
tstbtc transcribe "https://www.youtube.com/watch?v=VIDEO_ID" \
--smallestai \
--diarize \
--markdown \
--summarize \
--correct \
--llm-provider google \
--loc "tabconf" \
--username "your_name"With Whisper (local, no cloud STT key required) — omit both --deepgram and --smallestai:
tstbtc transcribe "https://www.youtube.com/watch?v=VIDEO_ID" \
--markdown \
--username "your_name"Other useful commands:
# Server management
tstbtc server start # start FastAPI server in background
tstbtc server status # check if running
tstbtc server stop # stop (required before switching STT provider)
tstbtc server logs --follow # tail logs live
# Transcribe a local audio file
tstbtc transcribe "/path/to/file.mp3" --deepgram --username "your_name"
# See all flags
tstbtc transcribe --helpNote on switching services: the server caches the transcription service (Deepgram/Whisper/SmallestAI) after the first request. To switch between them, run tstbtc server stop before the next transcription.
Output locations:
| What | Where |
|---|---|
| Markdown transcript | local_models/<loc>/<slug>.md |
| Raw STT output + metadata | metadata/<loc>/<slug>/ |
| Database row | transcripts table (local Postgres or AWS RDS) |
If you want to call the server directly (e.g. from another service):
# Start the server manually
python -m uvicorn server:app --host 0.0.0.0 --port 8000
# Queue a video
curl -X POST http://localhost:8000/transcription/add_to_queue/ \
-F "source=https://www.youtube.com/watch?v=VIDEO_ID" \
-F "loc=tabconf" \
-F "username=your_username" \
-F "smallestai=true" \
-F "diarize=true" \
-F "markdown=true" \
-F "correct=true" \
-F "summarize=true" \
-F "llm_provider=google"
# Start processing
curl -X POST http://localhost:8000/transcription/start/
# Check queue status
curl http://localhost:8000/transcription/queue/Scan a YouTube channel for new videos, classify them, and queue for transcription:
# Seed channels
python -m scripts.seed_channels
# Scan, classify, and queue via API
curl -X POST http://localhost:8000/ingestion/scan
curl -X POST http://localhost:8000/ingestion/classify
curl -X POST http://localhost:8000/ingestion/queue| Method | Endpoint | Description |
|---|---|---|
| POST | /transcription/add_to_queue/ |
Add a video to the transcription queue |
| POST | /transcription/start/ |
Start processing the queue |
| GET | /transcription/queue/ |
View current queue status |
| GET | /transcription/corrected/ |
Get corrected transcripts |
| GET | /transcription/summaries/ |
Get summaries |
| Method | Endpoint | Description |
|---|---|---|
| GET | /transcription/db/transcripts/ |
All transcripts from DB |
| GET | /transcription/db/transcripts/{id} |
Single transcript by ID |
| GET | /transcription/db/corrected/ |
Corrected transcripts from DB |
| GET | /transcription/db/summaries/ |
Summaries from DB |
| Method | Endpoint | Description |
|---|---|---|
| POST | /ingestion/scan |
Scan channels for new videos |
| POST | /ingestion/classify |
Classify pending videos |
| POST | /ingestion/queue |
Queue approved videos |
app/
config.py # Settings, env vars, config.ini
transcript.py # Transcript data model
transcription.py # Pipeline orchestrator
media_processor.py # Audio/video download and conversion
services/
correction.py # LLM transcript correction (Gemini)
summarizer.py # LLM summarization (Gemini)
metadata_extractor.py # LLM metadata extraction (Gemini)
content_classifier.py # LLM content classification (Gemini)
channel_scanner.py # YouTube channel scanning
ingestion_service.py # Automated ingestion pipeline
database_service.py # SQLAlchemy ORM for PostgreSQL
smallestai.py # SmallestAI STT provider
deepgram.py # Deepgram STT provider
routes/
transcription.py # Transcription API routes
ingestion.py # Ingestion API routes
scripts/
scan_tabconf.py # Standalone channel scanner
generate_audio.py # TTS audio generation
seed_channels.py # Seed YouTube channels in DB
server.py # FastAPI app entry point
config.ini # Pipeline configuration
Contributors can join our discord server here
This project is a fork of tstbtc, built by the Bitcoin Transcripts team. Their work on creating an open-source transcription pipeline for Bitcoin technical content made this project possible. We've extended the original engine with LLM-powered operations but the core transcription architecture and the vision of making Bitcoin knowledge accessible to everyone comes from their efforts. Thank you to the Bitcoin Transcripts contributors for building and maintaining this foundation.
MIT License. See LICENSE.