---
title: AI Voice Transcript
sdk: streamlit
sdk_version: 1.40.0
app_file: streamlit_app.py
pinned: false
license: mit
---

AI Voice Transcript

Local voice-to-text powered by OpenAI's Whisper (via faster-whisper). Once a model has been downloaded, it runs fully offline on your machine. No API keys, no per-minute charges. Three ways to use it:

| Mode | File | Best for |
| --- | --- | --- |
| Desktop GUI | `app.py` | Day-to-day use on your own machine |
| Web app | `streamlit_app.py` | Sharing via a URL, deploying to the cloud |
| CLI | `transcribe_file.py`, `record_and_transcribe.py` | Scripting, batch jobs |

The YAML block above is read by Hugging Face Spaces: the sdk and sdk_version fields tell it this is a Streamlit app, and app_file names the file to run.

Features

Drag-and-drop audio/video file transcription (desktop)
Microphone capture with device selector and silent-recording detection (desktop)
Live-streaming transcript display as Whisper decodes
9 pre-configured languages plus auto-detect
5 model sizes — pick speed vs. accuracy
Optional timestamps
Copy / Save As / Open Folder shortcuts
Same backend across all three modes
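
To illustrate the silent-recording detection listed above, here is a minimal sketch. It is not the code in app.py. It assumes the recorder produces 16-bit PCM samples and treats a take as silent when its RMS level falls below a cutoff. The names `rms`, `is_silent`, `sine_wave` and the threshold value are hypothetical.

```python
import math
from array import array

SAMPLE_RATE = 16_000            # 16 kHz mono, the rate the README says the recorder uses
SILENCE_RMS_THRESHOLD = 100.0   # hypothetical cutoff on the int16 scale (max 32767)


def rms(samples) -> float:
    """Root-mean-square level of a sequence of int16 samples."""
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(samples, threshold: float = SILENCE_RMS_THRESHOLD) -> bool:
    """True when the recording is too quiet to be worth transcribing."""
    return rms(samples) < threshold


def sine_wave(freq_hz: float, amplitude: int, seconds: float = 1.0) -> array:
    """Generate a test tone as int16 samples (for demonstration only)."""
    n = int(SAMPLE_RATE * seconds)
    return array(
        "h",
        (int(amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)) for i in range(n)),
    )


if __name__ == "__main__":
    print(is_silent(array("h", [0] * SAMPLE_RATE)))  # one second of digital silence
    print(is_silent(sine_wave(440, 10_000)))         # one second of a loud 440 Hz tone
```

A fixed cutoff is the simplest choice. A real recorder may need a threshold tuned per microphone, since noise floors differ between devices.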

Quick start (local)

git clone https://github.com/<your-username>/ai-voice-transcript.git
cd ai-voice-transcript

py -3.12 -m venv .venv
.\.venv\Scripts\Activate.ps1

pip install -r requirements.txt           # core deps (web + transcription)
pip install -r requirements-desktop.txt   # add GUI + mic recording

On Linux/macOS, create the environment with python3.12 -m venv .venv and activate it with source .venv/bin/activate instead.

Run the desktop GUI

Windows: double-click Launch App.bat.

Any OS:

python app.py

Run the web app locally

streamlit run streamlit_app.py

Open the URL Streamlit prints (usually http://localhost:8501).

Run the CLI

python transcribe_file.py path/to/audio.m4a --model base --language en
python record_and_transcribe.py --seconds 15
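
For batch jobs, the file CLI can be driven from a small script. The sketch below assumes only the `--model` and `--language` flags shown above and that it runs from the repository root. The helper names and the extension list are hypothetical and not part of the project.

```python
import subprocess
import sys
from pathlib import Path

# Assumed set of extensions to pick up; adjust to whatever your files use.
AUDIO_EXTENSIONS = {".m4a", ".mp3", ".wav"}


def build_command(audio_path, model: str = "base", language: str = "en") -> list:
    """Build the argument list for one transcribe_file.py invocation."""
    return [
        sys.executable, "transcribe_file.py", str(audio_path),
        "--model", model,
        "--language", language,
    ]


def find_audio(folder) -> list:
    """Return audio files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def transcribe_folder(folder, model: str = "base", language: str = "en") -> None:
    """Run the CLI once per file; check=True stops on the first failure."""
    for audio in find_audio(folder):
        subprocess.run(build_command(audio, model, language), check=True)


if __name__ == "__main__":
    transcribe_folder(sys.argv[1] if len(sys.argv) > 1 else "recordings")
```

Each file starts a fresh process, so the model is reloaded every time. That keeps the script simple, but it is slow for large batches.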

Model sizes

| Model | Disk | Speed | Accuracy | Memory |
| --- | --- | --- | --- | --- |
| `tiny` | ~75 MB | fastest | OK | ~400 MB |
| `base` | ~150 MB | fast | good (default) | ~500 MB |
| `small` | ~500 MB | medium | better | ~1 GB |
| `medium` | ~1.5 GB | slow | great | ~3 GB |
| `large-v3` | ~3 GB | slowest | best | ~6 GB |

Models auto-download to ~/.cache/huggingface/ on first use.
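
One way to turn the table into a decision is to pick the largest model whose memory estimate fits the RAM you can spare. This is a sketch under stated assumptions: the figures are the approximate values from the table, rounded to MB, and both `pick_model` and the `headroom` factor are hypothetical.

```python
# Approximate memory needs from the table above, in MB, smallest to largest.
MODEL_MEMORY_MB = [
    ("tiny", 400),
    ("base", 500),
    ("small", 1000),
    ("medium", 3000),
    ("large-v3", 6000),
]


def pick_model(available_mb: float, headroom: float = 0.8):
    """Return the largest model that fits in `headroom` of the available RAM.

    Returns None when even `tiny` does not fit.
    """
    budget = available_mb * headroom
    choice = None
    for name, needed_mb in MODEL_MEMORY_MB:
        if needed_mb <= budget:
            choice = name  # keep upgrading while the next size still fits
    return choice


if __name__ == "__main__":
    for ram in (300, 1500, 4000, 8000):
        print(ram, "MB free ->", pick_model(ram))
```

The headroom factor leaves room for the audio buffers and the app itself. Lower it if other processes share the machine.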

Deploying online

See DEPLOY.md for step-by-step guides:

Hugging Face Spaces: recommended (16 GB RAM on the free tier; a free Space sleeps only after a long idle period)
Render.com: generic web host (works, but the 512 MB free tier is tight for Whisper)
Streamlit Community Cloud: native home for Streamlit apps

How it works

Whisper is a neural network trained on ~680,000 hours of multilingual audio.
faster-whisper reimplements Whisper inference on the CTranslate2 runtime; with int8 quantization it is up to ~4x faster than the reference implementation on CPU and uses less memory.
VAD (Voice Activity Detection) filters out silence before decoding, which speeds things up and helps keep Whisper from "hallucinating" text on empty audio.
16 kHz mono is Whisper's native input; the desktop recorder uses this directly so no resampling is needed.
Background threading in the GUI keeps the UI responsive during long transcriptions — workers post events to a queue, the main thread polls and renders.
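
The first three points can be sketched with faster-whisper's public API. This is an illustration, not the project's shared backend. `WhisperModel`, `compute_type="int8"`, and `vad_filter=True` are real faster-whisper options, while the function names and defaults here are assumptions. The import is deferred so that the timestamp helper can be used without the library installed.

```python
from typing import Iterator, Optional


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def transcribe(
    path: str,
    model_size: str = "base",
    language: Optional[str] = None,   # None lets Whisper auto-detect the language
    timestamps: bool = False,
) -> Iterator[str]:
    """Yield transcript lines one segment at a time."""
    from faster_whisper import WhisperModel  # deferred: requires faster-whisper

    # int8 quantization on CPU, as described in the list above.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")

    # vad_filter=True drops silent stretches before decoding.
    segments, _info = model.transcribe(path, language=language, vad_filter=True)

    # `segments` is a generator: decoding happens lazily as we iterate,
    # which is what makes a live-updating transcript possible.
    for segment in segments:
        text = segment.text.strip()
        if timestamps:
            yield f"[{format_timestamp(segment.start)}] {text}"
        else:
            yield text


if __name__ == "__main__":
    import sys
    for line in transcribe(sys.argv[1], timestamps=True):
        print(line)
```

Loading the model inside the function keeps the sketch short. An app that transcribes many files would load it once and reuse it.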

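The worker-and-queue pattern from the last point can be shown without any GUI code. This is a minimal sketch using only the standard library. The event names and helper functions are hypothetical, and the segment source is a plain iterable standing in for Whisper's output.

```python
import queue
import threading


def worker(segments, events: queue.Queue) -> None:
    """Background thread: push one event per decoded segment, then a final status."""
    try:
        for text in segments:
            events.put(("segment", text))
        events.put(("done", None))
    except Exception as exc:  # report failures to the UI thread instead of dying silently
        events.put(("error", str(exc)))


def drain(events: queue.Queue) -> list:
    """Main thread: collect whatever is waiting, without blocking."""
    pending = []
    while True:
        try:
            pending.append(events.get_nowait())
        except queue.Empty:
            return pending


def run_demo(lines) -> list:
    """Run the worker to completion and return the events it posted."""
    events = queue.Queue()
    thread = threading.Thread(target=worker, args=(iter(lines), events), daemon=True)
    thread.start()
    thread.join()  # a GUI would not join; it would keep polling instead
    return drain(events)


if __name__ == "__main__":
    for kind, payload in run_demo(["hello", "world"]):
        print(kind, payload)
```

In a Tkinter app, the main thread would typically call something like `drain` from a callback scheduled with `root.after(100, poll)` and reschedule itself until it sees a `done` or `error` event. Only the main thread touches widgets, which Tkinter requires.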
Project layout

ai-voice-transcript/
├── app.py                       # Desktop GUI (Tkinter + tkinterdnd2)
├── streamlit_app.py             # Web app (Streamlit)
├── transcribe_file.py           # CLI: transcribe a file
├── record_and_transcribe.py     # CLI: record from mic + transcribe
├── mic_test.py                  # CLI: diagnose mic issues
├── Launch App.bat               # Windows: double-click to launch GUI
├── Launch App (debug).bat       # Windows: same but with visible console
├── requirements.txt             # Cloud / web deps
├── requirements-desktop.txt     # Adds GUI + mic recording deps
├── DEPLOY.md                    # Step-by-step deployment guide
├── recordings/                  # Captured audio (gitignored)
└── transcripts/                 # Generated text output (gitignored)

License

MIT
