Watch the walkthrough
The link above and the thumbnail below both open the same demo video.
VoicePrompt listens to your microphone, transcribes what you say with Whisper, and sends the transcript to LM Studio, which rewrites it into a single detailed FLUX-style prompt (see voiceprompt/lm_system_default.txt). It then queues your ComfyUI graph with that prompt and saves the resulting PNG under outputs/history/. Everything stays on your machine: the default path needs no cloud API.
Console flow, using the spoken line "The cat on the table" as an example: VoicePrompt prints [You said] followed by the transcript, then [Enhanced] followed by the highly detailed FLUX prompt it sends to ComfyUI.
- Python 3.10+ from python.org (enable “Add Python to PATH” when offered).
- LM Studio: local LLM plus an OpenAI-compatible server (default http://127.0.0.1:1234/v1).
- ComfyUI, with a workflow exported as API JSON (default path: workflows/voice-prompt_v01.json).
- A microphone.
Stack in the screenshot
| Role | What’s running |
|---|---|
| LLM (speech → rich image prompt) | Meta Llama 3 8B Instruct — llama-3-8b-instruct in LM Studio |
| Image model (pixels) | FLUX.2 Klein in ComfyUI |
You can swap in other local LLMs or Comfy workflows; VoicePrompt only needs an OpenAI-compatible LM server and an API-format workflow JSON containing a text node it can inject the prompt into.
From the project folder, either double-click start_voiceprompt.bat or run py -3 app.py.

On Windows, use py -3 (the Python launcher) rather than a bare python, which may resolve to a different interpreter. The first run may download Whisper weights.
- Open LM Studio, load your instruct model (e.g. Llama 3 8B or Qwen), and start the local server (port 1234 is typical).
- Launch ComfyUI (desktop, Pinokio, etc.) and confirm its API port (8188 by default).
If only one model is loaded in LM Studio, VoicePrompt uses the first id returned by http://127.0.0.1:1234/v1/models. If several models are listed, set:
VOICEPROMPT_LM_MODEL=llama-3-8b-instruct
(use your model’s exact id from /v1/models).
By default, VoicePrompt loads workflows\voice-prompt_v01.json and injects the prompt into node 76, field value. Override these if needed:
| Variable | Purpose |
|---|---|
| VOICEPROMPT_COMFY_WORKFLOW | Path to your API workflow JSON |
| VOICEPROMPT_COMFY_PROMPT_NODE | Node id (e.g. 76) |
| VOICEPROMPT_COMFY_PROMPT_FIELD | Field name (e.g. value) |
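A minimal sketch of the injection and queueing step, assuming an API-format workflow (a JSON object keyed by node id, where each node holds an "inputs" dict) and ComfyUI's POST /prompt endpoint on the default port. inject_prompt and queue_prompt are hypothetical names; VoicePrompt's real implementation may differ.

```python
import copy
import json
import urllib.request


def inject_prompt(workflow: dict, text: str, node_id: str = "76", field: str = "value") -> dict:
    """Return a copy of an API-format workflow with text placed in node_id's inputs[field]."""
    wf = copy.deepcopy(workflow)  # keep the loaded template untouched
    node = wf.get(str(node_id))
    if node is None:
        raise KeyError(f"node {node_id} not in workflow (was it exported as API JSON?)")
    if field not in node.get("inputs", {}):
        raise KeyError(f"node {node_id} has no input {field!r}")
    node["inputs"][field] = text
    return wf


def queue_prompt(workflow: dict, host: str = "127.0.0.1", port: int = 8188) -> dict:
    """Queue the graph via ComfyUI's POST /prompt (hypothetical helper)."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")  # graph goes under "prompt"
    req = urllib.request.Request(
        f"http://{host}:{port}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)  # ComfyUI replies with JSON that includes a prompt_id
```

For example, load the file named by VOICEPROMPT_COMFY_WORKFLOW with json.load, pass VOICEPROMPT_COMFY_PROMPT_NODE and VOICEPROMPT_COMFY_PROMPT_FIELD (or the defaults "76" and "value") to inject_prompt, then hand the copy to queue_prompt. Copying the template keeps the original graph intact, so the same prompt can be re-sent later.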
| Step | Action |
|---|---|
| 1 | Start LM Studio (model loaded, server on). |
| 2 | Start ComfyUI. |
| 3 | Run start_voiceprompt.bat or py -3 app.py. |
At the voice-prompt> prompt:
- help: list commands
- warmup: run once to preload Whisper (recommended)
- start: begin listening; speak, then pause about 1 s so the end of the phrase is detected
- stop: stop listening when you are finished (or leave start running for another line)
To choose a microphone, run mics to list the available inputs, mic <n> to select one, then start.
Images are saved to voice-prompt\outputs\history\ with names like vp_<timestamp>.png. Each successful run also logs:
[VoicePrompt] Saved gallery image → ...\outputs\history\vp_....png (... bytes)
| Problem | Try this |
|---|---|
| No module named … | Run py -3 app.py or start_voiceprompt.bat |
| LM Studio fails / empty reply | Is the server on and a model loaded? Does VOICEPROMPT_LM_MODEL match an id in /v1/models? If OPENAI_BASE_URL points at a cloud API and VoicePrompt picks it up, set VOICEPROMPT_LM_BASE_URL to the LM Studio URL |
| Comfy fails | Comfy listening on 8188? Workflow path and injection node correct? Run the workflow once manually in Comfy |
| Retry same prompt | regen: resend the last [Enhanced] text to ComfyUI |
| See last error | status |
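For the "Server on?" and "Comfy listening on 8188?" checks, a quick TCP probe of the default ports can help. This is a hypothetical standalone script; an open port only shows that something is listening, not that a model is loaded or that the workflow is valid.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


if __name__ == "__main__":
    # Default ports from this README: LM Studio 1234, ComfyUI 8188.
    for name, port in (("LM Studio", 1234), ("ComfyUI", 8188)):
        state = "listening" if port_open("127.0.0.1", port) else "NOT reachable"
        print(f"{name} on 127.0.0.1:{port}: {state}")
```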
VoicePrompt sends a fixed system message loaded from voiceprompt/lm_system_default.txt, plus your transcription as the user message. Edit that file to match your preferred tone, length, or model family (it is still aimed at FLUX-style prompts).
| Variable | Meaning |
|---|---|
| OPENAI_BASE_URL | LM Studio URL (default http://127.0.0.1:1234/v1) |
| VOICEPROMPT_LM_BASE_URL | Overrides OPENAI_BASE_URL for VoicePrompt only |
| VOICEPROMPT_LM_MODEL | Model id (optional; otherwise the first id from /v1/models) |
| VOICEPROMPT_COMFY_HOST / VOICEPROMPT_COMFY_PORT | ComfyUI host/port |
| VOICEPROMPT_COMFY_WORKFLOW | Workflow JSON path |
| VOICEPROMPT_COMFY_OUTPUT_DIR | ComfyUI output folder (fallback if the HTTP fetch fails) |
| VOICEPROMPT_OUTPUT_DIR | Where vp_*.png copies go |
LM Studio → ComfyUI → start_voiceprompt.bat → warmup → start → speak → check outputs\history.
