VoicePrompt

Watch the walkthrough

The link above and the thumbnail below both go to the same demo.

Click the thumbnail to open the video.

VoicePrompt

VoicePrompt listens to your microphone, transcribes with Whisper, sends the line to LM Studio for a single detailed FLUX-style prompt (see voiceprompt/lm_system_default.txt), then queues your ComfyUI graph and saves a PNG under outputs/history/. Everything stays on your machine — no cloud API required for the default path.

Console flow: The cat on the table `[You said]` (transcript) then `[Enhanced]` (super detailed FLUX text sent to ComfyUI).

What you need (one-time)

Python 3.10+ — python.org (enable “Add Python to PATH” when offered).
LM Studio — local LLM + OpenAI-compatible server (default http://127.0.0.1:1234/v1).
ComfyUI — with a workflow exported as API JSON (default path: workflows/voice-prompt_v01.json).
A microphone.

Desktop screenshot

VoicePrompt desktop: LM Studio, ComfyUI with FLUX.2 Klein, gallery, and VoicePrompt console

Stack in the screenshot

Role	What’s running
LLM (speech → rich image prompt)	Meta Llama 3 8B Instruct — `llama-3-8b-instruct` in LM Studio
Image model (pixels)	FLUX.2 Klein in ComfyUI

You can swap in other local LLMs or Comfy workflows; VoicePrompt only needs an OpenAI-compatible LM server and a workflow JSON with a text node to inject the prompt.

First-time setup

A. Install Python dependencies

From the project folder:

Double‑click start_voiceprompt.bat,
or run:

py -3 app.py

Use py -3 (not guessed python) on Windows so you pick the right interpreter. The first run may download Whisper weights.

B. LM Studio — model + server

Open LM Studio.
Load your instruct model (e.g. Llama 3 8B or Qwen).
Start the local server (port 1234 is typical).

C. ComfyUI

Launch ComfyUI (desktop, Pinokio, etc.).
Confirm the API port (8188 by default).

D. Tell VoicePrompt which LM to call (optional)

If you only load one model in LM Studio, VoicePrompt can use the first id from http://127.0.0.1:1234/v1/models. If multiple models appear, set:

VOICEPROMPT_LM_MODEL=llama-3-8b-instruct

(use your model’s exact id from /v1/models).

E. Workflow path (optional)

Defaults expect workflows\voice-prompt_v01.json and prompt injection on node 76, field value. Override if needed:

Variable	Purpose
`VOICEPROMPT_COMFY_WORKFLOW`	Path to your API workflow JSON
`VOICEPROMPT_COMFY_PROMPT_NODE`	Node id (e.g. `76`)
`VOICEPROMPT_COMFY_PROMPT_FIELD`	Field name (e.g. `value`)

Every session (quick checklist)

Step	Action
1	Start LM Studio (model loaded, server on).
2	Start ComfyUI.
3	Run `start_voiceprompt.bat` or `py -3 app.py`.

At the voice-prompt> prompt:

help — command list
warmup once — preloads Whisper (recommended)
start — listen; speak, then pause ~1 s so end-of phrase is detected
stop when finished (or leave start on for another line)

Mic selection: mics, then mic <n>, then start.

Where images are saved

voice-prompt\outputs\history\

Files look like vp_<timestamp>.png. Successful runs also log:

[VoicePrompt] Saved gallery image → ...\outputs\history\vp_....png (... bytes)

Troubleshooting

Problem	Try this
`No module named …`	`py -3 app.py` or `start_voiceprompt.bat`
LM Studio fails / empty reply	Server on? Model loaded? `VOICEPROMPT_LM_MODEL` matches `/v1/models`? `VOICEPROMPT_LM_BASE_URL` if cloud `OPENAI_BASE_URL` hijacks VoicePrompt
Comfy fails	Comfy listening on 8188? Workflow path and injection node correct? Run the workflow once manually in Comfy
Retry same prompt	`regen` — last `[Enhanced]` text → Comfy again
See last error	`status`

Customize prompt rewriting

VoicePrompt sends a fixed system message loaded from voiceprompt/lm_system_default.txt plus your transcription as user. Edit that file to match your tone, length, or model family (still aimed at FLUX-style prompts).

Optional environment variables

Variable	Meaning
`OPENAI_BASE_URL`	LM Studio URL (default `http://127.0.0.1:1234/v1`)
`VOICEPROMPT_LM_BASE_URL`	Overrides `OPENAI_BASE_URL` for VoicePrompt only
`VOICEPROMPT_LM_MODEL`	Model id (optional; else first from `/v1/models`)
`VOICEPROMPT_COMFY_HOST` / `VOICEPROMPT_COMFY_PORT`	ComfyUI host/port
`VOICEPROMPT_COMFY_WORKFLOW`	Workflow JSON path
`VOICEPROMPT_COMFY_OUTPUT_DIR`	Comfy output folder (fallback if HTTP fetch fails)
`VOICEPROMPT_OUTPUT_DIR`	Where *`vp_.png`** copies go

One-line cheat sheet

LM Studio → ComfyUI → start_voiceprompt.bat → warmup → start → speak → check outputs\history.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
voiceprompt		voiceprompt
workflows		workflows
.gitignore		.gitignore
README.md		README.md
app.py		app.py
github_card_v01.jpg		github_card_v01.jpg
readme-hero.png		readme-hero.png
requirements.txt		requirements.txt
run.py		run.py
start_voiceprompt.bat		start_voiceprompt.bat

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VoicePrompt

Console flow: The cat on the table `[You said]` (transcript) then `[Enhanced]` (super detailed FLUX text sent to ComfyUI).

What you need (one-time)

Desktop screenshot

First-time setup

A. Install Python dependencies

B. LM Studio — model + server

C. ComfyUI

D. Tell VoicePrompt which LM to call (optional)

E. Workflow path (optional)

Every session (quick checklist)

Where images are saved

Troubleshooting

Customize prompt rewriting

Optional environment variables

One-line cheat sheet

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

VoicePrompt

Console flow: The cat on the table [You said] (transcript) then [Enhanced] (super detailed FLUX text sent to ComfyUI).

What you need (one-time)

Desktop screenshot

First-time setup

A. Install Python dependencies

B. LM Studio — model + server

C. ComfyUI

D. Tell VoicePrompt which LM to call (optional)

E. Workflow path (optional)

Every session (quick checklist)

Where images are saved

Troubleshooting

Customize prompt rewriting

Optional environment variables

One-line cheat sheet

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Console flow: The cat on the table `[You said]` (transcript) then `[Enhanced]` (super detailed FLUX text sent to ComfyUI).

Packages