Changes from 4 commits
10 changes: 10 additions & 0 deletions docs/CONFIGURATION.md
@@ -1665,6 +1665,16 @@ command = "ollama run llama3.2:1b 'Clean up:'"
timeout_ms = 45000 # 45 second timeout for LLM
```

### Context from Previous Dictation

When post-processing is enabled, voxtype passes the text of the previous dictation to your command via the `VOXTYPE_CONTEXT` environment variable, provided that dictation finished within the last 60 seconds. This helps LLM-based cleanup scripts maintain continuity across rapid-fire dictations.

- Stdin always contains only the current text (existing scripts work unchanged)
- Scripts that want context can optionally read `$VOXTYPE_CONTEXT`
- In meeting mode, context is tracked per audio source (mic/loopback) to prevent speaker bleed

See the example scripts in `examples/` for how to use `VOXTYPE_CONTEXT`.
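For instance, a minimal hook honoring this contract might look like the following (the function name and stderr logging are illustrative only, not part of voxtype):

```bash
# Hypothetical minimal post-process hook: stdin carries only the current
# text; VOXTYPE_CONTEXT, when set, carries the previous dictation.
voxtype_hook() {
    current=$(cat)                      # current dictation from stdin
    if [ -n "${VOXTYPE_CONTEXT:-}" ]; then
        # Context is available: a real script would fold it into an LLM prompt.
        printf 'context: %s\n' "$VOXTYPE_CONTEXT" >&2
    fi
    printf '%s' "$current"              # pass the text through unchanged
}
```

Because context arrives out-of-band, the hook degrades gracefully: with `VOXTYPE_CONTEXT` unset it behaves exactly like a context-unaware script.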

### Error Handling

If the post-processing command fails for any reason (command not found, non-zero
20 changes: 20 additions & 0 deletions docs/USER_MANUAL.md
@@ -1676,6 +1676,26 @@ Make it executable: `chmod +x ~/.config/voxtype/lm-studio-cleanup.sh`
| Local LLMs (Ollama, llama.cpp) | 30000-60000ms (30-60 seconds) |
| Remote APIs | 30000ms or higher |

### Context from Previous Dictation

When post-processing is enabled, voxtype automatically passes the previous dictation's text to your script via the `VOXTYPE_CONTEXT` environment variable, provided that dictation finished within the last 60 seconds. This helps LLMs maintain continuity when you dictate in quick succession.

Your script receives:
- **Stdin**: Current text only (existing scripts work unchanged)
- **`$VOXTYPE_CONTEXT`**: Previous dictation text (optional, read it if you want context)

Example usage in a cleanup script:
```bash
PROMPT="Clean up this dictation:"
if [[ -n "${VOXTYPE_CONTEXT:-}" ]]; then
printf -v PROMPT '%s\n\nPrevious dictation for context (do NOT include in output):\n%s\n\nCurrent text to clean up:' "$PROMPT" "$VOXTYPE_CONTEXT"
fi
```

In meeting mode, context is tracked separately for microphone and loopback audio to prevent speaker bleed.

See the example scripts in `examples/` for full implementations.

### Error Handling

Post-processing is designed to be fault-tolerant:
11 changes: 9 additions & 2 deletions examples/gemini-cleanup.sh
@@ -29,13 +29,20 @@ if [[ -z "$INPUT" ]]; then
exit 0
fi

# Build prompt with optional context from previous dictation
PROMPT="Clean up this dictated text. Remove filler words (um, uh, like), fix grammar and punctuation. Output ONLY the cleaned text - no quotes, no emojis, no explanations:"

if [[ -n "${VOXTYPE_CONTEXT:-}" ]]; then
printf -v PROMPT '%s\n\nPrevious dictation for context (do NOT include this in your output):\n%s\n\nCurrent text to clean up:' "$PROMPT" "$VOXTYPE_CONTEXT"
fi

# Build JSON payload with jq to handle special characters
JSON=$(jq -n --arg text "$INPUT" '{
JSON=$(jq -n --arg text "$INPUT" --arg prompt "$PROMPT" '{
contents: [
{
parts: [
{
text: ("Clean up this dictated text. Remove filler words (um, uh, like), fix grammar and punctuation. Output ONLY the cleaned text - no quotes, no emojis, no explanations:\n\n" + $text)
text: ($prompt + "\n\n" + $text)
}
]
}
11 changes: 9 additions & 2 deletions examples/ollama-cleanup.sh
@@ -17,10 +17,17 @@

INPUT=$(cat)

# Build prompt with optional context from previous dictation
PROMPT="Clean up this dictated text. Remove filler words (um, uh, like), fix grammar and punctuation. Output ONLY the cleaned text - no quotes, no emojis, no explanations:"

if [[ -n "${VOXTYPE_CONTEXT:-}" ]]; then
printf -v PROMPT '%s\n\nPrevious dictation for context (do NOT include this in your output):\n%s\n\nCurrent text to clean up:' "$PROMPT" "$VOXTYPE_CONTEXT"
fi

# Build JSON payload properly with jq to handle special characters
JSON=$(jq -n --arg text "$INPUT" '{
JSON=$(jq -n --arg text "$INPUT" --arg prompt "$PROMPT" '{
model: "llama3.2:1b",
prompt: ("Clean up this dictated text. Remove filler words (um, uh, like), fix grammar and punctuation. Output ONLY the cleaned text - no quotes, no emojis, no explanations:\n\n" + $text),
prompt: ($prompt + "\n\n" + $text),
stream: false
}')

40 changes: 26 additions & 14 deletions examples/openai-cleanup.sh
@@ -29,21 +29,33 @@ if [[ -z "$INPUT" ]]; then
exit 0
fi

# Build prompt with optional context from previous dictation
SYSTEM_PROMPT="You clean up dictated text. Remove filler words (um, uh, like), fix grammar and punctuation. Output ONLY the cleaned text - no quotes, no emojis, no explanations."

if [[ -n "${VOXTYPE_CONTEXT:-}" ]]; then
SYSTEM_PROMPT="${SYSTEM_PROMPT} You will receive the previous dictation for context - do NOT include it in your output, only clean up the current text."
fi

# Build JSON payload with jq to handle special characters
JSON=$(jq -n --arg text "$INPUT" '{
model: "gpt-4o-mini",
messages: [
{
role: "system",
content: "You clean up dictated text. Remove filler words (um, uh, like), fix grammar and punctuation. Output ONLY the cleaned text - no quotes, no emojis, no explanations."
},
{
role: "user",
content: $text
}
],
max_tokens: 1000
}')
if [[ -n "${VOXTYPE_CONTEXT:-}" ]]; then
JSON=$(jq -n --arg text "$INPUT" --arg system "$SYSTEM_PROMPT" --arg context "$VOXTYPE_CONTEXT" '{
model: "gpt-4o-mini",
messages: [
{ role: "system", content: $system },
{ role: "user", content: ("Previous dictation for context:\n" + $context + "\n\nCurrent text to clean up:\n" + $text) }
],
max_tokens: 1000
}')
else
JSON=$(jq -n --arg text "$INPUT" --arg system "$SYSTEM_PROMPT" '{
model: "gpt-4o-mini",
messages: [
{ role: "system", content: $system },
{ role: "user", content: $text }
],
max_tokens: 1000
}')
fi

# Call OpenAI API
RESPONSE=$(curl -s --max-time 8 \
46 changes: 34 additions & 12 deletions src/daemon.rs
@@ -20,7 +20,7 @@ use pidlock::Pidlock;
use std::path::PathBuf;
use std::process::Stdio;
use std::sync::Arc;
use std::time::Duration;
use std::time::{Duration, Instant};
use tokio::process::Command;
use tokio::signal::unix::{signal, SignalKind};

@@ -475,6 +475,8 @@ pub struct Daemon {
audio_feedback: Option<AudioFeedback>,
text_processor: TextProcessor,
post_processor: Option<PostProcessor>,
/// Last post-processed text and when it was produced, for context in subsequent dictations
last_dictation: Option<(String, Instant)>,
// Model manager for multi-model support
model_manager: Option<ModelManager>,
// Background task for loading model on-demand
@@ -588,6 +590,7 @@ impl Daemon {
audio_feedback,
text_processor,
post_processor,
last_dictation: None,
model_manager: None,
model_load_task: None,
transcription_task: None,
@@ -1231,7 +1234,7 @@ impl Daemon {

/// Handle transcription completion (called when transcription_task completes)
async fn handle_transcription_result(
&self,
&mut self,
state: &mut State,
result: std::result::Result<TranscriptionResult, tokio::task::JoinError>,
) {
@@ -1277,6 +1280,14 @@
}
}

// Get context from last dictation if within 60 seconds
let recent_context = self.last_dictation.as_ref().and_then(|(text, when)| {
if when.elapsed() < Duration::from_secs(60) {
Some(text.as_str())
} else {
None
}
});
Comment on lines +1284 to +1290

Copilot AI (Apr 1, 2026):

`recent_context` is a `&str` borrowed from `self.last_dictation` and is passed into async post-processing calls (`.await`). The function then updates `self.last_dictation` afterwards, which is likely to cause a borrow-checker error (borrow of `self` held across await, then mutated). To avoid this, materialize the context into a local owned value (e.g., clone the string into an `Option<String>` and use `.as_deref()` when calling `process_with_context`).

Suggested change:
```diff
-        let recent_context = self.last_dictation.as_ref().and_then(|(text, when)| {
-            if when.elapsed() < Duration::from_secs(60) {
-                Some(text.as_str())
-            } else {
-                None
-            }
-        });
+        let recent_context_owned =
+            self.last_dictation.as_ref().and_then(|(text, when)| {
+                if when.elapsed() < Duration::from_secs(60) {
+                    Some(text.clone())
+                } else {
+                    None
+                }
+            });
+        let recent_context = recent_context_owned.as_deref();
```

Collaborator (Author):

Fixed in 4637653. Cloned the context string into an owned `Option<String>` and use `.as_deref()` at all three call sites. Same reasoning as the meeting/mod.rs fix: the original compiled fine under NLL, but the owned value is more robust.
// Apply post-processing command (profile overrides default)
let final_text = if let Some(profile) = active_profile {
if let Some(ref cmd) = profile.post_process_command {
@@ -1287,32 +1298,43 @@
};
let profile_processor = PostProcessor::new(&profile_config);
tracing::info!(
"Post-processing with profile: {:?}",
profile_override.as_ref().unwrap()
"Post-processing with profile: {:?}, context: {:?}",
profile_override.as_ref().unwrap(),
recent_context
);
Comment on lines 1299 to 1304

Copilot AI (Apr 1, 2026):

Logging `recent_context` at info level can leak the previous dictation text into logs (potentially sensitive/PII) in addition to the current transcription. Consider logging only whether context was present (or its length), and/or moving the context value to debug tracing.

Collaborator (Author):

Good catch. Fixed in 4637653. The info level now only logs whether context is present (`has_context: true/false`) and whether the text changed. Actual text content (input, context, result) is now at debug level only.
let result = profile_processor.process(&processed_text).await;
tracing::info!("Post-processed: {:?}", result);
let result = profile_processor
.process_with_context(&processed_text, recent_context)
.await;
tracing::info!("Post-processed: {:?}, changed: {}", result, result != processed_text);
result
} else {
// Profile exists but has no post_process_command, use default
if let Some(ref post_processor) = self.post_processor {
tracing::info!("Post-processing: {:?}", processed_text);
let result = post_processor.process(&processed_text).await;
tracing::info!("Post-processed: {:?}", result);
tracing::info!("Post-processing: {:?}, context: {:?}", processed_text, recent_context);
let result = post_processor
.process_with_context(&processed_text, recent_context)
.await;
tracing::info!("Post-processed: {:?}, changed: {}", result, result != processed_text);
result
} else {
processed_text
}
}
} else if let Some(ref post_processor) = self.post_processor {
tracing::info!("Post-processing: {:?}", processed_text);
let result = post_processor.process(&processed_text).await;
tracing::info!("Post-processed: {:?}", result);
tracing::info!("Post-processing: {:?}, context: {:?}", processed_text, recent_context);
let result = post_processor
.process_with_context(&processed_text, recent_context)
.await;
tracing::info!("Post-processed: {:?}, changed: {}", result, result != processed_text);
result
} else {
processed_text
};

// Track last dictation for context in subsequent post-processing
self.last_dictation =
Some((final_text.clone(), Instant::now()));

if smart_submit {
tracing::debug!(
"Smart auto-submit: final text after post-processing: {:?}",
2 changes: 1 addition & 1 deletion src/meeting/data.rs
@@ -44,7 +44,7 @@
}

/// Audio source for speaker attribution
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
#[derive(Default)]
pub enum AudioSource {
37 changes: 36 additions & 1 deletion src/meeting/mod.rs
@@ -40,7 +40,9 @@ pub use state::{ChunkState, MeetingState};
pub use storage::{MeetingStorage, StorageConfig, StorageError};

use crate::error::{MeetingError, Result};
use crate::output::post_process::PostProcessor;
use crate::transcribe::{self, Transcriber};
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::mpsc;

@@ -100,6 +102,10 @@ pub struct MeetingDaemon {
transcriber: Option<Arc<dyn Transcriber>>,
engine_name: String,
event_tx: mpsc::Sender<MeetingEvent>,
post_processor: Option<PostProcessor>,
/// Previous chunk's post-processed text, tracked per audio source
/// so mic and loopback contexts don't bleed into each other
last_chunk_text: HashMap<AudioSource, String>,
}

impl MeetingDaemon {
@@ -116,6 +122,15 @@
Arc::from(transcribe::create_transcriber(app_config)?);
let engine_name = format!("{:?}", app_config.engine).to_lowercase();

let post_processor = app_config.output.post_process.as_ref().map(|cfg| {
tracing::info!(
"Meeting post-processing enabled: command={:?}, timeout={}ms",
cfg.command,
cfg.timeout_ms
);
PostProcessor::new(cfg)
});

Ok(Self {
config,
state: MeetingState::Idle,
@@ -124,6 +139,8 @@
transcriber: Some(transcriber),
engine_name,
event_tx,
post_processor,
last_chunk_text: HashMap::new(),
})
}

@@ -190,6 +207,7 @@ impl MeetingDaemon {
}

self.state = std::mem::take(&mut self.state).stop();
self.last_chunk_text.clear();

// Finalize meeting
if let Some(ref mut meeting) = self.current_meeting {
@@ -280,10 +298,27 @@
let mut buffer = processor.new_buffer(chunk_id, source, start_offset_ms);
buffer.add_samples(&samples);

let result = processor
let mut result = processor
.process_chunk(buffer)
.map_err(crate::error::VoxtypeError::Transcribe)?;

// Post-process segment text if configured
if let Some(ref post_processor) = self.post_processor {
let context = self.last_chunk_text.get(&source).map(|s| s.as_str());
for segment in &mut result.segments {
if !segment.text.is_empty() {
segment.text = post_processor
.process_with_context(&segment.text, context)
.await;
Copilot AI (Apr 1, 2026):

`context` borrows from `self.last_chunk_text` and is then held across `.await` calls to `process_with_context`. Because the method later mutates `self.last_chunk_text` (via `insert`), this is very likely to fail to compile with a borrow-across-await error. Consider copying the context into an owned `String` (or cloning the map value) before awaiting, and/or updating a local context variable as you process segments, then writing back to the map after the awaits complete.

Collaborator (Author):

Fixed in 4637653. Cloned the context into an owned `String` before the async loop, using `.as_deref()` at the call site. The original code did compile (NLL drops the borrow before the insert), but the clone is cheap and avoids fragility if the loop body changes later.

}
}
// Update context for next chunk (per source)
if let Some(last_seg) = result.segments.last() {
self.last_chunk_text
.insert(source, last_seg.text.clone());
}
}

// Add segments to transcript
if let Some(ref mut meeting) = self.current_meeting {
for segment in &result.segments {
39 changes: 30 additions & 9 deletions src/output/post_process.rs
@@ -36,12 +36,14 @@ impl PostProcessor {
}
}

/// Process text through the external command
/// Process text with optional context from a previous chunk
///
/// When context is provided, it is passed via the VOXTYPE_CONTEXT environment
/// variable so the post-processing command can use it for continuity.
/// Stdin always contains only the current text, keeping existing scripts compatible.
/// Returns the processed text on success, or the original text on any failure.
/// This ensures voice-to-text always produces output even when post-processing fails.
pub async fn process(&self, text: &str) -> String {
match self.execute_command(text).await {
pub async fn process_with_context(&self, text: &str, context: Option<&str>) -> String {
match self.execute_command_with_env(text, context).await {
Ok(processed) => {
if processed.is_empty() {
tracing::warn!(
Expand All @@ -64,13 +66,32 @@ impl PostProcessor {
}
}

async fn execute_command(&self, text: &str) -> Result<String, PostProcessError> {
// Spawn command via shell for proper parsing of complex commands
let mut child = Command::new("sh")
.args(["-c", &self.command])
/// Process text through the external command
///
/// Returns the processed text on success, or the original text on any failure.
/// This ensures voice-to-text always produces output even when post-processing fails.
pub async fn process(&self, text: &str) -> String {
self.process_with_context(text, None).await
}

async fn execute_command_with_env(
&self,
text: &str,
context: Option<&str>,
) -> Result<String, PostProcessError> {
let mut cmd = Command::new("sh");
cmd.args(["-c", &self.command])
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.stderr(Stdio::piped());

// Always clear to prevent inheriting stale context from parent environment
cmd.env_remove("VOXTYPE_CONTEXT");
if let Some(ctx) = context {
cmd.env("VOXTYPE_CONTEXT", ctx);
}
Comment on lines +88 to +92

Copilot AI (Apr 1, 2026):

New VOXTYPE_CONTEXT behavior isn't covered by tests. Since this file already has async tests, consider adding coverage to assert (1) when context is None, the child does not see VOXTYPE_CONTEXT even if it exists in the parent env, and (2) when Some, the child receives the exact context value while stdin still only contains the current text.

Collaborator (Author):

Added three tests in 4637653:

- `test_context_passed_via_env_var`: verifies VOXTYPE_CONTEXT is set and stdin contains only the current text
- `test_no_context_env_var_when_none`: verifies the env var is absent when context is None
- `test_context_env_not_inherited_from_parent`: verifies stale parent env is cleared when context is None
let mut child = cmd
.spawn()
.map_err(|e| PostProcessError::SpawnFailed(e.to_string()))?;

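The env handling added in `execute_command_with_env` (always remove `VOXTYPE_CONTEXT`, then optionally set it) mirrors standard process-environment semantics, which can be sanity-checked from a shell. This sketch only illustrates the behavior; it is not voxtype code:

```bash
export VOXTYPE_CONTEXT="stale value from the parent"

# Equivalent of cmd.env_remove("VOXTYPE_CONTEXT"): the child must not
# inherit a stale value from the parent environment.
env -u VOXTYPE_CONTEXT sh -c 'echo "cleared: ${VOXTYPE_CONTEXT:-<unset>}"'

# Equivalent of cmd.env("VOXTYPE_CONTEXT", ctx): the child sees exactly
# the value set for it.
VOXTYPE_CONTEXT="previous dictation" sh -c 'echo "set: $VOXTYPE_CONTEXT"'
```

The first command prints `cleared: <unset>`; the second prints `set: previous dictation`.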