Add post-processing to meeting mode and dictation context #292

materemias wants to merge 5 commits into `main`
Meeting mode now applies the configured post-processing command to each transcribed chunk before storing it in the transcript. Previous chunk text is passed as context via combined stdin with separator markers, tracked per audio source (mic/loopback) to prevent speaker bleed. Regular dictation mode also gains context: if the last dictation was within 2 minutes, its text is passed as context to the post-processor.

- Add `process_with_context()` to `PostProcessor` for context-aware processing
- Refactor `process()` to delegate to `process_with_context(text, None)`
- Add `PostProcessor` and per-source context tracking to `MeetingDaemon`
- Clear context on meeting stop before fallible I/O
- Add `Hash` derive to `AudioSource` for `HashMap` key usage
- Track `last_dictation` (text, timestamp) in `Daemon` for regular-mode context

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduce the recent dictation context window from 120s to 60s and log whether context was available in post-processing `tracing::info` calls. Bump whisper-rs to 0.16.0 for compatibility with the latest Arch Linux packages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Context was being prepended to stdin with separator markers, which broke existing post-processing scripts that expect plain text. Now context is passed via the `VOXTYPE_CONTEXT` environment variable, keeping stdin as plain current text for backwards compatibility. Scripts that want context can read `$VOXTYPE_CONTEXT`; scripts that don't care about context work unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Switch from prepending context to stdin (which broke existing scripts) to passing it via the `VOXTYPE_CONTEXT` environment variable. Stdin now always contains only the current text, keeping all scripts compatible.

- Clear `VOXTYPE_CONTEXT` before each command to prevent inheriting stale context from the parent environment
- Update example scripts (gemini, ollama, openai) to optionally read `VOXTYPE_CONTEXT` for continuity across dictations
- Fix bash newline handling: use `printf -v` instead of a literal `\n`
- Add `changed` detection to post-processing tracing info logs
- Document `VOXTYPE_CONTEXT` in CONFIGURATION.md and USER_MANUAL.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
This PR adds optional context-aware post-processing support to meeting mode chunk transcription and to push-to-talk dictation, enabling external cleanup commands (e.g., LLM scripts) to maintain continuity across chunks/dictations via VOXTYPE_CONTEXT.
Changes:
- Add `PostProcessor::process_with_context` that passes prior text via `VOXTYPE_CONTEXT` while keeping stdin as current text.
- Apply post-processing to meeting-mode segments with per-audio-source context tracking and context reset on meeting stop.
- Track recent dictation output for up to 60 seconds and pass it as context into post-processing; update example scripts + docs to optionally use `VOXTYPE_CONTEXT`.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/output/post_process.rs` | Introduces context-aware post-processing via env var and refactors `process()` to delegate. |
| `src/meeting/mod.rs` | Adds meeting-mode post-processing and per-source context tracking using a `HashMap`. |
| `src/meeting/data.rs` | Derives `Hash` for `AudioSource` so it can be used as a `HashMap` key. |
| `src/daemon.rs` | Adds recent dictation context tracking (60s) and passes it into post-processing; extends tracing logs. |
| `examples/openai-cleanup.sh` | Updates prompt construction to optionally include prior dictation context. |
| `examples/ollama-cleanup.sh` | Updates prompt construction to optionally include prior dictation context. |
| `examples/gemini-cleanup.sh` | Updates prompt construction to optionally include prior dictation context. |
| `docs/USER_MANUAL.md` | Documents `VOXTYPE_CONTEXT` behavior and meeting-mode per-source context. |
| `docs/CONFIGURATION.md` | Documents `VOXTYPE_CONTEXT` behavior and references updated examples. |
src/meeting/mod.rs

```rust
let context = self.last_chunk_text.get(&source).map(|s| s.as_str());
for segment in &mut result.segments {
    if !segment.text.is_empty() {
        segment.text = post_processor
            .process_with_context(&segment.text, context)
            .await;
```
context borrows from self.last_chunk_text and is then held across .await calls to process_with_context. Because the method later mutates self.last_chunk_text (via insert), this is very likely to fail to compile with a borrow-across-await error. Consider copying the context into an owned String (or cloning the map value) before awaiting, and/or updating a local context variable as you process segments, then write back to the map after the awaits complete.
Fixed in 4637653. Cloned the context into an owned String before the async loop, using .as_deref() at the call site. The original code did compile (NLL drops the borrow before the insert), but the clone is cheap and avoids fragility if the loop body changes later.
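To illustrate the NLL point in the reply, here is a minimal synchronous sketch. `AudioSource`, `process_with_context`, `process_borrowed`, and `process_owned` are hypothetical stand-ins (the real code is async and lives on the meeting daemon), and storing the joined processed text as the next chunk's context is an assumption. Both variants compile: in the borrowed one, the shared borrow of the map ends at its last use inside the loop, so the later `insert` is accepted. The sketch does not model `.await`; it only shows why the borrow is no longer live at the `insert`.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for AudioSource; Hash + Eq are what make it usable as a HashMap key.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum AudioSource {
    Mic,
    Loopback,
}

// Hypothetical synchronous stand-in for PostProcessor::process_with_context.
fn process_with_context(text: &str, context: Option<&str>) -> String {
    match context {
        Some(ctx) => format!("{} (after: {})", text, ctx),
        None => text.to_string(),
    }
}

// Borrowed context: `context` is last used inside the loop, so under NLL the shared
// borrow of `last` has ended by the time `insert` needs a mutable one.
fn process_borrowed(last: &mut HashMap<AudioSource, String>, source: AudioSource, segments: &mut [String]) {
    let context = last.get(&source).map(|s| s.as_str());
    for seg in segments.iter_mut() {
        if !seg.is_empty() {
            let processed = process_with_context(seg.as_str(), context);
            *seg = processed;
        }
    }
    last.insert(source, segments.join(" "));
}

// Owned context (the approach taken in the fix): no borrow of `last` reaches the loop at all,
// so later edits to the loop body cannot accidentally extend it.
fn process_owned(last: &mut HashMap<AudioSource, String>, source: AudioSource, segments: &mut [String]) {
    let context: Option<String> = last.get(&source).cloned();
    for seg in segments.iter_mut() {
        if !seg.is_empty() {
            let processed = process_with_context(seg.as_str(), context.as_deref());
            *seg = processed;
        }
    }
    last.insert(source, segments.join(" "));
}

fn main() {
    let mut last = HashMap::new();
    let mut first = vec!["hello".to_string()];
    process_borrowed(&mut last, AudioSource::Mic, &mut first);
    let mut second = vec!["world".to_string()];
    process_owned(&mut last, AudioSource::Mic, &mut second);
    println!("{}", second[0]); // world (after: hello)
}
```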
src/daemon.rs

```rust
let recent_context = self.last_dictation.as_ref().and_then(|(text, when)| {
    if when.elapsed() < Duration::from_secs(60) {
        Some(text.as_str())
    } else {
        None
    }
});
```
recent_context is a &str borrowed from self.last_dictation and is passed into async post-processing calls (.await). The function then updates self.last_dictation afterwards, which is likely to cause a borrow-checker error (borrow of self held across await, then mutated). To avoid this, materialize context into a local owned value (e.g., clone the string into an Option<String> and use .as_deref() when calling process_with_context).
Suggested change:

```diff
-let recent_context = self.last_dictation.as_ref().and_then(|(text, when)| {
-    if when.elapsed() < Duration::from_secs(60) {
-        Some(text.as_str())
-    } else {
-        None
-    }
-});
+let recent_context_owned =
+    self.last_dictation.as_ref().and_then(|(text, when)| {
+        if when.elapsed() < Duration::from_secs(60) {
+            Some(text.clone())
+        } else {
+            None
+        }
+    });
+let recent_context = recent_context_owned.as_deref();
```
Fixed in 4637653. Cloned the context string into an owned Option<String> and use .as_deref() at all three call sites. Same reasoning as the meeting/mod.rs fix: the original compiled fine under NLL, but the owned value is more robust.
src/daemon.rs

```diff
 let profile_processor = PostProcessor::new(&profile_config);
 tracing::info!(
-    "Post-processing with profile: {:?}",
-    profile_override.as_ref().unwrap()
+    "Post-processing with profile: {:?}, context: {:?}",
+    profile_override.as_ref().unwrap(),
+    recent_context
 );
```
Logging recent_context at info level can leak the previous dictation text into logs (potentially sensitive/PII) in addition to the current transcription. Consider logging only whether context was present (or its length), and/or moving the context value to debug tracing.
Good catch. Fixed in 4637653. The info level now only logs whether context is present (has_context: true/false) and whether the text changed. Actual text content (input, context, result) is now at debug level only.
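The logging split described in the reply can be sketched as two message builders: the info-level one carries only the `has_context` and `changed` booleans, while the debug-level one is the only place transcript text appears. `info_line`, `debug_line`, and the message wording are hypothetical; the PR logs through `tracing`, which this dependency-free sketch replaces with `println!`.

```rust
/// Hypothetical info-level message: booleans only, no transcript or context text.
fn info_line(input: &str, output: &str, context: Option<&str>) -> String {
    format!(
        "Post-processing done (has_context: {}, changed: {})",
        context.is_some(),
        input != output
    )
}

/// Hypothetical debug-level message: the only place input, context, and result appear.
fn debug_line(input: &str, output: &str, context: Option<&str>) -> String {
    format!(
        "Post-processing input: {:?}, context: {:?}, result: {:?}",
        input, context, output
    )
}

fn main() {
    let input = "um so the budget is due friday";
    let output = "The budget is due Friday.";
    let context = Some("Remind me to email Dana.");
    // Safe to emit at info level: no dictated text leaks.
    println!("{}", info_line(input, output, context)); // Post-processing done (has_context: true, changed: true)
    // In the real code this would only be emitted when debug logging is enabled.
    println!("{}", debug_line(input, output, context));
}
```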
src/output/post_process.rs

```rust
// Always clear to prevent inheriting stale context from parent environment
cmd.env_remove("VOXTYPE_CONTEXT");
if let Some(ctx) = context {
    cmd.env("VOXTYPE_CONTEXT", ctx);
}
```
New VOXTYPE_CONTEXT behavior isn’t covered by tests. Since this file already has async tests, consider adding coverage to assert (1) when context is None, the child does not see VOXTYPE_CONTEXT even if it exists in the parent env, and (2) when Some, the child receives the exact context value while stdin still only contains current text.
Added three tests in 4637653:

- `test_context_passed_via_env_var`: verifies `VOXTYPE_CONTEXT` is set and stdin contains only current text
- `test_no_context_env_var_when_none`: verifies the env var is absent when context is `None`
- `test_context_env_not_inherited_from_parent`: verifies stale parent env is cleared when context is `None`
- Clone context strings before async boundaries in `meeting/mod.rs` and `daemon.rs` to avoid holding borrows across `.await` points
- Demote dictation text and context logging from info to debug level to prevent PII leakage in logs; info level now only logs whether context is present and whether text changed
- Add tests for the `VOXTYPE_CONTEXT` env var: verify it is set when context is provided, absent when `None`, and not inherited from the parent env

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
This PR adds post-processing support to two areas that previously lacked it:
1. Meeting mode post-processing (new)
Meeting mode previously skipped post-processing entirely. Now, if `[output.post_process]` is configured, each transcribed chunk is piped through the post-processing command before being stored in the transcript. Previous chunk text is passed as context so the LLM can maintain continuity across 30-second chunk boundaries.

Context is tracked per audio source (mic vs. loopback) to prevent speaker bleed between channels. Context is cleared when a meeting stops so it doesn't leak into the next meeting.
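Per-source tracking can be sketched as a map keyed by `AudioSource`, which is why the enum needs a `Hash` derive. `ChunkContext`, `context_for`, `record`, and `stop_meeting` are hypothetical names for illustration, not the PR's actual types; the clear-before-export ordering mirrors the commit note "Clear context on meeting stop before fallible I/O".

```rust
use std::collections::HashMap;

// Hypothetical stand-in for AudioSource; Hash + Eq make it a valid HashMap key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum AudioSource {
    Mic,
    Loopback,
}

/// Hypothetical per-meeting context store: one "previous chunk" slot per source.
#[derive(Default)]
struct ChunkContext {
    last_chunk_text: HashMap<AudioSource, String>,
}

impl ChunkContext {
    /// Owned copy of the previous chunk for `source`; other sources are never consulted.
    fn context_for(&self, source: AudioSource) -> Option<String> {
        self.last_chunk_text.get(&source).cloned()
    }

    /// Remember this chunk's text as context for the next chunk from the same source.
    fn record(&mut self, source: AudioSource, text: &str) {
        self.last_chunk_text.insert(source, text.to_string());
    }

    fn clear(&mut self) {
        self.last_chunk_text.clear();
    }
}

/// Hypothetical stop path: clear context first, then run the fallible export,
/// so an export error cannot leave stale context behind for the next meeting.
fn stop_meeting<F>(ctx: &mut ChunkContext, export: F) -> Result<(), String>
where
    F: FnOnce() -> Result<(), String>,
{
    ctx.clear();
    export()
}

fn main() {
    let mut ctx = ChunkContext::default();
    ctx.record(AudioSource::Mic, "Start with the roadmap.");
    // The remote speaker (loopback) gets no context from the local mic.
    println!("{:?}", ctx.context_for(AudioSource::Loopback)); // None
    println!("{:?}", ctx.context_for(AudioSource::Mic)); // Some("Start with the roadmap.")
    let result = stop_meeting(&mut ctx, || Err("export failed".to_string()));
    println!("{:?} {:?}", result, ctx.context_for(AudioSource::Mic)); // Err("export failed") None
}
```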
2. Dictation mode context (new)
Regular push-to-talk dictation now passes the previous dictation's text as context to the post-processing command, if the previous dictation was within 60 seconds. This helps the LLM produce more coherent cleanup when dictating in quick succession.
The 60-second retention window is intentionally not configurable to avoid config fatigue. It covers rapid-fire dictation without leaking stale context.
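The 60-second window reduces to an elapsed-time check on the last `(text, timestamp)` pair. In this sketch `recent_context` and `CONTEXT_WINDOW` are hypothetical names, and `now` is passed in (instead of calling `when.elapsed()` as the reviewed code does) so the boundary can be tested deterministically. It returns an owned `Option<String>`, matching the fix discussed in review; callers borrow it back with `.as_deref()`.

```rust
use std::time::{Duration, Instant};

// Hypothetical constant name; the PR hard-codes 60 seconds.
const CONTEXT_WINDOW: Duration = Duration::from_secs(60);

/// Owned copy of the last dictation if it is strictly younger than the window at `now`.
fn recent_context(last: &Option<(String, Instant)>, now: Instant) -> Option<String> {
    last.as_ref().and_then(|(text, when)| {
        if now.saturating_duration_since(*when) < CONTEXT_WINDOW {
            Some(text.clone())
        } else {
            None
        }
    })
}

fn main() {
    let t0 = Instant::now();
    let last = Some(("Draft the release notes.".to_string(), t0));
    let ctx = recent_context(&last, t0 + Duration::from_secs(45));
    // At the call site the owned value is borrowed back as Option<&str>.
    let ctx_ref: Option<&str> = ctx.as_deref();
    println!("{:?}", ctx_ref); // Some("Draft the release notes.")
}
```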
Context delivery mechanism
Context is passed via the `VOXTYPE_CONTEXT` environment variable, keeping stdin as plain current text. This means existing post-processing scripts work unchanged. Scripts that want context can optionally read `$VOXTYPE_CONTEXT`.

The env var is explicitly cleared before each command invocation to prevent inheriting stale context from the parent environment.
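A blocking sketch of this delivery mechanism, assuming the command is run through `sh -c`; running through `sh`, falling back to the unprocessed text on any failure, and trimming trailing whitespace from stdout are all assumptions here. The PR's version is async; this one uses `std::process::Command` to stay dependency-free, but `env_remove` and `env` behave the same way: `env_remove` stops the child from inheriting the variable from the parent, and a later `env` call sets it explicitly. `process()` delegates to `process_with_context(text, None)` as described under "Other changes".

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Hypothetical, simplified post-processor: `command` is a shell snippet.
struct PostProcessor {
    command: String,
}

impl PostProcessor {
    /// Context-free entry point: delegates so error handling lives in one place.
    fn process(&self, text: &str) -> String {
        self.process_with_context(text, None)
    }

    fn process_with_context(&self, text: &str, context: Option<&str>) -> String {
        let mut cmd = Command::new("sh");
        cmd.arg("-c")
            .arg(&self.command)
            .stdin(Stdio::piped())
            .stdout(Stdio::piped())
            .stderr(Stdio::null());
        // Always clear first so a VOXTYPE_CONTEXT exported by the parent never leaks through.
        cmd.env_remove("VOXTYPE_CONTEXT");
        if let Some(ctx) = context {
            cmd.env("VOXTYPE_CONTEXT", ctx);
        }
        match run(cmd, text) {
            Some(out) => out.trim_end().to_string(),
            None => text.to_string(), // fall back to the unprocessed text on any failure
        }
    }
}

/// Spawn, write the current text to stdin (and close it), then collect stdout.
fn run(mut cmd: Command, input: &str) -> Option<String> {
    let mut child = cmd.spawn().ok()?;
    // The taken ChildStdin is dropped at the end of this statement, closing the pipe.
    child.stdin.take()?.write_all(input.as_bytes()).ok()?;
    let output = child.wait_with_output().ok()?;
    if !output.status.success() {
        return None;
    }
    String::from_utf8(output.stdout).ok()
}

fn main() {
    // Echo stdin and the env var (or "unset") to show what the script receives.
    let pp = PostProcessor {
        command: r#"printf '%s|%s' "$(cat)" "${VOXTYPE_CONTEXT-unset}""#.to_string(),
    };
    println!("{}", pp.process_with_context("now", Some("before"))); // now|before
    println!("{}", pp.process("now")); // now|unset
}
```

Because of the `env_remove` call, `process("now")` yields `now|unset` even if the calling shell had exported `VOXTYPE_CONTEXT`.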
Other changes
- `process()` now delegates to `process_with_context(text, None)`, removing ~20 lines of duplicated error-handling logic
- Post-processing tracing logs include `changed: true/false` to indicate whether the command modified the text
- Example scripts (gemini, ollama, openai) optionally read `$VOXTYPE_CONTEXT`

Test plan

- … `[output.post_process]` and run a meeting, verify exported transcript shows post-processed text
- … `VOXTYPE_CONTEXT` is not inherited from parent environment

🤖 Generated with Claude Code