Add post-processing to meeting mode and dictation context #292

Open
materemias wants to merge 5 commits into main from feature/meeting-post-processing

@materemias
Collaborator

Note: Reopened from #265, which was closed when the fork was deleted. This PR contains the same changes, rebased onto current main.

Summary

This PR adds post-processing support to two areas that previously lacked it:

1. Meeting mode post-processing (new)

Meeting mode previously skipped post-processing entirely. Now, if [output.post_process] is configured, each transcribed chunk is piped through the post-processing command before being stored in the transcript. Previous chunk text is passed as context so the LLM can maintain continuity across 30-second chunk boundaries.

Context is tracked per audio source (mic vs loopback) to prevent speaker bleed between channels. Context is cleared when a meeting stops so it doesn't leak into the next meeting.
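
As a rough illustration of the per-source tracking described above, the sketch below keeps one "previous chunk" string per audio source in a `HashMap` and clears it when a meeting stops. The `ChunkContext` helper, its method names, and the `Mic`/`Loopback` variant names are assumptions made for this example; only `AudioSource`, the `Hash` derive, and the `last_chunk_text` map are mentioned in the PR itself.

```rust
use std::collections::HashMap;

/// Stand-in for the audio source enum. The PR derives `Hash` on the real
/// `AudioSource` so it can be used as a `HashMap` key; the variant names
/// here are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum AudioSource {
    Mic,
    Loopback,
}

/// Hypothetical helper that remembers the last processed chunk per source.
#[derive(Default)]
struct ChunkContext {
    last_chunk_text: HashMap<AudioSource, String>,
}

impl ChunkContext {
    /// Returns an owned copy of the previous chunk for `source`, if any.
    /// Returning an owned value means no borrow of the map is held while
    /// the post-processing command runs.
    fn context_for(&self, source: AudioSource) -> Option<String> {
        self.last_chunk_text.get(&source).cloned()
    }

    /// Records the latest processed text for `source`.
    fn record(&mut self, source: AudioSource, text: &str) {
        self.last_chunk_text.insert(source, text.to_string());
    }

    /// Drops all context, mirroring what happens when a meeting stops.
    fn clear(&mut self) {
        self.last_chunk_text.clear();
    }
}
```

Because each source has its own entry, a mic chunk never sees loopback text as context, and calling `clear()` on stop leaves nothing behind for the next meeting.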

2. Dictation mode context (new)

Regular push-to-talk dictation now passes the previous dictation's text as context to the post-processing command, if the previous dictation was within 60 seconds. This helps the LLM produce more coherent cleanup when dictating in quick succession.

The 60-second retention window is intentionally not configurable to avoid config fatigue. It covers rapid-fire dictation without leaking stale context.
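
The retention check can be sketched roughly as follows. This is a simplified stand-in for the logic in `daemon.rs`, not the actual code: the `recent_context` function and the `CONTEXT_WINDOW` constant are names chosen for this example, and `now` is passed in as a parameter only so the behaviour can be checked without sleeping.

```rust
use std::time::{Duration, Instant};

/// Fixed retention window described in the PR (not configurable).
const CONTEXT_WINDOW: Duration = Duration::from_secs(60);

/// Returns an owned copy of the previous dictation if it is recent enough.
/// `last` holds the previous dictation's text and the time it was recorded.
fn recent_context(last: Option<&(String, Instant)>, now: Instant) -> Option<String> {
    last.and_then(|(text, when)| {
        if now.duration_since(*when) < CONTEXT_WINDOW {
            Some(text.clone())
        } else {
            // Older than the window: treat the context as stale.
            None
        }
    })
}
```

A dictation made 10 seconds after the previous one would receive that text as context; one made 61 seconds later would receive none.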

Context delivery mechanism

Context is passed via the VOXTYPE_CONTEXT environment variable, keeping stdin as plain current text. This means existing post-processing scripts work unchanged. Scripts that want context can optionally read $VOXTYPE_CONTEXT.

The env var is explicitly cleared before each command invocation to prevent inheriting stale context from the parent environment.
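
A minimal synchronous sketch of this delivery mechanism, using only `std::process::Command`, is shown below. The real implementation is async; the `run_with_context` function and the choice to run the script through `sh -c` are assumptions for illustration.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Runs `sh -c <script>` with `text` on stdin and the optional context in
/// VOXTYPE_CONTEXT, returning whatever the command writes to stdout.
fn run_with_context(script: &str, text: &str, context: Option<&str>) -> std::io::Result<String> {
    let mut cmd = Command::new("sh");
    cmd.arg("-c")
        .arg(script)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped());

    // Always clear first so a stale value in the parent environment is
    // never inherited by the child.
    cmd.env_remove("VOXTYPE_CONTEXT");
    if let Some(ctx) = context {
        cmd.env("VOXTYPE_CONTEXT", ctx);
    }

    let mut child = cmd.spawn()?;
    // Stdin carries only the current text. The handle is dropped at the end
    // of this block so the child sees EOF.
    if let Some(mut stdin) = child.stdin.take() {
        stdin.write_all(text.as_bytes())?;
    }
    let output = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

A script that ignores `$VOXTYPE_CONTEXT` behaves exactly as before, since its stdin is unchanged. A script that wants continuity can read the variable and fold it into its prompt.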

Other changes

  • process() now delegates to process_with_context(text, None), removing ~20 lines of duplicated error-handling logic
  • Post-processing tracing logs now include changed: true/false to indicate whether the command modified the text
  • Example scripts (gemini, ollama, openai) updated to optionally use $VOXTYPE_CONTEXT
  • Docs updated (CONFIGURATION.md, USER_MANUAL.md)
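
The delegation and the `changed` flag from the list above might look roughly like this. The sketch is synchronous and uses a placeholder `run_command` in place of spawning the external command, so it is a hypothetical stand-in for the PR's `PostProcessor`, not its actual code; the log wording is also assumed.

```rust
/// Hypothetical, synchronous stand-in for the PR's `PostProcessor`.
struct PostProcessor;

impl PostProcessor {
    /// Context-free entry point. Delegating keeps the error handling in
    /// one place instead of duplicating it.
    fn process(&self, text: &str) -> String {
        self.process_with_context(text, None)
    }

    /// Returns the processed text, or the original text if the command fails.
    fn process_with_context(&self, text: &str, context: Option<&str>) -> String {
        match self.run_command(text, context) {
            Ok(result) => {
                // Log only flags, never the text itself.
                let changed = result != text;
                eprintln!(
                    "post-processing done: has_context={}, changed={}",
                    context.is_some(),
                    changed
                );
                result
            }
            Err(err) => {
                eprintln!("post-processing failed, using original text: {err}");
                text.to_string()
            }
        }
    }

    /// Placeholder for the external command: trims whitespace and treats
    /// blank input as a failure so both paths can be exercised.
    fn run_command(&self, text: &str, _context: Option<&str>) -> Result<String, String> {
        let trimmed = text.trim();
        if trimmed.is_empty() {
            Err("empty output".to_string())
        } else {
            Ok(trimmed.to_string())
        }
    }
}
```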

Test plan

  • Configure [output.post_process] and run a meeting, verify exported transcript shows post-processed text
  • Verify post-processing failure falls back to original text
  • Verify mic and loopback chunks get independent context (no speaker bleed)
  • Verify context clears on meeting stop and doesn't leak to next meeting
  • Verify regular dictation gets context from previous dictation within 60s
  • Verify dictation context expires after 60s
  • Verify existing scripts work without modification (stdin is plain text)
  • Verify VOXTYPE_CONTEXT is not inherited from parent environment

🤖 Generated with Claude Code

materemias and others added 4 commits April 1, 2026 16:53
Meeting mode now applies the configured post-processing command to each
transcribed chunk before storing it in the transcript. Previous chunk
text is passed as context via combined stdin with separator markers,
tracked per audio source (mic/loopback) to prevent speaker bleed.

Regular dictation mode also gains context: if the last dictation was
within 2 minutes, its text is passed as context to the post-processor.

- Add process_with_context() to PostProcessor for context-aware processing
- Refactor process() to delegate to process_with_context(text, None)
- Add PostProcessor and per-source context tracking to MeetingDaemon
- Clear context on meeting stop before fallible I/O
- Add Hash derive to AudioSource for HashMap key usage
- Track last_dictation (text, timestamp) in Daemon for regular mode context

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Reduce the recent dictation context window from 120s to 60s and log
whether context was available in post-processing tracing::info calls.

Bump whisper-rs to 0.16.0 for latest Arch compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Context was being prepended to stdin with separator markers, which broke
existing post-processing scripts that expect plain text. Now context is
passed via the VOXTYPE_CONTEXT environment variable, keeping stdin as
plain current text for backwards compatibility.

Scripts that want context can read $VOXTYPE_CONTEXT. Scripts that don't
care about context work unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Switch from prepending context to stdin (which broke existing scripts)
to passing it via the VOXTYPE_CONTEXT environment variable. Stdin now
always contains only the current text, keeping all scripts compatible.

- Clear VOXTYPE_CONTEXT before each command to prevent inheriting stale
  context from the parent environment
- Update example scripts (gemini, ollama, openai) to optionally read
  VOXTYPE_CONTEXT for continuity across dictations
- Fix bash newline handling: use printf -v instead of literal \n
- Add changed detection to post-processing tracing info logs
- Document VOXTYPE_CONTEXT in CONFIGURATION.md and USER_MANUAL.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 1, 2026 14:54
@materemias materemias requested a review from peteonrails as a code owner April 1, 2026 14:54
Contributor

Copilot AI left a comment

Pull request overview

This PR adds optional context-aware post-processing support to meeting mode chunk transcription and to push-to-talk dictation, enabling external cleanup commands (e.g., LLM scripts) to maintain continuity across chunks/dictations via VOXTYPE_CONTEXT.

Changes:

  • Add PostProcessor::process_with_context that passes prior text via VOXTYPE_CONTEXT while keeping stdin as current text.
  • Apply post-processing to meeting-mode segments with per-audio-source context tracking and context reset on meeting stop.
  • Track recent dictation output for up to 60 seconds and pass it as context into post-processing; update example scripts + docs to optionally use VOXTYPE_CONTEXT.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/output/post_process.rs | Introduces context-aware post-processing via env var and refactors process() to delegate. |
| src/meeting/mod.rs | Adds meeting-mode post-processing and per-source context tracking using a HashMap. |
| src/meeting/data.rs | Derives Hash for AudioSource so it can be used as a HashMap key. |
| src/daemon.rs | Adds recent dictation context tracking (60s) and passes it into post-processing; extends tracing logs. |
| examples/openai-cleanup.sh | Updates prompt construction to optionally include prior dictation context. |
| examples/ollama-cleanup.sh | Updates prompt construction to optionally include prior dictation context. |
| examples/gemini-cleanup.sh | Updates prompt construction to optionally include prior dictation context. |
| docs/USER_MANUAL.md | Documents VOXTYPE_CONTEXT behavior and meeting-mode per-source context. |
| docs/CONFIGURATION.md | Documents VOXTYPE_CONTEXT behavior and references updated examples. |

Comment on lines +307 to +312
let context = self.last_chunk_text.get(&source).map(|s| s.as_str());
for segment in &mut result.segments {
    if !segment.text.is_empty() {
        segment.text = post_processor
            .process_with_context(&segment.text, context)
            .await;

Copilot AI Apr 1, 2026

context borrows from self.last_chunk_text and is then held across .await calls to process_with_context. Because the method later mutates self.last_chunk_text (via insert), this is very likely to fail to compile with a borrow-across-await error. Consider copying the context into an owned String (or cloning the map value) before awaiting, and/or updating a local context variable as you process segments, then write back to the map after the awaits complete.

Collaborator Author

Fixed in 4637653. Cloned the context into an owned String before the async loop, using .as_deref() at the call site. The original code did compile (NLL drops the borrow before the insert), but the clone is cheap and avoids fragility if the loop body changes later.

Comment on lines +1284 to +1290
let recent_context = self.last_dictation.as_ref().and_then(|(text, when)| {
    if when.elapsed() < Duration::from_secs(60) {
        Some(text.as_str())
    } else {
        None
    }
});

Copilot AI Apr 1, 2026

recent_context is a &str borrowed from self.last_dictation and is passed into async post-processing calls (.await). The function then updates self.last_dictation afterwards, which is likely to cause a borrow-checker error (borrow of self held across await, then mutated). To avoid this, materialize context into a local owned value (e.g., clone the string into an Option<String> and use .as_deref() when calling process_with_context).

Suggested change
- let recent_context = self.last_dictation.as_ref().and_then(|(text, when)| {
-     if when.elapsed() < Duration::from_secs(60) {
-         Some(text.as_str())
-     } else {
-         None
-     }
- });
+ let recent_context_owned =
+     self.last_dictation.as_ref().and_then(|(text, when)| {
+         if when.elapsed() < Duration::from_secs(60) {
+             Some(text.clone())
+         } else {
+             None
+         }
+     });
+ let recent_context = recent_context_owned.as_deref();

Collaborator Author

Fixed in 4637653. Cloned the context string into an owned Option<String> and use .as_deref() at all three call sites. Same reasoning as the meeting/mod.rs fix: the original compiled fine under NLL, but the owned value is more robust.

Comment on lines 1299 to 1304
  let profile_processor = PostProcessor::new(&profile_config);
  tracing::info!(
-     "Post-processing with profile: {:?}",
-     profile_override.as_ref().unwrap()
+     "Post-processing with profile: {:?}, context: {:?}",
+     profile_override.as_ref().unwrap(),
+     recent_context
  );

Copilot AI Apr 1, 2026

Logging recent_context at info level can leak the previous dictation text into logs (potentially sensitive/PII) in addition to the current transcription. Consider logging only whether context was present (or its length), and/or moving the context value to debug tracing.

Collaborator Author

Good catch. Fixed in 4637653. The info level now only logs whether context is present (has_context: true/false) and whether the text changed. Actual text content (input, context, result) is now at debug level only.

Comment on lines +88 to +92
// Always clear to prevent inheriting stale context from parent environment
cmd.env_remove("VOXTYPE_CONTEXT");
if let Some(ctx) = context {
    cmd.env("VOXTYPE_CONTEXT", ctx);
}

Copilot AI Apr 1, 2026

New VOXTYPE_CONTEXT behavior isn’t covered by tests. Since this file already has async tests, consider adding coverage to assert (1) when context is None, the child does not see VOXTYPE_CONTEXT even if it exists in the parent env, and (2) when Some, the child receives the exact context value while stdin still only contains current text.

Collaborator Author

Added three tests in 4637653:

  • test_context_passed_via_env_var: verifies VOXTYPE_CONTEXT is set and stdin contains only current text
  • test_no_context_env_var_when_none: verifies the env var is absent when context is None
  • test_context_env_not_inherited_from_parent: verifies stale parent env is cleared when context is None

- Clone context strings before async boundaries in meeting/mod.rs and
  daemon.rs to avoid holding borrows across .await points
- Demote dictation text and context logging from info to debug level
  to prevent PII leakage in logs; info level now only logs whether
  context is present and whether text changed
- Add tests for VOXTYPE_CONTEXT env var: verify it is set when context
  is provided, absent when None, and not inherited from parent env

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>