feat: add Studio batch transcription workflow #1183
Closed
tomerar wants to merge 11 commits into cjpais:main from
Tighten the Studio workflow across backend and frontend by fixing preparation-stage cancellation, restart recovery, chunk handling, and user-facing job behavior.
- make Studio cancellation work safely during audio preparation
- improve preparation progress reporting for long files
- reduce chunk size and add overlap trimming to improve responsiveness and boundary quality
- fix sequential SRT numbering when empty chunks are skipped
- recover stale running/paused jobs after restart
- validate retry behavior when the source file is missing
- surface Studio action errors in the UI
- replace browser confirm with Tauri dialog for stop confirmation
- clean up Studio labels, status text, and i18n wiring
- add regression tests for SRT export, chunking, overlap trimming, and retry/source recovery behavior

- add Studio translations across supported locales
- remove the unused Studio status bar from the home view
- emit cleaned chunk text in transcript previews to reduce duplicate lines
- extend overlap trimming coverage for longer repeated chunk prefixes

- improve recent job browsing, filtering, and selection feedback
- allow loading new audio without depending on pending state
- preserve active job context while preparing another file
- auto-version export filenames on output collisions
- complete Studio i18n keys across locales
- tighten Studio store initialization and view state handling

- prevent duplicate concurrent starts for the same studio job
- validate output folder and export formats before starting jobs
- make studio export retries safer and avoid duplicate output generation
- add graceful worker shutdown and timeout-aware dictation waiting
- unify studio time/size formatting and improve runtime status messaging
- add fallback handling for dropzone paths that cannot be read
- complete new studio translations across locales
- add focused tests for studio validation helpers and VTT export

- harden studio job lifecycle, retry behavior, and export validation
- improve localized studio status, formatting, and dropzone fallback UX
- align file picking with supported backend extensions across studio views
- add and verify studio-focused backend tests and stable Playwright setup
- polish recent jobs, output handling, and overall pre-MR readiness
Owner
New features are not being accepted at the moment. This was written in the PR template and Contributing.md. You cannot just drop a huge slop PR on me. If you want a feature and want to discuss it, we can do that in Discord, DMs, or GitHub Discussions. Generally it's well thought out, but this is impossible for me to review. If you want to make huge changes, we need to work together, not just have something spawn out of the blue.
Summary
This PR introduces Studio, a new file-based transcription workflow for Handy.
Handy’s core experience today is optimized for live dictation: press a shortcut, speak, and have the text appear in the active app. That workflow remains fully intact.
Studio adds a second major workflow for a different but very common use case: bringing an existing audio file into Handy, transcribing it locally, reviewing progress, and exporting clean transcript files.
In practice, this expands Handy from a live speech-to-text tool into both:
This fits Handy’s current direction well:
It also matches the kinds of adjacent workflows users commonly ask for around the project: not just live microphone dictation, but also reliable local transcription for audio files they already have.
Scope
This PR focuses on introducing Studio as a complete first version of file-based transcription inside Handy.
It does not redesign or replace Handy’s existing live dictation flow. Instead, it adds a complementary workflow for local audio-file transcription, job management, and transcript export.
Why This Feature Exists
Before this PR, Handy was excellent for:
But there was no first-class workflow for:
That gap matters for real users.
Common examples include:
Studio is designed to cover exactly those cases while preserving Handy’s existing simplicity.
What Studio Adds
Studio is a dedicated transcription workspace inside Handy for existing audio files.
A user can now:
Supported output formats:
- TXT
- SRT
- VTT
Supported input formats currently include:
- MP3
- WAV
- M4A
- FLAC
- OGG
Product Flow
1. File preparation
When a file is selected, Studio creates a persisted job and inspects the media before transcription begins.
The user immediately gets:
2. Job setup
Before starting, the user chooses:
This keeps Studio task-oriented and predictable rather than turning it into a one-click fire-and-forget flow.
3. Audio normalization and chunking
The backend normalizes the source audio into a Whisper-friendly WAV and splits it into overlapping chunks.
This helps Studio handle:
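The overlapping split described in this step can be pictured with a small sketch. It is not the code in src-tauri/src/managers/studio.rs; the names `ChunkRange` and `overlapping_chunks`, and the idea of expressing chunk length and overlap in samples, are assumptions made for illustration. The PR does not state its real chunk size or overlap.

```rust
/// Half-open sample range [start, end) of one chunk in the normalized audio.
/// Hypothetical type, not taken from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ChunkRange {
    pub start: usize,
    pub end: usize,
}

/// Split `total_samples` into chunks of at most `chunk_len` samples. Every
/// chunk after the first starts `overlap` samples before the previous one
/// ended, so speech at a boundary is seen by both neighbours.
pub fn overlapping_chunks(
    total_samples: usize,
    chunk_len: usize,
    overlap: usize,
) -> Vec<ChunkRange> {
    assert!(chunk_len > 0, "chunk_len must be positive");
    assert!(overlap < chunk_len, "overlap must be smaller than chunk_len");

    let step = chunk_len - overlap; // distance between consecutive chunk starts
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < total_samples {
        let end = (start + chunk_len).min(total_samples);
        chunks.push(ChunkRange { start, end });
        if end == total_samples {
            break; // the last chunk already reaches the end of the audio
        }
        start += step;
    }
    chunks
}
```

With 10 samples, a chunk length of 4, and an overlap of 1, this yields the ranges 0..4, 3..7, and 6..10. Smaller chunks give more frequent progress updates at the cost of more boundaries to reconcile.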
4. Live progress and preview
While the job is running, Studio shows:
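The commit notes mention emitting cleaned chunk text in previews so that overlapping chunks do not produce duplicate lines. One way such overlap trimming could work is sketched here. The function name `trim_overlap` and the matching rule (word-level, case-insensitive, ignoring surrounding punctuation) are assumptions, not the behavior confirmed by the PR.

```rust
/// Remove from `next` the longest run of leading words that repeats the
/// trailing words of `prev`. Hypothetical helper for illustration only.
pub fn trim_overlap(prev: &str, next: &str) -> String {
    // Normalize a word for comparison: strip surrounding punctuation, lowercase.
    fn norm(word: &str) -> String {
        word.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase()
    }

    let prev_words: Vec<String> = prev.split_whitespace().map(norm).collect();
    let next_raw: Vec<&str> = next.split_whitespace().collect();
    let next_words: Vec<String> = next_raw.iter().map(|w| norm(w)).collect();

    let max = prev_words.len().min(next_words.len());
    // Try the longest candidate overlap first so long repeats are fully removed.
    for k in (1..=max).rev() {
        if prev_words[prev_words.len() - k..] == next_words[..k] {
            return next_raw[k..].join(" ");
        }
    }
    // No repeated prefix found: keep the chunk text, with whitespace collapsed.
    next_raw.join(" ")
}
```

For example, with a previous chunk ending in "the quick brown fox" and a next chunk starting "brown fox jumps over", the preview would only append "jumps over".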
5. Export
When the job completes, Studio writes one or more transcript outputs to disk.
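For the SRT output, the commit notes call out keeping cue numbers sequential when empty chunks are skipped. A minimal sketch of that idea follows. The `Segment` type and the `to_srt` and `srt_timestamp` functions are hypothetical and are not the contents of src-tauri/src/exporters/srt.rs.

```rust
/// One transcribed chunk with its position in the source audio, in
/// milliseconds. Hypothetical type for illustration.
pub struct Segment {
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

/// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm.
fn srt_timestamp(ms: u64) -> String {
    let hours = ms / 3_600_000;
    let minutes = ms / 60_000 % 60;
    let seconds = ms / 1_000 % 60;
    let millis = ms % 1_000;
    format!("{:02}:{:02}:{:02},{:03}", hours, minutes, seconds, millis)
}

/// Render segments as SRT. The cue index only advances for segments that are
/// actually written, so skipping an empty chunk leaves no gap in the numbering.
pub fn to_srt(segments: &[Segment]) -> String {
    let mut out = String::new();
    let mut index = 1;
    for seg in segments {
        let text = seg.text.trim();
        if text.is_empty() {
            continue; // nothing was transcribed for this chunk
        }
        out.push_str(&format!(
            "{}\n{} --> {}\n{}\n\n",
            index,
            srt_timestamp(seg.start_ms),
            srt_timestamp(seg.end_ms),
            text
        ));
        index += 1;
    }
    out
}
```

Numbering from a separate counter, not from the chunk's position in the input, is what keeps the cues at 1, 2, 3 even when a middle chunk is silent.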
Completed jobs show:
If the same file is exported more than once, Studio versions filenames safely instead of failing on collisions.
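Collision-safe versioning of this kind might look like the following sketch. The function name `versioned_path` and the "name (N).ext" scheme are assumptions; the PR does not say which naming pattern Studio uses. The existence check is passed in as a closure so the logic can be exercised without touching the disk.

```rust
use std::path::{Path, PathBuf};

/// Return `desired` if `exists` reports no collision, otherwise the first free
/// "stem (N).ext" variant starting at N = 2. Hypothetical helper; the naming
/// scheme is an assumption.
pub fn versioned_path(desired: &Path, exists: impl Fn(&Path) -> bool) -> PathBuf {
    if !exists(desired) {
        return desired.to_path_buf();
    }
    // Fall back to a generic stem if the file name is missing or not UTF-8.
    let stem = desired
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("transcript");
    let ext = desired.extension().and_then(|e| e.to_str());

    let mut n: u32 = 2;
    loop {
        let name = match ext {
            Some(ext) => format!("{} ({}).{}", stem, n, ext),
            None => format!("{} ({})", stem, n),
        };
        let candidate = desired.with_file_name(name);
        if !exists(&candidate) {
            return candidate;
        }
        n += 1;
    }
}
```

In real use the closure would typically be `|p| p.exists()`. A check followed by a later write can still race with another writer, so an exporter that needs strict guarantees would also create the file with `OpenOptions::create_new(true)` and retry on failure.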
6. Recent jobs
Studio includes a real recent-jobs workflow, not just a passive history list.
Users can:
This matters because file-based transcription is inherently a multi-run workflow.
Screenshots
Studio home
Prepared job
Running transcription
Completed job
Recent jobs
Recent job filtering
Cancelled and retryable jobs
How It Is Implemented
Backend
Studio adds a dedicated backend manager responsible for the file-transcription lifecycle.
Main backend additions:
- src-tauri/src/managers/studio.rs
- src-tauri/src/media/decode.rs
- src-tauri/src/exporters/srt.rs
- src-tauri/src/exporters/txt.rs
- src-tauri/src/exporters/vtt.rs
- src-tauri/src/commands/studio.rs
Core backend responsibilities:
Frontend
Studio adds a dedicated React/Zustand UI flow.
Main frontend additions:
- src/components/studio/StudioHome.tsx
- src/components/studio/StudioDropzone.tsx
- src/components/studio/StudioSetupCard.tsx
- src/components/studio/StudioJobView.tsx
- src/components/studio/StudioRecentList.tsx
- src/stores/studioStore.ts
- src/lib/studioApi.ts
- src/lib/types/studio.ts
Core frontend responsibilities:
The implementation follows Handy’s existing architecture rather than introducing a separate subsystem with unrelated conventions.
Reliability and Hardening Included
This branch includes a full stabilization pass beyond the initial feature implementation.
Notable improvements included in this PR:
This matters because Studio is a long-running, stateful workflow with persistence, recovery, exports, retries, and background processing. The feature needed a full validation and hardening cycle to be ready for merge.
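As one concrete case of the recovery mentioned here, the commit notes describe recovering stale running or paused jobs after a restart. A sketch of that step, under stated assumptions: the `JobStatus` variants, the `Job` struct, and the choice of an `Interrupted` target state are all hypothetical, since the PR does not say which state recovered jobs end up in.

```rust
/// Hypothetical job states; only "running" and "paused" are named in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Prepared,
    Running,
    Paused,
    Completed,
    Cancelled,
    Interrupted, // assumed state for jobs cut off by an app restart
}

/// Minimal stand-in for a persisted Studio job.
pub struct Job {
    pub id: u64,
    pub status: JobStatus,
}

/// Run once at startup, before any worker is spawned. A job persisted as
/// Running or Paused cannot really be in that state after a restart, so it is
/// moved to a state the user can retry from. Returns how many jobs changed.
pub fn recover_stale_jobs(jobs: &mut [Job]) -> usize {
    let mut recovered = 0;
    for job in jobs.iter_mut() {
        if matches!(job.status, JobStatus::Running | JobStatus::Paused) {
            job.status = JobStatus::Interrupted;
            recovered += 1;
        }
    }
    recovered
}
```

Doing this before workers start avoids a window in which the UI shows a job as running while nothing is processing it.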
UX and Product Polish Included
Studio is designed to feel like a native part of Handy rather than an experimental side panel.
UX work in this PR includes:
The goal was not only to make the feature functional, but also to make it practical for repeated everyday use.
Localization
Studio is fully wired into Handy’s translation system.
This PR updates all locales with Studio-related copy, including:
This keeps Studio aligned with Handy’s multilingual direction instead of introducing a large untranslated feature area.
Why This Does Not Regress Existing Handy Behavior
This feature is additive.
It does not replace or redesign Handy’s existing real-time dictation workflow.
The current live flow still behaves as expected:
Studio is separate in the right places:
At the same time, it is integrated in the right places:
So this PR expands Handy’s scope without destabilizing what already works.
Community / Project Fit
This PR fits Handy’s public product identity very well.
Handy is described in the project README as:
Studio is a direct extension of those values.
It brings file transcription into the same local-first model instead of pushing users toward a separate cloud workflow.
It also fits the broader direction of the project:
Studio is not feature creep for its own sake. It is a natural next workflow for users who already trust Handy for offline transcription and want that same experience for existing recordings.
Validation and Verification
This branch went through a full validation cycle, including implementation, hardening, UX cleanup, localization cleanup, and verification.
Frontend verification
- bun run lint
- bun run build
- bun run check:translations
- bun run test:playwright
Backend verification
- cargo fmt --manifest-path src-tauri/Cargo.toml -- --check
- cargo check --manifest-path src-tauri/Cargo.toml --features windows-whisper-vulkan
- cargo test --manifest-path src-tauri/Cargo.toml --features windows-whisper-vulkan --lib
Result:
- 84 passed, 0 failed
This PR was not left at the “feature works on my machine” stage. It went through a full pass of bug fixing, flow validation, test verification, and product-level cleanup before being prepared for review.
Reviewer Notes
Suggested review order:
1. src-tauri/src/managers/studio.rs
2. src-tauri/src/media/decode.rs
3. src/stores/studioStore.ts
4. src/components/studio/*
5. src-tauri/src/commands/studio.rs
6. src/lib/studioApi.ts
7. src/lib/types/studio.ts
Key review areas:
Final Takeaway
This PR adds a major new capability to Handy:
not just “transcribe speech while I’m talking,” but also “take an existing audio file, transcribe it locally, manage the job, and export clean results.”
That makes Studio a real feature-level expansion of Handy, not just a small enhancement.
It stays aligned with the app’s core values, keeps existing flows intact, and makes Handy substantially more useful for real-world offline transcription work.