
feat: add Studio batch transcription workflow #1183

Closed
tomerar wants to merge 11 commits into cjpais:main from tomerar:feat/studio-integration


tomerar (Contributor) commented Mar 28, 2026

Summary

This PR introduces Studio, a new file-based transcription workflow for Handy.

Handy’s core experience today is optimized for live dictation: press a shortcut, speak, and have the text appear in the active app. That workflow remains fully intact.

Studio adds a second major workflow for a different but very common use case: bringing an existing audio file into Handy, transcribing it locally, reviewing progress, and exporting clean transcript files.

In practice, this expands Handy from a live speech-to-text tool into both:

  • a fast offline dictation tool
  • a practical offline transcription workspace for existing recordings

This fits Handy’s current direction well:

  • offline
  • private
  • local-first
  • simple to use
  • extensible enough for community-driven growth

It also addresses a need that comes up often around the project: not just live microphone dictation, but reliable local transcription of audio files users already have.


Scope

This PR focuses on introducing Studio as a complete first version of file-based transcription inside Handy.

It does not redesign or replace Handy’s existing live dictation flow. Instead, it adds a complementary workflow for local audio-file transcription, job management, and transcript export.


Why This Feature Exists

Before this PR, Handy was excellent for:

  • quick dictation
  • shortcut-based speech capture
  • pasting transcribed text directly into another app

But there was no first-class workflow for:

  • transcribing an audio file that already exists
  • working with longer recordings
  • exporting transcript files to disk
  • revisiting and retrying previous transcription jobs
  • comparing repeated runs of the same file
  • managing file-based transcription history inside Handy itself

That gap matters for real users.

Common examples include:

  • podcasts
  • meetings
  • lectures
  • interviews
  • voice notes
  • songs and lyric captures
  • spoken screen recordings
  • longer offline recordings that are not part of a live dictation session

Studio is designed to cover exactly those cases while preserving Handy’s existing simplicity.


What Studio Adds

Studio is a dedicated transcription workspace inside Handy for existing audio files.

A user can now:

  1. Open the Studio section from the sidebar
  2. Choose or drag-and-drop an audio file
  3. Review file details before starting
  4. Select an output folder
  5. Select one or more output formats
  6. Start transcription
  7. Watch live progress and transcript preview
  8. Cancel or retry when needed
  9. Re-open recent jobs
  10. Open exported transcript files from completed jobs
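Step 8 (cancel) is the kind of operation that is usually implemented as cooperative cancellation: the worker checks a shared flag between units of work rather than being killed mid-write, which is what lets a cancel land safely even during audio preparation. The sketch below is a minimal illustration under that assumption; `run_chunks` and `Outcome` are hypothetical names, and the PR does not show how its own cancellation is wired.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Result of a (hypothetical) chunk loop: how many chunks finished.
#[derive(Debug, PartialEq)]
enum Outcome {
    Completed(usize),
    Cancelled(usize),
}

/// Processes `total` chunks in order, checking the shared cancel flag
/// before each one so a cancel never interrupts a chunk half-way.
fn run_chunks<F: FnMut(usize)>(total: usize, cancel: &AtomicBool, mut work: F) -> Outcome {
    for i in 0..total {
        if cancel.load(Ordering::Relaxed) {
            // `i` chunks were fully processed before the cancel was seen.
            return Outcome::Cancelled(i);
        }
        work(i);
    }
    Outcome::Completed(total)
}
```

The same flag check can sit inside a decode/normalize loop, which is one way a cancel can take effect before chunking has started.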

Supported output formats:

  • TXT
  • SRT
  • VTT
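SRT and VTT differ mainly in cue numbering and timestamp punctuation: SRT numbers each cue and writes `HH:MM:SS,mmm`, while WebVTT starts with a `WEBVTT` header and writes `HH:MM:SS.mmm`. The sketch below assumes a hypothetical `Segment` type with millisecond offsets; it is not the PR's `exporters/srt.rs` or `exporters/vtt.rs`. It also shows one way to keep SRT numbering sequential when empty chunks are skipped, which the commit list mentions fixing.

```rust
/// One transcribed segment with start/end offsets in milliseconds (hypothetical type).
pub struct Segment {
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

/// Formats milliseconds as HH:MM:SS<sep>mmm.
/// SRT uses ',' before the milliseconds; WebVTT uses '.'.
fn format_timestamp(ms: u64, sep: char) -> String {
    let hours = ms / 3_600_000;
    let minutes = (ms % 3_600_000) / 60_000;
    let seconds = (ms % 60_000) / 1_000;
    let millis = ms % 1_000;
    format!("{:02}:{:02}:{:02}{}{:03}", hours, minutes, seconds, sep, millis)
}

/// Renders SRT cues. Empty segments are skipped without consuming a cue
/// number, so numbering stays 1, 2, 3, ... with no gaps.
pub fn to_srt(segments: &[Segment]) -> String {
    let mut out = String::new();
    let mut index = 0;
    for seg in segments {
        let text = seg.text.trim();
        if text.is_empty() {
            continue;
        }
        index += 1;
        out.push_str(&format!(
            "{}\n{} --> {}\n{}\n\n",
            index,
            format_timestamp(seg.start_ms, ','),
            format_timestamp(seg.end_ms, ','),
            text
        ));
    }
    out
}

/// Renders a WebVTT document: a "WEBVTT" header, then unnumbered cues.
pub fn to_vtt(segments: &[Segment]) -> String {
    let mut out = String::from("WEBVTT\n\n");
    for seg in segments.iter().filter(|s| !s.text.trim().is_empty()) {
        out.push_str(&format!(
            "{} --> {}\n{}\n\n",
            format_timestamp(seg.start_ms, '.'),
            format_timestamp(seg.end_ms, '.'),
            seg.text.trim()
        ));
    }
    out
}
```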

Supported input formats currently include:

  • MP3
  • WAV
  • M4A
  • FLAC
  • OGG
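The commit list mentions aligning file picking with the extensions the backend supports. A simple way to get that is a single allow-list used both for the picker filter and for a backend check on dropped paths. A minimal sketch, assuming a hypothetical `SUPPORTED_EXTENSIONS` constant built from the list above; note that an extension check alone does not prove the file decodes, so a media probe is still needed.

```rust
use std::path::Path;

/// Extensions listed in this PR (MP3, WAV, M4A, FLAC, OGG). Hypothetical constant name.
const SUPPORTED_EXTENSIONS: [&str; 5] = ["mp3", "wav", "m4a", "flac", "ogg"];

/// Returns true if `path` ends in a supported extension, ignoring case.
fn is_supported_audio(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| {
            let ext = ext.to_ascii_lowercase();
            SUPPORTED_EXTENSIONS.contains(&ext.as_str())
        })
        .unwrap_or(false)
}
```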

Product Flow

1. File preparation

When a file is selected, Studio creates a persisted job and inspects the media before transcription begins.

The user immediately gets:

  • file name
  • duration
  • size
  • format/container details
  • estimated processing time
  • a setup screen before committing to the run

2. Job setup

Before starting, the user chooses:

  • where output files will be saved
  • which export formats to generate

This keeps Studio task-oriented and predictable rather than turning it into a one-click fire-and-forget flow.
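The hardening list below says output folders and export formats are validated before a job starts. A check of that kind might look like the following sketch; `ExportFormat` and `validate_job_setup` are hypothetical names, and the error strings are placeholders rather than the PR's localized messages.

```rust
use std::fs;
use std::path::Path;

/// Export formats offered in the setup screen (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExportFormat {
    Txt,
    Srt,
    Vtt,
}

/// Rejects a job before any work is queued if the setup is unusable.
fn validate_job_setup(output_dir: &Path, formats: &[ExportFormat]) -> Result<(), String> {
    if formats.is_empty() {
        return Err("select at least one export format".to_string());
    }
    // metadata() fails if the path does not exist or cannot be read.
    let meta = fs::metadata(output_dir)
        .map_err(|e| format!("output folder is not accessible: {e}"))?;
    if !meta.is_dir() {
        return Err("output path is not a folder".to_string());
    }
    Ok(())
}
```

Failing early here means a bad folder is reported on the setup screen instead of after a long transcription has already run.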

3. Audio normalization and chunking

The backend normalizes the source audio into a Whisper-friendly WAV and splits it into overlapping chunks.

This helps Studio handle:

  • longer recordings
  • different codecs and containers
  • more stable progress tracking
  • safer retry and recovery behavior
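Whisper models expect 16 kHz mono audio, which is why normalizing to a single WAV format first simplifies everything downstream. Overlapping chunks can then be planned as fixed-length windows whose start points advance by the chunk length minus the overlap. The sketch below assumes millisecond offsets and hypothetical sizes; the PR does not state its actual chunk or overlap lengths.

```rust
/// A planned chunk of the normalized WAV, in milliseconds (hypothetical type).
#[derive(Debug, PartialEq)]
struct ChunkPlan {
    index: usize,
    start_ms: u64,
    end_ms: u64,
}

/// Splits [0, total_ms) into windows of `chunk_ms`. Each window starts
/// `chunk_ms - overlap_ms` after the previous one, so neighbours share
/// `overlap_ms` of audio. The last window is clipped to `total_ms`.
fn plan_chunks(total_ms: u64, chunk_ms: u64, overlap_ms: u64) -> Vec<ChunkPlan> {
    assert!(overlap_ms < chunk_ms, "overlap must be shorter than a chunk");
    let step = chunk_ms - overlap_ms;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < total_ms {
        let end = (start + chunk_ms).min(total_ms);
        let index = chunks.len();
        chunks.push(ChunkPlan { index, start_ms: start, end_ms: end });
        if end == total_ms {
            break;
        }
        start += step;
    }
    chunks
}
```

Persisting one row per planned chunk is what makes per-chunk progress and resuming after a failure straightforward.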

4. Live progress and preview

While the job is running, Studio shows:

  • current processing stage
  • chunk progress
  • transcript preview as chunks complete
  • clear cancelled/error states when something goes wrong
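Because neighbouring chunks share audio, the start of one chunk's transcript often repeats the end of the previous one. The commit list mentions overlap trimming and emitting cleaned chunk text to previews. One common approach, sketched below with hypothetical helper names, is to find the longest run of words at the end of the previous text that matches the start of the next text, and drop it.

```rust
/// Lowercases a word and strips surrounding punctuation for comparison.
fn normalize_word(w: &str) -> String {
    w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase()
}

/// Removes the leading words of `next` that repeat the trailing words of
/// `prev`, checking at most `max_words`. Longest match wins.
fn trim_overlap(prev: &str, next: &str, max_words: usize) -> String {
    let prev_words: Vec<String> = prev.split_whitespace().map(normalize_word).collect();
    let next_raw: Vec<&str> = next.split_whitespace().collect();
    let next_words: Vec<String> = next_raw.iter().map(|w| normalize_word(w)).collect();

    let limit = max_words.min(prev_words.len()).min(next_words.len());
    for n in (1..=limit).rev() {
        if prev_words[prev_words.len() - n..] == next_words[..n] {
            // Keep the original casing/punctuation of the words that remain.
            return next_raw[n..].join(" ");
        }
    }
    next_raw.join(" ")
}
```

A word-level match can produce false positives when a single word is genuinely repeated across the boundary, so an implementation may want a minimum match length or to compare timestamps as well.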

5. Export

When the job completes, Studio writes one or more transcript outputs to disk.

Completed jobs show:

  • output files
  • final status
  • transcript preview
  • quick access to the output folder

If the same file is exported more than once, Studio versions filenames safely instead of failing on collisions.
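Safe versioning is easiest to get right when the "does it exist?" check and the file creation are a single atomic step, which `OpenOptions::create_new` provides. A minimal sketch, assuming a hypothetical `name (2).ext` naming scheme; the PR does not say which suffix format Studio uses.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

/// Creates `dir/stem.ext`, or `dir/stem (2).ext`, `dir/stem (3).ext`, ...
/// for the first name that does not exist yet. `create_new` fails with
/// AlreadyExists instead of overwriting, so there is no check-then-create race.
fn create_versioned(dir: &Path, stem: &str, ext: &str) -> io::Result<(PathBuf, File)> {
    for n in 1u32.. {
        let name = if n == 1 {
            format!("{stem}.{ext}")
        } else {
            format!("{stem} ({n}).{ext}")
        };
        let path = dir.join(name);
        match OpenOptions::new().write(true).create_new(true).open(&path) {
            Ok(file) => return Ok((path, file)),
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => continue,
            Err(e) => return Err(e),
        }
    }
    unreachable!("exhausted version numbers")
}
```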

6. Recent jobs

Studio includes a real recent-jobs workflow, not just a passive history list.

Users can:

  • revisit jobs
  • load prepared jobs again
  • inspect completed jobs
  • retry failed/cancelled jobs
  • filter by status
  • delete one or many jobs
  • keep history without being forced to remove older work

This matters because file-based transcription is inherently a multi-run workflow.
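The recent-jobs actions above reduce to a few rules over a job's status. The sketch below assumes a hypothetical `JobStatus` enum (the commit list mentions running, paused, cancelled, and failed states) and shows a status filter that only hides jobs, plus a retry-eligibility rule; neither is taken from the PR's code.

```rust
/// Hypothetical job states; the PR's actual enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JobStatus {
    Prepared,
    Running,
    Paused,
    Completed,
    Failed,
    Cancelled,
}

impl JobStatus {
    /// Retry is offered only for jobs that stopped without finishing.
    fn can_retry(self) -> bool {
        matches!(self, JobStatus::Failed | JobStatus::Cancelled)
    }
}

struct RecentJob {
    id: i64,
    status: JobStatus,
}

/// Applies the recent-jobs status filter; `None` means "show all".
/// Filtering only hides rows, so job history is never deleted by it.
fn filter_recent(jobs: &[RecentJob], filter: Option<JobStatus>) -> Vec<i64> {
    jobs.iter()
        .filter(|job| filter.map_or(true, |s| job.status == s))
        .map(|job| job.id)
        .collect()
}

/// Ids of jobs that should show a Retry action.
fn retryable(jobs: &[RecentJob]) -> Vec<i64> {
    jobs.iter().filter(|j| j.status.can_retry()).map(|j| j.id).collect()
}
```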


Screenshots

Studio home

  • Main Studio landing screen
  • New file-based transcription workflow inside Handy
  • Dedicated entry point for batch and offline transcription jobs


Prepared job

  • File selected and prepared before transcription starts
  • Output folder and export formats can be configured before running
  • Studio shows file details and estimated processing time up front


Running transcription

  • Studio surfaces both preparation stages and active transcription stages
  • Progress remains visible throughout decoding, chunking, and transcription
  • The transcript preview begins filling in as chunks complete


Completed job

  • Finished jobs show the final transcript preview and completion state
  • Generated output files are listed directly in the Studio UI
  • Users can immediately open the output folder after transcription finishes


Recent jobs

  • Studio keeps a browsable history of recent transcription jobs
  • Users can revisit prepared and completed work directly from the same screen
  • This makes repeated file-based workflows practical instead of one-time only


Recent job filtering

  • Recent jobs can be filtered by status without leaving the Studio workflow
  • This helps users focus on ready or completed work without deleting job history
  • Filtering stays lightweight and integrated into the existing recent-jobs panel


Cancelled and retryable jobs

  • Cancelled jobs remain visible in Studio instead of disappearing from context
  • Users can retry interrupted work directly from the job view or recent-jobs list
  • This makes Studio more resilient for longer workflows and unexpected interruptions



How It Is Implemented

Backend

Studio adds a dedicated backend manager responsible for the file-transcription lifecycle.

Main backend additions:

  • src-tauri/src/managers/studio.rs
  • src-tauri/src/media/decode.rs
  • src-tauri/src/exporters/srt.rs
  • src-tauri/src/exporters/txt.rs
  • src-tauri/src/exporters/vtt.rs
  • src-tauri/src/commands/studio.rs

Core backend responsibilities:

  • SQLite persistence for jobs, chunks, and exports
  • media probing and decode/normalize pipeline
  • chunk planning and tracking
  • per-job progress events
  • transcript preview events
  • output export generation
  • retry / cancel / recovery behavior
  • recent-jobs state from persisted data
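The commit list mentions recovering stale running/paused jobs after a restart. Since no worker survives an app exit, any job still persisted as running or paused at startup cannot actually be in progress. A minimal sketch of that startup pass over loaded rows; `JobRow` is hypothetical, and marking such jobs as failed (so they become retryable) is an assumption, since the PR may use a different status.

```rust
/// Hypothetical job states; the PR's actual enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JobStatus {
    Prepared,
    Running,
    Paused,
    Completed,
    Failed,
    Cancelled,
}

/// A job row as loaded from the database (hypothetical shape).
#[derive(Debug)]
struct JobRow {
    id: i64,
    status: JobStatus,
    error: Option<String>,
}

/// Marks stale Running/Paused rows as Failed with a reason, and returns
/// their ids so the caller can write the changes back.
fn recover_stale_jobs(rows: &mut [JobRow]) -> Vec<i64> {
    let mut recovered = Vec::new();
    for row in rows.iter_mut() {
        if matches!(row.status, JobStatus::Running | JobStatus::Paused) {
            row.status = JobStatus::Failed;
            row.error = Some("interrupted by app restart".to_string());
            recovered.push(row.id);
        }
    }
    recovered
}
```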

Frontend

Studio adds a dedicated React/Zustand UI flow.

Main frontend additions:

  • src/components/studio/StudioHome.tsx
  • src/components/studio/StudioDropzone.tsx
  • src/components/studio/StudioSetupCard.tsx
  • src/components/studio/StudioJobView.tsx
  • src/components/studio/StudioRecentList.tsx
  • src/stores/studioStore.ts
  • src/lib/studioApi.ts
  • src/lib/types/studio.ts

Core frontend responsibilities:

  • preparing files
  • starting jobs
  • reacting to backend events
  • rendering progress and preview
  • browsing recent jobs
  • filtering recent jobs by status
  • loading previous jobs back into the UI
  • retry / delete / clear-all flows
  • formatting time, size, and estimate data consistently

The implementation follows Handy’s existing architecture rather than introducing a separate subsystem with unrelated conventions.


Reliability and Hardening Included

This branch includes a full stabilization pass beyond the initial feature implementation.

Notable improvements included in this PR:

  • prevent duplicate concurrent starts for the same Studio job
  • validate output folders and export formats before starting jobs
  • improve retry behavior so retries are safer and more deterministic
  • avoid duplicate output generation after interrupted export flows
  • track worker lifecycle more carefully
  • improve shutdown behavior on app exit
  • handle output filename collisions by versioning instead of failing
  • improve transcript preview behavior during chunked transcription
  • improve recent-jobs selection and browsing flow
  • improve recovery behavior for interrupted jobs
  • improve drag-and-drop fallback behavior when the path is unavailable
  • align file-picking with backend-supported extensions
  • fix Playwright web-server startup on Windows so verification remains reliable

This matters because Studio is a long-running, stateful workflow with persistence, recovery, exports, retries, and background processing. The feature needed a full validation and hardening cycle to be ready for merge.
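Preventing duplicate concurrent starts for the same job, the first item above, usually comes down to a shared set of active job ids plus a guard that releases the id when the worker ends. A minimal sketch using only the standard library; `ActiveJobs` and `StartGuard` are hypothetical names, not types from the PR.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Shared set of job ids that currently have a worker (hypothetical type).
#[derive(Clone, Default)]
struct ActiveJobs(Arc<Mutex<HashSet<i64>>>);

/// RAII guard: removes the id when dropped, including on panic unwinding.
struct StartGuard {
    jobs: ActiveJobs,
    id: i64,
}

impl ActiveJobs {
    /// Returns None if a worker for `id` is already running.
    fn try_start(&self, id: i64) -> Option<StartGuard> {
        let mut set = self.0.lock().unwrap();
        if set.insert(id) {
            Some(StartGuard { jobs: self.clone(), id })
        } else {
            None
        }
    }
}

impl Drop for StartGuard {
    fn drop(&mut self) {
        // Recover from a poisoned lock rather than panicking inside drop.
        self.jobs
            .0
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner())
            .remove(&self.id);
    }
}
```

Tying the release to `Drop` means a worker that errors out early cannot leave its job permanently marked as started.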


UX and Product Polish Included

Studio is designed to feel like a native part of Handy rather than an experimental side panel.

UX work in this PR includes:

  • dedicated sidebar entry
  • dedicated Studio landing page
  • drag-and-drop plus file chooser
  • setup screen before starting a job
  • live progress view
  • transcript preview view
  • recent-jobs browser
  • recent-jobs filtering
  • bulk clear flow with confirmation
  • load-from-recent feedback
  • ability to choose a new file even when another job already exists
  • clearer retry/cancel behavior
  • open-output-folder actions
  • better handling for repeated runs of the same source file

The goal was not only to make the feature functional, but also to make it practical for repeated everyday use.


Localization

Studio is fully wired into Handy’s translation system.

This PR updates all locales with Studio-related copy, including:

  • page labels
  • setup copy
  • status copy
  • recent-jobs copy
  • drag-and-drop messaging
  • retry / cancel / warning copy
  • output and preview labels

This keeps Studio aligned with Handy’s multilingual direction instead of introducing a large untranslated feature area.


Why This Does Not Regress Existing Handy Behavior

This feature is additive.

It does not replace or redesign Handy’s existing real-time dictation workflow.

The current live flow still behaves as expected:

  • start/stop by shortcut
  • microphone capture
  • local transcription
  • paste into the active app

Studio is separate in the right places:

  • separate manager
  • separate commands
  • separate events
  • separate UI/store flow
  • separate persistence tables

At the same time, it is integrated in the right places:

  • model/settings integration
  • app shell/navigation
  • existing project architecture
  • existing translation system
  • existing build/test flow

So this PR expands Handy’s scope without destabilizing what already works.


Community / Project Fit

This PR fits Handy’s public product identity very well.

Handy is described in the project README as:

  • free
  • open source
  • private
  • offline
  • simple
  • extensible

Studio is a direct extension of those values.

It brings file transcription into the same local-first model instead of pushing users toward a separate cloud workflow.

It also fits the broader direction of the project:

  • Handy is positioned as something people can build on
  • this expands the app into a highly requested adjacent workflow
  • the feature stays aligned with privacy, offline processing, and product simplicity

Studio is not feature creep for its own sake. It is a natural next workflow for users who already trust Handy for offline transcription and want that same experience for existing recordings.


Validation and Verification

This branch went through a full validation cycle, including implementation, hardening, UX cleanup, localization cleanup, and verification.

Frontend verification

  • bun run lint
  • bun run build
  • bun run check:translations
  • bun run test:playwright

Backend verification

  • cargo fmt --manifest-path src-tauri/Cargo.toml -- --check
  • cargo check --manifest-path src-tauri/Cargo.toml --features windows-whisper-vulkan
  • cargo test --manifest-path src-tauri/Cargo.toml --features windows-whisper-vulkan --lib

Result:

  • 84 passed, 0 failed

This PR was not left at the “feature works on my machine” stage. It went through a full pass of bug fixing, flow validation, test verification, and product-level cleanup before being prepared for review.


Reviewer Notes

Suggested review order:

  1. src-tauri/src/managers/studio.rs
  2. src-tauri/src/media/decode.rs
  3. src/stores/studioStore.ts
  4. src/components/studio/*
  5. src-tauri/src/commands/studio.rs
  6. src/lib/studioApi.ts
  7. src/lib/types/studio.ts
  8. i18n additions

Key review areas:

  • job lifecycle correctness
  • retry / cancel / recovery behavior
  • output/export behavior
  • event wiring between backend and frontend
  • recent-jobs workflow
  • isolation from existing dictation flows

Final Takeaway

This PR adds a major new capability to Handy:

not just “transcribe speech while I’m talking,” but also “take an existing audio file, transcribe it locally, manage the job, and export clean results.”

That makes Studio a real feature-level expansion of Handy, not just a small enhancement.

It stays aligned with the app’s core values, keeps existing flows intact, and makes Handy substantially more useful for real-world offline transcription work.

tomerar added 11 commits March 27, 2026 01:10
Tighten the Studio workflow across backend and frontend by fixing
preparation-stage cancellation, restart recovery, chunk handling, and
user-facing job behavior.

- make Studio cancellation work safely during audio preparation
- improve preparation progress reporting for long files
- reduce chunk size and add overlap trimming to improve responsiveness
  and boundary quality
- fix sequential SRT numbering when empty chunks are skipped
- recover stale running/paused jobs after restart
- validate retry behavior when the source file is missing
- surface Studio action errors in the UI
- replace browser confirm with Tauri dialog for stop confirmation
- clean up Studio labels, status text, and i18n wiring
- add regression tests for SRT export, chunking, overlap trimming,
  and retry/source recovery behavior
- add Studio translations across supported locales
- remove the unused Studio status bar from the home view
- emit cleaned chunk text in transcript previews to reduce duplicate lines
- extend overlap trimming coverage for longer repeated chunk prefixes
- improve recent job browsing, filtering, and selection feedback
- allow loading new audio without depending on pending state
- preserve active job context while preparing another file
- auto-version export filenames on output collisions
- complete Studio i18n keys across locales
- tighten Studio store initialization and view state handling
- prevent duplicate concurrent starts for the same studio job
- validate output folder and export formats before starting jobs
- make studio export retries safer and avoid duplicate output generation
- add graceful worker shutdown and timeout-aware dictation waiting
- unify studio time/size formatting and improve runtime status messaging
- add fallback handling for dropzone paths that cannot be read
- complete new studio translations across locales
- add focused tests for studio validation helpers and VTT export
- harden studio job lifecycle, retry behavior, and export validation
- improve localized studio status, formatting, and dropzone fallback UX
- align file picking with supported backend extensions across studio views
- add and verify studio-focused backend tests and stable Playwright setup
- polish recent jobs, output handling, and overall pre-MR readiness
cjpais closed this Mar 28, 2026
cjpais (Owner) commented Mar 28, 2026

New features are not being accepted at the moment. This was written in the PR template and Contributing.md.

You cannot just drop a huge slop PR on me. If you want a feature and want to discuss it, we can do that on Discord, via DM, or in GitHub Discussions.

Generally it's well thought out, but this is impossible for me to review. If you want to make huge changes, we need to work together, not just have something spawn out of the blue.
