Diarization: note the 4-speaker limit should be lifted in v1.4 (EN+FR)
fc81e0e
docs(troubleshooting): document Wayland clipboard / Electron typing issue (EN+FR)
New section in both Troubleshooting pages explaining why enabling the
"Copy transcription to clipboard" option breaks F9 typing into Electron apps
on Wayland with wl-clipboard < 2.3.0, how to fix (upgrade or keep the toggle
off), and the GNOME caveat (mutter#524, ext-data-control still missing in
stable Mutter).
9810199
docs(shortcuts): document DICTEE_PTT_EXTRA_DEVICES (#10) — EN+FR
New section under the dictee-ptt daemon: 'Triggering dictation with a
mouse button or a custom key'. Covers the use case (logiops, keyd,
kanata, xremap, input-remapper), both the GUI path (dictee-setup
Shortcuts tab → Detect…) and the manual path (editing dictee.conf).
User-facing tone, no jargon, available since v1.3.5.
64f0a7f
docs(wiki): new "How-It-Works" page — accessible overview of the dictee → ORT → ONNX stack (EN+FR)
Target audience: non-developers who want to understand what happens
under the hood when they press F9. No programming background required.
Page structure:
- "What happens when you press F9?" — 1-sentence summary
- Visual stack diagram (9 layers, ASCII)
- Each layer explained with a clear role + restaurant metaphor
1. User pressing F9
2. dictee shell script (the conductor)
3. transcribe-client (the courier)
4. transcribe-daemon (the resident scribe)
5. parakeet-rs (the classroom)
6. ort (the interpreter)
7. libonnxruntime.so (the engine room)
8. Execution Provider (CPU/GPU hardware driver)
9. ONNX model (the trained brain)
- Why this layered architecture
- Table of where each layer lives on disk
- Restaurant metaphor recap
- Links to deeper technical pages
Also link to the new page from Home.md and fr-Home.md under
"I want to understand how it works".
cc9bb0d
docs(parakeet): document int8 quantization variant (EN+FR)
- CLI-Reference: add DICTEE_PARAKEET_QUANT to Runtime env vars table
- Parakeet-TDT-Deep-Dive: new "Quantization choice — FP32 vs int8" section
with measured comparison table (latency, VRAM, WER) and hardware
recommendations.
Measured on i7-13700H + RTX 4070 (audio 16s, daemon preloaded, median 7 runs):
- CPU int8 is 34% faster than CPU FP32 (AVX-VNNI exploited)
- GPU int8 is 6× slower than GPU FP32 (current ONNX Runtime CUDA EP limitation)
- GPU VRAM int8 is 87% smaller than FP32 (useful when VRAM is constrained)
a65a303
docs(cli): document DICTEE_INTRA_THREADS env var (EN+FR)
6ba5653
wiki: v1.3.4 changelog (user-facing) + fix obsolete VRAM-cap claims (EN+FR)
Changelog
- Add v1.3.4 (stable) section in user-facing prose: long files now work
on every host (CPU and GPU), PTT recording duration guard rail with
clear values per backend (Canary 2:30 / Parakeet 4:30), 5-site
target-tab UI hardening, translate skip messages in 6 languages,
diarize falls back to standalone Parakeet+Sortformer when Canary is
the live-dictation backend.
- Mark v1.3.3 as historical (drop "(stable)" qualifier on v1.3.3, bump
to v1.3.4).
Obsolete VRAM-cap claims fixed across pages
- The real cap on the raw `transcribe` CLI is the ~5:20 min Parakeet-TDT
v3 ONNX attention-mask bug, not VRAM. Same error on 8 GB or 24 GB.
- Home, ASR-Backends, CLI-Reference, Troubleshooting, dev-Diarization
updated accordingly (user-facing pages: minimal jargon; Deep-Dive
pages: full technical detail).
- Parakeet-TDT-Deep-Dive: tableau VRAM théorique conservé, nouvelle
section "The real cap: ~5:20 min ONNX attention-mask bug" + section
"Theoretical VRAM caps (only relevant on ≤ 4 GB GPUs)".
Other v1.3.4 doc updates
- Home: chunked diarize marked as shipped in v1.3.4 (was "(v1.4)").
- CLI-Reference: `transcribe-diarize-batch` section bumped (was
"v1.3 final, not wired into UI yet" → "wired in dictee-transcribe
since v1.3.4").
- Keyboard-Shortcuts: default cheatsheet shortcut Ctrl+Alt+F9 →
Shift+F9 (v1.3.4 default) + historical context.
- Troubleshooting: "use dictee-transcribe (chunked auto)" placed
first in the OOM workaround, with `ffmpeg -f segment` as the
manual-split alternative.
User-facing pages reformulated to drop dev jargon (cf.
feedback-wiki-user-not-developer.md): ONNX attention-mask /
`_has_cuda_build()` / `pgrep -x` / `/proc/<pid>/comm` / `/dev/shm` /
`kpackagetool6` reduced to user-impact prose. Deep-Dive and dev-*
pages keep full technical depth.
3f62c41
v1.3.3 — bump install commands + add changelog entries (EN+FR)
fd16d62
wiki: bump v1.3.1 → v1.3.2, add v1.3.2 changelog entry (EN+FR), update Installation pages
28b65ab
wiki: fix v1.3.1 lede — phrasing implied parallel runs (we actually block them)
72712a2
wiki: v1.3.1 — Changelog, Troubleshooting (CUDA fallback + uinput Fedora), bumped install URLs
d9741b6
v1.3.0 stable: refresh Installation/Troubleshooting/Changelog, drop obsolete v1.3 roadmap
077878d
ASR-Backends: note that Nemotron is no longer in setup since v1.3
0329638
Translation: add 'Translating an audio file (dictee-transcribe)' section
b243578
Diarization: expand 'Going further with LLM' section with screenshots
0085a17
Diarization: refresh export section for single-tab dialog (1.3.0)
986acf2
Diarization: illustrate the Threshold slider in the explanation box
385181d
Diarization: illustrate the audio/text sync toggles section
a4c1826
Diarization: explain the 3 audio/text sync toggles
Adds a 'Sync audio and text' subsection between the timeline tips
and the Rename speakers section, documenting the three toggles
that sit under the Rename accordion header:
- Follow playback in text (text cursor follows audio)
- Auto-play on text click (clicking a segment seeks audio)
- Highlight current segment (underline the segment being played)
A 3-row table shows what each does and when to turn it on.
Both EN and FR pages updated.
f8f122c
_Sidebar: move Diarization + LLM-Diarization under Getting started
The dedicated 'Diarization / Diarisation' section had only two
entries — folding them into 'Getting started / Premiers pas'
makes the sidebar shorter and surfaces both pages directly to
new users following the standard onboarding path.
41fddec
Diarization: drop 'braille spinner' jargon, keep prose lighter
The braille pattern reference (⠋⠙⠹⠸…) was technical noise for
end users. Replaced with a parenthetical 'with a spinner
turning in the tab title' that flows better.
Both EN and FR pages updated.
94529f8
Diarization: explain the Threshold slider with onset/offset semantics
Replaces the one-line aside with a dedicated box that maps the
slider position to the actual model thresholds (onset/offset),
gives a 3-row recommendation table (low / 50% / high) with
real-world use cases, and reminds readers that re-running with
a new value spawns a comparable tab (title contains the value).
Both EN and FR pages updated.
df780c5
Diarization: correct prev/next segment glyphs to match the actual buttons
The wiki had mistakenly used ◀ / ▶ (single triangles, the
seek-to-start / end buttons in the player toolbar) for the
prev/next segment buttons. The actual glyphs in the code
(dictee-transcribe.py:2092, 2098) are ⏮ / ⏭ — double-arrow
with vertical bar.
cadc89f
Diarization: illustrate Export section + mention multi-tab export
125a81c
Diarization: illustrate timeline + rename sections with screenshots
Adds two PNG screenshots inline to break up the wall of text and
make the visual cues (coloured bands, red triangle, Rename
accordion layout) immediately recognisable:
- diarization-timeline_1.3.png: zoom on the audio player timeline
with per-speaker coloured bands and the red playback triangle.
- diarization-rename_1.3.png: the Rename speakers accordion with
the colour swatches, name fields and Apply / Reset buttons.
EN and FR pages updated symmetrically.
cba39c7
Diarization: replace static screenshot with animated demo, refresh prose
- Static diarization-1_1.3.png replaced by an animated GIF
(diarization-demo_1.3.gif, 38 frames @ 1 fps, 1569×1273) so
readers see the full Transcribe → diarize → rename → playback
flow at a glance.
- Prose rewritten to be more user-friendly and reflect recent
improvements:
* mention the violet Transcribe button (new layout)
* mention the per-tab braille spinner during work
* mention the red triangle playback cursor and prev/next segment
buttons on the timeline
* dedicated 'Export' section listing the three formats and
confirming rename propagation
* dedicated 'LLM analysis' teaser pointing to the LLM page,
with the available backends listed
* remove the dated 'used to cap at ~10 min' wording — chunked
pipeline is the current normal
0cf7bba
LLM-Diarization: document Disable thinking + cancellation
Two recently shipped features missing from the wiki:
- "Disable thinking" checkbox (commit 480000a): for reasoning
models (qwen3, deepseek-r1, gpt-oss). Visible for Ollama only,
next to Context window. Strips the <think>...</think> preamble.
- Cooperative cancellation (commit af1d9d3): closing the LLM
result tab while the model is generating now aborts the live
HTTP stream — saves cloud tokens, frees the GPU on Ollama.
Both EN and FR pages updated.
ac4e10a
LLM-Diarization: keep VRAM tip generic (no Gemma 3 4B / 8 GB hardcoded)
The "run diarization first, then LLM" tip was specific to Gemma 3
4B on an 8 GB GPU. The same advice applies to any local LLM on
any size GPU shared with the transcription pipeline — generalised
the wording so users with different setups don't think it doesn't
apply to them.
Both EN and FR pages updated.
7478cbb
LLM-Diarization: document context window per backend
The 'Context window' field added to the provider editor in dictee
is Ollama-only — for LM Studio / Jan / vLLM / cloud backends, the
limit must be raised in the server's own UI/CLI or is fixed by
the model.
Adds a per-backend table after 'Edit / delete' explaining where to
set Context Length so users with non-Ollama backends don't think
the dictee field has any effect for them.
Both EN and FR pages updated.
b6fd42b
docs(wiki): split Diarization and CLI into separate sidebar sections
d67f0c4