docs(wiki): new "How-It-Works" page — accessible overview of the dictee → ORT → ONNX stack (EN+FR)
Target audience: non-developers who want to understand what happens
under the hood when they press F9. No programming background required.
Page structure:
- "What happens when you press F9?" — 1-sentence summary
- Visual stack diagram (9 layers, ASCII)
- Each layer explained with a clear role + restaurant metaphor
1. User pressing F9
2. dictee shell script (the conductor)
3. transcribe-client (the courier)
4. transcribe-daemon (the resident scribe)
5. parakeet-rs (the classroom)
6. ort (the interpreter)
7. libonnxruntime.so (the engine room)
8. Execution Provider (CPU/GPU hardware driver)
9. ONNX model (the trained brain)
- Why this layered architecture
- Table of where each layer lives on disk
- Restaurant metaphor recap
- Links to deeper technical pages
Also link to the new page from Home.md and fr-Home.md under
"I want to understand how it works".
wiki: v1.3.4 changelog (user-facing) + fix obsolete VRAM-cap claims (EN+FR)
Changelog
- Add v1.3.4 (stable) section in user-facing prose: long files now work
on every host (CPU and GPU), PTT recording duration guard rail with
clear values per backend (Canary 2:30 / Parakeet 4:30), 5-site
target-tab UI hardening, translate skip messages in 6 languages,
diarize falls back to standalone Parakeet+Sortformer when Canary is
the live-dictation backend.
- Mark v1.3.3 as historical (drop "(stable)" qualifier on v1.3.3, bump
to v1.3.4).
Obsolete VRAM-cap claims fixed across pages
- The real cap on the raw `transcribe` CLI is the ~5:20 min Parakeet-TDT
v3 ONNX attention-mask bug, not VRAM. Same error on 8 GB or 24 GB.
- Home, ASR-Backends, CLI-Reference, Troubleshooting, dev-Diarization
updated accordingly (user-facing pages: minimal jargon; Deep-Dive
pages: full technical detail).
- Parakeet-TDT-Deep-Dive: tableau VRAM théorique conservé, nouvelle
section "The real cap: ~5:20 min ONNX attention-mask bug" + section
"Theoretical VRAM caps (only relevant on ≤ 4 GB GPUs)".
Other v1.3.4 doc updates
- Home: chunked diarize marked as shipped in v1.3.4 (was "(v1.4)").
- CLI-Reference: `transcribe-diarize-batch` section bumped (was
"v1.3 final, not wired into UI yet" → "wired in dictee-transcribe
since v1.3.4").
- Keyboard-Shortcuts: default cheatsheet shortcut Ctrl+Alt+F9 →
Shift+F9 (v1.3.4 default) + historical context.
- Troubleshooting: "use dictee-transcribe (chunked auto)" placed
first in the OOM workaround, with `ffmpeg -f segment` as the
manual-split alternative.
User-facing pages reformulated to drop dev jargon (cf.
feedback-wiki-user-not-developer.md): ONNX attention-mask /
`_has_cuda_build()` / `pgrep -x` / `/proc/<pid>/comm` / `/dev/shm` /
`kpackagetool6` reduced to user-impact prose. Deep-Dive and dev-*
pages keep full technical depth.
docs(wiki): move Voice-Commands to Getting started + link from Home
The voice-commands page is what end users discover first ("how do I make a
comma appear?"), not a post-processing internal — promote it from the
Post-processing section to Getting started in both the sidebar and the
Home / Accueil index.
- _Sidebar: Voice-Commands now sits between Keyboard-Shortcuts and
GPU-Setup; removed the duplicate entry from the Post-processing block.
- Home + fr-Home: new bullet under Getting started / Premiers pas, with
a one-liner pointing to the floating cheatsheet (Ctrl+Alt+F9).
docs(wiki): add Canary-1B-Deep-Dive (EN + FR) with real benchmarks
Structure mirrors Parakeet-TDT-Deep-Dive but adapted to Canary specifics:
- encoder-decoder architecture (vs TDT transducer)
- 7 languages in the Rust port (EN/FR/DE/ES/IT/PT/UK), verified in src/canary.rs:33-47
- native translation via decoder prompt tokens (source + target lang IDs)
- no internal chunking → practical 10-15s ceiling per call
- GPU required in practice (CPU ~3× slower than Parakeet CPU)
- source lang must match audio (no auto-detect, unlike Parakeet)
Benchmarks measured locally on RTX 4070 Laptop with fresh bench runs:
- Warm latency per duration (3 / 5 / 10 / 30 / 60 / 300 s), 5 runs each
- WER FR 5.4% on MultiLingual LibriSpeech (20 clips), beats Parakeet 7.4%
- WER EN 1.8% on LibriSpeech clean (20 clips), beats Parakeet 2.0%
- Documented the 30s hallucination loop + 60s+ truncation anomalies
Added to _Sidebar.md and both Home indices under "Speech recognition".
docs(wiki): move GPU-Setup to end of Getting started section
docs(wiki): move Configuration from Reference to Getting started (after Setup-Wizard)
docs(wiki): regroup UIs under Getting started + list new pages in Home
- Sidebar: fold Plasmoid / Tray / Keyboard-Shortcuts into Getting started
(removed the dedicated "User interfaces" block to avoid duplication).
- Home (EN+FR): add Setup-Wizard + Configuration to the relevant
sections, reference them from the "Three entry paths" shortcuts,
update the outdated "English-only" footnote.
docs(wiki): rename FR pages with fr- prefix for auto-sidebar grouping
Rename all 20 French pages from <Name>-fr.md to fr-<Name>.md so they
cluster together in GitHub Wiki's alphabetical Pages list instead of
being interspersed with English pages by topic.
Updated all internal links across 41 files:
- Cross-language switchers at top of pages
- 'Étapes suivantes' / 'Next steps' link lists
- Inline links in page bodies
- _Sidebar.md FR link
Python script replaces '<Name>-fr' with 'fr-<Name>' in order of
decreasing length to avoid partial matches. No manual edits.
docs(wiki): fill wave V1-fr — Home, Installation, GPU-Setup, ASR-Backends, Parakeet
French translations of the onboarding wave (~1100 lines total):
- Home-fr.md : TOC bilingual with 3 entry paths
- Installation-fr.md : one-liner + per-distro + tarball + source
- GPU-Setup-fr.md : CUDA prereqs, driver matrix, bundled libs
- ASR-Backends-fr.md : 4-backend comparison + benchmarks
- Parakeet-TDT-Deep-Dive-fr.md : architecture, 25 langs, VRAM limits
Added EN↔FR language switcher at top of each EN page:
🌐 Language: **English** | [Français](X-fr)
And reciprocal on each FR page:
🌐 Langue : [English](X) | **Français**
Naming convention: suffix -fr (Installation-fr.md → /wiki/Installation-fr).
Internal cross-links within FR pages also use -fr suffix.
docs(wiki): scaffold 20 pages (stubs)
Initial wiki structure for dictee v1.3.0 documentation:
Getting started
- Home (landing + TOC + 3 entry paths)
- Installation (1-liner, .deb/.rpm/AUR/tarball, aarch64)
- GPU-Setup (CUDA prereqs per distro + fallback)
Speech recognition
- ASR-Backends (comparison 4 backends)
- Parakeet-TDT-Deep-Dive (main model, 25 langs, VRAM limits)
Translation
- Translation (5 backends + privacy matrix)
- Ollama-Setup (install + Gemma 3 4B + prompts)
User interfaces
- Plasmoid-Widget (KDE Plasma 6 + 5 animations)
- Tray-Icon (PyQt6 + SNI + AppIndicator fallback)
- Keyboard-Shortcuts (KDE/GNOME/tiling WMs + double tap)
Post-processing pipeline
- Post-Processing-Overview (12 steps + diagram)
- Rules-and-Dictionary (regex + dict + 7 langs)
- LLM-Correction (first/last/hybrid positions)
- Numbers-Dates-Continuation (cardinal/ordinal/version/buffer)
Diarization & CLI
- Diarization (Sortformer + 4 speakers + v1.4 batch)
- CLI-Reference (every command + env vars + exit codes)
Reference
- Troubleshooting (errors by category + log locations)
- FAQ (design + features + comparisons)
- Developer-Guide (build + tests + i18n + contrib)
- Changelog (version history)
All 20 pages are stubs: H1 title + intro + TOC + minimal
reference table. Content will be filled in 5 waves (V1-V5)
per docs/plan-readme-wiki-v1.3.md.