
History / Home

Revisions

  • docs(wiki): new "How-It-Works" page, an accessible overview of the dictee → ORT → ONNX stack (EN+FR)
    Target audience: non-developers who want to understand what happens under the hood when they press F9. No programming background required.
    Page structure:
    - "What happens when you press F9?": 1-sentence summary
    - Visual stack diagram (9 layers, ASCII)
    - Each layer explained with a clear role + restaurant metaphor
      1. User pressing F9
      2. dictee shell script (the conductor)
      3. transcribe-client (the courier)
      4. transcribe-daemon (the resident scribe)
      5. parakeet-rs (the classroom)
      6. ort (the interpreter)
      7. libonnxruntime.so (the engine room)
      8. Execution Provider (CPU/GPU hardware driver)
      9. ONNX model (the trained brain)
    - Why this layered architecture
    - Table of where each layer lives on disk
    - Restaurant metaphor recap
    - Links to deeper technical pages
    Also link to the new page from Home.md and fr-Home.md under "I want to understand how it works".

    @rcspam rcspam committed May 15, 2026
  • wiki: v1.3.4 changelog (user-facing) + fix obsolete VRAM-cap claims (EN+FR)
    Changelog
    - Add v1.3.4 (stable) section in user-facing prose: long files now work on every host (CPU and GPU), PTT recording duration guard rail with clear values per backend (Canary 2:30 / Parakeet 4:30), 5-site target-tab UI hardening, translate skip messages in 6 languages, diarize falls back to standalone Parakeet+Sortformer when Canary is the live-dictation backend.
    - Mark v1.3.3 as historical (drop "(stable)" qualifier on v1.3.3, bump to v1.3.4).
    Obsolete VRAM-cap claims fixed across pages
    - The real cap on the raw `transcribe` CLI is the ~5:20 min Parakeet-TDT v3 ONNX attention-mask bug, not VRAM. Same error on 8 GB or 24 GB.
    - Home, ASR-Backends, CLI-Reference, Troubleshooting, dev-Diarization updated accordingly (user-facing pages: minimal jargon; Deep-Dive pages: full technical detail).
    - Parakeet-TDT-Deep-Dive: theoretical VRAM table kept; new section "The real cap: ~5:20 min ONNX attention-mask bug" + section "Theoretical VRAM caps (only relevant on ≤ 4 GB GPUs)".
    Other v1.3.4 doc updates
    - Home: chunked diarize marked as shipped in v1.3.4 (was "(v1.4)").
    - CLI-Reference: `transcribe-diarize-batch` section bumped (was "v1.3 final, not wired into UI yet" → "wired in dictee-transcribe since v1.3.4").
    - Keyboard-Shortcuts: default cheatsheet shortcut Ctrl+Alt+F9 → Shift+F9 (v1.3.4 default) + historical context.
    - Troubleshooting: "use dictee-transcribe (chunked auto)" placed first in the OOM workaround, with `ffmpeg -f segment` as the manual-split alternative.
    User-facing pages reformulated to drop dev jargon (cf. feedback-wiki-user-not-developer.md): ONNX attention-mask / `_has_cuda_build()` / `pgrep -x` / `/proc/<pid>/comm` / `/dev/shm` / `kpackagetool6` reduced to user-impact prose. Deep-Dive and dev-* pages keep full technical depth.

    @rcspam rcspam committed May 12, 2026
  • docs(wiki): move Voice-Commands to Getting started + link from Home
    The voice-commands page is what end users discover first ("how do I make a comma appear?"), not a post-processing internal, so promote it from the Post-processing section to Getting started in both the sidebar and the Home / Accueil index.
    - _Sidebar: Voice-Commands now sits between Keyboard-Shortcuts and GPU-Setup; removed the duplicate entry from the Post-processing block.
    - Home + fr-Home: new bullet under Getting started / Premiers pas, with a one-liner pointing to the floating cheatsheet (Ctrl+Alt+F9).

    @rcspam rcspam committed Apr 26, 2026
  • docs(wiki): add Canary-1B-Deep-Dive (EN + FR) with real benchmarks
    Structure mirrors Parakeet-TDT-Deep-Dive but adapted to Canary specifics:
    - encoder-decoder architecture (vs TDT transducer)
    - 7 languages in the Rust port (EN/FR/DE/ES/IT/PT/UK), verified in src/canary.rs:33-47
    - native translation via decoder prompt tokens (source + target lang IDs)
    - no internal chunking → practical 10-15s ceiling per call
    - GPU required in practice (CPU ~3× slower than Parakeet CPU)
    - source lang must match audio (no auto-detect, unlike Parakeet)
    Benchmarks measured locally on RTX 4070 Laptop with fresh bench runs:
    - Warm latency per duration (3 / 5 / 10 / 30 / 60 / 300 s), 5 runs each
    - WER FR 5.4% on MultiLingual LibriSpeech (20 clips), beats Parakeet 7.4%
    - WER EN 1.8% on LibriSpeech clean (20 clips), beats Parakeet 2.0%
    - Documented the 30s hallucination loop + 60s+ truncation anomalies
    Added to _Sidebar.md and both Home indices under "Speech recognition".

    @rcspam rcspam committed Apr 24, 2026
  • docs(wiki): move GPU-Setup to end of Getting started section

    @rcspam rcspam committed Apr 24, 2026
  • docs(wiki): move Configuration from Reference to Getting started (after Setup-Wizard)

    @rcspam rcspam committed Apr 23, 2026
  • docs(wiki): regroup UIs under Getting started + list new pages in Home
    - Sidebar: fold Plasmoid / Tray / Keyboard-Shortcuts into Getting started (removed the dedicated "User interfaces" block to avoid duplication).
    - Home (EN+FR): add Setup-Wizard + Configuration to the relevant sections, reference them from the "Three entry paths" shortcuts, update the outdated "English-only" footnote.

    @rcspam rcspam committed Apr 23, 2026
  • docs(wiki): rename FR pages with fr- prefix for auto-sidebar grouping
    Rename all 20 French pages from <Name>-fr.md to fr-<Name>.md so they cluster together in GitHub Wiki's alphabetical Pages list instead of being interspersed with English pages by topic.
    Updated all internal links across 41 files:
    - Cross-language switchers at top of pages
    - 'Étapes suivantes' / 'Next steps' link lists
    - Inline links in page bodies
    - _Sidebar.md FR link
    Python script replaces '<Name>-fr' with 'fr-<Name>' in order of decreasing length to avoid partial matches. No manual edits.

    @rcspam rcspam committed Apr 21, 2026
  • docs(wiki): fill wave V1-fr: Home, Installation, GPU-Setup, ASR-Backends, Parakeet
    French translations of the onboarding wave (~1100 lines total):
    - Home-fr.md: bilingual TOC with 3 entry paths
    - Installation-fr.md: one-liner + per-distro + tarball + source
    - GPU-Setup-fr.md: CUDA prereqs, driver matrix, bundled libs
    - ASR-Backends-fr.md: 4-backend comparison + benchmarks
    - Parakeet-TDT-Deep-Dive-fr.md: architecture, 25 langs, VRAM limits
    Added EN↔FR language switcher at top of each EN page:
      🌐 Language: **English** | [Français](X-fr)
    And reciprocal on each FR page:
      🌐 Langue : [English](X) | **Français**
    Naming convention: suffix -fr (Installation-fr.md → /wiki/Installation-fr). Internal cross-links within FR pages also use -fr suffix.

    @rcspam rcspam committed Apr 21, 2026
  • docs(wiki): scaffold 20 pages (stubs)
    Initial wiki structure for dictee v1.3.0 documentation:
    Getting started
    - Home (landing + TOC + 3 entry paths)
    - Installation (1-liner, .deb/.rpm/AUR/tarball, aarch64)
    - GPU-Setup (CUDA prereqs per distro + fallback)
    Speech recognition
    - ASR-Backends (comparison 4 backends)
    - Parakeet-TDT-Deep-Dive (main model, 25 langs, VRAM limits)
    Translation
    - Translation (5 backends + privacy matrix)
    - Ollama-Setup (install + Gemma 3 4B + prompts)
    User interfaces
    - Plasmoid-Widget (KDE Plasma 6 + 5 animations)
    - Tray-Icon (PyQt6 + SNI + AppIndicator fallback)
    - Keyboard-Shortcuts (KDE/GNOME/tiling WMs + double tap)
    Post-processing pipeline
    - Post-Processing-Overview (12 steps + diagram)
    - Rules-and-Dictionary (regex + dict + 7 langs)
    - LLM-Correction (first/last/hybrid positions)
    - Numbers-Dates-Continuation (cardinal/ordinal/version/buffer)
    Diarization & CLI
    - Diarization (Sortformer + 4 speakers + v1.4 batch)
    - CLI-Reference (every command + env vars + exit codes)
    Reference
    - Troubleshooting (errors by category + log locations)
    - FAQ (design + features + comparisons)
    - Developer-Guide (build + tests + i18n + contrib)
    - Changelog (version history)
    All 20 pages are stubs: H1 title + intro + TOC + minimal reference table. Content will be filled in 5 waves (V1-V5) per docs/plan-readme-wiki-v1.3.md.

    @rcspam rcspam committed Apr 21, 2026
  • Initial Home page

    @rcspam rcspam committed Apr 21, 2026
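
Note on the fr- prefix rename (Apr 21, 2026): the commit message says a Python script replaced '<Name>-fr' with 'fr-<Name>' in order of decreasing length to avoid partial matches. The script itself is not shown in this history, so the following is only a minimal sketch of that idea. The function name `rename_fr_links` and the page list are hypothetical, chosen for illustration.

```python
def rename_fr_links(text: str, page_names: list[str]) -> str:
    """Rewrite '<Name>-fr' link targets to 'fr-<Name>' (illustrative sketch)."""
    # Process the longest names first. If 'Diarization' were handled before
    # 'dev-Diarization', the shorter name would match inside the longer one
    # and produce 'dev-fr-Diarization' instead of 'fr-dev-Diarization'.
    for name in sorted(page_names, key=len, reverse=True):
        text = text.replace(f"{name}-fr", f"fr-{name}")
    return text


# Hypothetical page list; the real wiki has 20 French pages.
pages = ["Home", "Diarization", "dev-Diarization"]
sample = "[Français](dev-Diarization-fr) | [Diarisation](Diarization-fr) | [Accueil](Home-fr)"
print(rename_fr_links(sample, pages))
```

Sorting by length is enough here because, once a longer name has been rewritten, its old '-fr' form no longer appears in the text for a shorter name to match.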