A Python implementation of Farasa toolkit
-
Updated
Sep 11, 2025 - Python
A Python implementation of Farasa toolkit
Neural Arabic text diacritization
Yorùbá language training text for NLP, ASR and TTS tasks
Benchmark Arabic text diacritization dataset
Extensible DL-based automatic Arabic diacritization tool allowing the restoration of different types of diacritics.
Set of tools for tokenization, transliteration and diacritic restoration of a text written in Serbian language.
Automatic Arabic diacritics restoration tool.
Arabic deep-learning based diacritization models (Shakkala, Shakkelha) ported to PyTorch
Arabic deep-learning based diacritization models (Shakkala, Shakkelha) in the ONNX format.
Course Project for Natural Language Processing
Remove diacritics (accents)
Translation-over-Diacritization technique implementation
Neural Arabic text diacritization website
Official code for "Fine-Tashkeel at KSAA-2026" — Systematic evaluation of 18 Seq2Seq, token classification, decoder LLM, and ASR models for automatic Arabic text diacritization. 5th place at KSAA-2026 Shared Task (OSACT7 @ LREC 2026).
✒ DúöScríbî allows you to convert letters with diacritics to regular, ASCII letters. 🤓
Arabic diacritization with Transformers
A personal project on diacritizing Arabic text.
Guidelines and linguistic rules for improving Arabic pronunciation in AI speech systems through preprocessing and diacritization.
This repository implements a complete solution for processing Arabic text within PDF documents. Key features include: OCR using Google Cloud Document AI. Arabic text cleaning and formatting. Optional diacritization using Farasa. Asynchronous processing for faster performance.
Add a description, image, and links to the diacritization topic page so that developers can more easily learn about it.
To associate your repository with the diacritization topic, visit your repo's landing page and select "manage topics."