class AINLPEngineer:
    def __init__(self):
        self.name = "Mohamed Kandil"
        self.location = "Kafr El-Sheikh, Egypt 🇪🇬"
        self.title = "AI / NLP Engineer"
        self.focus = ["Arabic LLMs", "RAG Systems", "Applied AI", "Arabic NLP"]

    def currently_building(self):
        return {
            "Athar": "Arabic RAG system over large-scale domain-specific corpora, 4,500+ HF downloads",
            "Baligh": "Arabic LLM assistant: curated knowledge grounding + SFT alignment",
            "EgyptianAgent": "Egyptian Arabic function-calling model (FunctionGemma fine-tune)",
        }

    def open_source(self):
        return {
            "datasets": "4,500+ downloads · 11.5M+ indexed vectors published on Hugging Face",
            "models": "huggingface.co/Kandil7",
            "github": "github.com/Kandil7",
        }
- Large-scale Arabic retrieval-augmented generation
- Retrieval-grounded Arabic language model
- Dialect-specific function-calling LLM
- Production-ready Arabic OCR pipeline
- Deep learning computer vision classifier
- Whisper-based video transcription pipeline
| Artifact | Type | Highlights |
|---|---|---|
| Athar-Shamela4 | 📦 Dataset | 4,500+ downloads · Large Arabic knowledge corpus |
| Athar-Embeddings | 📦 Dataset | 1,000+ downloads · 3.26M embedding rows |
| shamela-vectors | 📦 Dataset | 11.5M rows · Ready-to-index Arabic vectors |
| Athar-RAG-Hub | 📦 Dataset | 5,850 curated RAG QA pairs |
| Athar-Mini-Dataset-v2 | 📦 Dataset | 80+ downloads · Evaluation-ready |
| egyptian-voice-commands | 📦 Dataset | Egyptian Arabic commands dataset |
| Baligh-1.5B | 🤗 Model | Arabic LLM · SFT on curated knowledge |
| functiongemma-270m-egyptian | 🤗 Model | 46+ downloads · Egyptian function calling |
Building:
- Athar v2: GraphRAG + Agentic retrieval layer for Arabic corpora
- Baligh v1: Improved SFT alignment + Arabic evaluation benchmarks
- Arabic RAG evaluation suite (RAGAS-based for Arabic QA)
Learning:
- Advanced RAG patterns (HippoRAG, EHRAG, atomic retrieval)
- LLM post-training strategies (DPO, ORPO, RLHF)
- Production ML deployment and MLOps at scale
Open Source:
- Publishing Arabic NLP datasets on Hugging Face
- Building open evaluation benchmarks for Arabic QA
- Contributing to the Arabic LLM tooling ecosystem


