Skip to content
View Kandil7's full-sized avatar
🎯
Focusing
🎯
Focusing

Block or report Kandil7

Block user

Prevent this user from interacting with your repositories and sending you notifications. Learn more about blocking users.

You must be logged in to block users.

Maximum 250 characters. Please don’t include any personal information such as legal names or email addresses. Markdown is supported. This note will only be visible to you.
Report abuse

Contact GitHub support about this user’s behavior. Learn more about reporting abuse.

Report abuse
kandil7/README.md

Mohamed Kandil

AI / NLP Engineer | Arabic LLMs Β· RAG Systems Β· Applied AI

Typing SVG

🧠 About Me

class AINLPEngineer:
    def __init__(self):
        self.name       = "Mohamed Kandil"
        self.location   = "Kafr El-Sheikh, Egypt πŸ‡ͺπŸ‡¬"
        self.title      = "AI / NLP Engineer"
        self.focus      = ["Arabic LLMs", "RAG Systems", "Applied AI", "Arabic NLP"]

    def currently_building(self):
        return {
            "Athar"        : "Arabic RAG system over large-scale domain-specific corpora β€” 4,500+ HF downloads",
            "Baligh"       : "Arabic LLM assistant β€” curated knowledge grounding + SFT alignment",
            "EgyptianAgent": "Egyptian Arabic function-calling model (FunctionGemma fine-tune)",
        }

    def open_source(self):
        return {
            "datasets" : "4,500+ downloads Β· 11.5M+ indexed vectors published on Hugging Face",
            "models"   : "huggingface.co/Kandil7",
            "github"   : "github.com/Kandil7",
        }

πŸš€ Featured Projects

πŸ“š Athar β€” Arabic RAG Knowledge System

Large-scale Arabic retrieval-augmented generation

  • πŸ” Hybrid retrieval: dense embeddings + BM25 via Qdrant
  • πŸ“– Domain-specific Arabic knowledge corpora (Shamela4)
  • βœ… Citation-based, grounded question answering
  • πŸ“Š 4,500+ HF downloads Β· 11.5M+ vectors indexed

Tech: Qdrant BM25 Arabic NLP FastAPI PyArrow

Dataset Dataset

πŸ€– Baligh β€” Arabic LLM Assistant

Retrieval-grounded Arabic language model

  • 🧩 SFT on curated Arabic knowledge datasets
  • πŸ›‘οΈ Controlled, hallucination-resistant responses
  • βš™οΈ QLoRA fine-tuning via Unsloth on Qwen2.5-1.5B
  • πŸŒ™ Aligned for structured Arabic knowledge tasks

Tech: Unsloth QLoRA Qwen2.5 Hugging Face TRL

Model

πŸ“± Egyptian Mobile Action Assistant

Dialect-specific function-calling LLM

  • πŸ‡ͺπŸ‡¬ Fine-tuned FunctionGemma for Egyptian Arabic
  • πŸ“² Mobile action understanding and function calling
  • πŸ—£οΈ Dialect-specific instruction following
  • πŸ—οΈ Custom dataset for Egyptian voice commands

Tech: FunctionGemma QLoRA Egyptian Arabic NLP

Model Dataset

πŸͺͺ Egyptian ID OCR System

Production-ready Arabic OCR pipeline

  • 🎯 YOLO detection β†’ OCR ensemble β†’ JSON API
  • πŸ”₯ PaddleOCR + EasyOCR with confidence gating
  • βœ… ~92% field-level accuracy on real samples
  • 🏦 Ready for fintech / KYC applications

Tech: YOLO PaddleOCR EasyOCR OpenCV FastAPI

🌿 Plant Disease Detection

Deep learning computer vision classifier

  • πŸƒ Automated leaf image classification pipeline
  • πŸ”¬ CNN-based deep learning with preprocessing
  • πŸ“Š Multi-class plant disease identification

Tech: PyTorch OpenCV TensorFlow

πŸŽ™οΈ Long-Form Speech Transcription

Whisper-based video transcription pipeline

  • ⏱️ Processes multi-hour audio/video content
  • πŸ“„ Structured text output with timestamps
  • 🌐 Supports Arabic and multilingual audio

Tech: Whisper Python Audio Processing


πŸ€— Open-Source Contributions on Hugging Face

Artifact Type Highlights
Athar-Shamela4 πŸ“¦ Dataset 4,500+ downloads Β· Large Arabic knowledge corpus
Athar-Embeddings πŸ“¦ Dataset 1,000+ downloads Β· 3.26M embedding rows
shamela-vectors πŸ“¦ Dataset 11.5M rows Β· Ready-to-index Arabic vectors
Athar-RAG-Hub πŸ“¦ Dataset 5,850 curated RAG QA pairs
Athar-Mini-Dataset-v2 πŸ“¦ Dataset 80+ downloads Β· Evaluation-ready
egyptian-voice-commands πŸ“¦ Dataset Egyptian Arabic commands dataset
Baligh-1.5B πŸ€– Model Arabic LLM Β· SFT on curated knowledge
functiongemma-270m-egyptian πŸ€– Model 46+ downloads Β· Egyptian function calling

πŸ› οΈ Tech Stack

Languages & Frameworks

Python FastAPI Flutter JavaScript

AI / ML

PyTorch TensorFlow HuggingFace LangChain

Databases & Vector Stores

Qdrant MongoDB PostgreSQL Redis

DevOps & Tools

Docker Git Azure GitHub Actions


πŸ“Š GitHub Stats


🐍 Contribution Graph

Snake animation


🎯 Current Focus

Building:
  - Athar v2: GraphRAG + Agentic retrieval layer for Arabic corpora
  - Baligh v1: Improved SFT alignment + Arabic evaluation benchmarks
  - Arabic RAG evaluation suite (RAGAS-based for Arabic QA)

Learning:
  - Advanced RAG patterns (HippoRAG, EHRAG, atomic retrieval)
  - LLM post-training strategies (DPO, ORPO, RLHF)
  - Production ML deployment and MLOps at scale

Open Source:
  - Publishing Arabic NLP datasets on Hugging Face
  - Building open evaluation benchmarks for Arabic QA
  - Contributing to Arabic LLM tooling ecosystem

πŸ’¬ Let's Connect!

πŸ’Ό Open to collaborations, full-time roles, and freelance AI/NLP projects


⚑ "Building applied Arabic AI systems that actually work in production" ⚑

Pinned Loading

  1. prprompts-flutter-generator prprompts-flutter-generator Public

    Generate customized PRPROMPTS for Flutter projects from PRD - HIPAA, PCI-DSS, GDPR compliant with Clean Architecture and BLoC patterns

    JavaScript 11 2