
ivaaneoski/signSense


SignSense - Real-Time Sign Language Recognition

Python OpenCV MediaPipe NumPy

SignSense is a real-time American Sign Language (ASL) recognition desktop application built in Python. Designed as a precise, developer-focused tooling interface, it uses a live webcam feed to detect, classify, and track static alphabet signs.

The visual identity is minimal, functional, and clean. It features interactive overlay panels, real-time confidence scoring, left- and right-hand tracking, and a dedicated Word Builder mode for text assembly. No separately trained classifier is involved: the recognition engine applies heuristic geometric rules to the 3D hand landmarks that MediaPipe tracks.
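
How the confidence score is computed is not described in this README. As one illustration only, a per-frame label could be smoothed by majority vote over the last few frames, with the winning label's share of the window shown as the confidence. The class name `LabelSmoother` and the window size are hypothetical and not taken from the project.

```python
from collections import Counter, deque


class LabelSmoother:
    """Majority vote over the most recent per-frame labels (hypothetical helper)."""

    def __init__(self, window=10):
        # Only the last `window` labels are kept; older ones fall off the left.
        self.history = deque(maxlen=window)

    def update(self, label):
        """Record one frame's label and return (top_label, share_of_window)."""
        self.history.append(label)
        top_label, votes = Counter(self.history).most_common(1)[0]
        return top_label, votes / len(self.history)
```

A short window reacts quickly but flickers more; a longer one is steadier at the cost of latency.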


Built With (Technology Stack)

This project is built using the following core technologies:

  • Python 3.12: The core language for the application logic. Python 3.12 is required because the MediaPipe releases available for Python 3.13+ drop support for the legacy solutions API used in this project.
  • OpenCV (opencv-python): Used for interfacing with the webcam, capturing real-time video frames, and rendering the custom heads-up display (HUD) overlay and interface elements.
  • Google MediaPipe: Uses the legacy mediapipe.solutions.hands API to extract 21 3D landmarks per hand in real time. Pinned to <0.10.15 to ensure API and namespace compatibility.
  • NumPy: Used for high-performance mathematical operations, matrix manipulations, and Euclidean distance calculations between hand landmarks.
  • uv (Astral): Recommended for lightning-fast virtual environment creation and dependency resolution.
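
To make the landmark-plus-distance idea concrete, here is a minimal sketch of how a "finger extended" test might be derived from the 21 landmarks. The landmark indices (0 for the wrist, 6/8 for the index PIP joint and tip, and so on) follow MediaPipe's hand model; the function names and the distance rule are assumptions for illustration, not the project's actual code. The sketch uses `math.dist` from the standard library, where the project would presumably use NumPy for the same Euclidean distance.

```python
import math

# MediaPipe Hands indices: 0 is the wrist; each finger maps to (PIP joint, tip).
WRIST = 0
FINGERS = {"index": (6, 8), "middle": (10, 12), "ring": (14, 16), "pinky": (18, 20)}


def to_points(hand_landmarks):
    """Convert a MediaPipe hand (items with .x, .y, .z) into plain tuples."""
    return [(lm.x, lm.y, lm.z) for lm in hand_landmarks.landmark]


def extended_fingers(points):
    """Assumed heuristic: a finger is extended if its tip is farther from
    the wrist than its PIP joint."""
    wrist = points[WRIST]
    return {
        name: math.dist(points[tip], wrist) > math.dist(points[pip], wrist)
        for name, (pip, tip) in FINGERS.items()
    }


def detect_hands(frame_bgr, hands):
    """Run a mediapipe.solutions.hands.Hands instance on one BGR frame."""
    import cv2  # imported lazily so the geometry helpers work without OpenCV

    results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    return [to_points(hand) for hand in (results.multi_hand_landmarks or [])]
```

The thumb is left out on purpose: it folds sideways rather than toward the wrist, so it usually needs a separate rule.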

Step-by-Step Installation Guide

Follow these steps to get the application running on your local machine:

1. Prerequisites

Ensure you have the following installed on your system:

  • Python 3.12 (required to avoid MediaPipe compatibility issues)
  • A working webcam connected to your computer
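
If you want to confirm the version constraints programmatically, a small check along these lines may help. The function names are hypothetical, and the version parser assumes a plain numeric version string such as `0.10.14`.

```python
import sys
from importlib import metadata


def python_ok(version_info=sys.version_info):
    """True when the interpreter is Python 3.12.x."""
    return tuple(version_info[:2]) == (3, 12)


def mediapipe_ok(version):
    """True when a numeric version string such as '0.10.14' is below 0.10.15."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    return parts < (0, 10, 15)


def report():
    """Print both checks for the current environment."""
    print("Python 3.12:", python_ok())
    try:
        print("MediaPipe < 0.10.15:", mediapipe_ok(metadata.version("mediapipe")))
    except metadata.PackageNotFoundError:
        print("MediaPipe is not installed yet")
```

Calling `report()` before installation simply reports that MediaPipe is missing; run it again after step 5 to check the pin.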

2. Clone the Repository

git clone https://github.com/ivaaneoski/signSense.git
cd signSense

3. Create a Virtual Environment

It is highly recommended to isolate the project dependencies. If you use uv, run:

uv venv --python 3.12

Alternatively, using standard Python venv:

python3.12 -m venv .venv

4. Activate the Environment

  • Windows (PowerShell):
    .\.venv\Scripts\Activate.ps1
  • macOS / Linux:
    source .venv/bin/activate

5. Install Dependencies

Install the pinned packages from requirements.txt to avoid breaking API changes:

pip install -r requirements.txt

(Or if using uv: uv pip install -r requirements.txt)

6. Run the Application

With the environment activated, start the ASL recognizer:

python sign_language_recognition.py

Note: The application requires camera permissions from your OS. It defaults to camera index 0.
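
For orientation, a capture loop on camera index 0 could look roughly like the sketch below. It is not the contents of sign_language_recognition.py: `run_loop`, `main`, and the window title are hypothetical names, and only standard OpenCV calls (`cv2.VideoCapture`, `read`, `release`, `imshow`, `waitKey`, `destroyAllWindows`) are used.

```python
def run_loop(capture, handle_frame, max_frames=None):
    """Read frames from a cv2.VideoCapture-like object until a read fails,
    handle_frame returns False, or max_frames is reached.
    Returns the number of frames handled."""
    handled = 0
    try:
        while max_frames is None or handled < max_frames:
            ok, frame = capture.read()
            if not ok:
                break
            handled += 1
            if handle_frame(frame) is False:
                break
    finally:
        capture.release()  # always free the camera
    return handled


def main(camera_index=0):
    """Open the camera and show frames until Q or ESC is pressed."""
    import cv2  # lazy import: run_loop itself has no OpenCV dependency

    capture = cv2.VideoCapture(camera_index)
    if not capture.isOpened():
        raise RuntimeError(
            f"Cannot open camera index {camera_index}; check OS camera permissions"
        )

    def show(frame):
        cv2.imshow("SignSense", frame)
        return (cv2.waitKey(1) & 0xFF) not in (ord("q"), 27)  # 27 is ESC

    try:
        run_loop(capture, show)
    finally:
        cv2.destroyAllWindows()
```

If the default camera is not the one you want, passing a different index to `main` (for example `main(1)`) is the usual way to select another device.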


⌨️ Controls & Keyboard Shortcuts

While the webcam window is focused, you can use the following standard keyboard shortcuts:

Key | Action
--- | ---
Q / ESC | Quit the application cleanly
TAB | Toggle Word Builder Mode (text assembly) on/off
SPACE | Pause feed (Standard) OR Add word to sentence (Word Builder Mode)
BACKSPACE | Clear history (Standard) OR Delete last letter (Word Builder Mode)
S | Save a timestamped screenshot of the current frame to ./screenshots/
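
The mode-dependent shortcuts above can be expressed as a small dispatch function. This is a sketch under assumptions: the action names, the `screenshot_path` helper, and the file-name pattern are hypothetical, and the key codes are the values `cv2.waitKey(1) & 0xFF` commonly returns (BACKSPACE may arrive as 8 or 127 depending on the platform).

```python
from datetime import datetime
from pathlib import Path

# Common key codes from cv2.waitKey(1) & 0xFF (platform-dependent for BACKSPACE).
KEY_ESC, KEY_TAB, KEY_SPACE = 27, 9, 32
KEYS_BACKSPACE = (8, 127)


def action_for_key(key, word_builder):
    """Map a key code to an action name, depending on the current mode."""
    if key in (ord("q"), ord("Q"), KEY_ESC):
        return "quit"
    if key == KEY_TAB:
        return "toggle_word_builder"
    if key == KEY_SPACE:
        return "add_word" if word_builder else "pause"
    if key in KEYS_BACKSPACE:
        return "delete_letter" if word_builder else "clear_history"
    if key in (ord("s"), ord("S")):
        return "screenshot"
    return None  # unmapped key


def screenshot_path(now=None, folder="screenshots"):
    """Build a timestamped file path; the naming pattern is an assumption."""
    now = now or datetime.now()
    return Path(folder) / f"signsense_{now:%Y%m%d_%H%M%S}.png"
```

Keeping the mapping in one pure function makes the Standard and Word Builder behaviors easy to test without opening a window.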

Supported Signs Reference

The application supports most static ASL alphabet letters and some common gestures.

Category | Letters
--- | ---
Single finger | I (pinky), D (index), X (hooked index)
Two fingers | H, U, V, R, K, L
Three fingers | W
Four fingers | B
Full hand | C, O, E, S, A
Thumb combos | Y, L, T, G
Pinches | F, O, D
Special | ILY 🤟 (thumb + index + pinky)

⚠️ Note: Dynamic signs such as J and Z require hand motion and are not supported, because the recognizer analyzes only the static geometry of individual frames.
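
One way to read the table is as a first-pass lookup from "which fingers are extended" to candidate letters, as in the sketch below. The table and function are hypothetical; the thumb states for each pattern are assumptions based on the standard ASL handshapes, and only a subset of the supported signs is listed.

```python
# Finger order: (thumb, index, middle, ring, pinky); True means extended.
CANDIDATES = {
    (False, False, False, False, True): ["I"],
    (False, True, False, False, False): ["D", "X"],
    (False, True, True, False, False): ["H", "U", "V", "R", "K"],
    (False, True, True, True, False): ["W"],
    (False, True, True, True, True): ["B"],
    (True, False, False, False, True): ["Y"],
    (True, True, False, False, False): ["L"],
    (True, True, False, False, True): ["ILY"],
}


def candidates(states):
    """Return the letters compatible with a 5-tuple of finger states."""
    return CANDIDATES.get(tuple(bool(s) for s in states), [])
```

Several patterns map to more than one letter, which is why further geometric checks (finger spread, hooking, pinch distances) are needed to pick a single result.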


⚠️ Known Limitations & Ambiguities

SignSense reduces the errors of purely 2D (planar) tracking by using the depth (Z-axis) component of MediaPipe's 3D landmarks and by mirroring left-hand input, but the following limitations remain:

  • Finger Crossovers: The R sign is recognized on a best-effort basis. Classification relies mainly on the index and middle fingers being extended, which closely resembles H and U, so R can be confused with them.
  • Camera Angle: For maximum accuracy, hold your hand flat and square to the camera so that overlapping fingers (e.g., a crossed thumb) are clearly visible to the MediaPipe tracker.
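
Two of the ideas in this section can be sketched in a few lines: mirroring a left hand so right-hand rules apply, and a rough test for crossed index and middle fingers. Both functions are hypothetical illustrations, not the project's implementation. They assume MediaPipe's normalized coordinates (x in the range 0 to 1) and its landmark indices (5 and 9 for the index and middle knuckles, 8 and 12 for their tips).

```python
def mirror_hand(points):
    """Flip normalized x coordinates so a left hand can reuse right-hand rules."""
    return [(1.0 - x, y, z) for x, y, z in points]


def fingers_crossed(points, index_tip=8, middle_tip=12, index_mcp=5, middle_mcp=9):
    """Guess whether index and middle are crossed: the left-to-right order
    of the two tips is the reverse of the order of their knuckles."""
    knuckle_order = points[index_mcp][0] < points[middle_mcp][0]
    tip_order = points[index_tip][0] < points[middle_tip][0]
    return knuckle_order != tip_order
```

Because the crossing test compares orderings rather than absolute positions, it gives the same answer before and after mirroring. It still depends on the camera angle described above: if the hand is turned sideways, the x ordering is no longer meaningful.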

About

Real-time American Sign Language (ASL) detection for Python 3.12. Uses hand-landmark tracking and Euclidean geometry to classify static signs with a clean HUD overlay.
