SignSense is a real-time American Sign Language (ASL) recognition desktop application built in Python. Designed as a precise, developer-focused tooling interface, it uses a live webcam feed to detect, classify, and track static alphabet signs.
The visual identity is minimal, functional, and clean. It features interactive overlay panels, real-time probability confidence scoring, left- and right-hand tracking, and a dedicated Word Builder mode for text assembly. Built without any external ML models, its recognition engine relies on MediaPipe's 3D hand landmark spatial tracking combined with heuristic geometric rules.
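As a rough illustration of the heuristic approach (a minimal sketch only, not the application's actual tuned rules), a finger can be treated as "extended" when its tip lies farther from the wrist than its middle (PIP) joint in 3D:

```python
import math

# MediaPipe hand landmark indices: wrist = 0; index-finger PIP = 6, tip = 8.
WRIST, INDEX_PIP, INDEX_TIP = 0, 6, 8

def is_extended(landmarks, pip, tip, margin=1.1):
    """Heuristic: a finger counts as extended if its tip is clearly
    farther from the wrist than its PIP joint (3D Euclidean distance)."""
    wrist = landmarks[WRIST]
    return math.dist(landmarks[tip], wrist) > margin * math.dist(landmarks[pip], wrist)

# Synthetic (x, y, z) landmarks describing an extended index finger.
pts = {WRIST: (0.0, 0.0, 0.0), INDEX_PIP: (0.0, 0.4, 0.0), INDEX_TIP: (0.0, 0.8, 0.0)}
print(is_extended(pts, INDEX_PIP, INDEX_TIP))  # → True
```

Rules like this, composed per finger, are how static letters can be classified from geometry alone, without a trained model.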
This project is built using the following core technologies:

- Python 3.12: The core programming language for the application logic. (Note: Python 3.12 is explicitly required because MediaPipe builds for Python 3.13+ drop support for the legacy `solutions` API used in this project.)
- OpenCV (`opencv-python`): Interfaces with the webcam, captures real-time video frames, and renders the custom heads-up display (HUD) overlay and interface elements.
- Google MediaPipe: Uses the legacy `mediapipe.solutions.hands` API to extract 21 3D hand landmarks in real time. Pinned to `<0.10.15` to ensure API and namespace compatibility.
- NumPy: Provides high-performance mathematical operations, matrix manipulations, and Euclidean distance calculations between hand landmarks.
- uv (Astral): Recommended for lightning-fast virtual environment creation and dependency resolution.
Follow these steps to get the application running on your local machine:
Ensure you have the following installed on your system:
- Python 3.12 (highly recommended to avoid MediaPipe compatibility issues)
- A working webcam connected to your computer
```
git clone https://github.com/yourusername/SignSense.git
cd SignSense
```

It is highly recommended to isolate the project dependencies. If you use uv, run:

```
uv venv --python 3.12
```

Alternatively, using standard Python venv:

```
python3.12 -m venv .venv
```

Then activate the environment:

- Windows (PowerShell):

  ```
  .\.venv\Scripts\Activate.ps1
  ```

- macOS / Linux:

  ```
  source .venv/bin/activate
  ```
Install the required packages strictly from `requirements.txt` to avoid breaking API changes:

```
pip install -r requirements.txt
```

(Or, if using uv: `uv pip install -r requirements.txt`)
Start the ASL recognizer by running the main script:

```
python sign_language_recognition.py
```

Note: The application requires camera permissions from your OS. It defaults to camera index `0`.
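If you need a different webcam, the index could be made configurable with a small argument parser. This is a hypothetical sketch: the shipped script hardcodes index 0 and has no `--camera` flag.

```python
import argparse

def parse_camera_index(argv=None):
    """Parse an optional --camera flag, defaulting to index 0
    (the value the application uses out of the box)."""
    parser = argparse.ArgumentParser(description="SignSense launcher options")
    parser.add_argument("--camera", type=int, default=0,
                        help="OpenCV camera index to open")
    return parser.parse_args(argv).camera

print(parse_camera_index([]))                  # → 0
print(parse_camera_index(["--camera", "1"]))   # → 1
```

The returned index would then be handed to OpenCV's capture constructor in place of the hardcoded 0.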
While the webcam window is focused, you can use the following standard keyboard shortcuts:
| Key | Action |
|---|---|
| Q / ESC | Quit the application cleanly |
| TAB | Toggle Word Builder Mode (text assembly) on/off |
| SPACE | Pause feed (Standard) OR add word to sentence (Word Builder Mode) |
| BACKSPACE | Clear history (Standard) OR delete last letter (Word Builder Mode) |
| S | Save a timestamped screenshot of the current frame to `./screenshots/` |
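The shortcut table above could be dispatched roughly as follows. This is a simplified sketch using OpenCV-style `waitKey` key codes; the application's real handler may differ in detail.

```python
# OpenCV waitKey-style codes: ESC = 27, TAB = 9, SPACE = 32, BACKSPACE = 8.
QUIT_KEYS = {ord('q'), 27}

def handle_key(key, state):
    """Mutate a simple state dict according to the shortcut table."""
    if key in QUIT_KEYS:
        state["running"] = False
    elif key == 9:                        # TAB: toggle Word Builder mode
        state["word_builder"] = not state["word_builder"]
    elif key == 32:                       # SPACE
        if state["word_builder"]:
            state["sentence"].append(state["word"])
            state["word"] = ""
        else:
            state["paused"] = not state["paused"]
    elif key == 8:                        # BACKSPACE
        if state["word_builder"]:
            state["word"] = state["word"][:-1]
        else:
            state["history"].clear()
    elif key == ord('s'):
        state["screenshot_requested"] = True
    return state

state = {"running": True, "word_builder": False, "paused": False,
         "word": "HI", "sentence": [], "history": ["A", "B"],
         "screenshot_requested": False}
handle_key(9, state)      # TAB: Word Builder on
handle_key(32, state)     # SPACE now commits the current word
print(state["sentence"])  # → ['HI']
```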
The application supports most static ASL alphabet letters and some common gestures.
| Category | Letters |
|---|---|
| Single finger | I (pinky), D (index), X (hooked index) |
| Two fingers | H, U, V, R, K, L |
| Three fingers | W |
| Four fingers | B |
| Full hand | C, O, E, S, A |
| Thumb combos | Y, L, T, G |
| Pinches | F, O, D |
| Special | ILY 🤟 (thumb + index + pinky) |
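The pinch-based letters in the table (F, O, D) can be detected heuristically by thresholding the thumb-tip-to-index-tip distance. This is a minimal sketch with an assumed threshold; the application's tuned rules are more involved.

```python
import math

THUMB_TIP, INDEX_TIP = 4, 8  # MediaPipe hand landmark indices

def is_pinching(landmarks, threshold=0.05):
    """Treat thumb tip and index tip as 'pinched' when their 3D Euclidean
    distance falls below a threshold in normalized image coordinates."""
    return math.dist(landmarks[THUMB_TIP], landmarks[INDEX_TIP]) < threshold

pinched = {THUMB_TIP: (0.50, 0.50, 0.0), INDEX_TIP: (0.52, 0.51, 0.0)}
apart   = {THUMB_TIP: (0.30, 0.50, 0.0), INDEX_TIP: (0.60, 0.20, 0.0)}
print(is_pinching(pinched), is_pinching(apart))  # → True False
```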
⚠️ Note: Dynamic signs (such as J and Z), which require fluid hand motion to form correctly, are excluded by design: the engine analyzes only static geometric frames.
While the application mitigates flat planar tracking issues by using MediaPipe's true 3D (Z-axis) vector measurements and left-hand mirroring, keep the following limitations in mind:
- Finger crossovers: The R sign (crossed index and middle fingers) is approximated on a best-effort basis. Because classification keys largely on the index and middle fingers being extended, R is easily confused with H and U.
- Camera angle: For maximum accuracy, keep your hand flat and squarely facing the camera so that lateral overlaps (e.g., a crossed thumb) are clearly visible to the MediaPipe tracker.
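The left-hand mirroring mentioned above can be illustrated simply: reflecting each landmark's x-coordinate (MediaPipe landmarks are normalized to [0, 1]) makes a left hand look like a right hand, so one set of right-hand geometric rules can serve both. A sketch of the general idea, not the exact code:

```python
def mirror_landmarks(landmarks):
    """Reflect normalized (x, y, z) landmarks across the vertical axis
    so a left hand can be classified by right-hand-only rules."""
    return [(1.0 - x, y, z) for (x, y, z) in landmarks]

left = [(0.25, 0.5, 0.0), (0.5, 0.4, -0.1)]
print(mirror_landmarks(left))  # → [(0.75, 0.5, 0.0), (0.5, 0.4, -0.1)]
```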