Aegis AI: Cryptographic & Machine Learning Fraud Prevention for PwD Certificates
Aegis AI is a secure, dual-authority verification system designed to eliminate disability certificate fraud in high-stakes competitive examinations (such as JEE and NEET).
By combining zero-trust cryptographic vaults with Natural Language Processing (NLP), Machine Learning (ML), and real-time Computer Vision, Aegis AI addresses key loopholes including:
Photographic/Photoshop Forgeries: Countered by encrypting each certificate with a high-entropy key that is stored in a shared key vault (vault.json) rather than handed to the candidate.
Clinical Inflation (exaggerating minor impairments to cross the 40% benchmark): Audited by a TF-IDF and RandomForestRegressor model, trained on a dataset of clinical cases (medical_cases.csv), that predicts the impairment percentage from the raw clinical text.
Impersonation/Scribe Abuse: Flagged in real time by a separate MediaPipe-based behavioral tracking script.
🏗️ System Architecture Flow
[ Medical Authority (Port 5001) ] ────────► Generates secure PDF ───► Logs key to vault.json
                                                                          │
┌──────────────────────────────────────────────────────────────────────────┘
│
▼
[ Candidate Portal ] ──────────────────────► Uploads encrypted PDF & App ID
                                                                          │
┌──────────────────────────────────────────────────────────────────────────┘
│
▼
[ Exam Authority (Port 5002) ] ────────────► Fetches key, Decrypts PDF in-memory
                                           ► OCR Text Extraction (Tesseract)
                                           ► ML Model prediction (Random Forest)
                                           ► Flags anomalies instantly if deviation > 15%
📁 Repository Structure
Aegis_AI/
├── exam_authority/
│   ├── dashboard.html          # Exam verification portal UI
│   └── main.py                 # Port 5002: Unified server & ML forensics engine
├── medical_authority/
│   ├── cert_generate.py        # ReportLab and PyPDF2 encryption logic
│   ├── dashboard.html          # Hospital certificate generation UI
│   └── main.py                 # Port 5001: Certificate generator server
├── poppler-26.02.0/            # Local Poppler binaries (Windows layout)
├── app.py                      # OCR Backend Flask Server
├── behavior_analysis.py        # MediaPipe Computer Vision tracker
├── medical_cases.csv           # Professional clinical training dataset (80+ cases)
├── vault.json                  # Shared key database (automatically generated)
├── requirements.txt            # Python dependencies
└── README.md                   # Project documentation (this file)
🛠️ Prerequisites & Installation
To run Aegis AI, your system requires both Python libraries and underlying system-level OCR/PDF rendering binaries.
- System Dependencies (Crucial)
A. Tesseract OCR (Text Extraction Engine)
Windows: Download and run the 64-bit installer from the UB-Mannheim Tesseract Wiki.
Ensure it is installed to the default path: C:\Program Files\Tesseract-OCR\tesseract.exe.
macOS: Install via Homebrew:
brew install tesseract
Linux (Ubuntu/Debian):
sudo apt-get install tesseract-ocr
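Because the install location differs per platform, the lookup can be made explicit. The following is a minimal sketch, not the project's actual code: `find_tesseract` is a hypothetical helper that checks the `PATH` first and then falls back to the default Windows location named above.

```python
import shutil
from pathlib import Path
from typing import Optional

# Default Windows install location mentioned in this README.
DEFAULT_WINDOWS_PATH = Path(r"C:\Program Files\Tesseract-OCR\tesseract.exe")


def find_tesseract(fallback: Path = DEFAULT_WINDOWS_PATH) -> Optional[str]:
    """Return a path to the tesseract executable, or None if it is not found.

    Hypothetical helper: checks PATH first, then the fallback location.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    if fallback.is_file():
        return str(fallback)
    return None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found; install it or set the path manually.")
    else:
        print(f"Using Tesseract at: {path}")
        # With pytesseract installed, the path could then be applied with:
        # pytesseract.pytesseract.tesseract_cmd = path
```

Checking the `PATH` first keeps the same code usable on macOS and Linux, where the package managers above place `tesseract` on the `PATH`.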
B. Poppler (PDF-to-Image Conversion)
Windows:
The system expects a local Poppler folder named poppler-26.02.0 in your root project directory.
Make sure the executable file pdftoppm.exe is located at poppler-26.02.0/Library/bin/pdftoppm.exe or poppler-26.02.0/bin/pdftoppm.exe.
macOS: Install via Homebrew:
brew install poppler
Linux (Ubuntu/Debian):
sudo apt-get install poppler-utils
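The two accepted Windows layouts can be checked programmatically. This is a sketch under assumptions: `find_poppler_bin` is a hypothetical helper, and the README does not say how the project itself resolves the folder.

```python
from pathlib import Path
from typing import Optional

# Folder name and sub-paths taken from the Windows instructions above.
POPPLER_DIR_NAME = "poppler-26.02.0"
CANDIDATE_BIN_DIRS = ("Library/bin", "bin")


def find_poppler_bin(project_root: Path) -> Optional[Path]:
    """Return the bundled Poppler directory that contains pdftoppm.exe, or None."""
    for rel in CANDIDATE_BIN_DIRS:
        bin_dir = project_root / POPPLER_DIR_NAME / rel
        if (bin_dir / "pdftoppm.exe").is_file():
            return bin_dir
    return None


if __name__ == "__main__":
    poppler_bin = find_poppler_bin(Path.cwd())
    if poppler_bin is None:
        print("No bundled Poppler found; relying on a system-wide install.")
    else:
        print(f"Bundled Poppler found at: {poppler_bin}")
```

If the project renders pages with pdf2image (an assumption; the library is not named here), the returned directory could be passed as the `poppler_path` argument of `convert_from_path`. On macOS and Linux the function returns `None` and the system-wide install is used.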
- Python Environment Setup
Clone the repository and navigate to your project directory:
git clone https://github.com/your-username/Aegis_AI.git
cd Aegis_AI
Create a virtual environment:
python -m venv venv
Activate the virtual environment:
Windows (PowerShell):
.\venv\Scripts\activate
macOS / Linux:
source venv/bin/activate
Install the mandatory project requirements:
pip install -r requirements.txt
(Optional) If you plan to run the Computer Vision tracking script (behavior_analysis.py), install its dependencies:
pip install opencv-python mediapipe
🚦 How to Run the Project
Step 1: Start the Medical Authority Server (Port 5001)
In your first terminal tab (with your virtual environment active):
python medical_authority/main.py
What this does: Initializes the medical issuance pipeline, ready to generate secure password-protected certificates and write decryption passwords to vault.json.
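The key-logging half of this step can be sketched with the standard library alone. Everything below is illustrative: the helper names, the flat `{application_id: password}` vault layout, and the 32-byte key length are assumptions, not the contents of medical_authority/main.py.

```python
import json
import secrets
import tempfile
from pathlib import Path


def new_application_id() -> str:
    """Return an ID shaped like this README's example (AEG-123456)."""
    return f"AEG-{secrets.randbelow(1_000_000):06d}"


def store_key(vault_path: Path, app_id: str, password: str) -> None:
    """Add one app_id -> password entry to the JSON vault, creating it if needed."""
    vault = json.loads(vault_path.read_text()) if vault_path.exists() else {}
    vault[app_id] = password
    vault_path.write_text(json.dumps(vault, indent=2))


if __name__ == "__main__":
    # A temporary directory keeps the demo away from the real vault.json.
    with tempfile.TemporaryDirectory() as tmp:
        demo_vault = Path(tmp) / "vault.json"
        app_id = new_application_id()
        # 32 random bytes from the OS CSPRNG; the length is an assumption.
        password = secrets.token_urlsafe(32)
        store_key(demo_vault, app_id, password)
        print(f"Stored a key for {app_id}")
```

The `secrets` module draws from the operating system's cryptographically secure random source, which is what makes the password unguessable; the `random` module would not be suitable here.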
Step 2: Start the Exam Authority Server (Port 5002)
In your second terminal tab (with your virtual environment active):
python exam_authority/main.py
What this does: Loads the clinical dataset from medical_cases.csv, trains the TF-IDF feature extractor and RandomForestRegressor on startup, and exposes the verification API endpoint.
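The training step can be pictured as a two-stage scikit-learn pipeline. This is a minimal sketch, not the code in exam_authority/main.py: the six rows are invented toy examples with no clinical validity (they only stand in for medical_cases.csv), and the hyperparameters are assumptions.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy stand-in for medical_cases.csv; invented for illustration only.
texts = [
    "mild restriction of wrist movement, full grip strength",
    "partial loss of grip strength in left hand",
    "complete paralysis of right lower limb",
    "below-knee amputation of left leg",
    "minor stiffness in one finger joint",
    "severe bilateral hearing loss",
]
percentages = [10.0, 30.0, 80.0, 70.0, 5.0, 60.0]

# TF-IDF turns each description into a sparse vector; the forest regresses on it.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    RandomForestRegressor(n_estimators=100, random_state=0),
)
model.fit(texts, percentages)

# Cast to a native float so the value can be serialized to JSON later.
predicted = float(model.predict(["partial loss of grip strength in right hand"])[0])
print(f"Predicted impairment: {predicted:.1f}%")
```

Wrapping both stages in one pipeline means the vocabulary learned during `fit` is reused automatically at prediction time, so OCR text from a certificate can be passed in as a plain string.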
🔄 Step-by-Step User Workflow
Certificate Generation:
Open medical_authority/dashboard.html in your web browser.
Fill out the form fields with a candidate's information, disability type, percentage, and clinical description.
Click Generate & Encrypt.
Copy the returned Application ID (e.g., AEG-123456) and download the secure PDF certificate.
Run the Forensic Audit:
Open exam_authority/dashboard.html in your browser.
Upload the downloaded PDF certificate.
Paste the copied Application ID.
Click Run AI Forensic Check.
The backend fetches the matching decryption key from vault.json, decrypts the file, runs OCR, extracts the certificate data, and reports the document as VERIFIED, or as FLAGGED when the stated percentage deviates from the model's prediction by more than 15%.
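The final decision reduces to a threshold comparison. The sketch below assumes the deviation is the absolute difference, in percentage points, between the percentage stated on the certificate and the model's prediction; the README does not define it more precisely, and `audit_verdict` is a hypothetical name.

```python
def audit_verdict(stated_pct: float, predicted_pct: float, threshold: float = 15.0) -> str:
    """Return "FLAGGED" when the two percentages differ by more than the threshold."""
    deviation = abs(stated_pct - predicted_pct)
    return "FLAGGED" if deviation > threshold else "VERIFIED"


if __name__ == "__main__":
    print(audit_verdict(45.0, 40.0))  # deviation 5  -> VERIFIED
    print(audit_verdict(60.0, 30.0))  # deviation 30 -> FLAGGED
```

Using a strict `>` matches the "deviation > 15%" wording: a certificate that is off by exactly 15 still passes.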
Behavioral Proctoring Check (Optional):
Run the pose-tracking file in your terminal:
python behavior_analysis.py
The script opens your webcam feed and flags, in real time, joint movements of the candidate's monitored "impaired" arm whose velocity exceeds the threshold derived from the certificate.
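The velocity check itself needs no camera to illustrate. The following sketch assumes one joint is tracked as normalized (x, y) coordinates per frame, as pose estimators typically report; the function names and the `max_speed` value are hypothetical, and how behavior_analysis.py derives its threshold from the certificate is not specified here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # normalized (x, y) image coordinates of one joint


def joint_speeds(track: Sequence[Point], fps: float) -> List[float]:
    """Speed between consecutive frames, in normalized units per second."""
    return [math.dist(a, b) * fps for a, b in zip(track, track[1:])]


def flag_fast_frames(track: Sequence[Point], fps: float, max_speed: float) -> List[int]:
    """Indices of frames that were reached with a speed above max_speed."""
    speeds = joint_speeds(track, fps)
    return [i + 1 for i, speed in enumerate(speeds) if speed > max_speed]


if __name__ == "__main__":
    # A wrist that rests, jumps across 30% of the frame width, then rests again.
    wrist = [(0.5, 0.5), (0.5, 0.5), (0.8, 0.5), (0.8, 0.5)]
    print(flag_fast_frames(wrist, fps=30.0, max_speed=2.0))  # [2]
```

Multiplying the per-frame displacement by the frame rate makes the threshold independent of the camera's frame rate.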
🛡️ Troubleshooting
Poppler / PDF-to-Image Errors: If you see the error Unable to get page count, make sure your local Poppler folder is named exactly poppler-26.02.0 and that pdftoppm.exe sits in its Library/bin or bin subfolder.
Tesseract Errors: If the application fails to locate Tesseract, verify that tesseract.exe is in your standard Windows program files directory (C:\Program Files\Tesseract-OCR\tesseract.exe). If you installed it elsewhere, modify the variable at the top of exam_authority/main.py:
pytesseract.pytesseract.tesseract_cmd = r'Your\Custom\Path\tesseract.exe'
NumPy/JSON Serialization Error: Ensure you are using the latest version of exam_authority/main.py where all statistical metrics are cast to native Python types (int, float, bool) before being serialized to JSON.
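The cast described above can be done in one recursive pass. This is a sketch of the general technique, not the code in exam_authority/main.py; `to_native` and the metric names are hypothetical.

```python
import json

import numpy as np


def to_native(value):
    """Recursively convert NumPy scalars and arrays to plain Python objects."""
    if isinstance(value, np.generic):  # np.float32, np.int64, np.bool_, ...
        return value.item()
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, dict):
        return {key: to_native(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_native(item) for item in value]
    return value


if __name__ == "__main__":
    metrics = {
        "predicted": np.float32(42.5),
        "flagged": np.bool_(True),
        "votes": np.array([1, 2, 3]),
    }
    # json.dumps(metrics) would raise TypeError; the converted copy serializes.
    print(json.dumps(to_native(metrics)))
```

Converting once, just before serialization, avoids scattering `float(...)` and `bool(...)` casts through the forensics code.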