A Django-based email scanner that analyzes .eml files to detect phishing and fake emails using a trained ML model and header analysis.
- Upload and scan .eml email files via a web interface
- Hybrid detection pipeline:
  - Signature matching: Instant lookup of known phishing URL and content hashes (MD5/SHA256).
  - Dynamic Trusted Domains: Rule-based score reduction for legitimate services (GitHub, Codeberg, Google, etc.), manageable via the admin panel.
  - ML-powered analysis: Random Forest with TF-IDF + structural features for unknown threats, including enhanced detection for crypto-related phishing.
- Risk scoring: LOW (safe), MEDIUM (review), HIGH (phishing)
- Suspicious URL extraction with sanitization
- Sender domain analysis and header anomaly detection
- User authentication (register/login/logout)
- Scan history and report management per user
- Rate-limiting on login attempts
The platform includes a secure administrative interface to manage the detection intelligence manually. Access it at /admin/ (requires staff/superuser account).
To prevent false positives from legitimate services:
- Navigate to Scanner > Trusted Domains.
- Add the root domain (e.g., codeberg.org, company-internal.com).
- These domains receive a significant risk score reduction during analysis.
- Changes are applied instantly (cache is automatically invalidated).
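As a rough illustration of how a root-domain entry could lower a score, the sketch below matches the sender domain against the trusted list on a label boundary, so that mail.codeberg.org matches codeberg.org but evilcodeberg.org does not. The function names and the `reduction` value of 40 are assumptions for illustration only; the README does not state the actual amount or the matching rules used by the project.

```python
def is_trusted(sender_domain: str, trusted_domains: set[str]) -> bool:
    """Return True if the domain equals a trusted root or is a subdomain of one."""
    domain = sender_domain.lower().rstrip(".")
    return any(
        domain == root or domain.endswith("." + root)
        for root in trusted_domains
    )


def apply_trust_reduction(
    score: int,
    sender_domain: str,
    trusted_domains: set[str],
    reduction: int = 40,  # hypothetical value, not taken from the project
) -> int:
    """Lower the risk score for trusted senders, never going below 0."""
    if is_trusted(sender_domain, trusted_domains):
        return max(0, score - reduction)
    return score
```

Matching on the "." boundary rather than a plain suffix check is what keeps lookalike domains from inheriting the reduction.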
To enable instant blocking of known threats:
- Navigate to Scanner > Signatures.
- Add a new signature by providing:
  - Type: URL (SHA256 of the URL string), MD5, or SHA256 (of the email body).
  - Hash: The cryptographic hash of the indicator.
  - Description: Source or name of the phishing campaign.
- Matching signatures trigger an immediate HIGH risk rating (Score: 100).
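The hash values for a new signature can be produced with Python's standard hashlib module. This is a minimal sketch that assumes the URL and the email body are hashed as UTF-8 bytes with no normalization; the helper names are hypothetical, and the scanner may normalize its input differently before hashing.

```python
import hashlib


def url_signature(url: str) -> str:
    """SHA256 hex digest of the URL string (assumes UTF-8, no normalization)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def body_signatures(body: str) -> dict[str, str]:
    """MD5 and SHA256 hex digests of the email body text."""
    raw = body.encode("utf-8")
    return {
        "MD5": hashlib.md5(raw).hexdigest(),
        "SHA256": hashlib.sha256(raw).hexdigest(),
    }
```

The resulting hex string goes into the Hash field, with the matching Type selected.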
Admins can upload new datasets (CSV format) via the Admin Dashboard (/admin-panel/) to retrain the Random Forest model with the latest phishing trends.
FakeFinder/
├── src/                          # Django project root (manage.py)
│   ├── FakeFinder/               # Django project settings
│   ├── scanner/                  # Main app: models, views, utils
│   │   ├── models.py             # ScanReport & Signature models
│   │   ├── views.py              # Hybrid prediction logic (Signature + ML)
│   │   ├── utils.py              # .eml parsing and URL extraction
│   │   └── urls.py               # URL routing
│   └── ml/                       # Machine learning module
│       ├── train.py              # Model training script
│       ├── features.py           # TF-IDF + structural feature extraction
│       └── Phishing_Email.csv    # Training dataset (Kaggle)
- Python 3.11+
- Django 5.2
- PostgreSQL (optional, SQLite for development)
# Clone the repository
git clone https://codeberg.org/yushi_61/FakeFinder.git
cd FakeFinder/src
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
- Copy the example environment file:
  cp FakeFinder/.env.example FakeFinder/.env
- Configure your .env file:
  DEBUG=True
  DJANGO_SECRET_KEY=your-secret-key-here
  DJANGO_ALLOWED_HOSTS=localhost,127.0.0.1
  # Database (optional; defaults to SQLite)
  DB_NAME=fakefinder
  DB_USER=postgres
  DB_PASSWORD=your-password
  DB_HOST=localhost
  DB_PORT=5432
- Run database migrations:
  python manage.py migrate
- (Optional) Create a superuser:
  python manage.py createsuperuser
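To make the "optional database" behaviour of the .env file concrete, here is one way a settings module could choose between PostgreSQL and SQLite. It is a sketch under assumptions: the `database_config` helper is hypothetical, and treating a non-empty DB_NAME as the switch is a guess about how the project's settings work. Only the variable names and the two Django engine paths are real.

```python
import os
from collections.abc import Mapping


def database_config(env: Mapping[str, str] = os.environ) -> dict:
    """Build a Django DATABASES dict from environment variables.

    Assumption: PostgreSQL is used when DB_NAME is set, SQLite otherwise.
    """
    if env.get("DB_NAME"):
        return {
            "default": {
                "ENGINE": "django.db.backends.postgresql",
                "NAME": env["DB_NAME"],
                "USER": env.get("DB_USER", ""),
                "PASSWORD": env.get("DB_PASSWORD", ""),
                "HOST": env.get("DB_HOST", "localhost"),
                "PORT": env.get("DB_PORT", "5432"),
            }
        }
    # Fallback for development: a local SQLite file.
    return {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": "db.sqlite3",
        }
    }
```

In a settings file this would be used as `DATABASES = database_config()`.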
Before scanning emails, you need to train the model:
python ml/train.py --data ml/Phishing_Email.csv --out ml/model.joblib --trees 200
This will:
- Load the Kaggle Phishing Email dataset
- Extract TF-IDF + structural features
- Train a calibrated Random Forest classifier
- Save the model to ml/model.joblib
Dataset: Kaggle - Phishing Email Detection by Subhalaxmi Rout. Download and place Phishing_Email.csv in the ml/ directory.
python manage.py runserver
Access the application at http://localhost:8000
- Username: admin
- Password: admin_password_2026
- Register an account or log in
- Upload an .eml email file (max 10MB)
- View the generated report with:
  - Risk score: LOW / MEDIUM / HIGH
  - Numeric score: 0–100 (0 = safe, 100 = phishing)
  - Suspicious URLs detected in the email
  - Header anomalies (sender domain, subject)
- View and manage your scan history from the dashboard
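The report fields listed above (sender domain, subject, URLs) can all be derived from an .eml file with Python's standard email package. The sketch below is an illustration and not the project's utils.py: the `summarize_eml` name and the URL regular expression are assumptions, and the URL sanitization the scanner applies is not reproduced here.

```python
import re
from email import message_from_bytes, policy
from email.utils import parseaddr

# Simple pattern for http(s) URLs; a real scanner would likely be stricter.
URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)


def summarize_eml(raw: bytes) -> dict:
    """Parse raw .eml bytes and return subject, sender domain, and URLs."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Extract the address from the From header, then keep the domain part.
    _, address = parseaddr(str(msg.get("From", "")))
    sender_domain = address.rpartition("@")[2].lower() if "@" in address else ""

    # Prefer the plain-text body, falling back to HTML.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    text = body_part.get_content() if body_part is not None else ""

    return {
        "subject": str(msg.get("Subject", "")),
        "sender_domain": sender_domain,
        "urls": sorted(set(URL_RE.findall(text))),
    }
```

Reading an uploaded file is then a matter of passing its bytes, for example `summarize_eml(open("sample.eml", "rb").read())`.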
| Score | Risk Level | Description |
|---|---|---|
| 0–29 | LOW | Email appears safe |
| 30–71 | MEDIUM | Ambiguous, review recommended |
| 72–100 | HIGH | Strong phishing signal |
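The thresholds in the table translate directly into a small lookup. The function name below is hypothetical; the cut-offs (30 and 72) are taken from the table.

```python
def risk_level(score: int) -> str:
    """Map a 0-100 numeric score to LOW, MEDIUM, or HIGH per the table."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 72:
        return "HIGH"
    if score >= 30:
        return "MEDIUM"
    return "LOW"
```

A signature match, which the admin section describes as Score: 100, therefore always lands in HIGH.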
| Endpoint | Method | Description |
|---|---|---|
| / | GET/POST | Upload email and view results |
| /report/<id>/ | GET | View scan report |
| /report/<id>/delete/ | POST | Delete a report |
| /history/delete/ | POST | Delete all reports |
| /login/ | POST | User login (rate-limited) |
| /logout/ | POST | User logout |
| /register/ | POST | User registration |