FakeFinder

A Django-based email scanner that analyzes .eml files to detect phishing and fake emails using a trained ML model and header analysis.

Features

- Upload and scan .eml email files via a web interface
- Hybrid detection pipeline:
  - Signature matching: Instant lookup of known phishing URL and content hashes (MD5/SHA256).
  - Dynamic Trusted Domains: Rule-based score reduction for legitimate services (GitHub, Codeberg, Google, etc.), manageable via the admin panel.
  - ML-powered analysis: Random Forest with TF-IDF + structural features for unknown threats, including enhanced detection for crypto-related phishing.
- Risk scoring: LOW (safe), MEDIUM (review), HIGH (phishing)
- Suspicious URL extraction with sanitization
- Sender domain analysis and header anomaly detection
- User authentication (register/login/logout)
- Scan history and report management per user
- Rate-limiting on login attempts
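
A rough sketch of how the three pipeline stages could be combined into one score. The stage order, the `Verdict` type, the `hybrid_score` function, and the reduction of 30 points are all assumptions made for illustration; this README only states that signature matches score 100 and that trusted domains receive a "significant" reduction.

```python
from dataclasses import dataclass

# Assumed value: the README only says the reduction is "significant".
TRUSTED_REDUCTION = 30


@dataclass
class Verdict:
    score: int   # 0 (safe) to 100 (phishing)
    source: str  # which stage produced the score


def hybrid_score(signature_hit: bool, ml_probability: float, sender_trusted: bool) -> Verdict:
    """Combine signature, ML, and trusted-domain stages into a 0-100 score."""
    # Stage 1: a known signature short-circuits the other stages.
    if signature_hit:
        return Verdict(100, "signature")
    # Stage 2: scale the model's phishing probability to 0-100.
    score = round(ml_probability * 100)
    # Stage 3: lower the score for trusted sender domains, never below 0.
    if sender_trusted:
        return Verdict(max(0, score - TRUSTED_REDUCTION), "ml+trusted")
    return Verdict(score, "ml")
```

Checking signatures first means a known phishing hash is never softened by a trusted sender domain; whether the project orders the stages this way is not stated here.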

Admin Management

The platform includes a secure administrative interface for managing the detection intelligence manually. Access it at /admin/ (requires a staff or superuser account).

Managing Trusted Domains

To prevent false positives from legitimate services:

- Navigate to Scanner > Trusted Domains.
- Add the root domain (e.g., codeberg.org, company-internal.com).
- These domains receive a significant risk score reduction during analysis.
- Changes are applied instantly (cache is automatically invalidated).
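
A minimal sketch of how a sender domain might be checked against the trusted root domains. The function name `is_trusted` and the rule that subdomains of a trusted root also count as trusted are assumptions; the actual matching logic lives in the scanner app and may differ.

```python
def is_trusted(sender_domain: str, trusted_roots: set[str]) -> bool:
    """Return True if the domain equals a trusted root or is a subdomain of one."""
    domain = sender_domain.lower().rstrip(".")
    # Require an exact match or a dot boundary, so "notcodeberg.org" and
    # "codeberg.org.evil.example" do not match the root "codeberg.org".
    return any(domain == root or domain.endswith("." + root) for root in trusted_roots)
```

Matching on a dot boundary rather than a plain substring matters here, because lookalike domains are a common phishing trick.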

Managing Phishing Signatures

To enable instant blocking of known threats:

- Navigate to Scanner > Signatures.
- Add a new signature by providing:
  - Type: URL (SHA256 of the URL string), or MD5 / SHA256 (hash of the email body).
  - Hash: The cryptographic hash of the indicator.
  - Description: Source or name of the phishing campaign.
- Matching signatures trigger an immediate HIGH risk rating (Score: 100).
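
To produce a hash for the Hash field, something like the following can be used. This is a sketch: the helper names are hypothetical, and it assumes the scanner hashes the raw UTF-8 string with no normalization (trimming, lowercasing). If the scanner normalizes URLs or bodies first, the hashes must be computed the same way to match.

```python
import hashlib


def url_signature(url: str) -> str:
    """SHA256 hex digest of the URL string (signature type URL)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def body_signatures(body: str) -> dict[str, str]:
    """MD5 and SHA256 hex digests of the email body."""
    data = body.encode("utf-8")
    return {
        "MD5": hashlib.md5(data).hexdigest(),
        "SHA256": hashlib.sha256(data).hexdigest(),
    }
```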

Retraining the ML Model

Admins can upload new datasets (CSV format) via the Admin Dashboard (/admin-panel/) to retrain the Random Forest model with the latest phishing trends.

Architecture

FakeFinder/
├── src/                 # Django project root (manage.py)
│   ├── FakeFinder/      # Django project settings
│   ├── scanner/         # Main app: models, views, utils
│   │   ├── models.py    # ScanReport & Signature models
│   │   ├── views.py     # Hybrid prediction logic (Signature + ML)
│   │   ├── utils.py     # .eml parsing and URL extraction
│   │   └── urls.py      # URL routing
│   └── ml/              # Machine learning module
│       ├── train.py     # Model training script
│       ├── features.py  # TF-IDF + structural feature extraction
│       └── Phishing_Email.csv  # Training dataset (Kaggle)
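
For orientation, the kind of work done in utils.py (.eml parsing and URL extraction) can be sketched with the standard library alone. The function name `parse_eml`, the returned keys, and the URL regex are illustrative assumptions, not the project's actual code.

```python
import re
from email import message_from_bytes, policy
from email.utils import parseaddr

# Simplified pattern: stops at whitespace, angle brackets, quotes, and ")".
URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)


def parse_eml(raw: bytes) -> dict:
    """Extract the subject, sender domain, and URLs from raw .eml bytes."""
    msg = message_from_bytes(raw, policy=policy.default)
    # Pull the address out of a header such as "Alice <alice@example.test>".
    _, address = parseaddr(str(msg.get("From", "")))
    domain = address.rpartition("@")[2].lower()
    # Prefer the plain-text body, falling back to HTML.
    part = msg.get_body(preferencelist=("plain", "html"))
    text = part.get_content() if part is not None else ""
    return {
        "subject": str(msg.get("Subject", "")),
        "sender_domain": domain,
        "urls": sorted(set(URL_RE.findall(text))),
    }
```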

Requirements

Python 3.11+
Django 5.2
PostgreSQL (optional; SQLite is used by default for development)

Installation

# Clone the repository
git clone https://codeberg.org/yushi_61/FakeFinder.git
cd FakeFinder/src

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt

Environment Setup

Copy the example environment file:

cp FakeFinder/.env.example FakeFinder/.env

Configure your .env file:

DEBUG=True
DJANGO_SECRET_KEY=your-secret-key-here
DJANGO_ALLOWED_HOSTS=localhost,127.0.0.1

# Database (optional — defaults to SQLite)
DB_NAME=fakefinder
DB_USER=postgres
DB_PASSWORD=your-password
DB_HOST=localhost
DB_PORT=5432
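
As an illustration of how these variables could be consumed, the sketch below turns them into Django-style settings values. The helper names are hypothetical, and using the presence of `DB_NAME` to decide between PostgreSQL and the SQLite fallback is an assumption; check the project's settings module for the actual rule.

```python
from collections.abc import Mapping


def debug_enabled(env: Mapping[str, str]) -> bool:
    """Interpret DEBUG as a boolean; anything but 1/true/yes is False."""
    return env.get("DEBUG", "False").lower() in {"1", "true", "yes"}


def allowed_hosts(env: Mapping[str, str]) -> list[str]:
    """Split the comma-separated DJANGO_ALLOWED_HOSTS value."""
    raw = env.get("DJANGO_ALLOWED_HOSTS", "")
    return [host.strip() for host in raw.split(",") if host.strip()]


def database_config(env: Mapping[str, str]) -> dict[str, str]:
    """Build a DATABASES['default'] entry, falling back to SQLite."""
    if not env.get("DB_NAME"):
        # Assumed fallback rule: no DB_NAME means SQLite.
        return {"ENGINE": "django.db.backends.sqlite3", "NAME": "db.sqlite3"}
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": env["DB_NAME"],
        "USER": env.get("DB_USER", "postgres"),
        "PASSWORD": env.get("DB_PASSWORD", ""),
        "HOST": env.get("DB_HOST", "localhost"),
        "PORT": env.get("DB_PORT", "5432"),
    }
```

Passing `os.environ` as `env` works, since it is a mapping of strings.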

Run database migrations:
```
python manage.py migrate
```
(Optional) Create a superuser:
```
python manage.py createsuperuser
```

Training the ML Model

Before scanning emails, you need to train the model:

python ml/train.py --data ml/Phishing_Email.csv --out ml/model.joblib --trees 200

This will:

- Load the Kaggle Phishing Email dataset
- Extract TF-IDF + structural features
- Train a calibrated Random Forest classifier
- Save the model to ml/model.joblib

Dataset: Kaggle - Phishing Email Detection by Subhalaxmi Rout. Download and place Phishing_Email.csv in the ml/ directory.
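
To give an idea of what "structural features" can mean alongside TF-IDF, here is a small standard-library sketch. The specific features, the keyword list, and the function name are hypothetical examples; the real feature set is defined in ml/features.py and is not documented here.

```python
import re

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
# Illustrative keyword list, not the one used by the project.
URGENT_WORDS = ("urgent", "verify", "suspended", "password", "wallet")


def structural_features(text: str) -> dict[str, float]:
    """Compute simple numeric features describing the shape of an email."""
    letters = [c for c in text if c.isalpha()]
    lowered = text.lower()
    return {
        "num_urls": float(len(URL_RE.findall(text))),
        "num_exclamations": float(text.count("!")),
        # Share of alphabetic characters that are uppercase.
        "uppercase_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        "urgent_word_hits": float(sum(lowered.count(w) for w in URGENT_WORDS)),
        "length": float(len(text)),
    }
```

Features like these would typically be concatenated with the TF-IDF vector before being passed to the classifier.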

Running the Server

python manage.py runserver

Access the application at http://localhost:8000

Admin Credentials (Local/Demo)

Username: admin
Password: admin_password_2026

Usage

- Register an account or log in.
- Upload an .eml email file (max 10 MB).
- View the generated report, which includes:
  - Risk level: LOW / MEDIUM / HIGH
  - Numeric score: 0–100 (0 = safe, 100 = phishing)
  - Suspicious URLs detected in the email
  - Header anomalies (sender domain, subject)
- View and manage your scan history from the dashboard.

Risk Score Thresholds

| Score | Risk Level | Description |
|-------|------------|-------------|
| 0–29 | LOW | Email appears safe |
| 30–71 | MEDIUM | Ambiguous, review recommended |
| 72–100 | HIGH | Strong phishing signal |
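
The thresholds above translate directly into a lookup like the following. The function name `risk_level` is illustrative; only the boundaries (29/30 and 71/72) come from the table.

```python
def risk_level(score: int) -> str:
    """Map a 0-100 score to LOW, MEDIUM, or HIGH using the thresholds above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 72:
        return "HIGH"
    if score >= 30:
        return "MEDIUM"
    return "LOW"
```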

API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET/POST | Upload email and view results |
| `/report/<id>/` | GET | View scan report |
| `/report/<id>/delete/` | POST | Delete a report |
| `/history/delete/` | POST | Delete all reports |
| `/login/` | POST | User login (rate-limited) |
| `/logout/` | POST | User logout |
| `/register/` | POST | User registration |
