Script-Identifier

Scene Text Script Classification using Traditional Machine Learning Techniques

Overview

Script-Identifier is a Pattern Recognition and Machine Learning (PRML) course project developed as part of CSL2050. Its objective is to identify the script/language of scene text images using traditional machine learning techniques. The system is trained and evaluated on the Bharat Scene Text Dataset, and the focus is on comparing classical ML algorithms when paired with various handcrafted and deep feature extractors.

This project adheres to the CSL2050 guidelines, emphasizing rigorous evaluation, failure case analysis, and comprehensive deliverables including code, report, demo, and presentation.

Project Team

Developed by Team Lexiconauts
Under CSL2050 - Pattern Recognition and Machine Learning

Features

Classification of scene text into scripts such as Hindi, Tamil, Bengali, etc.
Traditional ML models: SVM, KNN, ANN, Decision Trees, Logistic Regression, etc.
Feature extraction techniques: HOG, SIFT, ResNet, VGG, Vision Transformer (ViT)
Dimensionality reduction: PCA, LDA
Visualizations: t-SNE, PCA, class distributions, and decision boundaries
FastAPI based interfaces for inference and demo
Modular, well-documented codebase with YAML-based configuration

Repository Structure

.
├── config/              # Configuration files for experiments
├── dataset/             # Data loaders and transformations
├── models/              # ML models and backbones
├── tools/               # Training and inference scripts
├── utils/               # Plotting and utility functions
├── Visualisation/       # Visualization scripts
├── data/                # Placeholder for processed data and latents
├── frontend/            # Next.js frontend interface
├── main.py              # Entry point for inference
├── ui.py                # UI
├── fastapi_server.py    # REST API with FastAPI
├── requirements.txt     # Dependencies
└── README.md            # Repo documentation

Installation

Prerequisites

Python 3.10+
CUDA-enabled GPU (for faster training)
pip / conda for package management

Setup

git clone https://github.com/AurindumBanerjee/Script-Identifier.git
cd Script-Identifier
pip install -r requirements.txt

How to Run

See the Logistic Regression ReadMe.md for a worked example of running a model; every other model runs the same way.

From the repository root, run a model as a Python module. Write the path to the model file relative to the root, with folders separated by '.' instead of '/' and without the .py extension:

python -m model_folder.model_file
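The path-to-module conversion described above can be sketched as a small helper. This is only an illustration: the function name `module_name` and the example path are hypothetical and are not part of the repository.

```python
from pathlib import PurePosixPath


def module_name(script_path: str) -> str:
    """Convert a repo-relative script path into a dotted name for `python -m`."""
    # Normalise Windows separators so the same helper works on any platform.
    path = PurePosixPath(script_path.replace("\\", "/"))
    if path.suffix != ".py":
        raise ValueError(f"expected a .py file, got {script_path!r}")
    # Drop the extension and join the remaining path components with dots.
    return ".".join(path.with_suffix("").parts)


# Hypothetical path, used only for illustration.
print("python -m " + module_name("models/logistic_regression/train.py"))
```

Run the printed command from the repository root so that Python can resolve the top-level folder as a package.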

To run the FastAPI server:

python fastapi_server.py

To run the frontend UI:

cd frontend
npm install
npm run dev

Dataset

We use the Bharat Scene Text Dataset containing scene text in 13 Indian scripts.

Download the dataset from the official Bharat Scene Text Dataset repository and place the dataset folder inside the data directory as data/recognition, so that the layout looks like this:

.
└── data/recognition/
    ├── train/        # Training data directory with subfolders for each class (language)
    │   ├── english/
    │   ├── gujarati/
    │   ├── hindi/
    │   └── ...       # Additional language/class folders
    │
    ├── test/         # Test data directory with subfolders for each class (language)
    │   ├── english/
    │   ├── gujarati/
    │   ├── hindi/
    │   └── ...       # Additional language/class folders
    │
    ├── train.csv     # CSV file mapping training images to their labels
    └── test.csv      # CSV file mapping test images to their labels
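Before training, it can help to confirm that the download landed in the layout shown above. The following is a minimal sketch using only the standard library; the helper names and the set of image extensions are assumptions, not part of the project code.

```python
from pathlib import Path

# Assumed image extensions; adjust if the dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def check_layout(root="data/recognition"):
    """Return the expected top-level entries that are missing under root."""
    root = Path(root)
    expected = ["train", "test", "train.csv", "test.csv"]
    return [name for name in expected if not (root / name).exists()]


def count_images(split_dir):
    """Return {class_name: image_count} for one split directory."""
    counts = {}
    for class_dir in sorted(p for p in Path(split_dir).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == "__main__":
    missing = check_layout()
    if missing:
        print("Missing under data/recognition:", ", ".join(missing))
    else:
        for split in ("train", "test"):
            print(split, count_images(Path("data/recognition") / split))
```

The per-class counts it prints can be compared against the table of split sizes for this dataset.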

From this dataset we use only the Cropped Word Recognition Set, which comes with predefined training and test splits. The per-language counts for each split are:

Language	Train	Test
Assamese	2,623	1,343
Bengali	4,968	1,161
English	28,778	8,113
Gujarati	1,956	693
Hindi	14,855	4,034
Kannada	2,241	693
Malayalam	2,408	567
Marathi	3,932	1,045
Odia	3,176	1,022
Punjabi	8,544	2,560
Tamil	2,041	507
Telugu	2,227	482
Others	20,600	-
Total	77,749	22,220
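The Total row equals the sum of the twelve named languages and does not include the Others row. The split is also heavily imbalanced, with English making up about 37% of the named training samples. The sketch below checks the totals and computes inverse-frequency class weights, n_samples / (n_classes * class_count), which is one common way to compensate for such imbalance. It is an illustration only and not necessarily what the project's training code does.

```python
# (train, test) counts copied from the table above; "Others" has no test split.
COUNTS = {
    "Assamese": (2623, 1343),
    "Bengali": (4968, 1161),
    "English": (28778, 8113),
    "Gujarati": (1956, 693),
    "Hindi": (14855, 4034),
    "Kannada": (2241, 693),
    "Malayalam": (2408, 567),
    "Marathi": (3932, 1045),
    "Odia": (3176, 1022),
    "Punjabi": (8544, 2560),
    "Tamil": (2041, 507),
    "Telugu": (2227, 482),
}

train_total = sum(train for train, _ in COUNTS.values())
test_total = sum(test for _, test in COUNTS.values())
print(train_total, test_total)  # 77749 22220


def balanced_weights(class_counts):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {name: n_samples / (n_classes * c) for name, c in class_counts.items()}


weights = balanced_weights({name: train for name, (train, _) in COUNTS.items()})
# Print from the most frequent class (smallest weight) to the rarest.
for name, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{name:<10} {weight:.2f}")
```

A dictionary of this shape can be passed as per-class weights to classifiers that accept them, so that errors on rare scripts cost more than errors on English or Hindi.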

Deliverables

As per CSL2050 project requirements:

Component	Status
Mid-Project Report	✅ Submitted
Final Report	✅ Submitted
GitHub Repository	✅ Updated
Project Page	✅ Web Page
Web Demo (Gradio)	✅ Created
Spotlight Video	✅ Submitted
Minutes of Meetings	✅ Maintained

Citation

@misc{BSTD,
  title = {{B}harat {S}cene {T}ext {D}ataset},
  howpublished = {\url{https://github.com/Bhashini-IITJ/BharatSceneTextDataset}},
  year = 2024,
}

License

This project is licensed under the MIT License.
