A deep learning-based Optical Character Recognition (OCR) system for reading seven-segment displays from video frames. This project provides tools for labeling video data, training neural network models, and performing real-time predictions on seven-segment displays.
- Video Labeling: Interactive GUI tool for labeling seven-segment displays in video frames
- Data Augmentation: Automatic generation of training variations with perspective warping, brightness, contrast, and blur adjustments
- Deep Learning Model: CNN-based sequence model for recognizing multi-digit seven-segment displays
- Real-time Prediction: Support for webcam and video file input with CSV output
- Perspective Correction: Automatic perspective warping to handle angled camera views
- Multi-Meter Support: Handle multiple meters in a single video frame
- Clone the repository:
git clone https://github.com/2leander2/seven-segment-ocr.git
cd seven-segment-ocr
- Install dependencies:
pip install -r requirements.txt
- Python 3.x
- OpenCV
- TensorFlow/Keras
- NumPy
- Pandas
- Matplotlib
- Pillow
- scikit-learn
- tqdm
See requirements.txt for the complete list of dependencies.
seven-segment-ocr/
├── label_data.py # Video labeling tool with GUI
├── training.py # Model training script
├── predicting.py # Prediction/inference script
├── model.py # Neural network model definition
├── dataset.py # Dataset loading and preprocessing
├── image_filtering.py # Image preprocessing and augmentation
├── videos/ # Input video files
├── training_csv/ # CSV files with labels
├── models/ # Trained model files
└── requirements.txt # Python dependencies
Use label_data.py to label seven-segment displays in video frames:
from label_data import VideoLabeler
video_path = 'videos/sample1.mov'
labeler = VideoLabeler(
    video_path=video_path,
    frame_interval=9,           # Process every 9th frame
    start_index=0,
    frame_shape=(100, 246, 1)   # Output image dimensions (height, width, channels)
)
# Label frames interactively
labeler.run()
# Generate augmented training data
labeler.generate_frames_all(
    variations=10,           # Number of variations per frame
    max_point_variation=9,   # Max pixel variation for perspective points
    max_brightness=0.03,     # Max brightness variation
    max_contrast=0.03,       # Max contrast variation
    max_blurriness=5         # Max blur kernel size
)

Labeling Process:
- Select 4 corner points around the seven-segment display
- Enter a meter ID for the selected display
- For each frame, enter the numeric value shown on the display
- Leave empty if the display is obstructed or unreadable
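Each labeling run produces one label CSV per meter. As a quick sanity check, the output can be parsed and filtered for readable frames; this is only a sketch, assuming the "Frame Number"/"Label"/"Image Path" columns described in the data-format section, with made-up row values:

```python
import csv
import io

# Hypothetical label CSV contents, mirroring the documented columns.
sample = io.StringIO(
    "Frame Number,Label,Image Path\n"
    "0,123.45,training_csv/sample1/meter_0/frame_0\n"
    "9,,training_csv/sample1/meter_0/frame_9\n"  # empty label = unreadable frame
)

rows = list(csv.DictReader(sample))
# Keep only frames whose display was readable (non-empty label).
readable = [r for r in rows if r["Label"]]
print(len(rows), len(readable))  # 2 1
```

Frames left unlabeled (obstructed or unreadable) simply carry an empty Label field, so downstream filtering reduces to a truthiness check.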
Train a model using labeled data:
from model import SequenceModel
import pandas as pd
def filter_data(csv_files, interval=1):
    dfs = []
    for csv_file in csv_files:
        df = pd.read_csv(csv_file, dtype=str)
        dfs.append(df)
    dataframe = pd.concat(dfs)
    dataframe = dataframe.drop_duplicates(subset=["Label"])
    dataframe = dataframe.iloc[::interval]  # Sample every Nth row
    return dataframe
# Initialize model
sequence_model = SequenceModel()
# Load and filter training data
csv_files = [
    "training_csv/sample1/meter_0_labels.csv",
    "training_csv/sample1/meter_1_labels.csv",
]
dataframe = filter_data(csv_files, interval=9)
# Train the model
sequence_model.start_training(
    dataframe=dataframe,
    img_shape=(100, 246, 1),  # Input image shape
    epochs=50,                # Training epochs
    units=14,                 # Number of output classes (0-9, '.', '-', '+')
    num_shape=(3, 2),         # Number of digits (3 digits, 2 decimal places)
    save_path="models/sequence_model.keras",
    batch_size=50
)

Use the trained model to predict values from images or video:
from predicting import LabelGenerator
from model import SequenceModel
# Define models (model_path, image_shape)
models = [
    ("models/sequence_model.keras", (100, 246, 1)),
]
# Initialize label generator
generator = LabelGenerator(
    models=models,
    resolution=(1920, 1080),
    convert_func=None,  # Custom conversion function if needed
    results_csv_dir="results"
)
# Predict from video file
generator.generate_csv_from_img_sequence(
    sequence_path="videos/sample1.mov",
    scale=1,
    apply_blur=False,
    write_debug_video=True
)
# Or predict from webcam
generator.generate_csv_from_webcam(
    webcam_addr=0,    # Camera index
    scale=1,
    interval=1000,    # Process every 1000 ms
    write_sequence=True,
    apply_blur=False
)

- Perspective Warping: Corrects for camera angle by warping the display region to a rectangular view
- Grayscale Conversion: Converts to grayscale for processing
- Thresholding: Applies Otsu's thresholding to create binary images
- Resizing: Normalizes image size for the neural network
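The perspective step amounts to estimating a 3x3 homography from the four labeled corners to an upright rectangle. A minimal NumPy sketch of that math (illustrative only; the project presumably relies on OpenCV's equivalent routines, and the corner coordinates below are made up):

```python
import numpy as np

def homography(src, dst):
    """Solve for the 3x3 perspective transform mapping 4 src points to 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)  # fix the scale with h33 = 1

def warp_point(H, pt):
    """Apply the homography to one (x, y) point in homogeneous coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Four labeled corners of a tilted display -> upright 246x100 rectangle.
corners = [(30, 40), (260, 55), (250, 150), (20, 130)]
target = [(0, 0), (246, 0), (246, 100), (0, 100)]
H = homography(corners, target)
print(warp_point(H, (260, 55)))  # maps onto the top-right target corner
```

Warping every pixel of the region through this transform yields the rectangular view that the grayscale and thresholding steps then operate on.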
The system generates multiple variations of each labeled frame:
- Perspective Variation: Slight shifts in corner points
- Brightness Adjustment: Random brightness variations
- Contrast Adjustment: Random contrast variations
- Gaussian Blur: Random blur to simulate motion or focus issues
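The four variation types can be sketched with NumPy alone. The parameter names below mirror the `generate_frames_all` arguments, but the implementation is illustrative (the corner jitter is only sampled here, not applied, and a box blur stands in for Gaussian blur):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, max_point_variation=9, max_brightness=0.03,
            max_contrast=0.03, max_blurriness=5):
    """Produce one random variation of a grayscale frame in [0, 1]."""
    # Perspective variation: jitter each corner by up to +/- max_point_variation px.
    jitter = rng.integers(-max_point_variation, max_point_variation + 1, size=(4, 2))
    # Brightness/contrast: small additive and multiplicative changes.
    contrast = 1.0 + rng.uniform(-max_contrast, max_contrast)
    brightness = rng.uniform(-max_brightness, max_brightness)
    out = np.clip(img * contrast + brightness, 0.0, 1.0)
    # Blur: box filter with a random odd kernel size up to max_blurriness.
    k = int(rng.integers(1, max_blurriness // 2 + 1)) * 2 - 1
    if k > 1:
        kernel = np.ones(k) / k
        out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, out)
    return out, jitter

frame = rng.random((100, 246))  # synthetic grayscale frame
aug, jitter = augment(frame)
print(aug.shape, jitter.shape)  # (100, 246) (4, 2)
```

Each labeled frame expanded this way (e.g. `variations=10`) gives the model tolerance to the camera wobble, lighting drift, and focus changes it will see at inference time.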
The SequenceModel uses a CNN-based architecture:
- Convolutional layers for feature extraction
- Dense layers for classification
- Multi-output design to predict each digit position independently
- Supports digits (0-9), decimal point (.), and sign (+/-)
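Because each digit position has its own output head, a prediction decodes by taking the argmax of each head and joining the characters. A sketch of that decoding step (the exact class ordering and the 14th "blank" class here are assumptions, since the source lists only 13 symbols against `units=14`):

```python
import numpy as np

# Assumed class set: digits, decimal point, signs, plus a blank for unused positions.
CLASSES = list("0123456789") + [".", "-", "+", ""]

def decode(outputs):
    """Join the argmax class of each per-position output head into a reading."""
    return "".join(CLASSES[int(np.argmax(p))] for p in outputs)

# Fake one-hot probability vectors for the reading "12.5" plus one blank head.
heads = np.zeros((5, 14))
for i, ch in enumerate("12.5"):
    heads[i, CLASSES.index(ch)] = 1.0
heads[4, 13] = 1.0  # blank position
print(decode(heads))  # "12.5"
```

Predicting each position independently keeps the heads small (14-way softmaxes) instead of classifying the full multi-digit string jointly.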
Labels are stored in CSV format with columns:
- Frame Number: Frame index in the video
- Label: Numeric value displayed (e.g., "123.45", "-67.8")
- Image Path: Path to the processed image directory
Predictions are saved as CSV files with:
- Timestamp
- Frame number
- Predicted values for each meter
- Processing time
- Frame Interval: Use larger intervals (e.g., 9) to reduce labeling effort while maintaining coverage
- Augmentation: Generate 10-20 variations per frame for better model generalization
- Multiple Meters: Label each meter separately with unique IDs
- Image Quality: Ensure good lighting and minimal obstructions for best results
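The first two tips combine multiplicatively. A back-of-envelope sizing, assuming a 60-second clip at 30 fps (both values hypothetical):

```python
# Rough training-set sizing for one labeled clip.
fps, seconds = 30, 60
frame_interval = 9   # label every 9th frame
variations = 10      # augmented copies per labeled frame

labeled = (fps * seconds) // frame_interval
samples = labeled * variations
print(labeled, samples)  # 200 2000
```

So even a short clip yields a few thousand training samples once augmentation is applied, which is why sparse labeling intervals remain practical.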
MIT License