Seven-Segment Display OCR

A deep learning-based Optical Character Recognition (OCR) system for reading seven-segment displays from video frames. This project provides tools for labeling video data, training neural network models, and performing real-time predictions on seven-segment displays.

Features

  • Video Labeling: Interactive GUI tool for labeling seven-segment displays in video frames
  • Data Augmentation: Automatic generation of training variations with perspective warping, brightness, contrast, and blur adjustments
  • Deep Learning Model: CNN-based sequence model for recognizing multi-digit seven-segment displays
  • Real-time Prediction: Support for webcam and video file input with CSV output
  • Perspective Correction: Automatic perspective warping to handle angled camera views
  • Multi-Meter Support: Handle multiple meters in a single video frame

Installation

  1. Clone the repository:
git clone https://github.com/2leander2/seven-segment-ocr.git
cd seven-segment-ocr
  2. Install dependencies:
pip install -r requirements.txt

Requirements

  • Python 3.x
  • OpenCV
  • TensorFlow/Keras
  • NumPy
  • Pandas
  • Matplotlib
  • Pillow
  • scikit-learn
  • tqdm

See requirements.txt for the complete list of dependencies.

Project Structure

seven-segment-ocr/
├── label_data.py          # Video labeling tool with GUI
├── training.py            # Model training script
├── predicting.py          # Prediction/inference script
├── model.py               # Neural network model definition
├── dataset.py             # Dataset loading and preprocessing
├── image_filtering.py     # Image preprocessing and augmentation
├── videos/                # Input video files
├── training_csv/          # CSV files with labels
├── models/                # Trained model files
└── requirements.txt       # Python dependencies

Usage

1. Labeling Video Data

Use label_data.py to label seven-segment displays in video frames:

from label_data import VideoLabeler

video_path = 'videos/sample1.mov'
labeler = VideoLabeler(
    video_path=video_path,
    frame_interval=9,        # Process every 9th frame
    start_index=0,
    frame_shape=(100, 246, 1)  # Output image dimensions (height, width, channels)
)

# Label frames interactively
labeler.run()

# Generate augmented training data
labeler.generate_frames_all(
    variations=10,              # Number of variations per frame
    max_point_variation=9,      # Max pixel variation for perspective points
    max_brightness=0.03,        # Max brightness variation
    max_contrast=0.03,          # Max contrast variation
    max_blurriness=5            # Max blur kernel size
)

Labeling Process:

  1. Select 4 corner points around the seven-segment display
  2. Enter a meter ID for the selected display
  3. For each frame, enter the numeric value shown on the display
  4. Leave empty if the display is obstructed or unreadable

2. Training the Model

Train a model using labeled data:

from model import SequenceModel
import pandas as pd

def filter_data(csv_files, interval=1):
    """Combine label CSVs, drop duplicate readings, and subsample rows."""
    dfs = []
    for csv_file in csv_files:
        # Read labels as strings so values like "123.40" keep their formatting
        df = pd.read_csv(csv_file, dtype=str)
        dfs.append(df)

    dataframe = pd.concat(dfs, ignore_index=True)
    dataframe = dataframe.drop_duplicates(subset=["Label"])  # Keep one frame per distinct reading
    dataframe = dataframe.iloc[::interval]  # Sample every Nth row

    return dataframe

# Initialize model
sequence_model = SequenceModel()

# Load and filter training data
csv_files = [
    "training_csv/sample1/meter_0_labels.csv",
    "training_csv/sample1/meter_1_labels.csv",
]
dataframe = filter_data(csv_files, interval=9)

# Train the model
sequence_model.start_training(
    dataframe=dataframe,
    img_shape=(100, 246, 1),      # Input image shape
    epochs=50,                     # Training epochs
    units=14,                      # Number of output classes (0-9, '.', '-', '+')
    num_shape=(3, 2),              # Digit layout: 3 digits before the decimal point, 2 after
    save_path="models/sequence_model.keras",
    batch_size=50
)

3. Making Predictions

Use the trained model to predict values from images or video:

from predicting import LabelGenerator
from model import SequenceModel

# Define models (model_path, image_shape)
models = [
    ("models/sequence_model.keras", (100, 246, 1)),
]

# Initialize label generator
generator = LabelGenerator(
    models=models,
    resolution=(1920, 1080),
    convert_func=None,  # Custom conversion function if needed
    results_csv_dir="results"
)

# Predict from video file
generator.generate_csv_from_img_sequence(
    sequence_path="videos/sample1.mov",
    scale=1,
    apply_blur=False,
    write_debug_video=True
)

# Or predict from webcam
generator.generate_csv_from_webcam(
    webcam_addr=0,  # Camera index
    scale=1,
    interval=1000,  # Process every 1000ms
    write_sequence=True,
    apply_blur=False
)

How It Works

Image Preprocessing

  1. Perspective Warping: Corrects for camera angle by warping the display region to a rectangular view
  2. Grayscale Conversion: Converts to grayscale for processing
  3. Thresholding: Applies Otsu's thresholding to create binary images
  4. Resizing: Normalizes image size for the neural network
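
A minimal sketch of these steps using OpenCV (the corner-point handling and sizes here are illustrative; the project's actual implementation lives in image_filtering.py):

import cv2
import numpy as np

def preprocess_display(frame, corners, out_size=(246, 100)):
    """Warp the labeled display region flat, binarize it, and resize it."""
    w, h = out_size
    # corners: the 4 labeled points (top-left, top-right, bottom-right, bottom-left)
    src = np.array(corners, dtype=np.float32)
    dst = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float32)

    # 1. Perspective warping: map the angled display onto a flat rectangle
    matrix = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(frame, matrix, (w, h))

    # 2. Grayscale conversion
    gray = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)

    # 3. Otsu's thresholding to produce a binary image
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # 4. Resize to the network input size, e.g. (height, width) = (100, 246)
    return cv2.resize(binary, (w, h))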

Data Augmentation

The system generates multiple variations of each labeled frame:

  • Perspective Variation: Slight shifts in corner points
  • Brightness Adjustment: Random brightness variations
  • Contrast Adjustment: Random contrast variations
  • Gaussian Blur: Random blur to simulate motion or focus issues
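
As a rough sketch, one variation might be produced as below. The function signature mirrors the generate_frames_all parameters shown earlier, but this is illustrative, not the project's internal API:

import random
import cv2
import numpy as np

def augment(image, max_point_variation=9, max_brightness=0.03,
            max_contrast=0.03, max_blurriness=5):
    """Produce one randomized variation of a labeled frame."""
    h, w = image.shape[:2]

    # Perspective variation: jitter each corner by up to max_point_variation pixels
    src = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float32)
    jitter = np.random.uniform(-max_point_variation, max_point_variation, (4, 2))
    matrix = cv2.getPerspectiveTransform((src + jitter).astype(np.float32), src)
    out = cv2.warpPerspective(image, matrix, (w, h))

    # Brightness/contrast adjustment: pixel' = alpha * pixel + beta
    alpha = 1.0 + random.uniform(-max_contrast, max_contrast)
    beta = 255 * random.uniform(-max_brightness, max_brightness)
    out = cv2.convertScaleAbs(out, alpha=alpha, beta=beta)

    # Gaussian blur with a random odd kernel size up to max_blurriness
    k = random.randrange(1, max_blurriness + 1, 2)
    return cv2.GaussianBlur(out, (k, k), 0)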

Model Architecture

The SequenceModel uses a CNN-based architecture:

  • Convolutional layers for feature extraction
  • Dense layers for classification
  • Multi-output design to predict each digit position independently
  • Supports digits (0-9), decimal point (.), and sign (+/-)
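
A minimal Keras sketch of such a multi-output design, assuming one softmax head per character position (layer sizes and head count are illustrative; the real architecture is defined in model.py):

from tensorflow import keras
from tensorflow.keras import layers

def build_sequence_model(img_shape=(100, 246, 1), num_positions=5, units=14):
    """CNN trunk with one independent classification head per character position."""
    inputs = keras.Input(shape=img_shape)

    # Convolutional layers for feature extraction
    x = layers.Conv2D(32, 3, activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)

    # One softmax head per position; each head covers `units` classes
    # (the training example lists 0-9, '.', '-', '+')
    outputs = [layers.Dense(units, activation="softmax", name=f"digit_{i}")(x)
               for i in range(num_positions)]

    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model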

Label Format

Labels are stored in CSV format with columns:

  • Frame Number: Frame index in the video
  • Label: Numeric value displayed (e.g., "123.45", "-67.8")
  • Image Path: Path to the processed image directory
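
A hypothetical labels file might therefore look like this (values are illustrative):

Frame Number,Label,Image Path
0,123.45,training_csv/sample1/meter_0/frame_0
9,-67.8,training_csv/sample1/meter_0/frame_9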

Output Format

Predictions are saved as CSV files with:

  • Timestamp
  • Frame number
  • Predicted values for each meter
  • Processing time
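
For example, a results row might resemble the following (column names and values are illustrative, not the exact output schema):

Timestamp,Frame Number,Meter 0,Processing Time
2024-01-01 12:00:00,0,123.45,0.03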

Tips

  • Frame Interval: Use larger intervals (e.g., 9) to reduce labeling effort while maintaining coverage
  • Augmentation: Generate 10-20 variations per frame for better model generalization
  • Multiple Meters: Label each meter separately with unique IDs
  • Image Quality: Ensure good lighting and minimal obstructions for best results

License

MIT License
