A deep learning-based Optical Character Recognition (OCR) system for reading seven-segment displays from video frames. This project provides tools for labeling video data, training neural network models, and performing real-time predictions on seven-segment displays.
- Video Labeling: Interactive GUI tool for labeling seven-segment displays in video frames
- Data Augmentation: Automatic generation of training variations with perspective warping, brightness, contrast, and blur adjustments
- Deep Learning Model: CNN-based sequence model for recognizing multi-digit seven-segment displays
- Real-time Prediction: Support for webcam and video file input with CSV output
- Perspective Correction: Automatic perspective warping to handle angled camera views
- Multi-Meter Support: Handle multiple meters in a single video frame
- Clone the repository:
git clone https://github.com/2leander2/seven-segment-ocr.git
cd seven-segment-ocr
- Install dependencies:
pip install -r requirements.txt
- Python 3.x
- OpenCV
- TensorFlow/Keras
- NumPy
- Pandas
- Matplotlib
- Pillow
- scikit-learn
- tqdm
See requirements.txt for the complete list of dependencies.
seven-segment-ocr/
├── label_data.py # Video labeling tool with GUI
├── training.py # Model training script
├── predicting.py # Prediction/inference script
├── model.py # Neural network model definition
├── dataset.py # Dataset loading and preprocessing
├── image_filtering.py # Image preprocessing and augmentation
├── videos/ # Input video files
├── training_csv/ # CSV files with labels
├── models/ # Trained model files
└── requirements.txt # Python dependencies
Use label_data.py to label seven-segment displays in video frames:
from label_data import VideoLabeler
video_path = 'videos/sample1.mov'
labeler = VideoLabeler(
    video_path=video_path,
    frame_interval=9,           # Process every 9th frame
    start_index=0,
    frame_shape=(100, 246, 1)   # Output image dimensions (height, width, channels)
)
# Label frames interactively
labeler.run()
# Generate augmented training data
labeler.generate_frames_all(
    variations=10,           # Number of variations per frame
    max_point_variation=9,   # Max pixel variation for perspective points
    max_brightness=0.03,     # Max brightness variation
    max_contrast=0.03,       # Max contrast variation
    max_blurriness=5         # Max blur kernel size
)

Labeling Process:
- Select 4 corner points around the seven-segment display
- Enter a meter ID for the selected display
- For each frame, enter the numeric value shown on the display
- Leave empty if the display is obstructed or unreadable
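Each labeling run produces one label CSV per meter. As a quick sanity check, the output can be parsed and filtered for readable frames; this is only a sketch, assuming the "Frame Number"/"Label"/"Image Path" columns described in the data-format section, with made-up row values:

```python
import csv
import io

# Hypothetical label CSV contents, mirroring the documented columns.
sample = io.StringIO(
    "Frame Number,Label,Image Path\n"
    "0,123.45,training_csv/sample1/meter_0/frame_0\n"
    "9,,training_csv/sample1/meter_0/frame_9\n"  # empty label = unreadable frame
)

rows = list(csv.DictReader(sample))
# Keep only frames whose display was readable (non-empty label).
readable = [r for r in rows if r["Label"]]
print(len(rows), len(readable))  # 2 1
```

Frames left unlabeled (obstructed or unreadable) simply carry an empty Label field, so downstream filtering reduces to a truthiness check.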
Train a model using labeled data:
from model import SequenceModel
import pandas as pd
def filter_data(csv_files, interval=1):
    dfs = []
    for csv_file in csv_files:
        df = pd.read_csv(csv_file, dtype=str)
        dfs.append(df)
    dataframe = pd.concat(dfs)
    dataframe = dataframe.drop_duplicates(subset=["Label"])
    dataframe = dataframe.iloc[::interval]  # Sample every Nth row
    return dataframe
# Initialize model
sequence_model = SequenceModel()
# Load and filter training data
csv_files = [
    "training_csv/sample1/meter_0_labels.csv",
    "training_csv/sample1/meter_1_labels.csv",
]
dataframe = filter_data(csv_files, interval=9)
# Train the model
sequence_model.start_training(
    dataframe=dataframe,
    img_shape=(100, 246, 1),  # Input image shape
    epochs=50,                # Training epochs
    units=14,                 # Number of output classes (0-9, '.', '-', '+')
    num_shape=(3, 2),         # Number of digits (3 digits, 2 decimal places)
    save_path="models/sequence_model.keras",
    batch_size=50
)

Use the trained model to predict values from images or video:
from predicting import LabelGenerator
from model import SequenceModel
# Define models (model_path, image_shape)
models = [
    ("models/sequence_model.keras", (100, 246, 1)),
]
# Initialize label generator
generator = LabelGenerator(
    models=models,
    resolution=(1920, 1080),
    convert_func=None,  # Custom conversion function if needed
    results_csv_dir="results"
)
# Predict from video file
generator.generate_csv_from_img_sequence(
    sequence_path="videos/sample1.mov",
    scale=1,
    apply_blur=False,
    write_debug_video=True
)
# Or predict from webcam
generator.generate_csv_from_webcam(
    webcam_addr=0,    # Camera index
    scale=1,
    interval=1000,    # Process every 1000 ms
    write_sequence=True,
    apply_blur=False
)

- Perspective Warping: Corrects for camera angle by warping the display region to a rectangular view
- Grayscale Conversion: Converts to grayscale for processing
- Thresholding: Applies Otsu's thresholding to create binary images
- Resizing: Normalizes image size for the neural network
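The perspective step amounts to estimating a 3x3 homography from the four labeled corners to an upright rectangle. A minimal NumPy sketch of that math (illustrative only; the project presumably relies on OpenCV's equivalent routines, and the corner coordinates below are made up):

```python
import numpy as np

def homography(src, dst):
    """Solve for the 3x3 perspective transform mapping 4 src points to 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)  # fix the scale with h33 = 1

def warp_point(H, pt):
    """Apply the homography to one (x, y) point in homogeneous coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Four labeled corners of a tilted display -> upright 246x100 rectangle.
corners = [(30, 40), (260, 55), (250, 150), (20, 130)]
target = [(0, 0), (246, 0), (246, 100), (0, 100)]
H = homography(corners, target)
print(warp_point(H, (260, 55)))  # maps onto the top-right target corner
```

Warping every pixel of the region through this transform yields the rectangular view that the grayscale and thresholding steps then operate on.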
The system generates multiple variations of each labeled frame:
- Perspective Variation: Slight shifts in corner points
- Brightness Adjustment: Random brightness variations
- Contrast Adjustment: Random contrast variations
- Gaussian Blur: Random blur to simulate motion or focus issues
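The four variation types can be sketched with NumPy alone. The parameter names below mirror the `generate_frames_all` arguments, but the implementation is illustrative (the corner jitter is only sampled here, not applied, and a box blur stands in for Gaussian blur):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, max_point_variation=9, max_brightness=0.03,
            max_contrast=0.03, max_blurriness=5):
    """Produce one random variation of a grayscale frame in [0, 1]."""
    # Perspective variation: jitter each corner by up to +/- max_point_variation px.
    jitter = rng.integers(-max_point_variation, max_point_variation + 1, size=(4, 2))
    # Brightness/contrast: small additive and multiplicative changes.
    contrast = 1.0 + rng.uniform(-max_contrast, max_contrast)
    brightness = rng.uniform(-max_brightness, max_brightness)
    out = np.clip(img * contrast + brightness, 0.0, 1.0)
    # Blur: box filter with a random odd kernel size up to max_blurriness.
    k = int(rng.integers(1, max_blurriness // 2 + 1)) * 2 - 1
    if k > 1:
        kernel = np.ones(k) / k
        out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, out)
    return out, jitter

frame = rng.random((100, 246))  # synthetic grayscale frame
aug, jitter = augment(frame)
print(aug.shape, jitter.shape)  # (100, 246) (4, 2)
```

Each labeled frame expanded this way (e.g. `variations=10`) gives the model tolerance to the camera wobble, lighting drift, and focus changes it will see at inference time.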
The SequenceModel uses a CNN-based architecture:
- Convolutional layers for feature extraction
- Dense layers for classification
- Multi-output design to predict each digit position independently
- Supports digits (0-9), decimal point (.), and sign (+/-)
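Because each digit position has its own output head, a prediction decodes by taking the argmax of each head and joining the characters. A sketch of that decoding step (the exact class ordering and the 14th "blank" class here are assumptions, since the source lists only 13 symbols against `units=14`):

```python
import numpy as np

# Assumed class set: digits, decimal point, signs, plus a blank for unused positions.
CLASSES = list("0123456789") + [".", "-", "+", ""]

def decode(outputs):
    """Join the argmax class of each per-position output head into a reading."""
    return "".join(CLASSES[int(np.argmax(p))] for p in outputs)

# Fake one-hot probability vectors for the reading "12.5" plus one blank head.
heads = np.zeros((5, 14))
for i, ch in enumerate("12.5"):
    heads[i, CLASSES.index(ch)] = 1.0
heads[4, 13] = 1.0  # blank position
print(decode(heads))  # "12.5"
```

Predicting each position independently keeps the heads small (14-way softmaxes) instead of classifying the full multi-digit string jointly.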
Labels are stored in CSV format with columns:
- Frame Number: Frame index in the video
- Label: Numeric value displayed (e.g., "123.45", "-67.8")
- Image Path: Path to the processed image directory
Predictions are saved as CSV files with:
- Timestamp
- Frame number
- Predicted values for each meter
- Processing time
- Frame Interval: Use larger intervals (e.g., 9) to reduce labeling effort while maintaining coverage
- Augmentation: Generate 10-20 variations per frame for better model generalization
- Multiple Meters: Label each meter separately with unique IDs
- Image Quality: Ensure good lighting and minimal obstructions for best results
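The first two tips combine multiplicatively. A back-of-envelope sizing, assuming a 60-second clip at 30 fps (both values hypothetical):

```python
# Rough training-set sizing for one labeled clip.
fps, seconds = 30, 60
frame_interval = 9   # label every 9th frame
variations = 10      # augmented copies per labeled frame

labeled = (fps * seconds) // frame_interval
samples = labeled * variations
print(labeled, samples)  # 200 2000
```

So even a short clip yields a few thousand training samples once augmentation is applied, which is why sparse labeling intervals remain practical.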
MIT License