Enhancing Video Colorization with Deep Learning: A Comprehensive Analysis of Training Loss Functions
This repository contains research on the use of deep neural networks for the automatic colorization of black-and-white videos. The approach extends image colorization techniques to video by employing an autoencoder with a U-Net-based architecture to predict denoised and colorized frames. Training was conducted on the DAVIS dataset, where several loss function combinations were tested to identify the optimal configuration for preserving object and structural integrity while achieving high-quality colorization.
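The loss combinations studied in the paper are weighted sums of individual terms. As a minimal sketch (using numpy rather than the repository's PyTorch code, and with illustrative weights), a two-term objective such as MSE + MAE can be combined like this:

```python
import numpy as np

def mse_loss(pred, target):
    # Mean squared error between predicted and ground-truth frames
    return float(np.mean((pred - target) ** 2))

def mae_loss(pred, target):
    # Mean absolute error between predicted and ground-truth frames
    return float(np.mean(np.abs(pred - target)))

def combined_loss(pred, target, weights=(1.0, 1.0)):
    # Weighted sum of two losses; equal weights here are an assumption,
    # not the configuration used in the paper
    return weights[0] * mse_loss(pred, target) + weights[1] * mae_loss(pred, target)

# Toy 2x2 "frames" in [0, 1]
pred = np.full((2, 2), 0.5)
target = np.full((2, 2), 0.7)
print(round(combined_loss(pred, target), 4))  # 0.04 (MSE) + 0.2 (MAE) = 0.24
```

The same pattern extends to three-term objectives (e.g. MSE + SSIM + Perceptual) by adding further weighted terms to the sum.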
The article can be found here: Enhancing Video Colorization with Deep Learning: A Comprehensive Analysis of Training Loss Functions
```
torch >= 1.13
torchvision >= 0.4
cuda >= 11.6
vit-pytorch >= 0.40.2
```
This section displays qualitative results through images of various loss function combinations, highlighting visual quality and artifacts. Quantitative results are summarized with metrics like SSIM, PSNR, and LPIPS, providing a detailed performance evaluation of each configuration.
| Gray Frame input | MAE + SSIM | MSE + SSIM | MAE + Content | MSE + Content |
|---|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() | ![]() |
| Reference Frame | MAE + LPIPS | MSE + LPIPS | MAE + Perceptual | MSE + Perceptual |
|---|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() | ![]() |
| Number of Loss Functions | Training Loss Functions | SSIM ↑ | PSNR ↑ | LPIPS ↓ |
|---|---|---|---|---|
| Single | MSE | 0.970 | 45.460 | 0.021 |
| Single | MAE | 0.970 | 44.225 | 0.023 |
| Two | MAE + SSIM | 0.967 | 41.944 | 0.024 |
| Two | MSE + SSIM | 0.973 | 48.099 | 0.021 |
| Two | MAE + Content | 0.972 | 46.999 | 0.022 |
| Two | MSE + Content | 0.953 | 33.981 | 0.041 |
| Two | MSE + LPIPS | 0.962 | 40.570 | 0.028 |
| Two | MAE + LPIPS | 0.965 | 42.585 | 0.026 |
| Two | MAE + Perceptual | 0.967 | 43.570 | 0.023 |
| Two | MSE + Perceptual | 0.970 | 46.496 | 0.020 |
| Three | MAE + SSIM + Perceptual | 0.975 | 49.532 | 0.026 |
| Three | MSE + SSIM + Perceptual | 0.975 | 49.532 | 0.026 |
| Three | MSE + LPIPS + SSIM | 0.974 | 49.132 | 0.024 |
| Three | MAE + LPIPS + SSIM | 0.968 | 44.883 | 0.023 |
| Three | MAE + LPIPS + Content | 0.961 | 39.772 | 0.030 |
| Three | MSE + LPIPS + Content | 0.970 | 47.040 | 0.019 |
| Three | MAE + SSIM + LPIPS | 0.966 | 40.183 | 0.026 |
| Three | MSE + SSIM + LPIPS | 0.974 | 48.691 | 0.024 |
| Three | MAE + SSIM + Style | 0.971 | 43.463 | 0.025 |
| Three | MSE + SSIM + Style | 0.974 | 47.653 | 0.023 |
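The PSNR values in the table follow directly from the MSE between the predicted and reference frames. A minimal sketch of the computation (assuming frames normalized to [0, 1]; the repository may use a different range or library implementation):

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    # Peak signal-to-noise ratio in dB; higher means closer to the reference
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return float(20 * np.log10(max_val) - 10 * np.log10(mse))

# Two toy frames differing by a constant offset of 0.1 -> MSE = 0.01
pred = np.zeros((4, 4))
target = np.full((4, 4), 0.1)
print(round(psnr(pred, target), 2))  # 20.0 dB
```

SSIM and LPIPS are typically computed with library implementations (e.g. `skimage.metrics.structural_similarity` and the `lpips` package) rather than by hand.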
The DAVIS 2017 (Densely Annotated VIdeo Segmentation) dataset is used both to train the model weights and to validate the results of the model.
The input for colorization inference must be a monochromatic video and an example frame (preferably from the same video).
The code resizes and normalizes the frames before predicting the color. At the end, the colorized video is saved in the videos_output folder.
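The resize-and-normalize step can be sketched as follows. This is a minimal illustration, not the repository's code: the 224×224 target size, the nearest-neighbour interpolation, and the [0, 1] normalization range are all assumptions.

```python
import numpy as np

def preprocess_frame(frame, size=(224, 224)):
    # Nearest-neighbour resize followed by [0, 1] normalization.
    # The target size and normalization range are assumptions; the
    # repository may use different values and interpolation.
    h, w = frame.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    resized = frame[rows][:, cols]
    return resized.astype(np.float32) / 255.0

# DAVIS-like grayscale frame (480p)
frame = np.random.randint(0, 256, (480, 854), dtype=np.uint8)
out = preprocess_frame(frame)
print(out.shape)  # (224, 224)
```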
To evaluate the model, execute main.py and set the variable str_dt to one of the model names in the trained_models folder.
Also, if you want to train your own model using the loss combinations, just run loop_train_all_losses.py; this script executes train.py for each loss combination defined in the criterions list.
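The loop over loss combinations can be sketched as below. The pairing of one pixel-wise loss (MSE or MAE) with one structural/perceptual loss mirrors the table above, but the `train` function here is a placeholder, not the repository's train.py:

```python
# Base losses from the paper's two-loss experiments
pixel_losses = ["MSE", "MAE"]
extra_losses = ["SSIM", "Content", "Perceptual", "LPIPS", "Style"]

def train(criterions):
    # Placeholder for launching train.py with a given loss combination;
    # here it just returns the combination's label
    return " + ".join(criterions)

runs = []
for pixel in pixel_losses:
    for extra in extra_losses:
        runs.append(train([pixel, extra]))

print(len(runs))  # 10 two-loss combinations
```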
```bibtex
@InProceedings{stival2024enhancing,
  author="Stival, Leandro and da Silva Torres, Ricardo and Pedrini, Helio",
  editor="Arai, Kohei",
  title="Enhancing Video Colorization with Deep Learning: A Comprehensive Analysis of Training Loss Functions",
  booktitle="Intelligent Systems and Applications",
  year="2024",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="496--509",
  isbn="978-3-031-66329-1"
}
```