This repository contains the code used to perform Deep Learning Video Colorization (DLVC) with a Denoising Diffusion Probabilistic Model (DDPM) that performs denoising in the latent space. The frames are encoded into this latent space by a pre-trained encoder.
To guide the denoising process of the diffusion model, a pre-trained visual transformer, denoted Visual Attention Conditioning (VAC), is used to extract conditioning features from the grayscale frame.
The paper is available for download from the following link: Video Colorization Based on a Diffusion Model Implementation
The implementation of the project is presented below, describing the pre-trained encoders and the data flow:
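As a rough orientation to the data flow described above, the sketch below uses toy stand-in modules (the shapes, layer choices, and names are assumptions, not the repository's actual networks) to show the encode, noise, condition, and denoise steps:

```python
import torch
import torch.nn as nn

# Toy stand-ins (hypothetical shapes) for the components described above:
# a latent-space encoder/decoder pair, a denoiser, and the VAC feature
# extractor that conditions the denoiser on the grayscale frame.
encoder = nn.Conv2d(3, 4, kernel_size=8, stride=8)           # RGB frame -> latent
decoder = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)  # latent -> RGB frame
denoiser = nn.Conv2d(4 + 4, 4, kernel_size=3, padding=1)     # latent + condition -> noise estimate
vac = nn.Conv2d(1, 4, kernel_size=8, stride=8)               # grayscale frame -> conditioning features

frame = torch.randn(1, 3, 64, 64)            # a color training frame
gray = frame.mean(dim=1, keepdim=True)       # its grayscale version

# Forward process: encode the frame, then add noise at a random timestep.
z = encoder(frame)
noise = torch.randn_like(z)
t = torch.rand(1)                            # normalized timestep in [0, 1)
z_noisy = (1 - t) * z + t * noise            # simplified noising schedule

# Reverse step: the denoiser predicts the noise, guided by the VAC features.
cond = vac(gray)
pred_noise = denoiser(torch.cat([z_noisy, cond], dim=1))
loss = nn.functional.mse_loss(pred_noise, noise)
```

The key point is that denoising happens on the small latent tensor, not on full-resolution frames, which is what makes the diffusion step affordable.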
A comparison of the quantitative results presented in the paper with those of other authors reveals an improvement in the FID and CDC metrics.
| Comparison | PSNR | SSIM | CDC | FID |
|---|---|---|---|---|
| Lei et al. | 30.35 | - | - | - |
| Chen et al. | - | - | - | - |
| Huang et al. | 30.61 | - | - | - |
| Ours | 27.95 | 0.27 | - | - |
Additionally, the qualitative results exhibited a favorable color distribution within the frames, although some artifacts were observed in fine details, particularly faces and text.
| Original frame | Gray version | Colorized version |
|---|---|---|
Make sure you've got these dependencies installed before you dive in:

- torch >= 2.0.1
- torchvision >= 0.15.2
- cuda >= 12.1
- diffusers >= 0.16.1
To evaluate your model (aka see some magic happen), use the video_colorization.py script. Just point it to the dataroot with your grayscale video, and voilà: your colorized video will be saved at /video_output/colorized_video.mp4. No need for a time machine!
Want to train on your own data? Great! Just follow these easy-peasy steps.
First, get your data organized like a tidy computer scientist. Here's the structure you need:
└── root
    ├── video
    │   ├── 00000.jpg
    │   ├── 00001.jpg
    │   └── 00002.jpg
    ├── video1
    │   ├── 00000.jpg
    │   ├── 00001.jpg
    │   └── 00002.jpg
    └── video2
        ├── 00000.jpg
        ├── 00001.jpg
        └── 00002.jpg
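To sanity-check that your data matches this layout, a small helper like the one below can enumerate it (the function name is hypothetical, not part of the repository):

```python
from pathlib import Path

def list_frame_sequences(root: str) -> dict:
    """Return {video_name: [sorted frame file names]} for the layout above."""
    root_dir = Path(root)
    return {
        d.name: sorted(p.name for p in d.glob("*.jpg"))
        for d in sorted(root_dir.iterdir())
        if d.is_dir()
    }
```

Sorting the frame names matters: the frames are numbered (00000.jpg, 00001.jpg, ...) so lexicographic order matches temporal order.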
Next up, let’s create the latent space representation of those video frames. Use feature_exctration.py to do the heavy lifting. After running it, a fancy latent.npz file containing all the tensors will be created in the folder /data/DATASET_NAME/. This is basically the secret sauce that DLVC uses to colorize your grayscale frames.
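The extraction step boils down to encoding every frame and dumping the latents to a .npz archive. A minimal sketch, assuming a toy encoder in place of the pre-trained one that feature_exctration.py actually loads:

```python
import os
import tempfile

import numpy as np
import torch
import torch.nn as nn

# Hypothetical stand-in for the pre-trained encoder (the real one is loaded
# inside feature_exctration.py); it maps (N, 3, H, W) frames to latents.
encoder = nn.Conv2d(3, 4, kernel_size=8, stride=8)

def extract_latents(frames: torch.Tensor, out_path: str) -> torch.Tensor:
    """Encode a batch of frames and save the latents as a .npz archive."""
    with torch.no_grad():
        latents = encoder(frames)
    np.savez(out_path, latent=latents.numpy())
    return latents

# Example: encode three 64x64 frames and write the archive to a temp folder.
out_path = os.path.join(tempfile.mkdtemp(), "latent.npz")
latents = extract_latents(torch.randn(3, 3, 64, 64), out_path)
```

Storing latents once up front means training never touches the encoder again, which keeps each training step cheap.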
Now for the main event: training. Use the train_diffusion.py script to kick off the training process. The network topology is defined in modules.py, where you can change how the layers are arranged. For deeper layers (more like diving into the deep end), adjust the net_dimension parameter. Once trained, your model will be stored in the unet_model folder. Success.
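Under the hood, each training step follows the usual DDPM recipe on the precomputed latents: noise a latent at a random timestep, predict that noise, and minimize the mean-squared error. A minimal sketch (the tiny network and simplified noise schedule are assumptions; the real topology lives in modules.py):

```python
import torch
import torch.nn as nn

# Tiny stand-in denoiser; in the repo, width/depth are controlled by the
# net_dimension parameter in modules.py.
denoiser = nn.Sequential(
    nn.Conv2d(4, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 4, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

latents = torch.randn(8, 4, 8, 8)  # stand-in for the contents of latent.npz

for step in range(2):  # real training runs for many epochs
    noise = torch.randn_like(latents)
    t = torch.rand(latents.size(0), 1, 1, 1)  # one random timestep per sample
    noisy = (1 - t) * latents + t * noise     # simplified noising schedule
    pred = denoiser(noisy)                    # predict the injected noise
    loss = nn.functional.mse_loss(pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

After training, the learned denoiser is what gets saved to the unet_model folder and later drives the reverse diffusion at colorization time.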
@inproceedings{stival2024video,
title={Video Colorization Based on a Diffusion Model Implementation},
author={Stival, Leandro and da Silva Torres, Ricardo and Pedrini, Helio},
booktitle={Intelligent Systems Conference},
pages={117--131},
year={2024},
organization={Springer}
}