Official code for d-Sketch: Improving Visual Fidelity of Sketch-to-Image Translation with Pretrained Latent Diffusion Models without Retraining.
Accepted at the International Conference on Pattern Recognition (ICPR) 2024.
Note: This release is tested on Python 3.9.16.
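Since the release is only tested on Python 3.9.16, it may help to check the interpreter before creating the virtual environment. The snippet below is a minimal sketch. The helper name `matches_tested_python` is an illustrative assumption, and comparing only the major and minor version is a choice made here, not a requirement stated by the authors.

```python
import sys

TESTED_VERSION = (3, 9, 16)  # version stated in the note above


def matches_tested_python(version_info=sys.version_info):
    """Return True if the major.minor version matches the tested release."""
    return tuple(version_info[:2]) == TESTED_VERSION[:2]


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if matches_tested_python():
        print(f"Python {current} matches the tested 3.9 series.")
    else:
        print(f"Python {current} differs from the tested 3.9.16; behaviour is untested.")
```

Other Python versions may well work; the check only flags that they fall outside what the note says was tested.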
git clone https://github.com/prasunroy/dsketch.git
cd dsketch
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
- Download the Flickr20 dataset and extract into datasets/flickr20 directory.
- Run lctn_train.py with the following options.
lctn_train.py [-h] [--sd_path SD_PATH] [--mixed_precision {no,fp16,bf16,fp8}] [--force_cpu]
[--data_root DATA_ROOT] [--image_size IMAGE_SIZE] [--batch_size BATCH_SIZE]
[--shuffle] [--num_workers NUM_WORKERS] [--lr LR] [--steps STEPS]
[--output_freq OUTPUT_FREQ] [--output_root OUTPUT_ROOT]
python lctn_train.py --sd_path stabilityai/stable-diffusion-2-1 --mixed_precision fp16 --data_root ./datasets/flickr20/ --image_size 768 --batch_size 4 --shuffle --num_workers 8 --lr 0.001 --steps 50000 --output_freq 100 --output_root ./output/
- Download the sample sketches and extract into result/sample_sketches directory.
- (Optional) Copy the best checkpoint <OUTPUT_ROOT>/<TIMESTAMP>/lctn.pth into checkpoints directory.
- Run lctn_sample.py with the following options.
lctn_sample.py [-h] [--seed SEED] [--prompt PROMPT] [--sketch SKETCH] [--image_size IMAGE_SIZE]
[--guidance_scale GUIDANCE_SCALE] [--noising_scale NOISING_SCALE] [--steps STEPS]
[--sd_path SD_PATH] [--lctn_path LCTN_PATH] [--mixed_precision {no,fp16,bf16,fp8}]
[--force_cpu] [--output_dir OUTPUT_DIR]
python lctn_sample.py --seed 11111111 --prompt "photo of a fox" --sketch ./result/sample_sketches/fox.png --image_size 768 --guidance_scale 8.0 --noising_scale 0.8 --steps 50 --sd_path stabilityai/stable-diffusion-2-1 --lctn_path ./checkpoints/lctn_flickr20.pth --mixed_precision fp16 --output_dir ./result/fox/
@inproceedings{roy2022dsketch,
title = {d-Sketch: Improving Visual Fidelity of Sketch-to-Image Translation with Pretrained Latent Diffusion Models without Retraining},
author = {Roy, Prasun and Bhattacharya, Saumik and Ghosh, Subhankar and Pal, Umapada and Blumenstein, Michael},
booktitle = {The International Conference on Pattern Recognition (ICPR)},
month = {December},
year = {2024}
}
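The lctn_sample.py command shown earlier processes one sketch per invocation. As a rough sketch, assuming the script is run from the repository root and that each sketch file is a PNG named after its subject (as with fox.png), the hypothetical helper below builds one command per sketch using only the documented options. The function names `build_sample_command` and `batch_commands` and the prompt template are illustrative assumptions, not part of this repository.

```python
import shlex
import sys
from pathlib import Path


def build_sample_command(sketch, prompt, output_dir, seed=11111111,
                         lctn_path="./checkpoints/lctn_flickr20.pth"):
    """Return the argument list for one lctn_sample.py run (hypothetical helper)."""
    return [
        sys.executable, "lctn_sample.py",
        "--seed", str(seed),
        "--prompt", prompt,
        "--sketch", str(sketch),
        "--image_size", "768",
        "--guidance_scale", "8.0",
        "--noising_scale", "0.8",
        "--steps", "50",
        "--sd_path", "stabilityai/stable-diffusion-2-1",
        "--lctn_path", str(lctn_path),
        "--mixed_precision", "fp16",
        "--output_dir", str(output_dir),
    ]


def batch_commands(sketch_dir="./result/sample_sketches", output_root="./result"):
    """Build one command per PNG sketch found in sketch_dir."""
    commands = []
    for sketch in sorted(Path(sketch_dir).glob("*.png")):
        # Assumption: the file stem is the subject name, e.g. fox.png -> "fox".
        prompt = f"photo of a {sketch.stem}"
        output_dir = Path(output_root) / sketch.stem
        commands.append(build_sample_command(sketch, prompt, output_dir))
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in batch_commands():
        print(shlex.join(cmd))
```

To actually run the batch, each argument list could be passed to `subprocess.run(cmd, check=True)`. The fixed values mirror the example command and would likely need tuning per sketch.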
Copyright 2024 by the authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
