
VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning


Baolu Li*, Yiming Zhang*, Qinghe Wang*†, Liqian Ma✉, Xiaoyu Shi, Xintao Wang, Pengfei Wan, Zhenfei Yin, Yunzhi Zhuge, Huchuan Lu, Xu Jia✉
* equal contribution † project leader ✉ corresponding author

💡 Abstract

Visual effects (VFX) are crucial to the expressive power of digital media, yet their creation remains a major challenge for generative AI. Prevailing methods often rely on a one-LoRA-per-effect paradigm, which is resource-intensive and fundamentally incapable of generalizing to unseen effects, limiting both scalability and creative freedom. To address this challenge, we introduce VFXMaster, a unified, reference-based framework for VFX video generation. It recasts effect generation as an in-context learning task, enabling it to reproduce diverse dynamic effects from a reference video onto target content, and it demonstrates remarkable generalization to unseen effect categories. Specifically, we design an in-context conditioning strategy that prompts the model with a reference example, together with an in-context attention mask that precisely decouples and injects the essential effect attributes, allowing a single unified model to master effect imitation without information leakage. In addition, we propose an efficient test-time adaptation mechanism that rapidly boosts generalization to difficult unseen effects from a single user-provided video. Extensive experiments demonstrate that our method effectively imitates various categories of effects and exhibits outstanding generalization to out-of-domain effects.
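The attention-masking idea above can be sketched in a few lines. This is a hypothetical illustration, not the released implementation: it assumes a flattened token sequence of `n_ref` reference-video tokens followed by `n_tgt` target tokens, and builds a boolean mask in which `True` means attention is allowed. Target tokens may attend to the reference (injecting effect attributes), while reference tokens attend only to themselves (blocking leakage of target content back into the reference branch).

```python
def in_context_attention_mask(n_ref: int, n_tgt: int) -> list[list[bool]]:
    """Illustrative in-context attention mask (hypothetical, not the paper's code).

    Row q is the query token, column k the key token; True = attention allowed.
    Tokens [0, n_ref) are reference-video tokens, [n_ref, n_ref + n_tgt) are
    target-content tokens.
    """
    n = n_ref + n_tgt
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if q < n_ref:
                # Reference tokens see only the reference block: no leakage
                # of target content into the reference branch.
                mask[q][k] = k < n_ref
            else:
                # Target tokens see everything, so effect attributes from the
                # reference can be injected into the target generation.
                mask[q][k] = True
    return mask
```

In a real diffusion-transformer backbone this mask would be converted to an additive bias (0 / -inf) on the attention logits; the boolean form here is just the easiest way to see the asymmetry.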

VFXMaster_Demo.mp4

✅ TODO List

  • Release our inference code
  • Release our training code
  • Release our model weights
  • Release our datasets

🐍 Installation

# Clone this repository.
git clone https://github.com/libaolu312/VFXMaster.git
cd VFXMaster

# Install requirements
conda create -n vfxmaster python=3.10 -y
conda activate vfxmaster
pip install -r requirements.txt 

📦 Model Weights

Folder Structure

VFXMaster
└── training_weight
    ├── VFXMaster_Weight
    │   ├── In-Context-Conditioning
    │   └── One-shot-Adaptation
    └── CogVideoX-Fun-V1.1-5b-InP

Download Links

hf download 8ruceLi/VFXMaster --local-dir training_weight/VFXMaster_Weight
hf download alibaba-pai/CogVideoX-Fun-V1.1-5b-InP --local-dir training_weight/CogVideoX-Fun-V1.1-5b-InP

🔄 Inference

Inference requires ≥ 34GB of GPU memory (tested on a single NVIDIA A800-SXM4-80GB).

# standard in-context inference
bash scripts/inference.sh
# inference with one-shot test-time adaptation
bash scripts/inference_tta.sh
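Given the ≥ 34 GB memory requirement, a quick pre-flight check can save a failed launch. The sketch below is our own helper (not part of the repo); it parses the output of `nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits` (one free-memory value in MiB per line, one line per GPU) and checks whether any GPU has enough headroom:

```python
def has_enough_vram(nvidia_smi_output: str, required_mib: int = 34 * 1024) -> bool:
    """Return True if any GPU reports at least `required_mib` MiB free.

    `nvidia_smi_output` is the stdout of:
        nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits
    """
    frees = [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]
    return any(f >= required_mib for f in frees)
```

For example, wire it up with `subprocess.run([...], capture_output=True, text=True)` and only call `bash scripts/inference.sh` when the check passes.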

🖥️ Training

Training on our datasets:

# download our datasets
mkdir datasets/VFXMaster_datasets
hf download --repo-type dataset 8ruceLi/VFXMaster --local-dir datasets/VFXMaster_datasets
cd datasets/VFXMaster_datasets
tar -xvf data.tar
cd ../..
# In-Context-Conditioning
bash scripts/train.sh
# One-shot-Adaptation
bash scripts/inference_tta.sh

If you want to train on your own datasets:

# download Qwen2.5-VL for captioning
hf download Qwen/Qwen2.5-VL-72B-Instruct --local-dir training_weight/Qwen2.5-VL-72B-Instruct
# processing training data
cd datasets/prepare_dataset
bash prepare_dataset.sh
# generate a first-frame image caption from an existing reference effect video
bash caption_first_frame.sh
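The captioning step produces one text caption per reference effect video. The helper below is purely illustrative (the actual file layout is defined by `prepare_dataset.sh` and `caption_first_frame.sh`); it shows one simple convention for pairing each video with the file that will hold its caption:

```python
from pathlib import Path


def caption_path_for(video: str, caption_dir: str = "captions") -> Path:
    """Map a video file to its caption file: <caption_dir>/<video stem>.txt.

    Illustrative convention only; adapt it to whatever layout your
    dataset-preparation scripts actually produce.
    """
    return Path(caption_dir) / (Path(video).stem + ".txt")
```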

🤝 Acknowledgements

We would like to express our gratitude to the following open-source projects that have been instrumental in the development of our project:

  • OpenVFX: An open-source VFX dataset.
  • CogVideo: An open-source video generation framework.
  • VideoX-Fun: A training library for diffusion models.
  • Qwen: A family of large (vision-)language models, used here for captioning.

Special thanks to the contributors of these libraries for their hard work and dedication!

📚 Contact

If you have any suggestions or find our work helpful, feel free to contact us.

We are actively looking for compute resources to scale VFXMaster to stronger base models — if you're interested in collaborating or sponsoring, don't hesitate to contact us!

Email: 8ruceli3@gmail.com

If you find our work useful, please consider giving this GitHub repository a star and citing it:

@article{li2025vfxmaster,
  title={VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning},
  author={Li, Baolu and Zhang, Yiming and Wang, Qinghe and Ma, Liqian and Shi, Xiaoyu and Wang, Xintao and Wan, Pengfei and Yin, Zhenfei and Zhuge, Yunzhi and Lu, Huchuan and others},
  journal={arXiv preprint arXiv:2510.25772},
  year={2025}
}