Our method, DiffuseKronA, achieves superior image quality and accurate text-image correspondence across diverse input images and prompts, while maintaining exceptional parameter efficiency. Here, \([V]\) denotes a unique token used to fine-tune the text-to-image diffusion model on a specific subject.
For more results, please visit the gallery!
In the realm of subject-driven text-to-image (T2I) generative models, recent developments like DreamBooth and BLIP-Diffusion have led to impressive results yet encounter limitations due to their intensive fine-tuning demands and substantial parameter requirements. While the low-rank adaptation (LoRA) module within DreamBooth offers a reduction in trainable parameters, it introduces a pronounced sensitivity to hyperparameters, leading to a compromise between parameter efficiency and the quality of T2I personalized image synthesis. Addressing these constraints, we introduce DiffuseKronA, a novel Kronecker product-based adaptation module that not only significantly reduces the parameter count by up to 35% and 99.947% compared to LoRA-DreamBooth and the original DreamBooth, respectively, but also enhances the quality of image synthesis. Crucially, DiffuseKronA mitigates the issue of hyperparameter sensitivity, delivering consistent high-quality generations across a wide range of hyperparameters, thereby diminishing the necessity for extensive fine-tuning. Evaluated against diverse and complex input images and text prompts, DiffuseKronA consistently outperforms existing models, producing diverse images of higher quality with improved fidelity and a more accurate color distribution of objects, thus presenting a substantial advancement in the field of T2I generative modeling.
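The Kronecker product-based adaptation described above can be sketched in a few lines: the adapter learns two small factors \(A\) and \(B\) and reconstructs the full weight update as \(\Delta W = A \otimes B\). All shapes and the LoRA rank in this sketch are illustrative assumptions for intuition, not the paper's actual layer configuration.

```python
# Minimal sketch of a Kronecker-product weight update. The adapter trains only
# the small factors A and B; the full update delta_W = A (kron) B is never a
# separately stored parameter. Dimensions below are hypothetical.

def kron(A, B):
    """Kronecker product of two matrices given as lists of lists."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

# Tiny example: a (2x2) x (2x2) factorization reconstructs a (4x4) update.
A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]
delta_W = kron(A, B)
# delta_W == [[0, 1, 0, 2],
#             [1, 0, 2, 0],
#             [0, 3, 0, 4],
#             [3, 0, 4, 0]]

# Parameter-count intuition for a hypothetical 640x640 weight matrix:
# Kronecker factors of shape (64x64) and (10x10) versus a rank-4 LoRA
# update (down-projection 640x4 plus up-projection 4x640).
krona_params = 64 * 64 + 10 * 10   # 4,196 trainable parameters
lora_params = 4 * (640 + 640)      # 5,120 trainable parameters
print(krona_params, lora_params)
```

Because the Kronecker product of two full-rank factors can itself be full rank, this parameterization can express higher-rank updates than a LoRA module of comparable size, which is the intuition behind the parameter savings reported below.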
For the full results and configurations, please visit model ablations!
| Backbone | Model | Train. Time | # Params | Model Size |
|---|---|---|---|---|
| SDXL | DiffuseKronA | ~40 min. | 3.8 M | 14.95 MB |
| SDXL | LoRA-DreamBooth | ~38 min. | 5.8 M | 22.32 MB |
| SD | DiffuseKronA | ~5.52 min. | 0.52 M | 2.1 MB |
| SD | LoRA-DreamBooth | ~5.3 min. | 1.09 M | 4.3 MB |
@InProceedings{Marjit_2025_WACV,
author = {Marjit, Shyam and Singh, Harshit and Mathur, Nityanand and Paul, Sayak and Yu, Chia-Mu and Chen, Pin-Yu},
title = {DiffuseKronA: A Parameter Efficient Fine-Tuning Method for Personalized Diffusion Models},
booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
month = {February},
year = {2025},
pages = {3529-3538}
}