Texture map production is an important part of 3D modeling and largely determines rendering quality. Recently, diffusion-based methods have opened a new avenue for texture generation. However, the restricted control flexibility and limited prompt modalities of existing methods can prevent creators from achieving their desired results. Furthermore, inconsistencies across generated multi-view images often degrade texture quality. To address these issues, we introduce FlexPainter, a novel texture generation pipeline that enables flexible multi-modal conditional guidance and achieves highly consistent texture generation. We construct a shared conditional embedding space that supports flexible aggregation of different input modalities. Building on this embedding space, we present an image-based CFG method that decomposes structural and style information, enabling stylization from a reference image. Leveraging the 3D knowledge within the image diffusion prior, we generate multi-view images simultaneously using a grid representation to enhance global understanding. Meanwhile, we propose a view synchronization and adaptive weighting module applied during diffusion sampling to further ensure local consistency. Finally, a 3D-aware texture completion model combined with a texture enhancement model produces seamless, high-resolution texture maps. Comprehensive experiments demonstrate that our framework significantly outperforms state-of-the-art methods in both flexibility and generation quality.
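The image-based CFG idea above can be illustrated with a small sketch. The function below is a generic multi-conditional classifier-free guidance combination, not the paper's exact formulation: the function name, the per-condition weights, and the toy inputs are all illustrative assumptions.

```python
import numpy as np

def multi_cond_cfg(eps_uncond, eps_text, eps_image, w_text=7.5, w_image=2.0):
    """Hypothetical multi-conditional CFG: each condition's guidance
    direction (conditional minus unconditional prediction) is added to
    the unconditional prediction with its own weight."""
    return (eps_uncond
            + w_text * (eps_text - eps_uncond)
            + w_image * (eps_image - eps_uncond))

# Toy arrays standing in for denoiser noise predictions.
eps_u = np.zeros((4, 4))        # unconditional prediction
eps_t = np.ones((4, 4))         # text-conditioned prediction
eps_i = np.full((4, 4), 0.5)    # image-conditioned prediction
out = multi_cond_cfg(eps_u, eps_t, eps_i, w_text=2.0, w_image=1.0)
```

Setting one weight to zero recovers single-condition CFG, which is how such a formulation can trade off structural guidance against style guidance.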
Above is the pipeline of our method. We first generate multi-view images from the user's conditional input. The top two cases show generation using text-only or image-only conditions. Leveraging the linear operation in our shared embedding space, we can also perform text-guided image refinement (shown in green) and stylization using a reference image (shown in blue). The right side shows our view synchronization and weighting module: consistent multi-view images are generated by reprojection and weighting at each sampling step.
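The linear operation over the shared embedding space can be sketched as a weighted combination of modality embeddings. This is a minimal illustration under assumptions: the function name, the convex-combination normalization, and the 2-D toy embeddings are ours, not the paper's.

```python
import numpy as np

def aggregate_embeddings(embeds, weights):
    """Linearly combine conditional embeddings from different modalities
    (e.g. text and image) that live in one shared embedding space.
    Normalizing to a convex combination is an assumption."""
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()
    return sum(w * e for w, e in zip(weights, embeds))

# Toy text and image embeddings in a shared 2-D space.
text_emb = np.array([1.0, 0.0])
image_emb = np.array([0.0, 1.0])
mixed = aggregate_embeddings([text_emb, image_emb], [3, 1])
```

Shifting the weights toward the image embedding biases generation toward the reference image, which is the mechanism behind the refinement and stylization cases shown in green and blue.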
Text-to-texture generation
Image-to-texture generation
Our applications
@article{yan2025flexpainter,
title={FlexPainter: Flexible and Multi-View Consistent Texture Generation},
author={Dongyu Yan and Leyi Wu and Jiantao Lin and Luozhou Wang and Tianshuo Xu and Zhifei Chen and Zhen Yang and Lie Xu and Shunsi Zhang and Yingcong Chen},
journal={arXiv preprint arXiv:2506.02620},
year={2025}
}