Appearance-aware Multi-view SVBRDF Reconstruction via Deep Reinforcement Learning
Recent advances in deep learning have revolutionized the reconstruction of spatially-varying surface reflectance of real-world objects. Many existing methods recover high-quality reflectance maps from a remarkably small number of images captured with a lightweight handheld camera and a flash-like light source. As the samples become sparse, however, the choice of the sampling set has a significant impact on the results. To determine the best sampling set for each material while keeping capture costs minimal, we introduce an appearance-aware adaptive sampling method in this paper. We model the sampling process as a sequential decision-making problem and solve it with a deep reinforcement learning (DRL) framework. At each step, an agent (the NBVL Planner), trained on a specially designed dataset, plans the next best view-lighting (NBVL) pair based on the appearance of the material recognized so far. Once the agent stops, the sequence of NBVLs constitutes the best sampling set for the material. Extensive experiments on both synthetic materials and real-world cases show that the sampling set selected by our method outperforms other sampling sets, especially for challenging materials featuring globally-varying specular reflectance.
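For concreteness, the adaptive capture process can be pictured as the interaction loop below. This is a minimal sketch under our own assumptions: capture, plan_next, the discretized action set, and the stopping rule are hypothetical placeholders standing in for the camera interface and the trained NBVL Planner, not the actual implementation.

import random

# Discretized view-lighting pairs; the real action space is an assumption here.
ACTIONS = [(light, view) for light in range(8) for view in range(8)]
STOP = "stop"

def capture(action):
    # Placeholder: photograph the material under the given (light, view) pair.
    return [random.random() for _ in range(16)]  # stands in for an image

def plan_next(image, history):
    # Placeholder policy; a trained NBVL Planner would make this decision
    # based on the appearance observed so far.
    history.append(image)
    return STOP if len(history) >= 5 else random.choice(ACTIONS)

history, sampling_set = [], [ACTIONS[0]]  # start from a fixed first sample
while True:
    image = capture(sampling_set[-1])
    next_action = plan_next(image, history)
    if next_action == STOP:
        break
    sampling_set.append(next_action)
# sampling_set now holds the appearance-adapted view-lighting pairs used
# for SVBRDF reconstruction.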
Overview of our DRL pipeline. The NBVL Planner takes as input the action of the previous step and the captured image \( \mathbf{I}_t \). The ResNet encoder extracts features describing the material and the current sampling process, and a GRU fuses the information of this sample with all previous features (represented as the memory \( h_{t-1} \)). The Actor network then predicts a probability density function over all actions, and the action with the highest probability is selected as the NBVL \( (\mathbf{l}_{t+1},\mathbf{v}_{t+1}) \), forming the next state \( s_{t+1} \). From these, the agent obtains the reward \( r_t \) for this step through the Reward Provider and proceeds to the next cycle. Once the agent decides to \( \textit{STOP} \), the samples of all steps constitute the best sampling set for the material.
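The pipeline in the caption maps naturally onto a small recurrent actor network. The following PyTorch sketch illustrates one planning step under our own assumptions: the ResNet-18 backbone, layer sizes, the discrete action parameterization (64 view-lighting pairs plus STOP), and the value head are illustrative choices, not the paper's exact architecture.

import torch
import torch.nn as nn
from torchvision.models import resnet18

NUM_ACTIONS = 65  # e.g. 64 view-lighting pairs + 1 STOP action (assumed)

class NBVLPlanner(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        backbone = resnet18(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.action_embed = nn.Embedding(NUM_ACTIONS, 64)
        self.gru = nn.GRUCell(feat_dim + 64, hidden_dim)  # fuses step with memory
        self.actor = nn.Linear(hidden_dim, NUM_ACTIONS)   # policy head
        self.critic = nn.Linear(hidden_dim, 1)            # value head (assumed, typical for actor-critic training)

    def forward(self, image, prev_action, h):
        feat = self.encoder(image).flatten(1)             # material/sampling features
        x = torch.cat([feat, self.action_embed(prev_action)], dim=1)
        h = self.gru(x, h)                                # update memory h_{t-1} -> h_t
        probs = torch.softmax(self.actor(h), dim=-1)      # density over all actions
        return probs, self.critic(h), h

# One planning step: pick the most probable action as the next NBVL.
planner = NBVLPlanner()
image = torch.rand(1, 3, 224, 224)
h = torch.zeros(1, 256)
probs, value, h = planner(image, torch.tensor([0]), h)
next_action = probs.argmax(dim=-1)  # index of the next view-lighting pair, or STOP

At capture time this step is unrolled until the STOP action is chosen; during training, each step would additionally be scored by the Reward Provider (not shown here).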
We express our gratitude to the anonymous reviewers for their professional and insightful comments. We also thank Yexin Jiang for helpful discussions and invaluable suggestions. This work was supported by the National Natural Science Foundation of China (Nos. 61972194 and 62032011) and the Natural Science Foundation of Jiangsu Province (No. BK20211147).
@inproceedings{zhu2025appearance,
title={Appearance-aware Multi-view SVBRDF Reconstruction via Deep Reinforcement Learning},
author={Zhu, Pengfei and Guo, Jie and Liu, Yifan and Sun, Qi and Wang, Yanxiang and Xu, Keheng and Liu, Ligang and Guo, Yanwen},
year={2025},
booktitle={SIGGRAPH'25 Conference Proceedings},
}