RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
Chenglong Wang1; Yang Gan1; Yifu Huo1; Yongyu Mu1; Murun Yang1; Qiaozhi He1; Tong Xiao1,2; Chunliang Zhang1,2; Tongran Liu3; Quan Du2; Di Yang2; Jingbo Zhu1,2
First author: Chenglong Wang
Corresponding author email: [email protected]
Abstract

Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues such as generating misleading content without proper visual grounding (also known as hallucination). A promising solution to this problem is to apply human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques face a difficulty arising from the scarcity of visual preference data, which is required to train a visual reward model (VRM). In this work, we continue this line of research and present a Robust Visual Reward Model (RoVRM), which improves human-preference alignment for LVLMs. RoVRM leverages auxiliary textual preference data through three-phase progressive training and optimal transport-based preference data selection to effectively mitigate the scarcity of visual preference data. We evaluate RoVRM on commonly used vision-language tasks based on the LLaVA-1.5-7B and -13B models. Experimental results demonstrate that RoVRM consistently outperforms traditional VRMs. Furthermore, our three-phase progressive training and preference data selection approaches yield consistent performance gains over ranking-based alignment techniques, such as direct preference optimization.
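The best-of-n sampling mentioned in the abstract scores several candidate responses with a reward model and keeps the highest-scoring one. A minimal illustrative sketch, not the paper's code: `generate` and `reward` are hypothetical stand-ins for an LVLM sampler and a (visual) reward model.

```python
# Illustrative best-of-n sampling: draw n candidate responses and keep
# the one the reward model scores highest. `generate` and `reward` are
# hypothetical stand-ins, not the paper's actual components.
def best_of_n(prompt, generate, reward, n=8):
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda response: reward(prompt, response))

# Toy usage with deterministic stubs: a stand-in "reward" that prefers
# longer responses picks the longest of three canned candidates.
canned = iter(["short", "a much longer answer", "mid-length reply"])
best = best_of_n("describe the image",
                 generate=lambda p: next(canned),
                 reward=lambda p, r: len(r),
                 n=3)
print(best)  # "a much longer answer"
```

In practice the reward model, rather than the length heuristic above, supplies the scoring function; the selection logic itself stays this simple.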

Year: 2024
DOI: 10.48550/arXiv.2408.12109
Published in: arXiv
Pages: 14
Article type: review
Indexed by: EI
Citation statistics
Document type: journal article
Item identifier: http://ir.psych.ac.cn/handle/311026/48771
Collection: CAS Key Laboratory of Behavioral Science
Author affiliations:
1. School of Computer Science and Engineering, Northeastern University, Shenyang, China
2. NiuTrans Research, Shenyang, China
3. CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS, Beijing, China
Recommended citation:
GB/T 7714
Chenglong Wang, Yang Gan, Yifu Huo, et al. RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data[J]. arXiv, 2024: 14.
APA Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, ... & Jingbo Zhu. (2024). RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data. arXiv, 14.
MLA Chenglong Wang, et al. "RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data". arXiv (2024): 14.
Files in this item
Filename/Size | Document type | Version | Access | License
A Robust Visual Rewa (708KB) | journal article | author's accepted manuscript | restricted access | CC BY-NC-SA | Request full text

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.