Institutional Repository of Key Laboratory of Behavioral Science, CAS
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
Chenglong Wang1; Yang Gan1; Yifu Huo1; Yongyu Mu1; Murun Yang1; Qiaozhi He1; Tong Xiao1,2; Chunliang Zhang1,2; Tongran Liu3; Quan Du2; Di Yang2; Jingbo Zhu1,2
First Author | Chenglong Wang |
Corresponding Author Email | [email protected] |
Abstract | Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues like generating misleading content without proper visual context (also known as hallucination). A promising solution to this problem is using human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques face the difficulty arising from the scarcity of visual preference data, which is required to train a visual reward model (VRM). In this work, we continue this line of research. We present a Robust Visual Reward Model (RoVRM) which improves human-preference alignment for LVLMs. RoVRM leverages auxiliary textual preference data through a three-phase progressive training and optimal transport-based preference data selection to effectively mitigate the scarcity of visual preference data. We experiment with RoVRM on the commonly used vision-language tasks based on the LLaVA-1.5-7B and -13B models. Experimental results demonstrate that RoVRM consistently outperforms traditional VRMs. Furthermore, our three-phase progressive training and preference data selection approaches can yield consistent performance gains over ranking-based alignment techniques, such as direct preference optimization. |
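The abstract names best-of-n sampling as one of the alignment techniques that depends on a reward model. A minimal sketch of that procedure is below; `generate` and `reward` are hypothetical stand-ins, not the paper's API (the actual RoVRM scores a response given both the visual and textual context).

```python
def best_of_n(prompt, generate, reward, n=4):
    """Sample n candidate responses and return the one the reward model scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda resp: reward(prompt, resp))

# Toy usage with stand-in functions: the "reward" here simply prefers
# longer responses, just to make the selection observable.
responses = iter(["a", "abc", "ab", "abcd"])
picked = best_of_n(
    "describe the image",
    generate=lambda p: next(responses),
    reward=lambda p, r: len(r),
    n=4,
)
print(picked)  # → "abcd"
```

The quality of the selected response depends entirely on the reward model, which is why a robust VRM matters: a reward model miscalibrated by scarce visual preference data will pick poor candidates just as confidently.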
2024
DOI | 10.48550/arXiv.2408.12109 |
Journal | arXiv |
Pages | 14 |
Article Type | Review |
Indexed By | EI |
Citation Statistics |
Document Type | Journal Article |
Item Identifier | http://ir.psych.ac.cn/handle/311026/48771 |
Collection | CAS Key Laboratory of Behavioral Science |
Author Affiliations | 1. School of Computer Science and Engineering, Northeastern University, Shenyang, China; 2. NiuTrans Research, Shenyang, China; 3. CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS, Beijing, China |
Recommended Citation (GB/T 7714) | Chenglong Wang, Yang Gan, Yifu Huo, et al. RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data[J]. arXiv, 2024: 14. |
APA | Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, ... & Jingbo Zhu. (2024). RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data. arXiv, 14. |
MLA | Chenglong Wang, et al. "RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data". arXiv (2024): 14. |
Files in This Item |
File Name/Size | Document Type | Version | Access | License
A Robust Visual Rewa(708KB) | Journal Article | Author's Accepted Manuscript | Restricted Access | CC BY-NC-SA