RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
Chenglong Wang1; Yang Gan1; Yifu Huo1; Yongyu Mu1; Murun Yang1; Qiaozhi He1; Tong Xiao1,2; Chunliang Zhang1,2; Tongran Liu3; Quan Du2; Di Yang2; Jingbo Zhu1,2
First author: Chenglong Wang
Corresponding author email: [email protected]
Abstract

Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues such as generating misleading content without proper visual grounding (also known as hallucination). A promising solution to this problem is to apply human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques face a difficulty arising from the scarcity of visual preference data, which is required to train a visual reward model (VRM). In this work, we continue this line of research and present a Robust Visual Reward Model (RoVRM), which improves human-preference alignment for LVLMs. RoVRM leverages auxiliary textual preference data through three-phase progressive training and optimal transport-based preference data selection to effectively mitigate the scarcity of visual preference data. We evaluate RoVRM on commonly used vision-language tasks based on the LLaVA-1.5-7B and -13B models. Experimental results demonstrate that RoVRM consistently outperforms traditional VRMs. Furthermore, our three-phase progressive training and preference data selection approaches yield consistent performance gains over ranking-based alignment techniques, such as direct preference optimization.
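The best-of-n sampling mentioned in the abstract scores several candidate responses with a reward model and keeps the highest-scoring one. A minimal illustrative sketch, not the paper's code: `generate` and `reward` are hypothetical stand-ins for an LVLM sampler and a (visual) reward model.

```python
# Illustrative best-of-n sampling: draw n candidate responses and keep
# the one the reward model scores highest. `generate` and `reward` are
# hypothetical stand-ins, not the paper's actual components.
def best_of_n(prompt, generate, reward, n=8):
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda response: reward(prompt, response))

# Toy usage with deterministic stubs: a stand-in "reward" that prefers
# longer responses picks the longest of three canned candidates.
canned = iter(["short", "a much longer answer", "mid-length reply"])
best = best_of_n("describe the image",
                 generate=lambda p: next(canned),
                 reward=lambda p, r: len(r),
                 n=3)
print(best)  # "a much longer answer"
```

In practice the reward model, rather than the length heuristic above, supplies the scoring function; the selection logic itself stays this simple.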

Year: 2024
DOI: 10.48550/arXiv.2408.12109
Published in: arXiv
Pages: 14
Article type: review
Indexed by: EI
Citation statistics
Document type: journal article
Item identifier: http://ir.psych.ac.cn/handle/311026/48771
Collection: CAS Key Laboratory of Behavioral Science
Author affiliations:
1. School of Computer Science and Engineering, Northeastern University, Shenyang, China
2. NiuTrans Research, Shenyang, China
3. CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS, Beijing, China
Recommended citation:
GB/T 7714
Chenglong Wang, Yang Gan, Yifu Huo, et al. RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data[J]. arXiv, 2024: 14.
APA Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, ... & Jingbo Zhu. (2024). RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data. arXiv, 14.
MLA Chenglong Wang, et al. "RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data". arXiv (2024): 14.
Files in this item
Filename/Size | Document type | Version | Access | License
A Robust Visual Rewa (708KB) | journal article | author's accepted manuscript | restricted access | CC BY-NC-SA | Request full text

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.