Discriminative feature learning based on multi-view attention network with diffusion joint loss for speech emotion recognition
Yang Liu1; Xin Chen1; Yuan Song1; Yarong Li1; Shengbei Wang2; Weitao Yuan2; Yongwei Li3; Zhen Zhao1
First author: Yang Liu
Corresponding author email: [email protected] (Zhen Zhao)
Abstract

In speech emotion recognition, existing models often struggle to accurately classify emotions with high similarity. In this paper, we propose a novel architecture that integrates a multi-view attention network (MVAN) and a diffusion joint loss to alleviate this confusion by placing a stronger focus on emotions that are difficult to classify. First, we use logarithmic Mel-spectrograms (log-Mels) together with their deltas and delta-deltas as three-dimensional features to minimize external interference. Then, we design the MVAN to extract effective multi-time-scale emotion features: channel and spatial attention selectively localize the regions of the input features related to the target emotion; a multi-time-view bidirectional long short-term memory network extracts shallow edge features and deep semantic features; and multi-scale self-attention fuses these features through cross-scale attention fusion to obtain multi-time-scale emotion features. Finally, a diffusion joint loss strategy is introduced to distinguish highly similar emotional embeddings by means of complex emotion triplets generated in a diffusing fashion. We evaluated the proposed method on the Interactive Emotional Dyadic Motion Capture (IEMOCAP), Institute of Automation, Chinese Academy of Sciences (CASIA), and Berlin Database of Emotional Speech (EMODB) corpora, reporting weighted accuracy (WA), unweighted accuracy (UA), and weighted F1 score (WF1). The results show significant improvements over existing methods, achieving 86.87% WA, 86.60% UA, and 86.82% WF1 on IEMOCAP; 70.74% WA, 70.74% UA, and 70.25% WF1 on CASIA; and 93.65% WA, 91.13% UA, and 92.26% WF1 on EMODB. These results confirm the superiority of our method.
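
The three-channel input mentioned in the abstract (log-Mels, deltas, and delta-deltas) can be illustrated with a short sketch. This is not the authors' implementation: the regression-based delta formula, the window width of 2, the 40 Mel bands, the 300 frames, and the random placeholder standing in for a real log-Mel spectrogram are all assumptions made here for illustration, and the helper names `delta` and `stack_three_channels` are hypothetical.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based delta along the time axis of a (n_bands, n_frames) array.

    Assumed formula (a common choice, not taken from the paper):
        d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2)
    """
    n_frames = features.shape[-1]
    # Repeat the edge frames so every frame has `width` neighbours on both sides.
    padded = np.pad(features, [(0, 0), (width, width)], mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[:, width + n : width + n + n_frames]
        behind = padded[:, width - n : width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_three_channels(log_mel: np.ndarray) -> np.ndarray:
    """Stack log-Mel, delta, and delta-delta into a (3, n_bands, n_frames) array."""
    d1 = delta(log_mel)
    d2 = delta(d1)  # delta-delta is the delta of the delta
    return np.stack([log_mel, d1, d2], axis=0)


# Placeholder input: random values standing in for a real log-Mel spectrogram.
rng = np.random.default_rng(0)
log_mel = rng.standard_normal((40, 300))  # assumed 40 Mel bands, 300 frames
features = stack_three_channels(log_mel)
print(features.shape)  # (3, 40, 300)
```

The resulting array has one channel per feature type, which is the layout that channel and spatial attention modules usually expect from an image-like input.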

Keywords: Speech emotion recognition; Multi-view attention network; Diffusion joint loss
Publication year: 2024
Language: English
DOI: 10.1016/j.engappai.2024.109219
Journal: Engineering Applications of Artificial Intelligence
Volume: 137; Pages: 15
Indexed by: SCI; EI
WOS quartile: Q1
Document type: Journal article
Item identifier: http://ir.psych.ac.cn/handle/311026/48745
Collection: CAS Key Laboratory of Behavioral Science
Author affiliations:
1. School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China
2. School of Computer Science and Software, Tiangong University, Tianjin 300387, China
3. CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100089, China
Recommended citation
GB/T 7714
Yang Liu, Xin Chen, Yuan Song, et al. Discriminative feature learning based on multi-view attention network with diffusion joint loss for speech emotion recognition[J]. Engineering Applications of Artificial Intelligence, 2024, 137: 15.
APA Yang Liu., Xin Chen., Yuan Song., Yarong Li., Shengbei Wang., ... & Zhen Zhao. (2024). Discriminative feature learning based on multi-view attention network with diffusion joint loss for speech emotion recognition. Engineering Applications of Artificial Intelligence, 137, 15.
MLA Yang Liu, et al. "Discriminative feature learning based on multi-view attention network with diffusion joint loss for speech emotion recognition". Engineering Applications of Artificial Intelligence 137 (2024): 15.
Files in this item
File name/size | Document type | Version | Access | License
Discriminative featu (4160KB) | Journal article | Published version | Restricted access | CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.