Zero-shot voice conversion based on feature disentanglement
Na Guo (1); Jianguo Wei (1); Yongwei Li (2); Wenhuan Lu (1); Jianhua Tao (3)
First author: Na Guo
Corresponding author: Li, Yongwei ([email protected])
Corresponding author email: [email protected] (Y. Li)
Abstract

Voice conversion (VC) aims to convert the voice of a source speaker to that of a target speaker without modifying the linguistic content. Zero-shot voice conversion has attracted significant attention because it can convert speech for speakers who did not appear during training. Despite the significant progress made by previous zero-shot VC methods, there is still room for improvement in separating speaker information from content information. In this paper, we propose a zero-shot VC method based on feature disentanglement. The proposed model uses a speaker encoder to extract speaker embeddings, introduces mixed speaker layer normalization to eliminate residual speaker information in the content encoding, and employs adaptive attention weight normalization for conversion. Furthermore, dynamic convolution is introduced to improve speech content modeling while requiring only a small number of parameters. Experiments demonstrate that the proposed model outperforms several state-of-the-art models, achieving both high similarity to the target speaker and high intelligibility. In addition, the decoding speed of our model is much higher than that of existing state-of-the-art models.

Keywords: Zero-shot voice conversion; Mixed speaker layer normalization; Adaptive attention weight normalization; Dynamic convolution
Year: 2024
Language: English
DOI: 10.1016/j.specom.2024.103143
Journal: Speech Communication
ISSN: 0167-6393
Volume: 165; Pages: 10
Journal article type: Review
Indexed in: EI
Funding projects: National Key R&D Program of China [2023YFB2603902]; Tianjin Science and Technology Program [21JCZXJC00190]; National Natural Science Foundation of China [62201571]
Publisher: ELSEVIER
WOS keywords: SPARSE REPRESENTATION; ADAPTATION; SPEAKER
WOS research areas: Acoustics; Computer Science
WOS categories: Acoustics; Computer Science, Interdisciplinary Applications
WOS accession number: WOS:001340314300001
Funding organizations: National Key R&D Program of China; Tianjin Science and Technology Program; National Natural Science Foundation of China
Document type: Journal article
Item identifier: http://ir.psych.ac.cn/handle/311026/48789
Collection: CAS Key Laboratory of Behavioral Science
Author affiliations:
1. College of Intelligence and Computing, Tianjin University, Tianjin, China
2. CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
3. Department of Automation, Tsinghua University, Beijing, China
Recommended citation
GB/T 7714: Na Guo, Jianguo Wei, Yongwei Li, et al. Zero-shot voice conversion based on feature disentanglement[J]. Speech Communication, 2024, 165: 10.
APA: Na Guo, Jianguo Wei, Yongwei Li, Wenhuan Lu, & Jianhua Tao. (2024). Zero-shot voice conversion based on feature disentanglement. Speech Communication, 165, 10.
MLA: Na Guo, et al. "Zero-shot voice conversion based on feature disentanglement". Speech Communication 165 (2024): 10.
Files in this item
File name/size: Zero-shot voice conv (1881KB); Document type: Journal article; Version type: Published version; Access type: Restricted access; License: CC BY-NC-SA