Key Technologies of Detecting Depression with Voice Features

Title	语音识别抑郁症的关键技术研究
Alternative title	Key Technologies of Detecting Depression with Voice Features
Author	Pan Wei (潘玮)
Date	2020-05
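Abstract	Depression is a mental disorder with depressive symptoms at its core, accompanied by many other symptoms. Diagnosis is currently largely subjective, so objective assessment tools are particularly important for faster and more accurate treatment of depression. Speech data are easy to obtain in clinical settings, but several questions about the relationship between speech and depression remain open: whether speech features significantly predict depression, and how much speech contributes to the prediction once confounding variables (demographic information) are included; whether speech features can distinguish depressed from non-depressed individuals; whether the association is stable across contexts and emotions; and whether speech features retain high discriminative power in complex clinical diagnostic settings.
Study 1 used a binary logistic regression model to examine whether the association between speech features and depression is significant. Demographic information was included, and its contribution to predicting depression status was taken as the baseline. The study collected speech data from 584 patients with depression and 548 healthy people. Four speech features made the main contributions to predicting depression: PC1 (OR = 0.58, P < 0.0001), PC6 (OR = 1.57, P < 0.001), PC17 (OR = 1.53, P < 0.0001), and PC24 (OR = 1.45, P < 0.05). The contribution of speech features alone to depression reached 35.65% (Nagelkerke's R²).
Study 2 built three classification models: a model based on speech alone, a model based on demographic variables alone, and a model based on both speech and demographic variables. Additional datasets were used as test sets to show how well the models generalize. The study comprised three speech datasets: dataset 1, the same as in Study 1, was used to build the classification models; dataset 2 contained 500 patients with depression and 404 healthy people; dataset 3 contained 45 patients with depression and 58 healthy people. Compared with the model built on demographic variables, the models that included speech (speech alone, and speech plus demographic variables) consistently achieved higher classification accuracy (F-measure). Testing on the other datasets gave consistent results. The speech-only model reached a classification accuracy of 80% on each test set.
Study 3 collected speech data from 45 patients with depression and 58 healthy people. It used a 3 (emotional state: positive, neutral, negative) × 3 (task type: verbal question answering, text reading, picture description) experimental design and a machine learning classification algorithm, logistic regression (LR), to build the depression recognition model. The AUC for speech exceeded 0.6 in every context and emotional state (65.7% to 80.9%), and the accuracy of speech-based depression recognition reached 82.9%.
Study 4 defined three kinds of classification task: (1) healthy versus non-healthy groups; (2) the healthy group versus each mental disorder; (3) pairwise classification between mental disorders. After matching, there were 32 patients with bipolar disorder, 106 patients with depression, 114 healthy participants, and 20 patients with schizophrenia. MFCC features were extracted from the speech, and i-vectors were extracted. Evaluation of the logistic regression models showed an AUC of 0.5 (F-score = 0.44) for classifying depression versus bipolar disorder. For the other classification tasks, AUC values ranged from 0.75 to 0.92 (F-score: 0.73 to 0.91). In the comparison of model performance, a difference test found that the depression versus bipolar disorder model performed significantly worse (AUC) than the bipolar disorder versus schizophrenia model (corrected P < 0.05). Differences between the other classification models were not significant. Speech features therefore did not classify depression versus bipolar disorder satisfactorily.
This work systematically examined the relationship between speech features and depression and shows the following: (1) speech features significantly predict depression, and speech makes a considerable contribution; (2) speech features can predict depression in practice, and the models have some generalization ability; (3) the predictive effect of speech is stable across contexts and emotions; (4) speech has fairly high discriminative power in the complex setting of clinical diagnosis of mental disorders. These key-technology studies lay a solid foundation for further exploring speech as a clinical diagnostic tool for depression.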
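The two statistics quoted for Study 1, the odds ratio (OR) and Nagelkerke's R², can be illustrated with a minimal sketch. It is not the thesis's code: the function names are hypothetical, and only the group sizes (584 and 548) and the OR of 0.58 for PC1 come from the abstract. The sketch assumes the standard definitions, where the OR is the exponential of a logistic regression coefficient and Nagelkerke's R² is the Cox-Snell R² rescaled to a maximum of 1.

```python
import math


def odds_ratio(beta):
    """Odds ratio for a one-unit increase in a predictor with coefficient beta."""
    return math.exp(beta)


def null_log_likelihood(n_pos, n_neg):
    """Log-likelihood of the intercept-only model for a binary outcome."""
    n = n_pos + n_neg
    return n_pos * math.log(n_pos / n) + n_neg * math.log(n_neg / n)


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's R^2: Cox-Snell R^2 divided by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


# Group sizes reported for Study 1: 584 patients, 548 healthy people.
ll0 = null_log_likelihood(584, 548)

# A coefficient of ln(0.58) corresponds to the reported OR of 0.58 for PC1;
# the coefficient itself is back-calculated here, not taken from the thesis.
beta_pc1 = math.log(0.58)
print(round(odds_ratio(beta_pc1), 2))  # 0.58
```

An OR below 1, as for PC1, means higher values of that component go with lower odds of depression; ORs above 1 (PC6, PC17, PC24) mean the opposite. `nagelkerke_r2` needs the fitted model's log-likelihood, which the abstract does not report, so the sketch does not attempt to reproduce the 35.65% figure.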
Alternative abstract	Depression is characterized by depressed mood and other complicated symptoms, which makes it particularly important to find a more objective assessment tool to promote faster and more accurate treatment. Speech data is easily accessible clinically, but research on speech and depression still leaves several questions open: whether speech features can significantly predict depression, and to what extent speech contributes to the prediction compared with confounding factors (demographic information); whether speech features can successfully classify depression in practice; whether speech features can predict depression across different contexts and emotions; and whether speech features can maintain good discriminative power in complex clinical diagnostic situations. In Study 1, demographic information was included, and its contribution to predicting depression was taken as the baseline. The study collected speech data from 584 patients with depression and 548 healthy people. Results showed that four speech features contributed most to predicting depression: PC1 (OR = 0.58, P < 0.0001), PC6 (OR = 1.57, P < 0.001), PC17 (OR = 1.53, P < 0.0001), and PC24 (OR = 1.45, P < 0.05). Speech features alone contributed 35.65% (Nagelkerke's R²) to predicting depression. Study 2 established three classification models: speech-only models, demographic-only models, and models based on both speech and demographic variables. The study contains three datasets. Dataset 1 is the same as in Study 1 and is used for classification model construction and testing. Dataset 2 contains 500 patients with depression and 404 healthy people. Dataset 3 contains 45 patients with depression and 58 healthy people. Compared with models based on demographic variables, models including speech all reached higher prediction accuracy (F-measure). The results were consistent when the models were tested on the other datasets. The speech-only model reached 80% on each of the test sets.
Study 3 collected speech data from 45 patients with depression and 58 healthy people. The research adopted a 3 (emotional state: positive, neutral, negative) × 3 (task type: question and answer, text reading, picture description) experimental design, with logistic regression as the classification algorithm. The AUC for speech was above 0.6 in all situations and emotional states (65.7% to 80.9%), and the accuracy of speech-based depression recognition reached 82.9%.
Study 4 set up three types of classification task: (1) classifying healthy and non-healthy groups; (2) classifying the healthy group against each mental illness; (3) pairwise classification among mental illnesses. After matching, there were 32 patients with bipolar disorder, 106 patients with depression, 114 healthy participants, and 20 patients with schizophrenia. MFCC features and i-vectors were extracted from the speech. After logistic regression modeling and evaluation of model performance, the AUC of the model classifying depression versus bipolar disorder was 0.5 (F-score = 0.44). For the other classification tasks, the AUC was between 0.75 and 0.92 (F-score range: 0.73 to 0.91). In the model comparison, a difference test found that the performance (AUC) of the depression versus bipolar disorder model was significantly worse than that of the bipolar disorder versus schizophrenia model (adjusted P < 0.05). The differences between the other classification models were not significant. Speech features are therefore not ideal for classifying depression versus bipolar disorder.
This study systematically explores the relationship between speech features and depression and illustrates the following: (1) speech features can significantly predict depression, and speech contributes substantially to the prediction; (2) speech features can predict depression in practice, and the models show some generalization ability; (3) the predictive role of speech is stable across contexts and emotions; (4) speech can have fairly high discriminative ability in the complex context of clinical diagnosis of mental illness. These studies provide a solid basis for further exploring speech as a diagnostic tool for clinical depression.
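The F-measure and AUC values quoted in the abstract can be illustrated with a minimal sketch of how the two metrics are computed from labels, predicted classes, and scores. The function names and the toy data are invented for illustration; the thesis's actual evaluation code and data are not shown in this record. The sketch assumes the F-measure is the F1 score for the positive class and computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.

```python
def f_measure(y_true, y_pred):
    """F1 score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (1 = depressed, 0 = healthy).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
preds = [1 if s >= 0.5 else 0 for s in scores]

print(auc(labels, scores))  # 0.75
print(f_measure(labels, preds))
```

Under this definition an AUC of 0.5 means the scores rank positive and negative cases no better than chance, which is how the 0.5 reported for depression versus bipolar disorder should be read.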
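Keywords	depression; speech features; assisted diagnosis; machine learning; classification and recognition
Degree type	Doctoral
Language	Chinese
Degree name	Doctor of Science
Major	Applied Psychology
Degree-granting institution	Institute of Psychology, Chinese Academy of Sciences
Place of conferral	Institute of Psychology, Chinese Academy of Sciences
Document type	Thesis
Item identifier	http://ir.psych.ac.cn/handle/311026/31761
Collection	Laboratory of Social and Engineering Psychology (社会与工程心理学研究室)
Recommended citation (GB/T 7714)	Pan Wei. Key Technologies of Detecting Depression with Voice Features [D]. Institute of Psychology, Chinese Academy of Sciences. Institute of Psychology, Chinese Academy of Sciences, 2020.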

Files in this item
File name/size	Document type	Version type	Access	License
潘玮-博士学位论文.pdf (2623KB)	Thesis		Restricted access	CC BY-NC-SA