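Research on Key Technologies for Dictionary-Based Psychological Semantic Analysis of Classical Chinese Texts (基于词典的中国古文心理语义分析关键技术研究)
Alternative title: Critical Technology in Ancient Chinese Psychological Semantic Analysis Based on LIWC
范妙榕 (Fan Miaorong)
Advisor: 朱廷劭 (Zhu Tingshao)
2021-01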
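Abstract: China's long history holds enormous research value. History is made by people, so studying history is inseparable from studying the people in it: the psychology and behaviour of individuals and groups, and the social and cultural factors behind them. Historical research has mostly been qualitative, with little quantitative analysis. Studying ancient texts also requires a solid reading ability in classical Chinese. With the development of computational linguistics, many mature text analysis methods and tools exist for processing modern Chinese by computer. For classical Chinese texts, however, no reasonably complete tool for analysing their semantics has yet been developed. Linguistic Inquiry and Word Count (LIWC), a text analysis tool based on word counting, is built on a keyword dictionary and focuses on the psychological semantics of text. It can quantify the psychological meaning of sentences and is widely used in language-related psychological research such as emotional writing.
This thesis first built an archive of ancient texts, covering historical books and self-expression texts from the Spring and Autumn period to the Qing dynasty, together with an ancient text database of 1.7 billion characters, to serve as the data for text analysis. In the pre-experiment, we explored how to apply keyword frequency counting to classical Chinese text analysis, and the results agreed with what is known of the history. In Study 1, based on the existing Simplified Chinese LIWC dictionary (SC-LIWC), we constructed a classical Chinese version of the LIWC dictionary (CC-LIWC) for psychological semantic analysis of classical Chinese, applied it in an actual analysis, and examined its performance through hypothesis testing, showing that CC-LIWC can effectively distinguish differences in psychological meaning between texts. In Study 2, we validated the dictionary through manual annotation and correlation tests, and optimised it.
The pre-experiment examined the application of keyword frequency counting to classical Chinese text analysis, aiming to demonstrate that the method works on such texts and to support the subsequent studies. It chose two representative Confucian terms from traditional Chinese culture, 孝 (xiào, filial piety) and 礼 (lǐ, rite), as keywords, tracked how their frequencies change across the historical books of successive dynasties, and explained the reasons for the changes. In Confucian thought 孝 is part of 礼, so the two are related. We therefore ran a correlation test on their frequencies, which showed an at least moderate correlation and provided a basis for the subsequent studies.
Study 1 comprises 2 experiments. Experiment 1 retrieved all entries and their definitions from an online Chinese dictionary and matched SC-LIWC words to classical Chinese words to obtain candidate words for CC-LIWC. The candidates were then manually annotated to remove words whose meanings did not match, and checked a second time, yielding the final CC-LIWC dictionary of 79 word categories and 49,136 classical Chinese entries. Experiment 2 used CC-LIWC to analyse the self-expression texts of Confucius and Han Feizi and tested the differences between them. The results reflect differences between Confucius and Han Feizi in affective processes, cognitive processes, drives, and other aspects, and in turn reveal the differences between the values of Confucianism and Legalism that they represent.
Study 2 comprises 2 experiments. Experiment 3 aimed to verify the validity of CC-LIWC. Seven categories related to psychological characteristics were selected from the dictionary, 35 classical Chinese texts were chosen, each text was scored manually on the 7 dimensions, and the manual scores were tested for correlation with the LIWC scores. Experiment 4 addressed the problems with the dictionary found in the earlier analysis, optimised the dictionary, and proposed directions for further improvement.
To address the problem of semantic analysis of classical Chinese texts, this thesis constructed the CC-LIWC dictionary, used it to analyse classical Chinese texts, and finally validated and optimised it. When LIWC and human raters scored the same classical Chinese materials, the correlation between the optimised CC-LIWC scores and the human scores was within an acceptable range, and some dimensions reached at least a moderate correlation (p<0.01). CC-LIWC therefore has a certain degree of validity for analysing the psychological semantics of classical Chinese texts.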
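The keyword frequency counting and correlation test described for the pre-experiment can be sketched roughly as follows. This is an illustration only, not the thesis's implementation: the function names `keyword_rate` and `pearson_r`, the per-1,000-character normalisation, and the use of the Pearson coefficient are assumptions, since the abstract does not state which normalisation or correlation coefficient was used.

```python
import math


def keyword_rate(text: str, keyword: str, per: int = 1000) -> float:
    """Occurrences of `keyword` per `per` characters of `text`.

    Assumption: frequency is normalised by text length so that books of
    different sizes are comparable. str.count counts non-overlapping
    occurrences, which is exact for single-character keywords.
    """
    if not text:
        return 0.0
    return text.count(keyword) * per / len(text)


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def keyword_correlation(texts, kw_a: str = "孝", kw_b: str = "礼") -> float:
    """Correlate the rates of two keywords across a list of texts."""
    rates_a = [keyword_rate(t, kw_a) for t in texts]
    rates_b = [keyword_rate(t, kw_b) for t in texts]
    return pearson_r(rates_a, rates_b)
```

In use, `texts` would hold one string per historical book, and `keyword_correlation(texts)` would return a single coefficient for 孝 and 礼 across the books.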
Alternative abstract: China has a long and glorious history of great value yet to be discovered. History is created by people, so research into human beings is indispensable to research into history. When we study history, we should pay close attention to the psychology and behaviour of individuals and groups and to the social and cultural factors behind them. In the past, most historical research took a qualitative approach and seldom used quantitative analysis. Moreover, research on ancient Chinese history requires a strong command of classical Chinese, which is difficult for the general public. With the development of computational linguistics, there are many mature methods and tools for analysing modern Chinese text by computer, but few such tools for analysing the meaning of ancient Chinese. LIWC is a dictionary-based word counting tool focused on the psychological meaning of text. It is suited to quantitative analysis of sentences and articles and is widely used in emotional writing and other language-related psychological research.
First, we constructed the Ancient Chinese Archive, covering historical books and self-expression articles from the Spring and Autumn period to the Qing dynasty as well as ancient books totalling 1.7 billion characters, as the database for text analysis. In the pretest study, we applied the word frequency method to classical Chinese texts and found that the results are consistent with our knowledge of ancient Chinese history. Based on SC-LIWC, Study 1 constructed CC-LIWC and used it for analysis. The outcome was verified by hypothesis testing, which demonstrated that CC-LIWC can detect psychological differences between expressions. In Study 2, we validated CC-LIWC by comparing human scoring with LIWC scoring and tried to optimise the dictionary.
The pretest study used the word frequency method to analyse classical Chinese texts and examined the outcome, providing support for the subsequent studies. It chose Xiào (孝, filial piety) and Lǐ (禮/礼, proper rite) as two representative words of Confucianism in traditional Chinese culture. We analysed how the frequency of these two words changes in Chinese historical books across dynasties and explained the reasons behind the trends. Because filial piety is part of proper rite in Confucianism, we calculated the correlation coefficient between the frequencies of Xiào and Lǐ and found a moderate correlation between them.
Study 1 includes 2 experiments. Experiment 1 obtained all the word entries and their corresponding explanations from an online Chinese dictionary and mapped SC-LIWC words to classical Chinese words, yielding candidate words for the CC-LIWC dictionary. We then checked the candidate words manually and removed those that did not actually match. Finally, we double-checked the result and generated the CC-LIWC dictionary, which includes 79 categories and 49136 classical Chinese entries. Experiment 2 used CC-LIWC to analyse the self-expression texts of Confucius and Han Feizi separately and examined the differences between them. The results showed significant differences between Confucius and Han Feizi in emotional process, cognitive process, drives, etc., which led to the discovery of core value differences between Confucianism and Legalism.
Study 2 includes 2 experiments. Experiment 3 aimed to validate CC-LIWC. We selected 7 psychological dimensions of CC-LIWC, invited 8 coders to grade 35 classical Chinese texts on the 7 dimensions, and finally calculated the correlation coefficient between the human scores and the LIWC scores. Experiment 4 focused on solving the problems that were found and put forward a solution to optimise the CC-LIWC dictionary.
In order to analyse the psychological meaning of classical Chinese texts, this paper constructed the CC-LIWC dictionary, used it to analyse ancient texts, evaluated the validity of the dictionary, and finally optimised it to reach better validity. The results show that the correlation between the CC-LIWC scores and the human scores is within the acceptable range, and some dimensions even reach a medium to high correlation (p<0.01). We conclude that CC-LIWC has considerable validity in analysing the psychological meaning of classical Chinese texts.
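A dictionary-based category score of the kind LIWC produces can be sketched as below. This is a hedged illustration, not the CC-LIWC implementation: the abstract does not say how classical Chinese text is segmented, so forward maximum matching is an assumption here, and the function name `category_scores`, the toy lexicon entries, and the category labels in the usage note are hypothetical.

```python
from collections import Counter


def category_scores(text: str, lexicon: dict) -> dict:
    """Percentage of tokens falling in each category.

    `lexicon` maps a dictionary entry to a list of category labels.
    Assumption: the text is segmented by forward maximum matching, so the
    longest entry starting at each position wins; any character not
    covered by an entry (punctuation included) counts as one token.
    """
    max_len = max(map(len, lexicon), default=1)
    counts = Counter()
    total = 0
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in lexicon:
                counts.update(lexicon[piece])
                break
        else:
            size = 1  # no entry matched here: skip one character
        total += 1
        i += size
    return {cat: 100.0 * n / total for cat, n in counts.items()}
```

For example, with the hypothetical lexicon `{"喜": ["posemo"], "悦": ["posemo"], "不悦": ["negemo"]}`, the text `"王不悦喜"` is split into three tokens (王, 不悦, 喜). Matching the longer entry 不悦 first keeps 悦 from being counted as positive emotion, which is the reason for preferring maximum matching over single-character lookup in this sketch.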
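Keywords: cultural psychology; word frequency statistics; ancient Chinese history; semantic analysis
Degree type: Master's
Language: Chinese
Degree name: Master of Science (master's degree by equivalent academic qualification)
Degree discipline: Applied Psychology
Degree-granting institution: Institute of Psychology, Chinese Academy of Sciences (中国科学院心理研究所)
Place of degree conferral: Institute of Psychology, Chinese Academy of Sciences (中国科学院心理研究所)
Document type: Thesis
Item identifier: http://ir.psych.ac.cn/handle/311026/41677
Collection: 社会与工程心理学研究室 (Social and Engineering Psychology Laboratory)
Recommended citation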
GB/T 7714
范妙榕. 基于词典的中国古文心理语义分析关键技术研究[D]. 中国科学院心理研究所. 中国科学院心理研究所,2021.
Files in this item
File name/size: 基于词典的中国古文心理语义分析关键技术研 (2116KB)
Document type: Thesis
Access type: Restricted access
License: CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.