Optimization of a Life Satisfaction Prediction Model Based on Text Data Augmentation

Submit Time: 2022-01-04
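Authors: 陈佳婧 1,2; 胡丁鼎 1,2; 宋蕊 1,2; 谭诗奇 1,2; 李雨晴 1,2; 张胜楠 1,2; 朱廷劭 1,2; 赵楠 1,2
Institutes: 1. Institute of Psychology, Chinese Academy of Sciences, Beijing 100101; 2. Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049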

Abstract

[Objective] With the growth of online big data and machine learning methods, a growing number of studies combine text analysis with machine learning algorithms to predict satisfaction. When building life satisfaction prediction models, it is often difficult to obtain large amounts of valid, labeled data. This study proposes text data augmentation as a way to address this problem and optimize the life satisfaction prediction model. [Method] After the DLUT-Emotionontology lexicon (大连理工词典) was adapted, 357 descriptions of current life status served as the original texts, with self-rated scores on a life satisfaction scale as labels. The texts were augmented using EDA (easy data augmentation) and back-translation, and prediction models were built with traditional machine learning algorithms. [Results] (1) The prediction performance of all models improved substantially after the lexicon was adapted; (2) after data augmentation, back-translation and EDA improved only the linear regression model; (3) the ridge regression model trained on the original data achieved the highest Pearson correlation between predicted and actual values (r = 0.4131). [Conclusion] More accurate feature extraction can improve current life satisfaction prediction models, but text data augmentation based on back-translation and EDA may not be well suited to life satisfaction prediction models that use word frequencies as features.
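To make the augmentation and evaluation steps named in the abstract concrete, here is a minimal sketch of two EDA operations (random swap and random deletion) together with the Pearson correlation used as the evaluation metric. It is an illustration under stated assumptions, not the authors' implementation: the function names, the parameters `n_aug` and `p_delete`, and the choice to let each augmented text inherit the label of its source text are all hypothetical. Texts are assumed to be already segmented into word tokens, and the other two EDA operations (synonym replacement and random insertion) as well as back-translation are omitted because they need an external synonym resource or translation service.

```python
import math
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs of positions exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(samples, n_aug=2, p_delete=0.1, seed=0):
    """Expand (tokens, score) pairs with augmented copies.

    Assumption: every augmented text inherits the score of its source text.
    The originals are kept at the front of the returned list.
    """
    rng = random.Random(seed)
    out = list(samples)
    for tokens, score in samples:
        for k in range(n_aug):
            if k % 2 == 0:
                new_tokens = random_swap(tokens, 1, rng)
            else:
                new_tokens = random_deletion(tokens, p_delete, rng)
            out.append((new_tokens, score))
    return out


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In this sketch, `augment(samples, n_aug=2)` triples the number of labeled samples, and `pearson_r(predicted, actual)` would be applied to a model's predictions on held-out original texts. One property worth noting is that random swap only reorders tokens, so it leaves word counts unchanged and gives a pure word-frequency model no new feature values.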
Hits: 5220; Downloads: 430
From: 朱廷劭
DOI:10.12074/202201.00007
Recommended citation: 陈佳婧, 胡丁鼎, 宋蕊, 谭诗奇, 李雨晴, 张胜楠, 朱廷劭, 赵楠. (2022). Optimization of a prediction model of life satisfaction based on text data augmentation. [ChinaXiv:202201.00007]
Version History
[V1] 2022-01-04 11:07:49 chinaXiv:202201.00007V1