ChinaXiv.org: Chinese Academy of Sciences Scientific Paper Preprint Platform

9 results in total.

Your conditions: Yu Chuanming (余传明)

1. ChinaXiv:202308.00644

User Profiling Based on the Behaviour and Content Combined Model

Subjects: Library Science, Information Science >> Library Science
Submitted: 2023-08-27
Cooperative journal: 《图书情报工作》 (Library and Information Service)

Yu Chuanming, Tian Xin, Guo Yajing, An Lu

Abstract: [Purpose/significance] To identify and filter out online reviews from irrational investors, raise the professionalism and quality of comments, and promote rational investment, this article carries out a user profiling study, taking as its example the task of identifying whether users of the Guba website are noise investors. [Method/process] A deep user representation learning method was used to learn from text information such as users' posts. A behavior and content combined model (BCCM) was then proposed that also incorporates behavior characteristics such as number of fans, influence, bar age, and number of posts, and an empirical, comparative study was carried out on an annotated data set. [Result/conclusion] Experimental results showed that the BCCM achieved an F1 score of 79.47%, outperforming the Decision Tree (69.90%), SVM (75.61%), KNN (73.21%), and ANN (74.83%) models. In the specific user profiling task of identifying noise traders, using deep user representation learning to obtain text content features markedly improves the evaluation metrics of user profiling.

Hits: 411, Downloads: 158

2. ChinaXiv:202308.00284

A Cross-domain Text Sentiment Analysis Based on Deep Recurrent Neural Network

Subjects: Library Science, Information Science >> Library Science
Submitted: 2023-08-26
Cooperative journal: 《图书情报工作》 (Library and Information Service)

Yu Chuanming

Abstract: [Purpose/significance] To address the difficulty of training a classification model in a target domain that lacks data, this study first trains a model on a source domain with abundant labeled data and then projects source-domain and target-domain documents into the same feature space. [Method/process] Chinese-language reviews from Amazon in three product categories (books, DVD, and music) are taken as the experimental data, with cross-domain text sentiment analysis as the research task. A novel model, the Cross Domain Deep Recurrent Neural Network (CD-DRNN), is proposed to achieve knowledge transfer among domains. CD-DRNN achieves an average accuracy of 81.70%, exceeding Stacked Long Short Term Memory (79.90%), Bidirectional Long Short Term Memory (80.50%), Convolution Neural Network with Long Short Term Memory (74.70%), and Merged Convolution Neural Network with Long Short Term Memory (80.90%). [Result/conclusion] Knowledge transfer between the source and target domains can effectively address the difficulty of achieving good classification performance on small data sets. The proposed method can be used to select features effectively from unlabeled data, thereby greatly reducing the data annotation workload in the target domain.

Hits: 448, Downloads: 168

3. ChinaXiv:202307.00492

Research of Abstractive Chinese Text Summarization Based on Seq2seq Model

Subjects: Library Science, Information Science >> Library Science
Submitted: 2023-07-26
Cooperative journal: 《图书情报工作》 (Library and Information Service)

Yu Chuanming, Zhu Xingyu, Gong Yutian, An Lu

Abstract: [Purpose/significance] To handle out-of-vocabulary (OOV) words in text summarization while avoiding repetition within summaries, this article studies both the OOV problem and the self-repetition problem. [Method/process] Based on the sequence-to-sequence model, a pointer generator module and a coverage processing module are added. The pointer generator module attempts to copy OOV words into the abstractive summary to solve the OOV problem. The coverage processing module tries to keep the attention mechanism from attending to the same position repeatedly, in order to solve the repetition problem. The model is applied to the Chinese summarization dataset LCSTS in experiments that test its effectiveness. [Result/conclusion] Experimental results show that the ROUGE scores of the generated summaries are much higher than those of the seq2seq model and the extractive model, indicating that in abstractive Chinese text summarization the pointer generator module and the coverage mechanism module can effectively solve the problems of OOV words and summary repetition, thereby significantly improving summary quality.
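
To illustrate the coverage idea mentioned in the abstract, here is a minimal sketch of one widely used formulation: a coverage vector accumulates the attention given to each source position so far, and each decoding step is penalized by the overlap between its attention and that accumulated coverage. The abstract does not give the paper's exact formula, so this formulation, the function name `coverage_losses`, and the toy attention values are all assumptions made for illustration.

```python
def coverage_losses(attention_steps):
    """Per-step coverage penalty for a sequence of attention distributions.

    Assumed formulation (not taken from the abstract):
      coverage_t = sum of attention vectors from all earlier steps
      loss_t     = sum_i min(attention_t[i], coverage_t[i])
    """
    if not attention_steps:
        return []
    # Nothing has been attended to before the first decoding step.
    coverage = [0.0] * len(attention_steps[0])
    losses = []
    for attn in attention_steps:
        # Overlap between the current attention and what is already covered.
        losses.append(sum(min(a, c) for a, c in zip(attn, coverage)))
        # Accumulate the current attention into the coverage vector.
        coverage = [c + a for a, c in zip(attn, coverage)]
    return losses


# Hypothetical toy inputs: two decoder steps over three source positions.
repeated = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]  # attends to position 0 twice
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # moves on to a new position
print(coverage_losses(repeated))
print(coverage_losses(spread))
```

In this sketch the second step of `repeated` is penalized because it revisits a position that is already fully covered, while `spread` incurs no penalty at either step.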

Hits: 292, Downloads: 138

4. ChinaXiv:202307.00585

Research on Scale Adaptation of Text Sentiment Analysis Algorithm in Big Data Environment: Using Twitter as Data Source

Subjects: Library Science, Information Science >> Library Science
Submitted: 2023-07-26
Cooperative journal: 《图书情报工作》 (Library and Information Service)

Yu Chuanming, Yuan Sai, Wang Feng, An Lu

Abstract: [Purpose/significance] This paper studies the scale adaptation problem for textual sentiment analysis in a big data environment. It offers a reference for balancing efficiency against cost when researchers in information science carry out data analysis in such an environment. [Method/process] We use the Sentiment140 dataset from Stanford University. Building on an analysis of traditional sentiment analysis algorithms, we propose five textual sentiment analysis algorithms for big data, test how well each adapts to different environments and data sizes, and compare them empirically in terms of accuracy, scalability, and efficiency. [Result/conclusion] The experimental results show that the cluster built in this paper has good operational efficiency, correctness, and scalability. Spark clusters are more efficient at processing large-scale text sentiment analysis data, and this advantage becomes more pronounced as the data size grows. In terms of resource utilization, the overall operating efficiency of the cluster changes significantly as the number of nodes and cores increases. We find that a configuration of five slave nodes, each with 4 cores and 4 GB of memory, completes the classification task efficiently while saving resource costs.

Hits: 273, Downloads: 154

5. ChinaXiv:202304.00109

Research on the Model of Adversarial Entity Relation Extraction in Cross-Lingual Context

Subjects: Library Science, Information Science >> Information Science
Submitted: 2023-04-01
Cooperative journal: 《图书情报工作》 (Library and Information Service)

Yu Chuanming, Wang Manyi, An Lu

Abstract: [Purpose/significance] From the perspective of entity relation extraction, this paper extends the knowledge acquisition task from a single-language context to a cross-lingual context and improves relation extraction for low-resource languages. [Method/process] This paper proposes a Cross-Lingual Adversarial Relation Extraction (CLARE) framework, which decomposes cross-lingual relation extraction into parallel corpus acquisition and adversarial adaptation relation extraction. Through dictionary expansion or self-learning methods, the source-language relation extraction data set is converted into a target-language data set. On this basis, the feature representation of the source language is transferred to the target language using adversarial feature adaptation, and the resulting trained target-language relation extraction network is then used to classify target-language data. [Result/conclusion] The method is applied to English-Chinese and Chinese-English cross-lingual relation extraction tasks based on the ACE2005 multilingual dataset. The Macro-F1 values of the best models on the two tasks are 0.8801 and 0.8422, respectively, indicating that the proposed CLARE framework can significantly improve entity relation extraction for low-resource languages. The results are significant for improving relation extraction models in the cross-lingual context and for promoting the application of entity relation extraction research in information science.

Hits: 172, Downloads: 89

6. ChinaXiv:201712.01382

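Research on Scientific Collaboration Recommendation in the Financial Field Based on Multi-Feature Fusion

Subjects: Library Science, Information Science >> Information Science
Submitted: 2017-12-05
Cooperative journal: 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery)

Yu Chuanming (余传明), Gong Yutian (龚雨田), Zhao Xiaoli (赵晓莉), An Lu (安璐)

Abstract: [Objective] Scientific collaboration relationships form an important kind of social network. To promote scientific collaboration and raise research productivity, this paper studies recommendation models for scientific collaboration in the financial field. [Methods] Scientific collaboration networks are built for the financial field at three levels: individual, institution, and region. A new collaboration recommendation model is proposed that fuses neighbor-node-based and path-based network features, and it is tested empirically at all three levels. [Results] Collaboration networks were constructed from an analysis of 68,905 articles in the financial field published from 2000 to 2014. At the individual, institution, and region levels, the AUC values of the feature-fusion link prediction method are 84.25%, 87.34%, and 91.84%, respectively, all higher than those of the neighbor-node-based and path-based algorithms. [Limitations] The training and test sets were split only by time; other splitting schemes remain to be used to optimize the experimental results. [Conclusions] This work helps individuals, institutions, and regions in financial research find collaborators, and offers new methods and ideas to scholars studying research networks and scientific collaboration recommendation.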

Hits: 2658, Downloads: 1464

7. ChinaXiv:201712.01391

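Cross-Domain Sentiment Analysis Based on Deep Representation Learning

Subjects: Library Science, Information Science >> Information Science
Submitted: 2017-12-05
Cooperative journal: 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery)

Yu Chuanming (余传明), Feng Bolin (冯博琳), An Lu (安璐)

Abstract: [Objective] By learning in a source domain rich in labeled resources and projecting target-domain documents into the same feature space as the source domain, this paper addresses the difficulty of obtaining a good classification model in a target domain with little data. [Methods] Chinese, English, and Japanese reviews from the Amazon online shopping site in the books, DVD, and music categories are selected as experimental data. Building on convolutional neural networks and structural correspondence learning, a Cross-Domain Deep Representation Model (CDDRM) is proposed to achieve knowledge transfer across domains, and it is applied to the cross-domain sentiment analysis task. [Results] Experimental results show that the best F-score of CDDRM in the cross-domain setting reaches 0.7368, demonstrating the effectiveness of the model. [Limitations] The F-score of CDDRM for cross-domain sentiment classification of long texts still needs improvement. [Conclusions] Knowledge transfer can address the difficulty supervised learning has in achieving good classification results on small data sets; unlike the basic assumption of traditional supervised learning, it does not require the training and test sets to follow the same or similar data distributions.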

Hits: 2355, Downloads: 1295

8. ChinaXiv:201712.01600

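Research on Scientific Collaboration Recommendation in the Financial Field Based on Multi-Feature Fusion

Subjects: Library Science, Information Science >> Information Science
Submitted: 2017-11-30
Cooperative journal: 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery)

Yu Chuanming (余传明), Gong Yutian (龚雨田), Zhao Xiaoli (赵晓莉), An Lu (安璐)

Abstract: [Objective] Scientific collaboration relationships form an important kind of social network. To promote scientific collaboration and raise research productivity, this paper studies recommendation models for scientific collaboration in the financial field. [Methods] Scientific collaboration networks are built for the financial field at three levels: individual, institution, and region. A new collaboration recommendation model is proposed that fuses neighbor-node-based and path-based network features, and it is tested empirically at all three levels. [Results] Collaboration networks were constructed from an analysis of 68,905 articles in the financial field published from 2000 to 2014. At the individual, institution, and region levels, the AUC values of the feature-fusion link prediction method are 84.25%, 87.34%, and 91.84%, respectively, all higher than those of the neighbor-node-based and path-based algorithms. [Limitations] The training and test sets were split only by time; other splitting schemes remain to be used to optimize the experimental results. [Conclusions] This work helps individuals, institutions, and regions in financial research find collaborators, and offers new methods and ideas to scholars studying research networks and scientific collaboration recommendation.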

Hits: 2632, Downloads: 1513

9. ChinaXiv:201712.01606

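Cross-Domain Sentiment Analysis Based on Deep Representation Learning

Subjects: Library Science, Information Science >> Information Science
Submitted: 2017-11-30
Cooperative journal: 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery)

Yu Chuanming (余传明), Feng Bolin (冯博琳), An Lu (安璐)

Abstract: [Objective] By learning in a source domain rich in labeled resources and projecting target-domain documents into the same feature space as the source domain, this paper addresses the difficulty of obtaining a good classification model in a target domain with little data. [Methods] Chinese, English, and Japanese reviews from the Amazon online shopping site in the books, DVD, and music categories are selected as experimental data. Building on convolutional neural networks and structural correspondence learning, a Cross-Domain Deep Representation Model (CDDRM) is proposed to achieve knowledge transfer across domains, and it is applied to the cross-domain sentiment analysis task. [Results] Experimental results show that the best F-score of CDDRM in the cross-domain setting reaches 0.7368, demonstrating the effectiveness of the model. [Limitations] The F-score of CDDRM for cross-domain sentiment classification of long texts still needs improvement. [Conclusions] Knowledge transfer can address the difficulty supervised learning has in achieving good classification results on small data sets; unlike the basic assumption of traditional supervised learning, it does not require the training and test sets to follow the same or similar data distributions.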

Hits: 1832, Downloads: 1024