ChinaXiv.org: Chinese Academy of Sciences Preprint Platform for Scientific Papers (中国科学院科技论文预发布平台)

Search condition: 芦天亮 (Lu Tianliang). Results: 4 papers, all submitted in 2018 and filed under Integration Theory of Computer Science.

1. ChinaXiv:201812.00123

User Identity Linkage Method Across Social Networks Based on User Relations (基于用户关系的跨社交网络用户身份关联方法)

Subjects: Computer Science >> Integration Theory of Computer Science; submitted 2018-12-13; cooperative journal: 《计算机应用研究》 (Application Research of Computers)

Authors: 刘奇飞 (Liu Qifei), 杜彦辉 (Du Yanhui), 芦天亮 (Lu Tianliang)

Abstract: To identify accounts on different social networks that belong to the same person, this paper proposes a method for linking user identities across social networks based on user relations. First, we design a user-relation feature extraction module based on network representation learning, which embeds large information networks into low-dimensional vector spaces. Second, we propose CSN_LINE, an improved algorithm for heterogeneous information networks that learns network representations incorporating anchor links across networks. Finally, we build a user identity linkage model based on a multi-layer perceptron. Experiments show that the F1 score and accuracy of this method are more than 12% higher than those of the current state-of-the-art algorithm, demonstrating the validity and soundness of the method.

Hits: 1729; Downloads: 986
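The abstract does not spell out CSN_LINE. Its name suggests it builds on LINE (Large-scale Information Network Embedding); assuming that, the following is a minimal sketch of LINE's first-order proximity objective with negative sampling, trained by plain SGD on a toy graph. The anchor-link extension and the MLP linkage classifier are not shown, and the function names, uniform negative sampling, and hyperparameters are illustrative assumptions rather than the paper's method.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def edge_loss(emb, i, j, negatives):
    """First-order LINE loss for edge (i, j):
    -log sigma(u_i . u_j) - sum over negatives n of log sigma(-u_i . u_n)."""
    loss = -math.log(sigmoid(dot(emb[i], emb[j])))
    for n in negatives:
        loss -= math.log(sigmoid(-dot(emb[i], emb[n])))
    return loss


def sgd_step(emb, i, j, negatives, lr=0.05):
    """One gradient-descent step on edge_loss.
    Negatives must differ from i and j; all gradients use the old vectors."""
    ui, uj = emb[i], emb[j]
    g_pos = sigmoid(dot(ui, uj)) - 1.0  # d loss / d (u_i . u_j)
    grad_i = [g_pos * x for x in uj]
    grad_j = [g_pos * x for x in ui]
    grad_neg = {}
    for n in negatives:
        un = emb[n]
        g_neg = sigmoid(dot(ui, un))  # d loss / d (u_i . u_n)
        grad_i = [g + g_neg * x for g, x in zip(grad_i, un)]
        prev = grad_neg.get(n, [0.0] * len(ui))
        grad_neg[n] = [p + g_neg * x for p, x in zip(prev, ui)]
    emb[i] = [x - lr * g for x, g in zip(ui, grad_i)]
    emb[j] = [x - lr * g for x, g in zip(uj, grad_j)]
    for n, grad in grad_neg.items():
        emb[n] = [x - lr * g for x, g in zip(emb[n], grad)]


def train_first_order(edges, num_nodes, dim=8, epochs=200, num_neg=2, lr=0.05, seed=0):
    """Toy trainer with uniform negative sampling (LINE itself samples by degree^0.75)."""
    rng = random.Random(seed)
    emb = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(num_nodes)]
    for _ in range(epochs):
        for i, j in edges:
            candidates = [n for n in range(num_nodes) if n not in (i, j)]
            sgd_step(emb, i, j, rng.sample(candidates, num_neg), lr)
    return emb


if __name__ == "__main__":
    # Two triangles with no edges between them.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
    emb = train_first_order(edges, num_nodes=6)
    print("same triangle:", dot(emb[0], emb[1]))
    print("different triangles:", dot(emb[0], emb[4]))
```

After training, nodes that share edges should tend to have larger dot products than nodes from different triangles; in a linkage setting, such vectors (one per account) would be the input features for the downstream classifier.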
2. ChinaXiv:201811.00197

Research on Chinese Microblog Author Identification Based on Deep Learning (基于深度学习的中文微博作者身份识别研究)

Subjects: Computer Science >> Integration Theory of Computer Science; submitted 2018-11-29; cooperative journal: 《计算机应用研究》 (Application Research of Computers)

Authors: 徐晓霖 (Xu Xiaolin), 蔡满春 (Cai Manchun), 芦天亮 (Lu Tianliang)

Abstract: Author identification has long played an important role in public security and literary examination work. Text feature extraction, however, is cumbersome and does not generalize well. To address this problem, we propose CABLSTM, a Chinese microblog author identification model that requires no expert feature engineering, and test its accuracy on an open microblog corpus. To extract as much information as possible from short texts, the model fuses an attention mechanism into the CNN and removes the pooling layer; it then captures context-dependent information through a bidirectional LSTM, and a softmax layer outputs the identification result. On the Chinese microblog author identification task, experimental results show that the model achieves some improvement in accuracy, recall, and F-score over traditional machine learning algorithms as well as TextCNN and LSTM.

Hits: 1599; Downloads: 868
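The abstract describes the CABLSTM pipeline only at a high level. As one possible reading, the sketch below shows a CNN front end with attention and no pooling layer, plus the softmax output layer, in plain Python with toy weights. The bidirectional LSTM is omitted: an attention-weighted sum stands in for its output, and every function name and weight here is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d(seq, kernels, width):
    """Valid 1-D convolution + ReLU over token embeddings, with no pooling:
    returns one feature vector (one value per kernel) for every window position."""
    features = []
    for t in range(len(seq) - width + 1):
        window = seq[t:t + width]
        row = []
        for kernel in kernels:  # kernel: `width` vectors, each the size of an embedding
            s = sum(w * x for kv, xv in zip(kernel, window) for w, x in zip(kv, xv))
            row.append(max(0.0, s))
        features.append(row)
    return features


def attend(features, query):
    """Score each position against a query vector and rescale it by its softmax weight."""
    weights = softmax([sum(q * f for q, f in zip(query, feat)) for feat in features])
    weighted = [[w * f for f in feat] for w, feat in zip(weights, features)]
    return weighted, weights


def classify(summary, weight_matrix, bias):
    """Linear layer followed by softmax over candidate authors."""
    logits = [sum(w * x for w, x in zip(row, summary)) + b for row, b in zip(weight_matrix, bias)]
    return softmax(logits)


if __name__ == "__main__":
    seq = [[0.1, 0.3], [0.5, -0.2], [0.0, 0.4], [0.2, 0.2], [-0.1, 0.6]]  # 5 toy token embeddings
    kernels = [[[0.5, 0.1], [0.2, -0.3]], [[-0.1, 0.4], [0.3, 0.3]]]      # 2 filters of width 2
    features = conv1d(seq, kernels, width=2)
    weighted, weights = attend(features, query=[1.0, -0.5])
    # In CABLSTM the weighted sequence would go through a BiLSTM; a plain sum stands in here.
    summary = [sum(col) for col in zip(*weighted)]
    probs = classify(summary, [[0.2, 0.1], [-0.3, 0.5], [0.1, -0.2]], [0.0, 0.0, 0.0])
    print("attention:", weights)
    print("author probabilities:", probs)
```

Because there is no pooling step, the convolution keeps one feature vector per window position, which is what lets a sequence model such as a BiLSTM read the features in order afterwards.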
3. ChinaXiv:201806.00106

Text Similarity Calculation Based on the WMF_LDA Topic Model (基于WMF_LDA主题模型的文本相似度计算)

Subjects: Computer Science >> Integration Theory of Computer Science; submitted 2018-06-19; cooperative journal: 《计算机应用研究》 (Application Research of Computers)

Authors: 张璐 (Zhang Lu), 芦天亮 (Lu Tianliang), 杜彦辉 (Du Yanhui)

Abstract: Text similarity calculation is an important problem of great research value in natural language processing (NLP). Computing text similarity with the Latent Dirichlet Allocation (LDA) model takes semantic features into account, but it suffers from a large vocabulary, inconsistent word semantics, and an inability to mine and exploit the inter-domain differences inherent in texts of different categories. This paper proposes the WMF_LDA (Word Merging and Filtering LDA) topic model. The model merges domain words and synonyms by mapping them to common terms, filters words by part of speech (POS), and finally applies LDA topic modeling to the processed text. Experiments show that this method greatly reduces the vocabulary size during modeling, shortens the modeling time, and speeds up the final text clustering. Compared with other text similarity methods, it also achieves some improvement in accuracy.

Hits: 2376; Downloads: 1335
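The abstract names two preprocessing steps before LDA: merging domain words and synonyms, and filtering words by part of speech. A minimal sketch of those steps, assuming tokens arrive already POS-tagged; the merge table `MERGE_MAP`, the kept tags `KEEP_POS`, and the cosine comparison of document-topic vectors are hypothetical choices, and the LDA fit itself is left to an existing implementation.

```python
import math

# Hypothetical merge table: synonyms / domain-word variants mapped to one canonical term.
MERGE_MAP = {"pc": "computer", "laptop": "computer", "desktop": "computer"}
# Hypothetical POS whitelist (Universal POS tags); the abstract does not say which tags are kept.
KEEP_POS = {"NOUN", "VERB"}


def merge_and_filter(tagged_tokens, merge_map=MERGE_MAP, keep_pos=KEEP_POS):
    """tagged_tokens: list of (word, pos) pairs from any POS tagger.
    Drops tokens whose tag is not whitelisted, then maps merged words to canonical terms."""
    return [merge_map.get(word, word) for word, pos in tagged_tokens if pos in keep_pos]


def vocabulary(docs):
    """Set of distinct words across a list of token lists."""
    return {word for doc in docs for word in doc}


def cosine(p, q):
    """Cosine similarity between two document-topic vectors (an assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    docs = [
        [("the", "DET"), ("pc", "NOUN"), ("runs", "VERB"), ("fast", "ADV")],
        [("a", "DET"), ("laptop", "NOUN"), ("computer", "NOUN"), ("crashed", "VERB")],
    ]
    raw_vocab = vocabulary([[w for w, _ in d] for d in docs])
    processed = [merge_and_filter(d) for d in docs]
    print(len(raw_vocab), "->", len(vocabulary(processed)))
    # The processed token lists would be passed to an LDA implementation (not shown);
    # cosine() could then compare the resulting document-topic vectors, for example:
    print(cosine([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))
```

On this toy input the vocabulary shrinks from 8 distinct words to 3, which is the kind of reduction the abstract credits for shorter modeling time.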
4. ChinaXiv:201804.01449

Selection Optimization of Column-Storage Compression Strategies Based on HBase (基于HBase的列存储压缩策略的选择优化)

Subjects: Computer Science >> Integration Theory of Computer Science; submitted 2018-04-12; cooperative journal: 《计算机应用研究》 (Application Research of Computers)

Authors: 孙靖超 (Sun Jingchao), 芦天亮 (Lu Tianliang)

Abstract: In the era of big data, column-oriented databases are increasingly widely used, which has driven research on column-oriented storage. Existing compression strategies for column-oriented databases suffer from high learning cost and low compression efficiency, caused by high data dispersion, fine classification granularity, and flaws in the classification algorithms they apply. To address these problems, this paper designs a sorting-based hybrid compression strategy that combines column-based and sector-based compression. First, we design a method that sorts the data in each column according to the characteristics of HBase, making the data more compact. Second, depending on the characteristics of the data, we apply the hybrid column-based and the hybrid sector-based compression strategies, respectively, to recommend a compression algorithm. Experiments on the TPC-DS benchmark data show that the proposed strategy achieves excellent performance in both compression ratio and compression/decompression time.

Hits: 1805; Downloads: 1113
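The abstract's two steps, sorting each column to make the data compact and then recommending a compression algorithm, can be illustrated with a minimal sketch. Assumptions: column values are strings serialized one per line; the candidate codecs are Python's standard `zlib`, `bz2`, and `lzma` modules; and the recommendation is a brute-force "smallest output wins" check, not the paper's classification-based strategy or its column/sector hybrid. `sort_column` keeps the row permutation because sorting one column in isolation would otherwise break row alignment.

```python
import bz2
import lzma
import random
import zlib

# Candidate codecs from the Python standard library (stand-ins for the paper's candidates).
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def sort_column(column):
    """Sort a column's values and return them with the row permutation needed to undo the sort."""
    order = sorted(range(len(column)), key=column.__getitem__)
    return [column[row] for row in order], order


def unsort_column(sorted_values, order):
    """Put sorted values back into their original row positions."""
    restored = [None] * len(order)
    for pos, row in enumerate(order):
        restored[row] = sorted_values[pos]
    return restored


def run_count(values):
    """Number of runs of equal adjacent values, i.e. what run-length encoding would store."""
    return sum(1 for k, v in enumerate(values) if k == 0 or v != values[k - 1])


def serialize(column):
    return "\n".join(column).encode("utf-8")


def recommend_codec(column):
    """Brute force: compress with every candidate; return (name, size) of the smallest output."""
    raw = serialize(column)
    sizes = {name: len(compress(raw)) for name, (compress, _) in CODECS.items()}
    best = min(sizes, key=sizes.get)
    return best, sizes[best]


if __name__ == "__main__":
    rng = random.Random(0)
    column = [rng.choice(["CN", "US", "DE", "JP"]) for _ in range(10_000)]
    sorted_col, order = sort_column(column)
    print("runs:", run_count(column), "->", run_count(sorted_col))
    print("unsorted:", recommend_codec(column))
    print("sorted:", recommend_codec(sorted_col))
    assert unsort_column(sorted_col, order) == column
```

Counting runs before and after sorting shows why sorting helps: equal values become adjacent, so run-length-style encodings and dictionary coders see longer repeats. A real store would also have to keep the permutation (or sort whole rows by a key), which this sketch does not account for in the compressed size.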
