Your conditions: 芦天亮
  • User identity linkage across social networks based on user relations (基于用户关系的跨社交网络用户身份关联方法)

    Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2018-12-13; Cooperative journal: 《计算机应用研究》

    Abstract: To identify accounts that belong to the same person, this paper proposes a method for linking user identities across social networks based on user relations. First, we design a user-relation feature extraction module based on network representation learning, which embeds large information networks into low-dimensional vector spaces. Second, we propose the CSN_LINE algorithm for heterogeneous information networks; this improved algorithm learns network representations by incorporating anchor links across networks. Finally, we construct a user identity linkage model based on a multi-layer perceptron. Experiments show that the F1 score and accuracy of this method are more than 12% higher than those of the current state-of-the-art algorithm, which supports the validity and rationality of the method.
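
The final step of this abstract, scoring a candidate account pair with a multi-layer perceptron, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the 4-dimensional embeddings, the single hidden layer, the concatenation of the two embeddings as the pair feature, and the random (untrained) weights. It is not the paper's CSN_LINE embedding or its trained linkage model.

```python
import math
import random


def make_mlp(in_dim, hidden_dim, seed=0):
    """Create randomly initialised (untrained) weights for a one-hidden-layer MLP."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
    b2 = 0.0
    return w1, b1, w2, b2


def link_probability(emb_a, emb_b, mlp):
    """Concatenate two embeddings, apply a ReLU hidden layer, and return a sigmoid score."""
    w1, b1, w2, b2 = mlp
    x = list(emb_a) + list(emb_b)  # pair feature (assumed: plain concatenation)
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical embeddings of one account in each of two networks.
user_in_network_a = [0.12, -0.40, 0.33, 0.05]
user_in_network_b = [0.10, -0.38, 0.30, 0.07]

mlp = make_mlp(in_dim=8, hidden_dim=6)
score = link_probability(user_in_network_a, user_in_network_b, mlp)
print(f"link probability (untrained weights): {score:.3f}")
```

With trained weights, pairs scoring above a chosen threshold would be treated as the same person; the score printed here is meaningless beyond showing the data flow.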

  • Research on author identification of Chinese microblogs based on deep learning (基于深度学习的中文微博作者身份识别研究)

    Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2018-11-29; Cooperative journal: 《计算机应用研究》

    Abstract: Author identification has always played an important role in public security and literary inspection work. However, text feature extraction is cumbersome and not universally applicable. To address this problem, the CABLSTM model for Chinese microblog author identification is proposed; it requires no expert feature modeling, and its accuracy is tested on an open microblog corpus. To maximize the extraction of short-text features, the model fuses an attention mechanism into the CNN, removes the pooling layer, and obtains context-related information through a bidirectional LSTM. The identification result is output through a Softmax layer. Experimental results show that, on the Chinese microblog author identification task, the model improves accuracy, recall, and F value compared with traditional machine learning algorithms as well as the TextCNN and LSTM algorithms.

  • Text similarity calculation based on the WMF_LDA topic model (基于WMF_LDA主题模型的文本相似度计算)

    Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2018-06-19; Cooperative journal: 《计算机应用研究》

    Abstract: Text similarity calculation is an important task with great research value in the field of natural language processing (NLP). Calculating text similarity with the Latent Dirichlet Allocation (LDA) model takes semantic features into account, but it suffers from a large vocabulary, inconsistent word semantics, and an inability to exploit the inter-domain differences inherent in texts of different categories. This paper proposes the WMF_LDA (Word Merging and Filtering_LDA) topic model. The model maps domain words and synonyms to common terms and filters words based on part of speech (POS). LDA topic modeling is then applied to the processed result. Experiments show that this method greatly reduces the number of words during modeling, reduces the time consumed by the modeling process, and speeds up the final text clustering. Compared with other text similarity methods, the proposed method also achieves some improvement in accuracy.
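
The word merging and filtering step described in this abstract can be sketched as a small preprocessing function that runs before any topic model. The synonym map, the pre-tagged toy document, and the choice to keep only nouns and verbs are all hypothetical; the abstract does not state which mappings or POS tags the authors use, and the LDA step itself is not shown.

```python
# Hypothetical synonym/domain-word map: each variant is merged into one canonical token.
MERGE_MAP = {
    "automobile": "car",
    "vehicle": "car",
    "purchase": "buy",
}

# Assumption: only nouns and verbs are kept; the abstract does not name the filtered tags.
KEPT_POS = {"NOUN", "VERB"}


def merge_and_filter(tagged_tokens):
    """Map each word to its canonical form, then keep it only if its POS tag is allowed."""
    result = []
    for word, pos in tagged_tokens:
        canonical = MERGE_MAP.get(word, word)
        if pos in KEPT_POS:
            result.append(canonical)
    return result


# Pre-tagged toy document; a real pipeline would obtain the tags from a POS tagger.
doc = [
    ("people", "NOUN"), ("purchase", "VERB"), ("a", "DET"),
    ("new", "ADJ"), ("automobile", "NOUN"), ("or", "CONJ"),
    ("buy", "VERB"), ("a", "DET"), ("used", "ADJ"), ("vehicle", "NOUN"),
]

tokens = merge_and_filter(doc)
vocab_before = {word for word, _ in doc}
vocab_after = set(tokens)

print(tokens)                                  # ['people', 'buy', 'car', 'buy', 'car']
print(len(vocab_before), "->", len(vocab_after))  # 9 -> 3
```

The smaller vocabulary is what a topic model would then be trained on, which is the mechanism the abstract credits for the shorter modeling time.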

  • Selection optimization of column-storage compression strategies based on HBase (基于HBase的列存储压缩策略的选择优化)

    Subjects: Computer Science >> Integration Theory of Computer Science; Submitted: 2018-04-12; Cooperative journal: 《计算机应用研究》

    Abstract: In the era of big data, the use of column-store databases is increasing, which has promoted research in the field of column-oriented storage. Existing column-based database compression strategies suffer from high learning cost and low compression efficiency, caused by large data dispersion, small classification granularity, and defects in the classification algorithm applied during compression. To solve this problem, this paper designs a sort-based hybrid compression strategy that combines column-based compression and sector-based compression. First, we design a method that sorts the data in each column according to the characteristics of HBase to strengthen data compaction. Second, according to the characteristics of the data, we apply the hybrid column-based compression strategy and the hybrid sector-based compression strategy, respectively, to recommend a compression algorithm. Experiments on the TPC-DS standard data set demonstrate that the proposed strategy performs very well in both compression ratio and compression/decompression time.
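
The two ideas in this abstract, sorting a column before compressing it and then choosing a compression algorithm per column, can be illustrated with a rough sketch. It swaps in a simple trial-based selection (compress with every candidate and keep the smallest output) instead of the paper's recommendation strategy, and it uses Python standard-library codecs as stand-ins rather than HBase's own codecs. The toy column and all function names are hypothetical.

```python
import bz2
import lzma
import random
import zlib

# Candidate codecs from the Python standard library (stand-ins, not HBase's codec set).
CODECS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def encode_column(values):
    """Serialise a column of strings as newline-separated UTF-8 bytes."""
    return "\n".join(values).encode("utf-8")


def pick_codec(values):
    """Compress the column with every candidate; return (name, size) of the smallest output."""
    raw = encode_column(values)
    sizes = {name: len(compress(raw)) for name, compress in CODECS.items()}
    best = min(sizes, key=sizes.get)
    return best, sizes[best]


# Toy low-cardinality column in random order.
rng = random.Random(42)
column = [rng.choice(["beijing", "shanghai", "guangzhou", "shenzhen"]) for _ in range(5000)]

raw_size = len(encode_column(column))
unsorted_codec, unsorted_size = pick_codec(column)
sorted_codec, sorted_size = pick_codec(sorted(column))

print(f"raw bytes: {raw_size}")
print(f"unsorted: {unsorted_codec} -> {unsorted_size} bytes")
print(f"sorted:   {sorted_codec} -> {sorted_size} bytes")
```

Sorting turns the column into a few long runs of identical values, so every codec produces a much smaller output. A real column store would also have to preserve the mapping between sorted values and their rows, which this sketch ignores.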