Subjects: Library Science, Information Science >> Information Science Submitted: 2023-04-01 Cooperative journals: 《图书情报工作》
Abstract: [Purpose/significance] The classics are carriers of traditional Chinese culture, thought, and wisdom. Automatic entity recognition in the classics, drawing on the data acquisition, annotation, and analysis methods of digital humanities, is of great significance for subsequent applied research. [Method/process] A corpus was constructed from 25 pre-Qin texts that had been automatically segmented and manually annotated. Using corpora of different sizes and seven deep learning models (Bi-LSTM, Bi-LSTM-Attention, Bi-LSTM-CRF, Bi-LSTM-CRF-Attention, Bi-RNN, Bi-RNN-CRF, and BERT), we extracted the entities that constitute historical events and compared the models' performance. [Result/conclusion] The accuracy of the Bi-LSTM-Attention and Bi-RNN-CRF models trained on the full corpus reached 89.79% and 89.33%, respectively, confirming the feasibility of applying deep learning to large-scale text datasets.
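As an illustration of how tagged output from such models might be scored and turned into entities, the following is a minimal sketch, not the authors' implementation. It assumes a BIO tagging scheme over segmented words and a token-level definition of accuracy, neither of which the abstract specifies. The tag names (`PER`, `LOC`) and the function names are hypothetical.

```python
from typing import List, Tuple


def token_accuracy(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag.

    Assumption: accuracy is measured per token; the abstract does not say.
    """
    correct = total = 0
    for g_seq, p_seq in zip(gold, pred):
        if len(g_seq) != len(p_seq):
            raise ValueError("gold and predicted sequences differ in length")
        correct += sum(g == p for g, p in zip(g_seq, p_seq))
        total += len(g_seq)
    return correct / total if total else 0.0


def extract_entities(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collect (type, start, end_exclusive) spans from a BIO tag sequence.

    Assumption: tags look like "B-PER", "I-PER", "O" (hypothetical labels).
    """
    spans: List[Tuple[str, int, int]] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


# Hypothetical gold and predicted tags for one four-word sentence.
gold_tags = [["B-PER", "I-PER", "O", "B-LOC"]]
pred_tags = [["B-PER", "O", "O", "B-LOC"]]

accuracy = token_accuracy(gold_tags, pred_tags)
gold_entities = extract_entities(gold_tags[0])
pred_entities = extract_entities(pred_tags[0])
```

In this toy case the prediction matches three of four tags, yet the first entity span is cut short. Token-level accuracy and entity-level correctness can therefore diverge, which is worth keeping in mind when comparing models by a single accuracy figure.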