ChinaXiv.org: Chinese Academy of Sciences preprint platform for scientific and technical papers

12 results in total.

Your conditions: Huang Shuiqing (黄水清)

1. ChinaXiv:202310.00635

Automatic Summary Generation of News for People’s Daily Online Corpus

Subjects: Library Science, Information Science >> Library Science; submitted 2023-10-08; cooperative journal: Knowledge Management Forum (《知识管理论坛》)

Liang Yuan, Wang Dongbo, Huang Shuiqing

Abstract: [Purpose/significance] This paper studies the People’s Daily Online corpus, a mainstream news source, aiming to provide ideas and practical support for research on automatic text summarization that can be applied to news and other text-processing tasks and contribute to knowledge aggregation services and information access research. [Method/process] The experimental corpus consisted of the People’s Daily Online sub-corpora for January 2015, June 2015 and January 2016 from the New Era People’s Daily (NEPD) corpus. Extractive summarization was carried out with TF-IDF, TextRank and other algorithms, and abstractive summarization with a pointer-generator network model; the resulting summaries were then analyzed and evaluated. [Result/conclusion] The experiments build extractive summarization algorithms and an abstractive pointer-generator network model for news in the People’s Daily Online corpus. The results are evaluated with the ROUGE metrics (ROUGE-1, ROUGE-2 and ROUGE-L). This article provides corpus and practical support for automatic news summarization systems.
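The ROUGE indicators named in this abstract can be illustrated with a minimal sketch. It assumes ROUGE-N is computed as n-gram recall and ROUGE-L as an F1 score over the longest common subsequence; the abstract does not state which variants or tokenization the authors used, so the function names and the toy sentences below are hypothetical. For Chinese text the tokens could be characters or segmented words.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N as recall: overlapping n-grams / n-grams in the reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's count by how often it appears in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L as F1 of LCS-based precision and recall (an assumed variant)."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(candidate), lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not taken from the paper.
candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
print(rouge_n(candidate, reference, 1))
print(rouge_n(candidate, reference, 2))
print(rouge_l(candidate, reference))
```

Production evaluations usually rely on an established ROUGE package; the sketch only shows what the three scores measure.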

Hits 333 Downloads 109

2. ChinaXiv:202307.00295

Construction, Performance and Application of New Era People's Daily Segmented Corpus (Ⅲ)——Analysis and Comparison of Sentence Length and Word

Subjects: Library Science, Information Science >> Library Science; submitted 2023-07-26; cooperative journal: Library and Information Service (《图书情报工作》)

Huang Shuiqing, Wang Dongbo

Abstract: [Purpose/significance] Statistics and analysis of sentence length in different dimensions and of vocabulary distribution, based on the New Era People's Daily (NEPD) word segmentation corpus, support a relatively comprehensive and systematic understanding of the linguistic characteristics of contemporary Chinese text and benefit subsequent natural language processing and text mining of that text. [Method/process] Based on the word segmentation data of People's Daily for January 2018 and January 1998, six sentence categories were defined for the statistics, the sentence length distribution was counted and analyzed in both character and word units, and the static distribution of words was examined using Zipf's law. [Result/conclusion] Judging from the word-level sentence length distribution and the Zipf distribution of vocabulary, both sentence length and vocabulary distribution changed between the 1998 and 2018 corpora, but the change is continuous and the two corpora remain related.
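Zipf's law, used in this abstract to examine word distribution, states that the frequency of the word at rank r is roughly proportional to 1 / r^s. The sketch below estimates the exponent s by ordinary least squares on the log-log rank-frequency curve. This fitting method is an assumption chosen for illustration (the abstract does not say how the authors fitted the distribution), and the function names and data are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_exponent(frequencies):
    """Estimate s in f(r) ~ C / r**s from a ranked frequency list.

    Fits a straight line to log(frequency) against log(rank) by least
    squares and returns the negated slope.
    """
    if len(frequencies) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Hypothetical segmented text; a real study would use the full corpus.
tokens = "人民 日报 人民 语料库 人民 日报".split()
print(rank_frequency(tokens))

# Synthetic frequencies that follow f(r) = 1000 / r, so s is close to 1.
ideal = [1000 / rank for rank in range(1, 51)]
print(zipf_exponent(ideal))
```

Comparing the exponent fitted on the 1998 data with the one fitted on the 2018 data would be one simple way to quantify the kind of change the abstract describes.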

Hits 447 Downloads 110

3. ChinaXiv:202307.00312

Construction, Performance and Application of New Era People's Daily Segmented Corpus (Ⅱ)——Constructing Automatic Word Segmentation Model of Deep Learning

Subjects: Library Science, Information Science >> Library Science; submitted 2023-07-26; cooperative journal: Library and Information Service (《图书情报工作》)

Huang Shuiqing, Wang Dongbo

Abstract: [Purpose/significance] Building deep learning models for automatic word segmentation on the New Era People's Daily (NEPD) word segmentation corpus provides experience for constructing high-performance word segmentation models and also verifies the performance of the corresponding deep learning models on a specific natural language processing task. [Method/process] After introducing Bi-directional Long Short-Term Memory (Bi-LSTM) and Bi-LSTM with a conditional random field (Bi-LSTM-CRF), this paper describes the process, types and conditions of Chinese word segmentation preprocessing, the evaluation indexes, the parameters and the hardware platform. Bi-LSTM and Bi-LSTM-CRF models for automatic Chinese word segmentation were then constructed, and their overall performance was analyzed. [Result/conclusion] Measured by precision, recall and F value, the overall performance of both models is relatively reasonable. Specifically, the Bi-LSTM word segmentation model outperforms the Bi-LSTM-CRF model, but the difference is very small.
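The three evaluation indexes named in this abstract (precision, recall and F value) are commonly computed for word segmentation by treating a predicted word as correct only when both of its boundaries match the gold standard. The sketch below follows that convention, which is an assumption here since the abstract does not spell out the scoring procedure; the function names and the example sentence are hypothetical.

```python
def to_spans(words):
    """Convert a segmented sentence into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_prf(gold, predicted):
    """Precision, recall and F value over words whose two boundaries both match."""
    gold_spans, pred_spans = to_spans(gold), to_spans(predicted)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Hypothetical example: the model splits the last gold word in two.
gold = ["人民", "日报", "语料库"]
predicted = ["人民", "日报", "语料", "库"]
print(segmentation_prf(gold, predicted))
```

In this example two of the four predicted words match two of the three gold words, so precision is 2/4 and recall is 2/3.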

Hits 358 Downloads 155

4. ChinaXiv:202307.00327

Construction, Performance and Application of New Era People's Daily Segmented Corpus (I)——Construction and Evaluation of Corpus

Subjects: Library Science, Information Science >> Library Science; submitted 2023-07-26; cooperative journal: Library and Information Service (《图书情报工作》)

Huang Shuiqing, Wang Dongbo

Abstract: [Purpose/significance] The construction of a segmented People's Daily corpus for the new era provides a new annotated corpus for Chinese information processing and new language resources for analyzing modern Chinese from a diachronic perspective. [Method/process] Based on an analysis of existing Chinese word segmentation corpora, the data source, annotation specification and construction process of the corpus were explained. The performance of the corpus was then evaluated by building an automatic word segmentation model and comparing the results with an existing corpus. [Result/conclusion] The New Era People's Daily Segmented Corpus (NEPD), which is large in scale and covers a long time span, follows the basic processing standards of modern Chinese corpora. The January 2018 portion of NEPD was selected to build a segmentation model based on the conditional random field model, and its performance was evaluated and compared with that of the People's Daily corpus of January 1998. The evaluation indexes show that the overall performance of the new-era People's Daily corpus is relatively outstanding. The 1998 corpus cannot be replaced, but constructing the NEPD is very necessary.

Hits 395 Downloads 130

5. ChinaXiv:202304.00944

Plant Knowledge Mining and Organization Construction in Pre-Qin Classics from the Perspective of Digital Humanities

Subjects: Library Science, Information Science >> Library Science; submitted 2023-04-13

Wu Mengcheng, Lin Litao, Qi Yue, Huang Shuiqing, Wang Dongbo, Liu Liu

Abstract: [Purpose/significance] Mining plant knowledge in pre-Qin classics and constructing a pre-Qin plant knowledge graph are of great significance for understanding the society and living conditions of ancient Chinese people. [Method/process] This paper annotates plant words in pre-Qin classics in detail and analyzes them quantitatively. Based on CRF and a variety of deep learning models, plant named entity recognition models for pre-Qin classics were constructed, and their performance was compared to determine the optimal model. A knowledge-graph-oriented knowledge organization model for classics and plants was designed. [Result/conclusion] The plant entity recognition model based on the domain pre-trained language model SikuRoBERTa performs best, with a harmonic mean (F value) of 85.44%, providing an effective method for entity-based plant knowledge mining. The study also achieves aggregation and visualization of plant knowledge in pre-Qin classics.

Hits 1348 Downloads 260

6. ChinaXiv:202304.00019

Humanity Computing on Women in Spring and Autumn Annals and the Three Commentaries

Subjects: Library Science, Information Science >> Information Science; submitted 2023-04-01; cooperative journal: Library and Information Service (《图书情报工作》)

Liu Liu, Huang Shuiqing, Meng Kai, Li Bin, Wang Dongbo, Su Xinning

Abstract: [Purpose/significance] Digital humanities research on ancient Chinese classics, built on the digitization and intelligent processing of those classics, shows a promising future because quantitative analysis provides new perspectives. [Method/process] The study is based on the Spring and Autumn Annals and the Three Commentaries. After annotating knowledge about women in these books, the study provides a quantitative analysis of names, states and other important knowledge about pre-Qin Chinese women. Using the annotated data, it also maps marriages between states and measures each state's activeness in marriage, which reflects its importance in diplomacy. [Result/conclusion] The study gives a new interpretation of the female characters in the books and proposes a measurable, visual research method that provides reliable data verification for related traditional studies.

Hits 186 Downloads 94

7. ChinaXiv:202304.00299

A Research on the Visualization and Metric Analysis of War in Zuo Zhuan Based on Social Network Analysis

Subjects: Library Science, Information Science >> Information Science; submitted 2023-04-01; cooperative journal: Library and Information Service (《图书情报工作》)

Fan Wenjie, Li Zhongkai, Huang Shuiqing

Abstract: [Purpose/significance] Digital humanities has attracted wide attention in the social sciences and humanities: it uses convenient, efficient computing technology to extract latent information from massive data resources and unstructured text and to present it to users more intuitively and clearly. [Method/process] This paper took the wars described in Zuo Zhuan as its research object and extracted the strategic attacker and strategic defender of each war from the war sentences. From the perspective of digital humanities, it explored the feasibility of using social network analysis to describe changes in the pattern of warfare in the Spring and Autumn Period. On this basis, the vassal states of the period were divided into groups according to their cooperative and adversarial relationships in war, and the main groups and core vassal states were analyzed one by one. In addition, the wars in Zuo Zhuan were displayed dynamically using three technologies: HTML, CSS and ECharts. [Result/conclusion] We provided a method for extracting war information from the unstructured text of Zuo Zhuan and organizing it into quantifiable data. The results show that it is feasible to present the relationships among the vassal states of the Spring and Autumn Period from the perspective of war, and they demonstrate the feasibility and great potential of digital humanities technology in historical research.

Hits 203 Downloads 88

8. ChinaXiv:202304.00414

Errors Revision and Compilation Quality Evaluation of Index to Chunqiu Jingzhuan in a Digital Background

Subjects: Library Science, Information Science >> Information Science; submitted 2023-04-01; cooperative journal: Library and Information Service (《图书情报工作》)

Peng Qiuru, Fan Wenjie, Huang Shuiqing

Abstract: [Purpose/significance] Working from a digitized version, this paper identifies and corrects errors and omissions in the Index to Chunqiu Jingzhuan and reviews its compilation quality on the basis of quantitative data. It shows the compilation quality achieved in the manual era by the Sinological Index Series, a major index to ancient Chinese classics, and reveals the value of printed full-text indexes of ancient books in the digital age. [Method/process] The classic text, the commentaries and all index entries of the Index to Chunqiu Jingzhuan were digitized and counted. They were compared item by item to find and correct errors and omissions, to record and count their types and numbers, and to analyze the overall compilation quality. [Result/conclusion] The Index to Chunqiu Jingzhuan contains few errors and omissions, with an entry error rate of only about 1 in 10,000. Its compilation quality can be called the pinnacle of the manual era; it deserves its high reputation in the academic community and can serve as a high-quality foundation for a digital corpus.

Hits 158 Downloads 87

9. ChinaXiv:202304.00438

Impact Evaluation of Chinese Academic Books on Humanities and Social Sciences: Taking Library & Information Science as an Example

Subjects: Library Science, Information Science >> Information Science; submitted 2023-04-01; cooperative journal: Library and Information Service (《图书情报工作》)

Peng Qiuru, He Bei, Huang Shuiqing

Abstract: [Purpose/significance] Academic books are an important means of presenting the results of scientific research and an important information resource in society's information activities. Evaluating their impact is conducive to making full use of them. [Method/process] Building on previous research, this paper designed a comprehensive impact evaluation system for Chinese academic books in the humanities and social sciences, with multi-level indicators covering academic impact and social impact. It selected 103 academic books in Library & Information Science from CBKCI as research samples, collected the corresponding indicator data, and used CRITIC and TOPSIS to evaluate their impact. [Result/conclusion] The results show that the proposed indicators and methods take into account researchers, readers and the books themselves, comprehensively reflect the various aspects of the impact of academic books, and are reasonably easy to use and feasible.
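The combination of CRITIC weighting and TOPSIS ranking mentioned in this abstract can be sketched as follows. This is a textbook-style illustration, not the authors' implementation: it assumes every indicator is a benefit indicator (larger is better), that each indicator column varies and is not all zeros, and that there are at least two books. The indicator matrix is hypothetical and is not data from the paper.

```python
import math


def column(matrix, j):
    """Return column j of a row-major matrix as a list."""
    return [row[j] for row in matrix]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either list is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def critic_weights(matrix):
    """CRITIC weights: contrast (std. dev.) times conflict (sum of 1 - r)."""
    n_cols = len(matrix[0])
    scaled = []
    for j in range(n_cols):
        col = column(matrix, j)
        lo, hi = min(col), max(col)
        # Min-max scaling so indicators with different units are comparable.
        scaled.append([(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col])
    info = []
    for j in range(n_cols):
        mean = sum(scaled[j]) / len(scaled[j])
        var = sum((v - mean) ** 2 for v in scaled[j]) / (len(scaled[j]) - 1)
        conflict = sum(1 - pearson(scaled[j], scaled[k]) for k in range(n_cols))
        info.append(math.sqrt(var) * conflict)
    total = sum(info)
    return [c / total for c in info]


def topsis(matrix, weights):
    """Relative closeness of each row to the ideal solution (0 worst, 1 best)."""
    n_cols = len(matrix[0])
    norms = [math.sqrt(sum(v * v for v in column(matrix, j))) for j in range(n_cols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_cols)]
                for row in matrix]
    best = [max(column(weighted, j)) for j in range(n_cols)]
    worst = [min(column(weighted, j)) for j in range(n_cols)]
    scores = []
    for row in weighted:
        d_best, d_worst = math.dist(row, best), math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical indicators for three books: citations, library holdings, reviews.
books = [[10, 200, 3], [5, 100, 1], [8, 400, 2]]
weights = critic_weights(books)
print(weights)
print(topsis(books, weights))
```

CRITIC derives the weights from the data itself, giving more weight to indicators that vary strongly and correlate weakly with the others, and TOPSIS then ranks each book by its distance from the best and worst values observed.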

Hits 170 Downloads 93

10. ChinaXiv:202304.00548

The Analysis of Time Distribution and Evolution Characteristics of Crops in Classics: Taking Shihuozhi as an Example

Subjects: Library Science, Information Science >> Information Science; submitted 2023-04-01; cooperative journal: Library and Information Service (《图书情报工作》)

Cui Bin, Wang Dongbo, Huang Shuiqing

Abstract: [Purpose/significance] China has a long history of crop cultivation. Analyzing the time distribution and evolution of ancient crops is of great significance for optimizing the modern agricultural planting structure. [Method/process] This paper proposed an analytical process for the time distribution and evolution characteristics of crops, consisting of four parts: corpus acquisition and digitization, segmentation and entity relationship extraction, time distribution analysis, and evolution analysis. The Shihuozhi from 15 historical books were selected for empirical analysis. [Result/conclusion] Based on the analysis of the Shihuozhi, the feasibility and effectiveness of the method are verified against relevant historical, economic, philological and other multidisciplinary research data; the method can serve as a reference for analyzing the time distribution and evolution of ancient crops based on classics. Future work needs to raise the level of automation, expand the research sample and refine the event types to further optimize the process.

Hits 137 Downloads 73

11. ChinaXiv:202304.00607

Research on Automatic Mining of Variants Expressing the Same Event in the Ancient Books

Subjects: Library Science, Information Science >> Information Science; submitted 2023-04-01; cooperative journal: Library and Information Service (《图书情报工作》)

Liang Yuan, Wang Dongbo, Huang Shuiqing

Abstract: [Purpose/significance] Variants are a common phenomenon and an important research object in ancient books. Traditional collation relies on manually searching a large number of ancient books for materials, including variants. This work is time-consuming and laborious, and the resulting data may not be accurate or comprehensive. Automatic mining of variant sentences by computer can obtain effective information from larger corpora, and collation combined with automatic mining can achieve exhaustive retrieval, which is of great significance to the collation of ancient books and provides new ideas and methods for collation research in the new period. [Method/process] This research automatically mined variant sentences in the Three Biographies of the Spring and Autumn Period by combining deep learning with parallel corpora, which are commonly used in machine translation. It then compared the results of LSTM and BERT models with those of the classic SVM model and further analyzed variants that describe the same event differently in two ancient books. [Result/conclusion] The experiment produced a deep learning model suitable for automatically mining variants expressing the same event in the Three Biographies of the Spring and Autumn Period. It demonstrates the feasibility of integrating new technologies such as deep learning into the construction of knowledge bases for ancient books. The combination of deep learning and parallel corpora can also play a more significant role in studying variant sentences and provides practical support for applying digital humanities to Chinese language and literature.

Hits 175 Downloads 81

12. ChinaXiv:202201.00032

From "Library, Information and Archives Management" to "Information Resources Management": Some Analysis and Reflections on the Renaming of the First Level Subject

Subjects: Library Science, Information Science >> Library Science; submitted 2022-01-06

Chu Jingli, Huang Shuiqing

Abstract: [Purpose/significance] For library, information and archives management and related disciplines, the release of the 2021 edition of the Discipline and Major Catalogue (Draft for Comment) by the Office of the Academic Degrees Committee of the State Council concerns not only the renaming of the first-level discipline but also major changes in the connotation and system of the discipline. It is therefore necessary to analyze and reflect on the significance of this renaming and on future strategies for building the discipline. [Method/process] Through literature research and historical analysis, this paper traces the development and evolution of "information resource management", strengthens the rational understanding of the renaming of the first-level discipline, and proposes new construction strategies for it. [Result/conclusion] The academic community needs to deepen its understanding of and research on the concept and connotation, meaning and value, category and boundary, method and technology, discipline and theory, application and effect, and planning and future of "information resource management" as a first-level discipline, and to promote the fundamental transformation of the first-level discipline from name to substance.

Peer Review Status: Awaiting Review

Hits 31381 Downloads 1484