Abstract: With the rapid growth of galaxy survey data, machine learning has become a crucial technique for classifying galaxy morphology. However, traditional supervised learning is costly because it needs a large amount of labeled data to be effective. FixMatch, a semi-supervised learning algorithm, has become a key tool for exploiting large amounts of unlabeled data. Nevertheless, because FixMatch filters pseudo labels with a fixed threshold, its performance degrades significantly on large, imbalanced datasets. This study therefore proposes a dynamic threshold alignment (DTA) algorithm based on the FixMatch model. First, the reliable pseudo-label ratio is determined for the class with the most samples, and the reliable pseudo-label ratios of the remaining classes are estimated accordingly. Second, the threshold for selecting pseudo labels is calculated dynamically from the estimated reliable pseudo-label ratio of each class. This dynamic threshold reduces the accuracy bias across classes and improves learning for classes with fewer samples. Experimental results on galaxy morphology classification show that the proposed algorithm significantly outperforms supervised learning: with 100 labeled samples, accuracy and F1-score improve by 12.8% and 12.6%, respectively. Compared with popular semi-supervised algorithms such as FixMatch and MixMatch, the proposed algorithm achieves better classification performance and greatly reduces the accuracy bias across classes. With 1000 labeled samples, the accuracy for cigar-shaped smooth galaxies, the class with the fewest samples, improves by 25.87% relative to FixMatch.
[V1] | 2023-09-08 14:39:45 | ChinaXiv:202309.00137V1
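
The two steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function name `dynamic_thresholds`, the choice of 0.95 as the fixed FixMatch-style threshold, the rule that every class keeps the same reliable ratio as the majority class, and the use of predicted-class counts to identify the majority class. It should not be read as the authors' DTA implementation.

```python
from collections import defaultdict


def dynamic_thresholds(pred_classes, confidences, fixed_tau=0.95):
    """Return a hypothetical per-class confidence threshold.

    pred_classes: predicted class index for each unlabeled sample.
    confidences:  maximum softmax probability for each unlabeled sample.
    fixed_tau:    the fixed threshold a FixMatch-style filter would use
                  (0.95 is an assumed value, not taken from the abstract).
    """
    # Group confidence scores by predicted class.
    by_class = defaultdict(list)
    for cls, conf in zip(pred_classes, confidences):
        by_class[cls].append(conf)

    # Step 1: the class with the most predictions sets the reference
    # "reliable" ratio, i.e. the share of its predictions passing fixed_tau.
    majority = max(by_class, key=lambda cls: len(by_class[cls]))
    majority_scores = by_class[majority]
    ratio = sum(c >= fixed_tau for c in majority_scores) / len(majority_scores)

    # Step 2 (ASSUMPTION): every class should keep the same ratio of its
    # predictions, so its threshold is the confidence of the last kept
    # sample, never exceeding fixed_tau.
    thresholds = {}
    for cls, scores in by_class.items():
        ranked = sorted(scores, reverse=True)
        keep = round(ratio * len(ranked))  # pseudo labels to keep
        if keep == 0:
            thresholds[cls] = fixed_tau
        else:
            thresholds[cls] = min(fixed_tau, ranked[keep - 1])
    return thresholds


# Toy example: class 0 is the majority, class 1 is rare and less confident.
toy_classes = [0, 0, 0, 0, 1, 1]
toy_confidences = [0.99, 0.97, 0.80, 0.60, 0.90, 0.70]
toy_thresholds = dynamic_thresholds(toy_classes, toy_confidences)
```

In this toy example, half of the majority-class predictions pass the fixed threshold, so the rare class is also allowed to keep half of its predictions, which lowers its threshold below 0.95. A fixed threshold would have rejected every pseudo label of the rare class, which is the kind of per-class bias the abstract says the dynamic threshold is meant to reduce.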