HYBRID TEXT CLASSIFICATION BASED ON TF-IDF AND ADAPTIVE ALSHE ENSEMBLE
Keywords: Text classification, TF-IDF, machine learning, NLP, LinearSVC, Naive Bayes, ensemble, ALSHE, Macro F1.
Abstract
This article investigates multi-class classification of technical texts. The experiments used the LocalDocs-10 corpus, compiled from software package descriptions and partitioned into 10 thematic classes. Texts were represented with word-level and character-level TF-IDF n-grams, alongside compact SVD-derived features. Classical machine learning algorithms were compared with several hybrid approaches, with particular emphasis on the adaptive ALSHE-Gated model, which combines Complement Naive Bayes and LinearSVC through a confidence-driven switching mechanism. Among the baseline models, the Passive-Aggressive Classifier achieved the highest performance, as measured by Accuracy and Macro F1-score. These findings suggest that lightweight TF-IDF models are a viable alternative to computationally intensive neural networks for small- and medium-sized corpora.
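The abstract does not state the exact feature configuration, so the following is only a minimal sketch of the described representation, assuming scikit-learn. The n-gram ranges, `sublinear_tf=True`, `n_components=2`, the helper name `build_features`, and the toy package descriptions are illustrative assumptions, not settings or data from the article. It concatenates word-level TF-IDF, character-level TF-IDF, and a dense truncated-SVD projection into one feature matrix.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline


def build_features(n_components: int = 2) -> FeatureUnion:
    """Hypothetical helper: word TF-IDF + char TF-IDF + compact SVD features.

    All parameter values here are illustrative assumptions.
    """
    # Word uni- and bigrams capture package-description vocabulary.
    word_tfidf = TfidfVectorizer(analyzer="word", ngram_range=(1, 2), sublinear_tf=True)
    # Character 3- to 5-grams inside word boundaries tolerate identifiers and typos.
    char_tfidf = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5), sublinear_tf=True)
    # Truncated SVD (LSA) over word TF-IDF yields dense low-dimensional features.
    svd_features = make_pipeline(
        TfidfVectorizer(analyzer="word"),
        TruncatedSVD(n_components=n_components, random_state=0),
    )
    # FeatureUnion stacks the three blocks column-wise into one matrix.
    return FeatureUnion(
        [("word", word_tfidf), ("char", char_tfidf), ("svd", svd_features)]
    )


# Toy descriptions standing in for a real corpus (not LocalDocs-10).
docs = [
    "fast json parser for python",
    "http client library with async support",
    "json schema validation toolkit",
    "async web server framework",
]
features = build_features(n_components=2)
X = features.fit_transform(docs)  # sparse matrix, one row per document
print(X.shape)
```

One design consequence worth noting: SVD components can be negative, so such combined features suit LinearSVC, while Complement Naive Bayes, which rejects negative inputs, would have to be fed only the non-negative TF-IDF blocks.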
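The abstract names the components of ALSHE-Gated (Complement Naive Bayes, LinearSVC, confidence-driven switching) but not the switching rule itself. The sketch below is therefore a hypothetical illustration of confidence-gated switching, assuming scikit-learn: LinearSVC answers by default, and when the gap between its two highest decision scores falls below an assumed `margin_threshold`, the sample is routed to ComplementNB. The class name `GatedNBSVC`, the margin-based confidence measure, the threshold value, and the toy data are all assumptions, not the article's method. Macro F1 is computed with `f1_score(..., average="macro")`.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.naive_bayes import ComplementNB
from sklearn.svm import LinearSVC


class GatedNBSVC:
    """Hypothetical confidence-gated switch between LinearSVC and ComplementNB.

    Assumed rule (not taken from the article): LinearSVC answers by default;
    if its top-1 minus top-2 decision score is below `margin_threshold`,
    the sample is routed to ComplementNB instead.
    """

    def __init__(self, margin_threshold: float = 0.2):
        self.margin_threshold = margin_threshold
        self.vectorizer = TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)
        self.nb = ComplementNB()
        self.svc = LinearSVC(dual=True, random_state=0)

    def fit(self, texts, labels):
        # TF-IDF values are non-negative, which ComplementNB requires.
        X = self.vectorizer.fit_transform(texts)
        self.nb.fit(X, labels)
        self.svc.fit(X, labels)
        return self

    def _svc_margin(self, X):
        """Assumed confidence of LinearSVC: gap between its two best scores."""
        scores = self.svc.decision_function(X)
        if scores.ndim == 1:  # binary problem: one signed score per sample
            return np.abs(scores)
        top2 = np.sort(scores, axis=1)[:, -2:]
        return top2[:, 1] - top2[:, 0]

    def predict(self, texts):
        X = self.vectorizer.transform(texts)
        use_nb = self._svc_margin(X) < self.margin_threshold
        # Low-confidence samples take the NB label, the rest keep the SVC label.
        return np.where(use_nb, self.nb.predict(X), self.svc.predict(X))


# Toy data with hypothetical labels, for illustration only.
train_texts = [
    "json parser serialization library",
    "fast json encoder decoder",
    "http client requests library",
    "async http server framework",
    "image resizing and thumbnail tool",
    "image format conversion library",
]
train_labels = ["data", "data", "web", "web", "media", "media"]
test_texts = ["json decoder", "http server", "image thumbnail"]
test_labels = ["data", "web", "media"]

model = GatedNBSVC(margin_threshold=0.2).fit(train_texts, train_labels)
pred = model.predict(test_texts)
macro_f1 = f1_score(test_labels, pred, average="macro")
print(pred, macro_f1)
```

Gating on the LinearSVC margin keeps the discriminative model in charge when it is confident and falls back to Naive Bayes only on ambiguous samples; in practice the threshold would be tuned on held-out data, for example against Macro F1.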