Journal of Engineering and Applied Sciences
Year: 2019 | Volume: 14 | Issue: 7 | Page No.: 2292-2301
DOI: 10.36478/jeasci.2019.2292.2301  
Feature Engineering for Arabic Text Classification
Ghassan Khazal and Alexander Zamyatin
Abstract: Arabic is one of the most complex languages: it has a rich vocabulary and a difficult structure that differs from those of other languages. Arabic poses many challenges for text mining, one of which is achieving high classification accuracy. In this research, we propose a feature-engineering approach that seeks the best combination of preprocessing procedures and feature representation, since this combination directly affects the classification accuracy of Arabic text. Preprocessing and feature representation are the main steps in any text classification framework, and this phase is critical when designing a classifier for such a sophisticated language. In this study, we used four classifiers: Support Vector Machine (SVM), Decision Tree (DT), Naive Bayes (NB) and K-Nearest Neighbor (KNN). Analysis and experimental results on Arabic text data reveal that preprocessing techniques, feature representation and feature weighting have an important influence on classification accuracy. Moreover, choosing a suitable combination of preprocessing tasks, feature representation and classification technique yields a noticeable improvement in classification accuracy. This study shows that, on average, SVM (82.6%) and KNN (78.33%) outperform DT (57.49%) and NB (76.21%). SVM achieved an accuracy of 88.67% when tokenization, filtering, normalization and light stemming were combined with TFIDF as the feature representation. KNN achieved 88.00% using tokenization and filtering as preprocessing, TFIDF as the feature representation and information gain for feature selection.
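The abstract names the pipeline stages but not how each one is implemented. Below is a minimal standard-library Python sketch of those stages: tokenization, filtering, normalization, light stemming, TFIDF weighting and information-gain feature selection. It rests on several assumptions that come from this sketch, not from the paper. The stopword list is a tiny hypothetical sample. The normalization rules are common choices for Arabic and are not the authors' confirmed rules: stripping diacritics and tatweel, unifying alef forms, and mapping ى to ي and ة to ه. The affix lists are a simplified approximation loosely modelled on Larkey's light10 stemmer. Normalization runs before stopword filtering. TFIDF is computed as raw term frequency times log(N/df). All function and constant names are hypothetical.

```python
import math
import re
from collections import Counter

# Arabic letters plus diacritics (U+0621..U+0652); any other character separates tokens.
TOKEN = re.compile("[\u0621-\u0652]+")
# Diacritics (fathatan..sukun, U+064B..U+0652) and tatweel (U+0640).
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")


def normalize(token):
    """Assumed normalization: strip diacritics/tatweel and unify letter variants."""
    token = DIACRITICS.sub("", token)
    token = re.sub("[\u0622\u0623\u0625]", "\u0627", token)  # alef variants -> bare alef
    token = token.replace("\u0649", "\u064A")  # alef maqsura -> ya
    token = token.replace("\u0629", "\u0647")  # ta marbuta -> ha
    return token


# Hypothetical, deliberately tiny stopword list, stored in normalized form.
STOPWORDS = {normalize(w) for w in ["في", "من", "على", "إلى", "عن", "هذا", "هذه", "التي", "الذي", "أن"]}

# Simplified affixes (in normalized form), loosely modelled on light10; longest prefixes first.
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ه", "ي")
MIN_STEM = 2  # never strip an affix if fewer than this many letters would remain


def light_stem(token):
    """Strip at most one prefix and one suffix (a simplification of light stemming)."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= MIN_STEM:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= MIN_STEM:
            token = token[: -len(s)]
            break
    return token


def preprocess(text):
    """Tokenize, normalize, drop stopwords and 1-letter tokens, then light-stem."""
    # Normalizing before filtering (an assumption) makes stopword matching ignore diacritics.
    tokens = [normalize(t) for t in TOKEN.findall(text)]
    tokens = [t for t in tokens if t not in STOPWORDS and len(t) > 1]
    return [light_stem(t) for t in tokens]


def tfidf(docs):
    """Weight each term of each tokenized document as raw tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def entropy(counts):
    """Shannon entropy (bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], based on term presence."""
    n = len(labels)
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    conditional = sum(len(part) / n * entropy(Counter(part).values())
                      for part in (present, absent) if part)
    return entropy(Counter(labels).values()) - conditional


def select_features(docs, labels, k):
    """Keep the k terms with the highest information gain (ties broken alphabetically)."""
    vocab = {t for d in docs for t in d}
    return sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))[:k]
```

For example, `preprocess("ذهب الطالب إلى المدرسة")` drops the stopword إلى and strips the article and the final ه, which gives `['ذهب', 'طالب', 'مدرس']`. The last stem shows how light stemming can conflate different words (مدرسة "school" and مدرس "teacher"). This illustrates why the choice of preprocessing combination can affect accuracy.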
How to cite this article:
Ghassan Khazal and Alexander Zamyatin, 2019. Feature Engineering for Arabic Text Classification. Journal of Engineering and Applied Sciences, 14: 2292-2301.