Journal of Engineering and Applied Sciences

Year: 2018
Volume: 13
Issue: 17
Page No. 7329 - 7340

A Large-Scale Arabic Sentiment Corpus Construction Using Online News Media

Authors: Ahmed Nasser and Hayri Sever

Abstract: In computer-based technologies, both the use of collected data and its volume are continuously on the rise. The processing and computational requirements of this ever-growing big data introduce new challenges, especially for Natural Language Processing (NLP) applications. One of these challenges is maintaining massive, information-rich linguistic resources, such as large-scale text corpora, that meet the requirements of big data handling, processing and analysis for NLP applications. In this research, we present a large-scale sentiment corpus for the Arabic language, called GLASC, which is built using online news articles and metadata shared by the big data resource GDELT. The GLASC corpus consists of 620,082 news articles organized into three categories (positive, negative and neutral). In addition, each news article in the corpus has a sentiment rating score in the range between -1 and 1. We have also carried out experiments on the corpus, using machine learning algorithms to build a sentiment classifier for document-level Arabic sentiment analysis. To train the sentiment classifier, we generated different datasets from the corpus using different feature extraction and feature weighting methods. We performed a comparative study that tests a wide range of classifiers commonly used for the sentiment analysis task, and, through comprehensive empirical experiments, we also investigated several types of ensemble learning methods to verify their effect on improving sentiment classification performance.
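The abstract does not name the specific feature weighting schemes or ensemble methods used. As a minimal sketch only, assuming TF-IDF as one common feature-weighting scheme and plain majority voting as one simple ensemble method, the two ideas could look like the following. The function names `tf_idf` and `majority_vote` and the English placeholder tokens are hypothetical and are not taken from the paper; real input would be tokenized Arabic news text.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight each token of each tokenized document by TF-IDF (hypothetical helper).

    tf  = count of the term in the document / document length
    idf = log(N / df), where df is the number of documents containing the term
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        weighted.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weighted


def majority_vote(predictions):
    """Combine the labels that several classifiers predict for one document.

    Ties go to the label encountered first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    return Counter(predictions).most_common(1)[0][0]


# Usage with placeholder tokens (assumed, not from the corpus):
docs = [["good", "very"], ["bad", "very"], ["news", "report"]]
weights = tf_idf(docs)
# "good" occurs in one document and "very" in two, so "good" gets the
# larger weight in docs[0].
label = majority_vote(["positive", "negative", "positive"])
```

Rare terms receive higher weights than terms spread across many documents, and the vote simply returns the most frequent label among the base classifiers; weighted voting, bagging or boosting are other ensemble options the study may have used instead.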

How to cite this article:

Ahmed Nasser and Hayri Sever, 2018. A Large-Scale Arabic Sentiment Corpus Construction Using Online News Media. Journal of Engineering and Applied Sciences, 13: 7329-7340.
