Journal of Engineering and Applied Sciences

Year: 2019
Volume: 14
Issue: 14
Page No. 4780 - 4785

Extended-ATSD: Arabic Tweets Sentiment Dataset

Authors: Gehad S. Kaseb and Mona F. Ahmed

Abstract: Arabic Sentiment Analysis (SA) is one of the most active research fields, and many of its problems remain open. While SA in English has been studied extensively, work on Arabic suffers from a lack of publicly available datasets and lexicons. This study helps address that gap by presenting a new annotated dataset, the Arabic Tweets Sentiment Dataset (ATSD). The study first details the process of collecting data from Twitter for the Egyptian and Saudi dialects. The gathered Tweets are classified as objective, subjective positive, subjective negative or subjective neutral. The study also discusses the filtering, pre-processing and annotation of the Arabic text in order to build a large Arabic sentiment analysis dataset. Determining the sentiment expressed in a Tweet is not an easy task and depends on the subjective judgment of human annotators, so an analysis is carried out to determine the best number of raters for annotating new datasets. The study then presents some modifications to a popular earlier dataset, the Arabic Sentiment Tweets Dataset (ASTD). It also combines both datasets into a single dataset called Extended-ATSD. A detailed discussion of the full process applied to the three datasets is presented. All the datasets built in this research (ATSD, Mini-ASTD and Extended-ATSD) are publicly available for academic use.

How to cite this article:

Gehad S. Kaseb and Mona F. Ahmed, 2019. Extended-ATSD: Arabic Tweets Sentiment Dataset. Journal of Engineering and Applied Sciences, 14: 4780-4785.
