Journal of Engineering and Applied Sciences

Year: 2019
Volume: 14
Issue: 7
Page No. 2055 - 2094

English Sentiment Classification using A Hamann Coefficient and a Genetic Algorithm with a Roulette-Wheel Selection in a Parallel Network Environment

Authors : Vo Ngoc Phu and Vo Thi Ngoc Tran

Abstract: We have studied data mining and natural language processing for many years, and there are many significant relationships between the two fields. Sentiment classification has made crucial contributions to many areas of everyday life, such as political activities, commodity production and commercial activities. We propose a new model for sentiment classification that uses a Hamann Coefficient (HC) and a Genetic Algorithm (GA) with a Fitness Function (FF) which is a Roulette-Wheel Selection (RWS). The model can be applied to big data: because the GA operates on bit arrays, it does not need a large amount of storage space to hold a big data set. Firstly, we create the sentiment lexicons of our basis English Sentiment Dictionary (bESD) by using the HC through a Google search engine with the AND operator and the OR operator. Next, according to the sentiment lexicons of the bESD, we encode the 7,000,000 English sentences of our training data set, comprising 3,500,000 negative and 3,500,000 positive sentences, into bit arrays in a small storage space. We also encode all sentences of the 9,000,000 English documents of our testing data set, comprising 4,500,000 positive and 4,500,000 negative documents, into bit arrays in the small storage space. We use the GA with the RWS to cluster one bit array (corresponding to one sentence) of one document of the testing data set into either the bit arrays of the negative sentences or the bit arrays of the positive sentences of the training data set. The sentiment classification of one document is based on the results of the sentiment classification of the sentences of that document. We tested the proposed model in both a sequential environment and a distributed network system. We achieved 88.02% accuracy on the testing data set. The execution time of the model in the parallel network environment is shorter than the execution time of the model in the sequential system. The results of this study can be widely used in applications and research on English sentiment classification.
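
The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard textbook form of the Hamann coefficient over a 2x2 co-occurrence table, H = ((a + d) - (b + c)) / (a + b + c + d), and the conventional reading of roulette-wheel selection as an operator that picks an individual with probability proportional to its fitness. The function names, the parameter names and the way the counts a, b, c, d would be obtained (for example from search-engine hit counts) are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def hamann_coefficient(a, b, c, d):
    """Hamann coefficient of a 2x2 co-occurrence table.

    a: both terms present, b: only the first term present,
    c: only the second term present, d: neither term present.
    The result lies in [-1, 1]. How the counts are measured is an
    assumption left to the caller.
    """
    total = a + b + c + d
    if total == 0:
        raise ValueError("at least one count must be non-zero")
    return ((a + d) - (b + c)) / total


def roulette_wheel_select(fitnesses, rng=random):
    """Return an index drawn with probability proportional to its fitness.

    Fitness values must be non-negative and must not all be zero.
    """
    if not fitnesses or any(f < 0 for f in fitnesses):
        raise ValueError("fitnesses must be non-empty and non-negative")
    cumulative = list(accumulate(fitnesses))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("at least one fitness must be positive")
    # Spin the wheel: a uniform point in [0, total) falls into one slice.
    pick = rng.random() * total
    # The min() guards against floating-point rounding at the upper edge.
    return min(bisect_right(cumulative, pick), len(fitnesses) - 1)
```

For example, hamann_coefficient(3, 1, 1, 5) evaluates ((3 + 5) - (1 + 1)) / 10, and roulette_wheel_select([1, 0, 3]) never returns index 1 because that individual has zero fitness.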

How to cite this article:

Vo Ngoc Phu and Vo Thi Ngoc Tran, 2019. English Sentiment Classification using A Hamann Coefficient and a Genetic Algorithm with a Roulette-Wheel Selection in a Parallel Network Environment. Journal of Engineering and Applied Sciences, 14: 2055-2094.
