Journal of Engineering and Applied Sciences

Year: 2014
Volume: 9
Issue: 10
Page No. 441 - 446

A Study on How to Improve the Performance of k-mean Data Mining Algorithm in a Parallel Environment

Authors: R.P.T.H. Gunasekara, M.C. Wijegunasekara and N.G.J. Dias

Abstract: The k-mean algorithm is a widely used clustering algorithm for large datasets. However, it has limitations when applied to very large datasets. This study was carried out to enhance the performance of the k-mean data-mining algorithm by using parallel programming methodologies. Two approaches were mainly compared: k-mean clustering under parallel and non-parallel execution in WEKA, and k-mean clustering in a purpose-built program that uses the Message Passing Interface (MPI) to parallelize the algorithm. First, the cluster-building ability of WEKA parallel was compared with that of non-parallel WEKA for very large datasets. To assess the effect of parallelization, the number of machines connected to WEKA parallel was varied, and the performance of the k-mean algorithm was analyzed for several values of k in each setup. The experiment was done on three real electricity consumption datasets consisting of 80,000, 50,000 and 30,000 data entries, each with 65 attributes. WEKA parallel showed a significant improvement in performance. Furthermore, WEKA parallel can be applied to very large datasets that failed to run in non-parallel WEKA. Second, the k-mean algorithm was implemented in the C programming language and its performance was compared with that of non-parallel WEKA. In that comparison, the time taken to build clusters was almost the same for small datasets.

How to cite this article:

R.P.T.H. Gunasekara, M.C. Wijegunasekara and N.G.J. Dias, 2014. A Study on How to Improve the Performance of k-mean Data Mining Algorithm in a Parallel Environment. Journal of Engineering and Applied Sciences, 9: 441-446.
