International Journal of Soft Computing

Year: 2009
Volume: 4
Issue: 4
Page No. 168 - 172

Impact of Normalization in Distributed K-Means Clustering

Authors: N. Karthikeyani Visalakshi and K. Thangavel

Abstract: Distributed clustering is an emerging research area within the broader field of knowledge discovery in databases. Normalization is an essential preprocessing step in data mining that maps the values of attributes (features) with different dynamic ranges into a specified range. In this study, the distributed K-Means clustering algorithm is extended by applying global normalization before clustering the distributed datasets, without necessarily downloading all the data to a single site. The performance of the proposed normalization-based distributed K-Means clustering algorithm is compared against the distributed K-Means clustering algorithm and the normalization-based centralized K-Means clustering algorithm. Clustering quality is also compared across three normalization procedures, namely Min-max, Z-score and decimal scaling, for the proposed distributed clustering algorithm. The comparative analysis shows that the distributed clustering results depend on the type of normalization procedure. The experiments are carried out on various numerical datasets from the UCI machine learning data repository.
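The abstract does not spell out how global normalization is computed across sites, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes each site sends a small per-attribute summary (count, min, max, sum, sum of squares, maximum absolute value) to a coordinator, which merges the summaries into global statistics. Each site then normalizes its own rows locally with Min-max, Z-score or decimal scaling. All function names, the summary format, and the use of the population standard deviation are hypothetical choices made for illustration.

```python
import math

def local_summary(rows):
    """Per-site summary (assumed format): only these aggregates leave the site."""
    cols = list(zip(*rows))
    return {
        "n": len(rows),
        "min": [min(c) for c in cols],
        "max": [max(c) for c in cols],
        "sum": [sum(c) for c in cols],
        "sumsq": [sum(v * v for v in c) for c in cols],
        "maxabs": [max(abs(v) for v in c) for c in cols],
    }

def merge_summaries(summaries):
    """Combine site summaries into global per-attribute statistics."""
    n = sum(s["n"] for s in summaries)
    d = len(summaries[0]["min"])
    mean = [sum(s["sum"][j] for s in summaries) / n for j in range(d)]
    # Population variance via E[x^2] - mean^2 (assumption; clamped at 0 for rounding).
    var = [max(sum(s["sumsq"][j] for s in summaries) / n - mean[j] ** 2, 0.0)
           for j in range(d)]
    return {
        "min": [min(s["min"][j] for s in summaries) for j in range(d)],
        "max": [max(s["max"][j] for s in summaries) for j in range(d)],
        "mean": mean,
        "std": [math.sqrt(v) for v in var],
        "maxabs": [max(s["maxabs"][j] for s in summaries) for j in range(d)],
    }

def min_max(rows, g, lo=0.0, hi=1.0):
    """Map each attribute linearly from [global min, global max] to [lo, hi]."""
    return [[lo + (v - g["min"][j]) * (hi - lo) / (g["max"][j] - g["min"][j])
             if g["max"][j] > g["min"][j] else lo
             for j, v in enumerate(r)] for r in rows]

def z_score(rows, g):
    """Subtract the global mean and divide by the global standard deviation."""
    return [[(v - g["mean"][j]) / g["std"][j] if g["std"][j] > 0 else 0.0
             for j, v in enumerate(r)] for r in rows]

def decimal_scaling(rows, g):
    """Divide by the smallest power of 10 that makes every |value| < 1."""
    powers = []
    for m in g["maxabs"]:
        k = 0
        while m >= 10 ** k:
            k += 1
        powers.append(10 ** k)
    return [[v / powers[j] for j, v in enumerate(r)] for r in rows]

# Example with two hypothetical sites; the raw rows never leave their site.
site_a = [[1.0, 200.0], [3.0, 400.0]]
site_b = [[5.0, 600.0]]
global_stats = merge_summaries([local_summary(site_a), local_summary(site_b)])
site_a_scaled = min_max(site_a, global_stats)
site_b_scaled = min_max(site_b, global_stats)
```

Because every site applies the same global statistics, the normalized values match what a centralized normalization of the pooled data would produce, while only the aggregates are exchanged.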

How to cite this article:

N. Karthikeyani Visalakshi and K. Thangavel, 2009. Impact of Normalization in Distributed K-Means Clustering. International Journal of Soft Computing, 4: 168-172.
