Distance based k-means clustering algorithm for determining number of clusters for high dimensional data

Mohamed Cassim Alibuhtto; Nor Idayu Mahat

Distance based k-means clustering algorithm for determining number of clusters for high dimensional data

Files

Distance based k-means.pdf (913.37 KB)

Date

2020-01

Authors

Mohamed Cassim Alibuhtto

Nor Idayu Mahat

Publisher

Growing Science

Abstract

Clustering is one of the most common unsupervised data mining classification techniques for splitting objects into a set of meaningful groups. However, the traditional k-means algorithm is not applicable to retrieve useful information / clusters, particularly when there is an overwhelming growth of multidimensional data. Therefore, it is necessary to introduce a new strategy to determine the optimal number of clusters. To improve the clustering task on high dimensional data sets, the distance based k-means algorithm is proposed. The proposed algorithm is tested using eighteen sets of normal and non-normal multivariate simulation data under various combinations. Evidence gathered from the simulation reveal that the proposed algorithm is capable of identifying the exact number of clusters.

Keywords

Clustering High Dimensional Data K-means algorithm Optimal Cluster Simulation

Citation

Alibuhtto, M.C., & Mahat, N.I.(2020).Distance based k-means clustering algorithm for determining number of clusters for high dimensional data. Decision science letters. 9(1), 51-58.