Initializing k-means Clustering by Bootstrap and Data Depth

publication date

  • July 2021

start page

  • 232

end page

  • 256

issue

  • 38

International Standard Serial Number (ISSN)

  • 0176-4268

Electronic International Standard Serial Number (EISSN)

  • 1432-1343

abstract

  • The k-means algorithm is widely used in various research fields because of its fast convergence to the cost function minima; however, it frequently gets stuck in local optima because it is sensitive to initial conditions. This paper explores a simple, computationally feasible method that provides k-means with a set of initial seeds to cluster datasets of arbitrary dimensions. Our technique consists of two stages: first, we use the original data space to obtain a set of prototypes (cluster centers) by applying k-means to bootstrap replications of the data; second, we cluster the space of centers, which has tighter (and thus easier to separate) groups, and search for the deepest point in each assembled cluster using a depth notion. We test this method with simulated and real data, compare it with commonly used k-means initialization algorithms, and show that it is feasible and more efficient than previous proposals in many situations.
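
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`lloyd_kmeans`, `modified_band_depth`, `bootstrap_depth_seeds`), the number of bootstrap replications, the use of plain Lloyd iterations with random starting points, and the choice of modified band depth with bands formed by pairs of points (suggested only by the "mbd" keyword) are all assumptions made for illustration.

```python
import numpy as np

def lloyd_kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd iterations started from k randomly chosen rows (an assumption)."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the previous center if a cluster empties
                centers[j] = members.mean(axis=0)
    # Recompute labels so they match the final centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)

def modified_band_depth(P):
    """Modified band depth of each row of P, using bands made of two rows.

    Each row is read as a curve over its coordinates. For every coordinate,
    a pair of rows fails to enclose a value only when both lie strictly
    below it or both lie strictly above it.
    """
    n = len(P)
    if n < 2:
        return np.ones(n)
    below = (P[None, :, :] < P[:, None, :]).sum(axis=1)  # shape (n, d)
    above = (P[None, :, :] > P[:, None, :]).sum(axis=1)  # shape (n, d)
    pairs = n * (n - 1) / 2
    outside = below * (below - 1) / 2 + above * (above - 1) / 2
    return ((pairs - outside) / pairs).mean(axis=1)

def bootstrap_depth_seeds(X, k, n_boot=20, seed=0):
    """Return k initial seeds: the deepest pooled center in each group of centers."""
    rng = np.random.default_rng(seed)
    # Stage 1: run k-means on bootstrap replications and pool the centers.
    pooled = []
    for _ in range(n_boot):
        sample = X[rng.integers(0, len(X), size=len(X))]
        centers, _ = lloyd_kmeans(sample, k, rng)
        pooled.append(centers)
    pooled = np.vstack(pooled)
    # Stage 2: cluster the pooled centers and keep the deepest point per group.
    _, labels = lloyd_kmeans(pooled, k, rng)
    seeds = []
    for j in range(k):
        group = pooled[labels == j]
        if len(group) == 0:  # fallback chosen for this sketch only
            group = pooled
        seeds.append(group[modified_band_depth(group).argmax()])
    return np.array(seeds)
```

The returned array has one row per cluster and could be passed as the starting centers of any k-means routine that accepts explicit initial seeds. Because the deepest point is selected from the pooled centers themselves, every seed is a center actually produced by one of the bootstrap runs.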

subjects

  • Statistics

keywords

  • k-means algorithm; bootstrap; MBD data depth