Initializing k-means Clustering by Bootstrap and Data Depth (Article)
Overview
published in
- Journal of Classification
publication date
- July 2021
start page
- 232
end page
- 256
volume
- 38
Digital Object Identifier (DOI)
International Standard Serial Number (ISSN)
- 0176-4268
Electronic International Standard Serial Number (EISSN)
- 1432-1343
abstract
- The k-means algorithm is widely used in various research fields because of its fast convergence to the cost function minima; however, it frequently gets stuck in local optima as it is sensitive to initial conditions. This paper explores a simple, computationally feasible method, which provides k-means with a set of initial seeds to cluster datasets of arbitrary dimensions. Our technique consists of two stages: firstly, we use the original data space to obtain a set of prototypes (cluster centers) by applying k-means to bootstrap replications of the data and, secondly, we cluster the space of centers, which has tighter (thus easier to separate) groups, and search for the deepest point in each assembled cluster using a depth notion. We test this method with simulated and real data, compare it with commonly used k-means initialization algorithms, and show that it is feasible and more efficient than previous proposals in many situations.
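The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`lloyd`, `mbd`, `bootstrap_depth_seeds`), the default number of bootstrap replicates, the random seeding of the inner k-means runs, and the choice of a modified band depth built from two-point bands and applied coordinate-wise are all hypothetical choices made for this example.

```python
import random
from itertools import combinations

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def assign(points, centers):
    # Label each point with the index of its nearest center.
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def lloyd(points, seeds, iters=50):
    """Plain Lloyd k-means from the given seeds; returns (centers, labels)."""
    centers = list(seeds)
    for _ in range(iters):
        labels = assign(points, centers)
        new = []
        for j, c in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == j]
            new.append(mean(members) if members else c)  # empty cluster: keep center
        if new == centers:
            break
        centers = new
    return centers, assign(points, centers)

def mbd(x, sample):
    """Assumed depth: average, over all pairs in `sample`, of the share of
    coordinates of x lying between the two points of the pair."""
    pairs = list(combinations(sample, 2))
    if not pairs:
        return 1.0
    d = len(x)
    total = 0.0
    for a, b in pairs:
        total += sum(min(u, v) <= t <= max(u, v) for t, u, v in zip(x, a, b)) / d
    return total / len(pairs)

def bootstrap_depth_seeds(points, k, n_boot=20, rng=None):
    """Return up to k seeds for k-means (fewer if a stage-2 cluster is empty)."""
    rng = rng or random.Random(0)
    n = len(points)
    pool = []
    # Stage 1: run k-means on bootstrap replicates and pool the centers.
    for _ in range(n_boot):
        replicate = [points[rng.randrange(n)] for _ in range(n)]
        start = rng.sample(sorted(set(replicate)), k)  # distinct starting points
        centers, _ = lloyd(replicate, start)
        pool.extend(centers)
    # Stage 2: cluster the pooled centers, then keep the deepest one per group.
    _, labels = lloyd(pool, rng.sample(sorted(set(pool)), k))
    seeds = []
    for j in range(k):
        group = [c for c, l in zip(pool, labels) if l == j]
        if group:
            seeds.append(max(group, key=lambda c: mbd(c, group)))
    return seeds
```

Points are expected as tuples of floats. The returned seeds could then be passed to any k-means routine that accepts explicit initial centers, for example `lloyd(points, seeds)` above.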
Classification
subjects
- Statistics
keywords
- k-means algorithm; bootstrap; MBD data depth