Electronic International Standard Serial Number (EISSN)
1873-6793
abstract
The explosive growth of uncategorized web documents requires effective methods for their organization. Clustering techniques address this challenge by automatically grouping similar documents, making large amounts of unstructured data more manageable. Recently, multi-objective optimization approaches have become an effective way to solve the document clustering problem. However, the scientific literature lacks studies that address this task with swarm-intelligence algorithms from a multi-objective optimization viewpoint. For this reason, a multi-objective swarm-intelligence algorithm for document clustering (MOSIDOC) has been designed, developed, and applied in this work. MOSIDOC combines swarm-intelligence mechanisms from the artificial bee colony algorithm with problem-aware operators to explore the search space accurately. Compactness, separation, and the Davies-Bouldin index have been formulated as the objective functions to be optimized. The experimentation has been carried out on ODP-239, one of the most widely used datasets for testing document clustering methods. To comprehensively evaluate the proposed approach, three evaluation metrics have been applied: cluster-level F1 measure, Adjusted Rand Index, and Normalized Mutual Information. The results indicate that MOSIDOC achieves average percentage improvements of up to 86.81% over fifteen other competing methods.
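To make the three objective functions concrete, the following is a minimal Python sketch of how compactness, separation, and the Davies-Bouldin index could be computed for one candidate clustering. It is an illustration under stated assumptions, not the formulation used in MOSIDOC: documents are assumed to be dense numeric vectors, distances are Euclidean, compactness is taken as the mean point-to-centroid distance, and separation as the minimum distance between centroids. The function names (`centroid`, `euclidean`, `clustering_objectives`) are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def euclidean(a, b):
    """Euclidean distance (an assumption; a cosine distance could be swapped in)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def clustering_objectives(vectors, labels):
    """Return (compactness, separation, davies_bouldin) for one clustering.

    Assumed definitions, for illustration only:
      compactness    = mean distance of each vector to its cluster centroid (lower is better)
      separation     = minimum distance between any two centroids (higher is better)
      davies_bouldin = standard Davies-Bouldin index (lower is better)
    Centroids are assumed to be distinct; coincident centroids would divide by zero.
    """
    # Group vectors by cluster label.
    clusters = {}
    for vec, lab in zip(vectors, labels):
        clusters.setdefault(lab, []).append(vec)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    cents = {lab: centroid(pts) for lab, pts in clusters.items()}

    # Per-cluster scatter: mean distance of members to their centroid.
    scatter = {
        lab: sum(euclidean(p, cents[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }

    compactness = sum(
        euclidean(vec, cents[lab]) for vec, lab in zip(vectors, labels)
    ) / len(vectors)

    separation = min(
        euclidean(cents[a], cents[b]) for a, b in combinations(cents, 2)
    )

    # Davies-Bouldin: average, over clusters, of the worst similarity ratio.
    davies_bouldin = sum(
        max(
            (scatter[a] + scatter[b]) / euclidean(cents[a], cents[b])
            for b in cents
            if b != a
        )
        for a in cents
    ) / len(cents)

    return compactness, separation, davies_bouldin


# Toy usage: two well-separated clusters of two 2-D "documents" each.
toy_vectors = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
toy_labels = [0, 0, 1, 1]
toy_compactness, toy_separation, toy_db = clustering_objectives(toy_vectors, toy_labels)
```

In a multi-objective setting these three values would be treated as a vector to be optimized jointly, minimizing compactness and the Davies-Bouldin index while maximizing separation, rather than being merged into a single score.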