Improving Scalable K-Means++
Hämäläinen, J., Kärkkäinen, T., & Rossi, T. (2021). Improving Scalable K-Means++. Algorithms, 14(1), Article 6. https://doi.org/10.3390/a14010006
Published in
Algorithms
Date
2021
Copyright
© 2020 by the authors. Licensee MDPI, Basel, Switzerland
Two new initialization methods for K-means clustering are proposed. Both proposals are based on applying a divide-and-conquer approach to a K-means‖-type initialization strategy. The second proposal also uses multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, the proposed methods are compared with the K-means++ and K-means‖ methods using an extensive set of reference and synthetic large-scale datasets. Concerning the latter, a novel algorithm for generating high-dimensional clustering data is given. The experiments show that the proposed methods compare favorably with the state of the art, improving both clustering accuracy and the speed of convergence. We also observe that the currently most popular initialization, K-means++, behaves like random initialization in very high-dimensional cases.
...
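The proposals build on the K-means‖ (scalable K-means++) initialization of Bahmani et al. (2012). As a rough orientation only, the sketch below shows that baseline scheme, not the divide-and-conquer or random-projection variants proposed in the paper: a few rounds of oversampling candidate centers in proportion to their squared distance from the current candidates, then weighting each candidate by the number of points nearest to it and reclustering the weighted candidates down to k centers. The function names, the oversampling factor `ell = 2k`, the fixed number of `rounds`, and the use of weighted K-means++ for the reclustering step are illustrative assumptions. The dense distance computation is memory-hungry and meant for small examples only.

```python
import numpy as np

def sq_dists_to_nearest(X, C):
    # Squared Euclidean distance from each row of X to its nearest row of C,
    # plus the index of that nearest row.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1), d2.argmin(axis=1)

def weighted_kmeanspp(P, w, k, rng):
    # K-means++ seeding on candidate points P with nonnegative weights w.
    centers = [P[rng.choice(len(P), p=w / w.sum())]]
    for _ in range(1, k):
        d2, _ = sq_dists_to_nearest(P, np.array(centers))
        probs = w * d2
        if probs.sum() == 0:  # all mass already sits on chosen centers
            probs = w.astype(float)
        centers.append(P[rng.choice(len(P), p=probs / probs.sum())])
    return np.array(centers)

def kmeans_parallel_init(X, k, ell=None, rounds=5, seed=0):
    # Hypothetical helper: K-means‖-style seeding (baseline, not the paper's method).
    rng = np.random.default_rng(seed)
    ell = 2 * k if ell is None else ell  # oversampling factor (assumed default)
    C = X[rng.integers(len(X))][None, :]  # start from one uniformly random point
    for _ in range(rounds):
        d2, _ = sq_dists_to_nearest(X, C)
        phi = d2.sum()  # current clustering cost
        if phi == 0:
            break
        # Each point is sampled independently with prob min(1, ell * d2 / phi).
        mask = rng.random(len(X)) < np.minimum(1.0, ell * d2 / phi)
        C = np.vstack([C, X[mask]])
    # Weight each candidate by the number of points closest to it.
    _, nearest = sq_dists_to_nearest(X, C)
    w = np.bincount(nearest, minlength=len(C)).astype(float)
    # Recluster the weighted candidates into k initial centers.
    return weighted_kmeanspp(C, w, k, rng)
```

Each sampling round is a single pass over the data that can be split across workers, which is what makes this family of initializations attractive for large-scale problems; only the much smaller candidate set is reclustered sequentially.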
Publisher
MDPI AG
ISSN
1999-4893
Publication in the research information system
https://converis.jyu.fi/converis/portal/detail/Publication/47636982
Funder(s)
Academy of Finland
Funding program(s)
Academy Programme, AoF; Profiling, AoF
Additional information about funding
The work has been supported by the Academy of Finland from the projects 311877 (Demo) and 315550 (HNP-AI).
Similar items
Showing items with a similar title or keywords.
- Scalable implementation of dependence clustering in Apache Spark
  Ivannikova, Elena (IEEE, 2017). This article proposes a scalable version of the Dependence Clustering algorithm which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. Moreover, ...
- Improving Clustering and Cluster Validation with Missing Data Using Distance Estimation Methods
  Niemelä, Marko; Kärkkäinen, Tommi (Springer, 2022). Missing data introduces a challenge in the field of unsupervised learning. In clustering, when the form and the number of clusters are to be determined, one needs to deal with the missing values both in the clustering ...
- Improvements and applications of the elements of prototype-based clustering
  Hämäläinen, Joonas (Jyväskylän yliopisto, 2018)
- Scalable robust clustering method for large and sparse data
  Hämäläinen, Joonas; Kärkkäinen, Tommi; Rossi, Tuomo (ESANN, 2018). Datasets for unsupervised clustering can be large and sparse, with significant portion of missing values. We present here a scalable version of a robust clustering method with the available data strategy. More precisely, a ...
- Taming big knowledge evolution
  Cochez, Michael (University of Jyväskylä, 2016). Information and its derived knowledge are not static. Instead, information is changing over time and our understanding of it evolves with our ability and willingness to consume the information. When compared to humans, ...
Unless otherwise stated, publicly available JYX metadata (excluding abstracts) may be freely reused under the CC0 license.