Improving Scalable K-Means++

Hämäläinen, Joonas; Kärkkäinen, Tommi; Rossi, Tuomo

doi:10.3390/a14010006

dc.contributor.author	Hämäläinen, Joonas
dc.contributor.author	Kärkkäinen, Tommi
dc.contributor.author	Rossi, Tuomo
dc.date.accessioned	2021-01-14T14:02:20Z
dc.date.available	2021-01-14T14:02:20Z
dc.date.issued	2021
dc.identifier.citation	Hämäläinen, J., Kärkkäinen, T., & Rossi, T. (2021). Improving Scalable K-Means++. Algorithms, 14(1), Article 6. https://doi.org/10.3390/a14010006
dc.identifier.other	CONVID_47636982
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/73628
dc.description.abstract	Two new initialization methods for K-means clustering are proposed. Both apply a divide-and-conquer approach to the K-means‖ type of initialization strategy. The second proposal also uses multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, the proposed methods are compared with the K-means++ and K-means‖ methods on an extensive set of reference and synthetic large-scale datasets. For the synthetic datasets, a novel high-dimensional clustering data generation algorithm is given. The experiments show that the proposed methods compare favorably to the state of the art by improving clustering accuracy and the speed of convergence. We also observe that the currently most popular K-means++ initialization behaves like random initialization in very high-dimensional cases.	en
dc.format.mimetype	application/pdf
dc.language	eng
dc.language.iso	eng
dc.publisher	MDPI AG
dc.relation.ispartofseries	Algorithms
dc.rights	CC BY 4.0
dc.subject.other	clustering initialization
dc.subject.other	K-means‖
dc.subject.other	K-means++
dc.subject.other	random projection
dc.title	Improving Scalable K-Means++
dc.type	article
dc.identifier.urn	URN:NBN:fi:jyu-202101141104
dc.contributor.laitos	Informaatioteknologian tiedekunta	fi
dc.contributor.laitos	Faculty of Information Technology	en
dc.type.uri	http://purl.org/eprint/type/JournalArticle
dc.type.coar	http://purl.org/coar/resource_type/c_2df8fbb1
dc.description.reviewstatus	peerReviewed
dc.relation.issn	1999-4893
dc.relation.numberinseries	1
dc.relation.volume	14
dc.type.version	publishedVersion
dc.rights.copyright	© 2020 by the authors. Licensee MDPI, Basel, Switzerland
dc.rights.accesslevel	openAccess	fi
dc.relation.grantnumber	315550
dc.relation.grantnumber	311877
dc.subject.yso	algorithms
dc.subject.yso	data mining
dc.subject.yso	cluster analysis
dc.subject.yso	algorithmics
dc.format.content	fulltext
jyx.subject.uri	http://www.yso.fi/onto/yso/p14524
jyx.subject.uri	http://www.yso.fi/onto/yso/p5520
jyx.subject.uri	http://www.yso.fi/onto/yso/p27558
jyx.subject.uri	http://www.yso.fi/onto/yso/p3365
dc.rights.url	https://creativecommons.org/licenses/by/4.0/
dc.relation.doi	10.3390/a14010006
dc.relation.funder	Research Council of Finland	en
dc.relation.funder	Research Council of Finland	en
dc.relation.funder	Suomen Akatemia	fi
dc.relation.funder	Suomen Akatemia	fi
jyx.fundingprogram	Academy Programme, AoF	en
jyx.fundingprogram	Research profiles, AoF	en
jyx.fundingprogram	Akatemiaohjelma, SA	fi
jyx.fundingprogram	Profilointi, SA	fi
jyx.fundinginformation	This work was supported by the Academy of Finland through projects 311877 (Demo) and 315550 (HNP-AI).
dc.type.okm	A1
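
The abstract in this record uses K-means++ initialization as a baseline. As a rough illustration of that baseline only (not of the two methods proposed in the article), the standard seeding rule picks the first center uniformly at random and each later center with probability proportional to its squared distance from the nearest center chosen so far. The sketch below assumes points are given as tuples of floats; the names `sq_dist`, `kmeans_pp_init`, and `seed` are hypothetical and do not come from the article.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=None):
    """Choose k initial centers by K-means++ style D^2-weighted sampling.

    Illustrative sketch only; names and interface are assumptions.
    """
    rng = random.Random(seed)
    # First center: drawn uniformly at random from the data.
    centers = [rng.choice(points)]
    # Squared distance from each point to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center: fall back to uniform.
            nxt = rng.choice(points)
        else:
            # Sample the next center with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Update nearest-center distances with the new center.
        d2 = [min(d, sq_dist(p, nxt)) for d, p in zip(d2, points)]
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 9.9)]
    print(kmeans_pp_init(data, 2, seed=0))
```

Because a point that already is a center has weight zero, well-separated groups tend to receive one center each. The abstract's remark about very high-dimensional data concerns the case where these squared distances become nearly equal, so the weighted draw approaches a uniform one.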