dc.contributor.author | Hämäläinen, Joonas | |
dc.contributor.author | Kärkkäinen, Tommi | |
dc.contributor.author | Rossi, Tuomo | |
dc.date.accessioned | 2021-01-14T14:02:20Z | |
dc.date.available | 2021-01-14T14:02:20Z | |
dc.date.issued | 2021 | |
dc.identifier.citation | Hämäläinen, J., Kärkkäinen, T., & Rossi, T. (2021). Improving Scalable K-Means++. Algorithms, 14(1), Article 6. https://doi.org/10.3390/a14010006 | |
dc.identifier.other | CONVID_47636982 | |
dc.identifier.uri | https://jyx.jyu.fi/handle/123456789/73628 | |
dc.description.abstract | Two new initialization methods for K-means clustering are proposed. Both apply a divide-and-conquer approach to the K-means‖ type of initialization strategy. The second proposal also uses multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, the proposed methods are compared with the K-means++ and K-means‖ methods using an extensive set of reference and synthetic large-scale datasets. For the synthetic datasets, a novel high-dimensional clustering data generation algorithm is given. The experiments show that the proposed methods compare favorably to the state of the art by improving clustering accuracy and the speed of convergence. We also observe that the currently most popular K-means++ initialization behaves like random initialization in very high-dimensional cases. | en |
dc.format.mimetype | application/pdf | |
dc.language | eng | |
dc.language.iso | eng | |
dc.publisher | MDPI AG | |
dc.relation.ispartofseries | Algorithms | |
dc.rights | CC BY 4.0 | |
dc.subject.other | clustering initialization | |
dc.subject.other | K-means‖ | |
dc.subject.other | K-means++ | |
dc.subject.other | random projection | |
dc.title | Improving Scalable K-Means++ | |
dc.type | article | |
dc.identifier.urn | URN:NBN:fi:jyu-202101141104 | |
dc.contributor.laitos | Informaatioteknologian tiedekunta | fi |
dc.contributor.laitos | Faculty of Information Technology | en |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | |
dc.type.coar | http://purl.org/coar/resource_type/c_2df8fbb1 | |
dc.description.reviewstatus | peerReviewed | |
dc.relation.issn | 1999-4893 | |
dc.relation.numberinseries | 1 | |
dc.relation.volume | 14 | |
dc.type.version | publishedVersion | |
dc.rights.copyright | © 2020 by the authors. Licensee MDPI, Basel, Switzerland | |
dc.rights.accesslevel | openAccess | fi |
dc.relation.grantnumber | 315550 | |
dc.relation.grantnumber | 311877 | |
dc.subject.yso | algorithms | |
dc.subject.yso | data mining | |
dc.subject.yso | cluster analysis | |
dc.subject.yso | algorithmics | |
dc.format.content | fulltext | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p14524 | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p5520 | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p27558 | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p3365 | |
dc.rights.url | https://creativecommons.org/licenses/by/4.0/ | |
dc.relation.doi | 10.3390/a14010006 | |
dc.relation.funder | Research Council of Finland | en |
dc.relation.funder | Research Council of Finland | en |
dc.relation.funder | Suomen Akatemia | fi |
dc.relation.funder | Suomen Akatemia | fi |
jyx.fundingprogram | Academy Programme, AoF | en |
jyx.fundingprogram | Research profiles, AoF | en |
jyx.fundingprogram | Akatemiaohjelma, SA | fi |
jyx.fundingprogram | Profilointi, SA | fi |
jyx.fundinginformation | The work has been supported by the Academy of Finland through projects 311877 (Demo) and 315550 (HNP-AI). | |
dc.type.okm | A1 | |