Improving Clustering and Cluster Validation with Missing Data Using Distance Estimation Methods
Niemelä, M., & Kärkkäinen, T. (2022). Improving Clustering and Cluster Validation with Missing Data Using Distance Estimation Methods. In T. T. Tuovinen, J. Periaux, & P. Neittaanmäki (Eds.), Computational Sciences and Artificial Intelligence in Industry : New Digital Technologies for Solving Future Societal and Economical Challenges (pp. 123-133). Springer. Intelligent Systems, Control and Automation: Science and Engineering, 76. https://doi.org/10.1007/978-3-030-70787-3_9
Published in series
Intelligent Systems, Control and Automation: Science and Engineering
Date
2022
Copyright
© Springer Nature Switzerland AG 2022
Missing data pose a challenge in unsupervised learning. In clustering, where both the form and the number of clusters are to be determined, missing values must be handled both in the clustering process and in cluster validation. In previous research, the clustering step has been handled with robust clustering methods and the available data strategy, and cluster validation indices have been computed with the partial distance approximation. Recently, however, dedicated methods for distance estimation with missing values have been proposed, and this work is the first in which these methods are systematically applied and tested in clustering and cluster validation. More precisely, we propose, implement, and analyze the use of distance estimation methods to improve the discrimination power of clustering and cluster validation indices. A novel, robust, two-stage prototype-based clustering process is suggested. Our results confirm the usefulness of the distance estimation methods in clustering but, surprisingly, not in cluster validation.
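As a rough illustration of the partial distance approximation mentioned in the abstract, the sketch below uses a common textbook formulation: the squared Euclidean distance is summed over the coordinates observed in both vectors and rescaled by the ratio of the full dimension to the number of jointly observed coordinates. The function name `partial_distance` and the use of NaN to mark missing values are assumptions made for this example; the chapter itself may use a different variant.

```python
import math


def partial_distance(x, y):
    """Partial-distance estimate of the Euclidean distance between x and y.

    Missing entries are encoded as NaN (an assumption of this sketch).
    Only coordinates observed in both vectors contribute; the squared sum
    is rescaled by n / m, where n is the dimension and m is the number of
    jointly observed coordinates.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    shared = [(a, b) for a, b in zip(x, y)
              if not (math.isnan(a) or math.isnan(b))]
    if not shared:
        return math.nan  # no jointly observed coordinate: distance undefined
    squared = sum((a - b) ** 2 for a, b in shared)
    return math.sqrt(len(x) / len(shared) * squared)


nan = math.nan
x = [1.0, nan, 3.0, 4.0]
y = [1.0, 2.0, nan, 0.0]
# Only the first and last coordinates are observed in both vectors.
d = partial_distance(x, y)
```

With complete vectors the estimate reduces to the ordinary Euclidean distance, because the rescaling factor is then 1.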
Publisher
Springer
Parent publication ISBN
978-3-030-70786-6
Is part of publication
Computational Sciences and Artificial Intelligence in Industry : New Digital Technologies for Solving Future Societal and Economical Challenges
ISSN
2213-8986
Publication in research information system
https://converis.jyu.fi/converis/portal/detail/Publication/100292105
Funder(s)
Academy of Finland
Funding program(s)
Academy Programme, AoF; Research profiles, AoF
Additional information about funding
The authors would like to thank the Academy of Finland for the financial support (grants 311877 and 315550).
Related items
Showing items with similar title or keywords.
- Improvements and applications of the elements of prototype-based clustering
  Hämäläinen, Joonas (Jyväskylän yliopisto, 2018)
- Toolbox for Distance Estimation and Cluster Validation on Data With Missing Values
  Niemelä, Marko; Äyrämö, Sami; Kärkkäinen, Tommi (Institute of Electrical and Electronics Engineers (IEEE), 2022)
  Missing data are unavoidable in the real-world application of unsupervised machine learning, and their nonoptimal processing may decrease the quality of data-driven models. Imputation is a common remedy for missing values, ...
- On data mining applications in mobile networking and network security
  Zolotukhin, Mikhail (University of Jyväskylä, 2014)
- Improving Scalable K-Means++
  Hämäläinen, Joonas; Kärkkäinen, Tommi; Rossi, Tuomo (MDPI AG, 2021)
  Two new initialization methods for K-means clustering are proposed. Both proposals are based on applying a divide-and-conquer approach for the K-means‖ type of an initialization strategy. The second proposal also uses ...
- Scalable robust clustering method for large and sparse data
  Hämäläinen, Joonas; Kärkkäinen, Tommi; Rossi, Tuomo (ESANN, 2018)
  Datasets for unsupervised clustering can be large and sparse, with a significant portion of missing values. We present here a scalable version of a robust clustering method with the available data strategy. More precisely, a ...