Improving Clustering and Cluster Validation with Missing Data Using Distance Estimation Methods
Niemelä, M., & Kärkkäinen, T. (2022). Improving Clustering and Cluster Validation with Missing Data Using Distance Estimation Methods. In T. T. Tuovinen, J. Periaux, & P. Neittaanmäki (Eds.), Computational Sciences and Artificial Intelligence in Industry : New Digital Technologies for Solving Future Societal and Economical Challenges (pp. 123-133). Springer. Intelligent Systems, Control and Automation: Science and Engineering, 76. https://doi.org/10.1007/978-3-030-70787-3_9
Date
2022Copyright
© Springer Nature Switzerland AG 2022
Missing data introduces a challenge in the field of unsupervised learning. In clustering, when the form and the number of clusters are to be determined, one needs to deal with the missing values both in the clustering process and in the cluster validation. In the previous research, the clustering algorithm has been treated using robust clustering methods and available data strategy, and the cluster validation indices have been computed with the partial distance approximation. However, lately special methods for distance estimation with missing values have been proposed and this work is the first one where these methods are systematically applied and tested in clustering and cluster validation. More precisely, we propose, implement, and analyze the use of distance estimation methods to improve the discrimination power of clustering and cluster validation indices. A novel, robust prototype-based clustering process in two stages is suggested. Our results and conclusions confirm the usefulness of the distance estimation methods in clustering but, surprisingly, not in cluster validation.
...
Publisher
SpringerParent publication ISBN
978-3-030-70786-6Is part of publication
Computational Sciences and Artificial Intelligence in Industry : New Digital Technologies for Solving Future Societal and Economical ChallengesISSN Search the Publication Forum
2213-8986Keywords
Publication in research information system
https://converis.jyu.fi/converis/portal/detail/Publication/100292105
Metadata
Show full item recordCollections
Related funder(s)
Research Council of FinlandFunding program(s)
Academy Programme, AoF; Research profiles, AoFAdditional information about funding
The authors would like to thank the Academy of Finland for the financial support (grants 311877 and 315550).License
Related items
Showing items with similar title or keywords.
-
Improvements and applications of the elements of prototype-based clustering
Hämäläinen, Joonas (Jyväskylän yliopisto, 2018) -
Toolbox for Distance Estimation and Cluster Validation on Data With Missing Values
Niemelä, Marko; Äyrämö, Sami; Kärkkäinen, Tommi (Institute of Electrical and Electronics Engineers (IEEE), 2022)Missing data are unavoidable in the real-world application of unsupervised machine learning, and their nonoptimal processing may decrease the quality of data-driven models. Imputation is a common remedy for missing values, ... -
On data mining applications in mobile networking and network security
Zolotukhin, Mikhail (University of Jyväskylä, 2014) -
Improving Scalable K-Means++
Hämäläinen, Joonas; Kärkkäinen, Tommi; Rossi, Tuomo (MDPI AG, 2021)Two new initialization methods for K-means clustering are proposed. Both proposals are based on applying a divide-and-conquer approach for the K-means‖ type of an initialization strategy. The second proposal also uses ... -
Scalable robust clustering method for large and sparse data
Hämäläinen, Joonas; Kärkkäinen, Tommi; Rossi, Tuomo (ESANN, 2018)Datasets for unsupervised clustering can be large and sparse, with significant portion of missing values. We present here a scalable version of a robust clustering method with the available data strategy. Moreprecisely, a ...