Improving Clustering and Cluster Validation with Missing Data Using Distance Estimation Methods
Niemelä, M., & Kärkkäinen, T. (2022). Improving Clustering and Cluster Validation with Missing Data Using Distance Estimation Methods. In T. T. Tuovinen, J. Periaux, & P. Neittaanmäki (Eds.), Computational Sciences and Artificial Intelligence in Industry : New Digital Technologies for Solving Future Societal and Economical Challenges (pp. 123-133). Springer. Intelligent Systems, Control and Automation: Science and Engineering, 76. https://doi.org/10.1007/978-3-030-70787-3_9
© Springer Nature Switzerland AG 2022
Missing data poses a challenge in unsupervised learning. In clustering, when the form and the number of clusters are to be determined, one needs to deal with missing values both in the clustering process and in cluster validation. In previous research, missing values in the clustering algorithm have been handled with robust clustering methods and the available data strategy, and the cluster validation indices have been computed with the partial distance approximation. Recently, however, special methods for distance estimation with missing values have been proposed, and this work is the first in which these methods are systematically applied and tested in clustering and cluster validation. More precisely, we propose, implement, and analyze the use of distance estimation methods to improve the discrimination power of clustering and cluster validation indices. A novel, robust prototype-based clustering process in two stages is suggested. Our results and conclusions confirm the usefulness of the distance estimation methods in clustering but, surprisingly, not in cluster validation. ...
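As an illustration of the partial distance approximation mentioned in the abstract, the following is a minimal sketch of the commonly used partial distance strategy: the squared differences are summed over the components observed in both vectors and rescaled by the ratio of the full dimension to the number of jointly observed components. It is not the authors' implementation. The function name `partial_distance` and the use of NaN as the missing-value marker are assumptions made for this example.

```python
import math

def partial_distance(x, y):
    """Hypothetical helper: Euclidean distance under the partial distance strategy.

    Missing values are assumed to be encoded as NaN. Only components observed
    in both vectors contribute, and the squared sum is rescaled by
    n / (number of jointly observed components).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    n = len(x)
    sq_sum = 0.0
    observed = 0
    for a, b in zip(x, y):
        if math.isnan(a) or math.isnan(b):
            continue  # skip components missing in either vector
        sq_sum += (a - b) ** 2
        observed += 1
    if observed == 0:
        return float("nan")  # no jointly observed components: distance undefined
    return math.sqrt(n / observed * sq_sum)

nan = float("nan")
full = partial_distance([0.0, 0.0], [3.0, 4.0])
partial = partial_distance([0.0, nan, 0.0, 0.0], [3.0, 1.0, 4.0, nan])
```

With complete vectors the function reduces to the ordinary Euclidean distance. With missing components the rescaling compensates for the dropped terms, which is the kind of approximation that the distance estimation methods studied in the paper aim to improve upon.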
Parent publication ISBN: 978-3-030-70786-6
Is part of publication: Computational Sciences and Artificial Intelligence in Industry : New Digital Technologies for Solving Future Societal and Economical Challenges
Related funder(s): Academy of Finland
Funding program(s): Academy Programme, AoF; Research profiles, AoF
Additional information about funding: The authors would like to thank the Academy of Finland for the financial support (grants 311877 and 315550).
Showing items with similar title or keywords.
Hämäläinen, Joonas (Jyväskylän yliopisto, 2018) Clustering or cluster analysis is an essential part of data mining, machine learning, and pattern recognition. The most popularly applied clustering methods are partitioning-based or prototype-based methods. Prototype-based ...
Niemelä, Marko; Äyrämö, Sami; Kärkkäinen, Tommi (Institute of Electrical and Electronics Engineers (IEEE), 2022) Missing data are unavoidable in the real-world application of unsupervised machine learning, and their nonoptimal processing may decrease the quality of data-driven models. Imputation is a common remedy for missing values, ...
Linja, Joakim; Hämäläinen, Joonas; Nieminen, Paavo; Kärkkäinen, Tommi (MDPI AG, 2020) Minimal Learning Machine (MLM) is a recently popularized supervised learning method, which is composed of distance-regression and multilateration steps. The computational complexity of MLM is dominated by the solution of ...
Coutinho, Diogo; Gonçalves, Bruno; Laakso, Timo; Travassos, Bruno (Public Library of Science (PLoS), 2022) This study aimed to explore which offensive variables best discriminate the ball possession duration according to players specific role (defenders, midfielders, attackers) during a Gk+3vs3+Gk football small-sided games. ...
Zolotukhin, Mikhail (University of Jyväskylä, 2014)