Toolbox for Distance Estimation and Cluster Validation on Data With Missing Values
Niemelä, M., Äyrämö, S., & Kärkkäinen, T. (2022). Toolbox for Distance Estimation and Cluster Validation on Data With Missing Values. IEEE Access, 10, 352-367. https://doi.org/10.1109/ACCESS.2021.3136435
Published in
IEEE Access
Date
2022
Discipline
Engineering; Learning and Cognitive Sciences; Computing, Information Technology and Mathematics; Human and Machine based Intelligence in Learning; Computational Science
Copyright
© Authors, 2022
Missing data are unavoidable in the real-world application of unsupervised machine learning, and their nonoptimal processing may decrease the quality of data-driven models. Imputation is a common remedy for missing values, but methods that directly estimate expected distances have also emerged. Because treatment of missing values is rarely considered in clustering-related tasks and distance metrics have a central role both in clustering and cluster validation, we developed a new toolbox that provides a wide range of algorithms for data preprocessing, distance estimation, clustering, and cluster validation in the presence of missing values. All these are core elements in any comprehensive cluster analysis methodology. We describe the methodological background of the implemented algorithms and present multiple illustrations of their use. The experiments include validating distance estimation methods against selected reference methods and demonstrating the performance of internal cluster validation indices. The experimental results demonstrate the general usability of the toolbox for the straightforward realization of alternative data processing pipelines. Source code, data sets, results, and example macros are available on GitHub: https://github.com/markoniem/nanclustering_toolbox
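As a rough illustration of computing distances directly on incomplete data rather than imputing first, the following Python sketch uses the well-known partial distance strategy: it sums squared differences over the coordinates observed in both vectors and rescales by the share of coordinates used. This is not the toolbox's own code, and the abstract does not say which estimators the toolbox implements; the function names `partial_distance` and `pairwise_partial_distances` are hypothetical and chosen only for this example.

```python
import math


def partial_distance(x, y):
    """Estimate the Euclidean distance between x and y when some values are NaN.

    Only coordinates observed in both vectors contribute. The squared sum is
    rescaled by (total coordinates / jointly observed coordinates) to
    compensate for the skipped ones. Returns NaN if no coordinate is shared.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    observed = [(a, b) for a, b in zip(x, y)
                if not (math.isnan(a) or math.isnan(b))]
    if not observed:
        return math.nan
    squared_sum = sum((a - b) ** 2 for a, b in observed)
    return math.sqrt(len(x) / len(observed) * squared_sum)


def pairwise_partial_distances(rows):
    """Build a symmetric distance matrix (list of lists) for all row pairs."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = partial_distance(rows[i], rows[j])
    return dist


NAN = math.nan
# Toy data: three observations in three dimensions, two with a missing value.
data = [
    [0.0, 0.0, 0.0],
    [3.0, 4.0, NAN],
    [NAN, 4.0, 3.0],
]
D = pairwise_partial_distances(data)
```

A matrix built this way can be passed to any clustering or validation routine that accepts precomputed distances. On complete data the estimate reduces to the ordinary Euclidean distance, because the rescaling factor becomes 1.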
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
ISSN
2169-3536
Publication in research information system
https://converis.jyu.fi/converis/portal/detail/Publication/104072467
Related funder(s)
Research Council of Finland
Funding program(s)
Academy Programme, AoF; Research profiles, AoF
Additional information about funding
Academy of Finland under Grant 311877 (Demo) and Grant 315550.
Related items
Showing items with similar title or keywords.
- Improving Clustering and Cluster Validation with Missing Data Using Distance Estimation Methods
  Niemelä, Marko; Kärkkäinen, Tommi (Springer, 2022)
  Missing data introduces a challenge in the field of unsupervised learning. In clustering, when the form and the number of clusters are to be determined, one needs to deal with the missing values both in the clustering ...
- Knowledge mining using robust clustering
  Äyrämö, Sami (University of Jyväskylä, 2006)
  In his doctoral dissertation, Sami Äyrämö, MSc, studied the efficient use of large masses of digital data and the computationally intelligent methods applied to it, known as data mining. The topic is timely, ...
- Scalable implementation of dependence clustering in Apache Spark
  Ivannikova, Elena (IEEE, 2017)
  This article proposes a scalable version of the Dependence Clustering algorithm which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. Moreover, ...
- Clustering and Structural Robustness in Causal Diagrams
  Tikka, Santtu; Helske, Jouni; Karvanen, Juha (JMLR, 2023)
  Graphs are commonly used to represent and visualize causal relations. For a small number of variables, this approach provides a succinct and clear view of the scenario at hand. As the number of variables under study ...
- Kvanttitietokoneiden sovelluskohteet tietotekniikassa [Applications of quantum computers in information technology]
  Kääriäinen, Nico (2022)
  The aim of this thesis is to give the reader an overview of how quantum computers work and to explore some of their applications in the field of information technology. The applications examined include breaking RSA encryption, ...