Internal Cluster Validation for Data with Missing Values
Published in series
JYU dissertations
Authors
Date
2022
Copyright
© The Author & University of Jyväskylä
Clustering is an unsupervised data mining method used to assign data to distinct groups. It has numerous applications in various fields, from bioinformatics to object recognition and categorization. Prototype-based clustering methods summarize information in the form of cluster centroids, which are often called prototypes. Cluster validation methodology provides a means of assessing the goodness of a clustering solution and identifying the optimal number of clusters in the data. Internal cluster validation methods evaluate the quality of a clustering by assessing cluster compactness and separability on the same data set that was used as input in the clustering phase. A common and sometimes complex issue for both data clustering and cluster validation is the presence of missing values in the data, which can occur for many different reasons, such as non-respondents in questionnaire studies or device operation failures.
This dissertation focuses on extending cluster validation models to handle missing values in data. Since these models are based not on the values of the data vectors but on the computed distances between these vectors, missing values are treated by directly estimating the distances between data vectors. The thesis presents a toolbox that is used to demonstrate the usability of the developed methods for research and development purposes. In addition, the background theory of each element of the toolbox and use case examples are presented. A real-world application is provided in which cluster validation is used to categorize learning game players into distinct profiles using gameplay data in which some of the values are missing. As the main outcome of the thesis, missing value handling methods for data preprocessing, clustering, and cluster validation are presented. The functionality and validity of the methods are demonstrated in several numerical experiments, and the results confirm the scalability of the techniques and their capability of reliably solving knowledge discovery problems.
Keywords: knowledge discovery, data mining, log data, data preprocessing, missing values, distance computation, distance estimation, clustering, prototype-based clustering, number of clusters, cluster validation, internal cluster validation, cluster validation indices
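The abstract's idea of handling missing values through the distances between data vectors, rather than through the vectors themselves, can be illustrated with a small sketch. The example below uses the well-known partial distance strategy, which is only one possible estimator and is not necessarily the one developed in the thesis. The function name `partial_distance` and the encoding of missing values as NaN are assumptions made for this illustration.

```python
import math


def partial_distance(x, y):
    """Euclidean distance over jointly observed components, rescaled.

    Missing values are encoded as float('nan') (an assumption of this sketch).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # Keep only the coordinates that are observed in both vectors.
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    if not pairs:
        return math.nan  # no shared coordinates: the distance is undefined
    squared = sum((a - b) ** 2 for a, b in pairs)
    # Rescale by (total dimensions / shared dimensions) so that distances
    # computed from different numbers of coordinates stay comparable.
    return math.sqrt(squared * len(x) / len(pairs))


x = [1.0, math.nan, 3.0, 4.0]
y = [2.0, 5.0, math.nan, 6.0]
d = partial_distance(x, y)  # uses only the first and last coordinates
```

Because an estimator of this kind returns an ordinary pairwise distance, any clustering or validation step that depends only on distances can in principle use it without first imputing the missing entries.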
...
Publisher
Jyväskylän yliopisto
ISBN
978-951-39-9321-4
ISSN
2489-9003
The publication includes the following articles
- Article I: Niemelä, M., Äyrämö, S., Ronimus, M., Richardson, U., & Lyytinen, H. (2020). Game learning analytics for understanding reading skills in transparent writing system. British Journal of Educational Technology, 51(6), 2376-2390. DOI: 10.1111/bjet.12916. JYX: jyx.jyu.fi/handle/123456789/67896
- Article II: Niemelä, M., Äyrämö, S., & Kärkkäinen, T. (2018). Comparison of cluster validation indices with missing data. In ESANN 2018 : Proceedings of the 26th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (pp. 461-466).
- Article III: Niemelä, M., & Kärkkäinen, T. (2022). Improving Clustering and Cluster Validation with Missing Data Using Distance Estimation Methods. In T. T. Tuovinen, J. Periaux, & P. Neittaanmäki (Eds.), Computational Sciences and Artificial Intelligence in Industry : New Digital Technologies for Solving Future Societal and Economical Challenges (pp. 123-133). Springer. Intelligent Systems, Control and Automation: Science and Engineering, 76. DOI: 10.1007/978-3-030-70787-3_9
- Article IV: Niemelä, M., Äyrämö, S., & Kärkkäinen, T. (2022). Toolbox for Distance Estimation and Cluster Validation on Data With Missing Values. IEEE Access, 10, 352-367. DOI: 10.1109/ACCESS.2021.3136435
Collections
- JYU Dissertations [836]
- Doctoral dissertations [3535]
Unless otherwise stated, publicly available JYX metadata (excluding abstracts) may be freely reused under the CC0 license.