Internal Cluster Validation for Data with Missing Values

Niemelä, Marko

dc.contributor.author	Niemelä, Marko
dc.date.accessioned	2022-06-03T12:47:05Z
dc.date.available	2022-06-03T12:47:05Z
dc.date.issued	2022
dc.identifier.isbn	978-951-39-9321-4
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/81469
dc.description.abstract	Clustering is an unsupervised data mining method used to partition data into distinct groups. It has applications in many fields, from bioinformatics to object recognition and categorization. Prototype-based clustering methods summarize information in the form of cluster centroids, often called prototypes. Cluster validation methodology provides a means of assessing the goodness of a clustering solution and identifying the optimal number of clusters in the data. Internal cluster validation methods evaluate clustering quality by assessing cluster compactness and separability on the same data set that was input to the clustering phase. A common and sometimes complex issue for both data clustering and cluster validation is the presence of missing values, which can occur for many reasons, such as nonresponse in questionnaire studies or device operation failures. This dissertation focuses on extending cluster validation models to handle missing values in data. Since these models are based not on the values of the data vectors but on the computed distances between these vectors, missing values are treated by directly estimating the distances between data vectors. The thesis presents a toolbox that demonstrates the usability of the developed methods for research and development purposes. In addition, the background theory of each element of the toolbox and use case examples are presented. A real-world application is provided in which cluster validation is used to categorize learning game players into distinct profiles using gameplay data in which some of the values are missing. As the main outcome of the thesis, missing value handling methods for data preprocessing, clustering, and cluster validation are presented.
The functionality and validity of the methods are demonstrated in several numerical experiments, and the results confirm the scalability of the techniques and their capability to reliably solve knowledge discovery problems. Keywords: knowledge discovery, data mining, log data, data preprocessing, missing values, distance computation, distance estimation, clustering, prototype-based clustering, number of clusters, cluster validation, internal cluster validation, cluster validation indices	en
dc.format.mimetype	application/pdf
dc.language.iso	eng
dc.publisher	Jyväskylän yliopisto
dc.relation.ispartofseries	JYU dissertations
dc.relation.haspart	<b>Article I:</b> Niemelä, M., Äyrämö, S., Ronimus, M., Richardson, U., & Lyytinen, H. (2020). Game learning analytics for understanding reading skills in transparent writing system. <i>British Journal of Educational Technology, 51(6), 2376-2390.</i> DOI: <a href="https://doi.org/10.1111/bjet.12916" target="_blank"> 10.1111/bjet.12916</a>. JYX: <a href="https://jyx.jyu.fi/handle/123456789/67896" target="_blank"> jyx.jyu.fi/handle/123456789/67896</a>
dc.relation.haspart	<b>Article II:</b> Niemelä, M., Äyrämö, S., & Kärkkäinen, T. (2018). Comparison of cluster validation indices with missing data. In <i>ESANN 2018 : Proceedings of the 26th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (pp. 461-466).</i> <a href="https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2018-16.pdf" target="_blank"> Full text</a>
dc.relation.haspart	<b>Article III:</b> Niemelä, M., & Kärkkäinen, T. (2022). Improving Clustering and Cluster Validation with Missing Data Using Distance Estimation Methods. In <i>T. T. Tuovinen, J. Periaux, & P. Neittaanmäki (Eds.), Computational Sciences and Artificial Intelligence in Industry : New Digital Technologies for Solving Future Societal and Economical Challenges (pp. 123-133). Springer. Intelligent Systems, Control and Automation: Science and Engineering, 76.</i> DOI: <a href="https://doi.org/10.1007/978-3-030-70787-3_9" target="_blank"> 10.1007/978-3-030-70787-3_9</a>
dc.relation.haspart	<b>Article IV:</b> Niemelä, M., Äyrämö, S., & Kärkkäinen, T. (2022). Toolbox for Distance Estimation and Cluster Validation on Data With Missing Values. <i>IEEE Access, 10, 352-367.</i> DOI: <a href="https://doi.org/10.1109/ACCESS.2021.3136435" target="_blank"> 10.1109/ACCESS.2021.3136435</a>
dc.rights	In Copyright
dc.title	Internal Cluster Validation for Data with Missing Values
dc.type	Diss.
dc.identifier.urn	URN:ISBN:978-951-39-9321-4
dc.relation.issn	2489-9003
dc.rights.copyright	© The Author & University of Jyväskylä
dc.rights.accesslevel	openAccess
dc.type.publication	doctoralThesis
dc.format.content	fulltext
dc.rights.url	https://rightsstatements.org/page/InC/1.0/
dc.date.digitised

Files in this item

Name: 978-951-39-9321-4_vaitos100620 ...
Size: 5.652Mb
Format: PDF

This item appears in the following collections

JYU Dissertations [775]
Doctoral dissertations [3442]

Related items

Comparison of Internal Clustering Validation Indices for Prototype-Based Clustering

Improving Clustering and Cluster Validation with Missing Data Using Distance Estimation Methods

Optical Properties of Metal Clusters and Cluster Arrangements

Internal control, risk management and internal audit in Finnish public companies

Farewell to Anarchy : The Myth of International Anarchy and Birth of Anarcophilia in International Relations
