dc.contributor.author: Niemelä, Marko
dc.date.accessioned: 2022-06-03T12:47:05Z
dc.date.available: 2022-06-03T12:47:05Z
dc.date.issued: 2022
dc.identifier.isbn: 978-951-39-9321-4
dc.identifier.uri: https://jyx.jyu.fi/handle/123456789/81469
dc.description.abstract: Clustering is an unsupervised data mining method used to assign data to distinct groups. It has numerous applications in various fields, from bioinformatics to object recognition and categorization. Prototype-based clustering methods summarize information in the form of cluster centroids, which are often called prototypes. Cluster validation methodology provides a means of assessing the goodness of a clustering solution and identifying the optimal number of clusters in the data. Internal cluster validation methods evaluate the quality of a clustering by assessing cluster compactness and separability on the same data set that was used as input in the clustering phase. A common and sometimes complex issue for both data clustering and cluster validation is the presence of missing values in the data, which can occur for many different reasons, such as non-respondents in questionnaire studies or device operation failures. This dissertation focuses on extending cluster validation models to handle missing values in data. Since these models are based not on the values of the data vectors but on the computed distances between these vectors, missing value treatment is handled by direct distance estimation between data vectors. The thesis presents a toolbox that is used to demonstrate the usability of the developed methods for research and development purposes. In addition, the background theory of each element of the toolbox and use case examples are presented. A real-world application is provided in which cluster validation is used to categorize learning game players into distinct profiles using gameplay data in which some of the data values are missing. As the main outcome of the thesis, missing value handling methods for data preprocessing, clustering, and cluster validation are presented.
The functionality and validity of the methods are demonstrated using several numerical experiments, and the results confirm the scalability of the techniques and their capability of reliably solving knowledge discovery problems. Keywords: knowledge discovery, data mining, log data, data preprocessing, missing values, distance computation, distance estimation, clustering, prototype-based clustering, number of clusters, cluster validation, internal cluster validation, cluster validation indices
dc.format.mimetype: application/pdf
dc.language.iso: eng
dc.publisher: Jyväskylän yliopisto
dc.relation.ispartofseries: JYU dissertations
dc.relation.haspart: <b>Article I:</b> Niemelä, M., Äyrämö, S., Ronimus, M., Richardson, U., & Lyytinen, H. (2020). Game learning analytics for understanding reading skills in transparent writing system. <i>British Journal of Educational Technology, 51(6), 2376-2390.</i> DOI: <a href="https://doi.org/10.1111/bjet.12916" target="_blank">10.1111/bjet.12916</a>. JYX: <a href="https://jyx.jyu.fi/handle/123456789/67896" target="_blank">jyx.jyu.fi/handle/123456789/67896</a>
dc.relation.haspart: <b>Article II:</b> Niemelä, M., Äyrämö, S., & Kärkkäinen, T. (2018). Comparison of cluster validation indices with missing data. In <i>ESANN 2018 : Proceedings of the 26th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (pp. 461-466).</i> <a href="https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2018-16.pdf" target="_blank">Full text</a>
dc.relation.haspart: <b>Article III:</b> Niemelä, M., & Kärkkäinen, T. (2022). Improving Clustering and Cluster Validation with Missing Data Using Distance Estimation Methods. In <i>T. T. Tuovinen, J. Periaux, & P. Neittaanmäki (Eds.), Computational Sciences and Artificial Intelligence in Industry : New Digital Technologies for Solving Future Societal and Economical Challenges (pp. 123-133). Springer. Intelligent Systems, Control and Automation: Science and Engineering, 76.</i> DOI: <a href="https://doi.org/10.1007/978-3-030-70787-3_9" target="_blank">10.1007/978-3-030-70787-3_9</a>
dc.relation.haspart: <b>Article IV:</b> Niemelä, M., Äyrämö, S., & Kärkkäinen, T. (2022). Toolbox for Distance Estimation and Cluster Validation on Data With Missing Values. <i>IEEE Access, 10, 352-367.</i> DOI: <a href="https://doi.org/10.1109/ACCESS.2021.3136435" target="_blank">10.1109/ACCESS.2021.3136435</a>
dc.rights: In Copyright
dc.title: Internal Cluster Validation for Data with Missing Values
dc.type: Diss.
dc.identifier.urn: URN:ISBN:978-951-39-9321-4
dc.relation.issn: 2489-9003
dc.rights.copyright: © The Author & University of Jyväskylä
dc.rights.accesslevel: openAccess
dc.type.publication: doctoralThesis
dc.format.content: fulltext
dc.rights.url: https://rightsstatements.org/page/InC/1.0/
dc.date.digitised:


Unless otherwise noted, this item is licensed under In Copyright.