Smart prototype selection for machine learning based on ignorance zones analysis
The size of databases has grown considerably over recent decades, and Machine Learning algorithms are not always equipped to process such large volumes of information. The Nearest neighbor classifier, one of the most useful algorithms in Data Mining, suffers from high storage requirements and slow response times when working with large data sets. Prototype Selection methods help to alleviate this problem by choosing a smaller subset of the data. This thesis provides an overview of existing instance selection methods and introduces a new approach. Most current methods select a subset experimentally, by checking whether a given point affects classification accuracy. The new approach presented in this thesis instead analyzes the instances of the data set and chooses prototypes based on the ignorance zones it discovers. The results of the analysis show that the proposed method can effectively reduce the size of the data set while maintaining the same classification accuracy with the Nearest neighbor classifier. In addition, it removes noisy data, which makes the decision boundaries smoother.
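For readers unfamiliar with the "experimental" family of prototype selection methods mentioned above, the following is a minimal sketch of two classic ones: Wilson's Edited Nearest Neighbour (ENN), which removes points that disagree with the majority of their k nearest neighbours (noise removal), followed by Hart's Condensed Nearest Neighbour (CNN), which keeps only the points the current subset would misclassify (size reduction). This is NOT the ignorance-zone method proposed in the thesis, whose details are not given in this abstract. The function names (`nn_predict`, `edited_nn`, `condensed_nn`, `select_prototypes`), the Euclidean distance, the default `k=3` and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nn_predict(prototypes: Sequence[Point], labels: Sequence[str], x: Point) -> str:
    """1-nearest-neighbour prediction with Euclidean distance (hypothetical helper)."""
    best = min(range(len(prototypes)), key=lambda i: math.dist(prototypes[i], x))
    return labels[best]


def edited_nn(X: Sequence[Point], y: Sequence[str], k: int = 3) -> List[int]:
    """Wilson's ENN sketch: keep a point only if the majority label of its
    k nearest neighbours (excluding itself) matches its own label."""
    keep = []
    for i, x in enumerate(X):
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: math.dist(X[j], x))[:k]
        majority = Counter(y[j] for j in others).most_common(1)[0][0]
        if majority == y[i]:
            keep.append(i)
    return keep


def condensed_nn(X: Sequence[Point], y: Sequence[str]) -> List[int]:
    """Hart's CNN sketch: start from the first instance and repeatedly add any
    instance that the current subset misclassifies, until a full pass adds nothing."""
    selected = [False] * len(X)
    selected[0] = True
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if selected[i]:
                continue
            subset = [j for j in range(len(X)) if selected[j]]
            if nn_predict([X[j] for j in subset], [y[j] for j in subset], X[i]) != y[i]:
                selected[i] = True
                changed = True
    return [j for j in range(len(X)) if selected[j]]


def select_prototypes(X: Sequence[Point], y: Sequence[str], k: int = 3) -> List[int]:
    """Denoise with ENN, then condense with CNN; returns indices into the original X."""
    edited = edited_nn(X, y, k)
    condensed = condensed_nn([X[i] for i in edited], [y[i] for i in edited])
    return [edited[i] for i in condensed]


if __name__ == "__main__":
    # Two well-separated clusters plus one mislabelled ("noisy") point at (0.5, 0.5).
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (0.5, 0.5)]
    y = ["a"] * 4 + ["b"] * 5
    idx = select_prototypes(X, y)
    print(idx)  # [0, 4]
    print(nn_predict([X[i] for i in idx], [y[i] for i in idx], (0.9, 0.8)))  # a
```

On this toy data, ENN discards the mislabelled point and CNN then reduces the remaining eight points to one prototype per cluster, which is the kind of size reduction and boundary smoothing the abstract describes; real data sets rarely condense this cleanly.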
Collections
- Pro gradu -tutkielmat (Master's theses)
Related items
Showing items with similar title or keywords.
- Minimal learning machine in hyperspectral imaging classification
  Hakola, Anna-Maria; Pölönen, Ilkka (SPIE, 2020). A hyperspectral (HS) image is typically a stack of frames, where each frame represents the intensity of a different wavelength of light. Each spatial pixel has a spectrum. In the classification of the HS image, each spectrum ...
- Improvements and applications of the elements of prototype-based clustering
  Hämäläinen, Joonas (Jyväskylän yliopisto, 2018)
- Towards Liquid AI in IoT with WebAssembly : A Prototype Implementation
  Kotilainen, Pyry; Heikkilä, Ville; Systä, Kari; Mikkonen, Tommi (Springer, 2023). An Internet of Things (IoT) system typically comprises numerous subsystems and devices, such as sensors, actuators, gateways for internet connectivity, cloud services, end-user applications, and analytics. Currently, these ...
- Updating strategies for distance based classification model with recursive least squares
  Raita-Hakola, Anna-Maria; Pölönen, Ilkka (Copernicus Publications, 2022). The idea is to create a self-learning Minimal Learning Machine (MLM) model that is computationally efficient, easy to implement and performs with high accuracy. The study has two hypotheses. Experiment A examines the ...
- Adaptive framework for network traffic classification using dimensionality reduction and clustering
  Juvonen, Antti; Sipola, Tuomo (IEEE, 2012). Information security has become a very important topic especially during the last years. Web services are becoming more complex and dynamic. This offers new possibilities for attackers to exploit vulnerabilities by inputting ...