Feature extraction for supervised learning in knowledge discovery systems

Pechenizkiy, Mykola

dc.contributor.author	Pechenizkiy, Mykola
dc.date.accessioned	2008-01-09T12:55:46Z
dc.date.available	2008-01-09T12:55:46Z
dc.date.issued	2005
dc.identifier.isbn	951-39-2271-5
dc.identifier.other	oai:jykdok.linneanet.fi:975629
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/13253
dc.description.abstract	Data mining aims to reveal regularities, hidden in the mass of data stored in a database, whose existence is not yet known. When the data are highly multidimensional, with individual instances containing numerous features, the performance of many machine learning methods deteriorates decisively. This phenomenon is called the “curse of dimensionality” because it often increases both the computational complexity of the processing and the classification errors that arise during it. In addition, irrelevant or only indirectly relevant features that the database may contain provide a poor representation space for describing its conceptual structure. Feature extraction therefore aims at dimensionality reduction, at improving the representation space, or at both, for the needs of supervised machine learning. The thesis consists of separate articles and a summary that builds on them. Each article addresses one or two research questions and the related findings. Pechenizkiy finally combines these findings into a proposed framework that, by accumulating experience of using data mining techniques and their combinations, can systematically support the selection of the most suitable data mining strategy.	en
dc.description.abstract	Knowledge discovery, or data mining, is the process of finding previously unknown and potentially interesting patterns and relations in large databases. The so-called “curse of dimensionality”, pertinent to many learning algorithms, denotes the drastic increase in computational complexity and classification error for data with a large number of dimensions. Besides this problem, some individual features, being irrelevant or only indirectly relevant to the concepts being learned, form a poor problem representation space. The purpose of this study is to develop the theoretical background and practical aspects of feature extraction (FE) as a means of (1) dimensionality reduction and (2) representation space improvement for supervised learning (SL) in knowledge discovery systems. The focus is on applying conventional Principal Component Analysis (PCA) and two class-conditional approaches to two targets: (1) the construction of a base-level classifier and (2) the dynamic integration of base-level classifiers. The theoretical basis is derived from classical studies in data mining, machine learning and pattern recognition. The software prototype for the experimental study is built within the WEKA open-source machine learning library in Java. The experimental study on a number of benchmark and real-world data sets includes analyses of (1) the importance of using class information in the FE process; (2) the (dis)advantages of using either only the extracted features or both the original and extracted features for SL; (3) applying FE globally to the whole data set and locally within natural clusters; (4) the effect of sample reduction on FE for SL; and (5) the problem of selecting an FE technique for SL for the problem under consideration. The hypotheses and detailed results of this many-sided experimental research are reported in the corresponding papers included in the thesis.
The main contributions of the thesis fall into two groups: (1) contributions to current theoretical knowledge and (2) the development of practical suggestions on applying FE for SL.	en
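The abstract names conventional PCA as one of the FE techniques studied and contrasts using only the extracted features with using both the original and the extracted features. The sketch below illustrates that distinction under stated assumptions: it is plain Python with no libraries, it is not the WEKA-based prototype from the thesis, it does not cover the two class-conditional approaches, and the function names (`pca_extract`, `with_original`, and the helpers) are hypothetical. Eigenvectors are found by power iteration with deflation, which is adequate only for small covariance matrices with well-separated eigenvalues.

```python
import math
import random


def mean_center(X):
    """Subtract the per-feature mean from every row (PCA works on centered data)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def covariance(Xc):
    """Sample covariance matrix (n - 1 denominator) of already-centered data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvectors(C, k, iters=500, seed=0):
    """Leading k eigenpairs of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    d = len(C)
    C = [row[:] for row in C]  # work on a copy; deflation modifies it
    vecs, vals = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining spectrum is exactly zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        vecs.append(v)
        vals.append(lam)
        # Deflate: remove the found component so the next pass finds the next one
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return vecs, vals


def pca_extract(X, k):
    """Project centered data onto the k leading principal components."""
    Xc = mean_center(X)
    vecs, vals = top_eigenvectors(covariance(Xc), k)
    Z = [[sum(r[j] * v[j] for j in range(len(r))) for v in vecs] for r in Xc]
    return Z, vals


def with_original(X, Z):
    """Append extracted features to the original ones (the 'both' variant)."""
    return [list(x) + list(z) for x, z in zip(X, Z)]
```

For example, `pca_extract(X, 1)` on points lying on the line x2 = 2·x1 recovers a single component carrying all of the variance, and `with_original(X, Z)` then yields the augmented representation in which a learner sees both feature sets. A real system would use a library eigensolver rather than power iteration.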
dc.format.extent	86 pages
dc.language.iso	eng
dc.publisher	University of Jyväskylä
dc.relation.ispartofseries	Jyväskylä studies in computing
dc.relation.isversionof	ISBN 951-39-2299-5
dc.rights	In Copyright
dc.title	Feature extraction for supervised learning in knowledge discovery systems
dc.type	Diss.
dc.identifier.urn	URN:ISBN:951-39-2271-5
dc.type.dcmitype	Text	en
dc.type.ontasot	Doctoral dissertation	en
dc.contributor.tiedekunta	Informaatioteknologian tiedekunta	fi
dc.contributor.tiedekunta	Faculty of Information Technology	en
dc.contributor.yliopisto	University of Jyväskylä	en
dc.contributor.yliopisto	Jyväskylän yliopisto	fi
dc.contributor.oppiaine	Computer Science	en
dc.relation.issn	1456-5390
dc.relation.numberinseries	56
dc.rights.accesslevel	openAccess
dc.subject.yso	data mining
dc.subject.yso	information acquisition
dc.subject.yso	information retrieval systems
dc.subject.yso	databases
dc.subject.yso	information technology
dc.subject.yso	machine learning
dc.rights.url	https://rightsstatements.org/page/InC/1.0/