Knowledge mining using robust clustering

Äyrämö, Sami

dc.contributor.author	Äyrämö, Sami
dc.date.accessioned	2008-01-09T12:56:02Z
dc.date.available	2008-01-09T12:56:02Z
dc.date.issued	2006
dc.identifier.isbn	951-39-2655-9
dc.identifier.other	oai:jykdok.linneanet.fi:999198
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/13286
dc.description.abstract	In his doctoral thesis, Sami Äyrämö (M.Sc.) studied the efficient use of large masses of digital data and the computationally intelligent methods applied to it, known as data mining methods. The topic is timely, because the rapid development and spread of information systems lead ever more often to a "data flood": so much data is collected in digital form that the essential information can drown among irrelevant and redundant data. As the main result of the thesis, Äyrämö presents a reliable and computationally efficient clustering method that is simple for the user, makes no assumptions about the application domain, and is therefore highly general-purpose. The method builds on so-called prototype-based partitioning clustering algorithms. Data repositories are often too large to browse manually on a computer. The aim of data clustering is to find groups, or clusters, in the data such that observations within a group are as similar as possible and the differences between groups are as large as possible. Instead of individual observations, one can then examine the properties of the groups or of the observations that best represent them, the prototypes. In Äyrämö's method, reliability is achieved by computing the prototypes with so-called robust multivariate estimates, which are less sensitive to errors and gaps in the data than more traditional alternatives. As part of the clustering method, Äyrämö developed an iterative algorithm based on the so-called SOR method, with which the prototype estimates can be approximated efficiently and accurately. The clustering method and all its components are implemented so that the end user does not need to perform complex preprocessing operations, such as predicting missing values, before the actual grouping.
The method also includes an initialization procedure that automates it by producing the best possible initial guesses for the parameters of the grouping algorithm.	fi
dc.description.abstract	This work is devoted to the development of scalable and robust algorithms for data mining and knowledge discovery problems. The main interest lies in so-called prototype-based clustering methods that are implemented using iterative relocation algorithms. Different elements of prototype-based data clustering are discussed and basic algorithms are described. In order to support the usability of the new methods and algorithms, a modified knowledge mining process model is also proposed. The refined model is based on the well-known knowledge discovery process, but it places more emphasis on domain analysis and the "black box" nature of data mining. The significance and importance of knowledge mining are clarified by outlining the current body of existing knowledge with real applications. As the main outcome of this thesis, a highly automated robust clustering method is presented. The method consists of a number of separately developed and tested elements such as initialization, prototype estimation, and a missing data strategy. The non-smooth nature of robust statistics is rigorously considered from the point of view of non-smooth optimization. Numerical and statistical properties of the presented methods, such as robustness, scalability, and computational and statistical efficiency, are tested and illustrated through a number of numerical experiments. The results are complemented by some analytic results and illustrative real-world examples. Furthermore, in order to estimate the correct number of clusters, a new cluster validity index is proposed.	en
dc.format.extent	295 pages
dc.language.iso	eng
dc.publisher	University of Jyväskylä
dc.relation.ispartofseries	Jyväskylä studies in computing
dc.relation.isversionof	ISBN 951-39-2621-4
dc.rights	In Copyright
dc.subject.other	clustering
dc.title	Knowledge mining using robust clustering
dc.type	doctoral thesis
dc.identifier.urn	URN:ISBN:951-39-2655-9
dc.type.dcmitype	Text	en
dc.type.ontasot	Väitöskirja	fi
dc.type.ontasot	Doctoral dissertation	en
dc.contributor.tiedekunta	Informaatioteknologian tiedekunta	fi
dc.contributor.tiedekunta	Faculty of Information Technology	en
dc.contributor.yliopisto	University of Jyväskylä	en
dc.contributor.yliopisto	Jyväskylän yliopisto	fi
dc.contributor.oppiaine	Tietotekniikka	fi
dc.type.coar	http://purl.org/coar/resource_type/c_db06
dc.relation.issn	1456-5390
dc.relation.numberinseries	63
dc.rights.accesslevel	openAccess
dc.type.publication	doctoralThesis
dc.subject.yso	data processing
dc.subject.yso	data mining
dc.subject.yso	information management
dc.subject.yso	data warehouses
dc.subject.yso	clusters
dc.rights.url	https://rightsstatements.org/page/InC/1.0/

Files in this item

Name: 9513926559.pdf
Size: 8.327 MB
Format: PDF

This item appears in the following collections

Doctoral dissertations [3598]

Related items

Research literature clustering using diffusion maps

Improvements and applications of the elements of prototype-based clustering

Clustering of vocabulary for different levels of Finnish learners of EFL : a content analysis on textbooks

Comparison of Internal Clustering Validation Indices for Prototype-Based Clustering

Viitekehys tietovarastojärjestelmälle ja sen soveltaminen SQL-standardin ja Oraclen tarjoaman analysointituen arviointiin
