Diversity in Search Strategies for Ensemble Feature Selection
Tsymbal, A., Pechenizkiy, M., & Cunningham, P. (2005). Diversity in Search Strategies for Ensemble Feature Selection. Information fusion, 6(1), 83-98. https://doi.org/10.1016/j.inffus.2004.04.003
Published in: Information fusion
Date: 2005
Copyright: © 2004 Elsevier B.V. All rights reserved.
Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. Ensembles allow us to achieve higher accuracy, which is often not achievable with single models. It has been shown theoretically and experimentally that, for an ensemble to be effective, it should consist of base classifiers that have diversity in their predictions. One technique that has proved effective for constructing an ensemble of diverse base classifiers is the use of different feature subsets, or so-called ensemble feature selection. Many ensemble feature selection strategies incorporate diversity as an objective in the search for the best collection of feature subsets. A number of ways are known to quantify diversity in ensembles of classifiers, but little research has been done on their appropriateness for ensemble feature selection. In this paper, we compare five measures of diversity with regard to their possible use in ensemble feature selection. We conduct experiments on 21 data sets from the UCI machine learning repository, comparing the ensemble accuracy and other characteristics of the ensembles built with ensemble feature selection based on the considered measures of diversity. We consider four search strategies for ensemble feature selection together with simple random subspacing: genetic search, hill-climbing, and ensemble forward and backward sequential selection. In the experiments, we show that, in some cases, the ensemble feature selection process can be sensitive to the choice of the diversity measure, and that the question of the superiority of a particular measure depends on the context of the use of diversity and on the data being processed. In many cases and on average, the plain disagreement measure is the best. Genetic search, kappa, and dynamic voting with selection form the best combination of a search strategy, diversity measure and integration method.
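As an illustration of two of the diversity measures named in the abstract, the sketch below computes pairwise plain disagreement (the fraction of instances on which two classifiers predict different labels) and a pairwise kappa statistic, then averages a chosen measure over all classifier pairs in an ensemble. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, the kappa shown is the standard Cohen's kappa for two raters, and the exact formulation used in the paper may differ.

```python
from itertools import combinations


def plain_disagreement(preds_a, preds_b):
    """Fraction of instances on which two classifiers predict different labels."""
    if len(preds_a) != len(preds_b) or len(preds_a) == 0:
        raise ValueError("prediction lists must be non-empty and of equal length")
    differing = sum(1 for a, b in zip(preds_a, preds_b) if a != b)
    return differing / len(preds_a)


def pairwise_kappa(preds_a, preds_b):
    """Cohen's kappa between two prediction lists (assumed formulation).

    Higher kappa means more agreement, i.e. LESS diversity.
    """
    if len(preds_a) != len(preds_b) or len(preds_a) == 0:
        raise ValueError("prediction lists must be non-empty and of equal length")
    n = len(preds_a)
    labels = set(preds_a) | set(preds_b)
    # Observed agreement: share of instances with identical predictions.
    observed = sum(1 for a, b in zip(preds_a, preds_b) if a == b) / n
    # Chance agreement from each classifier's label frequencies.
    expected = sum(
        (list(preds_a).count(label) / n) * (list(preds_b).count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        # Both classifiers always predict the same single label.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def ensemble_average(all_preds, measure):
    """Average a pairwise measure over all pairs of base classifiers."""
    pairs = list(combinations(all_preds, 2))
    if not pairs:
        raise ValueError("need at least two classifiers")
    return sum(measure(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical predictions of three base classifiers on four instances.
    predictions = [
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [1, 1, 1, 0],
    ]
    print(ensemble_average(predictions, plain_disagreement))
    print(ensemble_average(predictions, pairwise_kappa))
```

Note the opposite orientation of the two measures: a search that rewards diversity would maximise plain disagreement but minimise kappa.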
Publisher: Elsevier
ISSN: 1566-2535
Publication in research information system
https://converis.jyu.fi/converis/portal/detail/Publication/15086328
Additional information about funding
This material is based upon works supported by the Science Foundation Ireland under Grant No. S.F.I.-02IN.1I111. This research is partly supported by the COMAS Graduate School of the University of Jyväskylä, Finland.
Related items
Showing items with similar title or keywords.

Assessment of Classifiers and Remote Sensing Features of Hyperspectral Imagery and Stereo-Photogrammetric Point Clouds for Recognition of Tree Species in a Forest Area of High Species Diversity
Tuominen, Sakari; Näsi, Roope; Honkavaara, Eija; Balazs, Andras; Hakala, Teemu; Viljanen, Niko; Pölönen, Ilkka; Saari, Heikki; Ojanen, Harri (MDPI, 2018)
Recognition of tree species and geospatial information on tree species composition is essential for forest management. In this study, tree species recognition was examined using hyperspectral imagery from visible to ...

Dynamic integration of classifiers for handling concept drift
Tsymbal, Alexey; Pechenizkiy, Mykola; Cunningham, Padraig; Puuronen, Seppo (Elsevier, 2008)
In the real world concepts are often not stable but change with time. A typical example of this in the biomedical context is antibiotic resistance, where pathogen sensitivity may change over time as new pathogen strains ...

Unstable feature relevance in classification tasks
Skrypnyk, Iryna (University of Jyväskylä, 2011)

Feature extraction for supervised learning in knowledge discovery systems
Pechenizkiy, Mykola (University of Jyväskylä, 2005)
Data mining aims to uncover regularities hidden in the mass of data in a database, regularities whose existence is not yet known. When the data in the database are highly multidimensional, individual ...

Sustainability strategies for business : an integrated approach with a life cycle perspective
Horn, Susanna (University of Jyväskylä, 2014)