Scalable Hierarchical Clustering : Twister Tries with a Posteriori Trie Elimination
Cochez, M., & Neri, F. (2015). Scalable Hierarchical Clustering : Twister Tries with a Posteriori Trie Elimination. In SSCI 2015 : Proceedings of the 2015 IEEE Symposium Series on Computational Intelligence. Symposium CIDM 2015 : 6th IEEE Symposium on Computational Intelligence and Data Mining (pp. 756-763). IEEE. https://doi.org/10.1109/SSCI.2015.12
Date
2015
Copyright
© 2015 IEEE. This is an author's post-print version of an article whose final and definitive form has been published in the conference proceedings by IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Exact methods for Agglomerative Hierarchical Clustering (AHC) with average linkage do not scale well when the number of items to be clustered is large. The best known algorithms have quadratic complexity. This bound is generally accepted and cannot be improved without exploiting properties specific to certain metric spaces. Twister tries is an algorithm that produces a dendrogram (i.e., the outcome of a hierarchical clustering) which resembles the one produced by AHC, while needing only linear space and time. However, twister tries are sensitive to rare, but still possible, hash evaluations, which might have a disastrous effect on the final outcome. We propose the use of a metaheuristic algorithm to overcome this sensitivity and show how approximate computations of dendrogram quality can help to evaluate the heuristic within reasonable time. The proposed metaheuristic is based on an evolutionary framework and integrates a surrogate model of the fitness to reduce computational time.
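To make the scaling problem concrete, the following is a minimal Python sketch of exact average-linkage AHC. It is not the twister tries algorithm and not the authors' code; the function name `average_linkage_ahc`, the use of Euclidean distance, and the merge-list representation of the dendrogram are assumptions made for illustration. This naive version recomputes every pairwise average at each step, so it is slower still than the quadratic best known algorithms mentioned above.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def average_linkage_ahc(points):
    """Naive exact AHC with average linkage (illustrative baseline only).

    Returns a list of merges (id_a, id_b, distance). Leaves have ids
    0..n-1; each merge creates a new cluster with the next free id.
    """
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)

    while len(clusters) > 1:
        best = None
        # Examine every pair of current clusters.
        for a, b in combinations(clusters, 2):
            total = sum(
                dist(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            # Average linkage: mean distance over all cross-cluster pairs.
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)

        avg, a, b = best
        # Merge the closest pair into a new cluster.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, avg))
        next_id += 1

    return merges


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    for merge in average_linkage_ahc(sample):
        print(merge)
```

Every step compares all remaining cluster pairs, which is the pairwise work that makes exact methods impractical for large inputs and motivates approximate, hash-based approaches such as the one described in the abstract.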
...
Publisher
IEEE
Parent publication ISBN
978-1-4799-7560-0
Conference
IEEE Symposium on Computational Intelligence and Data Mining
Is part of publication
SSCI 2015 : Proceedings of the 2015 IEEE Symposium Series on Computational Intelligence. Symposium CIDM 2015 : 6th IEEE Symposium on Computational Intelligence and Data Mining
Publication in research information system
https://converis.jyu.fi/converis/portal/detail/Publication/25335406
Related items
Showing items with similar title or keywords.
- Twister Tries: Approximate Hierarchical Agglomerative Clustering for Average Distance in Linear Time
  Cochez, Michael; Mou, Hao (Association for Computing Machinery, 2015)
  Many commonly used data-mining techniques utilized across research fields perform poorly when used for large data sets. Sequential agglomerative hierarchical non-overlapping clustering is one technique for which the ...
- Scalable robust clustering method for large and sparse data
  Hämäläinen, Joonas; Kärkkäinen, Tommi; Rossi, Tuomo (ESANN, 2018)
  Datasets for unsupervised clustering can be large and sparse, with significant portion of missing values. We present here a scalable version of a robust clustering method with the available data strategy. More precisely, a ...
- Scalable implementation of dependence clustering in Apache Spark
  Ivannikova, Elena (IEEE, 2017)
  This article proposes a scalable version of the Dependence Clustering algorithm which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. Moreover, ...
- A hierarchical cluster analysis to determine whether injured runners exhibit similar kinematic gait patterns
  Jauhiainen, Susanne; Pohl, Andrew J.; Äyrämö, Sami; Kauppi, Jukka-Pekka; Ferber, Reed (Wiley-Blackwell, 2020)
  Previous studies have suggested that runners can be subgrouped based on homogeneous gait patterns, however, no previous study has assessed the presence of such subgroups in a population of individuals across a wide variety ...
- GIS-data related route optimization, hierarchical clustering, location optimization, and kernel density methods are useful for promoting distributed bioenergy plant planning in rural areas
  Laasasenaho, K.; Lensu, Anssi; Lauhanen, R.; Rintala, J. (Elsevier BV, 2019)
  Currently, geographic information system (GIS) models are popular for studying location-allocation-related questions concerning bioenergy plants. The aim of this study was to develop a model to investigate optimal locations ...