Twister Tries: Approximate Hierarchical Agglomerative Clustering for Average Distance in Linear Time
Cochez, M., & Mou, H. (2015). Twister Tries: Approximate Hierarchical Agglomerative Clustering for Average Distance in Linear Time. In T. Sellis, S. B. Davidson, & Z. Ives (Eds.), SIGMOD '15 : Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (pp. 505-517). Association for Computing Machinery. https://doi.org/10.1145/2723372.2751521
© 2015 ACM. This is a final draft version of an article whose final and definitive form has been published by ACM. Published in this repository with the kind permission of the publisher.
Many commonly used data-mining techniques utilized across research fields perform poorly when used for large data sets. Sequential agglomerative hierarchical non-overlapping clustering is one technique for which the algorithms’ scaling properties prohibit clustering of a large amount of items. Besides the unfavorable time complexity of O(n 2 ), these algorithms have a space complexity of O(n 2 ), which can be reduced to O(n) if the time complexity is allowed to rise to O(n 2 log2 n). In this paper, we propose the use of locality-sensitive hashing combined with a novel data structure called twister tries to provide an approximate clustering for average linkage. Our approach requires only linear space. Furthermore, its time complexity is linear in the number of items to be clustered, making it feasible to apply it on a larger scale. We evaluate the approach both analytically and by applying it to several data sets.
PublisherAssociation for Computing Machinery
Parent publication ISBN
ConferenceACM SIGMOD international conference on management of data
Is part of publicationSIGMOD '15 : Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
Publication in research information system
MetadataShow full item record
Showing items with similar title or keywords.
Facile fabrication of flower like self-assembled mesoporous hierarchical microarchitectures of In(OH)3 and In2O3: In(OH)3 micro flowers with electron beam sensitive thin petals Prakasam, Balasubramaniam Arul; Lahtinen, Manu; Peuronen, Anssi; Muruganandham, Manickavachagam; Sillanpää, Mika (Elsevier S.A.; Chinese Society for Materials Scien, 2016)A template and capping-reagent free facile fabrication method for mesoporous hierarchical microarchitectures of flower-like In(OH)3 particles under benign hydrothermal conditions is reported. Calcination of In(OH)3 to In2O3 ...
Cochez, Michael; Neri, Ferrante (IEEE, 2015)Exact methods for Agglomerative Hierarchical Clustering (AHC) with average linkage do not scale well when the number of items to be clustered is large. The best known algorithms are characterized by quadratic complexity. ...
A hierarchical cluster analysis to determine whether injured runners exhibit similar kinematic gait patterns Jauhiainen, Susanne; Pohl, Andrew J.; Äyrämö, Sami; Kauppi, Jukka-Pekka; Ferber, Reed (Wiley-Blackwell, 2020)Previous studies have suggested that runners can be subgrouped based on homogeneous gait patterns, however, no previous study has assessed the presence of such subgroups in a population of individuals across a wide variety ...
GIS-data related route optimization, hierarchical clustering, location optimization, and kernel density methods are useful for promoting distributed bioenergy plant planning in rural areas Laasasenaho, K.; Lensu, Anssi; Lauhanen, R.; Rintala, J. (Elsevier BV, 2019)Currently, geographic information system (GIS) models are popular for studying location-allocation-related questions concerning bioenergy plants. The aim of this study was to develop a model to investigate optimal locations ...
Fitting Generalized Linear Latent Variable Models using the method of Extended Variational Approximation Korhonen, Pekka (2020)Yhteisöekologian alalla tutkijat ovat usein kiinnostuneita yhden tai useamman kasvi- tai eläinlajin välisistä esiintyvyyssuhteista eri mittauspaikoilla tai ekosysteemeissä. Tämänkaltaiset tutkimuskysymykset johtavat ...