Twister Tries: Approximate Hierarchical Agglomerative Clustering for Average Distance in Linear Time

Cochez, Michael; Mou, Hao

doi:10.1145/2723372.2751521

dc.contributor.author	Cochez, Michael
dc.contributor.author	Mou, Hao
dc.contributor.editor	Sellis, Timos
dc.contributor.editor	Davidson, Susan B.
dc.contributor.editor	Ives, Zack
dc.date.accessioned	2015-07-24T05:48:19Z
dc.date.available	2015-07-24T05:48:19Z
dc.date.issued	2015
dc.identifier.citation	Cochez, M., & Mou, H. (2015). Twister Tries: Approximate Hierarchical Agglomerative Clustering for Average Distance in Linear Time. In T. Sellis, S. B. Davidson, & Z. Ives (Eds.), SIGMOD '15 : Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (pp. 505-517). Association for Computing Machinery. https://doi.org/10.1145/2723372.2751521
dc.identifier.isbn	978-1-4503-2758-9
dc.identifier.other	CONVID_24741029
dc.identifier.other	TUTKAID_66325
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/46537
dc.description.abstract	Many commonly used data-mining techniques utilized across research fields perform poorly when used for large data sets. Sequential agglomerative hierarchical non-overlapping clustering is one technique for which the algorithms’ scaling properties prohibit clustering of a large amount of items. Besides the unfavorable time complexity of O(n^2), these algorithms have a space complexity of O(n^2), which can be reduced to O(n) if the time complexity is allowed to rise to O(n^2 log2 n). In this paper, we propose the use of locality-sensitive hashing combined with a novel data structure called twister tries to provide an approximate clustering for average linkage. Our approach requires only linear space. Furthermore, its time complexity is linear in the number of items to be clustered, making it feasible to apply it on a larger scale. We evaluate the approach both analytically and by applying it to several data sets.
dc.format.extent	2084
dc.language.iso	eng
dc.publisher	Association for Computing Machinery
dc.relation.ispartof	SIGMOD '15 : Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
dc.subject.other	hierarchical clustering
dc.subject.other	locality-sensitive hashing
dc.subject.other	average linkage
dc.subject.other	linear complexity
dc.title	Twister Tries: Approximate Hierarchical Agglomerative Clustering for Average Distance in Linear Time
dc.type	conferenceObject
dc.identifier.urn	URN:NBN:fi:jyu-201506112286
dc.contributor.laitos	Tietotekniikan laitos	fi
dc.contributor.laitos	Department of Mathematical Information Technology	en
dc.contributor.oppiaine	Tietotekniikka	fi
dc.contributor.oppiaine	Mathematical Information Technology	en
dc.type.uri	http://purl.org/eprint/type/ConferencePaper
dc.date.updated	2015-06-11T09:15:02Z
dc.relation.isbn	978-1-4503-2758-9
dc.type.coar	http://purl.org/coar/resource_type/c_5794
dc.description.reviewstatus	peerReviewed
dc.format.pagerange	505-517
dc.type.version	acceptedVersion
dc.rights.copyright	© 2015 ACM. This is a final draft version of an article whose final and definitive form has been published by ACM. Published in this repository with the kind permission of the publisher.
dc.rights.accesslevel	openAccess	fi
dc.relation.conference	ACM SIGMOD international conference on management of data
dc.relation.doi	10.1145/2723372.2751521
dc.type.okm	A4

Files in this item

Name: cochezmousigmod15finalcamerare ...
Size: 1.072Mb
Format: PDF
Description: Final Draft

This item appears in the following collections

Faculty of Information Technology [2259]

Showing items with a similar title or keywords.

Locality-sensitive hashing for massive string-based ontology matching

Cochez, Michael (IEEE, 2014)

This paper reports initial research results related to the use of locality-sensitive hashing (LSH) for string-based matching of big ontologies. Two ways of transforming the matching problem into a LSH problem are proposed ...

Facile fabrication of flower like self-assembled mesoporous hierarchical microarchitectures of In(OH)3 and In2O3: In(OH)3 micro flowers with electron beam sensitive thin petals

Prakasam, Balasubramaniam Arul; Lahtinen, Manu; Peuronen, Anssi; Muruganandham, Manickavachagam; Sillanpää, Mika (Elsevier S.A.; Chinese Society for Materials Scien, 2016)

A template and capping-reagent free facile fabrication method for mesoporous hierarchical microarchitectures of flower-like In(OH)3 particles under benign hydrothermal conditions is reported. Calcination of In(OH)3 to In2O3 ...

Scalable Hierarchical Clustering : Twister Tries with a Posteriori Trie Elimination

Cochez, Michael; Neri, Ferrante (IEEE, 2015)

Exact methods for Agglomerative Hierarchical Clustering (AHC) with average linkage do not scale well when the number of items to be clustered is large. The best known algorithms are characterized by quadratic complexity. ...

A hierarchical cluster analysis to determine whether injured runners exhibit similar kinematic gait patterns

Jauhiainen, Susanne; Pohl, Andrew J.; Äyrämö, Sami; Kauppi, Jukka-Pekka; Ferber, Reed (Wiley-Blackwell, 2020)

Previous studies have suggested that runners can be subgrouped based on homogeneous gait patterns, however, no previous study has assessed the presence of such subgroups in a population of individuals across a wide variety ...

GIS-data related route optimization, hierarchical clustering, location optimization, and kernel density methods are useful for promoting distributed bioenergy plant planning in rural areas

Laasasenaho, K.; Lensu, Anssi; Lauhanen, R.; Rintala, J. (Elsevier BV, 2019)

Currently, geographic information system (GIS) models are popular for studying location-allocation-related questions concerning bioenergy plants. The aim of this study was to develop a model to investigate optimal locations ...
