dc.contributor.author | Cochez, Michael | |
dc.contributor.author | Mou, Hao | |
dc.contributor.editor | Sellis, Timos | |
dc.contributor.editor | Davidson, Susan B. | |
dc.contributor.editor | Ives, Zack | |
dc.date.accessioned | 2015-07-24T05:48:19Z | |
dc.date.available | 2015-07-24T05:48:19Z | |
dc.date.issued | 2015 | |
dc.identifier.citation | Cochez, M., & Mou, H. (2015). Twister Tries: Approximate Hierarchical Agglomerative Clustering for Average Distance in Linear Time. In T. Sellis, S. B. Davidson, & Z. Ives (Eds.), <i>SIGMOD '15 : Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data</i> (pp. 505-517). Association for Computing Machinery. <a href="https://doi.org/10.1145/2723372.2751521" target="_blank">https://doi.org/10.1145/2723372.2751521</a> | |
dc.identifier.isbn | 978-1-4503-2758-9 | |
dc.identifier.other | CONVID_24741029 | |
dc.identifier.other | TUTKAID_66325 | |
dc.identifier.uri | https://jyx.jyu.fi/handle/123456789/46537 | |
dc.description.abstract | Many commonly used data-mining techniques utilized across
research fields perform poorly when used for large data sets.
Sequential agglomerative hierarchical non-overlapping clustering
is one technique for which the algorithms’ scaling
properties prohibit clustering of a large amount of items.
Besides the unfavorable time complexity of O(n
2
), these
algorithms have a space complexity of O(n
2
), which can
be reduced to O(n) if the time complexity is allowed to
rise to O(n
2
log2 n). In this paper, we propose the use of
locality-sensitive hashing combined with a novel data structure
called twister tries to provide an approximate clustering
for average linkage. Our approach requires only linear space.
Furthermore, its time complexity is linear in the number of
items to be clustered, making it feasible to apply it on a
larger scale. We evaluate the approach both analytically
and by applying it to several data sets. | |
dc.format.extent | 2084 | |
dc.language.iso | eng | |
dc.publisher | Association for Computing Machinery | |
dc.relation.ispartof | SIGMOD '15 : Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data | |
dc.subject.other | hierarchical clustering | |
dc.subject.other | locality-sensitive hashing | |
dc.subject.other | average linkage | |
dc.subject.other | linear complexity | |
dc.title | Twister Tries: Approximate Hierarchical Agglomerative Clustering for Average Distance in Linear Time | |
dc.type | conferenceObject | |
dc.identifier.urn | URN:NBN:fi:jyu-201506112286 | |
dc.contributor.laitos | Tietotekniikan laitos | fi |
dc.contributor.laitos | Department of Mathematical Information Technology | en |
dc.contributor.oppiaine | Tietotekniikka | fi |
dc.contributor.oppiaine | Mathematical Information Technology | en |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | |
dc.date.updated | 2015-06-11T09:15:02Z | |
dc.relation.isbn | 978-1-4503-2758-9 | |
dc.type.coar | http://purl.org/coar/resource_type/c_5794 | |
dc.description.reviewstatus | peerReviewed | |
dc.format.pagerange | 505-517 | |
dc.type.version | acceptedVersion | |
dc.rights.copyright | © 2015 ACM. This is a final draft version of an article whose final and definitive form has been published by ACM. Published in this repository with the kind permission of the publisher. | |
dc.rights.accesslevel | openAccess | fi |
dc.relation.conference | ACM SIGMOD international conference on management of data | |
dc.relation.doi | 10.1145/2723372.2751521 | |
dc.type.okm | A4 | |