Näytä suppeat kuvailutiedot

dc.contributor.authorCochez, Michael
dc.contributor.authorMou, Hao
dc.contributor.editorSellis, Timos
dc.contributor.editorDavidson, Susan B.
dc.contributor.editorIves, Zack
dc.date.accessioned2015-07-24T05:48:19Z
dc.date.available2015-07-24T05:48:19Z
dc.date.issued2015
dc.identifier.citationCochez, M., & Mou, H. (2015). Twister Tries: Approximate Hierarchical Agglomerative Clustering for Average Distance in Linear Time. In T. Sellis, S. B. Davidson, & Z. Ives (Eds.), <i>SIGMOD '15 : Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data</i> (pp. 505-517). Association for Computing Machinery. <a href="https://doi.org/10.1145/2723372.2751521" target="_blank">https://doi.org/10.1145/2723372.2751521</a>
dc.identifier.isbn978-1-4503-2758-9
dc.identifier.otherCONVID_24741029
dc.identifier.otherTUTKAID_66325
dc.identifier.urihttps://jyx.jyu.fi/handle/123456789/46537
dc.description.abstractMany commonly used data-mining techniques utilized across research fields perform poorly when used for large data sets. Sequential agglomerative hierarchical non-overlapping clustering is one technique for which the algorithms’ scaling properties prohibit clustering of a large amount of items. Besides the unfavorable time complexity of O(n 2 ), these algorithms have a space complexity of O(n 2 ), which can be reduced to O(n) if the time complexity is allowed to rise to O(n 2 log2 n). In this paper, we propose the use of locality-sensitive hashing combined with a novel data structure called twister tries to provide an approximate clustering for average linkage. Our approach requires only linear space. Furthermore, its time complexity is linear in the number of items to be clustered, making it feasible to apply it on a larger scale. We evaluate the approach both analytically and by applying it to several data sets.
dc.format.extent2084
dc.language.isoeng
dc.publisherAssociation for Computing Machinery
dc.relation.ispartofSIGMOD '15 : Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
dc.subject.otherhierarchical clustering
dc.subject.otherlocality-sensitive hashing
dc.subject.otheraverage linkage
dc.subject.otherlinear complexity
dc.titleTwister Tries: Approximate Hierarchical Agglomerative Clustering for Average Distance in Linear Time
dc.typeconferenceObject
dc.identifier.urnURN:NBN:fi:jyu-201506112286
dc.contributor.laitosTietotekniikan laitosfi
dc.contributor.laitosDepartment of Mathematical Information Technologyen
dc.contributor.oppiaineTietotekniikkafi
dc.contributor.oppiaineMathematical Information Technologyen
dc.type.urihttp://purl.org/eprint/type/ConferencePaper
dc.date.updated2015-06-11T09:15:02Z
dc.relation.isbn978-1-4503-2758-9
dc.type.coarhttp://purl.org/coar/resource_type/c_5794
dc.description.reviewstatuspeerReviewed
dc.format.pagerange505-517
dc.type.versionacceptedVersion
dc.rights.copyright© 2015 ACM. This is a final draft version of an article whose final and definitive form has been published by ACM. Published in this repository with the kind permission of the publisher.
dc.rights.accesslevelopenAccessfi
dc.relation.conferenceACM SIGMOD international conference on management of data
dc.relation.doi10.1145/2723372.2751521
dc.type.okmA4


Aineistoon kuuluvat tiedostot

Thumbnail

Aineisto kuuluu seuraaviin kokoelmiin

Näytä suppeat kuvailutiedot