Scalable implementation of dependence clustering in Apache Spark
Ivannikova, E. (2017). Scalable implementation of dependence clustering in Apache Spark. In I. Škrjanc, & S. Blažič (Eds.), EAIS 2017 : Proceedings of the 2017 Evolving and Adaptive Intelligent Systems (EAIS) (pp. 1-6). IEEE. https://doi.org/10.1109/EAIS.2017.7954843
© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
This article proposes a scalable version of the Dependence Clustering algorithm, which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. In addition, a fast approximate diffusion procedure is introduced that enables spectral-clustering-type algorithms in the Spark environment. The proposed algorithm is benchmarked against spectral clustering. Results of applying the method to real-life data indicate that the implementation scales well while demonstrating good performance for densely connected graphs.
Parent publication ISBN: 978-1-5090-6444-1
Conference: Evolving and Adaptive Intelligent Systems
Is part of publication: EAIS 2017 : Proceedings of the 2017 Evolving and Adaptive Intelligent Systems (EAIS)
Showing items with similar title or keywords.
Algorithmic issues in computational intelligence optimization : from design to implementation, from implementation to design. Caraffini, Fabio (University of Jyväskylä, 2016). The vertiginous technological growth of the last decades has generated a variety of powerful and complex systems. By embedding within modern hardware devices sophisticated software, they allow the solution of complicated ...
Hämäläinen, Joonas; Kärkkäinen, Tommi; Rossi, Tuomo (MDPI AG, 2021). Two new initialization methods for K-means clustering are proposed. Both proposals are based on applying a divide-and-conquer approach for the K-means‖ type of an initialization strategy. The second proposal also uses ...
Hämäläinen, Joonas; Jauhiainen, Susanne; Kärkkäinen, Tommi (MDPI, 2017). Clustering is an unsupervised machine learning and pattern recognition method. In general, in addition to revealing hidden groups of similar observations and clusters, their number needs to be determined. Internal ...
Äyrämö, Sami (University of Jyväskylä, 2006). In his doctoral dissertation, Sami Äyrämö, M.Sc., studied the efficient use of large masses of digital data and the computationally intelligent methods, known as data mining, that are applied to it. The topic is timely, ...
Weber, Matthieu (University of Jyväskylä, 2010)