Scalable implementation of dependence clustering in Apache Spark
Ivannikova, E. (2017). Scalable implementation of dependence clustering in Apache Spark. In I. Škrjanc, & S. Blažič (Eds.), EAIS 2017 : Proceedings of the 2017 Evolving and Adaptive Intelligent Systems (EAIS) (pp. 1-6). IEEE. https://doi.org/10.1109/EAIS.2017.7954843
Authors
Ivannikova, E.
Date
2017
Copyright
© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Abstract
This article proposes a scalable version of the Dependence Clustering algorithm, which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. Moreover, a fast approximate diffusion procedure that enables spectral-clustering-type algorithms in the Spark environment is introduced. In addition, the proposed algorithm is benchmarked against Spectral clustering. Results of applying the method to real-life data indicate that the implementation scales well while demonstrating good performance for densely connected graphs.
Publisher
IEEE
Parent publication ISBN
978-1-5090-6444-1
Conference
Evolving and Adaptive Intelligent Systems
Is part of publication
EAIS 2017 : Proceedings of the 2017 Evolving and Adaptive Intelligent Systems (EAIS)
ISSN
2473-4691
Publication in research information system
https://converis.jyu.fi/converis/portal/detail/Publication/27339946
Related items
Showing items with similar title or keywords.
- Algorithmic issues in computational intelligence optimization : from design to implementation, from implementation to design
  Caraffini, Fabio (University of Jyväskylä, 2016)
  The vertiginous technological growth of the last decades has generated a variety of powerful and complex systems. By embedding within modern hardware devices sophisticated software, they allow the solution of complicated ...
- Improving Scalable K-Means++
  Hämäläinen, Joonas; Kärkkäinen, Tommi; Rossi, Tuomo (MDPI AG, 2021)
  Two new initialization methods for K-means clustering are proposed. Both proposals are based on applying a divide-and-conquer approach for the K-means‖ type of an initialization strategy. The second proposal also uses ...
- Comparison of Internal Clustering Validation Indices for Prototype-Based Clustering
  Hämäläinen, Joonas; Jauhiainen, Susanne; Kärkkäinen, Tommi (MDPI, 2017)
  Clustering is an unsupervised machine learning and pattern recognition method. In general, in addition to revealing hidden groups of similar observations and clusters, their number needs to be determined. Internal ...
- Knowledge mining using robust clustering
  Äyrämö, Sami (University of Jyväskylä, 2006)
  In his doctoral dissertation, Sami Äyrämö, M.Sc., studied the efficient utilization of large masses of digital data and the computationally intelligent, so-called data mining methods applied to it. The topic is timely, ...
- Toolbox for Distance Estimation and Cluster Validation on Data With Missing Values
  Niemelä, Marko; Äyrämö, Sami; Kärkkäinen, Tommi (Institute of Electrical and Electronics Engineers (IEEE), 2022)
  Missing data are unavoidable in the real-world application of unsupervised machine learning, and their nonoptimal processing may decrease the quality of data-driven models. Imputation is a common remedy for missing values, ...