Scalable implementation of dependence clustering in Apache Spark
Ivannikova, E. (2017). Scalable implementation of dependence clustering in Apache Spark. In I. Škrjanc, & S. Blažič (Eds.), EAIS 2017 : Proceedings of the 2017 Evolving and Adaptive Intelligent Systems (EAIS) (pp. 1-6). IEEE. https://doi.org/10.1109/EAIS.2017.7954843
Authors
Ivannikova, E.
Date
2017
Copyright
© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
This article proposes a scalable version of the Dependence Clustering
algorithm, which belongs to the class of spectral clustering methods.
The method is implemented in Apache Spark using GraphX API primitives.
A fast approximate diffusion procedure that enables
spectral-clustering-type algorithms in the Spark environment is also
introduced, and the proposed algorithm is benchmarked against Spectral
clustering. Results on real-life data indicate that the implementation
scales well while demonstrating good performance on densely connected
graphs.
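The abstract does not describe the diffusion procedure itself. As a generic illustration only, and not the paper's approximate method or its GraphX implementation, the following Python sketch shows exact random-walk diffusion on a small weighted graph, assuming that "diffusion" here means propagating transition probabilities over several steps. The function names `row_normalize` and `diffuse` and the toy graph are hypothetical.

```python
def row_normalize(weights):
    """Turn a non-negative weight matrix into a row-stochastic transition matrix."""
    transition = []
    for row in weights:
        total = sum(row)
        if total == 0:
            raise ValueError("every node needs at least one edge")
        transition.append([w / total for w in row])
    return transition


def diffuse(transition, steps):
    """Return the `steps`-step transition matrix by repeated multiplication."""
    n = len(transition)
    # Start from the identity matrix (zero steps taken).
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = [
            [sum(result[i][k] * transition[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return result


# Toy graph (assumption): triangles {0,1,2} and {3,4,5} joined by one weak edge 2-3.
weights = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
transition = row_normalize(weights)
diffused = diffuse(transition, steps=3)

# Probability that a walker starting at node 0 is still inside its own triangle.
mass_inside = sum(diffused[0][:3])
```

In this toy setting most of the probability mass stays inside the densely connected triangle, which is the kind of structure that clustering methods based on random walks exploit. The dense matrix products used here cost cubic time per step, which is one reason a distributed setting would call for an approximate procedure.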
Publisher
IEEE
Parent publication ISBN
978-1-5090-6444-1
Conference
Evolving and Adaptive Intelligent Systems
Is part of publication
EAIS 2017 : Proceedings of the 2017 Evolving and Adaptive Intelligent Systems (EAIS)
ISSN
2473-4691
Publication in research information system
https://converis.jyu.fi/converis/portal/detail/Publication/27339946
Related items
Showing items with a similar title or keywords.
- Algorithmic issues in computational intelligence optimization : from design to implementation, from implementation to design
  Caraffini, Fabio (University of Jyväskylä, 2016)
  The vertiginous technological growth of the last decades has generated a variety of powerful and complex systems. By embedding within modern hardware devices sophisticated software, they allow the solution of complicated ...
- Improving Scalable K-Means++
  Hämäläinen, Joonas; Kärkkäinen, Tommi; Rossi, Tuomo (MDPI AG, 2021)
  Two new initialization methods for K-means clustering are proposed. Both proposals are based on applying a divide-and-conquer approach for the K-means‖ type of an initialization strategy. The second proposal also uses ...
- Comparison of Internal Clustering Validation Indices for Prototype-Based Clustering
  Hämäläinen, Joonas; Jauhiainen, Susanne; Kärkkäinen, Tommi (MDPI, 2017)
  Clustering is an unsupervised machine learning and pattern recognition method. In general, in addition to revealing hidden groups of similar observations and clusters, their number needs to be determined. Internal ...
- Knowledge mining using robust clustering
  Äyrämö, Sami (University of Jyväskylä, 2006)
  In his doctoral dissertation, Sami Äyrämö, M.Sc., studied the efficient utilization of large masses of digital data and the computationally intelligent methods, known as data mining, that are applied to it. The topic is timely, ...
- Toolbox for Distance Estimation and Cluster Validation on Data With Missing Values
  Niemelä, Marko; Äyrämö, Sami; Kärkkäinen, Tommi (Institute of Electrical and Electronics Engineers (IEEE), 2022)
  Missing data are unavoidable in the real-world application of unsupervised machine learning, and their nonoptimal processing may decrease the quality of data-driven models. Imputation is a common remedy for missing values, ...
Unless otherwise noted, publicly available JYX metadata (excluding abstracts) may be freely reused under the CC0 license.