Mining Maximal Frequent Patterns in Transactional Databases and Dynamic Data Streams: A Spark-based Approach
Karim, R., Cochez, M., Beyan, O. D., Ahmed, C. F., & Decker, S. (2018). Mining Maximal Frequent Patterns in Transactional Databases and Dynamic Data Streams: A Spark-based Approach. Information Sciences, 432, 278-300. https://doi.org/10.1016/j.ins.2017.11.064
Published in: Information Sciences
Date: 2018
Copyright: © 2017 Elsevier Inc. This is a final draft version of an article whose final and definitive form has been published by Elsevier. Published in this repository with the kind permission of the publisher.
Mining maximal frequent patterns (MFPs) in transactional databases (TDBs) and dynamic data streams (DDSs) is highly important for business intelligence. MFPs, as the smallest set of patterns, help to reveal customers' purchase rules and support market basket analysis (MBA). Although numerous studies have been carried out in this area, most of them extend the main-memory-based Apriori or FP-growth algorithms. These approaches are therefore not only unscalable but also lack parallelism, so they cannot meet the requirements of ever-growing big data sources. In addition, the mining performance of some existing approaches degrades drastically in the presence of null transactions. We therefore propose an efficient approach to mining MFPs with Apache Spark that overcomes these issues. For faster computation and efficient memory utilization, we use a prime-number-based data transformation technique in which the values of individual transactions are preserved. After null transactions and infrequent items are removed, the transformed dataset becomes denser than the original distribution. We tested the proposed algorithms on both real static TDBs and DDSs. Experimental results and performance analysis show that our approach is efficient and scales to large dataset sizes.
...
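The prime-number transformation mentioned in the abstract can be illustrated with a small, single-machine sketch. It assumes that each frequent item is assigned a distinct prime, that a transaction is encoded as the product of its items' primes (so, by unique factorization, pattern containment becomes integer divisibility), and that a "null transaction" is one left empty once infrequent items are removed. The helper names (`first_primes`, `encode`, `support`, `maximal_frequent_patterns`) and the simple level-wise search are hypothetical; this is not the authors' Spark algorithm, and it runs in plain Python without Spark.

```python
from collections import Counter
from math import prod


def first_primes(n):
    """Return the first n primes using simple trial division."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def encode(transactions, min_support):
    """Assign a distinct prime to every frequent item and encode each
    transaction as the product of the primes of its frequent items.

    Assumption (hypothetical): a transaction with no frequent items is
    treated as a null transaction and dropped.
    """
    counts = Counter(item for t in transactions for item in set(t))
    frequent_items = sorted(i for i, c in counts.items() if c >= min_support)
    prime_of = dict(zip(frequent_items, first_primes(len(frequent_items))))
    encoded = []
    for t in transactions:
        kept = {item for item in t if item in prime_of}
        if kept:
            encoded.append(prod(prime_of[item] for item in kept))
    return prime_of, encoded


def support(pattern_code, encoded):
    """Count transactions containing the pattern: the pattern is contained
    exactly when its code divides the transaction's code."""
    return sum(1 for code in encoded if code % pattern_code == 0)


def maximal_frequent_patterns(transactions, min_support):
    """Level-wise search for all frequent itemsets, then keep only those
    with no frequent proper superset (the maximal ones)."""
    prime_of, encoded = encode(transactions, min_support)
    items = sorted(prime_of)

    def code(pattern):
        return prod(prime_of[i] for i in pattern)

    frequent = []
    level = [(i,) for i in items]
    while level:
        kept = [p for p in level if support(code(p), encoded) >= min_support]
        frequent.extend(kept)
        # Extend each frequent sorted tuple only with larger items, so every
        # candidate is generated once from its (frequent) prefix.
        level = [p + (i,) for p in kept for i in items if i > p[-1]]
    codes = {p: code(p) for p in frequent}
    # p is not maximal if another frequent q has a code divisible by p's code.
    return sorted(
        p for p in frequent
        if not any(q != p and codes[q] % codes[p] == 0 for q in frequent)
    )


baskets = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
    ["eggs"],  # becomes empty after filtering, so it is dropped
]
print(maximal_frequent_patterns(baskets, min_support=3))
# [('beer', 'diaper'), ('bread', 'diaper'), ('bread', 'milk'), ('diaper', 'milk')]
```

With a minimum support of 3, `eggs` and `cola` are infrequent, the last basket becomes a null transaction, and the remaining baskets are encoded as `[21, 30, 70, 210, 105]`. The design choice illustrated here is that a subset test reduces to a single integer modulo operation, which is the kind of compact, order-independent representation a distributed implementation could count in parallel.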
Publisher: Elsevier Inc.
ISSN: 0020-0255
Publication in research information system: https://converis.jyu.fi/converis/portal/detail/Publication/27413860
Related items
Showing items with similar title or keywords.
An approach for network outage detection from drive-testing databases
Turkka, Jussi; Chernogorov, Fedor; Brigatti, Kimmo; Ristaniemi, Tapani; Lempiäinen, Jukka (Hindawi Publishing Corporation, 2012). A data-mining framework for analyzing a cellular network drive testing database is described in this paper. The presented method is designed to detect sleeping base stations, network outage, and change of the dominance ...
Frequently Using Passwords Increases Their Memorability - A False Assumption or Reality?
Woods, Naomi (AIS Electronic Library (AISeL), 2017). Password memorability is a significant problem that is getting worse as the numbers grow. As a direct result of memory limitations, adopted insecure password practices have substantial consequences as organizations lose ...
Study of various machine learning approaches to predict default behavior of a borrower based on transactional dataset
Hossain, Mohammad Farhad (2021). Predicting ‘default’ behavior of borrowers is quite challenging and time consuming, although financial institutions require faster and more reliable decision on loan applications to survive in the competitive market. ...
Tourism-related informal interaction in Chembe, Malawi : an ethnographic study
Vuorensyrjä, Katja (2016). This thesis examines tourism-related informal interaction between local Malawians and foreign residents and tourists. The main ethnographic data were collected in Malawi in spring 2013 ...
Real-time sentiment analysis of Twitter public stream
Akhavan Rahnama, Amir (2015). Sentiment analysis on Twitter public stream has been a topic of research recently. Several non-commercial libraries and software were developed to perform sentiment analysis, however none of them performed the analytics ...