Mining Maximal Frequent Patterns in Transactional Databases and Dynamic Data Streams: A Spark-based Approach

Abstract
Mining maximal frequent patterns (MFPs) in transactional databases (TDBs) and dynamic data streams (DDSs) is substantially important for business intelligence. As the smallest set of patterns, MFPs help to reveal customers’ purchase rules and support market basket analysis (MBA). Although numerous studies have been carried out in this area, most of them extend the main-memory based Apriori or FP-growth algorithms. These approaches are therefore not only unscalable but also lack parallelism, so they cannot meet the requirements of ever-increasing big data sources. In addition, the mining performance of some existing approaches degrades drastically in the presence of null transactions. To overcome these issues, we propose an efficient way of mining MFPs with Apache Spark. For faster computation and efficient utilization of memory, we use a prime number based data transformation technique in which the values of individual transactions are preserved. After null transactions and infrequent items are removed, the resulting transformed dataset is denser than the original distribution. We tested our proposed algorithms on both real static TDBs and DDSs. Experimental results and performance analysis show that our approach is efficient and scalable to large dataset sizes.
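The abstract does not spell out the transformation, so the following is only a minimal single-machine sketch of one common reading of a prime number based encoding, not the authors' Spark implementation. It assumes that each frequent item is mapped to a distinct prime, that a transaction is stored as the product of its items' primes, and that a transaction left with fewer than two frequent items counts as a null transaction. The names `first_primes`, `transform`, and `contains` are hypothetical.

```python
from collections import Counter
from math import prod


def first_primes(n):
    """Return the first n primes by trial division (adequate for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def transform(transactions, min_support):
    """Drop infrequent items and null transactions, then encode as prime products.

    Assumption: a transaction with fewer than two frequent items is "null".
    """
    # Support = number of transactions that contain the item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = sorted(i for i, c in counts.items() if c >= min_support)
    prime_of = dict(zip(frequent, first_primes(len(frequent))))

    encoded = []
    for t in transactions:
        kept = {i for i in t if i in prime_of}
        if len(kept) < 2:
            continue  # treated as a null transaction under the assumption above
        encoded.append(prod(prime_of[i] for i in kept))
    return prime_of, encoded


def contains(encoded_transaction, encoded_pattern):
    """A pattern occurs in a transaction iff its product divides the transaction's."""
    return encoded_transaction % encoded_pattern == 0


transactions = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["tea"],
    ["milk", "eggs"],
]
prime_of, encoded = transform(transactions, min_support=2)
print(prime_of)
print(encoded)
```

Because the product of distinct primes factors uniquely, the item set of each transaction can be recovered from its single integer, and a subset test reduces to one divisibility check. In a Spark setting the per-transaction encoding would presumably be applied as a map over a distributed dataset, but that wiring is not shown here.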
Format
Research article
Published
2018
Publisher
Elsevier Inc.
The permanent address of the publication
https://urn.fi/URN:NBN:fi:jyu-201712184760
Review status
Peer reviewed
ISSN
0020-0255
DOI
https://doi.org/10.1016/j.ins.2017.11.064
Language
English
Published in
Information Sciences
Citation
  • Karim, R., Cochez, M., Beyan, O. D., Ahmed, C. F., & Decker, S. (2018). Mining Maximal Frequent Patterns in Transactional Databases and Dynamic Data Streams: A Spark-based Approach. Information Sciences, 432, 278-300. https://doi.org/10.1016/j.ins.2017.11.064
License
Open Access
Copyright
© 2017 Elsevier Inc. This is a final draft version of an article whose final and definitive form has been published by Elsevier. Published in this repository with the kind permission of the publisher.