Mining Maximal Frequent Patterns in Transactional Databases and Dynamic Data Streams: A Spark-based Approach

Karim, Rezaul; Cochez, Michael; Beyan, Oya Deniz; Ahmed, Chowdhury Farhan; Decker, Stefan

doi:10.1016/j.ins.2017.11.064

dc.contributor.author	Karim, Rezaul
dc.contributor.author	Cochez, Michael
dc.contributor.author	Beyan, Oya Deniz
dc.contributor.author	Ahmed, Chowdhury Farhan
dc.contributor.author	Decker, Stefan
dc.date.accessioned	2017-12-19T08:42:20Z
dc.date.available	2020-03-01T22:35:44Z
dc.date.issued	2018
dc.identifier.citation	Karim, R., Cochez, M., Beyan, O. D., Ahmed, C. F., & Decker, S. (2018). Mining Maximal Frequent Patterns in Transactional Databases and Dynamic Data Streams: A Spark-based Approach. Information Sciences, 432, 278-300. https://doi.org/10.1016/j.ins.2017.11.064
dc.identifier.other	CONVID_27413860
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/56418
dc.description.abstract	Mining maximal frequent patterns (MFPs) in transactional databases (TDBs) and dynamic data streams (DDSs) is substantially important for business intelligence. MFPs, as the smallest set of patterns, help to reveal customers’ purchase rules and support market basket analysis (MBA). Although numerous studies have been carried out in this area, most of them extend the main-memory based Apriori or FP-growth algorithms. Therefore, these approaches are not only unscalable but also lack parallelism. Consequently, the requirements of ever-increasing big data sources cannot be met. In addition, mining performance in some existing approaches degrades drastically due to the presence of null transactions. We therefore propose an efficient way to mine MFPs with Apache Spark to overcome these issues. For faster computation and efficient utilization of memory, we use a prime number based data transformation technique, in which the values of individual transactions are preserved. After removing null transactions and infrequent items, the resulting transformed dataset becomes denser compared to the original distributions. We tested our proposed algorithms on both real static TDBs and DDSs. Experimental results and performance analysis show that our approach is efficient and scalable to large dataset sizes.
dc.language.iso	eng
dc.publisher	Elsevier Inc.
dc.relation.ispartofseries	Information Sciences
dc.subject.other	transactional databases
dc.subject.other	dynamic data streams
dc.subject.other	null transactions
dc.subject.other	prime number theory
dc.subject.other	apache spark
dc.subject.other	maximal frequent patterns
dc.title	Mining Maximal Frequent Patterns in Transactional Databases and Dynamic Data Streams: A Spark-based Approach
dc.type	research article
dc.identifier.urn	URN:NBN:fi:jyu-201712184760
dc.contributor.laitos	Informaatioteknologian tiedekunta	fi
dc.contributor.laitos	Faculty of Information Technology	en
dc.contributor.oppiaine	Tietotekniikka	fi
dc.contributor.oppiaine	Mathematical Information Technology	en
dc.type.uri	http://purl.org/eprint/type/JournalArticle
dc.date.updated	2017-12-18T13:15:21Z
dc.type.coar	http://purl.org/coar/resource_type/c_2df8fbb1
dc.description.reviewstatus	peerReviewed
dc.format.pagerange	278-300
dc.relation.issn	0020-0255
dc.relation.numberinseries	0
dc.relation.volume	432
dc.type.version	acceptedVersion
dc.rights.copyright	© 2017 Elsevier Inc. This is a final draft version of an article whose final and definitive form has been published by Elsevier. Published in this repository with the kind permission of the publisher.
dc.rights.accesslevel	openAccess	fi
dc.type.publication	article
dc.subject.yso	tiedonlouhinta
dc.subject.yso	big data
jyx.subject.uri	http://www.yso.fi/onto/yso/p5520
jyx.subject.uri	http://www.yso.fi/onto/yso/p27202
dc.relation.doi	10.1016/j.ins.2017.11.064
dc.type.okm	A1
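
The abstract mentions a prime number based data transformation, the removal of infrequent items and null transactions, and the mining of maximal frequent patterns, but it does not give the details. The following is a minimal sketch of one way such a pipeline could look in plain Python, not the authors' Spark implementation. The function names (`first_primes`, `transform`, `maximal_frequent`), the choice to encode each transaction as the product of its items' primes, and the reading of "null transaction" as a transaction left with no frequent item are all assumptions made for illustration.

```python
from itertools import combinations
from math import prod


def first_primes(n):
    """Return the first n primes by trial division (fine for a toy example)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def transform(transactions, min_support):
    """Drop infrequent items and emptied transactions, then encode the rest.

    Assumption: each frequent item gets a distinct prime, and a transaction
    is encoded as the product of the primes of its remaining items.
    """
    counts = {}
    for t in transactions:
        for item in set(t):
            counts[item] = counts.get(item, 0) + 1
    frequent = sorted(i for i, c in counts.items() if c >= min_support)
    prime_of = dict(zip(frequent, first_primes(len(frequent))))
    encoded = []
    for t in transactions:
        kept = {i for i in t if i in prime_of}
        if kept:  # assumption: a transaction with no frequent item is discarded
            encoded.append(prod(prime_of[i] for i in kept))
    return prime_of, encoded


def maximal_frequent(prime_of, encoded, min_support):
    """Brute-force maximal frequent itemsets; support is counted by divisibility."""
    items = sorted(prime_of)
    frequent = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            code = prod(prime_of[i] for i in combo)
            # An itemset is contained in a transaction iff its code divides
            # the transaction's code (unique prime factorization).
            if sum(1 for e in encoded if e % code == 0) >= min_support:
                frequent.append(frozenset(combo))
    # Keep only itemsets that have no frequent proper superset.
    return [s for s in frequent if not any(s < other for other in frequent)]
```

With the hypothetical input `[["a", "b", "c"], ["a", "b"], ["a", "c"], ["d"], ["b", "c"]]` and a minimum support of 2, item `d` is infrequent, the transaction `["d"]` is dropped, and the maximal frequent patterns are the three pairs. The exhaustive enumeration is exponential in the number of items and is only meant to make the definitions concrete.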

Files in this item

Name: karimym1s2.0s002002551731126xm ...
Size: 1.426Mb
Format: PDF
Description: Final Draft

This item appears in the following Collection(s)

Informaatioteknologian tiedekunta [2330]

Related items

An approach for network outage detection from drive-testing databases

Frequently Using Passwords Increases Their Memorability - A False Assumption or Reality?

Study of various machine learning approaches to predict default behavior of a borrower based on transactional dataset

Tourism-related informal interaction in Chembe, Malawi: an ethnographic study

Real-time sentiment analysis of Twitter public stream
