Show simple item record

dc.contributor.author: Karim, Rezaul
dc.contributor.author: Cochez, Michael
dc.contributor.author: Beyan, Oya Deniz
dc.contributor.author: Ahmed, Chowdhury Farhan
dc.contributor.author: Decker, Stefan
dc.date.accessioned: 2017-12-19T08:42:20Z
dc.date.available: 2020-03-01T22:35:44Z
dc.date.issued: 2018
dc.identifier.citation: Karim, R., Cochez, M., Beyan, O. D., Ahmed, C. F., & Decker, S. (2018). Mining Maximal Frequent Patterns in Transactional Databases and Dynamic Data Streams: A Spark-based Approach. Information Sciences, 432, 278-300. https://doi.org/10.1016/j.ins.2017.11.064
dc.identifier.other: CONVID_27413860
dc.identifier.uri: https://jyx.jyu.fi/handle/123456789/56418
dc.description.abstract: Mining maximal frequent patterns (MFPs) in transactional databases (TDBs) and dynamic data streams (DDSs) is substantially important for business intelligence. MFPs, as the smallest set of patterns, help to reveal customers' purchase rules and support market basket analysis (MBA). Although numerous studies have been carried out in this area, most of them extend the main-memory based Apriori or FP-growth algorithms. Therefore, these approaches are not only unscalable but also lack parallelism. Consequently, the requirements of ever-increasing big data sources cannot be met. In addition, mining performance in some existing approaches degrades drastically due to the presence of null transactions. We therefore propose an efficient way to mine MFPs with Apache Spark to overcome these issues. For faster computation and efficient utilization of memory, we use a prime number based data transformation technique in which the values of individual transactions are preserved. After removing null transactions and infrequent items, the resulting transformed dataset becomes denser than the original distribution. We tested our proposed algorithms on both real static TDBs and DDSs. Experimental results and performance analysis show that our approach is efficient and scalable to large dataset sizes.
dc.language.iso: eng
dc.publisher: Elsevier Inc.
dc.relation.ispartofseries: Information Sciences
dc.subject.other: transactional databases
dc.subject.other: dynamic data streams
dc.subject.other: null transactions
dc.subject.other: prime number theory
dc.subject.other: apache spark
dc.subject.other: maximal frequent patterns
dc.title: Mining Maximal Frequent Patterns in Transactional Databases and Dynamic Data Streams: A Spark-based Approach
dc.type: research article
dc.identifier.urn: URN:NBN:fi:jyu-201712184760
dc.contributor.laitos: Informaatioteknologian tiedekunta (fi)
dc.contributor.laitos: Faculty of Information Technology (en)
dc.contributor.oppiaine: Tietotekniikka (fi)
dc.contributor.oppiaine: Mathematical Information Technology (en)
dc.type.uri: http://purl.org/eprint/type/JournalArticle
dc.date.updated: 2017-12-18T13:15:21Z
dc.type.coar: http://purl.org/coar/resource_type/c_2df8fbb1
dc.description.reviewstatus: peerReviewed
dc.format.pagerange: 278-300
dc.relation.issn: 0020-0255
dc.relation.numberinseries: 0
dc.relation.volume: 432
dc.type.version: acceptedVersion
dc.rights.copyright: © 2017 Elsevier Inc. This is a final draft version of an article whose final and definitive form has been published by Elsevier. Published in this repository with the kind permission of the publisher.
dc.rights.accesslevel: openAccess (fi)
dc.type.publication: article
dc.subject.yso: tiedonlouhinta (data mining)
dc.subject.yso: big data
jyx.subject.uri: http://www.yso.fi/onto/yso/p5520
jyx.subject.uri: http://www.yso.fi/onto/yso/p27202
dc.relation.doi: 10.1016/j.ins.2017.11.064
dc.type.okm: A1

