Comparative analysis of data stream processing systems

Zeb, Mian Shah

dc.contributor.advisor	Khriyenko, Oleksiy
dc.contributor.advisor	Terziyan, Vagan
dc.contributor.author	Zeb, Mian Shah
dc.date.accessioned	2020-02-24T11:33:39Z
dc.date.available	2020-02-24T11:33:39Z
dc.date.issued	2020
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/67932
dc.description.abstract	Big data processing systems are becoming more stream-oriented: data is processed continuously, as soon as it arrives. Previously, data was typically stored in a database, a file system, or another storage system, and applications queried it as needed. Stream processing is the processing of data in motion; it operates on continuous data retrieved from multiple sources. Instead of periodically collecting large static data sets, streaming frameworks process data as soon as it becomes available, thereby reducing latency. This thesis conducts a comparative analysis of stream processing frameworks based on selected features. The research focuses on Apache Samza, Apache Flink, Apache Storm, and Apache Spark Structured Streaming. The thesis also explains Apache Kafka, a log-based data store widely used with streaming frameworks.	en
dc.format.extent	48
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject.other	Stream Processing
dc.subject.other	Batch Processing
dc.subject.other	Apache Kafka
dc.subject.other	Apache Samza
dc.subject.other	Streaming Engines
dc.title	Comparative analysis of data stream processing systems
dc.identifier.urn	URN:NBN:fi:jyu-202002242154
dc.type.ontasot	Master’s thesis	en
dc.contributor.tiedekunta	Faculty of Information Technology	en
dc.contributor.laitos	Information Technology	en
dc.contributor.yliopisto	University of Jyväskylä	en
dc.contributor.oppiaine	Computer Science	en
dc.rights.copyright	This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.	en
dc.type.publication	masterThesis
dc.contributor.oppiainekoodi	601
dc.subject.yso	data processing
dc.subject.yso	big data
dc.subject.yso	data systems
dc.subject.yso	information technology
dc.subject.yso	data
dc.subject.yso	streaming
dc.format.content	fulltext
dc.type.okm	G2

Files in this item

Name: URN:NBN:fi:jyu-202002242154.pdf
Size: 426.7 KB
Format: PDF

This item appears in the following collections

Master's theses [29561]


Similar items

Information Technology–Supported value Co-Creation and Co-Destruction via social interaction and resource integration in service systems

The determinants affecting on the investment proposals adoption

The use of Electronic Information Systems in social work : A scoping review of the empirical articles published between 2000 and 2019

Features of System-Information Models of the Mechanical Process Based on the Platform (USIS + PLSI) of Digital Twins

Interdisciplinary perceptions on comparing systems analysis and design to the practices of digital service design
