Big data : challenges, ecosystems and technologies

Rautiainen, Wiljam

dc.contributor.advisor	Saarela, Mirka
dc.contributor.advisor	Hämäläinen, Joonas
dc.contributor.author	Rautiainen, Wiljam
dc.date.accessioned	2022-06-21T10:36:25Z
dc.date.available	2022-06-21T10:36:25Z
dc.date.issued	2022
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/81931
dc.description.abstract	Data collection and management have undergone significant changes over the past 50 years, introducing new methods and technologies for managing and storing data. Data is used increasingly in many areas of society, and we now generate enormous amounts of it. This growing volume has created new challenges in how data is used. Big data has become a broad term for datasets so large that traditional data processing applications cannot process them. Big data has given rise to new technologies and ecosystems for processing these datasets. The terms data lake, data warehouse, Apache Hadoop, and Apache Spark are often associated with big data applications. This thesis explores what big data is and what components its ecosystem consists of. The thesis first examines how data management has evolved over time and how we have arrived at the current situation. It then examines how big data is defined in the academic literature and what parts its ecosystem consists of. Next, the thesis examines the two most common big data processing technologies, Apache Hadoop and Apache Spark. In sum, this thesis aims to clarify the term big data and to study how its various aspects are defined in the academic literature.	en
dc.format.extent	56
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.rights	In Copyright	en
dc.subject.other	big data ecosystems
dc.subject.other	Apache Spark
dc.title	Big data : challenges, ecosystems and technologies
dc.type	master thesis
dc.identifier.urn	URN:NBN:fi:jyu-202206213538
dc.type.ontasot	Master’s thesis	en
dc.contributor.tiedekunta	Faculty of Information Technology	en
dc.contributor.laitos	Information Technology	en
dc.contributor.yliopisto	University of Jyväskylä	en
dc.contributor.oppiaine	Mathematical Information Technology	en
dc.type.coar	http://purl.org/coar/resource_type/c_bdcc
dc.rights.accesslevel	restrictedAccess
dc.type.publication	masterThesis
dc.contributor.oppiainekoodi	602
dc.subject.yso	big data
dc.subject.yso	Apache Hadoop
dc.format.content	fulltext
dc.rights.url	https://rightsstatements.org/page/InC/1.0/
dc.rights.accessrights	The author has not given permission to make the work publicly available electronically. Therefore, the material can be read only at the archival workstation at Jyväskylä University Library (https://kirjasto.jyu.fi/collections/archival-workstation).	en
dc.type.okm	G2

Files in this item

Name: URN:NBN:fi:jyu-202206213538.pdf
Size: 575.4 KB
Format: PDF

This item appears in the following collection

Master's theses (Pro gradu -tutkielmat) [29747]

Similar items

Quantum Software Ecosystem : Stakeholders, Interactions and Challenges

Self-Sovereign Identity Ecosystems : Benefits and Challenges

Exploring the Finnish Impact Investing Ecosystem : Perspectives on Challenges from Technology Startups

Challenge to define and quantify ecosystem collapse debt

Are we solving the right challenges? : evaluating the roles and responsibilities of public governance in emerging talent hub ecosystems : case study: City of Jyväskylä
