Darknet-liikenteen analysointi koneoppimisalgoritmeilla (Analysis of darknet traffic with machine learning algorithms)

Arikainen, Anna

Authors

Arikainen, Anna

Date

2023

Discipline

Mathematical Information Technology

Copyright

This master's thesis tests the Darknet 2020 dataset with the random forest, gradient boosting and logistic regression algorithms. The study was carried out as constructive research. Its material consists of the article "DIDarknet: A Contemporary Approach to Detect and Characterize the Darknet Traffic using Deep Image Learning" by Habibi Lashkari, Kaur and Rahali, researchers at the University of New Brunswick, and the Darknet 2020 dataset they produced. The purpose of the study was to find out how well machine learning algorithms classify the data in the dataset that imitates darknet network traffic, and to compare the results with DIDarknet, the deep learning model presented by those researchers. The result of the study is the accuracy with which each of the machine learning algorithms classifies the dataset's traffic by the Label feature. The random forest algorithm performed the classification task significantly better than the other two algorithms ...

License