Machine learning based ISA detection for short shellcodes

Niiranen, Antti

dc.contributor.advisor	Costin, Andrei
dc.contributor.author	Niiranen, Antti
dc.date.accessioned	2021-06-21T11:04:43Z
dc.date.available	2021-06-21T11:04:43Z
dc.date.issued	2021
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/76761
dc.description.abstract	Hyökkäyskoodi (engl. shellcode) on usein käytössä kyberrikollisuudessa, kun tarkoituksena on tunkeutua erilaisiin tietoteknisiin järjestelmiin. Koodi-injektio on yhä toimiva hyökkäysmenetelmä, sillä ohjelmistohaavoittuvuudet eivät ole kadonneet mihinkään. Tyypillisesti tällainen koodi kirjoitetaan konekielellä. Perinteisesti näitä hyökkäyskoodeja on analysoitu takaisinmallintamalla, mutta menetelmän vaikeuden takia on ryhdytty turvautumaan koneoppimiseen, jotta prosessista tulisi helpompi. Tutkielmassa tehdyn kirjallisuuskatsauksen avulla hankittiin tietoa hyökkäyskoodeista, tekoälystä ja koneoppimisesta. Tässä tutkielmassa selvitettiin, kuinka tarkasti viimeisintä tekniikkaa edustava koneoppimispohjainen sovellus havaitsee hyökkäyskoodin käskykanta-arkkitehtuurin. Tutkimus oli kokeellinen ja se suoritettiin virtuaaliympäristössä muun muassa turvallisuuden takia. Työssä rakennettiin reaalimaailmaan perustuva hyökkäyskooditietokanta, joka sisältää noin 20000 hyökkäyskooditiedostoa 15 eri arkkitehtuurille. Koodit hankittiin kolmesta eri lähteestä, jotka ovat Exploit Database, Shell-Storm ja MSFvenom. Näistä koodeista koostettiin pienempi joukko testaamista varten. Tutkimuksen rajoituksia pohdittaessa todettiin, että testitietokanta saattaa olla liian suppea, mutta sen avulla kuitenkin pystyttiin kartoittamaan sovelluksen tämänhetkinen toiminta. Testeissä selvisi, että sovellus ei tällä hetkellä kykene havaitsemaan hyökkäyskoodin käskykanta-arkkitehtuuria riittävällä tarkkuudella. Kahta eri skannausasetusta testattiin, joista molemmat saavuttivat noin 30% tarkkuuden. Sovelluksen luokittelijat testattiin myös, niistä satunnaismetsä toimi parhaiten.	fi
dc.description.abstract	Shellcodes are often used by cybercriminals to breach computer systems. Code injection remains a viable attack method because software vulnerabilities have not ceased to exist. Typically these codes are written in assembly language. The traditional method of analysis has been reverse engineering, but because it can be difficult and time-consuming, machine learning has been used to make the process easier. A literature review was performed to gain an understanding of shellcodes, artificial intelligence and machine learning. This thesis explores how accurately a state-of-the-art machine learning based ISA detection tool can detect the instruction set architecture (ISA) of short shellcodes. The research method was experimental, and the research was conducted in a virtual environment, mainly for safety reasons. Approximately 20000 shellcodes for 15 different architectures were collected from three sources: Exploit Database, Shell-Storm and MSFvenom. From these files, a smaller set of shellcodes was created to test the performance of the tool. In considering the limitations of the study, it was noted that the test set may not be diverse or large enough. Nevertheless, with this set it was possible to gain an understanding of how the program currently handles shellcodes. The study found that, with its current training, the program is not able to reliably detect the ISA of the shellcodes in the database. Two different detection options were tested, and both achieved an accuracy of approximately 30%. The different classifiers were tested as well, and random forest had the best performance.	en
dc.format.extent	94
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject.other	shellcode
dc.subject.other	code analysis
dc.title	Machine learning based ISA detection for short shellcodes
dc.identifier.urn	URN:NBN:fi:jyu-202106213954
dc.type.ontasot	Pro gradu -tutkielma	fi
dc.type.ontasot	Master’s thesis	en
dc.contributor.tiedekunta	Informaatioteknologian tiedekunta	fi
dc.contributor.tiedekunta	Faculty of Information Technology	en
dc.contributor.laitos	Informaatioteknologia	fi
dc.contributor.laitos	Information Technology	en
dc.contributor.yliopisto	Jyväskylän yliopisto	fi
dc.contributor.yliopisto	University of Jyväskylä	en
dc.contributor.oppiaine	Kyberturvallisuus	fi
dc.contributor.oppiaine	Cyber Security	en
dc.rights.copyright	Julkaisu on tekijänoikeussäännösten alainen. Teosta voi lukea ja tulostaa henkilökohtaista käyttöä varten. Käyttö kaupallisiin tarkoituksiin on kielletty.	fi
dc.rights.copyright	This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.	en
dc.type.publication	masterThesis
dc.contributor.oppiainekoodi	601
dc.subject.yso	kyberturvallisuus
dc.subject.yso	koneoppiminen
dc.subject.yso	tekoäly
dc.subject.yso	cyber security
dc.subject.yso	machine learning
dc.subject.yso	artificial intelligence
dc.format.content	fulltext
dc.type.okm	G2

Files in this item

Name:: URN:NBN:fi:jyu-202106213954.pdf
Size:: 560.5Kb
Format:: PDF

This item appears in the following Collection(s)

Pro gradu -tutkielmat [29543]

Related items

Strategic cyber threat intelligence : Building the situational picture with emerging technologies

On Attacking Future 5G Networks with Adversarial Examples : Survey

Adversarial Attack’s Impact on Machine Learning Model in Cyber-Physical Systems

Artificial Intelligence for Cybersecurity : A Systematic Mapping of Literature

Analysing Multidimensional Strategies for Cyber Threat Detection in Security Monitoring