University of Jyväskylä | JYX Digital Repository

Automatic identification of architecture and endianness using binary file contents

Full text: 763.9 Kb

Authors
Kairajärvi, Sami
Date
2019
Discipline
Mathematical Information Technology (Tietotekniikka)
Copyright
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.

 
This thesis explores how the architecture and endianness of executable code can be identified from binary file contents; a recent study by Costin et al. (2014) found that misidentifying the architecture caused about 10% of firmware analysis failures. A literature review was performed to identify the current state-of-the-art methods and how they could be improved in terms of algorithms, performance, data sets, and support tools. The review identified the methods presented by Clemens (2015) and De Nicolao et al. (2018) as the state of the art and found that they achieved good results. However, these methods lacked essential tools for acquiring or building the data sets, and their classifier performance on full binaries needed a more comprehensive comparison. An experimental evaluation was performed to test classifier performance in different situations. For example, when the classifiers were trained and tested with only the code sections of executable files, all of them performed equally well, achieving over 98% accuracy. On samples with very small code sections, 3-nearest neighbors and SVM performed best, achieving 90% accuracy at 128 bytes. Random forest performed best at classifying full binaries, reaching 90% accuracy when trained with code sections and 99.2% when trained with full binaries. ...
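To make the classification setup concrete, here is a minimal sketch, not the thesis's implementation. It assumes a common approach in this line of work: describe a chunk of code by its byte-frequency histogram, add a few endianness-sensitive features, and classify the resulting vector with 3-nearest neighbors. The pattern list `ENDIAN_BIGRAMS`, the functions `extract_features` and `knn_predict`, and the labels `toy-a`/`toy-b` are hypothetical illustrations. The labels stand in for real architecture/endianness classes, and the training chunks are synthetic.

```python
from __future__ import annotations

import math
from collections import Counter

# Hypothetical endianness-sensitive two-byte patterns (an assumption for
# illustration): small 16-bit constants such as 1 (0x0001) or -2 (0xFFFE)
# are stored with their bytes in opposite orders on big- vs little-endian targets.
ENDIAN_BIGRAMS = (b"\x00\x01", b"\x01\x00", b"\xff\xfe", b"\xfe\xff")


def extract_features(data: bytes) -> list[float]:
    """Map a chunk of binary code to a fixed-length feature vector.

    The first 256 entries are byte frequencies normalized by chunk length;
    the remaining entries are relative counts of ENDIAN_BIGRAMS.
    """
    n = len(data)
    if n == 0:
        return [0.0] * (256 + len(ENDIAN_BIGRAMS))
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    histogram = [counts.get(b, 0) / n for b in range(256)]
    pairs = max(n - 1, 1)
    # Count overlapping two-byte windows, then keep only the patterns of interest.
    windows = Counter(data[i:i + 2] for i in range(n - 1))
    bigram_features = [windows.get(p, 0) / pairs for p in ENDIAN_BIGRAMS]
    return histogram + bigram_features


def knn_predict(train: list[tuple[list[float], str]], sample: list[float], k: int = 3) -> str:
    """Classify `sample` by majority vote among its k nearest training vectors."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], sample))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common keeps first-seen order on ties, so a tie goes to the
    # label of the nearer neighbor.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic 128-byte chunks; real training data would be code sections
    # extracted from binaries of known architecture and endianness.
    train = [
        (extract_features(b"\x90" * 128), "toy-a"),
        (extract_features(b"\x90" * 120 + b"\x00" * 8), "toy-a"),
        (extract_features(b"\x00" * 128), "toy-b"),
        (extract_features(b"\x00" * 120 + b"\x90" * 8), "toy-b"),
    ]
    print(knn_predict(train, extract_features(b"\x90" * 100 + b"\x00" * 28)))  # prints "toy-a"
```

Normalizing every feature by chunk length keeps vectors comparable whether the input is a 128-byte sample or a full binary. A plausible (unverified here) reason very small samples are harder to classify is that their histograms are much noisier than those of large code sections.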
Keywords
Firmware Analysis, Supervised Machine Learning, Classification, Binary Code
Dataset(s) related to the publication
https://github.com/kairis/isadetect
https://etsin.fairdata.fi/dataset/80fa69af-addb-4f9a-b45c-c16011bae366
URI

http://urn.fi/URN:NBN:fi:jyu-201904182217

Collections
  • Master's theses (Pro gradu -tutkielmat) [25543]

Related items

Showing items with similar title or keywords.

  • Automatic image‐based identification and biomass estimation of invertebrates 

    Ärje, Johanna; Melvad, Claus; Jeppesen, Mads Rosenhøj; Madsen, Sigurd Agerskov; Raitoharju, Jenni; Rasmussen, Maria Strandgård; Iosifidis, Alexandros; Tirronen, Ville; Gabbouj, Moncef; Meissner, Kristian; Høye, Toke Thomas (Wiley, 2020)
    Understanding how biological communities respond to environmental changes is a key challenge in ecology and ecosystem management. The apparent decline of insect populations necessitates more biomonitoring but the time-consuming ...
  • Towards Automated Classification of Firmware Images and Identification of Embedded Devices 

    Costin, Andrei; Zarras, Apostolis; Francillon, Aurélien (Springer, 2017)
    Embedded systems, as opposed to traditional computers, bring an incredible diversity. The number of devices manufactured is constantly increasing and each has a dedicated software, commonly known as firmware. Full ...
  • Minimal learning machine in hyperspectral imaging classification 

    Hakola, Anna-Maria; Pölönen, Ilkka (SPIE, 2020)
    A hyperspectral (HS) image is typically a stack of frames, where each frame represents the intensity of a different wavelength of light. Each spatial pixel has a spectrum. In the classification of the HS image, each spectrum ...
  • Automatic sleep scoring : a deep learning architecture for multi-modality time series 

    Yan, Rui; Li, Fan; Zhou, Dong Dong; Ristaniemi, Tapani; Cong, Fengyu (Elsevier, 2021)
    Background: Sleep scoring is an essential but time-consuming process, and therefore automatic sleep scoring is crucial and urgent to help address the growing unmet needs for sleep research. This ...
  • Problem Transformation Methods with Distance-Based Learning for Multi-Target Regression 

    Hämäläinen, Joonas; Kärkkäinen, Tommi (ESANN, 2020)
    Multi-target regression is a special subset of supervised machine learning problems. Problem transformation methods are used in the field to improve the performance of basic methods. The purpose of this article is to test ...
Unless otherwise specified, publicly available JYX metadata (excluding abstracts) may be freely reused under the CC0 waiver.