University of Jyväskylä | JYX Digital Repository


Comparing the forecasting performance of logistic regression and random forest models in criminal recidivism


Authors
Aaltonen, Olli-Pekka
Date
2016
Discipline
Information Systems Science (Tietojärjestelmätiede)
Access restrictions
Access to this material is restricted for copyright reasons. It can be read at the workstation reserved for archival materials at Jyväskylä University Library: https://kirjasto.jyu.fi/en/workspaces/facilities.
A copy of this thesis can be requested.

 
In recent years, the criminal sanctions field has developed models that predict recidivism (Tyni, 2015). These models are typically based on register-based measures of, among other things, the convicted person's gender, age, criminal background, and number of prison terms. Such models are usually developed with parametric models such as logistic regression analysis, in which the probability of recidivism is modelled as a linear function of background variables. Alternatives based on machine learning algorithms have recently been developed alongside these models, and they have been found to outperform traditional models at predicting recidivism in practical applications (Berk & Bleich, 2014). However, the performance of such models relative to traditional models has not been tested on Finnish data. The purpose of this thesis is to examine how well different prediction models succeed in their task. In the first phase of the thesis, recidivism prediction models based on logistic regression analysis and on a machine learning algorithm (Random forest) are built using data drawn from the Rikosten ja seuraamusten tutkimusrekisteri (Research Register of Crimes and Sanctions) of the Institute of Criminology and Legal Policy (Kriminologian ja oikeuspolitiikan instituutti), which contains reference convictions from 2005 to 2007. Information on criminal behaviour before and after the reference conviction has also been retrieved for the convicted persons. The prediction model is built with data on persons convicted in 2005 and 2006 and tested with the 2007 data. This simulates a situation in which historical outcome data from an observed dataset is used to predict the not-yet-realized recidivism of a new group of convicted persons. The research question is therefore which of the models can produce the better prediction model on the basis of criminal history information. In the setting of this thesis, both models predict recidivism comparatively well.
However, neither prediction model is better than the other, as the methods produce models of very similar predictive performance. The thesis concludes that, in its research setting, no significant difference is found between the predictive performance of the Random forest machine learning method and the logistic regression model. ...
 
In recent years, predictive models have been created to predict the future criminal behavior (recidivism) of past offenders (e.g. Tyni, 2015). These models are often built on register-based indicators, e.g. the offender's gender, age, criminal background, or prior imprisonments. Usually they are parametric models in which the likelihood of recidivating is modelled as a linear function of independent variables. Lately, machine learning algorithms have been introduced as alternatives to these more traditional models. In a recent American study, machine learning algorithms were reported to predict recidivism more accurately than the more traditional logistic regression model (Berk & Bleich, 2014). However, these machine learning algorithms have not been tested for criminal recidivism prediction on Finnish data. The aim of this thesis is to examine the comparative effectiveness of different risk prediction models in a Finnish setting. Two predictive models for recidivism are created: a logistic regression model and a model based on the machine learning algorithm Random forest. The research data was gathered from the RST database (Rikosten ja seuraamusten tutkimusrekisteri, which translates to "the research register of crimes and sanctions") of the Institute of Criminology and Legal Policy, and includes all offenders convicted of several common types of offense in Finland from 2005 to 2007. The data also includes information on those offenders' past and future criminal behavior. The predictive models are developed with data from 2005 and 2006 and tested with the remaining 2007 data, in order to simulate a situation where predictive models are used to predict recidivism that has yet to occur. The research question asks which of these models performs better in forecasting the criminal recidivism of a previous offender.
The results of this study show that both logistic regression and the Random forest algorithm produce decent predictive models, but neither model outperforms the other on the chosen performance metrics. The answer to the research question is therefore that neither model is better than the other at predicting recidivism among convicted offenders in Finland. ...
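
The out-of-time evaluation design described in the abstract (fit on 2005 and 2006 convictions, test on 2007) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the record fields (`year`, `priors`, `reoffended`), the synthetic data, the stand-in base-rate model, and the use of AUC as the metric (the abstract does not name the performance metrics that were used). Any model that outputs a risk score, such as a logistic regression or a Random forest, could be plugged in at the same point.

```python
import random

def temporal_split(records, train_years, test_years):
    """Split records by conviction year instead of at random."""
    train = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] in test_years]
    return train, test

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_synthetic_records(n_per_year=300, seed=0):
    """Hypothetical data: one record per convicted person and year."""
    rng = random.Random(seed)
    records = []
    for year in (2005, 2006, 2007):
        for _ in range(n_per_year):
            priors = rng.randint(0, 10)
            # Assumed rule for the toy data: more priors, higher risk.
            reoffended = int(rng.random() < 0.1 + 0.07 * priors)
            records.append({"year": year, "priors": priors,
                            "reoffended": reoffended})
    return records

def fit_base_rate_model(train):
    """Stand-in model: observed reoffending rate per number of priors."""
    totals, hits = {}, {}
    for r in train:
        totals[r["priors"]] = totals.get(r["priors"], 0) + 1
        hits[r["priors"]] = hits.get(r["priors"], 0) + r["reoffended"]
    overall = sum(hits.values()) / sum(totals.values())
    rates = {k: hits[k] / totals[k] for k in totals}
    # Fall back to the overall rate for prior counts unseen in training.
    return lambda r: rates.get(r["priors"], overall)

records = make_synthetic_records()
train, test = temporal_split(records, {2005, 2006}, {2007})
model = fit_base_rate_model(train)
test_auc = auc([r["reoffended"] for r in test], [model(r) for r in test])
print(f"train={len(train)} test={len(test)} AUC={test_auc:.3f}")
```

Splitting by year rather than at random keeps every test case later in time than the training cases, which mirrors the forecasting situation the abstract describes. Comparing two models in this setup amounts to computing the same metric for each on the identical 2007 test set.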
 
Keywords
Recidivism; machine learning; Random forest; logistic regression; forecasting; uusintarikollisuus (recidivism); ennusteet (forecasts); regressioanalyysi (regression analysis); koneoppiminen (machine learning)
URI

http://urn.fi/URN:NBN:fi:jyu-201611234724

Collections
  • Master's theses (Pro gradu -tutkielmat) [24534]

Related items

Showing items with similar title or keywords.

  • Comparison of machine learning and logistic regression as predictive models for adverse maternal and neonatal outcomes of preeclampsia : A retrospective study 

    Zheng, Dongying; Hao, Xinyu; Khan, Muhanmmad; Wang, Lixia; Li, Fan; Xiang, Ning; Kang, Fuli; Hamalainen, Timo; Cong, Fengyu; Song, Kedong; Qiao, Chong (Frontiers Media SA, 2022)
    Introduction: Preeclampsia, one of the leading causes of maternal and fetal morbidity and mortality, demands accurate predictive models for the lack of effective treatment. Predictive models based on machine learning ...
  • Do Randomized Algorithms Improve the Efficiency of Minimal Learning Machine? 

    Linja, Joakim; Hämäläinen, Joonas; Nieminen, Paavo; Kärkkäinen, Tommi (MDPI AG, 2020)
    Minimal Learning Machine (MLM) is a recently popularized supervised learning method, which is composed of distance-regression and multilateration steps. The computational complexity of MLM is dominated by the solution of ...
  • Intelligent solutions for real-life data-driven applications 

    Ivannikova, Elena (University of Jyväskylä, 2017)
    The subject of this thesis belongs to the topic of machine learning or, specifically, to the development of advanced methods for regression analysis, clustering, and anomaly detection. Industry is constantly seeking ...
  • Eight Simple Guidelines for Improved Understanding of Transformations and Nonlinear Effects 

    Rönkkö, Mikko; Aalto, Eero; Tenhunen, Henni; Aguirre-Urreta, Miguel I. (SAGE Publications, 2022)
    Transforming variables before analysis or applying a transformation as a part of a generalized linear model are common practices in organizational research. Several methodological articles addressing the topic, either ...
  • Problem Transformation Methods with Distance-Based Learning for Multi-Target Regression 

    Hämäläinen, Joonas; Kärkkäinen, Tommi (ESANN, 2020)
    Multi-target regression is a special subset of supervised machine learning problems. Problem transformation methods are used in the field to improve the performance of basic methods. The purpose of this article is to test ...
Unless otherwise specified, publicly available JYX metadata (excluding abstracts) may be freely reused under the CC0 waiver.