Comparing the forecasting performance of logistic regression and random forest models in criminal recidivism

Aaltonen, Olli-Pekka

dc.contributor.advisor	Veijalainen, Jari
dc.contributor.author	Aaltonen, Olli-Pekka
dc.date.accessioned	2016-11-23T08:10:17Z
dc.date.available	2016-11-23T08:10:17Z
dc.date.issued	2016
dc.identifier.other	oai:jykdok.linneanet.fi:1643461
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/51967
dc.description.abstract	Rikosseuraamusalalla on viime vuosina kehitetty uusintarikollisuutta ennustavia malleja (Tyni, 2015), jotka perustuvat tyypillisesti rekisteripohjaisiin mittareihin, jotka mittaavat mm. tuomitun sukupuolta, ikää, rikostaustaa ja vankikertaisuutta. Yleensä tällaisten mallien kehityksessä käytetään logistisen regressioanalyysin kaltaisia parametrisia malleja, joissa uusintarikollisuuden todennäköisyyttä mallinnetaan taustamuuttujien lineaarisena funktiona. Näiden mallien rinnalle on viime aikoina kehitetty koneoppimisalgoritmeihin perustuvia vaihtoehtoja, joiden on todettu suoriutuvan käytännön sovelluksissa uusintarikollisuuden ennustamisessa perinteisiä malleja paremmin (Berk & Bleich, 2014). Tällaisten mallien toimivuutta suhteessa perinteisiin malleihin ei ole kuitenkaan testattu suomalaisella datalla. Tutkielman tarkoituksena on tarkastella sitä, kuinka hyvin erilaiset ennustemallit onnistuvat tehtävässään. Tutkielman ensimmäisessä vaiheessa luodaan logistiseen regressioanalyysiin ja koneoppimisalgoritmiin (Random forest) perustuvat uusintarikollisuutta ennustavat mallit Kriminologian ja oikeuspolitiikan instituutin Rikosten ja seuraamusten tutkimusrekisteristä poimitulla aineistolla, joka sisältää referenssituomioita vuosilta 2005–2007. Tuomituille henkilöille on haettu tietoa myös referenssituomiota edeltävästä ja seuraavasta rikoskäyttäytymisestä. Ennustemalli luodaan vuosien 2005–2006 välillä tuomittujen aineistolla, ja ennustemallia testataan vuoden 2007 datalla. Näin simuloidaan tilannetta, jossa havaittuun aineistoon perustuvalla historiallisella toteumatiedolla ennustetaan uuden tuomittujen ryhmän vielä toteutumatonta uusintarikollisuutta. Tutkimuskysymyksenä kysytäänkin, kumpi malleista pystyy luomaan rikoshistoriatiedon perusteella paremman ennustusmallin. Molemmat mallit ennustavat uusintarikollisuutta tutkielman asetelmassa verrattain hyvin. 
Kumpikaan ennustemalli ei kuitenkaan ole toista parempi, sillä menetelmät tuottavat ennustustehokkuudeltaan varsin samantasoiset mallit. Tutkielman tuloksena todetaan, ettei Random forest -koneoppimismenetelmän ja logistisen regressiomallin ennustustehokkuuden välille saada merkittävää eroa tutkielman asetelmalla.	fi
dc.description.abstract	In recent years, predictive models have been developed to forecast the future criminal behavior (recidivism) of past offenders (e.g. Tyni, 2015). Such models are typically built on register-based indicators such as the offender’s gender, age, criminal background, and prior imprisonments. They are usually parametric models, such as logistic regression, in which the likelihood of recidivating is modelled as a linear function of the independent variables. Lately, machine learning algorithms have been introduced as alternatives to these more traditional models. In a recent American study, machine learning algorithms were reported to predict recidivism more accurately than the traditional logistic regression model (Berk & Bleich, 2014). However, these algorithms have not been tested for recidivism prediction on Finnish data. The aim of this thesis is to examine the comparative effectiveness of different risk prediction models in a Finnish setting. Two predictive models for recidivism are created: a logistic regression model and a model based on the machine learning algorithm Random forest. The research data was gathered from the RST database (Rikosten ja seuraamusten tutkimusrekisteri, “the research register of crimes and sanctions”) of the Institute of Criminology and Legal Policy, and includes all offenders convicted of several common types of offenses in Finland from 2005 to 2007. The data also includes information on the past and subsequent criminal behavior of those offenders. The predictive models are developed with data from 2005 and 2006 and tested with the remaining 2007 data, in order to simulate a situation where predictive models are used to predict recidivism that has not yet occurred. The research question asks which of these models performs better in forecasting the criminal recidivism of a previous offender. 
The results of this study show that both logistic regression and the Random forest algorithm produce decent predictive models, but neither outperforms the other on the chosen performance metrics. The answer to the research question is therefore that neither model is better than the other at predicting recidivism among convicted offenders in Finland.	en
dc.format.extent	1 verkkoaineisto (53 sivua)
dc.format.mimetype	application/pdf
dc.language.iso	eng
dc.rights	This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.	en
dc.rights	Julkaisu on tekijänoikeussäännösten alainen. Teosta voi lukea ja tulostaa henkilökohtaista käyttöä varten. Käyttö kaupallisiin tarkoituksiin on kielletty.	fi
dc.subject.other	Recidivism
dc.subject.other	machine learning
dc.subject.other	Random forest
dc.subject.other	logistic regression
dc.subject.other	forecasting
dc.title	Comparing the forecasting performance of logistic regression and random forest models in criminal recidivism
dc.identifier.urn	URN:NBN:fi:jyu-201611234724
dc.type.ontasot	Master’s thesis	en
dc.type.ontasot	Pro gradu -tutkielma	fi
dc.contributor.tiedekunta	Faculty of Information Technology	en
dc.contributor.tiedekunta	Informaatioteknologian tiedekunta	fi
dc.contributor.laitos	Tietojenkäsittelytieteiden laitos	fi
dc.contributor.laitos	Department of Computer Science and Information Systems	en
dc.contributor.yliopisto	University of Jyväskylä	en
dc.contributor.yliopisto	Jyväskylän yliopisto	fi
dc.contributor.oppiaine	Information Systems Science	en
dc.contributor.oppiaine	Tietojärjestelmätiede	fi
dc.date.updated	2016-11-23T08:10:18Z
dc.rights.accesslevel	restrictedAccess	fi
dc.type.publication	masterThesis
dc.contributor.oppiainekoodi	601
dc.subject.yso	uusintarikollisuus
dc.subject.yso	ennusteet
dc.subject.yso	regressioanalyysi
dc.subject.yso	koneoppiminen
dc.format.content	fulltext
dc.rights.accessrights	Access to this material is restricted for copyright reasons. It can be read at the workstation at Jyväskylä University Library reserved for the use of archival materials: https://kirjasto.jyu.fi/en/workspaces/facilities.	en
dc.rights.accessrights	Aineistoon pääsyä on rajoitettu tekijänoikeussyistä. Aineisto on luettavissa Jyväskylän yliopiston kirjaston arkistotyöasemalta. Ks. https://kirjasto.jyu.fi/fi/tyoskentelytilat/laitteet-ja-tilat.	fi
dc.type.okm	G2
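
The evaluation design described in the abstract (develop the models on 2005 and 2006 convictions, test them on 2007) can be sketched roughly as follows. This is a minimal illustration, not the thesis's code: the `Conviction` fields, the toy records, and the two scoring functions standing in for a fitted logistic regression and a fitted Random forest are all hypothetical, and AUC is only an assumed example of a performance metric, since the record does not name the metrics actually used.

```python
from dataclasses import dataclass


@dataclass
class Conviction:
    year: int            # year of the reference conviction
    prior_offenses: int  # hypothetical register-based feature
    reoffended: bool     # hypothetical follow-up outcome


def temporal_split(records, test_year=2007):
    """Earlier years form the training set; test_year is held out for testing."""
    train = [r for r in records if r.year < test_year]
    test = [r for r in records if r.year == test_year]
    return train, test


def auc(scores, labels):
    """Share of (recidivist, non-recidivist) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs both outcome classes in the test set")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy records, not thesis data.
records = [
    Conviction(2005, 0, False), Conviction(2005, 4, True),
    Conviction(2006, 1, False), Conviction(2006, 3, True),
    Conviction(2007, 0, False), Conviction(2007, 2, False),
    Conviction(2007, 5, True), Conviction(2007, 1, True),
]
train, test = temporal_split(records)
labels = [r.reoffended for r in test]


def score_a(record):
    """Stand-in for one fitted model: risk grows with prior offenses."""
    return float(record.prior_offenses)


def score_b(record):
    """Stand-in for another fitted model: a coarse threshold rule."""
    return 1.0 if record.prior_offenses >= 2 else 0.0


auc_a = auc([score_a(r) for r in test], labels)
auc_b = auc([score_b(r) for r in test], labels)
print(f"train={len(train)} test={len(test)} AUC a={auc_a:.2f} b={auc_b:.2f}")
```

In a real comparison the two stand-in functions would be replaced by models fitted on `train` only, so that the 2007 outcomes stay unseen until evaluation, which is the point of splitting by year rather than at random.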

Files in this item

Name: URN:NBN:fi:jyu-201611234724.pdf
Size: 891.2Kb
Format: PDF


This item appears in the following collections

Master's theses (Pro gradu -tutkielmat) [29561]


Items with a similar title or keywords:

Predicting Children's Myopia Risk : A Monte Carlo Approach to Compare the Performance of Machine Learning Models

Artiemjew, Piotr; Cybulski, Radosław; Emamian, Mohammad; Grzybowski, Andrzej; Jankowski, Andrzej; Lanca, Carla; Mehravaran, Shiva; Młyński, Marcin; Morawski, Cezary; Nordhausen, Klaus; Pärssinen, Olavi; Ropiak, Krzysztof (SCITEPRESS Science and Technology Publications, 2024)

This study presents the initial results of the Myopia Risk Calculator (MRC) Consortium, introducing an innovative approach to predict myopia risk by using trustworthy machine-learning models. The dataset included approximately ...

Effect of variable selection strategy on the predictive models for adverse pregnancy outcomes of pre-eclampsia : A retrospective study

Zheng, Dongying; Hao, Xinyu; Khan, Muhanmmad; Kang, Fuli; Li, Fan; Hämäläinen, Timo; Wang, Lixia (Scholar Media Publishing Company, 2024)

Objectives: The improvement of prediction for adverse pregnancy outcomes is quite essential to the women suffering from pre-eclampsia, while the collection of predictive indicators is the prerequisite. The traditional ...

Comparison of machine learning and logistic regression as predictive models for adverse maternal and neonatal outcomes of preeclampsia : A retrospective study

Zheng, Dongying; Hao, Xinyu; Khan, Muhanmmad; Wang, Lixia; Li, Fan; Xiang, Ning; Kang, Fuli; Hamalainen, Timo; Cong, Fengyu; Song, Kedong; Qiao, Chong (Frontiers Media SA, 2022)

Introduction: Preeclampsia, one of the leading causes of maternal and fetal morbidity and mortality, demands accurate predictive models for the lack of effective treatment. Predictive models based on machine learning ...

Do Randomized Algorithms Improve the Efficiency of Minimal Learning Machine?

Linja, Joakim; Hämäläinen, Joonas; Nieminen, Paavo; Kärkkäinen, Tommi (MDPI AG, 2020)

Minimal Learning Machine (MLM) is a recently popularized supervised learning method, which is composed of distance-regression and multilateration steps. The computational complexity of MLM is dominated by the solution of ...

Intelligent solutions for real-life data-driven applications

Ivannikova, Elena (University of Jyväskylä, 2017)

The subject of this thesis belongs to the topic of machine learning or, specifically, to the development of advanced methods for regression analysis, clustering, and anomaly detection. Industry is constantly seeking ...
