Study of various machine learning approaches to predict default behavior of a borrower based on transactional dataset
Authors
Date
2021Predicting ‘default’ behavior of borrowers is quite challenging and time consuming, although financial institutions require faster and more reliable decision on loan applications to survive in the competitive market. Availability of huge amount of data makes the work of current credit scoring system harder. To deal with such situation machine learning engineers are trying to build a system that can predict default behavior of a borrower by analyzing application and transaction data. In our current study we applied different machine learning models such as decision tree, logistic regression, gradient boosting, XGBoosting, support vector machine and KNeighbors on transactional dataset to find which model performed better. We also applied deep neural network on the datasets. To further extend the study, we created new features by using manual process and unsupervised machine learning to observe whether they boost the performance or not. In addition to that, we used feature selection to see how it affected the prediction. Due to small dataset, we achieved 70% ac-curacy with 72% AUC on aggregated dataset from Random Forest. The dataset created by using unsupervised machine learning showed 62% accuracy with 68% AUC value. Manually created ratio-based features and feature selection could not yield any significant difference in results. Deep learning also per-formed lower than others probably due to small dataset.
...
Keywords
Metadata
Show full item recordCollections
- Pro gradu -tutkielmat [29740]
License
Related items
Showing items with similar title or keywords.
-
Description of movement sensor dataset for dog behavior classification
Vehkaoja, Antti; Somppi, Sanni; Törnqvist, Heini; Valldeoriola Cardó, Anna; Kumpulainen, Pekka; Väätäjä, Heli; Majaranta, Päivi; Surakka, Veikko; Kujala, Miiamaaria V.; Vainio, Outi (Elsevier, 2022)Movement sensor data from seven static and dynamic dog behaviors (sitting, standing, lying down, trotting, walking, playing, and (treat) searching i.e. sniffing) was collected from 45 middle to large sized dogs with six ... -
CCTVCV : Computer Vision model/dataset supporting CCTV forensics and privacy applications
Turtiainen, Hannu; Costin, Andrei; Hämäläinen, Timo; Lahtinen, Tuomo; Sintonen, Lauri (IEEE, 2022)The increased, widespread, unwarranted, and unaccountable use of Closed-Circuit TeleVision (CCTV) cameras globally has raised concerns about privacy risks for the last several decades. Recent technological advances implemented ... -
Firefront Forecasting in Boreal Forests : Machine Learning Approach to Predict Wildfire Propagation
Raita-Hakola, Anna-Maria; Pölönen, Ilkka (Copernicus GmbH, 2024)Wildfires have become increasingly prevalent worldwide due to climate change, posing significant threats to human lives, property, and natural ecosystems. The rapid progression of wildfires necessitates predictive computational ... -
Approaching Optimal pH Enzyme Prediction with Large Language Models
Zaretckii, Mark; Buslaev, Pavel; Kozlovskii, Igor; Morozov, Alexander; Popov, Petr (American Chemical Society, 2024)Enzymes are widely used in biotechnology due to their ability to catalyze chemical reactions: food making, laundry, pharmaceutics, textile, brewing─all these areas benefit from utilizing various enzymes. Proton concentration ... -
Predicting Children's Myopia Risk : A Monte Carlo Approach to Compare the Performance of Machine Learning Models
Artiemjew, Piotr; Cybulski, Radosław; Emamian, Mohammad; Grzybowski, Andrzej; Jankowski, Andrzej; Lanca, Carla; Mehravaran, Shiva; Młyński, Marcin; Morawski, Cezary; Nordhausen, Klaus; Pärssinen, Olavi; Ropiak, Krzysztof (SCITEPRESS Science and Technology Publications, 2024)This study presents the initial results of the Myopia Risk Calculator (MRC) Consortium, introducing an innovative approach to predict myopia risk by using trustworthy machine-learning models. The dataset included approximately ...