Predicting high-growth firms with machine learning methods

Virtanen, Joosua

dc.contributor.advisor	Hyytinen, Ari
dc.contributor.author	Virtanen, Joosua
dc.date.accessioned	2019-03-25T11:13:44Z
dc.date.available	2019-03-25T11:13:44Z
dc.date.issued	2019
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/63260
dc.description.abstract	Kiinnostus nopeakasvuisia yrityksiä kohtaan on viime aikoina kasvanut politiikantekijöiden sekä sijoittajien keskuudessa. Tässä maisterin tutkielmassa tutkin, ovatko koneoppimismenetelmät hyödyllisiä tulevaisuuden nopeakasvuisten yrityksien ennustamisessa. Tutkin tätä kysymystä laajalla 13602:n suomalaisen liikeyrityksen paneeliaineistolla vuosilta 2005–2016 hyödyntäen Eurostat-OECD:n nopeakasvuisen yrityksen määritelmää. Tällä määritelmällä aineistossa noin 5% yrityksistä sijoittuu nopeakasvuisiksi. Tutkin myös, mitkä yhteensä 24:stä ennustavasta muuttujasta myötävaikuttavat ennusteisiin eniten. Viimeiseksi tarkastelen, onko vaihtoehtoisella nopean kasvun määritelmällä, asiantuntijainformaatiota sisältävillä lisämuuttujilla tai vain nuorten yrityksien aineiston käyttämisellä vaikutusta ennustetarkkuuteen. Lähestyn kysymyksiä soveltamalla kehikkoa, joka muistuttaa todellista ennustusskenaariota, missä historiatietoihin perustuvalla aineistolla pyritään ennustamaan tulevaisuuden lopputulemia. Ennustetarkkuutta arvioidaan erillisessä testiaineistossa. Tuloksieni perusteella useimmat koneoppimismenetelmät mahdollistavat lieviä ja tilastollisesti merkitseviä parannuksia ennustetarkkuudessa verrattuna tavanomaisiin menetelmiin. Random forest (RF) -algoritmin opettama luokittelija toimii tässä kontekstissa parhaiten opetusaineiston ulkopuolisella AUC (ROC käyrän rajaaman pinta-alan) -arvolla 0,6422 (mikä vastaa 9,4% parannusta vertailuarvoon) ja tunnistaa 17,07% nopeakasvuisista yrityksistä vain 2,19% riskillä luokitella ei-nopeakasvuinen yritys nopeakasvuiseksi. Yrityksen koon nykyisen hetken ja menneen muutoksen indikaattorit yrityksen iän kanssa myötävaikuttavat eniten ennusteiden muodostamisessa. Kasvun mittaaminen käyttäen liikevaihdon kasvua henkilöstön kasvun sijasta parantaa ennustetarkkuutta. Toisaalta pääomasijoituksien ja yritystukien informaatiota sisältävien muuttujien lisääminen malliin ei paranna tuloksia. 
Viimeiseksi ennustusongelma osoittautuu vaikeammaksi nuorten yrityksien aineistossa. Yhteenvetona koneoppimismenetelmien soveltamista tulisi harkita nopeakasvuisten yrityksien ennustamisen haastavaan tehtävään, kun ennustetarkkuus on ensisijainen tavoite. Mikäli laskennallisilla kustannuksilla ja mallin tulkittavuudella on painoarvoa, koneoppimismenetelmät eivät välttämättä ole ylivertaisia tässä kontekstissa.	fi
dc.description.abstract	Motivated by the recent growth in political and commercial interest in high-growth firms (HGFs), in this master’s thesis I study whether common machine learning (ML) techniques are useful in predicting which privately owned companies will become HGFs in the near future. I employ the Eurostat-OECD definition of HGFs and study this question with a high-dimensional 2005–2016 panel data set of 13,602 unique Finnish firms, of which roughly 5% are defined as HGFs. I also study which of the 24 included predictors matter the most for prediction. Finally, I examine whether an alternative definition of HGFs, predictors carrying expert information, or restricting the sample to young firms makes a difference in predictive performance. I tackle these questions by developing a predictive scheme similar to a real forecasting scenario, in which past values are used to train a set of classifiers that can then be employed to predict unknown future outcomes. Predictive performance is assessed in a separate test sample. My findings indicate that most ML methods offer moderate but statistically significant improvements over benchmarks, depending on the measure of interest. With an out-of-sample area under the ROC curve (AUC) of 0.6422 (equivalent to a 9.4% improvement over the benchmark), the best-performing ML classifier, random forest (RF), identifies 17.07% of the HGFs with only a 2.19% chance of misclassifying a non-HGF as an HGF. My analysis of variable importance and partial dependence suggests that current values of and past changes in firm size indicators, alongside firm age, contribute the most to predictive performance. Measuring the target variable in turnover rather than in employment improves prediction accuracy, whereas adding indicators of expert investor information as predictors does not yield any improvements. Finally, the prediction task seems to be considerably more difficult in a sample of young firms.
In conclusion, ML methods should be considered for the challenging task of identifying HGFs when computational costs and model interpretability are of secondary interest to prediction accuracy.	en
dc.format.extent	67
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.rights	In Copyright	en
dc.subject.other	high growth firms
dc.subject.other	Finland
dc.title	Predicting high-growth firms with machine learning methods
dc.type	master thesis
dc.identifier.urn	URN:NBN:fi:jyu-201903251944
dc.type.ontasot	Pro gradu -tutkielma	fi
dc.type.ontasot	Master’s thesis	en
dc.contributor.tiedekunta	Kauppakorkeakoulu	fi
dc.contributor.tiedekunta	School of Business and Economics	en
dc.contributor.laitos	Taloustieteet	fi
dc.contributor.laitos	Business and Economics	en
dc.contributor.yliopisto	Jyväskylän yliopisto	fi
dc.contributor.yliopisto	University of Jyväskylä	en
dc.contributor.oppiaine	Taloustiede	fi
dc.contributor.oppiaine	Economics	en
dc.type.coar	http://purl.org/coar/resource_type/c_bdcc
dc.type.publication	masterThesis
dc.contributor.oppiainekoodi	2041
dc.subject.yso	ennusteet
dc.subject.yso	ennustettavuus
dc.subject.yso	kasvu
dc.subject.yso	yritykset
dc.subject.yso	koneoppiminen
dc.subject.yso	forecasts
dc.subject.yso	predictability
dc.subject.yso	growth
dc.subject.yso	enterprises
dc.subject.yso	machine learning
dc.format.content	fulltext
dc.rights.url	https://rightsstatements.org/page/InC/1.0/
dc.type.okm	G2

Files in this item

Name: URN:NBN:fi:jyu-201903251944.pdf
Size: 2.996Mb
Format: PDF

This item appears in the following collections

Master’s theses [29743]

Unless otherwise noted, this item is licensed under In Copyright

Showing items with a similar title or keywords.

Comparing the forecasting performance of logistic regression and random forest models in criminal recidivism

Aaltonen, Olli-Pekka (2016)

In recent years, the criminal sanctions field has developed models for predicting recidivism (Tyni, 2015). These are typically based on register-based measures of, among other things, the convicted person’s sex, age, criminal background ...

Domain-specific transfer learning in the automated scoring of tumor-stroma ratio from histopathological images of colorectal cancer

Petäinen, Liisa; Väyrynen, Juha P.; Ruusuvuori, Pekka; Pölönen, Ilkka; Äyrämö, Sami; Kuopio, Teijo (Public Library of Science (PLoS), 2023)

Tumor-stroma ratio (TSR) is a prognostic factor for many types of solid tumors. In this study, we propose a method for automated estimation of TSR from histopathological images of colorectal cancer. The method is based on ...

Comparison of machine learning and logistic regression as predictive models for adverse maternal and neonatal outcomes of preeclampsia : A retrospective study

Zheng, Dongying; Hao, Xinyu; Khan, Muhanmmad; Wang, Lixia; Li, Fan; Xiang, Ning; Kang, Fuli; Hamalainen, Timo; Cong, Fengyu; Song, Kedong; Qiao, Chong (Frontiers Media SA, 2022)

Introduction: Preeclampsia, one of the leading causes of maternal and fetal morbidity and mortality, demands accurate predictive models for the lack of effective treatment. Predictive models based on machine learning ...

Predicting ACL Injury Using Machine Learning on Data From an Extensive Screening Test Battery of 880 Female Elite Athletes

Jauhiainen, Susanne; Kauppi, Jukka-Pekka; Krosshaug, Tron; Bahr, Roald; Bartsch, Julia; Äyrämö, Sami (SAGE Publications, 2022)

Background: Injury risk prediction is an emerging field in which more research is needed to recognize the best practices for accurate injury risk assessment. Important issues related to predictive machine learning need to ...

The “Seili-index” for the Prediction of Chlorophyll-α Levels in the Archipelago Sea of the northern Baltic Sea, southwest Finland

Hänninen, Jari; Mäkinen, Katja; Nordhausen, Klaus; Laaksonlaita, Jussi; Loisa, Olli; Virta, Joni (Springer Science and Business Media LLC, 2022)

To build a forecasting tool for the state of eutrophication in the Archipelago Sea, we fitted a Generalized Additive Mixed Model (GAMM) to marine environmental monitoring data, which were collected over the years 2011–2019 ...
