Few-shot learning for speaker recognition

Jokinen, Ville

dc.contributor.advisor	Kärkkäinen, Tommi
dc.contributor.advisor	Hämäläinen, Joonas
dc.contributor.author	Jokinen, Ville
dc.date.accessioned	2021-06-28T13:30:50Z
dc.date.available	2021-06-28T13:30:50Z
dc.date.issued	2021
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/76866
dc.description.abstract	Tutkielman tavoitteena on vertailla uusimpia koneoppimiseen pohjautuvia menetelmiä puhujan tunnistamiseen vähäisellä datan määrällä. Puhujan tunnistamisessa tavoitteena on tunnistaa eri puhujat äänidatasta, sen käyttötarkoituksiin sisältyy mm. puhujan diarioiminen ja biometrinen tunnistus äänen avulla. Tutkielma rajoittuu puhujan tapaukseen, jossa käytettävissä on kaksi lyhyttä nauhoitetta, joko yhdeltä tai kahdelta, ennestään tuntemattomalta puhujalta. Joiden pohjalta pyritään tunnistamaan, sisältävätkö nauhoitteet puhetta samalta puhujalta. Lisäksi tutkielmassa tutkitaan Englanninkielisellä puheella koulutettujen neuroverkkojen tarkkuutta Suomenkieliseen puheeseen sovellettuna. Johon kehitetään sopiva datasetti Suomenkielisen puhekorpuksen pohjalta. Tutkielman tulokset osoittavat uusimpien menetelmien suoriutuvan erinomaisesti. Vaikkakin parhaiden tuloksien saavuttaminen osoittautui vaativan enemmän koulutusdataa kuin mitä tutkielmassa käytetään. Menetelmät yleistyvät hyvin myös suomenkieliselle puheelle siitä huolimatta, että koulutuksessa käytettiin vain englanninkielistä puhetta. Lisäksi tuloksien pohjalta tehdään mielenkiintoisia huomioita vertailuun valittujen muuttujien osalta, joita käytetään neuroverkkojen koulutuksessa. Vertailussa oli menetelmien lisäksi koulutusdatan puhujien määrä, puhe esimerkkin pituus ja äänidatan augmentointi.	fi
dc.description.abstract	This thesis compares recent methods for speaker recognition from a small amount of data. Speaker recognition aims to distinguish speakers within audio data containing speech; its use cases include speaker diarization and voice biometric authentication. The scope is limited to identification in which two samples, from one or two distinct previously unknown speakers, are provided, and the aim is to determine whether the two samples are spoken by the same speaker. Additionally, the accuracy of networks trained on English speech is measured on Finnish speech. For this purpose, a new dataset of Finnish speech, suitable for benchmarking speaker recognition, was developed from an existing speech recognition dataset. The results show that the latest methods perform very well. However, achieving the best results evidently requires more training data than was used in this thesis. The methods generalized to Finnish speech despite being trained with English speech. Additionally, interesting observations are made regarding the parameters chosen for training. Besides the methods themselves, the comparison covers the number of speakers used for training, the sample length, and data augmentation.	en
dc.format.extent	69
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject.other	speaker identification
dc.subject.other	few-shot learning
dc.title	Few-shot learning for speaker recognition
dc.identifier.urn	URN:NBN:fi:jyu-202106284056
dc.type.ontasot	Pro gradu -tutkielma	fi
dc.type.ontasot	Master’s thesis	en
dc.contributor.tiedekunta	Informaatioteknologian tiedekunta	fi
dc.contributor.tiedekunta	Faculty of Information Technology	en
dc.contributor.laitos	Informaatioteknologia	fi
dc.contributor.laitos	Information Technology	en
dc.contributor.yliopisto	Jyväskylän yliopisto	fi
dc.contributor.yliopisto	University of Jyväskylä	en
dc.contributor.oppiaine	Tietotekniikka	fi
dc.contributor.oppiaine	Mathematical Information Technology	en
dc.rights.copyright	Julkaisu on tekijänoikeussäännösten alainen. Teosta voi lukea ja tulostaa henkilökohtaista käyttöä varten. Käyttö kaupallisiin tarkoituksiin on kielletty.	fi
dc.rights.copyright	This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.	en
dc.type.publication	masterThesis
dc.contributor.oppiainekoodi	602
dc.subject.yso	koneoppiminen
dc.subject.yso	puhujantunnistus
dc.subject.yso	neuroverkot
dc.subject.yso	machine learning
dc.subject.yso	speaker recognition
dc.subject.yso	neural networks (information technology)
dc.format.content	fulltext
dc.type.okm	G2

Files in this item

Name: URN:NBN:fi:jyu-202106284056.pdf
Size: 1.691Mb
Format: PDF

This item appears in the following Collection(s)

Pro gradu -tutkielmat [29564]

Related items

Taxonomy-Informed Neural Networks for Smart Manufacturing

Domain‐specific neural networks improve automated bird sound recognition already with small amount of local data

Assessment of microalgae species, biomass, and distribution from spectral images using a convolution neural network

Node co-activations as a means of error detection : Towards fault-tolerant neural networks

Quantification of Errors Generated by Uncertain Data in a Linear Boundary Value Problem Using Neural Networks
