Multitask deep learning for native language identification

Habic, Vuk; Semenov, Alexander; Pasiliao, Eduardo L.

doi:10.1016/j.knosys.2020.106440

dc.contributor.author	Habic, Vuk
dc.contributor.author	Semenov, Alexander
dc.contributor.author	Pasiliao, Eduardo L.
dc.date.accessioned	2020-10-06T04:54:21Z
dc.date.available	2020-10-06T04:54:21Z
dc.date.issued	2020
dc.identifier.citation	Habic, V., Semenov, A., & Pasiliao, E. L. (2020). Multitask deep learning for native language identification. Knowledge-Based Systems, 209, Article 106440. https://doi.org/10.1016/j.knosys.2020.106440
dc.identifier.other	CONVID_42129041
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/72022
dc.description.abstract	Identifying the native language of a person from their text written in English (L1 identification) plays an important role in such tasks as authorship profiling and identification. With the current proliferation of misinformation in social media, these methods are especially topical. Most studies in this field have focused on the development of supervised classification algorithms that are trained on a single L1 dataset. Although multiple labeled datasets are available for L1 identification, they contain texts authored by speakers of different languages and do not completely overlap. Current approaches achieve high accuracy on available datasets, but this is attained by training an individual classifier for each dataset. Studies show that joint training of multiple classifiers on different datasets can result in sharing information between the classifiers, leading to an increase in the accuracy of both tasks. In this study, we develop a novel deep neural network (DNN) architecture for L1 classification; it is based on an adversarial multitask learning method that integrates shared knowledge from multiple L1 datasets. We propose several variants of the architecture and rigorously evaluate their performance on multiple datasets. Our results indicate that the proposed multitask architecture is more efficient in terms of classification accuracy than previously proposed methods.	en
dc.format.mimetype	application/pdf
dc.language	eng
dc.language.iso	eng
dc.publisher	Elsevier BV
dc.relation.ispartofseries	Knowledge-Based Systems
dc.rights	CC BY-NC-ND 4.0
dc.subject.other	multitask learning
dc.subject.other	text classification
dc.subject.other	natural language processing
dc.subject.other	deep learning
dc.title	Multitask deep learning for native language identification
dc.type	article
dc.identifier.urn	URN:NBN:fi:jyu-202010066078
dc.contributor.laitos	Informaatioteknologian tiedekunta	fi
dc.contributor.laitos	Faculty of Information Technology	en
dc.contributor.oppiaine	Tietojärjestelmätiede	fi
dc.contributor.oppiaine	Information Systems Science	en
dc.type.uri	http://purl.org/eprint/type/JournalArticle
dc.type.coar	http://purl.org/coar/resource_type/c_2df8fbb1
dc.description.reviewstatus	peerReviewed
dc.relation.issn	0950-7051
dc.relation.volume	209
dc.type.version	acceptedVersion
dc.rights.copyright	© 2020 Published by Elsevier B.V.
dc.rights.accesslevel	openAccess	fi
dc.relation.grantnumber	FA9550-17-1-0030
dc.subject.yso	luonnollinen kieli
dc.subject.yso	englannin kieli
dc.subject.yso	tekstinlouhinta
dc.subject.yso	koneoppiminen
dc.subject.yso	äidinkieli
dc.format.content	fulltext
jyx.subject.uri	http://www.yso.fi/onto/yso/p26762
jyx.subject.uri	http://www.yso.fi/onto/yso/p2573
jyx.subject.uri	http://www.yso.fi/onto/yso/p27112
jyx.subject.uri	http://www.yso.fi/onto/yso/p21846
jyx.subject.uri	http://www.yso.fi/onto/yso/p10957
dc.rights.url	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.relation.doi	10.1016/j.knosys.2020.106440
dc.relation.funder	Air Force Office of Scientific Research	en
dc.relation.funder	Air Force Office of Scientific Research	fi
jyx.fundingprogram	Others	en
jyx.fundingprogram	Muut	fi
jyx.fundinginformation	This work was funded in part by the US Air Force Research Laboratory (AFRL) European Office of Aerospace Research and Development (grant no. FA9550-17-1-0030), and the AFRL Mathematical Modeling and Optimization Institute.
dc.type.okm	A1

Files in this item

Name: habicym.pdf
Size: 948.3Kb
Format: PDF
Description: Final Draft

This item appears in the following Collection(s)

Informaatioteknologian tiedekunta [2131]

Except where otherwise noted, this item's license is described as CC BY-NC-ND 4.0

Related items

Natural language processing In chatbot development : how does a chatbot process language?

Part-of-speech tagging in written slang

Towards Automated Classification of Firmware Images and Identification of Embedded Devices

Improving search engine results using different machine learning models and tools

Automatic training data labeling for Finnish clinical narrative NLP tasks
