dc.contributor.author: Habic, Vuk
dc.contributor.author: Semenov, Alexander
dc.contributor.author: Pasiliao, Eduardo L.
dc.date.accessioned: 2020-10-06T04:54:21Z
dc.date.available: 2020-10-06T04:54:21Z
dc.date.issued: 2020
dc.identifier.citation: Habic, V., Semenov, A., & Pasiliao, E. L. (2020). Multitask deep learning for native language identification. Knowledge-Based Systems, 209, Article 106440. https://doi.org/10.1016/j.knosys.2020.106440
dc.identifier.other: CONVID_42129041
dc.identifier.uri: https://jyx.jyu.fi/handle/123456789/72022
dc.description.abstract: Identifying a person's native language from text they have written in English (L1 identification) plays an important role in tasks such as authorship profiling and identification. With the current proliferation of misinformation in social media, these methods are especially topical. Most studies in this field have focused on developing supervised classification algorithms that are trained on a single L1 dataset. Although multiple labeled datasets are available for L1 identification, they contain texts authored by speakers of different languages and do not completely overlap. Current approaches achieve high accuracy on the available datasets, but only by training an individual classifier for each dataset. Studies show that jointly training multiple classifiers on different datasets can lead the classifiers to share information, increasing accuracy on both tasks. In this study, we develop a novel deep neural network (DNN) architecture for L1 classification; it is based on an adversarial multitask learning method that integrates shared knowledge from multiple L1 datasets. We propose several variants of the architecture and rigorously evaluate their performance on multiple datasets. Our results indicate that the proposed multitask architecture achieves higher classification accuracy than previously proposed methods. [en]
dc.format.mimetype: application/pdf
dc.language: eng
dc.language.iso: eng
dc.publisher: Elsevier BV
dc.relation.ispartofseries: Knowledge-Based Systems
dc.rights: CC BY-NC-ND 4.0
dc.subject.other: multitask learning
dc.subject.other: text classification
dc.subject.other: natural language processing
dc.subject.other: deep learning
dc.title: Multitask deep learning for native language identification
dc.type: article
dc.identifier.urn: URN:NBN:fi:jyu-202010066078
dc.contributor.laitos: Informaatioteknologian tiedekunta [fi]
dc.contributor.laitos: Faculty of Information Technology [en]
dc.contributor.oppiaine: Tietojärjestelmätiede [fi]
dc.contributor.oppiaine: Information Systems Science [en]
dc.type.uri: http://purl.org/eprint/type/JournalArticle
dc.description.reviewstatus: peerReviewed
dc.relation.issn: 0950-7051
dc.relation.volume: 209
dc.type.version: acceptedVersion
dc.rights.copyright: © 2020 Published by Elsevier B.V.
dc.rights.accesslevel: openAccess [fi]
dc.relation.grantnumber: FA9550-17-1-0030
dc.subject.yso: natural language
dc.subject.yso: English language
dc.subject.yso: text mining
dc.subject.yso: machine learning
dc.subject.yso: native language
dc.format.content: fulltext
jyx.subject.uri: http://www.yso.fi/onto/yso/p26762
jyx.subject.uri: http://www.yso.fi/onto/yso/p2573
jyx.subject.uri: http://www.yso.fi/onto/yso/p27112
jyx.subject.uri: http://www.yso.fi/onto/yso/p21846
jyx.subject.uri: http://www.yso.fi/onto/yso/p10957
dc.rights.url: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.relation.doi: 10.1016/j.knosys.2020.106440
dc.relation.funder: Air Force Office of Scientific Research [fi]
dc.relation.funder: Air Force Office of Scientific Research [en]
jyx.fundingprogram: Muut [fi]
jyx.fundingprogram: Others [en]
jyx.fundinginformation: This work was funded in part by the US Air Force Research Laboratory (AFRL) European Office of Aerospace Research and Development (grant no. FA9550-17-1-0030), and the AFRL Mathematical Modeling and Optimization Institute.

Except where otherwise noted, this item's license is described as CC BY-NC-ND 4.0