dc.contributor.author | Habic, Vuk | |
dc.contributor.author | Semenov, Alexander | |
dc.contributor.author | Pasiliao, Eduardo L. | |
dc.date.accessioned | 2020-10-06T04:54:21Z | |
dc.date.available | 2020-10-06T04:54:21Z | |
dc.date.issued | 2020 | |
dc.identifier.citation | Habic, V., Semenov, A., & Pasiliao, E. L. (2020). Multitask deep learning for native language identification. Knowledge-Based Systems, 209, Article 106440. https://doi.org/10.1016/j.knosys.2020.106440 | |
dc.identifier.other | CONVID_42129041 | |
dc.identifier.uri | https://jyx.jyu.fi/handle/123456789/72022 | |
dc.description.abstract | Identifying the native language of a person from text they have written in English (L1 identification) plays an important role in tasks such as authorship profiling and authorship identification. With the current proliferation of misinformation in social media, these methods are especially topical. Most studies in this field have focused on developing supervised classification algorithms that are trained on a single L1 dataset. Although multiple labeled datasets are available for L1 identification, they contain texts written by speakers of different native languages, and their sets of languages do not completely overlap. Current approaches achieve high accuracy on the available datasets, but only by training a separate classifier for each dataset. Studies show that jointly training multiple classifiers on different datasets allows the classifiers to share information, which can increase accuracy on each task. In this study, we develop a novel deep neural network (DNN) architecture for L1 classification based on an adversarial multitask learning method that integrates shared knowledge from multiple L1 datasets. We propose several variants of the architecture and rigorously evaluate their performance on multiple datasets. Our results indicate that the proposed multitask architecture achieves higher classification accuracy than previously proposed methods. | en |
dc.format.mimetype | application/pdf | |
dc.language | eng | |
dc.language.iso | eng | |
dc.publisher | Elsevier BV | |
dc.relation.ispartofseries | Knowledge-Based Systems | |
dc.rights | CC BY-NC-ND 4.0 | |
dc.subject.other | multitask learning | |
dc.subject.other | text classification | |
dc.subject.other | natural language processing | |
dc.subject.other | deep learning | |
dc.title | Multitask deep learning for native language identification | |
dc.type | article | |
dc.identifier.urn | URN:NBN:fi:jyu-202010066078 | |
dc.contributor.laitos | Faculty of Information Technology | en |
dc.contributor.oppiaine | Information Systems Science | en |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | |
dc.type.coar | http://purl.org/coar/resource_type/c_2df8fbb1 | |
dc.description.reviewstatus | peerReviewed | |
dc.relation.issn | 0950-7051 | |
dc.relation.volume | 209 | |
dc.type.version | acceptedVersion | |
dc.rights.copyright | © 2020 Published by Elsevier B.V. | |
dc.rights.accesslevel | openAccess | fi |
dc.relation.grantnumber | FA9550-17-1-0030 | |
dc.subject.yso | natural language | |
dc.subject.yso | English language | |
dc.subject.yso | text mining | |
dc.subject.yso | machine learning | |
dc.subject.yso | mother tongue | |
dc.format.content | fulltext | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p26762 | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p2573 | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p27112 | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p21846 | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p10957 | |
dc.rights.url | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.relation.doi | 10.1016/j.knosys.2020.106440 | |
dc.relation.funder | Air Force Office of Scientific Research | en |
jyx.fundingprogram | Others | en |
jyx.fundinginformation | This work was funded in part by the US Air Force Research Laboratory (AFRL) European Office of Aerospace Research and Development (grant no. FA9550-17-1-0030), and the AFRL Mathematical Modeling and Optimization Institute. | |
dc.type.okm | A1 | |