Multitask deep learning for native language identification
Habic, V., Semenov, A., & Pasiliao, E. L. (2020). Multitask deep learning for native language identification. Knowledge-Based Systems, 209, Article 106440. https://doi.org/10.1016/j.knosys.2020.106440
Published in: Knowledge-Based Systems
Date: 2020
Copyright: © 2020 Published by Elsevier B.V.
Identifying the native language of a person from their text written in English (L1 identification) plays an important role in tasks such as authorship profiling and identification. With the current proliferation of misinformation on social media, these methods are especially topical. Most studies in this field have focused on developing supervised classification algorithms trained on a single L1 dataset. Although multiple labeled datasets are available for L1 identification, they contain texts authored by speakers of different languages and do not completely overlap. Current approaches achieve high accuracy on the available datasets, but only by training an individual classifier for each dataset. Studies show that jointly training multiple classifiers on different datasets lets the classifiers share information, which can increase the accuracy of the individual tasks. In this study, we develop a novel deep neural network (DNN) architecture for L1 classification based on an adversarial multitask learning method that integrates shared knowledge from multiple L1 datasets. We propose several variants of the architecture and rigorously evaluate their performance on multiple datasets. Our results indicate that the proposed multitask architecture achieves higher classification accuracy than previously proposed methods.
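This record does not describe the architecture in detail, so the following is only a rough, hedged illustration of the general idea of adversarial multitask learning, not the method of Habic et al. It uses gradient reversal, one common way to implement the adversarial part: a shared encoder feeds one classifier per dataset and a dataset discriminator, and the discriminator's gradient is negated (scaled by `lam`) before it reaches the encoder, pushing the shared features to become dataset-invariant. The toy data, the linear encoder, the dimensions, and all names (`make_data`, `reverse_gradient`, `train`) are hypothetical choices for this sketch.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(p: float, y: int) -> float:
    # Binary cross-entropy, clamped to avoid log(0).
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def reverse_gradient(grad: list[float], lam: float) -> list[float]:
    # Gradient reversal: identity in the forward pass; in the backward pass
    # the incoming gradient is multiplied by -lam.
    return [-lam * g for g in grad]


def make_data(rng: random.Random, n: int):
    # Hypothetical toy stand-in for two L1 datasets: task k's label depends
    # on feature k, and feature 2 is a dataset-identifying cue (+1 / -1).
    data = []
    for k in (0, 1):
        for _ in range(n):
            x = [rng.gauss(0, 1), rng.gauss(0, 1), 1.0 if k == 0 else -1.0]
            data.append((k, x, 1 if x[k] > 0 else 0))
    rng.shuffle(data)
    return data


def train(epochs: int = 30, lam: float = 0.1, lr: float = 0.05, seed: int = 0):
    rng = random.Random(seed)
    d_in, d_h = 3, 4
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_h)]  # shared encoder
    v = [[0.0] * d_h for _ in range(2)]  # one logistic head per dataset
    b = [0.0, 0.0]
    u, c = [0.0] * d_h, 0.0  # dataset discriminator
    data = make_data(rng, 200)
    history = []
    for _ in range(epochs):
        task_loss = disc_loss = 0.0
        for k, x, y in data:
            # Forward pass through the shared encoder, the task-k head,
            # and the discriminator (which predicts the dataset index k).
            h = [sum(W[i][j] * x[j] for j in range(d_in)) for i in range(d_h)]
            p = sigmoid(sum(vi * hi for vi, hi in zip(v[k], h)) + b[k])
            q = sigmoid(sum(ui * hi for ui, hi in zip(u, h)) + c)
            task_loss += bce(p, y)
            disc_loss += bce(q, k)
            # Backward pass: gradients of each loss w.r.t. its logit.
            gp, gq = p - y, q - k
            # Gradient reaching h: task gradient plus reversed discriminator gradient.
            adv = reverse_gradient([gq * ui for ui in u], lam)
            dh = [gp * v[k][i] + adv[i] for i in range(d_h)]
            # Plain SGD: head and discriminator minimize their own losses,
            # the encoder follows dh (so it works against the discriminator).
            for i in range(d_h):
                v[k][i] -= lr * gp * h[i]
                u[i] -= lr * gq * h[i]
                for j in range(d_in):
                    W[i][j] -= lr * dh[i] * x[j]
            b[k] -= lr * gp
            c -= lr * gq
        history.append((task_loss / len(data), disc_loss / len(data)))
    return history


if __name__ == "__main__":
    hist = train()
    print("epoch 1 (task, disc):", hist[0])
    print("last epoch (task, disc):", hist[-1])
```

Setting `lam=0.0` turns this into ordinary multitask training with a shared encoder and a discriminator that has no influence on it, which is a convenient baseline when comparing against the adversarial variant.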
Publisher: Elsevier BV
ISSN: 0950-7051
Publication in research information system: https://converis.jyu.fi/converis/portal/detail/Publication/42129041
Related funder(s): Air Force Office of Scientific Research
Funding program(s): Others
Additional information about funding: This work was funded in part by the US Air Force Research Laboratory (AFRL) European Office of Aerospace Research and Development (grant no. FA9550-17-1-0030), and the AFRL Mathematical Modeling and Optimization Institute.
Related items
Showing items with similar title or keywords.
Natural language processing In chatbot development : how does a chatbot process language?
Heikkilä, Arttu (2020). Chatbots are an increasingly common solution for human-computer interaction. The need to build maintainable and scalable conversational solutions is growing, but understanding of the fundamental technologies ...
Automatic image‐based identification and biomass estimation of invertebrates
Ärje, Johanna; Melvad, Claus; Jeppesen, Mads Rosenhøj; Madsen, Sigurd Agerskov; Raitoharju, Jenni; Rasmussen, Maria Strandgård; Iosifidis, Alexandros; Tirronen, Ville; Gabbouj, Moncef; Meissner, Kristian; Høye, Toke Thomas (Wiley, 2020). Understanding how biological communities respond to environmental changes is a key challenge in ecology and ecosystem management. The apparent decline of insect populations necessitates more biomonitoring but the time-consuming ...
Part-of-speech tagging in written slang
Korolainen, Valtteri (2014). Various language technology applications have been in everyday use for decades. For example, predictive text input and automatic correction have been in use for decades. Speech recognition and language ...
Improving search engine results using different machine learning models and tools
Ambaye, Michael (2020). The aim of this thesis is to provide viable methods that can be used to improve the return position (RP) of a relevant document when a natural language query (NLQ) is applied by a user. For the purpose of demonstration, ...
Automatic training data labeling for Finnish clinical narrative NLP tasks
Ihalainen, Simo (2022). In healthcare, a large amount of data is stored in electronic patient record systems in the form of clinical narratives. The efficient use of these narrative texts in daily care work and clinical research ...