Disentangling homonyms- using artificial neural networks to separate the cream from the crop in large text corpora

Roll, Uri; Correia, Ricardo; Berger-Tal, Oded

doi:10.17011/conference/eccb2018/107550

dc.contributor.author	Roll, Uri
dc.contributor.author	Correia, Ricardo
dc.contributor.author	Berger-Tal, Oded
dc.date.accessioned	2019-01-09T21:37:29Z
dc.date.available	2019-01-09T21:37:29Z
dc.date.issued	2018
dc.identifier.citation	Roll, U., Correia, R. and Berger-Tal, O. (2018). Disentangling homonyms- using artificial neural networks to separate the cream from the crop in large text corpora. 5th European Congress of Conservation Biology. doi: 10.17011/conference/eccb2018/107550
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/61975
dc.description.abstract	Recent years have seen a large influx of scientific publications, as well as of other text corpora, that are used for conservation research. This surge holds much promise for advancing science, but it also presents new challenges. A major issue in using this wealth of information is how to sort through it efficiently and retain only the relevant sections. Homonyms (terms that share a spelling but differ in meaning) present a particular challenge in this respect, because the term itself carries no information that could help assign it to one narrative or another. The issue is relevant to an array of conservation culturomics studies, as homonyms add considerable noise to results, and this noise cannot easily be identified. In this work we constructed a semi-automated approach that can aid in classifying homonyms between narratives. We used a combination of automated content analysis and artificial neural networks to quickly and accurately sift through large corpora of academic texts and classify them into distinct topics. As an example, we explore the use of the word 'reintroduction' in academic texts. In the conservation context, reintroduction indicates the release of organisms into their former native habitat; however, an 'ISI' search for this word returns thousands of publications that use it with other meanings and in other contexts. Using our method, we were able to quickly classify thousands of academic texts as conservation-related or unrelated with more than 99% accuracy. Our approach can easily be applied to other homonyms and can greatly facilitate sorting data in cases where homonyms hinder the harnessing of large text corpora. Beyond homonyms, we see great promise in combining automated content analysis and machine learning methods to handle and screen big data for relevant information.
dc.format.mimetype	text/html
dc.language.iso	eng
dc.publisher	Open Science Centre, University of Jyväskylä
dc.relation.uri	https://peerageofscience.org/conference/eccb2018/107550/
dc.rights	CC BY 4.0
dc.title	Disentangling homonyms- using artificial neural networks to separate the cream from the crop in large text corpora
dc.type	conference paper not in proceedings
dc.type.uri	http://purl.org/eprint/type/ConferenceItem
dc.identifier.doi	10.17011/conference/eccb2018/107550
dc.type.coar	http://purl.org/coar/resource_type/c_18cp
dc.description.reviewstatus	peerReviewed
dc.type.version	publishedVersion
dc.rights.copyright	© the Authors, 2018
dc.rights.accesslevel	openAccess
dc.type.publication	conferenceObject
dc.relation.conference	ECCB2018: 5th European Congress of Conservation Biology. 12th - 15th of June 2018, Jyväskylä, Finland
dc.format.content	fulltext
dc.rights.url	http://creativecommons.org/licenses/by/4.0/
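The abstract does not specify the network architecture, the text features, or the training data, so the following is only a rough illustration of the general idea of sorting texts that contain the homonym 'reintroduction' into a conservation sense and an unrelated sense. It swaps in the simplest artificial neural network, a single-layer perceptron over binary bag-of-words features. The stopword list, the training sentences and their labels are hypothetical and are not the authors' data or method.

```python
import re

# Hypothetical stopword list for this toy example (not from the source).
STOPWORDS = {"a", "after", "in", "into", "of", "the", "their", "to"}


def tokenize(text):
    """Lower-case, split on non-letters and drop stopwords; returns a set of terms."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


class Perceptron:
    """Single-layer perceptron over binary bag-of-words features."""

    def __init__(self):
        self.weights = {}  # term -> weight; unseen terms count as 0
        self.bias = 0

    def score(self, terms):
        return self.bias + sum(self.weights.get(t, 0) for t in terms)

    def predict(self, text):
        # 1 = conservation sense, 0 = any other sense
        return 1 if self.score(tokenize(text)) > 0 else 0

    def fit(self, docs, labels, max_epochs=100):
        """Classic perceptron rule; returns the epoch with zero errors, or None."""
        data = [(tokenize(d), y) for d, y in zip(docs, labels)]
        for epoch in range(1, max_epochs + 1):
            errors = 0
            for terms, y in data:
                pred = 1 if self.score(terms) > 0 else 0
                if pred != y:
                    step = 1 if y == 1 else -1  # push the score towards the true class
                    for t in terms:
                        self.weights[t] = self.weights.get(t, 0) + step
                    self.bias += step
                    errors += 1
            if errors == 0:
                return epoch
        return None  # no convergence: data may not be linearly separable


# Hypothetical, hand-labelled training sentences (1 = conservation, 0 = other).
TRAIN = [
    ("Reintroduction of wolves to their native habitat", 1),
    ("Reintroduction of gluten in the diet of coeliac patients", 0),
    ("Reintroduction of captive-bred lynx to their former range", 1),
    ("Drug reintroduction after adverse reactions in patients", 0),
]

model = Perceptron()
epochs = model.fit([d for d, _ in TRAIN], [y for _, y in TRAIN])
print(epochs)  # → 2
print(model.predict("Reintroduction of bison to native grassland habitat"))  # → 1
print(model.predict("Reintroduction of aspirin in patients after surgery"))  # → 0
```

In a realistic setting the corpus would be far larger, and a multi-layer network with held-out validation would replace this toy perceptron; the sketch only shows that context words around the homonym, rather than the homonym itself, carry the signal that separates the two senses.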

Files in this item

Name: article107550.html
Size: 11.64 KB
Format: HTML
Description: FULLTEXT


This item appears in the following collection(s)

ECCB 2018 [712]
5th European Congress of Conservation Biology. 12th - 15th of June 2018, Jyväskylä, Finland


Except where otherwise noted, this item's license is CC BY 4.0


Similar items

Using deep neural networks for kinematic analysis : challenges and opportunities

Application of artificial neural network and genetic algorithm to forecasting of wind power output

The Impact of Regularization on Convolutional Neural Networks

Neural Mechanisms of Joint Action in Musical Ensembles : Disentangling Self and Other Integration

Process‐Informed Neural Networks : A Hybrid Modelling Approach to Improve Predictive Performance and Inference of Neural Networks in Ecology and Beyond
