
dc.contributor.author: Roll, Uri
dc.contributor.author: Correia, Ricardo
dc.contributor.author: Berger-Tal, Oded
dc.date.accessioned: 2019-01-09T21:37:29Z
dc.date.available: 2019-01-09T21:37:29Z
dc.date.issued: 2018
dc.identifier.citation: Roll, U., Correia, R. and Berger-Tal, O. (2018). Disentangling homonyms- using artificial neural networks to separate the cream from the crop in large text corpora. 5th European Congress of Conservation Biology. doi: 10.17011/conference/eccb2018/107550
dc.identifier.uri: https://jyx.jyu.fi/handle/123456789/61975
dc.description.abstract: Recent years have seen a great influx of scientific publications, as well as other text corpora, that are used for conservation research. This surge holds much promise for advancing science, but it also presents new challenges. One of the major difficulties in using this plethora of information is how to sort through it efficiently and retain only its relevant sections. Homonyms (terms that share spelling but differ in meaning) present a unique challenge in this respect, as they carry no inherent information that can aid in classifying them across narratives. This issue is relevant to an array of conservation culturomics studies, as homonyms add considerable noise to results, and this noise cannot be easily identified. In this work we constructed a semi-automated approach that can aid in classifying homonyms between narratives. We used a combination of automated content analysis and artificial neural networks to quickly and accurately sift through large corpora of academic texts and classify them into distinct topics. As an example, we explore the use of the word 'reintroduction' in academic texts. In the conservation context, reintroduction indicates the release of organisms into their former native habitat; however, an ISI search for this word returns thousands of publications that use the term with other meanings and in other contexts. Using our method, we were able to quickly and correctly classify thousands of academic texts as conservation-related or unrelated publications, with more than 99% accuracy. Our approach can easily be applied to any other homonym and can greatly facilitate sorting data wherever homonyms hinder the harnessing of large text corpora. Beyond homonyms, we see great promise in combining automated content analysis with machine learning methods to handle and screen big data for relevant information.
dc.format.mimetype: text/html
dc.language.iso: eng
dc.publisher: Open Science Centre, University of Jyväskylä
dc.relation.uri: https://peerageofscience.org/conference/eccb2018/107550/
dc.rights: CC BY 4.0
dc.title: Disentangling homonyms- using artificial neural networks to separate the cream from the crop in large text corpora
dc.type: Article
dc.type.uri: http://purl.org/eprint/type/ConferenceItem
dc.identifier.doi: 10.17011/conference/eccb2018/107550
dc.type.coar: conference paper not in proceedings
dc.description.reviewstatus: peerReviewed
dc.type.version: publishedVersion
dc.rights.copyright: © the Authors, 2018
dc.rights.accesslevel: openAccess
dc.type.publication: conferenceObject
dc.relation.conference: ECCB2018: 5th European Congress of Conservation Biology. 12th - 15th of June 2018, Jyväskylä, Finland
dc.format.content: fulltext
dc.rights.url: http://creativecommons.org/licenses/by/4.0/
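The abstract combines automated content analysis with an artificial neural network to sort texts that use 'reintroduction' into conservation-related and unrelated ones, but it does not specify the features, the content-analysis tool, or the network architecture. The sketch below is therefore only an illustration under stated assumptions: "content analysis" is approximated by a keyword-in-context bag of words around the homonym, and the network is the smallest one possible, a single sigmoid neuron (equivalent to logistic regression). The names `context_words`, `SingleNeuron`, `is_conservation`, the stop-word list, and the six training sentences are all hypothetical and are not taken from the study.

```python
import math
import re
from collections import Counter

TARGET = "reintroduction"  # the homonym being disambiguated
# Hypothetical stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "after", "in", "into", "of", "the", "to"}


def context_words(text, target=TARGET, window=5):
    """Keyword-in-context extraction (assumed stand-in for the authors'
    content analysis): non-stop-words within `window` tokens of `target`."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    words = []
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), i + window + 1
            words.extend(t for t in tokens[lo:hi] if t != target)
    return words


def featurize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary; unseen words are dropped."""
    counts = Counter(context_words(text))
    return [float(counts[w]) for w in vocab]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SingleNeuron:
    """One sigmoid unit trained by stochastic gradient descent on log-loss
    (logistic regression, i.e. the smallest possible neural network)."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features
        self.b = 0.0

    def predict_proba(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, X, y, lr=0.5, epochs=500):
        for _ in range(epochs):
            for x, label in zip(X, y):
                err = self.predict_proba(x) - label  # d(log-loss)/dz
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= lr * err
        return self


# Toy, hand-written examples (NOT data from the study); 1 = conservation sense.
TRAIN = [
    ("The reintroduction of lynx to former native habitat in the Alps", 1),
    ("Captive bred wolves: reintroduction and release into native range", 1),
    ("Reintroduction of beavers restored wetland habitat", 1),
    ("Reintroduction of gluten into the diet of coeliac patients", 0),
    ("Drug reintroduction after an adverse reaction in patients", 0),
    ("Reintroduction of the tariff policy after the election", 0),
]

VOCAB = sorted({w for text, _ in TRAIN for w in context_words(text)})
model = SingleNeuron(len(VOCAB)).fit(
    [featurize(t, VOCAB) for t, _ in TRAIN], [label for _, label in TRAIN]
)


def is_conservation(text, threshold=0.5):
    """Classify a text by the sense in which it uses the target homonym."""
    return model.predict_proba(featurize(text, VOCAB)) >= threshold
```

On these toy sentences, `is_conservation("Reintroduction of captive bred lynx into native habitat")` should come out as conservation-related and `is_conservation("Reintroduction of gluten in coeliac patients")` as unrelated. A study at the scale described in the abstract would need far more labelled texts, a held-out evaluation set to measure accuracy, and most likely a larger network than this single neuron.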



This item appears in the following collections

  • ECCB 2018 [712]
    5th European Congress of Conservation Biology. 12th - 15th of June 2018, Jyväskylä, Finland

Except where otherwise noted, this item's license is CC BY 4.0.