dc.contributor.author: Čermáková, Ann
dc.contributor.author: Jantunen, Jarmo
dc.contributor.author: Jauhiainen, Tommi
dc.contributor.author: Kirk, John
dc.contributor.author: Křen, Michal
dc.contributor.author: Kupietz, Marc
dc.contributor.author: Uí Dhonnchadha, Elaine
dc.date.accessioned: 2022-02-07T06:29:40Z
dc.date.available: 2022-02-07T06:29:40Z
dc.date.issued: 2021
dc.identifier.citation: Čermáková, A., Jantunen, J., Jauhiainen, T., Kirk, J., Křen, M., Kupietz, M., & Uí Dhonnchadha, E. (2021). The International Comparable Corpus : Challenges in building multilingual spoken and written comparable corpora. Research in Corpus Linguistics, 9(1), 89-103. https://doi.org/10.32714/ricl.09.01.06
dc.identifier.other: CONVID_98442746
dc.identifier.uri: https://jyx.jyu.fi/handle/123456789/79643
dc.description.abstract: This paper reports on the efforts of twelve national teams in building the International Comparable Corpus (ICC; https://korpus.cz/icc) that will contain highly comparable datasets of spoken, written and electronic registers. The languages currently covered are Czech, Finnish, French, German, Irish, Italian, Norwegian, Polish, Slovak, Swedish and, more recently, Chinese, as well as English, which is considered to be the pivot language. The goal of the project is to provide much-needed data for contrastive corpus-based linguistics. The ICC corpus is committed to the idea of re-using existing multilingual resources as much as possible and the design is modelled, with various adjustments, on the International Corpus of English (ICE). As such, ICC will contain approximately the same balance of forty percent of written language and 60 percent of spoken language distributed across 27 different text types and contexts. A number of issues encountered by the project teams are discussed, ranging from copyright and data sustainability to technical advances in data distribution. [en]
dc.format.mimetype: application/pdf
dc.language.iso: eng
dc.publisher: Asociacion Espanola de Linguistica de Corpus
dc.relation.ispartofseries: Research in Corpus Linguistics
dc.relation.uri: http://ricl.aelinco.es/first-view/155-Article%20Text-1147-1-10-20210618.pdf
dc.rights: CC BY 4.0
dc.subject.other: ICC corpus
dc.subject.other: contrastive linguistics
dc.subject.other: comparable corpus
dc.subject.other: ICE corpus
dc.subject.other: data sustainability
dc.subject.other: copyright
dc.title: The International Comparable Corpus : Challenges in building multilingual spoken and written comparable corpora
dc.type: article
dc.identifier.urn: URN:NBN:fi:jyu-202202071401
dc.contributor.laitos: Kieli- ja viestintätieteiden laitos [fi]
dc.contributor.laitos: Department of Language and Communication Studies [en]
dc.contributor.oppiaine: Suomen kieli [fi]
dc.contributor.oppiaine: Finnish [en]
dc.type.uri: http://purl.org/eprint/type/JournalArticle
dc.type.coar: http://purl.org/coar/resource_type/c_2df8fbb1
dc.description.reviewstatus: peerReviewed
dc.format.pagerange: 89-103
dc.relation.issn: 2243-4712
dc.relation.numberinseries: 1
dc.relation.volume: 9
dc.type.version: publishedVersion
dc.rights.copyright: © 2021 Research in Corpus Linguistics
dc.rights.accesslevel: openAccess [fi]
dc.subject.yso: kielitiede
dc.subject.yso: tekijänoikeus
dc.subject.yso: kontrastiivinen tutkimus
dc.subject.yso: korpukset
dc.subject.yso: vertaileva kielitiede
dc.format.content: fulltext
jyx.subject.uri: http://www.yso.fi/onto/yso/p1631
jyx.subject.uri: http://www.yso.fi/onto/yso/p2346
jyx.subject.uri: http://www.yso.fi/onto/yso/p1773
jyx.subject.uri: http://www.yso.fi/onto/yso/p22933
jyx.subject.uri: http://www.yso.fi/onto/yso/p7962
dc.rights.url: https://creativecommons.org/licenses/by/4.0/
dc.relation.doi: 10.32714/ricl.09.01.06
dc.type.okm: A1


Unless otherwise stated, this item is licensed under CC BY 4.0.