The International Comparable Corpus : Challenges in building multilingual spoken and written comparable corpora

Čermáková, Ann; Jantunen, Jarmo; Jauhiainen, Tommi; Kirk, John; Křen, Michal; Kupietz, Marc; Uí Dhonnchadha, Elaine

doi:10.32714/ricl.09.01.06

dc.contributor.author	Čermáková, Ann
dc.contributor.author	Jantunen, Jarmo
dc.contributor.author	Jauhiainen, Tommi
dc.contributor.author	Kirk, John
dc.contributor.author	Křen, Michal
dc.contributor.author	Kupietz, Marc
dc.contributor.author	Uí Dhonnchadha, Elaine
dc.date.accessioned	2022-02-07T06:29:40Z
dc.date.available	2022-02-07T06:29:40Z
dc.date.issued	2021
dc.identifier.citation	Čermáková, A., Jantunen, J., Jauhiainen, T., Kirk, J., Křen, M., Kupietz, M., & Uí Dhonnchadha, E. (2021). The International Comparable Corpus : Challenges in building multilingual spoken and written comparable corpora. Research in Corpus Linguistics, 9(1), 89-103. https://doi.org/10.32714/ricl.09.01.06
dc.identifier.other	CONVID_98442746
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/79643
dc.description.abstract	This paper reports on the efforts of twelve national teams in building the International Comparable Corpus (ICC; https://korpus.cz/icc) that will contain highly comparable datasets of spoken, written and electronic registers. The languages currently covered are Czech, Finnish, French, German, Irish, Italian, Norwegian, Polish, Slovak, Swedish and, more recently, Chinese, as well as English, which is considered to be the pivot language. The goal of the project is to provide much-needed data for contrastive corpus-based linguistics. The ICC corpus is committed to the idea of re-using existing multilingual resources as much as possible and the design is modelled, with various adjustments, on the International Corpus of English (ICE). As such, ICC will contain approximately the same balance of 40 percent of written language and 60 percent of spoken language distributed across 27 different text types and contexts. A number of issues encountered by the project teams are discussed, ranging from copyright and data sustainability to technical advances in data distribution.	en
dc.format.mimetype	application/pdf
dc.language.iso	eng
dc.publisher	Asociación Española de Lingüística de Corpus
dc.relation.ispartofseries	Research in Corpus Linguistics
dc.relation.uri	http://ricl.aelinco.es/first-view/155-Article%20Text-1147-1-10-20210618.pdf
dc.rights	CC BY 4.0
dc.subject.other	ICC corpus
dc.subject.other	contrastive linguistics
dc.subject.other	comparable corpus
dc.subject.other	ICE corpus
dc.subject.other	data sustainability
dc.subject.other	copyright
dc.title	The International Comparable Corpus : Challenges in building multilingual spoken and written comparable corpora
dc.type	article
dc.identifier.urn	URN:NBN:fi:jyu-202202071401
dc.contributor.laitos	Kieli- ja viestintätieteiden laitos	fi
dc.contributor.laitos	Department of Language and Communication Studies	en
dc.contributor.oppiaine	Suomen kieli	fi
dc.contributor.oppiaine	Finnish	en
dc.type.uri	http://purl.org/eprint/type/JournalArticle
dc.type.coar	http://purl.org/coar/resource_type/c_2df8fbb1
dc.description.reviewstatus	peerReviewed
dc.format.pagerange	89-103
dc.relation.issn	2243-4712
dc.relation.numberinseries	1
dc.relation.volume	9
dc.type.version	publishedVersion
dc.rights.copyright	© 2021 Research in Corpus Linguistics
dc.rights.accesslevel	openAccess	fi
dc.subject.yso	linguistics
dc.subject.yso	copyright
dc.subject.yso	contrastive research
dc.subject.yso	corpora
dc.subject.yso	comparative linguistics
dc.format.content	fulltext
jyx.subject.uri	http://www.yso.fi/onto/yso/p1631
jyx.subject.uri	http://www.yso.fi/onto/yso/p2346
jyx.subject.uri	http://www.yso.fi/onto/yso/p1773
jyx.subject.uri	http://www.yso.fi/onto/yso/p22933
jyx.subject.uri	http://www.yso.fi/onto/yso/p7962
dc.rights.url	https://creativecommons.org/licenses/by/4.0/
dc.relation.doi	10.32714/ricl.09.01.06
dc.type.okm	A1

Files in this item

Name: 155-Article%20Text-1147-1-10-2 ...
Size: 291.8Kb
Format: PDF
Description: Publisher's PDF

This item appears in the following collections

Faculty of Humanities and Social Sciences [6455]

Unless otherwise stated, the license of this item is CC BY 4.0
