The International Comparable Corpus : Challenges in building multilingual spoken and written comparable corpora
Čermáková, A., Jantunen, J., Jauhiainen, T., Kirk, J., Křen, M., Kupietz, M., & Uí Dhonnchadha, E. (2021). The International Comparable Corpus : Challenges in building multilingual spoken and written comparable corpora. Research in Corpus Linguistics, 9(1), 89-103. https://doi.org/10.32714/ricl.09.01.06
Published in
Research in Corpus Linguistics
Date
2021
Copyright
© 2021 Research in Corpus Linguistics
This paper reports on the efforts of twelve national teams in building the International Comparable Corpus (ICC; https://korpus.cz/icc), which will contain highly comparable datasets of spoken, written and electronic registers. The languages currently covered are Czech, Finnish, French, German, Irish, Italian, Norwegian, Polish, Slovak, Swedish and, more recently, Chinese, as well as English, which is considered to be the pivot language. The goal of the project is to provide much-needed data for contrastive corpus-based linguistics. The ICC is committed to re-using existing multilingual resources as much as possible, and its design is modelled, with various adjustments, on the International Corpus of English (ICE). As such, ICC will contain approximately the same balance of 40 percent written language and 60 percent spoken language, distributed across 27 different text types and contexts. A number of issues encountered by the project teams are discussed, ranging from copyright and data sustainability to technical advances in data distribution.
Publisher
Asociacion Espanola de Linguistica de Corpus
ISSN
2243-4712
Original source
http://ricl.aelinco.es/first-view/155-Article%20Text-1147-1-10-20210618.pdf
Publication in research information system
https://converis.jyu.fi/converis/portal/detail/Publication/98442746
Related items
Showing items with similar title or keywords.
- Étude comparative du futur simple dans un corpus littéraire finno-français
  Karhunen, Anna (2008)
- Between context and comparability : Exploring new solutions for a familiar methodological challenge in qualitative comparative research
  Kosmützky, Anna; Nokkala, Terhi; Diogo, Sara (Wiley-Blackwell, 2020)
  Finding the balance between adequately describing the uniqueness of the context of studied phenomena and maintaining sufficient common ground for comparability and analytical generalisation has widely been recognised as a ...
- Corpora, phraseology and dictionaries : How does corpus research intersect language teaching and learning?
  Jantunen, Jarmo Harri (Uusfilologinen Yhdistys, 2016)
  This article discusses the role of corpus data in language learning and teaching as well as the benefits of using authentic language data in learner dictionary writing. It has been argued that acquiring and teaching ...
- Word clouds and beyond : corpus linguistic self-study material package for English for academic purposes
  Hokkanen, Jere (2019)
  English is the common language of the academic world, its lingua franca. In this community, English is an essential requirement for full participation in international contexts. A tertiary-level learner needs academic English ...
- La domestication et l’étrangéisation des noms propres dans un corpus de traductions françaises de la littérature pour enfants finnoise
  Miettinen, Sandra (2020)
  The aim of this contrastive study is to examine the French equivalents of the names of characters in eight Finnish children's books from the perspective of domesticating and foreignizing translation. In the study ...