The International Comparable Corpus : Challenges in building multilingual spoken and written comparable corpora
Čermáková, A., Jantunen, J., Jauhiainen, T., Kirk, J., Křen, M., Kupietz, M., & Uí Dhonnchadha, E. (2021). The International Comparable Corpus : Challenges in building multilingual spoken and written comparable corpora. Research in Corpus Linguistics, 9(1), 89-103. https://doi.org/10.32714/ricl.09.01.06
Published in
Research in Corpus Linguistics
Date
2021
Copyright
© 2021 Research in Corpus Linguistics
This paper reports on the efforts of twelve national teams in building the International Comparable Corpus (ICC; https://korpus.cz/icc), which will contain highly comparable datasets of spoken, written and electronic registers. The languages currently covered are Czech, Finnish, French, German, Irish, Italian, Norwegian, Polish, Slovak, Swedish and, more recently, Chinese, as well as English, which is considered to be the pivot language. The goal of the project is to provide much-needed data for contrastive corpus-based linguistics. The ICC is committed to the idea of re-using existing multilingual resources as much as possible, and its design is modelled, with various adjustments, on the International Corpus of English (ICE). As such, ICC will contain approximately the same balance of 40 percent written language and 60 percent spoken language, distributed across 27 different text types and contexts. A number of issues encountered by the project teams are discussed, ranging from copyright and data sustainability to technical advances in data distribution.
Publisher
Asociacion Espanola de Linguistica de Corpus
ISSN
2243-4712
Original source
http://ricl.aelinco.es/first-view/155-Article%20Text-1147-1-10-20210618.pdf
Publication in research information system
https://converis.jyu.fi/converis/portal/detail/Publication/98442746
Related items
Showing items with similar title or keywords.
- Between context and comparability : Exploring new solutions for a familiar methodological challenge in qualitative comparative research
Kosmützky, Anna; Nokkala, Terhi; Diogo, Sara (Wiley-Blackwell, 2020) Finding the balance between adequately describing the uniqueness of the context of studied phenomena and maintaining sufficient common ground for comparability and analytical generalisation has widely been recognised as a ...
- Do concepts and methods have ethics?
Laihonen, Petteri (Language on the Move, 2020)
- Corpora, phraseology and dictionaries : How does corpus research intersect language teaching and learning?
Jantunen, Jarmo Harri (Uusfilologinen Yhdistys, 2016) This article discusses the role of corpus data in language learning and teaching as well as the benefits of using authentic language data in learner dictionary writing. It has been argued that acquiring and teaching ...
- The prevalence of core vocabulary in World of Warcraft’s written in-game quest instruction
Finne, Miso (2021) The thesis examines the connection between playing massively multiplayer online role-playing games and learning foreign-language vocabulary. As online gaming has become more common, research related to it has also increased. As the most common objects of study ...
- Comma or no comma : two case studies on present-day English corpora
Jormalainen, Maija (2012) The use of the comma in English is usually seen as straightforward. Certain comma rules have become ingrained in people's practice, and it is often hard to see the comma as being as complex and unclear as it actually is. ...