Establishing a Standardised Procedure for Building Learner Corpora
Glaznieks, A., Nicolas, L., Stemle, E., Abel, A., & Lyding, V. (2014). Establishing a Standardised Procedure for Building Learner Corpora. Apples: journal of applied language studies, 8 (3), 5-20. Retrieved from http://apples.jyu.fi
Published in
Apples : Journal of Applied Language Studies
Date
2014
Copyright
© The Author(s)
Decisions at the outset of preparing a learner corpus are of crucial importance for how the corpus can be built and how it can be analysed later on. This paper presents a generic workflow to build learner corpora while taking into account the needs of the users. The workflow results from an extensive collaboration between linguists who annotate and use the corpus and computational linguists who are responsible for providing technical support. The paper addresses the linguists’ research needs as well as the availability and usability of the language technology tools necessary to meet them. We demonstrate and illustrate the relevance of the workflow using results and examples from our L1 learner corpus of German (“KoKo”).
Publisher
Centre for Applied Language Studies, University of Jyväskylä
ISSN
1457-9863
Original source
http://apples.jyu.fi
Related items
Showing items with similar title or keywords.
- The Corpus of Advanced Learner Finnish (LAS2): Database and toolkit to study academic learner Finnish
Ivaska, Ilmari (Centre for Applied Language Studies, University of Jyväskylä, 2014) This paper introduces the Corpus of Advanced Learner Finnish (LAS2), one of the existing corpora of learner Finnish. The corpus was started at the University of Turku in 2007, and the initial motivation for its collection ...
- Using Automatic Morphological Tools to Process Data from a Learner Corpus of Hungarian
Durst, Péter; Szabó, Martina Katalin; Vincze, Veronica; Zsibrita, János (Centre for Applied Language Studies, University of Jyväskylä, 2014) The aim of this article is to show how automatic morphological tools originally used to analyze native speaker data can be applied to process data from a learner corpus of Hungarian. We collected written data from 35 ...
- Corpora, phraseology and dictionaries : How does corpus research intersect language teaching and learning?
Jantunen, Jarmo Harri (Uusfilologinen Yhdistys, 2016) This article discusses the role of corpus data in language learning and teaching as well as the benefits of using authentic language data in learner dictionary writing. It has been argued that acquiring and teaching ...
- The International Comparable Corpus : Challenges in building multilingual spoken and written comparable corpora
Čermáková, Ann; Jantunen, Jarmo; Jauhiainen, Tommi; Kirk, John; Křen, Michal; Kupietz, Marc; Uí Dhonnchadha, Elaine (Asociacion Espanola de Linguistica de Corpus, 2021) This paper reports on the efforts of twelve national teams in building the International Comparable Corpus (ICC; https://korpus.cz/icc) that will contain highly comparable datasets of spoken, written and electronic registers. ...
- How do the Common European Framework levels differ in terms of linguistic features? : analysing English language learners’ written corpora by using Natural Language Processing tools
Khushik, Ghulam Abbas (Jyväskylän yliopisto, 2023) The Common European Framework of Reference (CEFR) for language learning, teaching, and assessment developed by the Council of Europe (CoE, 2001) is a widely used reference source to increase transparency in language education ...