New data, benchmark and baseline for L2 speaking assessment for low-resource languages
Kurimo, M., Getman, Y., Voskoboinik, E., Al-Ghezi, R., Kallio, H., Kuronen, M., von Zansen, A., Hilden, R., Kronholm, S., Huhta, A., & Linden, K. (2023). New data, benchmark and baseline for L2 speaking assessment for low-resource languages. In Proceedings of the 9th Workshop on Speech and Language Technology in Education (SLaTE) (pp. 166-170). International Speech Communication Association. https://doi.org/10.21437/SLaTE.2023-32
Date
2023
Subject
Applied language studies; Swedish; School of Wellbeing; Finnish
Copyright
© 2023 International Speech Communication Association
The development of large multilingual speech models makes it possible to build high-quality speech technology even for low-resource languages. In this paper, we present the speech data of L2 learners of Finnish and Finland Swedish that we recently collected for training and evaluating automatic speech recognition (ASR) and automatic speaking assessment (ASA). The data set includes over 4,000 recordings by over 300 students per language in short read-aloud and free-form tasks. The recordings have been manually transcribed and assessed for pronunciation, fluency, range, accuracy, task achievement, and a holistic proficiency level. We also present an ASR and ASA benchmarking setup that we constructed using these data, together with results from our baseline systems, which were built by fine-tuning a self-supervised multilingual model for the target language. Beyond benchmarking, our baseline system can be used by L2 students and teachers for online self-training and evaluation of oral proficiency.
Publisher
International Speech Communication Association
Conference
Workshop on Speech and Language Technology in Education
Part of publication
Proceedings of the 9th Workshop on Speech and Language Technology in Education (SLaTE)
Publication in the research information system
https://converis.jyu.fi/converis/portal/detail/Publication/184436350
Funder(s)
Academy of Finland
Funding program(s)
Academy Project, AoF
Additional information about funding
This work was done and the data were collected as part of Academy of Finland grants nos. 322619, 322625, 322965 and 337073.
Similar items
Showing items with similar titles or keywords.
- Aletaan alusta : luku- ja kirjoitustaidottomat aikuiset uutta kieltä oppimassa
  Tammelin-Laine, Taina (University of Jyväskylä, 2014)
- Creaky voice and utterance fluency measures in predicting perceived fluency and oral proficiency of spontaneous L2 Finnish
  Kallio, Heini; Suviranta, Rosa; Kuronen, Mikko; Zansen, Anna von (International Speech Communication Association, 2022)
  While utterance fluency measures are often studied in relation to perceived L2 fluency and proficiency, the effect of creaky voice remains ignored. However, creaky voice is frequent in a number of languages, including ...
- Venäjänkielisten suomen puhumisen taito taustamuuttujien valossa
  Ahola, Sari; Hirvelä, Tuija (Venäjän ja Itä-Euroopan tutkimuksen seura, 2024)
  The study examines Russian-speaking examinees’ (n=8412) oral proficiency in Finnish and changes in proficiency between 2012–2021 using the register data from a national high-stakes language test (National Certificate of ...
- Inlärning och behärskning av svenskans verb- och adjektivböjning samt negationens placering hos finska grundskoleelever
  Paavilainen, Marika (University of Jyväskylä, 2015)
- The role of pause location in perceived fluency and proficiency in L2 Finnish
  Kallio, Heini; Kuronen, Mikko; Koivusalo, Liisa (International Speech Communication Association, 2022)
  Fluency is a commonly used descriptor of second language (L2) speaking skills. Unplanned and too frequent pauses, hesitations, and repetitions disrupt the flow of speech and can cause temporal irregularities at all levels ...
Unless otherwise stated, publicly available JYX metadata (excluding abstracts) may be freely reused under the CC0 license.