Open Source Language Models Can Provide Feedback : Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge
Koutcheme, C., Dainese, N., Sarsa, S., Hellas, A., Leinonen, J., & Denny, P. (2024). Open Source Language Models Can Provide Feedback : Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge. In ITiCSE 2024 : Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 (pp. 52-58). ACM. https://doi.org/10.1145/3649217.3653612
Date
2024
Copyright
© 2024 the Authors
Large language models (LLMs) have shown great potential for the automatic generation of feedback in a wide range of computing contexts. However, concerns have been voiced around the privacy and ethical implications of sending student work to proprietary models. This has sparked considerable interest in the use of open source LLMs in education, but the quality of the feedback that such open models can produce remains understudied. This is a concern as providing flawed or misleading generated feedback could be detrimental to student learning. Inspired by recent work that has utilised very powerful LLMs, such as GPT-4, to evaluate the outputs produced by less powerful models, we conduct an automated analysis of the quality of the feedback produced by several open source models using a dataset from an introductory programming course. First, we investigate the viability of employing GPT-4 as an automated evaluator by comparing its evaluations with those of a human expert. We observe that GPT-4 demonstrates a bias toward positively rating feedback while exhibiting moderate agreement with human raters, showcasing its potential as a feedback evaluator. Second, we explore the quality of feedback generated by several leading open-source LLMs by using GPT-4 to evaluate the feedback. We find that some models offer competitive performance with popular proprietary LLMs, such as ChatGPT, indicating opportunities for their responsible use in educational settings.
Publisher
ACM
Parent publication ISBN
979-8-4007-0600-4
Conference
Conference on Innovation and Technology in Computer Science Education
Is part of publication
ITiCSE 2024 : Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1
Publication in research information system
https://converis.jyu.fi/converis/portal/detail/Publication/221088392
Additional information about funding
This research was partially supported by the Research Council of Finland (Academy Research Fellow grant number 356114).
Related items
Showing items with similar title or keywords.
- Evaluating Contextually Personalized Programming Exercises Created with Generative AI
  Logacheva, Evanfiya; Hellas, Arto; Prather, James; Sarsa, Sami; Leinonen, Juho (ACM, 2024)
  Programming skills are typically developed through completing various hands-on exercises. Such programming problems can be contextualized to students’ interests and cultural backgrounds. Prior research in educational ...
- "Like a Nesting Doll" : Analyzing Recursion Analogies Generated by CS Students Using Large Language Models
  Bernstein, Seth; Denny, Paul; Leinonen, Juho; Kan, Lauren; Hellas, Arto; Littlefield, Matt; Sarsa, Sami; Macneil, Stephen (ACM, 2024)
  Grasping complex computing concepts often poses a challenge for students who struggle to anchor these new ideas to familiar experiences and understandings. To help with this, a good analogy can bridge the gap between ...
- How do Finnish and Chinese students’ diverse pedagogical experiences shape feedback interpretation?
  Liontou, Magdalini (Suomen soveltavan kielitieteen yhdistys ry, 2023)
  Due to the dissemination of joint degree programmes in higher education, more students from different educational backgrounds are exposed to the same teaching and assessment without sharing a common pedagogical culture. ...
- Unfolding principles for student peer feedback : a comparative analysis of examples across higher education contexts
  Ellegaard, Marianne; Niss, Maritn; Bruun, Jesper; Lämsä, Joni; Voetman Christiansen, Frederik; Linell, Gry Green; Fogh Larsen, Camilla; Nyman, Rimma; Johannsen, Bjørn Friis (Cappelen Damm AS - Cappelen Damm Akademisk, 2022)
  In this paper we conceptualize formative peer feedback principles by analyzing and comparing six empirical examples of formative peer feedback in a set of international STEM (science, technology, engineering, and mathematics) ...
- The Relevance of Versatile Learning Online Assessment Feedback for University Student
  Maunula, Minna; Maunumäki, Minna; Harju-Luukkainen, Heidi (International Journal of Multidisciplinary Perspectives in Higher Education, 2023)
  In the process of learning, assessment is relevant from multiple perspectives. Learning assessment guides student learning and teaching either knowingly or unconsciously. This study takes a closer look at the meanings given ...