
Towards LLM-based Autograding for Short Textual Answers

Project Overview

This project explores the use of large language models (LLMs), particularly ChatGPT, for autograding short textual answers in education. It highlights the potential of LLMs as a supplementary grading tool that can improve efficiency and offer an additional perspective, but it also raises critical issues such as inherent biases, ethical concerns, and the need for human oversight, especially in high-stakes assessments. The findings indicate that while LLMs can assist educators, their ability to grade responses independently and accurately is currently limited, so a cautious approach to their deployment in educational settings is warranted. Overall, the project underscores the promise of generative AI in education while stressing that the associated challenges must be addressed to ensure fair and effective use.

Key Applications

Automatic Short Answer Grading (ASAG) using LLMs such as ChatGPT

Context: Educational context includes exams from two distinct courses: a master's level data science course and a bachelor's level information systems course, with a target audience of students in these programs.

Implementation: LLMs were used to assess students' answers to exam questions and provide feedback on both student and educator answers.

Outcomes: The LLM offered a second opinion that could help identify grading flaws and provided a more general perspective on student answers, potentially supporting educators in the grading process.

Challenges: The LLMs showed biases, sensitivity to minor changes in answers, and often failed to provide concise reasoning, leading to significant discrepancies with human graders.
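The grading setup described above can be sketched as a simple prompt-construction step: the LLM receives the question, a reference answer, and the student answer, and returns a score with a brief rationale. The function names, prompt layout, and reply format below are illustrative assumptions, not the authors' actual implementation.

```python
import re


def build_grading_prompt(question: str, reference_answer: str,
                         student_answer: str, max_points: int) -> str:
    """Assemble a grading prompt asking the LLM for a score and a short rationale.

    The prompt wording is a hypothetical sketch, not the paper's exact prompt.
    """
    return (
        f"You are grading an exam answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference_answer}\n"
        f"Student answer: {student_answer}\n"
        f"Award between 0 and {max_points} points and justify briefly.\n"
        f"Reply in the form 'Score: <points>'."
    )


def parse_score(reply: str) -> float:
    """Extract the numeric score from a reply such as 'Score: 2.5 ...'."""
    match = re.search(r"Score:\s*([0-9]+(?:\.[0-9]+)?)", reply)
    if match is None:
        raise ValueError("No score found in model reply")
    return float(match.group(1))
```

In line with the paper's findings, a score parsed this way would serve only as a second opinion for the human grader, not as the final grade.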

Implementation Barriers

Technical

LLMs can exhibit biases, discrimination, and factual inaccuracies, impacting their effectiveness in grading. The implementation of LLMs for grading requires significant adjustments in prompt design and understanding of the system's limitations.

Proposed Solutions: Incorporating human oversight in the grading process, improving LLMs through fine-tuning or in-context learning, and developing a more explicit grading scheme while providing context-specific training data for the LLMs.
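The in-context learning solution mentioned above can be sketched as a few-shot prompt: a handful of instructor-graded examples are prepended so the LLM imitates the explicit grading scheme. The example format and function name here are assumptions for illustration, not the paper's implementation.

```python
def build_few_shot_prompt(graded_examples, question, student_answer, max_points):
    """Build a few-shot grading prompt (in-context learning sketch).

    graded_examples: list of (student_answer, points, rationale) tuples
    graded by a human instructor, used to convey the grading scheme.
    """
    shots = "\n".join(
        f"Answer: {ans}\nScore: {pts} ({why})"
        for ans, pts, why in graded_examples
    )
    return (
        f"Grade answers to the question: {question} (0-{max_points} points)\n"
        f"Examples graded by the instructor:\n"
        f"{shots}\n"
        f"Answer: {student_answer}\n"
        f"Score:"
    )
```

Supplying context-specific graded examples like this is one way to make the grading scheme explicit to the model without fine-tuning.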

Ethical

Using LLMs for grading raises ethical concerns related to decision-making bias and the potential impact on students' educational outcomes.

Proposed Solutions: Establishing guidelines for the ethical use of LLMs in education and ensuring that LLMs serve as support tools rather than sole decision-makers.

Project Team

Johannes Schneider

Researcher

Bernd Schenk

Researcher

Christina Niklaus

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Johannes Schneider, Bernd Schenk, Christina Niklaus

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
