Assessing instructor-AI cooperation for grading essay-type questions in an introductory sociology course
Project Overview
This project examines the use of generative AI, particularly GPT models, to grade essay-type questions in higher education, focusing on an introductory sociology course. It highlights the potential of AI as a supportive tool alongside human grading, with the aim of enhancing fairness and reducing bias in assessment. The findings show a strong correlation between the grades assigned by GPT models and those of human evaluators, especially when the AI is supplied with template answers, indicating that GPT can closely replicate human grading patterns. Persistent discrepancies remain in some areas, however, reinforcing the conclusion that AI can assist the grading process but should not fully replace human evaluators. The overall implication is that generative AI has valuable applications in educational assessment, provided its integration is approached with caution so that it complements rather than substitutes for human judgment.
Key Applications
Generative pre-trained transformers (GPT) for grading essay-type questions
Context: Higher education, specifically in an introductory sociology course
Implementation: AI models were tested on 70 handwritten exams, transcribing and scoring students' responses; a range of settings and prompts was used to assess consistency and accuracy (see the sketch after this list).
Outcomes: High similarity in transcriptions between human and GPT; strong correlation between GPT and human grading when template answers were provided.
Challenges: Discrepancies in grading, potential biases in human grading, and the need for human verification.
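To make the workflow concrete, below is a minimal Python sketch of the transcribe-and-score step, assuming the OpenAI chat completions API with a vision-capable model. The prompt wording, model name, and 0-10 scale are illustrative assumptions, not the authors' exact protocol.

```python
import base64
from openai import OpenAI  # assumes the official OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def grade_answer(image_path: str, question: str, template_answer: str) -> str:
    """Transcribe a handwritten answer and score it against a template answer.

    Illustrative only: the prompt, model name, and 0-10 scale are assumptions.
    """
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable GPT model
        temperature=0,   # favor reproducible grading across runs
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    f"Question: {question}\n"
                    f"Template answer: {template_answer}\n"
                    "First transcribe the handwritten answer in the image, "
                    "then score it from 0 to 10 against the template. "
                    "Reply as: TRANSCRIPTION: ... SCORE: ..."
                )},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```

Setting a temperature of 0 is one way to probe the consistency concern the study raises: repeated runs on the same scan should then vary as little as the model allows.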
Implementation Barriers
Technical barrier
GPT models cannot directly process common image formats, so exam scans must be converted to a compatible format before they can be submitted.
Proposed Solutions: Automate the image-conversion step and ensure educators have access to the tools needed for integration (see the conversion sketch below).
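A minimal sketch of the automated conversion step, assuming the Pillow imaging library; the source formats (TIFF and BMP) and the JPEG target are illustrative, not the authors' pipeline.

```python
from pathlib import Path
from PIL import Image  # assumes the Pillow library is installed

def convert_scans(src_dir: str, dst_dir: str) -> None:
    """Batch-convert scanned exam pages to JPEG, an API-accepted format."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).iterdir():
        if path.suffix.lower() in {".tif", ".tiff", ".bmp"}:
            img = Image.open(path).convert("RGB")  # JPEG has no alpha channel
            img.save(out / f"{path.stem}.jpg", "JPEG", quality=90)
```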
Implementation barrier
Inconsistencies between human graders and GPT models can lead to unfair evaluations if AI scores are accepted uncritically.
Proposed Solutions: Use GPT as a complementary tool that flags inconsistencies for human review, rather than relying solely on AI grading (see the flagging sketch below).
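A minimal sketch of the flag-for-review idea: compare paired human and GPT scores, return the items whose gap exceeds a tolerance, and report their Pearson correlation. The threshold, example scores, and 0-10 scale are made-up illustrations; statistics.correlation requires Python 3.10+.

```python
from statistics import correlation  # Pearson r, Python 3.10+

def flag_for_review(human, gpt, threshold=2.0):
    """Return indices where GPT and human scores diverge beyond a threshold,
    plus the overall Pearson correlation of the two score lists."""
    flagged = [i for i, (h, g) in enumerate(zip(human, gpt))
               if abs(h - g) > threshold]
    return flagged, correlation(human, gpt)

# Example: exam 2 diverges by 3 points and goes back to a human grader.
human_scores = [8, 6, 9, 4, 7]
gpt_scores   = [7, 6, 6, 5, 7]
to_review, r = flag_for_review(human_scores, gpt_scores)
print(to_review, round(r, 2))  # -> [2] 0.65
```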
Ethical barrier
AI should not be viewed as a perfect replacement for human judgment; at the same time, human grading is itself prone to bias.
Proposed Solutions: Leverage AI to enhance fairness in assessments rather than to replace human evaluators, and promote deidentification of assessments to reduce bias (see the redaction sketch below).
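A minimal redaction sketch for the deidentification proposal, using regular expressions; the "Name:" header line and the 8-digit student ID pattern are hypothetical examples, since real identifier formats vary by institution.

```python
import re

def deidentify(text: str) -> str:
    """Strip obvious identifiers from a transcribed answer before grading."""
    text = re.sub(r"(?im)^name:.*$", "Name: [REDACTED]", text)
    text = re.sub(r"\b\d{8}\b", "[ID]", text)  # hypothetical 8-digit student IDs
    return text
```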
Project Team
Francisco Olivos
Researcher
Tobias Kamelski
Researcher
Sebastián Ascui-Gac
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Francisco Olivos, Tobias Kamelski, Sebastián Ascui-Gac
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18