Can ChatGPT and Bard Generate Aligned Assessment Items? A Reliability Analysis against Human Performance
Project Overview
This summary covers a study of generative AI tools, specifically ChatGPT and Google Bard, in educational settings, with a focus on automated assessment and item generation. The tools show potential to streamline the creation of assessment items and make assessment development more efficient. Their reliability, however, falls well short of that of human raters, so they can assist with assessment tasks but do not yet match human evaluation quality. Overall, the findings indicate that generative AI holds considerable promise for assessment practice in education, but further improvement is needed before its output can be considered effective and reliable.
Key Applications
Automated Item Generation (AIG)
Context: Educational assessment for language arts, mathematics, and sciences
Implementation: ChatGPT and Google Bard were compared with human raters on their ability to generate writing prompts and to rate the complexity of those prompts.
Outcomes: AI tools show potential in creating assessment items but require further training to match human performance.
Challenges: Low reliability compared to human raters; AI tools need to be fine-tuned for better accuracy in understanding complexity.
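As an illustration of how agreement between an AI tool and a human rater might be quantified, the sketch below computes Cohen's kappa on two sets of categorical complexity ratings. This is an assumption for illustration only: the summary does not state which reliability statistic the paper used, and the function name `cohen_kappa` and the sample ratings are hypothetical, not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items on a categorical scale.

    Illustrative helper only; not taken from the paper.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must score the same non-empty set of items.")

    n = len(ratings_a)

    # Observed agreement: share of items given the same rating by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over all categories.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    # Both raters used a single identical category: agreement is trivially perfect.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical complexity ratings (1 = low, 3 = high) for six writing prompts.
human_ratings = [1, 2, 3, 2, 1, 3]
ai_ratings = [1, 2, 2, 2, 1, 1]
print(f"kappa = {cohen_kappa(human_ratings, ai_ratings):.2f}")
```

A kappa near 1 indicates agreement well beyond chance, while a value near 0 indicates agreement no better than chance. For ordinal complexity scales, a weighted kappa or an intraclass correlation may be a better fit than this unweighted version.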
Implementation Barriers
Technical Barrier
Generative AI tools currently lack the reliability of human scorers, especially in understanding the complexity of writing prompts.
Proposed Solutions: Further training and fine-tuning of AI models are needed to enhance their performance in educational contexts.
Project Team
Abdolvahab Khademi
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Abdolvahab Khademi
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI