6. Performance of AI Models and Urgency for Institutional Action

Performance of AI Models

The assessment of GPT-4o's performance on university-level mathematics and statistics assignments provided critical insights into its capabilities and limitations:

  • Methodology: Assignments from Years 1 to 4, covering both proof-based and applied problems, were submitted to GPT-4o zero-shot, that is, without additional guidance or advanced prompting techniques. The AI-generated responses were marked against the established mark schemes. Lecturers also rated the outputs on four criteria: correctness, similarity to student work, detectability as AI-generated, and adaptability into student work.
  • Correctness of Solutions: GPT-4o performed exceptionally well on Year 1 assignments, achieving high correctness scores, particularly in applied questions. For Years 2 to 4, the AI's performance declined. Lecturers noted that answers to proof questions were often vague, lacked detailed reasoning, or contained significant errors. The AI struggled with the complex reasoning and multi-step logical processes required in higher-level tasks.
  • Similarity to Student Work and Detectability: In higher years, the AI's answers were moderately similar to student submissions. Lecturers observed that AI-generated responses sometimes included unusual phrasing, excessive verbosity, or atypical grammar, all features that could indicate AI authorship. However, students could easily modify these characteristics to resemble their own writing style, reducing detectability.
  • Adaptability into Student Work: Lecturers expressed concern that students could readily adapt AI-generated answers to pass as independent work. By correcting errors, adjusting language, and removing tell-tale signs, students might integrate AI outputs into their assignments without detection, challenging the effectiveness of traditional plagiarism checks and raising ethical issues.
  • Combined Data Analysis: Combining this data with previous evaluations showed that GPT-4o excels at first-year questions, with 75.81% of answers falling into the highest scoring category. Second-year assignments drew a higher percentage of low-scoring answers, and in Years 3 and 4 the AI's expected scores fell further. This pattern indicates that while the AI handles foundational concepts well, it is less proficient with advanced material requiring deeper understanding.
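
The category shares and expected scores reported in the combined analysis can be derived from per-answer marks along the following lines. This is a minimal sketch rather than the study's actual analysis: the band boundaries, the function names `band_for` and `summarise`, and the sample marks are all hypothetical placeholders, since this section does not state how the scoring categories were defined.

```python
from collections import Counter

# Hypothetical score bands as (label, lower bound) on a 0-1 scale.
# The study's real category boundaries are not given in this section.
BANDS = [("low", 0.0), ("medium", 0.4), ("high", 0.7)]


def band_for(score: float) -> str:
    """Return the label of the highest band whose lower bound the score meets."""
    name = BANDS[0][0]
    for label, lower in BANDS:
        if score >= lower:
            name = label
    return name


def summarise(scores: list[float]) -> dict:
    """Percentage of answers per band, plus the mean (expected) score."""
    if not scores:
        raise ValueError("no scores to summarise")
    counts = Counter(band_for(s) for s in scores)
    shares = {label: 100 * counts[label] / len(scores) for label, _ in BANDS}
    return {"shares": shares, "expected_score": sum(scores) / len(scores)}


# Made-up marks for illustration only; these are not the study's data.
example_marks = {
    "Year 1": [0.9, 0.8, 0.75, 0.3],
    "Year 3": [0.5, 0.2, 0.8, 0.1],
}

for year, marks in example_marks.items():
    result = summarise(marks)
    print(year, result["shares"], round(result["expected_score"], 2))
```

Reporting both the band shares and the mean keeps the two claims in the analysis separate: a year group can have a respectable mean while still producing a large share of low-scoring answers.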

These results highlight the nuanced capabilities of GPT-4o. While it demonstrates strong performance on simpler tasks, its limitations in complex reasoning do not necessarily prevent misuse. The ease with which students can adapt AI-generated content poses significant challenges for maintaining academic integrity and underscores the need for effective strategies to address this issue.

Urgency for Institutional Action

The findings from this study emphasise an urgent need for universities to adapt their assessment strategies and policies in response to rapid AI advancements. However, addressing these challenges is complex and requires careful consideration.

  • Re-evaluating Assessment Methods: Traditional take-home assignments are increasingly susceptible to AI-assisted cheating. While options such as in-person examinations, oral assessments, and assignments requiring personal reflection or creativity have been suggested, our research indicates that these are not foolproof solutions. AI's capabilities have advanced to a point where it can assist students with many of these tasks, potentially undermining their effectiveness. Educators must recognise that AI is now an integral part of the educational landscape, and entirely "AI-proof" assessments may not be feasible.
  • Adapting to the Reality of AI: Institutions need to shift from attempting to avoid AI entirely to finding ways to work alongside it. This involves acknowledging AI's capabilities and limitations, and integrating it into learning in a manner that enhances education while maintaining academic integrity. Alternative assessment methods should be developed collaboratively with educators and students, fostering innovation and ownership. However, these methods alone are insufficient, and a combination of strategies is necessary.
  • Challenges with Alternative Assessments: Implementing alternative assessment methods, such as project-based learning, presentations, and in-class tests, presents its own challenges. These include resource constraints, ensuring fairness and equity, aligning with learning objectives, and overcoming resistance to change. Moreover, AI's rapidly evolving capabilities mean that even these methods are not immune to misuse: advanced technologies could facilitate real-time assistance during live assessments.
  • Promoting AI Literacy and Ethical Awareness: There is a clear gap in understanding AI's capabilities and ethical implications among students and staff. Universities should implement educational initiatives to enhance AI literacy, demystify the technology, and foster an ethical culture around its use. This includes providing guidance on acceptable use, highlighting the importance of academic integrity, and educating students on the potential consequences of misuse.
  • Developing Comprehensive Policies and Guidelines: Institutions need to establish robust, clear policies that address the ethical challenges posed by AI in academia. These policies should define what constitutes permissible use of AI tools, outline expectations for academic honesty, and specify penalties for violations. Regular updates to these policies are essential to keep pace with technological advancements and evolving academic practices.
  • Facilitating Open Dialogue and Collaboration: Encouraging ongoing conversations among students, educators, and administrators is crucial to address concerns and misconceptions about AI. Open dialogue can help align perceptions, set shared expectations, and collaboratively develop strategies to uphold academic integrity while leveraging AI's potential benefits for learning.
  • Implementing Adaptive Strategies: Recognising that no single solution will fully prevent AI misuse, a multifaceted approach is necessary. Strategies may include:
    • Combination of Assessment Types: Incorporating a variety of assessment methods to evaluate different competencies and reduce reliance on any single format that could be compromised by AI.
    • Verification Processes: Implementing verification steps such as oral defences, live problem-solving sessions, or reflective components that require students to demonstrate understanding beyond written submissions. However, scalability and resource limitations must be considered.
    • Emphasis on Learning Processes: Focusing on the development of critical thinking, problem-solving skills, and ethical understanding, rather than solely on final answers. This may involve continuous assessment and feedback loops.
    • Collaborative Development: Involving educators and students in the co-creation of assessment methods and policies to ensure they are practical, effective, and aligned with educational goals.

The swift progression from GPT-3.5 to GPT-4o, the introduction of OpenAI's o1, and the anticipated arrival of even more advanced models like GPT-5 highlight the pressing nature of these challenges. Educational institutions are uniquely positioned to lead the examination of AI's impact on learning and assessment. Proactive and immediate action is essential to ensure that the integration of AI enhances education without compromising academic standards or the credibility of qualifications.