2. The Emergence of ChatGPT and Limitations of GPT-3.5
The Emergence of ChatGPT and Its Impact
ChatGPT, released on 30 November 2022 (OpenAI, 2022), fundamentally altered the educational landscape overnight. Students could suddenly, instantly, and at no cost, obtain answers that far exceeded what the general public then expected of AI assistants, which were widely regarded as basic task managers or search tools such as Siri or Google Assistant. This shift raised immediate concerns about the integrity of academic assessments, particularly in essay-based subjects, where students could generate large portions of assignments, or even entire ones, with minimal effort.
At the time of its release, ChatGPT was powered by a single Large Language Model (LLM): GPT-3.5. This model quickly became synonymous with the ChatGPT brand and remains, according to our study, the most popular version among students nearly two years later, despite having since been replaced as the free default by GPT-4o mini. GPT-3.5, like GPT-4o mini, was always offered free of charge, subject to usage limits. It is an advanced large language model capable of generating and processing natural language. Trained on vast amounts of text data, primarily sourced from the internet, GPT-3.5 can address a wide range of topics with a depth and fluency far beyond the rule-based systems most users had come to expect from virtual assistants. It operates by predicting the next word in a sequence based on statistical patterns learned during training, which allows it to generate human-like text. However, it is important to note that while highly proficient at producing coherent responses, GPT-3.5 functioned, at least initially, as a sophisticated prediction engine for conversation rather than a tool optimised for accurate reasoning or mathematical computation. Its abilities in mathematics and coding were largely emergent, arising from patterns in its training data and reinforced through fine-tuning with human feedback. A toy sketch of this next-word mechanism follows below.
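The prediction loop can be pictured at a miniature scale. The sketch below is our own minimal illustration of greedy next-token generation; the probability table is invented for the example and bears no relation to GPT-3.5's actual vocabulary, weights, or sampling strategy.

```python
# Toy illustration of next-token prediction: the model repeatedly picks the
# most probable continuation from learned statistics; it has no notion of truth.
# The probability table below is invented purely for illustration.
next_token_probs = {
    ("the", "capital"): {"of": 0.9, "city": 0.1},
    ("capital", "of"): {"France": 0.6, "Spain": 0.4},
    ("of", "France"): {"is": 0.95, "was": 0.05},
    ("France", "is"): {"Paris": 0.8, "large": 0.2},
}

def generate(prompt_tokens, steps=4):
    tokens = list(prompt_tokens)
    for _ in range(steps):
        context = tuple(tokens[-2:])        # last two tokens as context
        candidates = next_token_probs.get(context)
        if not candidates:
            break
        # Greedy decoding: always take the highest-probability token.
        tokens.append(max(candidates, key=candidates.get))
    return " ".join(tokens)

print(generate(["the", "capital"]))
# -> "the capital of France is Paris"
```

The point of the toy is that nothing in the loop checks whether "Paris" is true; the model simply emits whichever continuation its training statistics favour.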
Limitations of GPT-3.5 in Mathematical Contexts
Despite the impressive scale of GPT-3.5 (175 billion parameters) and the diversity of its training data, its capabilities, like those of all LLMs, are bounded by the quality and scope of that data, which often includes both accurate and inaccurate information. This limitation particularly affected its performance in mathematical contexts, where rigorous logic and structured reasoning across multiple steps are required. Essentially, LLMs are not calculators; they are neural network prediction engines. This distinction is crucial to understanding why, despite some competence, GPT-3.5 was not well suited to university-level mathematics, even with later improvements (GPT-3.5-turbo). Unlike a calculator, which carries out mathematical operations through precise algorithms, GPT-3.5, like all LLMs, approaches mathematical questions the same way it handles any other query: by predicting the most likely next word or token based on its training data. Because answers to mathematical problems are generated through the same mechanism as all other text, the model often produces errors and inconsistencies on problems that may, counterintuitively, seem trivial, such as counting or basic arithmetic. The sketch below makes this contrast concrete.
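To make the calculator-versus-predictor distinction concrete, the following sketch (our own, deliberately simplistic) contrasts a lookup-based "predictor", a crude stand-in for statistical pattern matching, with an exact computation. The memorised examples are invented for illustration.

```python
# A pattern-based "predictor" answers arithmetic by looking up similar
# examples it has memorised, while Python itself computes exactly.
# The memorised examples are invented for illustration.
memorised = {
    "2 + 2": "4",
    "12 * 12": "144",
    "123 * 456": "56088",
}

def pattern_predict(question):
    # Return a memorised answer if the exact question was "seen in training";
    # otherwise fall back to the nearest-looking example, a crude stand-in
    # for the statistical generalisation an LLM performs.
    if question in memorised:
        return memorised[question]
    nearest = min(memorised, key=lambda q: abs(len(q) - len(question)))
    return memorised[nearest]  # plausible-looking, but often wrong

question = "124 * 456"
print("pattern-based guess:", pattern_predict(question))  # 56088 (wrong)
print("exact computation:  ", eval(question))             # 56544
```

The guess is plausible precisely because it comes from a nearby "training" example, which is also why such errors can be hard to spot.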
As a result, many students who initially experimented with GPT-3.5 developed a negative perception of the capabilities of LLMs generally, and in mathematics in particular. This sentiment arose from the model's frequent inaccuracies and "hallucinations", which eroded trust in its outputs and, in turn, confidence in the broader potential of AI tools in academic settings, especially for tasks requiring precise and reliable answers, reinforcing the perception that AI might be fundamentally flawed for such applications. OpenAI did little to alleviate these concerns, offering only a generic warning at the bottom of every chat that "ChatGPT can make mistakes", which hardly provided the clarity or reassurance students needed to use the tool confidently in their academic work.
Despite these limitations and worries, the majority of students in the Maths and Stats department admitted to using AI models for help with their assignments, with most relying on GPT-3.5 at the time. GPT-3.5 and its subsequent refinements have proven competent in many instances. Anecdotally, mathematics students have noted its apparent ability not only to write effective code but also to execute it, although, in reality, the model is merely predicting the output based on learned patterns. Similarly, it could often "guess" calculations correctly when prompted, without performing actual internal calculations as a calculator would. This phenomenon, where the model produces responses that appear accurate but are not grounded in genuine computation or fact, is known as a "hallucination" (Huang et al., 2023). As a result, students frequently expressed discomfort and found it difficult to trust the model's outputs, particularly in mathematical and technical contexts where precision is crucial. The difference between predicting and executing code is illustrated below.
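The gap between a predicted output and an executed one can be shown directly. In this sketch (our own illustration; the "predicted" value stands in for a hallucinated model answer), the Python interpreter actually runs the snippet, exposing the discrepancy.

```python
# An LLM "running" code is really guessing its output from patterns, whereas
# an interpreter executes it. The predicted output here is an invented,
# pattern-plausible guess standing in for a model's hallucinated answer.
import io
import contextlib

snippet = "print(sum(1 for ch in 'strawberry' if ch == 'r'))"

predicted_output = "2"  # plausible but wrong guess

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(snippet)  # actually execute the code
actual_output = buffer.getvalue().strip()

print("predicted:", predicted_output)  # 2  (hallucinated)
print("actual:   ", actual_output)     # 3  (executed)
```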
References
- OpenAI. (2022). Introducing ChatGPT. Retrieved from https://openai.com/index/chatgpt/
- Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., & Liu, T. (2023). A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. arXiv. https://doi.org/10.48550/arXiv.2311.05232