Formal Report

Exploring the Use of AI in Mathematics and Statistics Assessments: At a Glance

1. Introduction to the Project

The project explores the impact of advanced AI technologies, specifically Large Language Models (LLMs) such as GPT-4o, on mathematics and statistics assessments in higher education. It aims to understand how AI can assist in education while maintaining academic integrity.

2. ChatGPT and Limitations

The launch of ChatGPT in November 2022 significantly transformed academic environments, making it easier for students to incorporate AI into their assignments. However, this shift also raised concerns about academic integrity and fostered a general distrust of AI models like GPT-3.5, particularly because of their unreliability in solving advanced mathematical problems.

3. LLMs and Evolution

LLMs such as GPT-3.5 and GPT-4 generate answers by predicting likely patterns in text rather than by performing mathematical computation, which leads to errors in complex tasks. Over time, scaling and algorithmic advances, reflected in releases such as GPT-4o, have improved their capabilities.
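The difference between pattern prediction and computation can be illustrated with a deliberately tiny sketch. It is not how GPT models are implemented: the training corpus, the bigram counting scheme, and the function names `train`, `predict_next`, and `compute` are all hypothetical and chosen only for illustration. The predictor returns whatever token most often followed "=" in the text it has seen, while the calculator actually evaluates the sum.

```python
from collections import Counter, defaultdict

# Hypothetical toy "training text": a few arithmetic statements.
corpus = [
    "2 + 2 = 4",
    "1 + 3 = 4",
    "2 + 3 = 5",
    "12 + 30 = 42",
]

def train(lines):
    """Count which token follows each token (a bigram model)."""
    counts = defaultdict(Counter)
    for line in lines:
        tokens = line.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prompt):
    """Return the token seen most often after the prompt's last token."""
    last = prompt.split()[-1]
    if last not in counts:
        return None
    return counts[last].most_common(1)[0][0]

def compute(prompt):
    """Evaluate a prompt of the form 'a + b =' with integer arithmetic."""
    a, _, b, _ = prompt.split()
    return str(int(a) + int(b))

counts = train(corpus)
prompt = "17 + 25 ="
print("pattern prediction:", predict_next(counts, prompt))  # most frequent token after "="
print("computation:       ", compute(prompt))               # the actual sum
```

On the unseen prompt `17 + 25 =`, the predictor answers `4`, the most common continuation in its training text, while the calculator returns `42`. Real LLMs are vastly more capable than this toy, but the sketch shows why a system built on predicting text can produce fluent yet incorrect mathematical answers.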

4. Capabilities and System 2 Thinking

OpenAI's o1 marked a significant advance by integrating “System 2” thinking, which lets the model reason through a complex problem before responding. This improves performance on more difficult tasks such as advanced mathematics and coding.

5. A Step Change in AI Capabilities and Key Findings

The introduction of OpenAI o1 represents a pivotal moment in AI development, especially for mathematical and technical tasks. By incorporating advanced reasoning capabilities, o1 overcomes limitations of prior models and can carry out complex, multi-step reasoning. This shift requires a further re-evaluation of assessment methods to maintain academic integrity while leveraging AI's benefits.

The study revealed significant findings regarding students' understanding of AI capabilities and ethical concerns. While many students use AI tools, there is widespread apprehension about academic integrity. The findings highlight the need for clear guidelines, education on ethical AI use, and open dialogue within academic institutions.

6. Performance, Urgency, and Action

AI integration requires universities to re-evaluate assessment methods. Traditional exams may not be enough to prevent misuse. Institutions need to shift from avoidance to thoughtful incorporation of AI in learning environments, promoting AI literacy and ethical awareness.

7. The Problem and Regulations

Institutions must recognise AI’s capabilities and develop clear policies. It's essential to implement guidelines around AI use, define what is permissible, and educate students and faculty on ethical AI practices. A collaborative approach is necessary to address the challenges effectively.

8. Recommendations and Conclusion

The report provides recommendations on the use of AI in education, built on three core principles: verification, transparency, and ownership. Clear regulations and proactive measures can help balance AI's benefits with the need to maintain academic integrity.

References

  1. OpenAI. (2022). Introducing ChatGPT. Retrieved from https://openai.com/index/chatgpt/
  2. Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., & Liu, T. (2023). A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. arXiv. https://doi.org/10.48550/arXiv.2311.05232
  3. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., & Polosukhin, I. (2017). Attention Is All You Need. arXiv. https://arxiv.org/abs/1706.03762
  4. OpenAI. (2023). GPT-4: OpenAI’s Most Advanced System. Retrieved from https://openai.com/index/gpt-4/
  5. OpenAI. (2024). New Models and Developer Products Announced at DevDay. Retrieved from https://openai.com/index/new-models-and-developer-products-announced-at-devday/
  6. OpenAI. (2024). Hello GPT-4o: Our New Flagship Model. Retrieved from https://openai.com/index/hello-gpt-4o/
  7. Meta AI. (2024). Introducing Llama 3.1: Our Most Capable Models to Date. Retrieved from https://ai.meta.com/blog/meta-llama-3-1/
  8. OpenAI. (2024). Code Interpreter Beta. Retrieved from https://platform.openai.com/docs/assistants/tools/code-interpreter
  9. Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., & Wang, H. (2023). Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv. https://arxiv.org/abs/2312.10997
  10. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv. https://arxiv.org/abs/2201.11903
  11. OpenAI. (2024). Introducing OpenAI o1-preview: A New Series of Reasoning Models for Solving Hard Problems. Retrieved from https://openai.com/index/introducing-openai-o1-preview/
  12. Tao, T. (2024). Experiments with GPT-o1. Retrieved from https://mathstodon.xyz/@tao/113132502735585408
  13. Altman, S. (2024). The Intelligence Age. Retrieved from https://ia.samaltman.com/