Critically Analysing Outputs

Case Study: Critically Analysing Outputs

Below is an example of an LLM output, taken from the analysis of LLMs' ability to answer mathematics and statistics questions. The question is from first-year statistics.

Question

The LLM was asked the question above and gave the output seen on the right.

A student could be given the question and the LLM output and then asked to critically evaluate the output. Critically evaluating an LLM's output means not just checking whether the final answer is right but also evaluating the logic: the approaches taken and the steps shown.

LLM Output

Below is an example of a critical evaluation of the LLM output above.

For question 1, the first thing that should stand out is that the probability is given as 25/12, which cannot be correct because it is greater than 1. The output also approximates 25/12 as 0.6944, which is less than 1, so the fraction and the decimal contradict each other. The approach to calculating the probability is also incorrect: it appears to evaluate the probability of one die taking any value and the other two each taking any value other than the first, but this does not account for the requirement that one die must show a unique highest value.
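One way a student could check the answer to question 1 is by brute-force enumeration. The sketch below is illustrative only: the question is not reproduced on this page, so it assumes three fair six-sided dice (inferred from the reference to "one dice" and "the other two"), and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def has_unique_highest(roll):
    """True if exactly one die in the roll shows the maximum value."""
    return roll.count(max(roll)) == 1


def prob_unique_highest(num_dice=3, sides=6):
    """Exact probability of a unique highest value, by enumerating every roll.

    Assumes fair dice, so every outcome is equally likely.
    """
    outcomes = list(product(range(1, sides + 1), repeat=num_dice))
    favourable = sum(1 for roll in outcomes if has_unique_highest(roll))
    return Fraction(favourable, len(outcomes))


p = prob_unique_highest()
print(p, float(p))  # exact fraction and its decimal value
```

Under these assumptions the enumeration gives 165 favourable outcomes out of 216, that is 55/72, which differs from the value produced by the approach described above.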

For question 2, the error in the probability of a unique highest number is carried over. The output correctly uses the complement of the event to calculate the probability of no unique highest number occurring, but 1 - 25/12 = -13/12, not the 11/36 it states, and -13/12 is not a feasible value for a probability. It also correctly states that the probability of no unique highest number in n trials is the probability of no unique highest number in a single trial raised to the power n, but it does not identify that this relies on the independence of the events, as the question asked. The steps taken were large and key information is missing.
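The arithmetic in the evaluation of question 2 can be checked with exact fractions. This is a minimal sketch with hypothetical helper names. It also shows that the stated complement of 11/36 and the decimal 0.6944 are both consistent with 25/36 rather than 25/12; that is only an inference from the numbers quoted, since the output itself is not reproduced here.

```python
from fractions import Fraction


def is_valid_probability(value):
    """A probability must lie in the closed interval [0, 1]."""
    return 0 <= value <= 1


def complement(p):
    """Probability that the event does not occur."""
    return 1 - p


def prob_none_in_n_trials(q, n):
    """Probability the event fails on all n trials.

    Multiplying the per-trial probabilities is only justified because
    the trials are assumed to be independent.
    """
    return q ** n


stated = Fraction(25, 12)
print(complement(stated))                        # -13/12
print(is_valid_probability(complement(stated)))  # False

alternative = Fraction(25, 36)
print(complement(alternative))                   # 11/36
print(round(float(alternative), 4))              # 0.6944
```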

For question 3, the output now appears to take the probability of a unique highest number as 25/36, obtained as the complement of the probability of no unique highest number, which it gave as 11/36. The geometric distribution is not covered in the module, so it is not an appropriate method to use. It would have been more appropriate to use the independence of the events to show that the joint probability equals the product of the individual probabilities. Because each trial in which no unique highest number is obtained has the same probability, that part of the product can be rewritten as the probability of no unique highest number raised to the power n-1.
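The independence argument described for question 3 might be written out as follows. This is a sketch under assumptions, not the module's model answer: it assumes the question asks for the probability that the first unique highest number occurs on trial n, and the symbols A_i (a unique highest number occurs on trial i), p = P(A_i) and q = 1 - p are introduced here for illustration.

```latex
\begin{align*}
P(\text{first unique highest number on trial } n)
  &= P\left(A_1^c \cap \dots \cap A_{n-1}^c \cap A_n\right) \\
  &= P(A_1^c) \cdots P(A_{n-1}^c)\, P(A_n)
     && \text{(independence of the trials)} \\
  &= q^{\,n-1}\, p
     && \text{(each trial has the same probabilities)}
\end{align*}
```

Writing the steps this way makes explicit where independence is used, which is what the question asked for.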

For question 4, the probability that the values are not all the same is correct, but this event includes the possibility of a unique highest value occurring, hence the final answer is wrong. The logic is correct but very poorly explained: large steps are missing, and definitions and theorems are not stated where they are applied, as would be expected.

For the answers as a whole, the jump from one step to the next was often too large, and more steps and detail are required. Furthermore, the question asked for it to be made clear where independence was used, and it was not. Definitions and theorems were also not identified where they were used. The notation, logic and layout of the answer are not of the standard expected of the module or of a university student.

Additionally, a student should be able to answer the question correctly and in a manner appropriate to the module at an undergraduate level. This could be set as an additional question.