3. Understanding LLMs and Evolution of AI Models
Understanding How LLMs Work
To further illustrate how Large Language Models like GPT-3.5 answer questions, consider the analogy of System 1 and System 2 thinking from psychology. LLMs operate almost entirely in System 1 mode: they generate text reflexively from the input they receive. They process that input by converting each token (which can be loosely thought of as a sub-word) into a vector, then compute through their neural network weights to predict the next token in the sequence.
After predicting one token, the model repeats this entire process to predict the next token, using the previous output as part of the new input. This cycle of generating, processing, and predicting continues with each successive token, ensuring that each word in a sentence is interconnected and draws from the context established by all previous tokens. The model’s attention mechanism plays a crucial role here, allowing it to dynamically focus on different parts of the input text to maintain coherence and relevance across longer sequences.
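The attention mechanism mentioned above can be sketched for a single query vector. This is an illustrative helper, not any library's API: real models use learned projection matrices, many attention heads, and large batched tensors, but the core computation, comparing a query against every key and blending the values by the resulting weights, looks like this:

```python
import math

def attention(query, keys, values):
    """Minimal single-query scaled dot-product attention (after
    Vaswani et al., 2017). A sketch only: real models add learned
    projections, multiple heads, and masking."""
    d = len(query)
    # Score each position by similarity between query and key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Softmax turns scores into weights: how much focus each position gets.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Blend the value vectors according to those weights.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

Because the weights are recomputed for every new token, the model can shift its focus dynamically as the sequence grows, which is what keeps long outputs coherent.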
This repetitive process, driven by the model's learned patterns from the training data, enables the LLM to generate coherent and contextually appropriate text across a range of topics. However, this same mechanism also explains why LLMs can struggle with tasks requiring precise reasoning, such as complex mathematical calculations. Since the model approaches every problem through probabilistic text prediction rather than deterministic computation, it often leads to errors and inconsistencies, particularly in domains where accuracy is critical.
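The generate-process-predict cycle described above can be sketched as a toy loop. Everything here is illustrative: the six-word vocabulary, the `toy_logits` scoring function, and the sampler are hypothetical stand-ins for what a real model computes with billions of weights, but the loop structure, predict a distribution, sample one token, feed it back in, is the same:

```python
import math
import random

# Hypothetical miniature vocabulary standing in for a real tokeniser.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def toy_logits(context):
    """Stand-in scoring function: slightly favours unseen tokens so the
    example produces varied output. A real LLM computes these scores
    with a neural network."""
    return [1.0 if tok not in context else 0.1 for tok in VOCAB]

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt, n_tokens, seed=0):
    rng = random.Random(seed)
    context = list(prompt)
    for _ in range(n_tokens):
        probs = softmax(toy_logits(context))             # predict a distribution
        next_tok = rng.choices(VOCAB, weights=probs)[0]  # sample one token
        context.append(next_tok)                         # feed it back as input
    return context

print(" ".join(generate(["the"], 5)))
```

Note that nothing in the loop verifies the output: every token is a probabilistic guess conditioned on the tokens before it, which is why errors made early in a response tend to propagate.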
You can think of an LLM's intelligence as operating in a vacuum that the human must guide. Each prompt samples from a probability distribution encoded in the model's weights, and it is up to the user to explore the outputs and judge whether they match their needs and use case. It is understandable that students struggled with these nuances: GPT-3.5 was the first capable, widely available general-purpose chatbot that was not rule-based, so there was no precedent for how to handle it in academic settings, either practically or ethically.
For example, it is not uncommon for LLMs to be unaware of their own capabilities and limitations. Due to reinforcement learning from human feedback, they often exhibit a strong tendency to prioritise pleasing the user over providing accurate responses. Unlike a human expert, who might push back against flawed ideas or suggestions, LLMs may struggle to do so, making it crucial for the user to critically evaluate the model's outputs. This is particularly important because the models themselves do not inherently understand their limitations or the context of their responses beyond what is provided by the user and OpenAI in the system prompt.
Evolution of AI Models
AI models have evolved through two primary paradigms: scaling up the model's size and enhancing algorithmic efficiencies.
1. Scaling Up
The approach of scaling up focuses on increasing the number of parameters within a model, enhancing its capacity to capture and represent complex patterns in the data. The larger the model, the more detailed and nuanced its understanding of the data can be. For instance, GPT-4, released in March 2023, is speculated to have around 1.76 trillion parameters (8 × 220 billion), though not all are used simultaneously: its rumoured Mixture of Experts (MoE) architecture dynamically selects which "experts" are active based on the task at hand. This architecture allows GPT-4 to handle complex reasoning tasks and process large amounts of contextual information more effectively than smaller models.

In contrast, most commercial AI models operate with far fewer parameters, with 70 billion being a common upper limit. Open-weight models such as LLaMA 3.1, whose largest variant has 405 billion parameters, have demonstrated competitive GPT-4-level performance, but running them independently requires substantial hardware. A 70-billion-parameter model could feasibly be run offline by a student with as little as 64 GB of RAM, but only after quantisation, which degrades output quality, and few students have the GPU VRAM needed for acceptable inference speeds; running from system RAM alone is markedly slower.
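A back-of-the-envelope estimate makes the hardware claims above concrete. The helper function below is hypothetical, and the figures are a lower bound covering only the weights themselves (activations and context add overhead on top), but it shows why quantisation is what makes 64 GB of RAM plausible for a 70-billion-parameter model:

```python
def model_memory_gb(n_params_billion, bits_per_weight):
    """Rough lower bound on memory needed just to hold the weights.
    Activations, KV cache, and runtime overhead come on top of this."""
    bytes_per_weight = bits_per_weight / 8
    return n_params_billion * 1e9 * bytes_per_weight / 1024**3

# A 70-billion-parameter model at full 16-bit precision:
print(round(model_memory_gb(70, 16)))  # -> 130  (GB: beyond consumer hardware)
# The same model quantised to 4 bits per weight:
print(round(model_memory_gb(70, 4)))   # -> 33   (GB: fits in 64 GB of RAM)
```

Quantisation trades memory for fidelity: compressing each weight from 16 bits to 4 bits cuts memory by roughly 4x, at the cost of the accuracy loss noted above.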
2. Enhancing Algorithmic Efficiencies
Enhancing algorithmic efficiencies involves optimising the way models process and generate information, allowing them to achieve high performance without necessarily increasing their size. In many cases, these techniques are used to train smaller models that can deliver similar performance to larger ones. Models such as GPT-4o and GPT-4-Turbo have incorporated these efficiencies by leveraging advancements like synthetic data, optimised training processes, and improved computational and fine-tuning methods. These improvements have led to faster processing times and more accurate results, all while keeping the model size manageable.
Due to the scale of these models, GPT-4-level capability was typically offered as a paid service at $20 per month. With the application of these algorithmic efficiencies, however, GPT-4o was made available to free users with usage limits, democratising access to advanced AI capabilities. Similarly, GPT-4o mini, which embodies these algorithmic advancements, delivers impressive performance despite its presumably much smaller parameter count.
It is important to note that GPT-4o was the most advanced model available during our study and was the one used to assess AI performance on assignments.
References
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention Is All You Need. arXiv. https://arxiv.org/abs/1706.03762
- OpenAI. (2023). GPT-4: OpenAI's Most Advanced System. https://openai.com/index/gpt-4/
- OpenAI. (2024). New Models and Developer Products Announced at DevDay. https://openai.com/index/new-models-and-developer-products-announced-at-devday/
- OpenAI. (2024). Hello GPT-4o: Our New Flagship Model. https://openai.com/index/hello-gpt-4o/
- Meta. (2024). Introducing Llama 3.1: Our Most Capable Models to Date. https://ai.meta.com/blog/meta-llama-3-1/