Primary Supervisor: Professor Elliot Ludvig, Department of Psychology
Secondary supervisor: Emmanouil Konstantinidis
PhD project title: Reinforcement Learning in the Brain and Behaviour
University of Registration: University of Warwick
Humans and other animals are very efficient at learning from rewards in their environment and choosing accordingly. A popular approach for understanding how creatures make such reward-based decisions is Reinforcement Learning (RL), a formalism from computer science that has been used to create artificially intelligent agents that have proven remarkably successful at games, such as chess and Go, and at real-world problems, such as navigation and protein folding.
RL models learn through trial and error. In animals, this error term is thought to be encoded by dopamine neurons, which project widely through the brain. Through this error signal, animals can learn the values of different options and outcomes, which are then encoded in the striatum and several areas of the frontal cortex. These RL models are particularly effective at predicting reward-based behaviours, including the time course of associative learning, the transition from goal-directed to habitual behaviour, and exploration during probabilistic reward learning (e.g., Miller et al., 2019).
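The trial-and-error learning described above can be sketched in a few lines. This is an illustrative delta-rule update only, not any specific published model; the function name `td_value_update` is invented for the sketch. The prediction error (reward minus current estimate) plays the role ascribed to the dopamine signal:

```python
import random

def td_value_update(value, reward, alpha=0.1):
    """One step of a delta-rule value update.

    The prediction error (reward - value) is the error term thought to
    be carried by dopamine neurons; alpha is the learning rate.
    """
    prediction_error = reward - value
    return value + alpha * prediction_error

# Learning the value of an option that pays 1.0 with probability 0.8:
random.seed(0)
v = 0.0
for _ in range(1000):
    reward = 1.0 if random.random() < 0.8 else 0.0
    v = td_value_update(v, reward)
# v hovers around the expected reward of 0.8
```

Over repeated outcomes, the estimate converges towards (and then fluctuates around) the option's expected reward, which is how such models capture the time course of associative learning.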
This project will look to apply some of the newest developments in RL as a way of understanding new aspects of human decision-making. For example, in distributional RL, animals learn about the full distribution of possible outcomes, instead of simply the average value. The dopamine system does indeed seem to encode a variety of prediction errors that span the expected distribution (Dabney et al., 2020). The full implications of such a distributional code for human behaviour, however, have not yet been assessed. Another potential angle is recent work enhancing RL models with episodic memories; such models allow individual instances of past outcomes to greatly influence ongoing behaviour.
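As a purely illustrative caricature of the distributional idea (not the actual model in Dabney et al., 2020; function and variable names here are invented), a population of value estimates that weight positive and negative prediction errors asymmetrically will spread out across the outcome distribution, with "optimistic" units settling on high expectiles and "pessimistic" units on low ones:

```python
import random

def distributional_update(values, reward, alpha_pos, alpha_neg):
    """Asymmetric delta-rule update for a population of value estimates.

    Estimate i scales positive prediction errors by alpha_pos[i] and
    negative ones by alpha_neg[i]; the ratio of the two rates
    determines which expectile of the outcome distribution it learns.
    """
    for i, v in enumerate(values):
        delta = reward - v
        rate = alpha_pos[i] if delta > 0 else alpha_neg[i]
        values[i] = v + rate * delta
    return values

random.seed(1)
values = [0.0, 0.0, 0.0]
alpha_pos = [0.02, 0.05, 0.08]  # optimism increases across units
alpha_neg = [0.08, 0.05, 0.02]
for _ in range(5000):
    reward = 1.0 if random.random() < 0.5 else 0.0  # all-or-nothing outcome
    values = distributional_update(values, reward, alpha_pos, alpha_neg)
# pessimistic unit < balanced unit < optimistic unit
```

With a 50/50 all-or-nothing reward, the three units end up ordered rather than all converging on the mean, so the population as a whole carries information about the outcome distribution, not just its average.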
On the behavioural side, most work examining human decision-making focuses on situations where people are explicitly told the possible odds, outcomes, and delays for the rewarding options. When people instead learn about the options from experience, as an RL model would, their behaviour often differs substantially from when they are told explicitly. For example, when making risky choices from experience, people are less sensitive to rare events but more sensitive to extreme outcomes (e.g., Madan et al., 2019; Wulff et al., 2018). This fundamental gap between choosing based on described outcomes and choosing based on personal experience has thus far eluded explanation with RL models.
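One candidate contributor to this gap, sketched here purely for illustration (the function name is invented), is sampling: with the small samples typical of decisions from experience, a rare outcome is often never encountered at all, so it cannot influence choice the way a described probability can:

```python
def miss_probability(p_rare, sample_size):
    """Probability that an outcome occurring with probability p_rare
    never appears in a sample of independent draws."""
    return (1 - p_rare) ** sample_size

# A 10%-probability outcome is absent from a sample of 5 draws
# roughly 59% of the time:
miss_probability(0.1, 5)  # ~0.59
```

This simple arithmetic does not by itself explain the description-experience gap, which is precisely why it remains an open modelling question for RL accounts.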
This project will entail computational modelling, including the simulation of existing RL models, fitting those models to behavioural and neural data (from human or other animals), creating and refining new RL models, and potentially creating and running behavioural experiments with human participants to test those models. The exact set of behaviours in question will be driven by student interest, but recent work in the lab has focused on modelling habits, curiosity, exploration, and memory-based choice.
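For a flavour of the model-fitting work involved, the sketch below computes the log-likelihood of a sequence of two-option choices under a simple Q-learning model with a softmax choice rule. All names are illustrative; in practice the free parameters (learning rate alpha and inverse temperature beta) would be fitted to participants' data, e.g. by maximum likelihood:

```python
import math

def softmax_loglik(choices, rewards, alpha, beta):
    """Log-likelihood of two-option choice data under Q-learning.

    choices: chosen option on each trial (0 or 1)
    rewards: outcome received on each trial
    alpha: learning rate; beta: inverse temperature (choice noise)
    """
    q = [0.0, 0.0]
    loglik = 0.0
    for c, r in zip(choices, rewards):
        # softmax probability of the observed choice
        p_c = 1.0 / (1.0 + math.exp(-beta * (q[c] - q[1 - c])))
        loglik += math.log(p_c)
        # delta-rule update of the chosen option's value
        q[c] += alpha * (r - q[c])
    return loglik

# Toy data in which the participant mostly repeats the rewarded option:
choices = [0, 0, 1, 0, 0]
rewards = [1, 1, 0, 1, 1]
ll = softmax_loglik(choices, rewards, alpha=0.3, beta=2.0)
```

Comparing fitted log-likelihoods across candidate models is the standard way such RL accounts are adjudicated against behavioural data.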
- Dabney, W., Kurth-Nelson, Z., Uchida, N., et al. (2020). A distributional code for value in dopamine-based reinforcement learning. Nature, 577, 671–675.
- Madan, C. R., Ludvig, E. A., & Spetch, M. L. (2019). Comparative inspiration: From puzzles with pigeons to novel discoveries with humans in risky choice. Behavioural Processes, 160, 10–19.
- Miller, K., Shenhav, A., & Ludvig, E. A. (2019). Habits without values. Psychological Review, 126, 292–311.
- Wulff, D. U., Mergenthaler-Canseco, M., & Hertwig, R. (2018). A meta-analytic review of two modes of learning and the description-experience gap. Psychological Bulletin, 144(2), 140.
BBSRC Strategic Research Priority: Understanding the Rules of Life: Neuroscience and behaviour
Techniques that will be undertaken during the project:
- Behavioural Testing with Humans
- Computational Modelling
- Developing and Coding Behavioural Experiments
- Statistical analysis of behavioural data
Contact: Professor Elliot Ludvig, University of Warwick