Richard Fox
Research Interests
My interests lie in modelling the emergent behaviour of humans. This involves using reinforcement learning (RL) techniques to model the individual decision-making process behind actions and the effects of those decisions on emergent behaviour. In particular, I combine elements from social science, game theory, and multi-agent RL to look for new solutions to, and descriptions of, observed phenomena in traffic scenarios involving pedestrians and autonomous vehicles, i.e. the human-AI interface.
Supervised by Prof. Elliot Ludvig
Masters Project
Model Comparison for the Two-Stage Decision Task - Presented here are several reinforcement learning models that aim to capture sequential action-choice behaviour, based on explanatory theory from neuroscience and psychology. Three recent models are examined, each building on the last. However, none of these models performs particularly well, with likelihoods of ∼O(10⁻⁸⁰), nor does any one outperform the others with statistical significance. A multitude of factors can contribute to their performance, which are explored from their computational underpinnings to the form of the likelihood estimation function.
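To make the likelihood comparison concrete, here is a minimal sketch of how such a model's fit is typically scored: the negative log-likelihood of a choice sequence under a softmax Q-learning model. This is an illustrative baseline, not one of the three thesis models (which are more elaborate, e.g. hybrid model-based/model-free); the function name and parameters are my own.

```python
import math

def q_learning_nll(choices, rewards, alpha, beta, n_actions=2):
    """Negative log-likelihood of a choice sequence under a simple
    softmax Q-learning model (illustrative sketch only)."""
    q = [0.0] * n_actions
    nll = 0.0
    for a, r in zip(choices, rewards):
        # softmax choice probability for the action actually taken
        exps = [math.exp(beta * v) for v in q]
        p = exps[a] / sum(exps)
        nll -= math.log(p)
        # delta-rule update for the chosen action
        q[a] += alpha * (r - q[a])
    return nll

# Per-trial probabilities multiply across a session, which is why raw
# likelihoods for hundreds of trials end up as tiny numbers like 10^-80;
# hence fitting is done on the log scale.
nll = q_learning_nll([0, 0, 1, 0], [1, 1, 0, 1], alpha=0.5, beta=3.0)
```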
PhD Projects
Implicit Agent Modelling for Explicit Others - There are several ways to induce the inference about other agents that we are looking for, but there are three main concepts: - Introduce rest/idle-time penalties, as opposed to time penalties for taking long routes; these can be used to incentivise the agent to fill its time productively until all agents have completed their tasks
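The idle-time penalty above can be sketched as a small reward-shaping function. This is a hypothetical illustration of the concept, not the project's actual reward; all names and the penalty value are assumptions.

```python
def shaped_reward(base_reward, agent_done, all_done, idle_penalty=0.1):
    """Illustrative reward shaping: rather than a per-step time cost that
    punishes long routes, penalise only the steps an agent spends idle
    after finishing its own task while other agents are still working.
    This incentivises filling that time productively, e.g. helping others."""
    if agent_done and not all_done:
        return base_reward - idle_penalty
    return base_reward
```

The design point is that a flat time penalty discourages long but useful detours, whereas an idle penalty only bites when the agent has nothing left to do.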
PaAVI - Pedestrian and Autonomous Vehicle Interactions - Much work has been done on interactions between autonomous vehicles and human drivers, and intrinsic models, or at least categorisations, of human behaviour have been shown to improve performance. We propose instead to study the human-autonomous vehicle (AV) interface through interactions with pedestrians in uncontrolled settings. This takes the form of an AV controlled by an algorithm that has used a form of reinforcement learning to develop a method for navigating pedestrian-road intersections. We surveyed 136 participants on the behaviour and safety of a Deep Q-Network model trained in the SMARTS simulator and then given slight variations of a reward filter that reduced reward based on proximity to the pedestrian. Analysis of the results is ongoing.
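A proximity-based reward filter of the kind described might look like the following sketch. The linear falloff, radius, and parameter names are assumptions for illustration, not the exact filter used in the SMARTS experiments.

```python
import math

def proximity_filter(reward, av_pos, ped_pos, radius=5.0, weight=1.0):
    """Hypothetical reward filter: reduce the environment reward as the
    AV gets closer to a pedestrian, with a penalty that grows linearly
    from zero at `radius` to `weight` at zero distance."""
    d = math.dist(av_pos, ped_pos)
    if d < radius:
        reward -= weight * (radius - d) / radius
    return reward
```

Varying `radius` or `weight` gives the "slight variations" of the filter whose behavioural consequences participants could then rate.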
In uncontrolled situations there is an interplay between strategy and safety-constrained optimisation: an algorithm that puts the safety of others first is vulnerable to strategic exploitation. If the AV is guaranteed to give way or halt at a certain risk factor, a pedestrian can exploit this by raising the risk factor as perceived by the AV, consistently gaining right of way while incurring no additional risk themselves, since the AV is guaranteed to let them pass. We propose to model this as a Stackelberg game in which the AV is the leader and "announces" its action, assuming the pedestrian will respond in the way that benefits them most. Within this paradigm there remain a couple of problems to be addressed. By extending the simulator so that participants can take the role of the pedestrian, we hope to gather data on the degree of exploitative behaviour and on discrepancies between reported actions and actions actually taken.
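The Stackelberg structure can be sketched in a few lines: the leader commits to the action whose follower best-response yields the leader the highest payoff. The payoff table below is a toy example of my own construction, chosen to show why a guaranteed-yield policy is exploitable.

```python
def stackelberg_leader_action(payoffs):
    """Minimal Stackelberg solver. `payoffs[a_av][a_ped] = (u_av, u_ped)`
    is an assumed payoff table. The AV (leader) evaluates each announced
    action under the pedestrian's (follower's) best response and commits
    to the one that maximises its own payoff."""
    best_a, best_u = None, float("-inf")
    for a_av, row in payoffs.items():
        # follower best-responds to the announced leader action
        a_ped = max(row, key=lambda a: row[a][1])
        u_av = row[a_ped][0]
        if u_av > best_u:
            best_a, best_u = a_av, u_av
    return best_a

# Toy payoffs: if the AV announces "yield", the pedestrian's best
# response is to cross, and the AV loses time every round; committing
# to a less exploitable announcement can do better for both players.
payoffs = {
    "yield":   {"cross": (-1, 2), "wait": (0, 0)},
    "proceed": {"cross": (-5, -5), "wait": (2, 1)},
}
```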
Future Work
Contact