Shreya Sinha Roy
I am a fourth-year PhD student working under the supervision of Dr. Ritabrata Dutta, Dr. Richard Everitt, and Prof. Christian Robert. My primary research interests include computational statistics, Bayesian inference, sampling for high-dimensional parameter spaces, and reinforcement learning.
Bayesian Deep Generative Reinforcement Learning:

Bayesian deep RL: The diagram above shows the episodic posterior update followed by the policy update routine for a single episode. The episode length is assumed to be fixed, and a prior is placed on the model parameters. A prequential scoring rule is used to compute the generalized posterior, which is based on the true interaction data and simulations from the deep generative model. Samples from the generalized posterior are obtained via Sequential Monte Carlo (SMC) samplers and are used to simulate trajectories of interaction from the model. The optimal policy is trained by maximizing the value function averaged over the simulated trajectories. The new policy is then used to interact with the true environment in the next episode.
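The episodic loop in the caption can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a one-dimensional linear environment stands in for the true environment and for the deep generative model, an energy score stands in for the scoring rule, a single reweight, resample, and jitter step stands in for a full SMC sampler with MCMC kernels, and a grid search over a linear feedback gain stands in for training a deep policy. All names (`rollout`, `prequential_energy_score`, `posterior_update`, `policy_update`) are hypothetical.

```python
import numpy as np

TRUE_THETA = 0.8   # hypothetical "true environment" parameter (toy assumption)
EPISODE_LEN = 20   # fixed episode length
NOISE_SD = 0.1     # transition noise, assumed known in this toy

def rollout(theta, gain, rng, s0=1.0):
    """Simulate one episode of s' = theta*s + a + noise under the policy a = -gain*s."""
    states, actions = [s0], []
    for _ in range(EPISODE_LEN):
        a = -gain * states[-1]
        actions.append(a)
        states.append(theta * states[-1] + a + NOISE_SD * rng.standard_normal())
    return np.array(states), np.array(actions)

def prequential_energy_score(theta, states, actions, rng, n_sim=20):
    """Sum over time steps of a Monte Carlo energy-score estimate that compares
    one-step-ahead model simulations with the observed next state."""
    total = 0.0
    for t in range(len(actions)):
        sims = theta * states[t] + actions[t] + NOISE_SD * rng.standard_normal(n_sim)
        fit = 2.0 * np.mean(np.abs(sims - states[t + 1]))
        spread = np.sum(np.abs(sims[:, None] - sims[None, :])) / (n_sim * (n_sim - 1))
        total += fit - spread
    return total

def posterior_update(particles, states, actions, rng, weight=5.0):
    """Reweight particles by exp(-weight * score), resample, then jitter.
    The jitter is a crude stand-in for the MCMC move step of an SMC sampler."""
    scores = np.array([prequential_energy_score(th, states, actions, rng)
                       for th in particles])
    logw = -weight * scores
    w = np.exp(logw - logw.max())   # subtract the max for numerical stability
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx] + 0.02 * rng.standard_normal(len(particles))

def policy_update(particles, rng, n_traj=20):
    """Pick the gain that maximizes the return averaged over trajectories
    simulated from posterior samples (return = minus the sum of squared states)."""
    gains = np.linspace(0.0, 1.5, 16)
    thetas = rng.choice(particles, size=n_traj)
    def avg_return(g):
        return np.mean([-np.sum(rollout(th, g, rng)[0] ** 2) for th in thetas])
    return max(gains, key=avg_return)

rng = np.random.default_rng(0)
particles = rng.uniform(0.0, 1.5, size=200)   # samples from the prior
gain = 0.0                                    # initial policy
for episode in range(3):
    states, actions = rollout(TRUE_THETA, gain, rng)               # true interaction
    particles = posterior_update(particles, states, actions, rng)  # posterior update
    gain = policy_update(particles, rng)                           # policy update
```

After the loop, `particles` approximates the toy generalized posterior over the model parameter and `gain` is the policy that would be used in the next episode.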
Publication:
Sinha Roy, S., Everitt, R., Robert, C., Dutta, R. (2024) Generalized Bayesian deep reinforcement learning, arXiv preprint arXiv:2412.11743

Contact
Shreya.Sinha-Roy@warwick.ac.uk