
Shreya Sinha Roy

I am a second-year PhD student working under the supervision of Dr. Ritabrata Dutta, Dr. Richard Everitt, and Prof. Christian Robert. My primary research interests are computational statistics and reinforcement learning. Currently, I am working on Bayesian reinforcement learning, exploring several sampling techniques for model-based reinforcement learning.

Bayesian Deep Generative Reinforcement Learning:


Bayesian deep RL: the diagram above shows an episodic posterior update followed by a policy-update routine for episode j = 1, 2, .... The episode length is assumed to be τ, and π denotes the prior on the model parameters. A prequential scoring rule s, based on the true interaction data x and a simulation x̃ from the deep generative model m, is used to compute the generalized posterior. Samples θ are drawn from the generalized posterior via Sequential Monte Carlo (SMC) samplers, and these samples are used to simulate n trajectories of interactions from the model m. The optimal policy is trained by maximizing the average Q-value computed over the n simulated trajectories. The new policy μ then interacts with the true environment in the next episode.
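The episodic loop described above can be sketched in code. The following is a minimal toy illustration, not the paper's implementation: it uses a one-dimensional linear environment, a squared-error surrogate for the prequential scoring rule, a single reweight-and-resample step in place of a full SMC sampler, and a small discrete policy class (all of these are my assumptions for the sketch). The structure of the loop, however, follows the caption: interact with the true environment, update a generalized posterior over model parameters θ, simulate n trajectories from the model, and pick the policy with the best average simulated return.

```python
import numpy as np

rng = np.random.default_rng(0)

TRUE_THETA = 0.8  # unknown environment parameter (toy example)

def env_step(s, a):
    # true environment dynamics: linear with Gaussian noise
    return TRUE_THETA * s + a + 0.1 * rng.normal()

def model_step(theta, s, a):
    # deep generative model m stands in as the same parametric family here
    return theta * s + a + 0.1 * rng.normal()

def rollout(step, policy, s0, tau):
    # simulate one trajectory of length tau under a given policy
    s, traj = s0, []
    for _ in range(tau):
        a = policy(s)
        s_next = step(s, a)
        traj.append((s, a, s_next))
        s = s_next
    return traj

def score(theta, traj):
    # prequential scoring rule s: compare a model simulation against each
    # observed transition (squared error as a simple surrogate)
    return sum((model_step(theta, s, a) - s_next) ** 2
               for s, a, s_next in traj)

# small discrete policy class: drive the state towards zero with gain k
policies = {k: (lambda s, k=k: -k * s) for k in (0.0, 0.4, 0.8)}

tau, n_particles, n_sim = 20, 200, 10
particles = rng.normal(0.0, 1.0, n_particles)  # samples from the prior pi
policy = policies[0.0]

for episode in range(5):
    # 1. interact with the true environment using the current policy mu
    traj = rollout(env_step, policy, s0=1.0, tau=tau)

    # 2. generalized-posterior update: reweight particles by exp(-s),
    #    then resample (a full SMC sampler would also rejuvenate)
    logw = -np.array([score(th, traj) for th in particles])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    particles = particles[rng.choice(n_particles, n_particles, p=w)]

    # 3. policy update: maximize the average return over n simulated
    #    trajectories drawn from the model m at posterior samples theta
    def avg_return(pol):
        thetas = rng.choice(particles, n_sim)
        return np.mean([
            -sum(s ** 2 for s, a, _ in rollout(
                lambda s, a, th=th: model_step(th, s, a), pol, 1.0, tau))
            for th in thetas])

    policy = max(policies.values(), key=avg_return)

print("posterior mean of theta:", particles.mean())
```

After a few episodes, the resampled particles concentrate around the true dynamics parameter, and the selected policy is the one with the best average simulated Q-value, mirroring the update routine in the diagram.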

Publication:

Sinha Roy, S., Everitt, R., Robert, C., and Dutta, R., "Bayesian Deep Generative Reinforcement Learning," to be submitted to NeurIPS, 2024.


Contact

Shreya.Sinha-Roy@warwick.ac.uk