Devising Strategies for Multiple Iterations of Newcomb's Problem

Newcomb's Problem is a thought experiment that has caused great controversy in the field of decision theory. It involves two boxes and two agents. Box A always contains a small amount of money (utility), while box B contains either a large amount of money or nothing. In each round of the game, one agent, the selector, chooses either to take the contents of both boxes or to take only box B. Before each choice, however, the other agent, the predictor, attempts to predict what the selector will choose. If the predictor predicts that the selector will take both boxes, nothing is placed in box B; if it predicts that the selector will take only box B, the large amount is placed in box B.
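The payoff rule for a single round can be sketched in a few lines of Python. This is only an illustration: the amounts SMALL and LARGE, the labels "one" and "both", and the function name payoff are assumptions made for the example and are not fixed by the description above.

```python
# Illustrative amounts only; the problem statement just says "small" and "large".
SMALL = 1_000        # assumed contents of box A
LARGE = 1_000_000    # assumed contents of box B when it is filled


def payoff(choice: str, prediction: str) -> int:
    """Return the selector's winnings for one round.

    Both arguments are either "one" (box B only) or "both" (boxes A and B).
    """
    # Box B is filled only if the predictor expected a one-box choice.
    box_b = LARGE if prediction == "one" else 0
    # Box A is collected only when the selector takes both boxes.
    box_a = SMALL if choice == "both" else 0
    return box_a + box_b
```

With these assumed amounts, a correctly predicted one-box choice pays LARGE, while a correctly predicted two-box choice pays only SMALL.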

The selector aims to acquire as much money as possible, while the predictor aims simply to predict the selector's choices as accurately as possible. This makes the game an interesting case for applied game theory: its rules are very easy to understand, but identifying the optimal strategies for the selector and the predictor across a number of games is nontrivial. This project would likely benefit from reinforcement learning, so this WARP could be associated or integrated with the reinforcement learning WARP.
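As a starting point for experiments, the repeated game can be simulated with a very simple predictor. The sketch below is hypothetical throughout: the frequency-based predictor, the fixed selector strategies, the amounts, and all names are assumptions chosen for illustration, not strategies proposed by this project. A learned (for example, reinforcement learning) agent could later replace either side.

```python
from collections import Counter
from typing import Callable, List, Tuple

SMALL = 1_000        # assumed contents of box A
LARGE = 1_000_000    # assumed contents of box B when it is filled


def frequency_predictor(history: List[str]) -> str:
    """Predict the selector's most frequent past choice (ties go to "one")."""
    counts = Counter(history)
    return "both" if counts["both"] > counts["one"] else "one"


def play(selector: Callable[[int], str], rounds: int) -> Tuple[int, float]:
    """Run the repeated game; return (selector's total money, predictor accuracy)."""
    history: List[str] = []
    total = 0
    correct = 0
    for t in range(rounds):
        # The predictor commits before the selector chooses.
        prediction = frequency_predictor(history)
        choice = selector(t)
        box_b = LARGE if prediction == "one" else 0
        total += box_b + (SMALL if choice == "both" else 0)
        correct += int(prediction == choice)
        history.append(choice)
    return total, correct / rounds


def always_one(t: int) -> str:
    """Hypothetical selector that always takes only box B."""
    return "one"


def always_both(t: int) -> str:
    """Hypothetical selector that always takes both boxes."""
    return "both"


if __name__ == "__main__":
    for name, strategy in [("always one", always_one), ("always both", always_both)]:
        money, accuracy = play(strategy, rounds=10)
        print(f"{name}: total={money}, predictor accuracy={accuracy:.2f}")
```

Here the selector sees only the round index, which keeps the example short; a learning selector would more naturally also receive the history of past rounds and payoffs.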