Warwick Annual Retreat Projects 2017

The Warwick Annual Retreat Projects are back! The Warwick Annual Retreat Projects, or WARPS as the cool crowd calls them, are study-group-style projects that we will work on during the retreat. The aim of these projects is to foster a collaborative atmosphere within the department, particularly between year groups. This format debuted last year, and we hope that this year will be as enjoyable and informative as the previous one.

If you wish to propose a project, please add a brief description of the problem you want to tackle as a topic in the forum below. We hope to start a discussion on these problems before the retreat starts, so that others can express their interest and comment on the projects. The deadline for project proposals is 7th April 2017.

If you like any of the proposed projects and want to join the group working on it, please specify your preference below, so we can allocate the project groups before the retreat starts. If you hate all of the projects below, then please propose one you like! This time we want to allocate projects before the start of the retreat to allow more time to work on them.


Warwick Annual Retreat Projects 2017 Some Reinforcement Learning Problem

  1. I am quite interested in learning more about Reinforcement Learning. Reinforcement Learning is a field that combines machine learning and game theory (link). I think it would be a good idea to explore the basics of reinforcement learning during the retreat, in the form of a simple (but cool!) project. At the moment, all I have in mind is the multi-armed bandit problem, which Sophie kindly sent me a few resources on. However, I am open to suggestions for other interesting problems, provided they fit the timescale allocated to the WARPS (i.e. approx. 7 hours). I want to start the conversation about possible projects on this forum or in person in the department, so we can have a clear idea of what to do during the retreat.

     

    Looking forward to suggestions!
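To make the multi-armed bandit idea concrete, here is a minimal sketch of an epsilon-greedy agent playing a Bernoulli bandit. It is only an illustration: the function name `run_epsilon_greedy`, the payout probabilities and the value of `epsilon` are assumptions chosen for the example, not something taken from the resources mentioned above.

```python
import random

def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli multi-armed bandit with an epsilon-greedy agent.

    true_means: hypothetical payout probability of each arm (unknown to the agent).
    Returns (pull counts per arm, estimated mean reward per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms         # how often each arm was pulled
    estimates = [0.0] * n_arms    # running mean reward of each arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the best estimate so far
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        # incremental update of the running mean
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward

if __name__ == "__main__":
    counts, estimates, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 2) for e in estimates])
    print("total reward:", total)
```

The trade-off to play with is `epsilon`: a larger value explores more but wastes more pulls on arms that are already known to be poor.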

     
  2. I found three RL environments we can try out:

    • OpenAI Gym: this seems like the easiest and most straightforward to use.
    • DeepMind Lab: this one focuses on 3D navigation puzzles.
    • Project Malmo: this one focuses on building RL agents for Minecraft.

    We could either try to play around with one of these, or come up with our own problem to solve.

     
  3. Hey Ayman, together with Giovanni we were thinking of trying to teach a very simple game to a computer. The game is Spoof, a drinking game where everyone picks a number of coins from 0 to 3 and, in turn, people try to guess the total number of coins in play. The game should be simple enough, but at the same time the incomplete-information component makes it very interesting!
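As a rough idea of how little code the game itself needs, here is a minimal sketch of one sequential round of Spoof. It assumes the common rule that a total already called by an earlier player cannot be called again, which the description above does not state. The names `play_round` and `random_guesser` are made up for the example.

```python
import random

MAX_COINS = 3  # each player hides between 0 and 3 coins, as described above

def play_round(hands, guess_fns):
    """Play one sequential round; return the winner's index, or None.

    hands: number of coins hidden by each player.
    guess_fns: one function per player, called as
               guess_fn(own_coins, n_players, guesses_so_far).
    Assumed rule: a total that was already called cannot be called again.
    """
    total = sum(hands)
    taken = []
    for i, guess_fn in enumerate(guess_fns):
        guess = guess_fn(hands[i], len(hands), list(taken))
        if guess in taken:
            raise ValueError("this total has already been called")
        taken.append(guess)
    return taken.index(total) if total in taken else None

def random_guesser(rng):
    """Baseline player: a random guess consistent with its own hand."""
    def guess(own_coins, n_players, taken):
        low = own_coins
        high = own_coins + MAX_COINS * (n_players - 1)
        options = [g for g in range(low, high + 1) if g not in taken]
        return rng.choice(options)
    return guess

if __name__ == "__main__":
    rng = random.Random(0)
    hands = [rng.randint(0, MAX_COINS) for _ in range(3)]
    players = [random_guesser(rng) for _ in range(3)]
    print("hands:", hands, "winner:", play_round(hands, players))
```

A learning agent would replace `random_guesser` with a function that also uses the earlier guesses, since those leak information about the other players' hands.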

     
  4. It should be a feasible/interesting task for reinforcement learning!

     
  5. This sounds good! I am guessing the game isn't too difficult to code up? We could try it and see how we go; if it turns out to be an easy problem, we can look at other similar games to try. I think this should be fun, and we can learn a lot since RL and Game Theory share a lot of ideas.

     
  6. I have added a WARP topic about Newcomb's problem, which is potentially a reinforcement learning problem. It's possible that we could integrate the two WARPs, depending on what is decided for this WARP.

     

     
  7. Hey Robert, this was a very interesting read! Thanks for sharing this! 

    On the face of it, it feels that any paradoxical result rests on the definition of the game, and I would tend to side with those claiming that its ill-posedness is the source of the paradox. But even if that's not the case, it seems that it might be a defining exactly what kind of game should be played. I'd say we'd like to keep this as tractable as possible so that we can code something and get some results in the short time available at the retreat. I might be wrong though and would love to chat about this at the retreat.

    Also, I wanted to add: a summary of the game of spoof can be found here

    After discussing with Ayman, we might actually make the guessing simultaneous to start with, to make it even easier.

     
  8. *it might be hard definng exactly...

    Apologies, I like my English like I like my operative system: obscure and unfriendly to the user.

     
  9. *defining

    This is getting out of hand quickly.

     
  10. In this post Gian Lorenzo Spisso wrote:

    Hey Robert, this was a very interesting read! Thanks for sharing this! 

    On the face of it, it feels that any paradoxical result rests on the definition of the game, and I would tend to side with those claiming that its ill-posedness is the source of the paradox. But even if that's not the case, it seems that it might be a defining exactly what kind of game should be played. I'd say we'd like to keep this as tractable as possible so that we can code something and get some results in the short time available at the retreat. I might be wrong though and would love to chat about this at the retreat.

    Also, I wanted to add: a summary of the game of spoof can be found here

    After discussing with Ayman, we might actually make the guessing simultaneous to start with, to make it even easier.

     

    Yes, the problem has often been ill-posed. My intention would be to avoid the philosophical confusion by not having a "perfect" predictor, and simply having the predictor as another agent with their own strategy who makes a secret prediction. Reading up on Spoof, I would say it is a similar type of problem, but with a greater number of variables (it seems like there should be a way to make two-player Spoof equivalent to the Newcomb problem: the number of coins you put in your hand is like the prediction stage).

     
  11. Hi all,

    I found a theoretical paper (Schwartz, 1959) on the solution to 2-player Spoof (generalised to n coins per player, with all other rules the same).

    I have attached it in case you want to have a look.

    Otherwise, here are the key points you may want to take home (subject to any misunderstanding or bias of mine):

    1. it is a fair game.

    2. both players play mixed strategies, and these are optimal in the sense that if you deviate from them, you lose out in expectation.

    3. the proof uses the classical (old-fashioned) game-theoretic approach throughout, which is just listing all possible strategies together with their payoff functions.

    Hopefully, you will at least see something similar in the convergence of your AI's strategies in the two-player game.

    1 attachment

     
  12. It's great you've found this; it was quite interesting to read. It is also nice that we basically got all the features of the game right (although we were reinventing the wheel a bit). The strategy for Player 1 seemed quite obvious, but apparently Player 2 has several mixed strategies he could follow. We would expect any learning program to settle on the optimal strategy for Player 1 and on one of the mixed strategies for Player 2.

    It is interesting that the author mentions that Player 2 is the advantaged one, as he can choose among more than one strategy, which might yield some advantage if P1 makes a mistake (at least that is my understanding of the final comment).

    It would be interesting to see if our simple learner can:

    1) Actually obtain the optimal strategies

    2) Win consistently against a human
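One way such a simple learner could look, for the simultaneous-guess simplification mentioned earlier in the thread, is regret matching in self-play. This is a sketch under stated assumptions and not the code the group actually wrote. In particular, the scoring (+1 if only you guess the total, -1 if only your opponent does, 0 otherwise) is an assumption, and the sequential game analysed in the paper is a different game, so the strategies found here need not match the paper's.

```python
import random

MAX_COINS = 3
# An action is (coins hidden, guessed total). Only guesses consistent with
# the player's own hand are included, giving 4 * 4 = 16 actions.
ACTIONS = [(c, g) for c in range(MAX_COINS + 1)
           for g in range(c, c + MAX_COINS + 1)]

def payoff(mine, theirs):
    """Assumed scoring: +1 if only I am right, -1 if only they are, else 0."""
    total = mine[0] + theirs[0]
    return int(mine[1] == total) - int(theirs[1] == total)

def current_strategy(regrets):
    """Regret matching: play actions in proportion to their positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    norm = sum(positive)
    if norm > 0:
        return [p / norm for p in positive]
    return [1.0 / len(regrets)] * len(regrets)  # no regret yet: uniform

def train(iterations=20000, seed=0):
    """Self-play; returns each player's average strategy over all iterations."""
    rng = random.Random(seed)
    n = len(ACTIONS)
    regrets = [[0.0] * n for _ in range(2)]
    strategy_sums = [[0.0] * n for _ in range(2)]
    for _ in range(iterations):
        strategies = [current_strategy(r) for r in regrets]
        picks = [rng.choices(range(n), weights=s)[0] for s in strategies]
        for p in range(2):
            opponent = ACTIONS[picks[1 - p]]
            realised = payoff(ACTIONS[picks[p]], opponent)
            for i, action in enumerate(ACTIONS):
                # regret: how much better action i would have done this round
                regrets[p][i] += payoff(action, opponent) - realised
                strategy_sums[p][i] += strategies[p][i]
    return [[s / iterations for s in sums] for sums in strategy_sums]

if __name__ == "__main__":
    average = train()
    for action, prob in zip(ACTIONS, average[0]):
        print(action, round(prob, 3))
```

It is the average strategy, not the strategy of the last iteration, that is expected to approach an equilibrium in a two-player zero-sum game, which is why `train` returns averages.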

    Since we have to present these results tomorrow:

    1) I can lay down today an outline of the content of the slides

    2) Can someone else put it into some slide format?

    3) Robert: is the code up and running? Do you think by tomorrow it would be feasible to have a run of the two players playing against each other?

     

Details:

  • The problem should not be part of your PhD!
  • The proposed problem should ideally be simple, so that someone without domain-specific knowledge could contribute.
  • Approximately 7 hours are allocated for these projects so progress should be possible within this time-frame.
  • Groups will be expected to give an informal summary during the forum session on the Wednesday following the retreat (13:00, 10th May).
  • If you propose a project that involves working with data, please make sure you have a copy of the data on a physical drive, to avoid losing time downloading it over slow wifi.