An implementation of the paper $RL^2$: Fast Reinforcement Learning via Slow Reinforcement Learning is available here. The agent is a GRU-based recurrent network, and the policy is optimized with a simple actor-critic algorithm: the actor is the RNN agent and the baseline/critic is a simple MLP. The RNN agent is trained on 20,000 bandit environment instances, each lasting a different number of episodes. We also compare the model against other state-of-the-art algorithms and found that the algorithm suffers as the dimensionality of the problem increases.
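The idea above can be illustrated with a minimal NumPy sketch of the RL² data flow. This is not the repository's code, which uses torch. The following are all assumptions made for illustration: Bernoulli arms, an RL² input made of the one-hot previous action plus the previous reward, a critic MLP that reads the GRU hidden state, and the helper names (`BernoulliBandit`, `rl2_input`, `rollout`, `actor_critic_losses`). The key point it shows is that the GRU hidden state is never reset within a trial, so the agent can adapt to the sampled bandit as it goes. Only the forward pass and the loss values are computed; gradients and the optimizer are omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()


class BernoulliBandit:
    """Hypothetical k-armed bandit: each instance samples fresh arm probabilities."""

    def __init__(self, k, rng):
        self.k = k
        self.rng = rng
        self.probs = rng.uniform(0.0, 1.0, size=k)

    def pull(self, arm):
        return float(self.rng.random() < self.probs[arm])


class GRUCell:
    """Forward-only, bias-free GRU cell (gate equations in the style of torch.nn.GRUCell)."""

    def __init__(self, input_size, hidden_size, rng):
        s = 1.0 / np.sqrt(hidden_size)
        self.hidden_size = hidden_size
        self.W_z = rng.uniform(-s, s, (hidden_size, input_size + hidden_size))
        self.W_r = rng.uniform(-s, s, (hidden_size, input_size + hidden_size))
        self.W_nx = rng.uniform(-s, s, (hidden_size, input_size))
        self.W_nh = rng.uniform(-s, s, (hidden_size, hidden_size))

    def __call__(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.W_z @ xh)  # update gate
        r = sigmoid(self.W_r @ xh)  # reset gate
        n = np.tanh(self.W_nx @ x + r * (self.W_nh @ h))  # candidate state
        return (1.0 - z) * n + z * h


def rl2_input(prev_action, prev_reward, k):
    """Assumed RL^2 input: one-hot previous action plus previous reward (zeros at step 0)."""
    x = np.zeros(k + 1)
    if prev_action is not None:
        x[prev_action] = 1.0
        x[k] = prev_reward
    return x


def rollout(bandit, gru, W_pi, critic, n_pulls, rng):
    """Run one trial; the hidden state h is carried across every pull."""
    h = np.zeros(gru.hidden_size)
    prev_a, prev_r = None, 0.0
    actions, rewards, log_probs, values = [], [], [], []
    for _ in range(n_pulls):
        h = gru(rl2_input(prev_a, prev_r, bandit.k), h)
        probs = softmax(W_pi @ h)  # actor head on the GRU state
        a = int(rng.choice(bandit.k, p=probs))
        r = bandit.pull(a)
        actions.append(a)
        rewards.append(r)
        log_probs.append(np.log(probs[a]))
        values.append(critic(h))  # baseline estimate from the critic MLP
        prev_a, prev_r = a, r
    return np.array(actions), np.array(rewards), np.array(log_probs), np.array(values)


def discounted_returns(rewards, gamma):
    G, running = np.zeros(len(rewards)), 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G


def actor_critic_losses(rewards, log_probs, values, gamma=0.99):
    advantages = discounted_returns(rewards, gamma) - values
    actor_loss = -np.sum(log_probs * advantages)  # policy-gradient term
    critic_loss = np.mean(advantages ** 2)  # baseline regression term
    return actor_loss, critic_loss


rng = np.random.default_rng(0)
k, hidden = 5, 16
gru = GRUCell(k + 1, hidden, rng)
W_pi = rng.normal(0.0, 0.1, (k, hidden))
W1, w2 = rng.normal(0.0, 0.1, (32, hidden)), rng.normal(0.0, 0.1, 32)


def critic(h):
    return float(w2 @ np.tanh(W1 @ h))  # one-hidden-layer MLP (assumed shape)


acts, rews, logps, vals = rollout(BernoulliBandit(k, rng), gru, W_pi, critic, 100, rng)
print(actor_critic_losses(rews, logps, vals))
```

In a real training loop, each trial would sample a new bandit instance and a trial length, and the two losses would be backpropagated with the advantage treated as a constant in the actor term.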
To reproduce the results, the following dependencies are required:
torch
numpy
matplotlib
gym
About
An implementation of meta-RL, submitted as a course project for EE675A (Introduction to Reinforcement Learning)