EE 675A Course Project

An implementation of the paper $RL^2$: Fast Reinforcement Learning via Slow Reinforcement Learning is available here. The model uses GRUs, and the policy is optimized with a simple actor-critic algorithm in which the actor is the RNN agent and the baseline/critic network is a simple MLP. The RNN agent is trained on 20,000 instances of bandit environments, each lasting for a different number of episodes. We also compared the performance of our model against other state-of-the-art algorithms and found that it suffers as the dimensionality of the problem increases.
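To make the trial structure concrete, here is a minimal sketch of one bandit instance and of the per-step input an RL^2-style agent receives. It is not taken from CP.ipynb: the names `BernoulliBandit`, `rl2_input`, `run_trial`, and `random_policy` are hypothetical, the Bernoulli reward model is an assumption, and a random placeholder stands in for the trained GRU actor.

```python
import random


class BernoulliBandit:
    """One bandit instance: each arm pays 1.0 with a fixed, hidden probability."""

    def __init__(self, n_arms, rng):
        self.n_arms = n_arms
        self.rng = rng
        # Assumption: arm means are drawn uniformly from [0, 1).
        self.probs = [rng.random() for _ in range(n_arms)]

    def pull(self, arm):
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0


def rl2_input(prev_action, prev_reward, n_arms):
    """Build the step input: one-hot previous action followed by previous reward."""
    one_hot = [0.0] * n_arms
    if prev_action is not None:
        one_hot[prev_action] = 1.0
    return one_hot + [prev_reward]


def random_policy(x, rng):
    # Placeholder only: a trained GRU would map x and its hidden state to an action.
    return rng.randrange(len(x) - 1)


def run_trial(n_arms, n_episodes, policy, seed=0):
    """Run one trial on a freshly sampled bandit; return the inputs and rewards."""
    rng = random.Random(seed)
    env = BernoulliBandit(n_arms, rng)
    prev_action, prev_reward = None, 0.0
    inputs, rewards = [], []
    for _ in range(n_episodes):
        x = rl2_input(prev_action, prev_reward, n_arms)
        action = policy(x, rng)
        reward = env.pull(action)
        inputs.append(x)
        rewards.append(reward)
        # The previous action and reward are fed back in at the next step.
        prev_action, prev_reward = action, reward
    return inputs, rewards


if __name__ == "__main__":
    inputs, rewards = run_trial(n_arms=5, n_episodes=10, policy=random_policy)
    print("total reward:", sum(rewards))
```

Because the previous action and reward are fed back as input, a recurrent agent can adapt to the sampled bandit within a trial using only its hidden state, which is the core idea of the paper.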

To reproduce the results, the following dependencies are required:

torch
numpy
matplotlib
gym
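A quick way to confirm that these packages are installed before opening the notebook is sketched below. The helper `missing_dependencies` is a hypothetical name, not part of this repository, and the check only tests that each package can be located, not which version is installed.

```python
from importlib.util import find_spec

# Package names as listed above.
REQUIRED = ["torch", "numpy", "matplotlib", "gym"]


def missing_dependencies(names=REQUIRED):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    print("missing:", ", ".join(missing) if missing else "none")
```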

