PPO Algorithm for Car Racing in OpenAI Gym

This project implements the Proximal Policy Optimization (PPO) algorithm with PyTorch, applied to the CarRacing-v2 Box2D environment from the OpenAI Gym library.

DEMO

To run the code:

In a terminal, cd into the gym_car_racing directory, then run:

python3.10 main_car_racing.py

What is OpenAI Gym?

OpenAI Gym is a toolkit for reinforcement learning (RL) research created by OpenAI. It provides simulation environments that make it more convenient for researchers to design, train, and test RL algorithms.

Gym supports many kinds of classic control problems, such as inverted pendulum tasks and tasks with discrete action spaces (like CartPole and MountainCar). It also includes high-dimensional vision and continuous control tasks (like Atari and MuJoCo).

The main idea of Gym is to emphasize simplicity and scalability. Its API is intuitive and consists of four main steps: initialize the environment, reset, interact (step), and render. This lets users focus on developing and improving algorithms instead of spending time on environment details.
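The four steps above can be sketched as a short loop. This is a minimal sketch, not code from this repository: `CountdownEnv` is a hypothetical stand-in that imitates the Gym (0.26+) interface, where `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`, so the example runs without Gym installed. The helper name `run_episode` is also an assumption made for illustration.

```python
class CountdownEnv:
    """Hypothetical toy environment that mimics the Gym >= 0.26 API."""

    def __init__(self, start=5):
        self.start = start
        self.state = start

    def reset(self, seed=None):
        # Gym-style reset: returns (observation, info).
        self.state = self.start
        return self.state, {}

    def step(self, action):
        # Gym-style step: (observation, reward, terminated, truncated, info).
        self.state -= action
        terminated = self.state <= 0
        reward = 1.0 if terminated else -0.1
        return self.state, reward, terminated, False, {}

    def render(self):
        print(f"state={self.state}")


def run_episode(env, policy, max_steps=100, render=False):
    """Run one episode with the given policy and return the total reward."""
    obs, info = env.reset()                      # step 2: reset
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(obs)
        # step 3: interact with the environment
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if render:
            env.render()                         # step 4: render
        if terminated or truncated:
            break
    return total_reward


env = CountdownEnv(start=5)                      # step 1: initialize
episode_return = run_episode(env, policy=lambda obs: 1)
print(episode_return)
```

With Gym installed, an environment created by `gym.make("CarRacing-v2")` could be passed to `run_episode` in place of the toy environment, provided the policy returns an action that is valid for that environment.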

Why do this project?

Because OpenAI Gym provides a rich set of environments, it is not only used in research. It is also a great playground for people who have just started studying RL.

In this project I use Gym's CarRacing-v2 environment and implement the PPO algorithm in Python with PyTorch, without relying on any reinforcement learning library.

I hope this project gives some inspiration to people who are new to reinforcement learning and serves as a simple reference.

What is the PPO algorithm?

In short, PPO (Proximal Policy Optimization) is a reinforcement learning algorithm based on the Actor-Critic architecture, proposed by OpenAI. PPO is built from two main networks: the Actor decides the action policy, and the Critic evaluates the current policy.
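To make the Actor's update more concrete, the sketch below computes the clipped surrogate objective that PPO is best known for, using plain Python floats. It is an illustration under assumptions, not the code in `ppo_torch.py`: the function name `clipped_surrogate_loss`, the default `clip_eps=0.2`, and the sample batch are chosen here for the example, and a PyTorch implementation would apply the same formula to tensors.

```python
import math


def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO-Clip actor loss over a batch (lower is better).

    For each sample: ratio = pi_new(a|s) / pi_old(a|s), and the objective is
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A). The loss is its negative.
    """
    losses = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped objectives.
        objective = min(ratio * adv, clipped_ratio * adv)
        losses.append(-objective)
    return sum(losses) / len(losses)


# Hypothetical batch of three samples (values invented for illustration).
old_lp = [math.log(0.5), math.log(0.5), math.log(0.5)]
new_lp = [math.log(0.5), math.log(0.75), math.log(0.25)]
adv = [1.0, 1.0, -1.0]
loss = clipped_surrogate_loss(new_lp, old_lp, adv)
print(loss)
```

In the second and third samples the ratio leaves the range [0.8, 1.2], so the clipped term is used. This is what keeps each update close to the policy that collected the data. The advantages would normally come from the Critic's value estimates.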

Why use PPO in Car Racing?

In OpenAI Gym's Car Racing environment, the action space is continuous, which means traditional discrete-control algorithms (like DQN) are not directly suitable.

PPO can stably learn a continuous control policy through the Actor-Critic structure. Because PPO also performs well with high-dimensional observations and stochastic environments, it is often the algorithm of choice for tasks such as CarRacing.
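Because a continuous action is a vector of real numbers rather than an index, a continuous-control Actor typically outputs the parameters of a distribution and samples from it. The sketch below assumes a diagonal Gaussian policy and the CarRacing continuous action layout of steering in [-1, 1], gas in [0, 1], and brake in [0, 1]. The function name `sample_action` and the example means and standard deviations are hypothetical, and the Actor in this repository may parameterize its distribution differently.

```python
import random

# Assumed CarRacing continuous action bounds: (steering, gas, brake).
ACTION_LOW = (-1.0, 0.0, 0.0)
ACTION_HIGH = (1.0, 1.0, 1.0)


def sample_action(means, stds, rng):
    """Sample each action dimension from a Gaussian, then clip it to its bounds."""
    action = []
    for mean, std, low, high in zip(means, stds, ACTION_LOW, ACTION_HIGH):
        raw = rng.gauss(mean, std)
        action.append(max(low, min(raw, high)))
    return action


rng = random.Random(0)  # seeded so repeated runs give the same sample
action = sample_action(means=(0.1, 0.6, 0.0), stds=(0.3, 0.2, 0.1), rng=rng)
print(action)
```

A value-based method such as DQN would instead need one output per discrete action, which is why it does not map directly onto this kind of action vector.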
