A curated list of reinforcement learning resources, including books, courses, topics, repos, websites, communities, research groups, and modern RL directions such as offline RL, world models, RLHF, and agent RL.
Note:
- I keep the original structure of this repository as much as possible.
- Legacy entries are preserved whenever possible, even if some are old.
- Newer items are added into the existing structure instead of replacing it.
English
- Reinforcement Learning: An Introduction [Book] [Code] [Preferred] [old version] [newest version]
- Algorithms for Reinforcement Learning [Official]
- OpenAI Spinning Up
- Reinforcement Learning for Sequential Decision and Optimal Control
- Dynamic programming and optimal control
- Deep-Reinforcement-Learning-Hands-On [PDF, 2nd edition]
- Reinforcement Learning and Optimal Control
- Reinforcement Learning: Theory and Algorithms
- Markov Decision Processes: Discrete Stochastic Dynamic Programming, by Martin Puterman.
- Neuro-Dynamic Programming, by Dimitri Bertsekas and John Tsitsiklis.
- Multi-Agent Reinforcement Learning: Foundations and Modern Approaches
- Safe Reinforcement Learning
- Probabilistic Machine Learning: Advanced Topics (useful for model-based RL and sequential decision making)
Chinese
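- 动手学强化学习 (Hands-on Reinforcement Learning), 张伟楠 (Weinan Zhang)
- 深度强化学习落地指南 (A Practical Guide to Deploying Deep Reinforcement Learning), 魏宁 (Ning Wei)
- 深度强化学习 (Deep Reinforcement Learning), 王树森 (Shusen Wang) [PDF]
- EasyRL 强化学习教程 (EasyRL: Reinforcement Learning Tutorial)
- 强化学习实战: 强化学习在阿里的技术演进和业务创新 (Reinforcement Learning in Practice: Technical Evolution and Business Innovation of RL at Alibaba), 笪庆 (Qing Da) [PDF]
- 深度强化学习 (Deep Reinforcement Learning), 董豪 (Hao Dong) [PDF]
- 深入浅出强化学习 (Reinforcement Learning Made Accessible), 郭宪 (Xian Guo) [PDF]
- 神经网络与深度学习 (Neural Networks and Deep Learning), 邱锡鹏 (Xipeng Qiu) [PDF]
- 机器学习 (Machine Learning), 周志华 (Zhihua Zhou) [PDF]
- 统计强化学习: 现代机器学习方法 (Statistical Reinforcement Learning: Modern Machine Learning Approaches), 杉山将 (Masashi Sugiyama) [PDF]
- 深度强化学习核心算法与应用 (Deep Reinforcement Learning: Core Algorithms and Applications), 陈世勇 (Shiyong Chen) [PDF]
- 深度强化学习边做边学 (Learning Deep Reinforcement Learning by Doing), 小川雄太郎 (Yutaro Ogawa) [PDF]
- 强化学习 (Reinforcement Learning), 邹伟 (Wei Zou) [PDF]
- 强化学习精要: 核心算法与TensorFlow实现 (Essentials of Reinforcement Learning: Core Algorithms and TensorFlow Implementations), 冯超 (Chao Feng) [PDF]
- 强化学习入门: 从原理到实践 (Introduction to Reinforcement Learning: From Principles to Practice), 叶强 (Qiang Ye) [PDF]
- Chinese textbooks on reinforcement learning and decision/control (to be added)
Note: only the first author of each book is listed.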
- UCL. Reinforcement Learning. David Silver. Difficulty: [★]
- UCL. Advanced Topics. David Silver.
- Tencent. Reinforcement Learning. MoFan. Difficulty: [★]
- National Taiwan University. DRL. Hung-yi Lee. [Preferred]. Difficulty: [★]
- Deep Reinforcement Learning. Shusen Wang. [Bilibili]
- UCLA. Intro to Reinforcement Learning. Bolei Zhou. Difficulty: [★]
- UC Berkeley. CS285 (formerly CS294). Sergey Levine.
- Stanford CS234 RL Emma Brunskill [Bilibili] [Official]
- MIT RL Dimitri Bertsekas
- RL and control THU
- CMU Deep Reinforcement Learning Katerina Fragkiadaki [Link]
- Udacity
- Lex Fridman
- ETH Zurich. Dynamic Programming and Optimal Control. Raffaello D'Andrea.
- Pieter Abbeel
- 高级机器学习 (Advanced Machine Learning). 唐杰 (Jie Tang).
- 李升波 (Shengbo Li)
- UIUC, CS 542, CS 443, Nan Jiang.
- R. Srikant. UIUC ECE 586.
- Ron Parr. Duke CompSci 590.2.
- Ben Van Roy. Stanford MS&E 338.
- Ambuj Tewari and Susan Murphy. U Michigan STATS 710.
- Susan Murphy. Harvard Stat 234.
- Alekh Agarwal and Alex Slivkins. Columbia COMS E6998.001.
- Daniel Russo. Columbia B9140-001.
- Shipra Agrawal. Columbia IEOR 8100.
- Emma Brunskill. CMU 15-889e.
- Philip Thomas. U Mass CMPSCI 687.
- Michael Littman. Brown CSCI2951-F.
- NJU. IntroRL. Yang Yu.
- CMU 16-745
- ASU CSE 691
- UCLA. Reinforcement Learning of Large Language Models. Spring 2025. Ernest K. Ryu.
- Berkeley CS285 Deep RL
- OpenAI Spinning Up Education
Approximate Dynamic Programming (ADP) concerns obtaining approximate solutions to large planning problems, often with the help of sampling and function approximation. Many ADP methods can be considered as prototype algorithms for popular value-based RL algorithms used today, especially in the offline setting, so it is important to understand their behaviors and guarantees.
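As a rough illustration of how sampling and function approximation combine in ADP, the sketch below runs fitted Q-iteration on a tiny chain MDP. The chain environment, the one-hot features, and every function name here are assumptions made for this example; they are not taken from any resource listed in this repository.

```python
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9

def step(s, a):
    """Hypothetical chain MDP: action 1 moves right, action 0 moves left.
    Arriving at (or staying in) the last state yields reward 1."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == N_STATES - 1)

def features(s, a):
    """One-hot (state, action) features, a stand-in for a richer approximator."""
    phi = np.zeros(N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0
    return phi

def fitted_q_iteration(batch, n_iters=100):
    """Repeatedly regress Q(s, a) onto r + gamma * max_b Q(s', b) over a fixed batch."""
    X = np.array([features(s, a) for s, a, _, _ in batch])
    rewards = np.array([r for _, _, r, _ in batch])
    next_feats = np.array(
        [[features(s2, b) for b in range(N_ACTIONS)] for _, _, _, s2 in batch]
    )
    w = np.zeros(N_STATES * N_ACTIONS)
    for _ in range(n_iters):
        # Bootstrapped regression targets computed from the current weights.
        targets = rewards + GAMMA * (next_feats @ w).max(axis=1)
        # Least-squares fit of the linear Q-function to those targets.
        w, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return w.reshape(N_STATES, N_ACTIONS)

# Sample transitions uniformly at random; no planner sees the true model.
rng = np.random.default_rng(0)
batch = []
for _ in range(500):
    s, a = int(rng.integers(N_STATES)), int(rng.integers(N_ACTIONS))
    s_next, r = step(s, a)
    batch.append((s, a, r, s_next))

q = fitted_q_iteration(batch)
greedy = q.argmax(axis=1)  # greedy action per state
```

Because the batch is fixed and never refreshed by new interaction, the same loop doubles as a minimal offline value-based method, which is the connection the paragraph above points to. With one-hot features the regression is exact, so this toy case hides the approximation error that the theory in this area is mostly about.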
- Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
- Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
- Hybrid RL: Using both offline and online data can make RL efficient
-
- Batch-constrained learning / support constraint
- Conservative value learning
- Sequence modeling for decision making
- Off-policy evaluation and selection under dataset shift
-
- Cooperative MARL
- Mixed cooperative-competitive learning
- Value decomposition and factorization
- Emergent communication and social learning
- Population-based training and league systems
How to estimate the performance of a policy using data collected from a different policy? This question has important implications in safety and real-world applications of RL.
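One common starting point for this question is importance sampling, which reweights each logged return by the ratio of target to behavior action probabilities. The sketch below is a minimal version under strong assumptions: tabular policies stored as arrays, known behavior probabilities, and a hypothetical one-state, two-action problem. The function and variable names are illustrative only.

```python
import numpy as np

def importance_sampling_value(trajectories, target_policy, behavior_policy, gamma=1.0):
    """Ordinary (per-trajectory) importance sampling estimate of the target policy's value.

    trajectories: list of trajectories, each a list of (state, action, reward).
    target_policy, behavior_policy: arrays of shape (n_states, n_actions)
    holding action probabilities.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            # Cumulative likelihood ratio of the trajectory under the two policies.
            weight *= target_policy[s, a] / behavior_policy[s, a]
            ret += gamma ** t * r
        estimates.append(weight * ret)
    return float(np.mean(estimates))

# Hypothetical logged data: the behavior policy picks each action half the time;
# action 1 pays reward 1 and action 0 pays reward 0.
behavior = np.array([[0.5, 0.5]])
target = np.array([[0.2, 0.8]])
logged = [[(0, 0, 0.0)], [(0, 1, 1.0)], [(0, 0, 0.0)], [(0, 1, 1.0)]]

naive = float(np.mean([sum(r for _, _, r in traj) for traj in logged]))
is_estimate = importance_sampling_value(logged, target, behavior)
```

Here the naive average of logged returns is 0.5, which is the behavior policy's value, while the reweighted estimate recovers 0.8, the value of the target policy in this toy problem. On longer horizons the product of ratios can have very high variance, which is one reason weighted, doubly robust, and model-based estimators are studied.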
- Latent dynamics modeling
- Planning with learned models
- PETS / MBPO / Dreamer / MuZero / TD-MPC line
- Simulators, control, and long-horizon rollout stability
-
- Policy gradient / actor-critic
- PPO / TRPO / natural policy gradient
- Entropy regularization and trust region methods
- Credit assignment and variance reduction
-
- Return distribution modeling
- Risk-sensitive RL
- Quantile-based value estimation
-
- Constraint satisfaction under uncertainty
- Shielding, recovery, and risk control
- Safe exploration
-
- Options framework
- Skill discovery
- Temporal abstraction
-
- Intrinsic motivation
- Curiosity-driven exploration
- Count-based and uncertainty-aware methods
-
- Locomotion and manipulation
- Sim2real
- MPC + RL
- Industrial control and autonomous driving
-
- Long-term user value
- Slate recommendation
- Dynamic pricing, scheduling, routing, resource allocation
-
- RLHF / RLAIF / constitutional preferences
- DPO / IPO / ORPO / preference optimization family
- Verifier-based RL
- Code, math, reasoning, and tool-use RL
- Web agents, environment interaction, and long-horizon agent training
-
- Sample complexity
- Regret minimization
- Bellman rank, function approximation, and generalization
- Partial observability and identifiability
- rlcode
- Deep-Reinforcement-Learning-Algorithms-with-PyTorch
- Deep-reinforcement-learning-with-pytorch
- reinforcement-learning [most stars]
- Stable-Baselines3
- Google Dopamine
- Intel Coach
- CleanRL
- RLlib
- Acme
- Tianshou
- DI-engine
- PettingZoo
- Gymnasium
- D4RL
- PyMARL
- MARLlib
- Sample Factory
- TRL
- OpenRLHF
- verl
- MuJoCo
- robosuite
- ManiSkill
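- Papers with Code
- 从零开始推导贝尔曼方程 (Deriving the Bellman Equation from Scratch)
- 强化学习知识大讲堂 (Reinforcement Learning Knowledge Forum), 郭宪 (Xian Guo)
- 智能单元 (Intelligent Unit)
- 神经网络与强化学习 (Neural Networks and Reinforcement Learning)
- 强化学习基础David Silver笔记 (RL Fundamentals: David Silver Course Notes), 陈雄辉 (Xionghui Chen)
- 博客园 (Cnblogs), 刘建平 (Jianping Liu)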
- Farama Foundation
- OpenDILab
- Lilian Weng Blog
- Spinning Up
- AI Alignment Forum
- BAIR Blog
- Tencent AIArena
- AWS DeepRacer
- MineRL Competition
- NetHack Challenge
- Kaggle / KDD / RecSys related sequential decision competitions
- Quant
- Gaming
- Optimization
- Recommendation
- LLMs
- Distribution
- Robotics
- Autonomous Driving
- Recommender System / Ads
- Operations Research
- Alignment / Agent / Tool Use
Conferences: NeurIPS (formerly NIPS), ICML, ICLR, AAAI, IJCAI, AAMAS, IROS, CoRL, RSS, etc.
Journals: JMLR, JAIR, JAAMAS, TMLR, etc.
- Asia
  - CASIA
    - Haifeng Zhang [Homepage] [Group]
    - Zhiqiang Pu [Homepage]
    - Dongbin Zhao [Homepage]
    - Junliang Xing
  - NJU
    - LAMDA Group [Group]
    - Yang Yu [Homepage]
    - Yinghuan Shi [Homepage]
    - Yang Gao [Homepage]
    - Zongzhang Zhang [Homepage]
  - SJTU
    - APEX Data and Knowledge Management Lab [Group]
    - Yong Yu [Homepage]
    - Weinan Zhang [Homepage]
    - Kai Yu
    - Ying Wen [Homepage]
  - PKU
    - PKU Alignment / Multi-Agent RL and Decision-Making
    - Yaodong Yang [Homepage]
    - Zongqing Lu [Homepage]
    - Hao Dong [Homepage]
    - Zhihua Zhang [Homepage]
  - THU
    - Chongjie Zhang
    - Yi Wu [Homepage]
    - Shengbo Li
  - USTC
    - Feng Wu [Homepage]
    - Houqiang Li [Homepage]
  - CUHK-Shenzhen
    - Baoxiang Wang [Homepage]
    - Hongyuan Zha [Homepage]
  - CUHK
    - Baoxiang Wang [Homepage]
  - TJU
    - Jianye Hao
  - SIAT
    - Yunduan Cui [Homepage]
  - HIT-SZ
    - Yanjie Li [Homepage]
  - NTU Singapore
    - Bo An [Homepage]
  - SYSU
    - Chao Yu [Homepage]
- North America
  - McGill University
    - Doina Precup [Homepage]
    - Joelle Pineau [Homepage]
  - University of Alberta / Amii
    - RLAI Lab [Group]
    - Michael Bowling [Homepage]
    - Richard Sutton [Homepage]
    - Martha White [Homepage]
    - Adam White [Homepage]
  - UCLA
    - Bolei Zhou [Homepage]
  - MIT
    - Improbable AI Lab [Group]
    - Robot Locomotion Group [Group]
    - Pulkit Agrawal [Homepage]
    - Leslie Kaelbling [Homepage]
    - Russ Tedrake [Homepage]
    - Nicholas Roy [Homepage]
    - Dimitri Bertsekas [Homepage]
  - CMU
    - Geoffrey Gordon [Homepage]
    - Jeff Schneider [Homepage]
    - Andrew Moore [Homepage]
    - Jessica K. Hodgins [Homepage]
    - Wen Sun [Homepage]
  - UC Berkeley
    - Berkeley AI Research [Group]
    - RAIL Lab [Group]
    - AUTOLAB [Group]
    - Sergey Levine [Homepage]
    - Pieter Abbeel [Homepage]
    - Anca Dragan [Homepage]
    - Ken Goldberg [Homepage]
    - Stuart Russell [Homepage]
  - Stanford University
    - Stanford AI Lab [Group]
    - IRIS Lab [Group]
    - ILIAD Lab [Group]
    - Benjamin Van Roy [Homepage]
    - Emma Brunskill [Homepage]
    - Mykel Kochenderfer [Homepage]
    - Dorsa Sadigh [Homepage]
    - Tengyu Ma [Homepage]
    - Chelsea Finn [Homepage]
    - Andrew Ng [Homepage]
  - UIUC
    - Nan Jiang [Homepage]
  - Duke University
    - Ronald Parr [Homepage]
  - Brown University
    - Michael Littman [Homepage]
  - Columbia University
    - Daniel Russo [Homepage]
    - Shipra Agrawal [Homepage]
    - Alekh Agarwal [Homepage]
    - Alex Slivkins [Homepage]
  - University of Toronto
    - Jimmy Ba [Homepage]
    - Sheila McIlraith [Homepage]
  - UT Austin
    - Learning Agents Research Group [Group]
    - Peter Stone [Homepage]
  - UMass Amherst
    - Autonomous Learning Laboratory [Group]
    - Philip Thomas [Homepage]
    - Scott Niekum [Homepage]
- Europe
  - INRIA
    - Flowers Team [Homepage]
  - ETH Zurich
    - Learning and Adaptive Systems Group [Group]
    - Andreas Krause [Homepage]
  - University of Oxford
    - Foerster Lab [Group]
    - WhiRL Lab [Group]
    - Jakob Foerster [Homepage]
    - Shimon Whiteson [Homepage]
  - University of Cambridge
    - Machine Learning Group [Group]
    - Carl Edward Rasmussen [Homepage]
  - Imperial College London
    - Robot Learning and Control
    - Edward Johns [Homepage]
  - UCL
    - Centre for Artificial Intelligence [Group]
    - Jun Wang [Homepage]
    - David Silver [Homepage]
    - Marc Deisenroth [Homepage]
  - University of Amsterdam
    - Frans Oliehoek [Homepage]
  - TU Delft
    - Delft AI Lab [Group]
- Useful inequalities cheat sheet
- Concentration of measure
- dalmia/David-Silver-Reinforcement-learning: Notes for the Reinforcement Learning course by David Silver along with implementation of various algorithms. (github.com)
- 强化学习路线推荐及资料整理 (Recommended RL Learning Path and Collected Resources) - Zhihu (zhihu.com)
- PacktPublishing/Mastering-Reinforcement-Learning-with-Python: Mastering Reinforcement Learning with Python, published by Packt (github.com)
- Farama Docs
- BAIR Blog
- Policy-based vs. Value-based [Zhihu]
- Philosophy of Reinforcement Learning
- Offline RL vs. Online RL vs. Hybrid RL
- World Models vs. Model-Free RL
- RLHF / Preference Optimization / Agent RL
This repository is actively maintained, and keeping the content current is time-consuming, so your contributions really matter!
If you find it helpful, please vote for it by adding 👍.
If you have any questions about this list, do not hesitate to contact me at 1546631808@qq.com.
Preferred ways to contribute:
- preserve the existing structure and add missing resources;
- fix broken links;
- add newer official references for old entries;
- expand topic pages under ./doc/.