Learning deeply. Rewriting clearly. Sharing openly.
🎯 Project Goal
Revisit, re-implement, and deeply understand key RL concepts from Stanford CS234/XCS234. This is a personal study + teaching repository designed to:
- ✍️ Reinforce my own understanding of RL foundations
- 📚 Share readable summaries and assignments for other learners
- 🧱 Serve as a reference base for future RL-related projects

| Folder | Description |
|---|---|
| lectures/ | Rewritten lecture notes with diagrams and core formulas |
| assignments/ | Key CS234 homework revisited, clarified, and modularized |
| summaries/ | My personal takeaways, analogies, and simplified concepts |
| references/ | Curated readings, papers, tools, and extra materials |
| utils/ | Shared plotting or training tools (optional) |

CS224R (Deep Reinforcement Learning) and XCS234 (Reinforcement Learning) offer complementary perspectives on RL education. For a detailed analysis, see cs224r-vs-cs234.md.
Key Differences:
| Aspect | XCS234 | CS224R |
|---|---|---|
| Approach | Theory-first with mathematical rigor | Research implementation-focused |
| Coverage | Broad foundations + bandits + human alignment | Cutting-edge robotics + offline RL + meta-learning |
| Environments | Multi-domain (games, healthcare, recommendations) | Primarily advanced robotics (MuJoCo, manipulation) |
| Learning Path | MDPs → Value methods → Policy methods → Modern topics | Imitation → Model-based → Offline → Meta-learning |
How They Complement Each Other:
- XCS234 builds theoretical foundations and covers the classical RL progression
- CS224R applies cutting-edge methods to complex real-world robotics problems
- Together they cover RL from theory to the research frontier
Recommendation: Take XCS234 first for mathematical foundations, then CS224R for advanced implementation skills.
- CS234 Homework Summaries: Clear, concise explanations and solutions for key CS234 assignments, rewritten for better understanding and modularity.
- CS224R Homework Summaries: Summaries and annotated solutions for CS224R assignments, focusing on deep RL and advanced topics.

| Environment | Where you type tmux | Typical Usage & Actions |
|---|---|---|
| MacBook Pro (local) | Not needed | Use VS Code's integrated terminal for quick prototyping. |
| Workstation (Ubuntu 24.04, RTX 5090) | Yes: open a terminal in VS Code Remote-SSH, run `tmux new -s dev` | Keeps long-running jobs alive if VS Code disconnects. |
| AWS EC2 GPU VM (Ubuntu 22.04 or 24.04) | Yes: SSH in, then `tmux new -s train` | Essential for spot instances; training survives network drops. |

For a step-by-step guide to setting up a remote development environment (including VM provisioning, SSH, Git, and VS Code Remote-SSH), see:
This guide covers:
- Creating and configuring a cloud VM (e.g., AWS EC2)
- Setting up SSH keys and secure access
- Installing essential packages (Python, CUDA, etc.)
- Cloning this repository via Git
- Using VS Code Remote-SSH for seamless development
Recommended for anyone running code or training on remote servers!
```mermaid
graph TD
    Agent -->|Action a| Environment
    Environment -->|State s, Reward r| Agent
```
- 🔁 Agent selects actions based on policy π
- 🎯 Environment provides new state and reward
- 🧮 Agent updates value functions / policy using learning algorithm
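
The three steps above can be sketched as a plain Python loop. This is a minimal illustration, not code from the course: `Corridor`, `RandomAgent`, and `run_episode` are hypothetical names invented here, and the environment is a toy five-cell line with a reward of 1 at the right end.

```python
import random


class Corridor:
    """Hypothetical toy environment: walk right along a line of cells to reach a goal."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, action 0 moves left; positions are clipped to the line.
        move = 1 if action == 1 else -1
        self.state = min(self.length - 1, max(0, self.state + move))
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class RandomAgent:
    """Placeholder agent: uniform random policy, no learning."""

    def __init__(self, n_actions=2, seed=0):
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.randrange(self.n_actions)

    def update(self, state, action, reward, next_state, done):
        pass  # a learning algorithm would adjust values or the policy here


def run_episode(env, agent, max_steps=1000):
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                    # agent selects an action from its policy
        next_state, reward, done = env.step(action)  # environment returns new state and reward
        agent.update(state, action, reward, next_state, done)  # learning step
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward


if __name__ == "__main__":
    print(run_episode(Corridor(), RandomAgent()))
```

Swapping `RandomAgent` for an agent whose `update` method changes its value estimates or policy turns the same loop into a training loop.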
For license details and attributions, see LICENSING.md:
- Code (MIT License)
- Content (CC BY-NC 4.0, excluding official Stanford course materials)
- Stanford Course Materials (© Stanford University, under Stanford's license terms)
🔍 Off-policy learning method using value function Q(s, a)
```python
# Tabular Q-Learning update rule
Q[s, a] += alpha * (r + gamma * max(Q[s_next, a_prime] for a_prime in range(n_actions)) - Q[s, a])
```
- Learns optimal action-value function via experience
- Basis of DQN and other deep RL methods
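
To show the update rule in context, here is a minimal sketch of tabular Q-learning with an epsilon-greedy behavior policy. Everything except the update itself is an assumption made for illustration: the `corridor_step` dynamics (a five-cell line with a reward of 1 at the right end), the hyperparameter values, and storing `Q` in a `defaultdict` keyed by `(state, action)`.

```python
import random
from collections import defaultdict


def corridor_step(state, action, length=5):
    """Hypothetical toy dynamics: action 1 moves right, action 0 moves left."""
    next_state = min(length - 1, max(0, state + (1 if action == 1 else -1)))
    done = next_state == length - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, n_actions=2, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[s, a], every entry starts at 0.0
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy behavior policy with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(Q[s, b] for b in range(n_actions))
                a = rng.choice([b for b in range(n_actions) if Q[s, b] == best])
            s_next, r, done = corridor_step(s, a)
            # Bootstrap from the greedy value of the next state (zero if terminal).
            target = r if done else r + gamma * max(Q[s_next, b] for b in range(n_actions))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q


if __name__ == "__main__":
    Q = q_learning()
    for s in range(4):
        print(s, round(Q[s, 0], 3), round(Q[s, 1], 3))
```

The target uses the maximum over next-state actions regardless of which action the epsilon-greedy policy takes next, which is what makes the method off-policy.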
| Category | Links |
|---|---|
| 🎓 CS234 | CS234 Website |
| Purpose | Hardware | OS / Drivers | Python | DL Stack |
|---|---|---|---|---|
| Dev / CPU | MacBook Pro (M3 Pro) | macOS 15.5 | 3.12 | PyTorch 2.7 • CPU |
| Prod / GPU | Ryzen 9 9950X + RTX 5090 | Ubuntu 24.04 · Driver 570 · CUDA 12.8+ | 3.12 | PyTorch 2.7 • CUDA |
| Cloud / Burst | AWS EC2 (Tesla V100) | Ubuntu 22.04 · CUDA 12.x | 3.12 | PyTorch 2.7 • CUDA |

Prototype locally → train on 5090 workstation → scale up on EC2 as needed.
I prototype on macOS, run large-scale GPU workloads on the Ubuntu 24.04 workstation (RTX 5090), and use AWS EC2 for additional compute when required.
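
One way to keep the same script runnable across these machines is to pick the device at startup. The sketch below is an assumption about how that could look, not code from this repository: `pick_device_name` and `detect_device_name` are hypothetical helpers, and only `torch.cuda.is_available()` is a real PyTorch call. It falls back to CPU when no CUDA GPU is visible, matching the CPU-only MacBook row in the table.

```python
def pick_device_name(cuda_available: bool) -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    return "cuda" if cuda_available else "cpu"


def detect_device_name() -> str:
    """Query PyTorch if it is installed; fall back to CPU otherwise."""
    try:
        import torch  # assumed installed on the machines listed above
    except ImportError:
        return "cpu"
    return pick_device_name(torch.cuda.is_available())


if __name__ == "__main__":
    print(detect_device_name())
```

In a training script the result would typically be passed to `torch.device(...)` and used when moving models and tensors with `.to(device)`.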
Thanks to:
- CS234 staff for high-quality lectures and assignments
- Stanford Continuing Studies & XCS234
- Open source contributors (CleanRL, SpinningUp, etc.)
- Prof. Emma Brunskill and the CS234 team for their foundational work in RL education
