A comprehensive reinforcement learning project covering single-agent, multi-agent, and advanced RL concepts through a simulated colony environment.
The Autonomous Colony is a multi-agent RL environment where agents learn to:
- Navigate a 2D grid world
- Collect resources (food, water, materials) to survive
- Learn using various RL algorithms (Q-Learning, DQN, PPO, MAPPO)
- Cooperate through communication and coordination
- Explore using curiosity-driven learning
- Adapt through curriculum and meta-learning
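As a taste of the API, here is a minimal random-agent interaction loop. This is a hypothetical sketch: the `ColonyEnv` class name and constructor keywords are assumptions based on `src/environment/colony_env.py`, and the loop assumes the standard Gymnasium `reset`/`step` contract.

```python
# Hypothetical usage sketch -- the ColonyEnv name and kwargs are assumptions,
# not confirmed API; see src/environment/colony_env.py for the real interface.
from src.environment.colony_env import ColonyEnv

env = ColonyEnv(size=16, n_agents=2, max_steps=200)  # assumed constructor kwargs
obs, info = env.reset(seed=0)                        # Gymnasium-style reset

done = False
while not done:
    action = env.action_space.sample()               # random policy, for illustration
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```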
- Tabular Q-Learning - Classic value-based RL (update rule sketched below)
- Deep Q-Network (DQN) - Function approximation with experience replay
- Proximal Policy Optimization (PPO) - State-of-the-art policy gradient
- Multi-Agent PPO (MAPPO) - Centralized training, decentralized execution
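To make the first of these concrete, tabular Q-learning boils down to a single temporal-difference update per transition. A minimal sketch, independent of the repo's `tabular_q.py` (sizes and hyperparameters are illustrative):

```python
import numpy as np

n_states, n_actions = 256, 5          # illustrative sizes for a small grid world
Q = np.zeros((n_states, n_actions))   # one row of action-values per discrete state
alpha, gamma = 0.1, 0.99              # learning rate and discount factor

def q_update(s, a, r, s_next, done):
    """TD(0) update: Q(s,a) += alpha * (target - Q(s,a))."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
```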
- Communication Networks - Learned message passing between agents
- Cooperation Rewards - Proximity, sharing, and joint success bonuses
- Value Decomposition - Individual contributions to team success
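To illustrate one of the cooperation rewards above, a proximity bonus can be computed from agent positions alone. This is a hypothetical sketch, not the repo's `coordination.py`; the radius and bonus constants are made up:

```python
import numpy as np

def proximity_bonus(positions, radius=2.0, bonus=0.05):
    """Shaped reward: each pair of agents within `radius` cells earns `bonus`.

    `positions` is an (n_agents, 2) array of grid coordinates. Constants are
    illustrative, not taken from the repo.
    """
    positions = np.asarray(positions, dtype=float)
    rewards = np.zeros(len(positions))
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if np.linalg.norm(positions[i] - positions[j]) <= radius:
                rewards[i] += bonus
                rewards[j] += bonus
    return rewards
```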
- Curiosity-Driven Exploration - Intrinsic Curiosity Module (ICM; see the sketch after this list)
- Hierarchical RL - Temporal abstraction with meta-controllers
- World Models - Model-based RL with predictive models
- Meta-Learning - MAML-style adaptation to new tasks
- Curriculum Learning - Progressive difficulty adjustment
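As an example of the curiosity mechanism, the heart of an ICM is a learned forward model whose prediction error in feature space becomes the intrinsic reward. A condensed PyTorch sketch (layer sizes are illustrative and the inverse-model head is omitted for brevity; see `src/advanced/curiosity.py` for the full version):

```python
import torch
import torch.nn as nn

class ICMSketch(nn.Module):
    """Minimal curiosity module: intrinsic reward = forward-model prediction error."""

    def __init__(self, obs_dim, n_actions, feat_dim=64):
        super().__init__()
        self.n_actions = n_actions
        self.encoder = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
        # Forward model: predict next-state features from current features + action.
        self.forward_model = nn.Sequential(
            nn.Linear(feat_dim + n_actions, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def intrinsic_reward(self, obs, action, next_obs):
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        a_onehot = nn.functional.one_hot(action, self.n_actions).float()
        phi_pred = self.forward_model(torch.cat([phi, a_onehot], dim=-1))
        # Hard-to-predict transitions yield large errors, i.e. large exploration bonuses.
        return 0.5 * (phi_pred - phi_next).pow(2).sum(dim=-1)
```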
```bash
git clone https://github.com/ritikkumarv/autonomous-colony.git
cd autonomous-colony
pip install -r requirements.txt
```
```bash
# Single-agent PPO
python train.py --agent ppo --episodes 1000

# Multi-agent PPO with communication
python train.py --agent ma_ppo --n_agents 4 --episodes 2000 --communication

# PPO with curiosity-driven exploration and curriculum learning
python train.py --agent ppo --episodes 2000 --curiosity --curriculum

# Everything combined
python train.py --agent ma_ppo --n_agents 4 --episodes 3000 \
    --communication --curiosity --curriculum --world_model
```
```bash
# Watch a trained agent for 5 episodes
python visualize.py --model models/ppo_latest/model.pt --episodes 5

# Plot training curves
python visualize.py --model models/ppo_latest/model.pt --plot_training
```
```bash
# Evaluate a trained model over 100 episodes
python evaluate.py --model models/ppo_latest/model.pt --episodes 100
```
```
autonomous-colony/
│
├── train.py              # Main training script
├── visualize.py          # Visualization tool
├── evaluate.py           # Evaluation script
├── download_models.py    # Download pre-trained models
│
├── notebooks/                      # Learning notebooks
│   ├── part1_environment.ipynb    # Environment setup
│   ├── part2_agents.ipynb         # Single-agent RL
│   ├── part3_multiagent.ipynb     # Multi-agent RL
│   └── part4_advanced.ipynb       # Advanced features
│
├── src/
│   ├── environment/              # Grid world environment
│   │   ├── colony_env.py         # Main environment class
│   │   ├── resources.py          # Resource spawning
│   │   └── rendering.py          # Visualization
│   │
│   ├── agents/                   # RL agents
│   │   ├── tabular_q.py          # Q-Learning
│   │   ├── dqn.py                # Deep Q-Network
│   │   ├── ppo.py                # PPO
│   │   └── base_agent.py         # Base agent class
│   │
│   ├── multiagent/               # Multi-agent systems
│   │   ├── ma_ppo.py             # Multi-agent PPO
│   │   ├── communication.py      # Communication networks
│   │   └── coordination.py       # Cooperation rewards
│   │
│   ├── advanced/                 # Advanced RL features
│   │   ├── curiosity.py          # ICM & RND
│   │   ├── hierarchical.py       # Hierarchical RL
│   │   ├── world_model.py        # Model-based RL
│   │   ├── meta_learning.py      # MAML
│   │   └── curriculum.py         # Curriculum learning
│   │
│   └── utils/                    # Utilities
│       ├── training.py           # Training helpers
│       ├── logging.py            # Logging utilities
│       └── checkpointing.py      # Model checkpointing
│
├── models/          # Saved models
├── logs/            # Training logs
├── results/         # Evaluation results
└── visualizations/  # Generated plots
```
```
--agent {q_learning,dqn,ppo,ma_ppo,hierarchical}
                     RL algorithm to train
--episodes N         Number of training episodes
--n_agents N         Number of agents (for multi-agent)
--env_size N         Grid size (default: 16)
--max_steps N        Max steps per episode (default: 200)
```
```
--communication        Enable communication networks
--cooperation          Add cooperation rewards
--value_decomposition  Use value decomposition networks
```
```
--curiosity      Enable curiosity-driven exploration
--curriculum     Use curriculum learning
--world_model    Enable world model learning
--meta_learning  Use meta-learning (MAML)
```
```
--lr FLOAT           Learning rate (default: 3e-4)
--gamma FLOAT        Discount factor (default: 0.99)
--no_render          Disable live rendering during training
--checkpoint_freq N  Save checkpoint every N episodes
```
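For reference, a flag set like this is typically declared with `argparse`. A hypothetical sketch of how `train.py` might wire up a few of them (the actual script may differ):

```python
import argparse

# Hypothetical wiring -- the real train.py may declare these differently.
parser = argparse.ArgumentParser(description="Train an agent in the colony environment.")
parser.add_argument("--agent", choices=["q_learning", "dqn", "ppo", "ma_ppo", "hierarchical"])
parser.add_argument("--episodes", type=int, default=1000)
parser.add_argument("--n_agents", type=int, default=1)
parser.add_argument("--lr", type=float, default=3e-4)
parser.add_argument("--gamma", type=float, default=0.99)
parser.add_argument("--curiosity", action="store_true")  # boolean feature flag
args = parser.parse_args()
```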
Training metrics are logged to TensorBoard:
```bash
tensorboard --logdir logs/
```
Metrics include:
- Episode rewards (mean, min, max)
- Success rate
- Episode length
- Loss values (policy, value, entropy)
- Curiosity rewards (if enabled)
- Communication patterns (if enabled)
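Under the hood, scalars like these are typically written with `torch.utils.tensorboard`; a self-contained sketch (the tag names and log directory are illustrative, not necessarily what this repo uses):

```python
import random
from torch.utils.tensorboard import SummaryWriter

def run_episode():
    """Stand-in for one training episode; returns (reward, success)."""
    r = random.uniform(0.0, 10.0)
    return r, r > 5.0

writer = SummaryWriter(log_dir="logs/example_run")  # hypothetical run directory
for episode in range(100):
    episode_reward, success = run_episode()
    writer.add_scalar("reward/episode", episode_reward, episode)
    writer.add_scalar("metrics/success_rate", float(success), episode)
writer.close()
```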
```bash
# Compare the three single-agent algorithms
python train.py --agent q_learning --episodes 1000
python train.py --agent dqn --episodes 1000
python train.py --agent ppo --episodes 1000
```
```bash
# Curiosity ablation
python train.py --agent ppo --episodes 2000              # baseline
python train.py --agent ppo --episodes 2000 --curiosity  # with ICM

# Curriculum ablation
python train.py --agent ppo --episodes 2000               # baseline
python train.py --agent ppo --episodes 2000 --curriculum  # adaptive
```
```bash
# Communication ablation
python train.py --agent ma_ppo --n_agents 4 --episodes 2000                  # baseline
python train.py --agent ma_ppo --n_agents 4 --episodes 2000 --communication  # with comm

# Cooperation ablation
python train.py --agent ma_ppo --n_agents 4 --episodes 2000                # baseline
python train.py --agent ma_ppo --n_agents 4 --episodes 2000 --cooperation  # with coop
```
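To compare a baseline run against a feature-enabled one, you can read the scalar curves back out of the TensorBoard event files. A sketch using `EventAccumulator` (the run directories and scalar tag are assumptions; substitute whatever your runs actually log):

```python
import matplotlib.pyplot as plt
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

def load_curve(run_dir, tag="reward/episode"):  # tag name is an assumption
    acc = EventAccumulator(run_dir)
    acc.Reload()                                # parse the event files on disk
    events = acc.Scalars(tag)
    return [e.step for e in events], [e.value for e in events]

# Hypothetical run directories for a baseline vs. communication comparison.
for run_dir, label in [("logs/baseline", "baseline"), ("logs/with_comm", "communication")]:
    steps, values = load_curve(run_dir)
    plt.plot(steps, values, label=label)
plt.xlabel("episode"); plt.ylabel("reward"); plt.legend(); plt.show()
```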
Explore the concepts step-by-step:
- Part 1: Environment - Build the grid world, understand MDP formulation
- Part 2: Agents - Implement Q-Learning, DQN, and PPO
- Part 3: Multi-Agent - Add communication and coordination
- Part 4: Advanced - Explore curiosity, hierarchical RL, and meta-learning
Each notebook is self-contained with:
- Theory explanations
- Code implementations
- Visualizations
- Exercises
```bash
# Run unit tests
pytest tests/unit/

# Run integration tests
pytest tests/integration/
```
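A unit test in this layout might check the environment's basic Gymnasium contract. A hypothetical example (as before, the `ColonyEnv` name and kwargs are assumptions):

```python
# tests/unit/test_colony_env.py -- hypothetical test; ColonyEnv is an assumed name.
from src.environment.colony_env import ColonyEnv

def test_reset_and_step():
    env = ColonyEnv(size=8, max_steps=50)       # assumed constructor kwargs
    obs, info = env.reset(seed=0)
    assert env.observation_space.contains(obs)  # reset returns a valid observation
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    assert isinstance(reward, (int, float))
```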
- Environment: Custom Gymnasium environment with partial observability
- Agents: Modular agent implementations with common base class
- Training: Unified training loop supporting all agent types
- Visualization: Multiple rendering modes (grid, trajectories, heatmaps)
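The common base class likely pins down a small interface that every agent implements; a hypothetical sketch (method names are assumptions; `src/agents/base_agent.py` defines the real contract):

```python
from abc import ABC, abstractmethod

class BaseAgent(ABC):
    """Hypothetical shared interface -- the repo's base_agent.py defines the real one."""

    @abstractmethod
    def act(self, obs):
        """Select an action given the current observation."""

    @abstractmethod
    def learn(self, batch):
        """Update parameters from a transition or batch of transitions."""

    def save(self, path):
        """Persist model parameters; left to subclasses."""
        raise NotImplementedError
```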
- Use smaller environments: `--env_size 8` for quick experiments
- Reduce agents: Start with `--n_agents 1` or `2`
- Disable rendering: Use `--no_render` flag
- Adjust episode length: Use `--max_steps 100` for faster iterations
- More episodes: Train for `--episodes 3000+`
- Tune learning rate: Try `--lr 1e-4` or `--lr 5e-4`
- Enable features: Use `--curiosity --curriculum` for sparse rewards
- Multiple runs: Average results over 3-5 random seeds
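For the last tip, aggregating over seeds is a one-liner once per-seed returns are collected; the numbers below are made up purely for illustration:

```python
import numpy as np

# Mean episode return per seed (illustrative numbers, not real results).
returns_per_seed = {0: 8.4, 1: 7.9, 2: 8.8, 3: 8.1}
means = np.array(list(returns_per_seed.values()))
print(f"return: {means.mean():.2f} +/- {means.std(ddof=1):.2f} over {len(means)} seeds")
```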
Contributions are welcome! Areas for improvement:
- Additional RL algorithms (A3C, SAC, TD3)
- More advanced features (transformer agents, graph networks)
- Better curriculum strategies
- Improved visualizations
- Documentation and tutorials
This project is licensed under the MIT License - see the LICENSE file for details.
Built as a comprehensive learning project covering:
- Single-agent RL (Q-Learning, DQN, PPO)
- Multi-agent RL (MAPPO, communication, cooperation)
- Advanced RL (curiosity, hierarchical, world models, meta-learning)
Inspired by research in multi-agent systems, curriculum learning, and intrinsic motivation.
Happy Learning!