๐ŸŒ The Autonomous Colony

Multi-Agent Reinforcement Learning in a Grid World

Python 3.8+ | PyTorch | License: MIT

A comprehensive reinforcement learning project covering single-agent, multi-agent, and advanced RL concepts through a simulated colony environment.


## 🎯 Overview

The Autonomous Colony is a multi-agent RL environment where agents learn to:

  • ๐Ÿƒ Navigate a 2D grid world
  • ๐ŸŽ Collect resources (food, water, materials) to survive
  • ๐Ÿง  Learn using various RL algorithms (Q-Learning, DQN, PPO, MA-PPO)
  • ๐Ÿค Cooperate through communication and coordination
  • ๐ŸŒฑ Explore using curiosity-driven learning
  • ๐Ÿ“ˆ Adapt through curriculum and meta-learning

## 🧠 RL Concepts Implemented

### Core Algorithms

- **Tabular Q-Learning** - Classic value-based RL
- **Deep Q-Network (DQN)** - Function approximation with experience replay
- **Proximal Policy Optimization (PPO)** - Clipped-objective policy gradient
- **Multi-Agent PPO (MAPPO)** - Centralized training, decentralized execution
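
As a taste of the first algorithm, the tabular Q-Learning update fits in a few lines. This is a generic illustration (not the code in `src/agents/tabular_q.py`), assuming 4 grid actions and default hyperparameters:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-Learning step: Q(s,a) += alpha * (TD target - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in range(4))  # greedy value of next state
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
# Agent at cell (0, 0) moves right (action 1) and picks up food worth +1
new_value = q_update(Q, (0, 0), 1, 1.0, (0, 1))
print(round(new_value, 3))  # 0.1 -- first visit, so the update is alpha * reward
```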

### Multi-Agent Features

- **Communication Networks** - Learned message passing between agents
- **Cooperation Rewards** - Proximity, sharing, and joint success bonuses
- **Value Decomposition** - Individual contributions to team success
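
A proximity-based cooperation bonus of the kind listed above can be sketched as simple reward shaping. The radius and bonus values here are arbitrary illustrations, not the actual logic in `src/multiagent/coordination.py`:

```python
def cooperation_bonus(positions, radius=2, bonus=0.1):
    """Add a small shared reward for each pair of agents within `radius`
    (Manhattan distance) of each other, nudging them to work together."""
    total = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            (x1, y1), (x2, y2) = positions[i], positions[j]
            if abs(x1 - x2) + abs(y1 - y2) <= radius:
                total += bonus
    return total

# Three agents: two adjacent, one far away -> exactly one pair earns the bonus
print(cooperation_bonus([(0, 0), (0, 1), (10, 10)]))  # 0.1
```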

### Advanced Features

- **Curiosity-Driven Exploration** - Intrinsic Curiosity Module (ICM)
- **Hierarchical RL** - Temporal abstraction with meta-controllers
- **World Models** - Model-based RL with predictive models
- **Meta-Learning** - MAML-style adaptation to new tasks
- **Curriculum Learning** - Progressive difficulty adjustment
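
The idea behind curiosity-driven exploration is that novelty becomes its own reward: states the agent cannot yet predict (or has rarely seen) pay an intrinsic bonus on top of the environment reward. A minimal count-based sketch of that mechanism (a toy stand-in for the learned ICM in `src/advanced/curiosity.py`):

```python
from collections import defaultdict

class CountCuriosity:
    """Toy intrinsic reward: novelty ~ 1/sqrt(visit count), so rarely seen
    states pay more. The real ICM uses a forward model's prediction error."""
    def __init__(self):
        self.counts = defaultdict(int)

    def reward(self, state):
        self.counts[state] += 1
        return 1.0 / self.counts[state] ** 0.5

cur = CountCuriosity()
print(cur.reward((3, 4)))  # 1.0 -- first visit, maximally novel
print(cur.reward((3, 4)))  # ~0.707 -- novelty decays with repeat visits
```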

## 🚀 Quick Start

### Installation

```bash
git clone https://github.com/ritikkumarv/autonomous-colony.git
cd autonomous-colony
pip install -r requirements.txt
```

### Training

```bash
# Single agent PPO
python train.py --agent ppo --episodes 1000

# Multi-agent with communication
python train.py --agent ma_ppo --n_agents 4 --episodes 2000 --communication

# With curiosity and curriculum learning
python train.py --agent ppo --episodes 2000 --curiosity --curriculum

# All features combined
python train.py --agent ma_ppo --n_agents 4 --episodes 3000 \
    --communication --curiosity --curriculum --world_model
```

### Visualization

```bash
# Visualize trained agent
python visualize.py --model models/ppo_latest/model.pt --episodes 5

# Create training plots
python visualize.py --model models/ppo_latest/model.pt --plot_training
```

### Evaluation

```bash
# Evaluate trained agent
python evaluate.py --model models/ppo_latest/model.pt --episodes 100
```


## 📁 Project Structure

```
autonomous-colony/
│
├── train.py                     # Main training script
├── visualize.py                 # Visualization tool
├── evaluate.py                  # Evaluation script
├── download_models.py           # Download pre-trained models
│
├── notebooks/                   # Learning notebooks
│   ├── part1_environment.ipynb  # Environment setup
│   ├── part2_agents.ipynb       # Single-agent RL
│   ├── part3_multiagent.ipynb   # Multi-agent RL
│   └── part4_advanced.ipynb     # Advanced features
│
├── src/
│   ├── environment/             # Grid world environment
│   │   ├── colony_env.py        # Main environment class
│   │   ├── resources.py         # Resource spawning
│   │   └── rendering.py         # Visualization
│   │
│   ├── agents/                  # RL agents
│   │   ├── tabular_q.py         # Q-Learning
│   │   ├── dqn.py               # Deep Q-Network
│   │   ├── ppo.py               # PPO
│   │   └── base_agent.py        # Base agent class
│   │
│   ├── multiagent/              # Multi-agent systems
│   │   ├── ma_ppo.py            # Multi-agent PPO
│   │   ├── communication.py     # Communication networks
│   │   └── coordination.py      # Cooperation rewards
│   │
│   ├── advanced/                # Advanced RL features
│   │   ├── curiosity.py         # ICM & RND
│   │   ├── hierarchical.py      # Hierarchical RL
│   │   ├── world_model.py       # Model-based RL
│   │   ├── meta_learning.py     # MAML
│   │   └── curriculum.py        # Curriculum learning
│   │
│   └── utils/                   # Utilities
│       ├── training.py          # Training helpers
│       ├── logging.py           # Logging utilities
│       └── checkpointing.py     # Model checkpointing
│
├── models/                      # Saved models
├── logs/                        # Training logs
├── results/                     # Evaluation results
└── visualizations/              # Generated plots
```


## 🎮 Training Arguments

### Basic Options

```
--agent {q_learning,dqn,ppo,ma_ppo,hierarchical}
--episodes N          Number of training episodes
--n_agents N          Number of agents (for multi-agent)
--env_size N          Grid size (default: 16)
--max_steps N         Max steps per episode (default: 200)
```
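
For readers extending the CLI, the basic options above map naturally onto `argparse`. This is a sketch of how such a parser might look, not necessarily how `train.py` actually defines it:

```python
import argparse

def build_parser():
    """Hypothetical parser covering the basic options listed in the README."""
    p = argparse.ArgumentParser(description="Train colony agents")
    p.add_argument("--agent", default="ppo",
                   choices=["q_learning", "dqn", "ppo", "ma_ppo", "hierarchical"])
    p.add_argument("--episodes", type=int, default=1000)
    p.add_argument("--n_agents", type=int, default=1)
    p.add_argument("--env_size", type=int, default=16)
    p.add_argument("--max_steps", type=int, default=200)
    return p

# Parse an explicit argv list so the sketch is testable without a real CLI
args = build_parser().parse_args(["--agent", "ma_ppo", "--n_agents", "4"])
print(args.agent, args.n_agents, args.env_size)  # ma_ppo 4 16
```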

### Multi-Agent Options

```
--communication         Enable communication networks
--cooperation           Add cooperation rewards
--value_decomposition   Use value decomposition networks
```

Advanced Options

```
--curiosity       Enable curiosity-driven exploration
--curriculum      Use curriculum learning
--world_model     Enable world model learning
--meta_learning   Use meta-learning (MAML)
```

### Training Options

```
--lr FLOAT            Learning rate (default: 3e-4)
--gamma FLOAT         Discount factor (default: 0.99)
--no_render           Disable live rendering during training
--checkpoint_freq N   Save checkpoint every N episodes
```
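
The `--gamma` discount factor controls how much future rewards are worth: the return is `G = r_0 + gamma*r_1 + gamma^2*r_2 + ...`. A quick illustration of why the high 0.99 default matters for sparse-reward episodes:

```python
def discounted_return(rewards, gamma=0.99):
    """Compute G_0 = sum_t gamma^t * r_t by folding backwards over the episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# A sparse episode: nothing for 99 steps, then a reward of 1.0 at step 100
rewards = [0.0] * 99 + [1.0]
print(round(discounted_return(rewards, gamma=0.99), 3))  # 0.37 -- the distant reward still counts
print(round(discounted_return(rewards, gamma=0.90), 3))  # 0.0  -- with low gamma it vanishes
```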


## 📊 Monitoring Training

Training metrics are logged to TensorBoard:

```bash
tensorboard --logdir logs/
```

Metrics include:

- Episode rewards (mean, min, max)
- Success rate
- Episode length
- Loss values (policy, value, entropy)
- Curiosity rewards (if enabled)
- Communication patterns (if enabled)

## 🔬 Experiments

### Baseline Comparisons

```bash
# Compare different algorithms
python train.py --agent q_learning --episodes 1000
python train.py --agent dqn --episodes 1000
python train.py --agent ppo --episodes 1000
```

### Ablation Studies

```bash
# Test impact of curiosity
python train.py --agent ppo --episodes 2000              # baseline
python train.py --agent ppo --episodes 2000 --curiosity  # with ICM

# Test impact of curriculum
python train.py --agent ppo --episodes 2000              # baseline
python train.py --agent ppo --episodes 2000 --curriculum # adaptive
```

### Multi-Agent Studies

```bash
# Test communication
python train.py --agent ma_ppo --n_agents 4 --episodes 2000                 # baseline
python train.py --agent ma_ppo --n_agents 4 --episodes 2000 --communication # with comm

# Test cooperation rewards
python train.py --agent ma_ppo --n_agents 4 --episodes 2000               # baseline
python train.py --agent ma_ppo --n_agents 4 --episodes 2000 --cooperation # with coop
```


## 🎓 Learning Notebooks

Explore the concepts step-by-step:

  1. Part 1: Environment - Build the grid world, understand MDP formulation
  2. Part 2: Agents - Implement Q-Learning, DQN, and PPO
  3. Part 3: Multi-Agent - Add communication and coordination
  4. Part 4: Advanced - Explore curiosity, hierarchical RL, and meta-learning

Each notebook is self-contained with:

- Theory explanations
- Code implementations
- Visualizations
- Exercises

## 🛠️ Development

### Running Tests

```bash
# Unit tests (coming soon)
pytest tests/unit/

# Integration tests (coming soon)
pytest tests/integration/
```

### Code Structure

- **Environment**: Custom Gymnasium environment with partial observability
- **Agents**: Modular agent implementations with a common base class
- **Training**: Unified training loop supporting all agent types
- **Visualization**: Multiple rendering modes (grid, trajectories, heatmaps)
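
The "common base class" pattern mentioned above typically looks like the following. This is a hypothetical sketch of what `src/agents/base_agent.py` might define, not its actual interface:

```python
import random
from abc import ABC, abstractmethod

class BaseAgent(ABC):
    """Minimal interface every agent implements so a unified
    training loop can drive Q-Learning, DQN, and PPO identically."""

    @abstractmethod
    def act(self, observation):
        """Return an action for the current observation."""

    @abstractmethod
    def update(self, transition):
        """Learn from one (s, a, r, s', done) transition or batch."""

class RandomAgent(BaseAgent):
    """Trivial concrete agent, useful as a sanity-check baseline."""
    def __init__(self, n_actions=4, seed=0):
        self.rng = random.Random(seed)
        self.n_actions = n_actions

    def act(self, observation):
        return self.rng.randrange(self.n_actions)

    def update(self, transition):
        pass  # a random policy never learns

agent = RandomAgent()
print(0 <= agent.act(None) < 4)  # True
```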

## 📈 Performance Tips

### For Faster Training

  1. Use smaller environments: `--env_size 8` for quick experiments
  2. Reduce agents: Start with `--n_agents 1` or `2`
  3. Disable rendering: Use `--no_render` flag
  4. Adjust episode length: Use `--max_steps 100` for faster iterations

### For Better Results

  1. More episodes: Train for `--episodes 3000+`
  2. Tune learning rate: Try `--lr 1e-4` or `--lr 5e-4`
  3. Enable features: Use `--curiosity --curriculum` for sparse rewards
  4. Multiple runs: Average results over 3-5 random seeds
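
Averaging over seeds, as tip 4 suggests, is just a mean and standard deviation over per-seed results; reporting both keeps one lucky run from looking like a real improvement. A sketch with made-up numbers:

```python
import statistics

# Hypothetical final success rates from 5 training runs with different seeds
success_rates = [0.71, 0.68, 0.75, 0.62, 0.70]

mean = statistics.mean(success_rates)
std = statistics.stdev(success_rates)  # sample std dev across seeds
print(f"success rate: {mean:.3f} +/- {std:.3f}")  # success rate: 0.692 +/- 0.048
```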

## Contributing

Contributions are welcome! Areas for improvement:

- Additional RL algorithms (A3C, SAC, TD3)
- More advanced features (transformer agents, graph networks)
- Better curriculum strategies
- Improved visualizations
- Documentation and tutorials


## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


## ✨ Acknowledgments

Built as a comprehensive learning project covering:

- Single-agent RL (Q-Learning, DQN, PPO)
- Multi-agent RL (MAPPO, communication, cooperation)
- Advanced RL (curiosity, hierarchical RL, world models, meta-learning)

Inspired by research in multi-agent systems, curriculum learning, and intrinsic motivation.


**Happy Learning!** 🚀
