nanogpt

A production-style implementation of nanoGPT — a GPT-2 language model for training and text generation in PyTorch.

Features:

  • Full GPT-2 architecture with Flash Attention
  • Training from scratch, resuming from checkpoints, or fine-tuning pretrained GPT-2 weights
  • Distributed Data Parallel (DDP) for multi-GPU training
  • Mixed precision training (bfloat16/float16)
  • Cosine learning rate schedule with warmup
  • Gradient accumulation and gradient clipping
  • Weights & Biases logging and artifact tracking
  • Hydra configuration management
  • Dataset preparation for Shakespeare (BPE + char-level) and OpenWebText
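One of the features above is a cosine learning rate schedule with warmup. A minimal sketch of how such a schedule is typically computed (the function name and the numeric values here are illustrative, not the repo's actual config):

```python
import math

def get_lr(it, max_lr=6e-4, min_lr=6e-5, warmup_iters=2000, decay_iters=600000):
    """Cosine decay with linear warmup (illustrative values, not the repo's config)."""
    if it < warmup_iters:
        # linear warmup from ~0 up to max_lr
        return max_lr * (it + 1) / warmup_iters
    if it > decay_iters:
        # past the decay horizon, hold at the floor
        return min_lr
    # cosine decay from max_lr down to min_lr
    ratio = (it - warmup_iters) / (decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))
    return min_lr + coeff * (max_lr - min_lr)

print(get_lr(0), get_lr(2000), get_lr(600000))
```

The rate ramps up linearly for the first `warmup_iters` steps, peaks at `max_lr`, then follows a half-cosine down to `min_lr`.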

Development

Prerequisites

Install uv:

curl -LsSf https://astral.sh/uv/install.sh | sh

Setup

uv sync

Quick Start: Shakespeare (char-level)

Prepare the dataset:

uv run python data/shakespeare_char/prepare.py
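The char-level prepare script maps every distinct character in the corpus to an integer id. A minimal sketch of that kind of encoding (not the script's exact code):

```python
text = "First Citizen: Before we proceed any further, hear me speak."

# build the vocabulary: one id per unique character, in sorted order
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

ids = encode("hear me")
assert decode(ids) == "hear me"  # round-trips exactly
print(f"vocab size: {len(chars)}")
```

The real script additionally writes the encoded ids to binary train/val files for the training loop to memory-map.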

Train:

uv run python train.py --config-name train_shakespeare_char

Sample from the trained model:

uv run python sample.py --out_dir=out-shakespeare-char

Training GPT-2 on OpenWebText

Prepare the dataset (~54GB download):

uv run python data/openwebtext/prepare.py

Train on 8x A100 GPUs:

torchrun --standalone --nproc_per_node=8 train.py --config-name train_gpt2

Training Max GPT-2 (429M) on OpenWebText — RTX 4070

A maxed-out 429M parameter model (30 layers, 16 heads, embedding dimension 1024) sized to fit on a single RTX 4070 12GB GPU. Training runs for ~2 epochs (~551k iterations, roughly 14 days).
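The 429M figure is consistent with the usual GPT-2 parameter accounting: roughly 12·n_layer·d² for the transformer blocks plus the token and position embeddings (with the LM head tied to the token embedding). A quick back-of-the-envelope check, assuming the standard GPT-2 vocabulary and context length:

```python
n_layer, n_head, d = 30, 16, 1024
vocab, block_size = 50257, 1024  # standard GPT-2 vocab/context, assumed here

# per block: attention projections (4*d*d) + 4x-expansion MLP (8*d*d) = 12*d*d
blocks = 12 * n_layer * d * d
# token + position embeddings (lm_head weight tied to the token embedding)
embeddings = vocab * d + block_size * d

total = blocks + embeddings
print(f"~{total / 1e6:.0f}M parameters")
```

This lands at roughly 430M before the small LayerNorm and bias terms, in the same ballpark as the quoted 429M (the exact figure depends on which terms are counted).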

Prepare the dataset (~54GB download, if not already done):

uv run python data/openwebtext/prepare.py

Train from scratch:

uv run python train.py --config-name train_max_owt

Resume from checkpoint (loads out-max-owt/ckpt.pt):

uv run python train.py --config-name train_max_owt init_from=resume

The W&B run ID is saved in the checkpoint, so resuming automatically continues the same W&B run. To override with a different run ID:

uv run python train.py --config-name train_max_owt init_from=resume wandb_run_id=<RUN_ID>
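For the resume-and-continue behaviour described above, the run ID only needs to travel inside the checkpoint dict. A minimal sketch of that pattern (the keys here are illustrative, not the repo's actual checkpoint schema):

```python
import os
import pickle
import tempfile

# saving: bundle the W&B run ID alongside model state (illustrative keys)
ckpt = {
    "model_state": {"weight": [0.1, 0.2]},  # stand-in for real tensors
    "iter_num": 5000,
    "wandb_run_id": "abc123",
}
path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")
with open(path, "wb") as f:
    pickle.dump(ckpt, f)

# resuming: reuse the stored ID unless the user overrides it
with open(path, "rb") as f:
    loaded = pickle.load(f)
override = None  # e.g. populated by wandb_run_id=<RUN_ID> on the command line
run_id = override or loaded["wandb_run_id"]
print(run_id)
```

With no override, the stored ID is reused and W&B continues the original run; supplying an override starts or attaches to a different one.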

Sample from the trained model:

uv run python sample.py --out_dir=out-max-owt

Fine-tuning GPT-2 on Shakespeare

Prepare the Shakespeare dataset (BPE tokenized):

uv run python data/shakespeare/prepare.py

Fine-tune:

uv run python train.py --config-name finetune_shakespeare

Sampling

# From checkpoint
uv run python sample.py --out_dir=out-shakespeare-char

# From pretrained GPT-2
uv run python sample.py --init_from=gpt2-xl

# With custom prompt
uv run python sample.py --init_from=gpt2 --start="To be or not to be"
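Under the hood, sampling scripts like this typically draw each next token with temperature scaling and top-k filtering. A rough plain-Python sketch of that step (illustrative, not the script's exact code):

```python
import math
import random

def sample_next(logits, temperature=0.8, top_k=3, rng=None):
    """Draw one token id from logits using temperature scaling and top-k filtering."""
    rng = rng or random.Random()
    # keep only the indices of the top_k highest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in top]
    # softmax over the surviving logits (subtract max for numerical stability)
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    probs = [e / sum(exps) for e in exps]
    # inverse-CDF draw over the filtered distribution
    r, acc = rng.random(), 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r <= acc:
            return idx
    return top[-1]

logits = [2.0, 0.5, -1.0, 3.0, 0.0]
print(sample_next(logits, rng=random.Random(0)))
```

Lower temperatures sharpen the distribution toward the argmax; top-k zeroes out the long tail before sampling.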

Configuration

Training configs are in configs/ and use Hydra. Override any parameter from the command line:

uv run python train.py --config-name train_shakespeare_char model.dropout=0.1

Available configs:

  • config.yaml — Default GPT-2 (124M) on OpenWebText
  • train_shakespeare_char.yaml — Character-level Shakespeare (small, fast)
  • train_gpt2.yaml — Full GPT-2 training on OpenWebText
  • train_max_owt.yaml — Max GPT-2 (429M) on OpenWebText for RTX 4070 12GB
  • finetune_shakespeare.yaml — Fine-tune GPT-2-XL on Shakespeare
