A production-style implementation of nanoGPT: a GPT-2 language model for training and text generation in PyTorch.
## Features
- Full GPT-2 architecture with Flash Attention
- Training from scratch, resuming from checkpoints, or fine-tuning pretrained GPT-2 weights
- Distributed Data Parallel (DDP) for multi-GPU training
- Mixed precision training (bfloat16/float16)
- Cosine learning rate schedule with warmup
- Gradient accumulation and gradient clipping
- Weights & Biases logging and artifact tracking
- Hydra configuration management
- Dataset preparation for Shakespeare (BPE + char-level) and OpenWebText
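The cosine learning-rate schedule with warmup listed above can be sketched as follows. This is a minimal sketch: the function name and the defaults for `max_lr`, `min_lr`, `warmup_iters`, and `lr_decay_iters` are illustrative, not the repo's actual config values.

```python
import math

def get_lr(it, max_lr=6e-4, min_lr=6e-5, warmup_iters=2000, lr_decay_iters=600000):
    """Cosine decay from max_lr to min_lr after a linear warmup.

    Illustrative sketch; parameter names and defaults may differ
    from the repo's actual config keys.
    """
    if it < warmup_iters:
        # linear warmup from 0 up to max_lr
        return max_lr * it / warmup_iters
    if it > lr_decay_iters:
        # after decay ends, hold at the floor
        return min_lr
    # cosine interpolation between max_lr and min_lr
    decay_ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))
    return min_lr + coeff * (max_lr - min_lr)
```

The schedule is piecewise: linear ramp, cosine decay, then a constant floor, which is the shape the trainer applies per iteration before each optimizer step.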
## Setup

Install uv:
```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Install dependencies:

```shell
uv sync
```

## Quick start: character-level Shakespeare

Prepare the dataset:
```shell
uv run python data/shakespeare_char/prepare.py
```

Train:
```shell
uv run python train.py --config-name train_shakespeare_char
```

Sample from the trained model:
```shell
uv run python sample.py --out_dir=out-shakespeare-char
```

## GPT-2 on OpenWebText

Prepare the dataset (~54GB download):
```shell
uv run python data/openwebtext/prepare.py
```

Train on 8x A100 GPUs:
```shell
torchrun --standalone --nproc_per_node=8 train.py --config-name train_gpt2
```

## Max OWT (429M)

A maxed-out 429M-parameter model (30 layers, 16 heads, 1024-dim embeddings) designed to fit on a single RTX 4070 12GB GPU. It trains for ~2 epochs (~551k iterations, ~14 days).
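The 429M figure is consistent with the architecture numbers above. A back-of-the-envelope count, weights only (biases and LayerNorm gains add well under 0.5M), assuming nanoGPT's conventions of a vocab size padded to 50304, tied input/output embeddings counted once, and position embeddings excluded from the reported total:

```python
# Rough parameter count for the 429M config:
# 30 layers, 16 heads, 1024-dim embeddings.
n_layer, d, vocab = 30, 1024, 50304

attn = 4 * d * d          # qkv projection (3*d*d) + output projection (d*d)
mlp = 8 * d * d           # up-projection d->4d plus down-projection 4d->d
per_block = attn + mlp    # 12 * d^2 weights per transformer block

total = n_layer * per_block + vocab * d   # blocks + token embeddings
print(f"{total / 1e6:.1f}M parameters")   # ~429.0M
```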
Prepare the dataset (~54GB download, if not already done):
```shell
uv run python data/openwebtext/prepare.py
```

Train from scratch:
```shell
uv run python train.py --config-name train_max_owt
```

Resume from checkpoint (loads out-max-owt/ckpt.pt):
```shell
uv run python train.py --config-name train_max_owt init_from=resume
```

The W&B run ID is saved in the checkpoint, so resuming automatically continues the same W&B run. To override with a different run ID:
```shell
uv run python train.py --config-name train_max_owt init_from=resume wandb_run_id=<RUN_ID>
```

Sample from the trained model:
```shell
uv run python sample.py --out_dir=out-max-owt
```

## Fine-tuning on Shakespeare

Prepare the Shakespeare dataset (BPE tokenized):
```shell
uv run python data/shakespeare/prepare.py
```

Fine-tune:
```shell
uv run python train.py --config-name finetune_shakespeare
```

## Sampling

```shell
# From checkpoint
uv run python sample.py --out_dir=out-shakespeare-char

# From pretrained GPT-2
uv run python sample.py --init_from=gpt2-xl

# With custom prompt
uv run python sample.py --init_from=gpt2 --start="To be or not to be"
```

## Configuration

Training configs are in configs/ and use Hydra.
Override any parameter from the command line:
```shell
uv run python train.py --config-name train_shakespeare_char model.dropout=0.1
```

Available configs:
- `config.yaml`: default GPT-2 (124M) on OpenWebText
- `train_shakespeare_char.yaml`: character-level Shakespeare (small, fast)
- `train_gpt2.yaml`: full GPT-2 training on OpenWebText
- `train_max_owt.yaml`: max GPT-2 (429M) on OpenWebText for an RTX 4070 12GB
- `finetune_shakespeare.yaml`: fine-tune GPT-2-XL on Shakespeare
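To picture what a dotted override like `model.dropout=0.1` does, here is a minimal pure-Python sketch of override resolution. This is not Hydra itself (Hydra additionally handles type-aware conversion, interpolation, and config composition), and `apply_override` is a hypothetical helper written only for illustration:

```python
# Illustrative sketch: how a dotted override such as "model.dropout=0.1"
# updates a nested config. Hydra's real override grammar is far richer.
def apply_override(cfg: dict, override: str) -> None:
    key, value = override.split("=", 1)
    *path, leaf = key.split(".")
    node = cfg
    for part in path:
        node = node[part]          # walk down to the parent of the leaf key
    # naive literal parsing: try int, then float, else keep the string
    try:
        node[leaf] = int(value)
    except ValueError:
        try:
            node[leaf] = float(value)
        except ValueError:
            node[leaf] = value

cfg = {"model": {"dropout": 0.0, "n_layer": 6}}
apply_override(cfg, "model.dropout=0.1")
# cfg["model"]["dropout"] is now 0.1; every other key is untouched
```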