ViT MFU Benchmark

Measures Model FLOP Utilization (MFU) for Vision Transformer training across three storage backends: LanceDB (OSS + Enterprise), raw S3 via Boto3, and Parquet on S3.

The benchmark uses synthetic JPEG images at a fixed resolution so that data pipeline throughput is the only variable. Every backend runs the same ViT model with the same training loop, so differences in MFU come from how fast each backend can feed batches.

Scripts

| Script | Purpose |
| --- | --- |
| mfu_bench_fp16/bench.py | Training loop + all three storage backends |
| mfu_bench_fp16/dataloaders.py | LanceDB, Boto3, and Parquet DataLoader implementations |
| create_data.py | Generates the dataset and stages it in all three backends |

Setup

Requires a CUDA GPU. Configure your AWS credentials and LanceDB connection details in each script (or via environment variables).
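
If you go the environment-variable route, a quick check before launching can save a failed run. The sketch below is only an illustration: the AWS names are the standard variables that Boto3 reads, while `LANCEDB_URI` and `LANCEDB_API_KEY` are hypothetical placeholders for whatever names the scripts actually expect.

```python
import os

# Standard AWS variables read by Boto3. Credentials may also come from
# ~/.aws/credentials or an instance role, so a missing variable is not
# necessarily an error.
AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")

# Hypothetical names for the LanceDB connection details; check the scripts
# for the names they actually use.
LANCEDB_VARS = ("LANCEDB_URI", "LANCEDB_API_KEY")


def missing_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example use:
#   print("unset:", missing_vars(AWS_VARS + LANCEDB_VARS))
```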

# Generate and stage data
uv run python examples/ViT/create_data.py

# Run benchmark (all three backends in one script)
uv run python examples/ViT/mfu_bench_fp16/bench.py

Configuration

The benchmark scripts share the same knobs, set at the top of each file:

  • MODEL_NAME: "vit_h_14", "vit_l_16", or "vit_b_16"
  • EXPECT_IMAGE_SIZE: input resolution, default (224, 224)
  • BATCH_SIZE: images per training step
  • WARMUP_STEPS / BENCH_STEPS: steps to discard / steps to time
  • NUM_WORKERS: DataLoader workers
  • PEAK_FLOPS: your GPU's peak bfloat16 FLOPS (default set for H100/H200)
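
As a rough sketch of how these knobs fit together, the block below sets them to the values shown in the H200 run reported in this README and shows how a warmup/timed split might work. `timed_window` is a hypothetical helper, not a function from `bench.py`, and the `PEAK_FLOPS` value is an assumption (the dense 16-bit peak commonly quoted for H100/H200), not a value read from the script.

```python
import time

MODEL_NAME = "vit_h_14"
EXPECT_IMAGE_SIZE = (224, 224)
BATCH_SIZE = 350
WARMUP_STEPS = 5
BENCH_STEPS = 50
NUM_WORKERS = 8
PEAK_FLOPS = 989e12  # assumption: dense 16-bit peak for H100/H200


def timed_window(step_fn, warmup_steps=WARMUP_STEPS, bench_steps=BENCH_STEPS):
    """Run warmup steps untimed, then return the seconds spent on timed steps."""
    for _ in range(warmup_steps):
        step_fn()  # discarded: covers compilation, cache warmup, worker startup
    # On a GPU, synchronize the device here and again before the final clock
    # read so that queued kernels are included in the measurement.
    start = time.perf_counter()
    for _ in range(bench_steps):
        step_fn()
    return time.perf_counter() - start
```

Here `step_fn` stands in for one full training step (fetch a batch, forward, backward, optimizer update), so a slow data loader shows up directly as a longer timed window.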

Output

For each run, the benchmark prints throughput, achieved TFLOPS, and GPU MFU after the timed window.

H200 — vit_h_14 @ 224×224

--- Synthetic Pure-GPU Baseline (vit_h_14) / (224, 224) ---
batch_size=350, warmup=5, steps=50
Time Taken:     43.349 sec
Throughput:     403.70 images/sec
Achieved FLOPS: 405.221 TFLOPS
GPU MFU:        40.97%

--- LanceDB OSS Training (vit_h_14) / (224, 224) ---
batch_size=350, workers=8, steps=50
Time Taken:     47.597 sec
Throughput:     367.67 images/sec
Achieved FLOPS: 369.060 TFLOPS
GPU MFU:        37.32%

--- LanceDB Enterprise Training (vit_h_14) / (224, 224) ---
batch_size=350, workers=8, steps=50
Time Taken:     45.773 sec
Throughput:     382.32 images/sec
Achieved FLOPS: 383.762 TFLOPS
GPU MFU:        38.80%

--- Boto3 S3 Training (vit_h_14) / (224, 224) ---
batch_size=350, workers=8, steps=50
Time Taken:     137.291 sec
Throughput:     127.47 images/sec
Achieved FLOPS: 127.948 TFLOPS
GPU MFU:        12.94%

--- S3 Parquet Training (vit_h_14) / (224, 224) ---
batch_size=350, workers=8, steps=50
Time Taken:     72.393 sec
Throughput:     207.20 images/sec
Achieved FLOPS: 207.984 TFLOPS
GPU MFU:        21.03%
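
The three reported metrics are linked by simple arithmetic, sketched below using the synthetic baseline figures above. Two inputs are assumptions rather than values from the scripts: `FLOPS_PER_IMAGE` is back-derived from the printed baseline (achieved FLOPS divided by throughput), and `PEAK_FLOPS` is set to 989 TFLOPS, which is consistent with the printed MFU values. The scripts may account for model FLOPs differently.

```python
def throughput(batch_size, steps, seconds):
    """Images processed per second over the timed window."""
    return batch_size * steps / seconds


def mfu(achieved_flops, peak_flops):
    """Model FLOP Utilization: the fraction of peak FLOPS actually achieved."""
    return achieved_flops / peak_flops


# Assumption: training FLOPs per image, back-derived from the baseline output.
FLOPS_PER_IMAGE = 405.221e12 / 403.70
# Assumption: peak 16-bit FLOPS, chosen to match the printed MFU values.
PEAK_FLOPS = 989e12

# Synthetic baseline: batch_size=350, steps=50, 43.349 seconds.
baseline = throughput(350, 50, 43.349)
achieved = baseline * FLOPS_PER_IMAGE

print(f"Throughput:     {baseline:.2f} images/sec")
print(f"Achieved FLOPS: {achieved / 1e12:.3f} TFLOPS")
print(f"GPU MFU:        {100 * mfu(achieved, PEAK_FLOPS):.2f}%")
```

Because the model and batch size are fixed, achieved FLOPS and MFU scale linearly with throughput, which is why a slower backend lowers all three numbers together.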