Pre-built load path index & Compiled Gem Bundles #533

Closed
bdevel wants to merge 10 commits into rails:main from bdevel:optimize/prebuilt-index

@bdevel bdevel commented Mar 24, 2026

Pre-built load path index + per-gem ISeq bundles for faster boot

Rationale

Gem libraries don't change between boots. Once you run bundle install, the files under /gems/nokogiri-1.16.0/lib/ are static until the next bundle install. Yet on every boot, bootsnap calls File.realpath again for every $LOAD_PATH entry, checks mtimes on every cached directory, rebuilds a hash index of every requirable file, and makes 5+ syscalls per require to validate and load the compile cache. For a Rails app with 200+ gems, that's hundreds of unnecessary syscalls and ~220ms of wasted work on the load path alone, plus thousands of redundant file operations for bytecode loading.

The core insight: if $LOAD_PATH hasn't changed, the entire index can be reused as-is. And if a gem's path hasn't changed (which includes its version), its compiled bytecode can be served from a single pre-built bundle instead of thousands of individual cache files.

This PR applies that insight in two layers:

  1. Save the final resolved index -- not the per-directory file listings, but the actual feature -> path mapping that bootsnap builds on every boot. On warm boot, load it in one MessagePack read and skip all the directory scanning.

  2. Pre-compile gems into per-gem bundles -- instead of one cache file per .rb file (requiring open + stat + read per require), pack each gem's compiled ISeq binaries into a single file. One file open per gem instead of hundreds. The gem's path includes its version, so version changes naturally invalidate only that gem's bundle.

The result: gem code is treated as the immutable artifact it is. Scanning and revalidation only happen when something actually changes.

Summary

  • Pre-built load path index: Caches the final @index hash (feature -> path mapping) and resolved realpaths in a separate file. On warm boot, skips all File.realpath calls, mtime checks, and hash rebuilding. Bootsnap.setup() drops from ~220ms to ~8ms (27x) in the synthetic 300-gem benchmark.
  • Per-gem ISeq bundles: Packs each gem's compiled Ruby bytecode into a single bundle file instead of thousands of individual cache files. Reduces ~13,500 syscalls to ~300 file reads during Bundler.require. Each bundle is keyed to its $LOAD_PATH entry (which includes gem version), so upgrading a gem only invalidates that gem's bundle.
  • Auto-build in production mode: Bundles are built lazily on first boot when development_mode: false. No manual precompile step needed -- but bootsnap precompile --bundle is available for pre-warming in Docker builds.

Real-world results

Tested on a Rails 7.2 app with 236 gems and 2,700 loaded features:

macOS development (where the biggest gains are):

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Bootsnap.setup() warm init | ~220ms | ~8ms | 27x |
| Bundler.require | 3.3-7.4s | 2.4-3.3s | ~2x |
| Total boot (rails runner) | 8-11s | 5.6s | ~1.8x |

First boot (cold, auto-building bundles): ~25s. Subsequent boots: ~5.6s steady state.

Linux production (Docker):

Production containers with a warm page cache saw no measurable improvement (~5.4s before and after). This is expected: Linux ext4/overlay2 serves individual cache files very efficiently when the files are in page cache, so the syscall reduction doesn't translate to wall-clock savings. The remaining boot time is dominated by RubyVM::InstructionSequence.load_from_binary (CPU-bound ISeq deserialization), which bootsnap cannot optimize.
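For readers unfamiliar with the deserialization step named above, here is a minimal round trip through CRuby's ISeq binary API. It is only an illustration of the two calls involved (to_binary and load_from_binary), not code from this PR; the source string is made up.

```ruby
# Compile a snippet to an instruction sequence, serialize it, and load it back.
# This is the same pair of CRuby calls a compile cache relies on.
source = "[1, 2, 3].sum"
iseq = RubyVM::InstructionSequence.compile(source)

binary = iseq.to_binary # String of serialized bytecode
restored = RubyVM::InstructionSequence.load_from_binary(binary)

result = restored.eval  # runs the deserialized bytecode; evaluates to 6
```

The binary format is specific to the Ruby build that produced it, which is why caches of this kind are normally keyed by Ruby version as well as by source file.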

Where the gains show up:

  • macOS development: Significant. APFS filesystem overhead, Gatekeeper/Spotlight checks, and cold page cache make individual file operations expensive. Bundling eliminates thousands of these.
  • CI runners / cold cache: The first boot on a fresh machine auto-builds the bundles; subsequent runs on that machine benefit from them.
  • Large apps: The load path index saving (218ms -> 8ms in the 300-gem benchmark) scales with gem count and always applies, though it's a small fraction of total boot time.

How it works

Load path index (cache.rb, store.rb):

On first boot, after building the @index hash the normal way, we serialize it (plus the resolved realpaths) to load-path-cache-index keyed by an MD5 fingerprint of $LOAD_PATH. On subsequent boots, if the fingerprint matches, we load the index directly via MessagePack -- skipping the per-directory mtime checks, File.realpath calls, and hash insertions that previously cost ~220ms.
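A minimal sketch of the fingerprint-and-reuse idea described above. The helper names are hypothetical and this is not the code in cache.rb or store.rb; Marshal is used as a stdlib stand-in for MessagePack so the example runs without extra gems.

```ruby
require "digest"

# Hypothetical helper: fingerprint the load path. Order is part of the
# fingerprint because the first matching $LOAD_PATH entry wins on lookup.
def load_path_fingerprint(load_path)
  Digest::MD5.hexdigest(load_path.map(&:to_s).join("\0"))
end

# Hypothetical helper: persist the resolved index together with the
# fingerprint it was built for. Marshal stands in for MessagePack here.
def save_index(cache_file, load_path, index)
  payload = { "fingerprint" => load_path_fingerprint(load_path), "index" => index }
  tmp = "#{cache_file}.#{Process.pid}.tmp"
  File.binwrite(tmp, Marshal.dump(payload))
  File.rename(tmp, cache_file) # replace the old cache in one step
end

# Returns the cached index, or nil on any miss so the caller can fall
# back to the normal directory scan.
def load_index(cache_file, load_path)
  payload = Marshal.load(File.binread(cache_file))
  return nil unless payload["fingerprint"] == load_path_fingerprint(load_path)

  payload["index"]
rescue Errno::ENOENT, TypeError, ArgumentError
  nil
end
```

Any change to $LOAD_PATH, including a reordering, changes the fingerprint, so load_index returns nil and the index is rebuilt the normal way.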

Per-gem ISeq bundles (iseq_bundle.rb):

Each $LOAD_PATH entry gets its own bundle file under iseq-bundles/. A bundle contains a MessagePack header (index mapping source path -> offset/size/mtime) followed by a concatenated blob of ISeq binaries. On load_iseq, we check the in-memory index and serve the ISeq via byteslice -- one stat call per file (or zero in readonly mode) instead of 5+ syscalls per file with the individual cache.
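The bundle layout can be sketched roughly as follows. This is an illustration under stated assumptions, not the format in iseq_bundle.rb: the 4-byte length prefix, the JSON header (a stdlib stand-in for MessagePack), and the names build_bundle and LoadedBundle are all hypothetical.

```ruby
require "json"

# Hypothetical layout: [4-byte big-endian header length][JSON header][blob].
# The header maps source path -> offset/size/mtime within the blob.
def build_bundle(bundle_file, source_paths)
  index = {}
  blob = String.new(encoding: Encoding::BINARY)
  source_paths.each do |path|
    binary = RubyVM::InstructionSequence.compile_file(path).to_binary
    index[path] = { "offset" => blob.bytesize, "size" => binary.bytesize,
                    "mtime" => File.mtime(path).to_i }
    blob << binary
  end
  header = JSON.generate(index).b # binary copy so concatenation is safe
  File.binwrite(bundle_file, [header.bytesize].pack("N") + header + blob)
end

class LoadedBundle
  def initialize(bundle_file, skip_validation: false)
    data = File.binread(bundle_file) # one read for the whole bundle
    header_size = data.unpack1("N")
    header = data.byteslice(4, header_size).force_encoding(Encoding::UTF_8)
    @index = JSON.parse(header)
    @blob = data.byteslice(4 + header_size, data.bytesize)
    @skip_validation = skip_validation
  end

  # Returns an ISeq, or nil on a miss so the caller can fall back to the
  # individual compile cache.
  def fetch(path)
    entry = @index[path]
    return nil unless entry
    return nil unless @skip_validation || fresh?(path, entry)

    RubyVM::InstructionSequence.load_from_binary(
      @blob.byteslice(entry["offset"], entry["size"])
    )
  end

  private

  # A single stat call per file; a missing source counts as stale.
  def fresh?(path, entry)
    File.stat(path).mtime.to_i == entry["mtime"]
  rescue Errno::ENOENT
    false
  end
end
```

Comparing mtimes at one-second resolution is a simplification in this sketch; an edit made within the same second as the build would go undetected.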

Invalidation is natural: gem paths include the version (/gems/nokogiri-1.16.0/lib), so upgrading a gem creates a new bundle path. Adding a new gem means no bundle exists yet -- it falls back to the individual compile cache (standard bootsnap behavior) until a bundle is auto-built on the next production boot.
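To illustrate why a version bump needs no explicit invalidation, here is a small sketch of a per-entry bundle path. The helper name, the ".bundle" suffix, and the 1.16.1 version are assumptions for illustration; only the iseq-bundles/ directory and the MD5-of-load-path keying come from this PR.

```ruby
require "digest"

# Hypothetical helper: one bundle file per $LOAD_PATH entry, named after
# the MD5 of the entry. The ".bundle" suffix is an assumption.
def bundle_path_for(cache_dir, load_path_entry)
  name = "#{Digest::MD5.hexdigest(load_path_entry)}.bundle"
  File.join(cache_dir, "iseq-bundles", name)
end

old_bundle = bundle_path_for("tmp/cache", "/gems/nokogiri-1.16.0/lib")
new_bundle = bundle_path_for("tmp/cache", "/gems/nokogiri-1.16.1/lib")
# The two paths differ, so the upgraded gem simply has no bundle yet and
# the old bundle is never consulted for it.
```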

Usage

Works automatically -- no configuration change needed for existing apps.

Optional pre-build for Docker (maximizes first-boot speed):

bundle exec bootsnap precompile --bundle --gemfile app/ lib/ config/

Disable bundles (if needed):

BOOTSNAP_NO_BUNDLE=1

Test plan

  • All 164 existing tests pass (0 failures, 0 errors)
  • 18 new tests covering:
    • Pre-built index: cache creation, warm load, invalidation on path change, fingerprint determinism/ordering/content sensitivity
    • Per-gem bundles: build/load round-trip, persistence, mtime validation, skip_validation mode, deleted source handling, version mismatch, BOOTSNAP_NO_BUNDLE env var, build_for_paths! CLI integration
  • Benchmarked on synthetic gem environment (300 gems x 20 files) and real Rails app (236 gems)
  • Security review: no new attack surface (cache directory write access was already a trust boundary); added defensive data size limit (500MB) and randomized temp file names
  • Compatible with Ruby 2.7+ (all APIs used have existed since 2.4)
  • Falls back gracefully on any cache miss to existing bootsnap behavior
  • Tested on macOS (development) and Linux (VM + production Docker)
  • Not yet tested on Windows -- Windows contributors welcome to verify

bdevel and others added 10 commits March 23, 2026 17:21

On each boot, bootsnap rebuilds its in-memory index by iterating every
$LOAD_PATH entry, checking mtimes, and inserting all discoverable files
into a hash. For large apps (300+ gems), this costs ~220ms even with a
warm cache — wasted work when nothing has changed.

This commit stores the final resolved index (feature→path mapping) plus
the resolved realpaths in a separate cache file, keyed by an MD5
fingerprint of $LOAD_PATH. On warm boot with an unchanged load path,
the index is loaded directly via MessagePack, skipping all File.realpath
calls, mtime checks, and hash rebuilding entirely.

Benchmark (300 gems, 6300 files):
  Before: 218ms warm init
  After:    8ms warm init (27x faster)

Cache miss falls back to the original behavior transparently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The existing compile cache stores each file's compiled ISeq as a
separate cache file. For each require, bootsnap opens the source file
(stat), opens the cache file, reads the header, and reads the data —
5+ syscalls per file. At 2700 loaded features, that's 13,500+ syscalls.

This adds an optional "ISeq bundle" — a single file containing all
compiled ISeq binaries with a MessagePack index. On boot, the bundle
is read once into memory, and subsequent require calls serve ISeq data
from the in-memory hash with at most 1 stat call (or zero in readonly/
production mode).

Usage:
  # Build the bundle (run after deploy/precompile):
  Bootsnap::CompileCache::ISeqBundle.build!(cache_dir, source_paths: paths)

  # Auto-loaded on boot if present. In readonly mode, skips per-file
  # stat validation for maximum speed.

Benchmark (300 gems, 6300 files):
  Individual cache:          17.2s
  Bundled cache:              8.2s  (2.1x faster)
  Bundled + readonly (prod):  7.3s  (2.3x faster)

Falls back to individual cache on any miss.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds `bootsnap precompile --bundle --gemfile app` which builds the ISeq
bundle as part of the existing precompile workflow.

Also adds automatic Gemfile.lock fingerprinting: when the bundle is
built, the lock file's MD5 is stored in the bundle header. On boot, if
the fingerprint matches, per-file stat validation is skipped entirely
(same effect as readonly mode, but automatic). This means there's
effectively one mode — if gems haven't changed, you get full speed
without any configuration.

Usage:
  bootsnap precompile --bundle --gemfile app lib
  # or just --bundle if you don't need gem precompilation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replaces the monolithic ISeq bundle with per-$LOAD_PATH-entry bundles.
Each gem gets its own bundle file, keyed by MD5 of its load path (which
includes the gem version). This means:

- Adding a gem: only the new gem lacks a bundle, everything else is fast
- Upgrading a gem: new version = new path = new bundle, old stays valid
- No manual rebuild needed: bundles auto-build on first require
- No global invalidation: Gemfile.lock changes don't blow away cache

The precompile CLI (`bootsnap precompile --bundle`) can still pre-build
bundles for maximum first-boot speed (useful in Docker builds).

Benchmark (300 gems, 6300 files):
  Individual cache (current):  47s
  Per-gem cold (auto-build):   22s  (2.2x — first boot, no precompile)
  Per-gem warm (on disk):      8.8s (5.4x — subsequent boots)
  Bundle files: 302, total 15MB

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Auto-build of per-gem bundles now only happens in non-development mode
  (production, staging, etc.). Development/test boots skip auto-build and
  use the individual compile cache, restoring test suite speed (16s → 7s).
- Bundles can be pre-built in any mode via `bootsnap precompile --bundle`.
- Add BOOTSNAP_NO_BUNDLE env var to disable bundles entirely.
- Update README with ISeq bundles documentation, performance numbers,
  and new env var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds 18 new tests covering:

Pre-built load path index:
- Index cache created on first boot
- Index cache loaded on second boot (skipping dir scanning)
- Index invalidated when $LOAD_PATH changes
- Fingerprint determinism, ordering sensitivity, content sensitivity

Per-gem ISeq bundles:
- Bundle build and load round-trip
- Bundle persistence to disk and reload
- Unknown path returns nil
- Source mtime validation (returns nil on change)
- skip_validation mode (returns cached ISeq despite changes)
- Deleted source file returns nil
- Different gem versions produce different bundle paths
- Empty directory produces no bundle
- BOOTSNAP_NO_BUNDLE env var disables bundles
- build_for_paths! CLI integration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add 500MB sanity limit on data_size when loading bundles (prevents
  OOM from maliciously crafted bundle files)
- Add random component to temp file names during bundle writes
  (aligns with Store's existing pattern)

No new attack surface vs existing bootsnap — cache directory write
access was already a trust boundary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

bench_compile_cache.rb tested the monolithic bundle approach which was
replaced by per-gem bundles. The remaining benchmarks cover the current
implementation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@byroot
Member

byroot commented Mar 24, 2026

So I'm not sure how much you understand bootsnap given the AI use, and the PR isn't reviewable anyway, and some of the claims in the description make no sense.

If you actually found some improvement, please make the effort to produce a proper PR.

But anyway, the revalidation is intentional so that edits made with bundle open are applied instantly. This can already be disabled by calling setup with development_mode: false.

@byroot byroot closed this Mar 24, 2026