perf(dir-catalog): rewrite manifest mutations with copy-on-write #7176

Open: jackye1995 wants to merge 1 commit into lance-format:main from jackye1995:jack/dir-manifest-cow-clean

jackye1995 (Contributor) commented Jun 9, 2026

Summary

  • Replace __manifest merge-insert/delete maintenance with streaming copy-on-write rewrites.
  • Build strict-overwrite replacement scalar indices for object_id, object_type, and base_objects; delete table-version ranges without expanding every version id.

This PR is the cleaned-up, production implementation of the copy-on-write approach prototyped in #6794.
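
To make the two summary points concrete, here is a minimal in-memory sketch of a streaming copy-on-write rewrite with a range-based version delete. It is not the code in this PR: `Entry`, `VersionRangeDelete`, and `rewrite` are hypothetical names, the row layout is a simplified assumption, and a `Vec` stands in for the rewritten `__manifest` data. It only illustrates the shape of the idea: read the old snapshot once, skip rows matched by a range predicate, swap in replacements, and emit a fresh snapshot without mutating the old one.

```rust
use std::collections::HashMap;
use std::ops::RangeInclusive;

/// Hypothetical, simplified stand-in for one manifest row.
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub object_id: String,
    pub object_type: String,
    /// (table id, version); set only for table-version rows (assumption).
    pub table_version: Option<(String, u64)>,
}

/// A delete expressed as a version range for one table, so the caller never
/// has to enumerate individual version ids.
pub struct VersionRangeDelete {
    pub table_id: String,
    pub versions: RangeInclusive<u64>,
}

impl VersionRangeDelete {
    fn matches(&self, entry: &Entry) -> bool {
        match &entry.table_version {
            Some((table, v)) => *table == self.table_id && self.versions.contains(v),
            None => false,
        }
    }
}

/// Copy-on-write rewrite: stream the old snapshot once, drop rows matched by
/// a range delete, replace rows that have an upsert, append rows that are
/// new, and return a fresh snapshot. The old snapshot is only read.
pub fn rewrite<'a>(
    old: impl Iterator<Item = &'a Entry>,
    mut upserts: HashMap<String, Entry>,
    deletes: &[VersionRangeDelete],
) -> Vec<Entry> {
    let mut out = Vec::new();
    for entry in old {
        if deletes.iter().any(|d| d.matches(entry)) {
            continue; // deleted by range predicate
        }
        match upserts.remove(&entry.object_id) {
            Some(replacement) => out.push(replacement),
            None => out.push(entry.clone()),
        }
    }
    // Whatever is left in `upserts` did not exist before: append it,
    // sorted by id so the output is deterministic.
    let mut fresh: Vec<Entry> = upserts.into_values().collect();
    fresh.sort_by(|a, b| a.object_id.cmp(&b.object_id));
    out.extend(fresh);
    out
}
```

In this sketch, deleting versions `1..=1_000_000` of a table costs one range comparison per streamed row rather than a million-element id list, which is the point of the second summary bullet.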

Expected performance

From the #6794 prototype benchmark (S3, multi-process, 1K manifest entries):

  • Write throughput: ~1.6-2x higher (~2.3-2.5 vs ~1.2-1.5 ops/s)
  • Write tail latency: ~3.5-4.1x lower p90/p99 (no periodic compaction spikes)
  • Warm read at high concurrency: up to ~2.4x throughput (single-fragment reads)
  • Cold read: ~11-15% lower latency
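
As a quick cross-check of the write-throughput multiplier, the ratio can be recomputed from the quoted ops/s ranges. The helper below is a hypothetical illustration, not part of this PR or the benchmark harness; it pairs the extremes of the two ranges to get the widest interval the raw numbers allow.

```rust
/// Widest speedup interval implied by two throughput ranges in ops/s.
/// Each argument is (low, high). The minimum pairs the slowest new run with
/// the fastest old run; the maximum pairs the fastest new with the slowest old.
pub fn speedup_bounds(new: (f64, f64), old: (f64, f64)) -> (f64, f64) {
    (new.0 / old.1, new.1 / old.0)
}
```

Calling `speedup_bounds((2.3, 2.5), (1.2, 1.5))` gives roughly 1.53x to 2.08x, which brackets the ~1.6-2x figure above.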

github-actions bot added the A-deps (Dependency updates), A-namespace (Namespace impls), and performance labels Jun 9, 2026
jackye1995 force-pushed the jack/dir-manifest-cow-clean branch from 2a96c17 to 2f1b29a Jun 9, 2026 10:00
github-actions bot added and removed the A-python (Python bindings) label Jun 9, 2026
jackye1995 force-pushed the jack/dir-manifest-cow-clean branch from 2f1b29a to 7aa7936 Jun 9, 2026 10:11
github-actions bot added the A-python (Python bindings) label Jun 9, 2026
jackye1995 force-pushed the jack/dir-manifest-cow-clean branch from 7aa7936 to 06ad0be Jun 9, 2026 10:35
codecov bot commented Jun 9, 2026

Codecov Report

❌ Patch coverage is 86.44068% with 216 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-namespace-impls/src/dir/manifest.rs | 85.82% | 154 Missing and 53 partials ⚠️ |
| rust/lance/src/dataset/write/commit.rs | 94.49% | 6 Missing ⚠️ |
| rust/lance/src/io/commit.rs | 75.00% | 3 Missing ⚠️ |

jackye1995 force-pushed the jack/dir-manifest-cow-clean branch from 06ad0be to d8c90da Jun 10, 2026 23:23

LuQQiu (Contributor) left a comment

Curious about the performance numbers with this new approach.

LuQQiu (Contributor) commented Jun 11, 2026

Oh, I see the performance numbers now... lol, I skipped reading the PR description fully.

LuQQiu (Contributor) commented Jun 11, 2026

Why "Cold read: ~11-15% lower latency"?

jackye1995 (Contributor, Author)

> Why Cold read: ~11-15% lower latency

I think it's because, in general, there are just fewer files to read.

Labels: A-deps (Dependency updates), A-namespace (Namespace impls), A-python (Python bindings), performance
