
Cache bug fixes #1735

Draft
ilongin wants to merge 2 commits into main from ilongin/1725-cache-fix

Conversation

@ilongin (Contributor) commented Apr 27, 2026

Fixes #1725: `read_dataset(...).map(...)` re-downloads files that are already in `~/.datachain/cache/`.

There are two distinct bugs, both rooted in `cache=False` being treated as "ignore the cache entirely" rather than "don't write to the cache". Per @dmpetrov, `cache=` is a write switch; reads should always consult the persistent cache.
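The intended semantics fit in a small decision function. This is a hypothetical sketch, not datachain code: `CachePolicy`, `should_read_cache`, `should_write_cache`, and `plan_read` are illustrative names. The row that matters for the bug is `cache=False` with a cache hit, which should be served locally instead of re-downloaded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePolicy:
    """Hypothetical model of the `cache=` setting treated as a write switch."""

    cache: bool  # the user-facing `cache=` flag

    def should_read_cache(self, is_cached: bool) -> bool:
        # Reads always consult the persistent cache, regardless of `cache=`.
        return is_cached

    def should_write_cache(self, is_cached: bool) -> bool:
        # `cache=` only decides whether a miss is persisted after download.
        return self.cache and not is_cached


def plan_read(policy: CachePolicy, is_cached: bool) -> str:
    """Return what a read would do: 'hit', 'download+store', or 'download'."""
    if policy.should_read_cache(is_cached):
        return "hit"
    if policy.should_write_cache(is_cached):
        return "download+store"
    return "download"
```

Under the buggy behavior, `plan_read(CachePolicy(cache=False), is_cached=True)` would effectively have been "download"; with write-switch semantics it is a hit.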

Bug A: the prefetcher ignores the persistent cache. With `cache=False` and `prefetch>0`, the prefetcher swaps in a fresh temp cache and re-downloads everything. Fix: `Cache` gains an optional read-only fallback. The temp cache uses the persistent cache as its fallback, so `contains()`/`get_path()` see hits in the persistent cache while writes and eviction stay scoped to the temp cache.
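A minimal sketch of the fallback idea, assuming a flat directory-backed cache whose keys are plain filenames (for example, content hashes). `DirCache`, its methods, and the key scheme are hypothetical and are not datachain's `Cache` API; the point is only that lookups fall through to the fallback while `put` and `clear` never touch it.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


class DirCache:
    """Hypothetical directory-backed cache with an optional read-only fallback.

    Reads (contains/get_path) check this cache first, then the fallback.
    Writes (put) and eviction (clear) only touch this cache's own directory.
    """

    def __init__(self, root: Path, fallback: Optional["DirCache"] = None) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.fallback = fallback

    def _own_path(self, key: str) -> Path:
        # Assumes flat keys with no path separators.
        return self.root / key

    def contains(self, key: str) -> bool:
        if self._own_path(key).exists():
            return True
        return self.fallback is not None and self.fallback.contains(key)

    def get_path(self, key: str) -> Optional[Path]:
        own = self._own_path(key)
        if own.exists():
            return own
        if self.fallback is not None:
            return self.fallback.get_path(key)
        return None

    def put(self, key: str, data: bytes) -> Path:
        # Writes are scoped to this cache; the fallback is never modified.
        path = self._own_path(key)
        path.write_bytes(data)
        return path

    def clear(self) -> None:
        # Eviction wipes only this cache's directory, not the fallback's.
        shutil.rmtree(self.root)
        self.root.mkdir(parents=True, exist_ok=True)


# Usage: a temp cache (as the prefetcher would use) backed by the persistent one.
with tempfile.TemporaryDirectory() as tmp:
    persistent = DirCache(Path(tmp) / "persistent")
    persistent.put("a.jpg", b"already downloaded")
    temp = DirCache(Path(tmp) / "temp", fallback=persistent)
    assert temp.contains("a.jpg")  # hit served from the persistent cache
    temp.put("b.jpg", b"prefetched")
    temp.clear()  # evicts b.jpg only
    assert persistent.contains("a.jpg")
```

Making the fallback read-only keeps the temp cache disposable: clearing it at the end of a `cache=False` run cannot delete anything the user deliberately cached earlier.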

Bug B: the read path ignores the persistent cache when `prefetch=0`. `_caching_enabled=False` makes `open_object(use_cache=False)` stream from the remote even on a cache hit. Fix: `_caching_enabled` is decoupled from reads. It now gates writes (`ensure_cached`) only, while reads (`open_object`, `get_local_path`) always consult the cache.
This is applied at all 5 read sites.
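The read/write split can be illustrated with a toy file wrapper. The method names `open_object`, `ensure_cached`, and `get_local_path` and the `_caching_enabled` flag are borrowed from the description above, but `SketchFile`, `FakeRemote`, the dict-based cache, and all signatures are hypothetical stand-ins, not datachain's `File` implementation.

```python
import io
from typing import BinaryIO, Dict, Optional


class FakeRemote:
    """Stand-in for object storage that counts downloads."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self.objects = objects
        self.downloads = 0

    def fetch(self, key: str) -> bytes:
        self.downloads += 1
        return self.objects[key]


class SketchFile:
    """Hypothetical file wrapper where the flag gates writes, never reads."""

    def __init__(
        self, key: str, remote: FakeRemote, cache: Dict[str, bytes], caching_enabled: bool
    ) -> None:
        self.key = key
        self.remote = remote
        self.cache = cache  # stands in for the persistent cache
        self._caching_enabled = caching_enabled

    def ensure_cached(self) -> None:
        # Write path: only populate the cache when caching is enabled.
        if self._caching_enabled and self.key not in self.cache:
            self.cache[self.key] = self.remote.fetch(self.key)

    def get_local_path(self) -> Optional[str]:
        # Read path: always consult the cache; the flag is not checked.
        return f"cache/{self.key}" if self.key in self.cache else None

    def open_object(self) -> BinaryIO:
        # Cache hit: serve locally even when writes are disabled.
        if self.key in self.cache:
            return io.BytesIO(self.cache[self.key])
        # Cache miss: stream from the remote; persist only if writes are enabled.
        data = self.remote.fetch(self.key)
        if self._caching_enabled:
            self.cache[self.key] = data
        return io.BytesIO(data)
```

With `caching_enabled=False`, opening an already-cached key performs no download, while a miss is streamed and deliberately left out of the cache.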

@ilongin linked an issue Apr 27, 2026 that may be closed by this pull request

cloudflare-workers-and-pages Bot commented Apr 27, 2026

Deploying datachain with Cloudflare Pages

Latest commit: aaf9850
Status: ✅ Deploy successful!
Preview URL: https://3a5df550.datachain-2g6.pages.dev
Branch Preview URL: https://ilongin-1725-cache-fix.datachain-2g6.pages.dev


@ilongin marked this pull request as draft April 27, 2026 22:26

codecov Bot commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 88.23529% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/datachain/cache.py | 89.74% | 2 Missing and 2 partials ⚠️ |
| src/datachain/lib/file.py | 77.77% | 1 Missing and 1 partial ⚠️ |



Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cache does not work

1 participant