feat(caching): implement negative stat cache to optimize polling of missing files #4729

Open
alleaditya wants to merge 57 commits into GoogleCloudPlatform:master from alleaditya:add-negative-stat-cache

Conversation

@alleaditya alleaditya commented May 25, 2026

Description

This change implements negative entry caching (non-existent path caching) to optimize workloads that aggressively poll missing files (e.g., JupyterLab).

  • Activation: Controlled by the metadata-cache: negative-ttl-secs config parameter (or the --metadata-cache-negative-ttl-secs flag). It is enabled by default with a 5-second TTL. Setting it to 0 disables the feature.
  • Proactive Listing Cache: Empty directory listings (ListObjects returning 0 results) are proactively cached as negative directory entries to fully protect against implicit directory network probes.
  • VFS Routing: LookUpChild tracks definitive negative cache hits and short-circuits immediately in memory, avoiding network fallback.

Benchmark Results

A custom benchmarking script (bench.sh) was executed against a locally mounted test bucket to measure performance over sustained 10-second polling windows.

The metrics demonstrate that negative caching successfully eliminates redundant network traffic across all lookup scenarios, including those requiring implicit directory checking.

1. Implicit Dirs vs Explicit Dirs (Throughput & Network Calls)

In --implicit-dirs scenarios, FUSE previously required both a StatObject and a fallback ListObjects call per lookup. With this change, both are aggressively short-circuited:

| Scenario (10s Polling) | Negative Caching TTL=0s | Negative Caching TTL=5s | Result |
| --- | --- | --- | --- |
| Explicit Dirs | Ops: 60,169<br>StatObject calls: 120,340<br>ListObjects calls: 1 | Ops: 98,860<br>StatObject calls: 6<br>ListObjects calls: 1 | 100% network drop.<br>Throughput: +64% |
| Implicit Dirs | Ops: 61,605<br>StatObject calls: 61,606<br>ListObjects calls: 61,607 | Ops: 93,299<br>StatObject calls: 2<br>ListObjects calls: 3 | 100% network drop.<br>Throughput: +51% |

(Note: The remaining network calls are consistent with the expected TTL expiration rate, approximately 1 network call per 5 seconds.)
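
As a quick check of the arithmetic behind the throughput figures, the percentages can be recomputed from the reported operation counts. The `percentChange` helper is a hypothetical name used only for this illustration; the input numbers are the ones reported in the benchmark.

```go
package main

import "fmt"

// percentChange returns the relative change from before to after, in percent.
func percentChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Ops over the 10-second window, TTL=0s vs TTL=5s.
	fmt.Printf("explicit dirs throughput: %+.1f%%\n", percentChange(60169, 98860))
	fmt.Printf("implicit dirs throughput: %+.1f%%\n", percentChange(61605, 93299))
	// StatObject calls for explicit dirs, TTL=0s vs TTL=5s.
	fmt.Printf("explicit dirs StatObject calls: %+.3f%%\n", percentChange(120340, 6))
}
```

The throughput results round to the +64% and +51% shown in the table, and the StatObject reduction is just under 100%, which is what the "100% network drop" entry rounds to.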

Link to the issue in case of a bug fix.

https://b.corp.google.com/issues/511786738

Testing details

  1. Manual - Verified sustained polling behavior against local mounts with and without --implicit-dirs, validating exact TTL expiration counts and throughput improvements.
  2. Unit tests - Added WrappedSaysNotFound_NegativeCachingDisabled, CacheHit_Negative_Disabled_FetchOnly, and EmptyListing_NegativeCaching. Enhanced TestGCSMetrics_RequestCount_NegativeCachingShortCircuit to strictly verify ListObjects short-circuiting.
  3. Integration tests - Ran existing E2E caching suites successfully.

Any backward incompatible change? If so, please explain.

No.

…issing files

This change adds negative caching to StatCache, short-circuits LookUpChild on negative hits, and adds metrics/integration tests to verify.
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces negative stat caching to optimize performance in scenarios where applications frequently poll for non-existent files. By caching negative results (404s) and short-circuiting lookups in the filesystem layer, the change significantly reduces redundant backend network traffic. The implementation includes configurable TTL settings and comprehensive integration tests to ensure correctness and efficiency.

Highlights

  • Negative Stat Caching: Implemented negative caching for non-existent paths to reduce redundant backend GCS requests for workloads that poll missing files.
  • Short-circuit Logic: Updated LookUpChild to short-circuit on confirmed negative hits for both files and directories, preventing unnecessary backend calls.
  • Metrics and Testing: Added integration tests to verify that negative cache hits correctly short-circuit and do not emit backend network requests.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements negative stat-cache functionality to optimize lookups for non-existent files and directories, reducing redundant backend GCS requests. Key changes include updates to LookUpChild in dir.go to short-circuit on confirmed negative hits, and logic in fast_stat_bucket.go to manage negative cache entries. Documentation and tests were also added to support this feature. Review feedback suggests removing speculative negative caching in insertListing to avoid correctness issues with paginated GCS listings and adding a warning in the documentation regarding the risks of using an infinite TTL for negative entries.
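
One way to read the pagination concern, offered as an assumption rather than the reviewer's exact reasoning, is that a single empty page of results does not prove a prefix is empty while more pages remain. The `listingPage` type and `provesEmpty` function below are hypothetical and are not taken from fast_stat_bucket.go.

```go
package main

import "fmt"

// listingPage is a hypothetical, simplified view of one page of a listing.
// It is not the gcsfuse listing type.
type listingPage struct {
	Names             []string
	ContinuationToken string // non-empty when more pages remain
}

// provesEmpty reports whether this single page is enough to conclude that
// nothing exists under the listed prefix. A page with no names but a
// continuation token says nothing about the pages that follow.
func provesEmpty(p listingPage) bool {
	return len(p.Names) == 0 && p.ContinuationToken == ""
}

func main() {
	fmt.Println(provesEmpty(listingPage{}))                                // true
	fmt.Println(provesEmpty(listingPage{ContinuationToken: "next"}))       // false
	fmt.Println(provesEmpty(listingPage{Names: []string{"dir/file.txt"}})) // false
}
```

Under this reading, caching a negative entry from a page that still has a continuation token could hide objects returned on a later page, which is the kind of correctness issue the review points at.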

Comment thread internal/storage/caching/fast_stat_bucket.go Outdated
Comment thread docs/semantics.md
…s blocking CI

- Remove speculative negative caching from listing path to avoid pagination bugs (addressed review comment).
- Add warning about infinite negative TTL usage in semantics.md (addressed review comment).
- Fix deadlock in createFile's defer in fs.go when error occurs before child inode is minted.
- Fix data race in downloader Job by deep-copying MinObject, preventing race with reader thread.
- Fix missing lock in fake bucket's MoveObject, resolving race with StatObject.
…hing, add doc warning

- Revert data race fixes in downloader Job (moved to separate PR).
- Revert name length checks and deadlock fix in fs.go (moved to separate PR).
- Revert name length checks in dir.go (moved to separate PR), keeping only LookUpChild short-circuiting.
- Revert MoveObject lock fix in fake bucket (moved to separate PR), keeping FetchOnlyFromCache checks.
- Remove speculative negative caching on empty directory listing to avoid pagination bugs (addressed review comment).
- Add warning about infinite TTL usage for negative caching in semantics.md (addressed review comment).
@alleaditya alleaditya requested review from vadlakondaswetha and removed request for geertj May 25, 2026 13:36
@alleaditya alleaditya self-assigned this May 25, 2026
@alleaditya alleaditya added the execute-perf-test (Execute performance test in PR), execute-integration-tests (Run only integration tests), and execute-integration-tests-on-zb (To run E2E tests on zonal bucket) labels and removed the execute-perf-test and execute-integration-tests labels May 25, 2026
@alleaditya alleaditya requested a review from raj-prince June 1, 2026 08:31
@alleaditya alleaditya enabled auto-merge (squash) June 1, 2026 08:31
@alleaditya alleaditya disabled auto-merge June 1, 2026 08:31
Comment thread internal/fs/inode/dir.go Outdated
@alleaditya alleaditya requested a review from raj-prince June 4, 2026 08:31
…_bucket

Address PR comments by avoiding changes to inode/dir.go and limiting
negative caching to fast_stat_bucket. Skip negative caching for HNS
buckets.
Comment thread internal/storage/caching/fast_stat_bucket.go Outdated
Comment thread internal/storage/caching/fast_stat_bucket.go Outdated
Comment thread internal/storage/caching/fast_stat_bucket_test.go Outdated
@alleaditya alleaditya requested a review from meet2mky as a code owner June 18, 2026 07:51
@alleaditya alleaditya force-pushed the add-negative-stat-cache branch from efb0561 to beab398 Compare June 18, 2026 10:59
@alleaditya alleaditya removed the execute-perf-test Execute performance test in PR label Jun 19, 2026
The dentry_cache integration tests were failing on the CI pipeline because `s.T().Name()` includes a slash (e.g. `SuiteName/TestName`). This was causing the tests to create implicit directories in GCS. When one test deleted its file, the implicit directory was negatively cached by the Stat Cache. Subsequent tests in the same suite then failed because they attempted to access files inside the negatively cached implicit directory.

To fix this, `s.T().Name()` has been replaced with specific test names that do not contain slashes, ensuring no implicit directories are created and preventing test cross-talk.
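
The effect of a slash in a test name can be sketched as follows. Both helpers are hypothetical names for illustration: `impliedDirs` lists the directory prefixes an object name implies when `/` is treated as the separator, and `flatName` shows slash replacement as an alternative to the fix used here, which was to hard-code slash-free test names.

```go
package main

import (
	"fmt"
	"strings"
)

// impliedDirs returns the directory prefixes that an object name implies
// when "/" is treated as the path separator.
func impliedDirs(objectName string) []string {
	var dirs []string
	for i, r := range objectName {
		if r == '/' {
			dirs = append(dirs, objectName[:i+1])
		}
	}
	return dirs
}

// flatName replaces slashes so the name maps to a single top-level object.
// This is an illustrative alternative, not the change made in this PR.
func flatName(testName string) string {
	return strings.ReplaceAll(testName, "/", "_")
}

func main() {
	name := "SuiteName/TestName" // the shape s.T().Name() takes for a suite test
	fmt.Println(impliedDirs(name)) // prints "[SuiteName/]"
	fmt.Println(flatName(name))    // prints "SuiteName_TestName"
}
```

Because every test in the suite shares the `SuiteName/` prefix, a negative entry for that implied directory affects all of them, which matches the cross-talk described above.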
@alleaditya alleaditya force-pushed the add-negative-stat-cache branch from 3343e28 to b62cc6b Compare June 20, 2026 17:31
Labels

execute-integration-tests Run only integration tests

2 participants