feat(caching): add negative entry caching to stat cache#4678
feat(caching): add negative entry caching to stat cache#4678alleaditya wants to merge 11 commits into
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces negative entry caching to the stat cache, aimed at optimizing workloads that perform aggressive polling for non-existent files. By proactively caching empty directory listings and short-circuiting lookups for missing files in memory, the system avoids unnecessary network probes. This change includes the necessary configuration flags, updated documentation, and comprehensive test coverage to ensure performance gains and correctness. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces "Negative Stat-cache" (Non-existent Entry Caching) to optimize performance for workloads that frequently poll missing files. It adds a new configuration parameter enable-nonexistent-entry-caching to control this behavior, which is disabled by default. The implementation updates the stat bucket logic to conditionally store and retrieve negative entries and modifies directory lookup to short-circuit on confirmed cache hits. Feedback focused on significant formatting regressions in the metrics/ package, specifically manual edits to auto-generated code and violations of standard Go formatting conventions in metric_handle.go and noop_metrics.go.
| const ( | ||
| FsErrorCategoryDEVICEERRORAttr FsErrorCategory = "DEVICE_ERROR" | ||
| FsErrorCategoryDIRNOTEMPTYAttr FsErrorCategory = "DIR_NOT_EMPTY" | ||
| FsErrorCategoryFILEDIRERRORAttr FsErrorCategory = "FILE_DIR_ERROR" | ||
| FsErrorCategoryFILEEXISTSAttr FsErrorCategory = "FILE_EXISTS" | ||
| FsErrorCategoryINTERRUPTERRORAttr FsErrorCategory = "INTERRUPT_ERROR" | ||
| FsErrorCategoryINVALIDARGUMENTAttr FsErrorCategory = "INVALID_ARGUMENT" | ||
| FsErrorCategoryINVALIDOPERATIONAttr FsErrorCategory = "INVALID_OPERATION" | ||
| FsErrorCategoryIOERRORAttr FsErrorCategory = "IO_ERROR" | ||
| FsErrorCategoryMISCERRORAttr FsErrorCategory = "MISC_ERROR" | ||
| FsErrorCategoryNETWORKERRORAttr FsErrorCategory = "NETWORK_ERROR" | ||
| FsErrorCategoryNOTADIRAttr FsErrorCategory = "NOT_A_DIR" | ||
| FsErrorCategoryNOTIMPLEMENTEDAttr FsErrorCategory = "NOT_IMPLEMENTED" | ||
| FsErrorCategoryNOFILEORDIRAttr FsErrorCategory = "NO_FILE_OR_DIR" | ||
| FsErrorCategoryPERMERRORAttr FsErrorCategory = "PERM_ERROR" | ||
| FsErrorCategoryDEVICEERRORAttr FsErrorCategory = "DEVICE_ERROR" | ||
| FsErrorCategoryDIRNOTEMPTYAttr FsErrorCategory = "DIR_NOT_EMPTY" | ||
| FsErrorCategoryFILEDIRERRORAttr FsErrorCategory = "FILE_DIR_ERROR" | ||
| FsErrorCategoryFILEEXISTSAttr FsErrorCategory = "FILE_EXISTS" | ||
| FsErrorCategoryINTERRUPTERRORAttr FsErrorCategory = "INTERRUPT_ERROR" | ||
| FsErrorCategoryINVALIDARGUMENTAttr FsErrorCategory = "INVALID_ARGUMENT" | ||
| FsErrorCategoryINVALIDOPERATIONAttr FsErrorCategory = "INVALID_OPERATION" | ||
| FsErrorCategoryIOERRORAttr FsErrorCategory = "IO_ERROR" | ||
| FsErrorCategoryMISCERRORAttr FsErrorCategory = "MISC_ERROR" | ||
| FsErrorCategoryNETWORKERRORAttr FsErrorCategory = "NETWORK_ERROR" | ||
| FsErrorCategoryNOTADIRAttr FsErrorCategory = "NOT_A_DIR" | ||
| FsErrorCategoryNOTIMPLEMENTEDAttr FsErrorCategory = "NOT_IMPLEMENTED" | ||
| FsErrorCategoryNOFILEORDIRAttr FsErrorCategory = "NO_FILE_OR_DIR" | ||
| FsErrorCategoryPERMERRORAttr FsErrorCategory = "PERM_ERROR" | ||
| FsErrorCategoryPROCESSRESOURCEMGMTERRORAttr FsErrorCategory = "PROCESS_RESOURCE_MGMT_ERROR" | ||
| FsErrorCategoryTOOMANYOPENFILESAttr FsErrorCategory = "TOO_MANY_OPEN_FILES" | ||
| FsErrorCategoryTOOMANYOPENFILESAttr FsErrorCategory = "TOO_MANY_OPEN_FILES" | ||
| ) | ||
|
|
||
| // FsOp is a custom type for the fs_op attribute. | ||
| type FsOp string | ||
|
|
||
| const ( | ||
| FsOpBatchForgetAttr FsOp = "BatchForget" | ||
| FsOpCreateFileAttr FsOp = "CreateFile" | ||
| FsOpCreateLinkAttr FsOp = "CreateLink" | ||
| FsOpCreateSymlinkAttr FsOp = "CreateSymlink" | ||
| FsOpFlushFileAttr FsOp = "FlushFile" | ||
| FsOpForgetInodeAttr FsOp = "ForgetInode" | ||
| FsOpBatchForgetAttr FsOp = "BatchForget" | ||
| FsOpCreateFileAttr FsOp = "CreateFile" | ||
| FsOpCreateLinkAttr FsOp = "CreateLink" | ||
| FsOpCreateSymlinkAttr FsOp = "CreateSymlink" | ||
| FsOpFlushFileAttr FsOp = "FlushFile" | ||
| FsOpForgetInodeAttr FsOp = "ForgetInode" | ||
| FsOpGetInodeAttributesAttr FsOp = "GetInodeAttributes" | ||
| FsOpLookUpInodeAttr FsOp = "LookUpInode" | ||
| FsOpMkDirAttr FsOp = "MkDir" | ||
| FsOpMkNodeAttr FsOp = "MkNode" | ||
| FsOpOpenDirAttr FsOp = "OpenDir" | ||
| FsOpOpenFileAttr FsOp = "OpenFile" | ||
| FsOpOthersAttr FsOp = "Others" | ||
| FsOpReadDirAttr FsOp = "ReadDir" | ||
| FsOpReadDirPlusAttr FsOp = "ReadDirPlus" | ||
| FsOpReadFileAttr FsOp = "ReadFile" | ||
| FsOpReadSymlinkAttr FsOp = "ReadSymlink" | ||
| FsOpReleaseDirHandleAttr FsOp = "ReleaseDirHandle" | ||
| FsOpReleaseFileHandleAttr FsOp = "ReleaseFileHandle" | ||
| FsOpRenameAttr FsOp = "Rename" | ||
| FsOpRmDirAttr FsOp = "RmDir" | ||
| FsOpLookUpInodeAttr FsOp = "LookUpInode" | ||
| FsOpMkDirAttr FsOp = "MkDir" | ||
| FsOpMkNodeAttr FsOp = "MkNode" | ||
| FsOpOpenDirAttr FsOp = "OpenDir" | ||
| FsOpOpenFileAttr FsOp = "OpenFile" | ||
| FsOpOthersAttr FsOp = "Others" | ||
| FsOpReadDirAttr FsOp = "ReadDir" | ||
| FsOpReadDirPlusAttr FsOp = "ReadDirPlus" | ||
| FsOpReadFileAttr FsOp = "ReadFile" | ||
| FsOpReadSymlinkAttr FsOp = "ReadSymlink" | ||
| FsOpReleaseDirHandleAttr FsOp = "ReleaseDirHandle" | ||
| FsOpReleaseFileHandleAttr FsOp = "ReleaseFileHandle" | ||
| FsOpRenameAttr FsOp = "Rename" | ||
| FsOpRmDirAttr FsOp = "RmDir" | ||
| FsOpSetInodeAttributesAttr FsOp = "SetInodeAttributes" | ||
| FsOpSyncFileAttr FsOp = "SyncFile" | ||
| FsOpUnlinkAttr FsOp = "Unlink" | ||
| FsOpWriteFileAttr FsOp = "WriteFile" | ||
| FsOpSyncFileAttr FsOp = "SyncFile" | ||
| FsOpUnlinkAttr FsOp = "Unlink" | ||
| FsOpWriteFileAttr FsOp = "WriteFile" | ||
| ) | ||
|
|
||
| // GcsMethod is a custom type for the gcs_method attribute. | ||
| type GcsMethod string | ||
|
|
||
| const ( | ||
| GcsMethodComposeObjectsAttr GcsMethod = "ComposeObjects" | ||
| GcsMethodCopyObjectAttr GcsMethod = "CopyObject" | ||
| GcsMethodComposeObjectsAttr GcsMethod = "ComposeObjects" | ||
| GcsMethodCopyObjectAttr GcsMethod = "CopyObject" | ||
| GcsMethodCreateAppendableObjectWriterAttr GcsMethod = "CreateAppendableObjectWriter" | ||
| GcsMethodCreateFolderAttr GcsMethod = "CreateFolder" | ||
| GcsMethodCreateObjectAttr GcsMethod = "CreateObject" | ||
| GcsMethodCreateObjectChunkWriterAttr GcsMethod = "CreateObjectChunkWriter" | ||
| GcsMethodDeleteFolderAttr GcsMethod = "DeleteFolder" | ||
| GcsMethodDeleteObjectAttr GcsMethod = "DeleteObject" | ||
| GcsMethodFinalizeUploadAttr GcsMethod = "FinalizeUpload" | ||
| GcsMethodFlushPendingWritesAttr GcsMethod = "FlushPendingWrites" | ||
| GcsMethodGetFolderAttr GcsMethod = "GetFolder" | ||
| GcsMethodListObjectsAttr GcsMethod = "ListObjects" | ||
| GcsMethodMoveObjectAttr GcsMethod = "MoveObject" | ||
| GcsMethodMultiRangeDownloaderAddAttr GcsMethod = "MultiRangeDownloader::Add" | ||
| GcsMethodNewMultiRangeDownloaderAttr GcsMethod = "NewMultiRangeDownloader" | ||
| GcsMethodNewReaderAttr GcsMethod = "NewReader" | ||
| GcsMethodRenameFolderAttr GcsMethod = "RenameFolder" | ||
| GcsMethodStatObjectAttr GcsMethod = "StatObject" | ||
| GcsMethodUpdateObjectAttr GcsMethod = "UpdateObject" | ||
| GcsMethodCreateFolderAttr GcsMethod = "CreateFolder" | ||
| GcsMethodCreateObjectAttr GcsMethod = "CreateObject" | ||
| GcsMethodCreateObjectChunkWriterAttr GcsMethod = "CreateObjectChunkWriter" | ||
| GcsMethodDeleteFolderAttr GcsMethod = "DeleteFolder" | ||
| GcsMethodDeleteObjectAttr GcsMethod = "DeleteObject" | ||
| GcsMethodFinalizeUploadAttr GcsMethod = "FinalizeUpload" | ||
| GcsMethodFlushPendingWritesAttr GcsMethod = "FlushPendingWrites" | ||
| GcsMethodGetFolderAttr GcsMethod = "GetFolder" | ||
| GcsMethodListObjectsAttr GcsMethod = "ListObjects" | ||
| GcsMethodMoveObjectAttr GcsMethod = "MoveObject" | ||
| GcsMethodMultiRangeDownloaderAddAttr GcsMethod = "MultiRangeDownloader::Add" | ||
| GcsMethodNewMultiRangeDownloaderAttr GcsMethod = "NewMultiRangeDownloader" | ||
| GcsMethodNewReaderAttr GcsMethod = "NewReader" | ||
| GcsMethodRenameFolderAttr GcsMethod = "RenameFolder" | ||
| GcsMethodStatObjectAttr GcsMethod = "StatObject" | ||
| GcsMethodUpdateObjectAttr GcsMethod = "UpdateObject" | ||
| ) | ||
|
|
||
| // IoMethod is a custom type for the io_method attribute. | ||
| type IoMethod string | ||
|
|
||
| const ( | ||
| IoMethodClosedAttr IoMethod = "closed" | ||
| IoMethodOpenedAttr IoMethod = "opened" | ||
| ) | ||
|
|
||
| // ReadType is a custom type for the read_type attribute. | ||
| type ReadType string | ||
|
|
||
| const ( | ||
| ReadTypeBufferedAttr ReadType = "Buffered" | ||
| ReadTypeParallelAttr ReadType = "Parallel" | ||
| ReadTypeRandomAttr ReadType = "Random" | ||
| ReadTypeBufferedAttr ReadType = "Buffered" | ||
| ReadTypeParallelAttr ReadType = "Parallel" | ||
| ReadTypeRandomAttr ReadType = "Random" | ||
| ReadTypeSequentialAttr ReadType = "Sequential" | ||
| ReadTypeUnknownAttr ReadType = "Unknown" | ||
| ReadTypeUnknownAttr ReadType = "Unknown" | ||
| ) | ||
|
|
||
| // Reason is a custom type for the reason attribute. | ||
| type Reason string | ||
|
|
||
| const ( | ||
| ReasonInsufficientMemoryAttr Reason = "insufficient_memory" | ||
| ReasonRandomReadDetectedAttr Reason = "random_read_detected" | ||
| ) | ||
|
|
||
| // RequestType is a custom type for the request_type attribute. | ||
| type RequestType string | ||
|
|
||
| const ( | ||
| RequestTypeAttr1Attr RequestType = "attr1" | ||
| RequestTypeAttr2Attr RequestType = "attr2" | ||
| ) | ||
|
|
||
| // RetryErrorCategory is a custom type for the retry_error_category attribute. | ||
| type RetryErrorCategory string | ||
|
|
||
| const ( | ||
| RetryErrorCategoryOTHERERRORSAttr RetryErrorCategory = "OTHER_ERRORS" | ||
| RetryErrorCategoryOTHERERRORSAttr RetryErrorCategory = "OTHER_ERRORS" | ||
| RetryErrorCategorySTALLEDREADREQUESTAttr RetryErrorCategory = "STALLED_READ_REQUEST" | ||
| ) |
There was a problem hiding this comment.
This file contains auto-generated code and should not be edited manually. The current changes have removed blank lines and broken the alignment of constants, which significantly reduces readability. Please revert these manual formatting changes.
References
- Auto-generated code should not be edited manually, even to fix minor formatting or style issues.
There was a problem hiding this comment.
Haven't modified this file and is not present in the change list either.
a7c2886 to
b96f037
Compare
df635c0 to
f3d7283
Compare
This change implements negative entry caching (non-existent path caching) to optimize workloads that aggressively poll missing files (e.g., JupyterLab). - New Flag: --enable-nonexistent-entry-caching (boolean, default false). - Proactive Listing Cache: Empty directory listings (ListObjects returning 0 results) are proactively cached as negative directory entries to fully protect against implicit directory network probes. - VFS Routing: LookUpChild tracks definitive negative cache hits and short-circuits immediately in memory, avoiding network fallback. TAG=agy CONV=3da12d68-a026-456e-9f11-16369d509c21
…uring negative cache short-circuit
e79c36b to
a962ac3
Compare
…abled to prevent stale entries
…ation test and rename proxy log prefix to expose it on Kokoro
| return err | ||
| } | ||
|
|
||
| flagSet.BoolP("enable-nonexistent-entry-caching", "", false, "Cache paths confirmed to not exist (404s) in the unified stat cache.") |
There was a problem hiding this comment.
we already have a negative-stat-cache-ttl. why not put this feature also under that flag.
can we do a bit of thinking here for pros and cons.
Now we have one flag for ttl and another for enabling/disabing the feature
There was a problem hiding this comment.
Should moving to use "negative-ttl-secs" alone be a good measure without needing to maintain two separate flags? I am in favor of the decision.
I chose entry-caching to align with the terminology of 'entries' in the stat cache (which stores directory entries).
|
|
||
| EnableMetadataPrefetch bool `yaml:"enable-metadata-prefetch"` | ||
|
|
||
| EnableNonexistentEntryCaching bool `yaml:"enable-nonexistent-entry-caching"` |
There was a problem hiding this comment.
nit: what does entry-caching mean?
There was a problem hiding this comment.
It was implying "creating an entry".
| b.mu.Lock() | ||
| defer b.mu.Unlock() | ||
|
|
||
| if !b.enableNonexistentEntryCaching { |
There was a problem hiding this comment.
why did you put this check for folders?
There was a problem hiding this comment.
The fastStatBucket handles caching for both files and folders. addNegativeEntryForFolder is called when a folder lookup fails. If enable-nonexistent-entry-caching is set to false, we must ensure we do not cache negative entries for folders either.
This check ensures that when the feature is disabled, we do not add a negative entry for the folder, and instead we explicitly call b.cache.Erase(name) to invalidate any potentially stale positive entry from the cache.
| var fileCacheMiss bool | ||
| fileResult, err := findExplicitInode(ctx, d.Bucket(), NewFileName(d.Name(), name), true) | ||
| if err != nil && !errors.As(err, &cacheMissErr) { | ||
| if fileResult != nil { |
There was a problem hiding this comment.
i didnt follow. why these changes are required
There was a problem hiding this comment.
- Currently,
LookUpChildchecks the cache for a directory first. If it hits a negative cache entry (returning NotFoundError), it immediately returns ENOENT to the kernel without checking if a file with the same name exists. This is incorrect because a file foo can exist even if the directory foo/ does not. - These changes ensure we check the cache for both the
directory (foo/)and thefile (foo). We only short-circuit and return ENOENT (definitive negative hit) if both lookups hit the cache and both confirmed non-existence (i.e., neither was a cache miss). If either check results in a cache miss, we must fall back to GCS (fetchCoreEntity) to ensure correctness.
| return | ||
| } | ||
|
|
||
| if len(listing.MinObjects) == 0 && len(listing.CollapsedRuns) == 0 && b.enableNonexistentEntryCaching && strings.HasSuffix(dirName, "/") { |
There was a problem hiding this comment.
can you add a comment what is this check doing. when does minObjects and collapsed Runs are empty.
There was a problem hiding this comment.
- "I will add a comment. To clarify: MinObjects and CollapsedRuns are empty when GCS returns an empty listing for a directory prefix. This happens when the directory is implicit and all its children have been deleted (meaning it no longer exists), or when the directory never existed. In these cases, if enable-nonexistent-entry-caching is true, we cache the directory path (with a trailing slash) as a negative entry to prevent future redundant listing requests.
- Explicit empty directories contain a placeholder object (e.g., dir/) which is returned in MinObjects, so they are not cached as negative entries. I will add this explanation as a comment in the next commit."
|
Closing this PR in favor of two smaller, independent PRs:
|
Description
This change implements negative entry caching (non-existent path caching) to optimize workloads that aggressively poll missing files (e.g., JupyterLab).
--enable-nonexistent-entry-caching(boolean, defaultfalse).ListObjectsreturning 0 results) are proactively cached as negative directory entries to fully protect against implicit directory network probes.LookUpChildtracks definitive negative cache hits and short-circuits immediately in memory, avoiding network fallback.Benchmark Results
A custom benchmarking script was executed against a locally mounted test bucket to measure performance over a sustained 60-second polling window per scenario (aggregating over 1.3 million total VFS operations). The bucket was configured with a
5snegative cache TTL (--metadata-cache-negative-ttl-secs=5).1. VFS Throughput & Latency Distributions
Memory interception nearly doubles total application throughput while slashing tail latencies by over 40%:
DISABLEDENABLED458,234 lookups877,848 lookups0.1303 ms(130 µs)0.0679 ms(68 µs)0.1171 ms(117 µs)0.0602 ms(60 µs)0.2939 ms(294 µs)0.1729 ms(173 µs)2. TTL Cache Expiration Mechanics (Trace Logs)
Over a 60-second window with a 5-second TTL, exactly 12 cache expirations are mathematically expected.
missing_fileandmissing_file/to refresh the cache).12 expirations × 2 calls = 24 calls. Every other request is intercepted entirely in user-space memory.Link to the issue in case of a bug fix.
https://b.corp.google.com/issues/511786738
Testing details
WrappedSaysNotFound_NegativeCachingDisabled,CacheHit_Negative_Disabled_FetchOnly, andEmptyListing_NegativeCachingto verify strict bypass when disabled and proper tombstone storage when enabled.Any backward incompatible change? If so, please explain.
No.
TAG=agy
CONV=3da12d68-a026-456e-9f11-16369d509c21