Skip to content

[Snapshot Cache] Provide new snapshot interface#1405

Open
valerian-roche wants to merge 3 commits intomainfrom
vr/snapshot-v2
Open

[Snapshot Cache] Provide new snapshot interface#1405
valerian-roche wants to merge 3 commits intomainfrom
vr/snapshot-v2

Conversation

@valerian-roche
Copy link
Copy Markdown
Contributor

Description

This PR adds a new Snapshot type to provide to the Snapshot Cache. This new snapshot types.Snapshot has the following properties:

  • a clean public/private interface split, to allow simpler further evolution
  • a much smaller public interface
  • uses CachedResource instances instead of types.Resource internally, allowing mutualization of serialization and hashing when computing and serializing responses
  • allow to build it from a new public SnapshotResource type, allowing extension (e.g. providing the resource version and TTL) in a uniform entry (e.g. avoiding the TTL split). This allows providing fully opaque resources without the need for a GetName method
  • allows reusing sub-snapshots per type. This provides the capability for instance to reuse the RDS snapshot when only changing the CDS one. As the version will not have changed resources will not be sent to envoy for the unchanged types. It is also possible to share the same sub-snapshot across snapshots.

The legacy snapshot remains present and functional for now. Users are encouraged to use the new type, but the snapshot cache will keep using the same. The cache.Snapshot interface no longer includes the methods GetResources(typeURL string) map[string]types.Resource and GetResourcesAndTTL(typeURL string) map[string]types.ResourceWithTTL but the legacy snapshot still implements them, and interface casting can still be used to access them. The cache.Snapshot interface now includes the additional GetTypeSnapshot(typeURL string) types.TypeSnapshot method. Users providing their own snapshot type will have to implement it. It is expected to be an easy addition as the types.NewTypeSnapshot should be a simple adaptation layer. If this creates a problem please create an issue on the repository.

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

How has this been tested?

On top of additional tests in the PR itself, this has been in production through the Datadog fork for multiple months.

@valerian-roche valerian-roche force-pushed the vr/snapshot-v2 branch 2 times, most recently from 7b2b311 to 9facddb Compare February 16, 2026 23:43
@valerian-roche valerian-roche marked this pull request as ready for review February 17, 2026 19:04
if len(resourcesToReturn) == 0 && len(deletedResources) == 0 && !replyIfEmpty {
// Nothing's changed
return nil
return nil, nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's quite strange to have a nil, nil return

Copy link
Copy Markdown
Contributor

@mmorel-35 mmorel-35 Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that it is an already used pattern, ninil advise against it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'm used to use it in a lot of places. Otherwise we end up returning errors that are not errors, which are, imo, a worst pattern. Will take a look at how to update this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could add a boolean but I don't think it'd make the interface clearer

}

// WithMarshaledResource enables the user to provide the already marshaled bytes if they have them.
// Those bytes should strive at being consistent if the object has not changed (beware protobuf non-deterministic marshaling)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Those bytes should strive at being consistent if the object has not changed (beware protobuf non-deterministic marshaling)
// Those bytes should strive for consistency if the object has not changed (beware of protobuf non-deterministic marshaling)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

}

// WithResourceTTL sets a TTL on the resource, that will be sent to the client with the payload.
func WithResourceTTL(ttl *time.Duration) CachedResourceOption {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe WithTTL? It sounds like you should be able to set the full Resource along with the TTL given the current name

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the new name definitely makes sense, updated (along with WithResourceVersion)

}
}

var deltaResourceTypeURL = "type.googleapis.com/" + string(proto.MessageName(&discovery.Resource{}))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename this to something non-delta specific?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// GetSotwResource returns the serialized resource to return within a sotw response, including TTL handling if applicable.
// It uses lazily computed and cached fields to ensure a resource is serialized at most once per update.
func (c *CachedResource) GetSotwResource(isHeartbeat bool) (*anypb.Any, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should call this method and the "delta" version something like

GetAnyResource()

vs

GetDiscoveryResource()

or something along those lines. My thought here is that this CachedResource doesn't necessarily care about Sotw or Delta, it just cares about different ways to serialize a discovery response given a provided marshaled resource. IMHO the method names should be description as to how the data is returned rather than the protocol that it will be used with

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, especially as Sotw does use Discovery also when TTL is used

}

// getMarshaledResource lazily marshals the resource and returns the bytes.
func (c *CachedResource) getMarshaledResource() ([]byte, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we delete this method? It seems to only be called in the below methods and it just returns c.marshalfunc() anyways?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -0,0 +1,116 @@
package types
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good step in the right direction. That being said, I think we may want to think about the organization, type names, and package structure.

Right now I think the public API isn't as declarative as it could be which is yielding a bit of a clunky user experience. I'm wondering if we should play with the idea of "Snapshots" being per type, and then we introduce something else for nodes "Snapshot" consisting of many type snapshots. As an example we could introduce another package for the higher level node grouping of a per type snapshot list.

I think exploring how the imports look would be a really good way to think about this. Something along the lines of:

routeSnapshot := type.Snapshot{}
proxySnapshot := node.Snapshot{
    Resources: map[string]type.Snapshot{
        routeTypeURL: routeSnapshot
    }
}

I personally also like the ability to sometimes declare objects without constructors but that's just a preference thing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me take a look at this. The idea of package to match node or type looks good, but I'm unsure how it will actually end up reflecting in user code. If they already have packages with those (common) names they will end up with renaming that might makes it more confusing than the type itself carrying the meaning. Maybe something along the lines of cache.TypeSnapshot and cache.NodeSnapshot would carry a clear state while avoiding relying on the package name

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated to call them types.NodeSnapshot and types.TypeSnapshot. If you feel strongly about it I can update to type and node package names, though it'd require using public interface only, which may limit how much we can do to connect them

@@ -0,0 +1,221 @@
package internal
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think we should avoid the use of an "internal" package. See my comment below about potentially adding new packages and/or files.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine renaming this. I spent some time on this and could not figure something which ended up not requiring renaming in all imports. Imo it is not that important though as part of the private API

valerian-roche and others added 3 commits March 8, 2026 20:50
This commit adds a new Snapshot type to provide to the Snapshot Cache.
This new snapshot `types.Snapshot` has the following properties:
 - a clean public/private interface split, to allow simpler further evolution
 - a much smaller public interface
 - uses `CachedResource` instances instead of `types.Resource` internally, allowing mutualization of serialization and hashing when computing and serializing responses
 - allow to build it from a new public `SnapshotResource` type, allowing extension (e.g. providing the resource version and TTL) in a uniform entry (e.g. avoiding the TTL split). This allows providing fully opaque resources without the need for a `GetName` method
 - allows reusing sub-snapshots per type. This provides the capability for instance to reuse the RDS snapshot when only changing the CDS one. As the version will not have changed resources will not be sent to envoy for the unchanged types. It is also possible to share the same sub-snapshot across snapshots.

The legacy snapshot remains present and functional for now. Users are encouraged to use the new type, but the snapshot cache will keep using the same.
The `cache.Snapshot` interface no longer includes the methods `GetResources(typeURL string) map[string]types.Resource` and `GetResourcesAndTTL(typeURL string) map[string]types.ResourceWithTTL` but the legacy snapshot still implements them, and interface casting can still be used to access them.
The `cache.Snapshot` interface now includes the additional `GetTypeSnapshot(typeURL string) types.TypeSnapshot` method. Users providing their own snapshot type will have to implement it. It is expected to be an easy addition as the `types.NewTypeSnapshot` should be a simple adaptation layer. If this creates a problem please create an issue on the repository.

Signed-off-by: Valerian Roche <valerian.roche@gmail.com>
Copy link
Copy Markdown
Contributor Author

@valerian-roche valerian-roche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed some renaming

if len(resourcesToReturn) == 0 && len(deletedResources) == 0 && !replyIfEmpty {
// Nothing's changed
return nil
return nil, nil
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could add a boolean but I don't think it'd make the interface clearer

}

// WithMarshaledResource enables the user to provide the already marshaled bytes if they have them.
// Those bytes should strive at being consistent if the object has not changed (beware protobuf non-deterministic marshaling)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

}

// WithResourceTTL sets a TTL on the resource, that will be sent to the client with the payload.
func WithResourceTTL(ttl *time.Duration) CachedResourceOption {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the new name definitely makes sense, updated (along with WithResourceVersion)

}

// getMarshaledResource lazily marshals the resource and returns the bytes.
func (c *CachedResource) getMarshaledResource() ([]byte, error) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}

var deltaResourceTypeURL = "type.googleapis.com/" + string(proto.MessageName(&discovery.Resource{}))
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -0,0 +1,116 @@
package types
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated to call them types.NodeSnapshot and types.TypeSnapshot. If you feel strongly about it I can update to type and node package names, though it'd require using public interface only, which may limit how much we can do to connect them

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale label Apr 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants