Add Iceberg to AlloyDB YAML template#3932
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new YAML-based Dataflow template designed to facilitate batch data migration from Apache Iceberg tables to AlloyDB. The changes include the core template definition, parameter configurations, documentation, and a robust integration test to ensure reliability. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces the "Iceberg to AlloyDB (YAML)" batch pipeline template, including its Java metadata interface, YAML pipeline definition, configuration options, documentation, and integration tests. The review feedback identifies several critical issues: syntax errors in the YAML template's filesToCopy and requirements properties, the need to make the alloydbTable parameter mandatory rather than optional, the removal of the unused and unsupported query parameter across the interface and configurations, and a correction to the Terraform directory path in the README.
| filesToCopy: > | ||
| {"main.py", "requirements.txt", "options/iceberg_options.yaml", "options/alloydb_options.yaml"} |
There was a problem hiding this comment.
The filesToCopy property is defined as a folded block scalar string containing a set-like syntax {"main.py", ...}. In YAML, this is parsed as a single string rather than a list of strings, which will cause template validation or staging to fail. It should be defined as a standard YAML sequence (list).
filesToCopy:
- "main.py"
- "requirements.txt"
- "options/iceberg_options.yaml"
- "options/alloydb_options.yaml"| requirements: { | ||
| "The Input Iceberg table must exist and be accessible through the provided catalog.", | ||
| "The Output AlloyDB instance must exist and the target table must exist or be created." | ||
| } |
There was a problem hiding this comment.
The requirements property is defined using curly braces {} which represents a map/dictionary in YAML flow style, but contains a comma-separated list of strings without keys. This is invalid YAML syntax and will fail to parse. It should be defined as a standard YAML sequence (list).
requirements:
- "The Input Iceberg table must exist and be accessible through the provided catalog."
- "The Output AlloyDB instance must exist and the target table must exist or be created."| query: {{ query }} | ||
| connection_properties: {{ connectionProperties }} |
There was a problem hiding this comment.
The query parameter is passed to the WriteToPostgres transform. However, WriteToPostgres (and database write transforms in general) writes the input PCollection directly to a target table and does not support executing a custom query (which is typically a SELECT statement used for reading). This parameter is unused and should be removed from the write configuration to avoid validation errors or confusion.
connection_properties: {{ connectionProperties }}| @TemplateParameter.Text( | ||
| order = 12, | ||
| name = "alloydbTable", | ||
| optional = true, | ||
| description = "The name of the AlloyDB table.", | ||
| helpText = "The name of the database table.", | ||
| example = "public.my_table") | ||
| String getAlloydbTable(); |
There was a problem hiding this comment.
The alloydbTable parameter is marked as optional = true. However, since this pipeline writes data from Iceberg to AlloyDB, a target table name is required for the WriteToPostgres transform to know where to write the records. This parameter should be marked as required (optional = false and @Validation.Required) to prevent runtime failures when the template is run without it. Validating mandatory parameters at the entry point prevents NullPointerExceptions in downstream transforms.
| @TemplateParameter.Text( | |
| order = 12, | |
| name = "alloydbTable", | |
| optional = true, | |
| description = "The name of the AlloyDB table.", | |
| helpText = "The name of the database table.", | |
| example = "public.my_table") | |
| String getAlloydbTable(); | |
| @TemplateParameter.Text( | |
| order = 12, | |
| name = "alloydbTable", | |
| optional = false, | |
| description = "The name of the AlloyDB table.", | |
| helpText = "The name of the database table.", | |
| example = "public.my_table") | |
| @Validation.Required | |
| String getAlloydbTable(); |
References
- Ensure that mandatory parameters are validated for nullability at the entry point or in upstream components to prevent NullPointerExceptions (especially during auto-unboxing) in downstream transforms, avoiding the need for redundant null checks.
| - order: 5 | ||
| name: "alloydbTable" | ||
| description: "The name of the AlloyDB table." | ||
| help: "The name of the database table." | ||
| example: "public.my_table" | ||
| required: false | ||
| type: text |
There was a problem hiding this comment.
The alloydbTable parameter is marked as required: false. Since this is a write-only pipeline to AlloyDB, the target table name is required for the write operation to succeed. It should be marked as required: true to ensure validation at the entry point.
- order: 5
name: "alloydbTable"
description: "The name of the AlloyDB table."
help: "The name of the database table."
example: "public.my_table"
required: true
type: textReferences
- Ensure that mandatory parameters are validated for nullability at the entry point or in upstream components to prevent NullPointerExceptions (especially during auto-unboxing) in downstream transforms, avoiding the need for redundant null checks.
| @TemplateParameter.Text( | ||
| order = 13, | ||
| name = "query", | ||
| optional = true, | ||
| description = "The SQL query/statement to execute.", | ||
| helpText = "The SQL query/statement to execute on the source/sink.", | ||
| example = "SELECT * FROM my_table WHERE status = 'active'") | ||
| String getQuery(); |
There was a problem hiding this comment.
The query parameter is defined here, but since this is a batch write-only pipeline from Iceberg to AlloyDB, a SQL query is not supported or used by the WriteToPostgres sink. To prevent user confusion and avoid exposing unused parameters, the query parameter should be removed from the template interface.
| - order: 6 | ||
| name: "query" | ||
| description: "The SQL query/statement to execute." | ||
| help: "The SQL query/statement to execute on the source/sink." | ||
| example: "SELECT * FROM my_table WHERE status = 'active'" | ||
| required: false | ||
| type: text |
| [terraform workflow](https://developer.hashicorp.com/terraform/intro/core-workflow): | ||
|
|
||
| ```shell | ||
| cd v2/yaml/terraform/Iceberg_To_AlloyDB_Yaml |
There was a problem hiding this comment.
The directory path is documented as v2/yaml/terraform/..., but the module is located under yaml/terraform/... in this repository. This should be corrected to avoid confusion when users try to navigate to the Terraform directory.
| cd v2/yaml/terraform/Iceberg_To_AlloyDB_Yaml | |
| cd yaml/terraform/Iceberg_To_AlloyDB_Yaml |
Successful IT JobID : 2026-06-19_09_54_35-1302943962825921393