
Add Iceberg to AlloyDB YAML template #3932

Open
ihebimad wants to merge 1 commit into GoogleCloudPlatform:main from ihebimad:feature/iceberg-to-alloydb

Conversation

@ihebimad

Contributor

Successful integration test (IT) job ID: 2026-06-19_09_54_35-1302943962825921393

@gemini-code-assist

Contributor

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new YAML-based Dataflow template designed to facilitate batch data migration from Apache Iceberg tables to AlloyDB. The changes include the core template definition, parameter configurations, documentation, and a robust integration test to ensure reliability.

Highlights

  • New Template Addition: Added a new YAML-based Dataflow template for migrating data from Apache Iceberg tables to AlloyDB.
  • Configuration and Documentation: Included necessary configuration files, documentation, and parameter definitions to support the new Iceberg to AlloyDB pipeline.
  • Integration Testing: Implemented a comprehensive integration test suite to verify the end-to-end functionality of the new template.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the "Iceberg to AlloyDB (YAML)" batch pipeline template, including its Java metadata interface, YAML pipeline definition, configuration options, documentation, and integration tests. The review feedback identifies several critical issues: syntax errors in the YAML template's filesToCopy and requirements properties, the need to make the alloydbTable parameter mandatory rather than optional, the removal of the unused and unsupported query parameter across the interface and configurations, and a correction to the Terraform directory path in the README.

Comment on lines +11 to +12
filesToCopy: >
  {"main.py", "requirements.txt", "options/iceberg_options.yaml", "options/alloydb_options.yaml"}

high

The filesToCopy property is defined as a folded block scalar string containing a set-like syntax {"main.py", ...}. In YAML, this is parsed as a single string rather than a list of strings, which will cause template validation or staging to fail. It should be defined as a standard YAML sequence (list).

  filesToCopy:
    - "main.py"
    - "requirements.txt"
    - "options/iceberg_options.yaml"
    - "options/alloydb_options.yaml"

Comment on lines +14 to +17
requirements: {
  "The Input Iceberg table must exist and be accessible through the provided catalog.",
  "The Output AlloyDB instance must exist and the target table must exist or be created."
}

high

The requirements property is defined using curly braces {}, which denote a map/dictionary in YAML flow style, but the body is a comma-separated list of strings with no values. Depending on the parser, this is either rejected or read as a mapping whose keys are the strings and whose values are null; in neither case is it the list of strings the template expects. It should be defined as a standard YAML sequence (list).

  requirements:
    - "The Input Iceberg table must exist and be accessible through the provided catalog."
    - "The Output AlloyDB instance must exist and the target table must exist or be created."
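
Both comments above come down to the same shape check: the value must load as a list of strings, not as a single string or a mapping. A minimal sketch of that check follows, assuming the metadata has already been loaded into Python values by some YAML parser. The helper name `check_string_list` is hypothetical and is not part of the template tooling; the two sample values are stand-ins for what a folded scalar and a block sequence would load as.

```python
def check_string_list(name, value):
    """Return value if it is a list of strings; raise TypeError otherwise."""
    if isinstance(value, str):
        # A folded block scalar (">") loads as one string, braces and all.
        raise TypeError(f"{name}: got a single string, expected a list of strings")
    if isinstance(value, dict):
        # A flow mapping ("{...}") loads as a mapping where the parser accepts it.
        raise TypeError(f"{name}: got a mapping, expected a list of strings")
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise TypeError(f"{name}: expected a list of strings")
    return value


# Stand-in for the folded scalar in the original template (one string).
folded = '{"main.py", "requirements.txt"}\n'
# Stand-in for the suggested block sequence (a list of strings).
sequence = ["main.py", "requirements.txt"]

check_string_list("filesToCopy", sequence)  # passes
# check_string_list("filesToCopy", folded)  # would raise TypeError
```

Running a check like this on the loaded metadata before staging would surface the problem as a clear type error rather than a later staging failure.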

Comment on lines +51 to +52
query: {{ query }}
connection_properties: {{ connectionProperties }}

high

The query parameter is passed to the WriteToPostgres transform. However, WriteToPostgres (and database write transforms in general) writes the input PCollection directly to a target table and does not support executing a custom query (which is typically a SELECT statement used for reading). This parameter is unused and should be removed from the write configuration to avoid validation errors or confusion.

        connection_properties: {{ connectionProperties }}

Comment on lines +152 to +159
@TemplateParameter.Text(
    order = 12,
    name = "alloydbTable",
    optional = true,
    description = "The name of the AlloyDB table.",
    helpText = "The name of the database table.",
    example = "public.my_table")
String getAlloydbTable();

high

The alloydbTable parameter is marked as optional = true. However, since this pipeline writes data from Iceberg to AlloyDB, a target table name is required for the WriteToPostgres transform to know where to write the records. This parameter should be marked as required (optional = false and @Validation.Required) to prevent runtime failures when the template is run without it. Validating mandatory parameters at the entry point prevents NullPointerExceptions in downstream transforms.

Suggested change
- @TemplateParameter.Text(
-     order = 12,
-     name = "alloydbTable",
-     optional = true,
-     description = "The name of the AlloyDB table.",
-     helpText = "The name of the database table.",
-     example = "public.my_table")
- String getAlloydbTable();
+ @TemplateParameter.Text(
+     order = 12,
+     name = "alloydbTable",
+     optional = false,
+     description = "The name of the AlloyDB table.",
+     helpText = "The name of the database table.",
+     example = "public.my_table")
+ @Validation.Required
+ String getAlloydbTable();
References
  1. Ensure that mandatory parameters are validated for nullability at the entry point or in upstream components to prevent NullPointerExceptions (especially during auto-unboxing) in downstream transforms, avoiding the need for redundant null checks.
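
To illustrate what validating at the entry point means in practice, here is a minimal sketch on the Python side of the template. It assumes the pipeline parameters arrive as a plain dictionary; the helper name `require_params` is hypothetical and is not an existing function in this repository or in Beam.

```python
def require_params(options, required):
    """Fail fast if any required parameter is missing or empty."""
    missing = [name for name in required if options.get(name) in (None, "")]
    if missing:
        raise ValueError("Missing required parameter(s): " + ", ".join(missing))
    return options


# Hypothetical parameter dictionaries, for illustration only.
good = {"alloydbTable": "public.my_table"}
bad = {"alloydbTable": None}

require_params(good, ["alloydbTable"])  # passes
# require_params(bad, ["alloydbTable"])  # would raise ValueError
```

Failing here gives the user one clear message at launch instead of an error deep inside the write transform.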

Comment on lines +32 to +38
- order: 5
  name: "alloydbTable"
  description: "The name of the AlloyDB table."
  help: "The name of the database table."
  example: "public.my_table"
  required: false
  type: text

high

The alloydbTable parameter is marked as required: false. Since this is a write-only pipeline to AlloyDB, the target table name is required for the write operation to succeed. It should be marked as required: true to ensure validation at the entry point.

      - order: 5
        name: "alloydbTable"
        description: "The name of the AlloyDB table."
        help: "The name of the database table."
        example: "public.my_table"
        required: true
        type: text
References
  1. Ensure that mandatory parameters are validated for nullability at the entry point or in upstream components to prevent NullPointerExceptions (especially during auto-unboxing) in downstream transforms, avoiding the need for redundant null checks.

Comment on lines +161 to +168
@TemplateParameter.Text(
    order = 13,
    name = "query",
    optional = true,
    description = "The SQL query/statement to execute.",
    helpText = "The SQL query/statement to execute on the source/sink.",
    example = "SELECT * FROM my_table WHERE status = 'active'")
String getQuery();

medium

The query parameter is defined here, but since this is a batch write-only pipeline from Iceberg to AlloyDB, a SQL query is not supported or used by the WriteToPostgres sink. To prevent user confusion and avoid exposing unused parameters, the query parameter should be removed from the template interface.

Comment on lines +39 to +45
- order: 6
  name: "query"
  description: "The SQL query/statement to execute."
  help: "The SQL query/statement to execute on the source/sink."
  example: "SELECT * FROM my_table WHERE status = 'active'"
  required: false
  type: text

medium

The query parameter is defined in the AlloyDB options, but it is not used or supported by the WriteToPostgres sink in this write-only pipeline. It should be removed to keep the template options clean and avoid confusion.
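
Because query appears in three places (the Java interface, the options file, and the pipeline YAML), removing it from only some of them can leave a dangling placeholder. The sketch below is one way to catch that, assuming placeholders use the simple `{{ name }}` form shown in the template lines quoted earlier. The function name `undeclared_placeholders` is hypothetical, and the regex does not handle filters or expressions inside the braces.

```python
import re

# Matches simple placeholders such as "{{ query }}"; anything more complex is ignored.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def undeclared_placeholders(template_text, declared):
    """Return placeholder names used in the template but not declared as parameters."""
    used = set(PLACEHOLDER.findall(template_text))
    return sorted(used - set(declared))


# The two template lines flagged in the review, with query no longer declared.
template_text = """
query: {{ query }}
connection_properties: {{ connectionProperties }}
"""

print(undeclared_placeholders(template_text, ["connectionProperties"]))
# prints ['query']
```

An empty result after the cleanup would indicate that every placeholder left in the YAML still has a matching parameter definition.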

[terraform workflow](https://developer.hashicorp.com/terraform/intro/core-workflow):

```shell
cd v2/yaml/terraform/Iceberg_To_AlloyDB_Yaml
```

medium

The directory path is documented as v2/yaml/terraform/..., but the module is located under yaml/terraform/... in this repository. This should be corrected to avoid confusion when users try to navigate to the Terraform directory.

Suggested change
- cd v2/yaml/terraform/Iceberg_To_AlloyDB_Yaml
+ cd yaml/terraform/Iceberg_To_AlloyDB_Yaml
