[FEATURE] Add cluster by option in DeltaTableWriter #214

Description

@riccamini

Is your feature request related to a problem? Please describe.

Koheesio's DeltaTableWriter does not allow specifying clustering columns.

Describe the solution you'd like

  • Add a clustering_columns parameter to the model
  • Add validation so that clustering_columns and partition_by cannot be specified at the same time
  • Modify __data_frame_writer to apply clustering when clustering_columns is provided
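
To make the three points above concrete, here is a minimal sketch of the intended behaviour. It is not the actual Koheesio implementation: `WriterOptions`, `apply_layout` and `RecordingWriter` are hypothetical names, a plain dataclass stands in for the real model, and the writer is assumed to expose `partitionBy` and `clusterBy` methods that return the writer itself.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class WriterOptions:
    """Hypothetical stand-in for the two DeltaTableWriter fields discussed here."""

    partition_by: Optional[List[str]] = None
    clustering_columns: Optional[List[str]] = None

    def __post_init__(self) -> None:
        # The two layouts are mutually exclusive, so reject the combination early.
        if self.partition_by and self.clustering_columns:
            raise ValueError(
                "partition_by and clustering_columns cannot be specified at the same time"
            )


def apply_layout(writer: Any, options: WriterOptions) -> Any:
    """Apply clustering or partitioning to a DataFrameWriter-like object.

    Assumes `writer` offers clusterBy(*cols) and partitionBy(*cols).
    """
    if options.clustering_columns:
        return writer.clusterBy(*options.clustering_columns)
    if options.partition_by:
        return writer.partitionBy(*options.partition_by)
    return writer


class RecordingWriter:
    """Test double that records which layout method was called (illustration only)."""

    def __init__(self) -> None:
        self.calls = []

    def clusterBy(self, *cols: str) -> "RecordingWriter":
        self.calls.append(("clusterBy", cols))
        return self

    def partitionBy(self, *cols: str) -> "RecordingWriter":
        self.calls.append(("partitionBy", cols))
        return self
```

In the real class, the validation would presumably live in whatever validator mechanism the model already uses, and `apply_layout` corresponds to the extra branch in `__data_frame_writer`.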

Describe alternatives you've considered

An alternative would be to handle this in the DeltaTableStep class, but the options that can be specified at table creation do not include clustering columns (the complete list of properties can be found here).

EDIT
When a table is created with the clusterBy method, the Delta table property "clusteringColumns" is set. I tried creating a table with that property set as an option, but it did not affect the clustering.

Labels: enhancement (New feature or request)