wr.redshift.copy() silently renames columns with spaces due to pyarrow defaulting to flavor='spark' in internal s3.to_parquet call #3293

@choinhet

Describe the bug

wr.redshift.copy() calls s3.to_parquet internally without passing pyarrow_additional_kwargs={'flavor': None} (or '2.0'), so pyarrow defaults to flavor='spark'. That flavor calls sanitize_pandas_metadata and renames columns that contain spaces (e.g. "my col" → "my_col"). The Redshift COPY then fails or maps data to the wrong columns, because the DDL was generated from the original DataFrame schema.
This results in an error like 'column "test col" of relation "test_table" does not exist'.
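
To make the mismatch concrete, here is a rough, library-free sketch of what seems to happen to the column names. The `spark_sanitize` helper and its character set are assumptions that only approximate the renaming applied under flavor='spark'; they are not the actual pyarrow or awswrangler code.

```python
import re

# Assumption: approximates the characters the Spark flavor treats as invalid
# in field names; the exact set used inside pyarrow may differ.
SPARK_DISALLOWED = re.compile(r"[ ,;{}()\n\t=]")


def spark_sanitize(name: str) -> str:
    """Hypothetical stand-in for the renaming applied when flavor='spark'."""
    return SPARK_DISALLOWED.sub("_", name)


def column_mismatches(columns):
    """Return (ddl_name, parquet_name) pairs that no longer match."""
    return [(c, spark_sanitize(c)) for c in columns if spark_sanitize(c) != c]


# The DDL uses the original names, while the staged Parquet file would
# carry the sanitized ones.
df_columns = ["id", "test col"]
print(column_mismatches(df_columns))  # [('test col', 'test_col')]
```

Any pair returned here is a column the table was created with under one name but that the staged file carries under another.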

How to Reproduce

Create a minimal DataFrame with a column name containing a space, then call wr.redshift.copy() with any mode.
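
A minimal reproduction could look roughly like the sketch below. The `reproduce` function, its arguments (`con`, `s3_path`, `iam_role`), and the "public" schema are placeholders for illustration and are not taken from the report; the table and column names follow the error message quoted above.

```python
def reproduce(con, s3_path, iam_role):
    """Attempt the copy described in this report.

    `con`, `s3_path`, and `iam_role` are placeholders for a live Redshift
    connection, a staging S3 prefix, and an IAM role ARN.
    """
    # Imported lazily so the sketch can be loaded without the libraries installed.
    import awswrangler as wr
    import pandas as pd

    # One ordinary column and one column whose name contains a space.
    df = pd.DataFrame({"id": [1, 2], "test col": ["a", "b"]})

    wr.redshift.copy(
        df=df,
        path=s3_path,
        con=con,
        table="test_table",
        schema="public",  # assumed schema name
        iam_role=iam_role,
        mode="overwrite",  # the report says any mode triggers the problem
    )
```

Calling `reproduce(...)` against a real cluster should surface the error quoted in the bug description.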

Expected behavior

Column names are preserved as-is, or a clear error explains the constraint.
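
For the second option, an up-front check might look something like this sketch. `check_column_names` is a hypothetical helper, not part of awswrangler, and the character set is an assumption about what the Spark flavor rewrites.

```python
import re

# Assumption: approximate set of characters that get rewritten under the
# Spark flavor; adjust if the real set differs.
_UNSAFE = re.compile(r"[ ,;{}()\n\t=]")


def check_column_names(columns):
    """Raise a descriptive error instead of silently renaming columns."""
    bad = [c for c in columns if _UNSAFE.search(str(c))]
    if bad:
        raise ValueError(
            "Column names would be renamed when written with "
            f"flavor='spark': {bad}"
        )


check_column_names(["id", "ok_col"])  # passes silently
```

Running such a check on `df.columns` before staging the data would turn the silent rename into an explicit failure the caller can act on.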

Your project

No response

Screenshots

No response

OS

Windows

Python version

3.9.10

AWS SDK for pandas version

3.14.0

Additional context

No response

Labels

bug