[Go SDK] fileio.MatchAll() cannot match all files on GCS #38059

@firehg

Description

The GCS filesystem's List() method in go/pkg/beam/io/filesystem/gcs/gcs.go filters listed objects with filepath.Match(object, obj.Name). filepath.Match never lets * match the path separator (which is / on Unix), so patterns like gs://bucket/* and gs://bucket/** fail to match any GCS object whose name contains /, which is the vast majority of real-world GCS objects.
The Java and Python SDKs handle this correctly: Java's GcsUtil supports ** for recursive matching, and Python's fnmatch doesn't treat / as special.

Reproduction: fileio.MatchFiles(scope, "gs://my-bucket/*") returns only objects with no / in their name. Objects like dir/subdir/file.txt are silently excluded. The same happens with "gs://my-bucket/**".
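The underlying behavior can be seen without GCS or Beam at all. The following is a minimal standalone sketch that calls filepath.Match directly on object-style names; the helper matchesObject and the sample names are hypothetical and only illustrate the matching semantics, not the actual code in gcs.go.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matchesObject is a hypothetical helper: it reports whether
// filepath.Match accepts name for the given pattern.
func matchesObject(pattern, name string) bool {
	ok, err := filepath.Match(pattern, name)
	return err == nil && ok
}

func main() {
	// Object names as they might appear in a bucket listing.
	names := []string{"file.txt", "dir/subdir/file.txt"}
	for _, name := range names {
		fmt.Printf("%-22q *  => %v, ** => %v\n",
			name, matchesObject("*", name), matchesObject("**", name))
	}
}
```

On Linux or macOS both patterns accept file.txt and reject dir/subdir/file.txt, because neither * nor ** can cross a /. On Windows filepath.Match uses \ as the separator, so the result differs there.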

Suggested fix: Replace filepath.Match with a matcher that doesn't treat / as a separator for cloud storage schemes, or add ** support similar to the Java SDK.
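One possible shape for the second option is sketched below: translate the glob into a regular expression in which ** may cross / while * and ? may not. This is an illustration under assumed semantics, not the Java SDK's implementation or a proposed patch; globToRegexp and matchObject are hypothetical names, and character classes such as [abc] are treated as literal text here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// globToRegexp is a hypothetical helper that translates a glob into an
// anchored regular expression. Assumed semantics:
//   "**" matches any sequence of characters, including "/"
//   "*"  matches any sequence of characters except "/"
//   "?"  matches exactly one character except "/"
// Every other character is matched literally.
func globToRegexp(glob string) (*regexp.Regexp, error) {
	runes := []rune(glob)
	var b strings.Builder
	b.WriteString("(?s)^") // (?s) lets "." match newlines as well
	for i := 0; i < len(runes); i++ {
		switch r := runes[i]; r {
		case '*':
			if i+1 < len(runes) && runes[i+1] == '*' {
				b.WriteString(".*")
				i++ // consume the second star
			} else {
				b.WriteString("[^/]*")
			}
		case '?':
			b.WriteString("[^/]")
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	b.WriteString("$")
	return regexp.Compile(b.String())
}

// matchObject reports whether an object name matches the glob.
func matchObject(glob, name string) (bool, error) {
	re, err := globToRegexp(glob)
	if err != nil {
		return false, err
	}
	return re.MatchString(name), nil
}

func main() {
	name := "dir/subdir/file.txt"
	for _, glob := range []string{"*", "**", "dir/*/file.txt", "dir/**"} {
		ok, err := matchObject(glob, name)
		fmt.Println(glob, ok, err)
	}
}
```

With these semantics a single * still stops at /, so gs://bucket/* would match only top-level names while gs://bucket/** would match everything. Following the first option instead (no separator handling at all, as with Python's fnmatch) would amount to emitting ".*" for a single * as well.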
