The GCS filesystem's List() method in go/pkg/beam/io/filesystem/gcs/gcs.go uses filepath.Match(object, obj.Name) to filter listed objects. filepath.Match treats * as matching only non-separator characters, and / is the separator on Unix; ** has no special meaning and behaves like a single *. As a result, patterns like gs://bucket/* and gs://bucket/** fail to match any GCS object whose name contains /, which is the vast majority of real-world GCS objects.
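The matching behavior can be checked without Beam or GCS. The following is a minimal sketch that calls filepath.Match directly on a few made-up object names; the matches helper is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matches reports whether name matches pattern under filepath.Match rules.
// A malformed pattern is treated as a non-match for this illustration.
func matches(pattern, name string) bool {
	ok, err := filepath.Match(pattern, name)
	return err == nil && ok
}

func main() {
	// Made-up object names, as they would appear after stripping gs://bucket/.
	names := []string{"file.txt", "dir/file.txt", "dir/subdir/file.txt"}
	for _, pattern := range []string{"*", "**"} {
		for _, name := range names {
			fmt.Printf("%-3s %-20s %v\n", pattern, name, matches(pattern, name))
		}
	}
}
```

On a Unix-like system, where the path separator is /, only file.txt should be reported as a match for either pattern. On Windows the separator is \, so results there may differ.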
The Java and Python SDKs handle this correctly: Java's GcsUtil supports ** for recursive matching, and Python's fnmatch doesn't treat / as special.
Reproduction: fileio.MatchFiles(scope, "gs://my-bucket/*") returns only objects with no / in their name. Objects like dir/subdir/file.txt are silently excluded. The same happens with "gs://my-bucket/**".
Suggested fix: Replace filepath.Match with a matcher that doesn't treat / as a separator for cloud storage schemes, or add ** support similar to the Java SDK.
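One possible shape for the second option is sketched below. It is not the Beam implementation and not a port of GcsUtil. It assumes semantics in which ** matches any characters including /, * matches a run of non-/ characters, and ? matches one non-/ character. The names globToRegexp and matchObject are hypothetical, and character classes such as [a-z] are left out.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// globToRegexp (hypothetical helper) translates a glob into an anchored regular
// expression: "**" matches anything, "*" and "?" never match "/".
func globToRegexp(glob string) (*regexp.Regexp, error) {
	runes := []rune(glob)
	var b strings.Builder
	b.WriteString("(?s)^") // (?s) lets "." match newlines in object names
	for i := 0; i < len(runes); i++ {
		switch r := runes[i]; r {
		case '*':
			if i+1 < len(runes) && runes[i+1] == '*' {
				b.WriteString(".*")
				i++ // consume the second '*'
			} else {
				b.WriteString("[^/]*")
			}
		case '?':
			b.WriteString("[^/]")
		default:
			// Escape everything else so it matches literally.
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	b.WriteString("$")
	return regexp.Compile(b.String())
}

// matchObject (hypothetical helper) reports whether an object name matches glob.
func matchObject(glob, name string) (bool, error) {
	re, err := globToRegexp(glob)
	if err != nil {
		return false, err
	}
	return re.MatchString(name), nil
}

func main() {
	for _, glob := range []string{"*", "**", "dir/*", "dir/**"} {
		ok, err := matchObject(glob, "dir/subdir/file.txt")
		fmt.Println(glob, ok, err)
	}
}
```

Under these assumed semantics, gs://my-bucket/** would match nested objects while gs://my-bucket/* would still match only top-level names. If * should also cross /, as with Python's fnmatch, the "[^/]*" branch could emit ".*" instead.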