
BUG: tarball extraction fails with "file exists" error when archive contains duplicate directory entries #5023

@juanis2112


Describe the bug
When running Scorecard against certain repositories, the tool aborts with an internal error because tarball extraction fails with a "mkdir: file exists" error. As a result, the repository is skipped entirely.

Error message:

Skipping https://github.com/alterm4nn/ChronoZoom: run: RepoClient.LocalPath: error during tarballHandler.setup: internal error: error during os.Mkdir: mkdir /var/folders/h5/8mnm7jy57ync65g1j_5knyk40000gn/T/repo3602233882/Source/Chronozoom.UI/dumps: file exists

Reproduction steps
Steps to reproduce the behavior:

  1. Run Scorecard against this repo: https://github.com/alterm4nn/ChronoZoom

Expected behavior
If a tarball contains multiple entries that share a parent directory, extraction should succeed without returning an error.

Additional context
In clients/githubrepo/tarball.go, the extractTarball() function uses os.Mkdir to create directories while iterating over tarball entries. Some tarballs contain multiple file entries that share a parent directory. When the second entry is processed, the parent directory already exists and os.Mkdir returns a "file exists" error, causing the whole process to abort.

Suggested fix
Replacing os.Mkdir with os.MkdirAll in both locations inside extractTarball() (for both GitHub and GitLab) fixes this, since os.MkdirAll returns nil when the target directory already exists.

I already tested the fix locally. Let me know if this fix sounds okay; if so, I'm happy to submit a PR. :)

P.S.: I am running Scorecard across a large number of repositories as part of a security analysis for the University of California system and have encountered some other bugs along the way, so I'm planning to open more issues and I'm also happy to contribute PRs to fix them. Just wanted to give you a heads up, let me know if that's ok! 😊

Labels: kind/bug (Something isn't working)