Detect archive type by file signature when extension is wrong or missing #120
Merged
davidnewhall merged 2 commits on Feb 19, 2026
Physical test results
Ran a standalone program that creates real archive files with wrong extensions and extracts them using the branch. All pass:
Test 1 is the exact scenario from the issue: a RAR-signatured file with
The only lint issue in CI is the pre-existing
When a file has an incorrect or missing extension, xtractr now reads the first bytes of the file to identify the archive type by its signature. Extension-based matching is still tried first; signature detection kicks in only as a fallback.
Introduces a custom error type that stashes both the extension-based and signature-based extraction errors so consumers can inspect all failure details via errors.As. Also supports a Warnings slice for non-fatal info like truncated names or extension mismatches.
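A minimal sketch of how such an error type might look to a consumer. The type name ExtractError, its field names, and the helpers sampleFailure and failureDetails are all hypothetical and chosen for illustration; the actual names in the commit may differ. Only errors.As and fmt.Errorf with %w are standard library behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// ExtractError is a hypothetical name for the custom error type described
// above. It keeps the failure from each attempt plus non-fatal warnings.
type ExtractError struct {
	ExtErr   error    // failure from the extension-based attempt
	SigErr   error    // failure from the signature-based attempt
	Warnings []string // non-fatal notes, e.g. an extension mismatch
}

// Error implements the error interface and reports both failures.
func (e *ExtractError) Error() string {
	return fmt.Sprintf("by extension: %v; by signature: %v", e.ExtErr, e.SigErr)
}

// sampleFailure builds a wrapped error like one a caller might receive.
// The messages here are made up for the example.
func sampleFailure() error {
	inner := &ExtractError{
		ExtErr:   errors.New("gzip: invalid header"),
		SigErr:   errors.New("signature-path failure (made up for this example)"),
		Warnings: []string{"extension .gz does not match detected type"},
	}
	return fmt.Errorf("extracting movie.gz: %w", inner)
}

// failureDetails uses errors.As to find an *ExtractError anywhere in the
// wrap chain, so callers can inspect both failures and the warnings.
func failureDetails(err error) (*ExtractError, bool) {
	var xerr *ExtractError
	if errors.As(err, &xerr) {
		return xerr, true
	}
	return nil, false
}
```

A caller would pass whatever error the extraction returned to failureDetails and, when it reports true, read ExtErr, SigErr, and Warnings directly. Keeping both errors on one value means neither failure is lost when the fallback also fails.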
What's going on
Right now xtractr relies entirely on file extensions to figure out what kind of archive it's dealing with. That works great most of the time, but it falls apart when files have incorrect extensions, which happens more often than you'd think. A common example from the issue: a RAR file with a .gz extension just blows up with gzip: invalid header.
What this does
This adds file signature (magic number) detection as a fallback to the existing extension-based matching. The flow is: extension-based matching is tried first, and signature detection runs only if that attempt fails.
So existing behavior is completely preserved — if your extensions are correct, nothing changes. The signature detection only kicks in when something goes wrong.
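The fallback order can be sketched roughly as below. This is not the code from files.go: extractFunc, extractWithFallback, and errBadHeader are hypothetical names, and the two extractors are passed in as plain functions so the control flow stands on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// extractFunc is a hypothetical stand-in for a format-specific extractor.
type extractFunc func(path string) error

// errBadHeader mimics the failure quoted in the issue; illustration only.
var errBadHeader = errors.New("gzip: invalid header")

// extractWithFallback tries the extractor chosen by extension first and
// consults the one chosen by signature only if that attempt fails.
func extractWithFallback(path string, byExt, bySig extractFunc) error {
	var extErr error
	if byExt != nil {
		if extErr = byExt(path); extErr == nil {
			return nil // extension was right, so nothing changes
		}
	}
	if bySig == nil {
		// No signature matched: report whatever we already know.
		if extErr != nil {
			return extErr
		}
		return fmt.Errorf("%s: unknown archive type", path)
	}
	if sigErr := bySig(path); sigErr != nil {
		return fmt.Errorf("extension attempt: %v; signature attempt: %w", extErr, sigErr)
	}
	return nil
}
```

With correct extensions the second extractor is never called, which is what keeps existing behavior intact.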
Supported signatures
ZIP, RAR (v4 + v5), 7z, gzip, bzip2, XZ, zstandard, LZ4, LZMA, brotli, AR/DEB, RPM, and ISO9660 (at all three standard offsets). TAR doesn't have a reliable magic number so it stays extension-only, which is fine.
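For illustration, a signature table and lookup could look something like the sketch below. It covers only a subset of the formats listed, and the type, variable, and function names are assumptions rather than the contents of magic.go. The byte patterns themselves are the well-known published signatures, and the ISO9660 entries use the three standard offsets 0x8001, 0x8801, and 0x9001.

```go
package main

import "bytes"

// signature pairs a byte pattern with the offset where it must appear.
type signature struct {
	name   string
	offset int
	magic  []byte
}

// signatures is a partial, illustrative table; names are just labels.
var signatures = []signature{
	{"zip", 0, []byte("PK\x03\x04")},
	{"rar", 0, []byte("Rar!\x1a\x07")}, // prefix shared by RAR v4 and v5
	{"7z", 0, []byte("7z\xbc\xaf\x27\x1c")},
	{"gzip", 0, []byte{0x1f, 0x8b}},
	{"bzip2", 0, []byte("BZh")},
	{"xz", 0, []byte("\xfd7zXZ\x00")},
	{"zstd", 0, []byte{0x28, 0xb5, 0x2f, 0xfd}},
	{"iso", 0x8001, []byte("CD001")},
	{"iso", 0x8801, []byte("CD001")},
	{"iso", 0x9001, []byte("CD001")},
}

// detectSignature returns the label of the first matching signature,
// or an empty string if the header matches none of them.
func detectSignature(header []byte) string {
	for _, s := range signatures {
		end := s.offset + len(s.magic)
		if len(header) >= end && bytes.Equal(header[s.offset:end], s.magic) {
			return s.name
		}
	}
	return ""
}
```

Because ISO9660 is checked far from the start of the file, a caller would need to read a little past 0x9001 bytes of the file to cover every entry; shorter reads simply skip the entries they cannot reach.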
Example
A RAR file named movie.gz that previously failed with gzip: invalid header now gets detected as RAR by its Rar! signature and extracts correctly.
How I tested it
Added magic_test.go with tests covering:
.rar, real zip named .gz, gzip with completely unknown .foobar extension: all extract successfully
All existing tests continue to pass.
Files changed
magic.go: signature table and detection logic
files.go: modified ExtractFile() to fall back to signature detection
magic_test.go: tests