fix(modeling): reject sharded-checkpoint weight_map entries that escape the checkpoint folder (#4067) #4070
Open
Anai-Guo wants to merge 2 commits into
added 2 commits
June 9, 2026 12:14
…pe the checkpoint folder (huggingface#4067)
What this fixes
Closes #4067 (path-traversal half of the report).
`load_checkpoint_in_model` builds the list of shard files to load directly from the untrusted `weight_map` of a sharded-checkpoint index (`*.index.json`). The values come straight from the model author.
Because `os.path.join` keeps `..` segments intact (the OS resolves them when the file is opened) and discards the prefix entirely when given an absolute path, a malicious or corrupted index can point a "shard" anywhere on the filesystem:
    {"weight_map": {"w": "../../../../etc/passwd"}}  // escapes via traversal
    {"weight_map": {"w": "/etc/shadow"}}             // absolute path wins in os.path.join
The file is then opened and parsed as a checkpoint shard, which is a path-traversal / arbitrary-file-read primitive (and an easy DoS vector when pointed at a large or special file).
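For reviewers who want to see the join behavior in isolation, here is a minimal sketch. It is not the code from the library: `join_shard` and the `/models/ckpt` folder are hypothetical names chosen for illustration, and `posixpath` is used instead of `os.path` only so the result is the same on every platform.

```python
import posixpath


def join_shard(checkpoint_folder: str, shard_name: str) -> str:
    """Hypothetical helper: a naive join with no containment check."""
    return posixpath.join(checkpoint_folder, shard_name)


folder = "/models/ckpt"  # assumed checkpoint folder, for illustration only

# Traversal: join keeps the ".." segments verbatim; they only collapse
# once the path is normalized or handed to the OS.
traversal = join_shard(folder, "../../../../etc/passwd")
print(traversal)
print(posixpath.normpath(traversal))

# Absolute path: join throws away the folder prefix entirely.
absolute = join_shard(folder, "/etc/shadow")
print(absolute)
```

Neither result stays under the checkpoint folder, which is why the joined path has to be validated before it is opened.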
The fix
After resolving each entry against `checkpoint_folder`, verify it still lives inside that folder and raise `ValueError` otherwise. Legitimate sharded checkpoints (HF `save_pretrained` output) always reference plain sibling filenames, so they are unaffected.
The containment check (`startswith(folder + os.sep)`) also rejects absolute paths and, on Windows, paths on a different drive.
Scope note
This targets the path-traversal vector specifically. The "malicious-model DoS" angle in the title (unbounded shard size / count) is broader and policy-dependent — happy to follow up separately if maintainers want bounds there too.
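For reference, the containment check described under "The fix" can be sketched roughly as below. This is an illustration under stated assumptions, not the diff in this PR: `resolve_shard_path` is a hypothetical helper name, and canonicalizing with `os.path.realpath` (which also follows symlinks) is an assumption; the actual patch may normalize paths differently.

```python
import os


def resolve_shard_path(checkpoint_folder: str, shard_name: str) -> str:
    """Hypothetical helper: resolve a weight_map entry and check containment."""
    # Assumption: canonicalize both sides so ".." segments and symlinks
    # are collapsed before the prefix comparison.
    folder = os.path.realpath(checkpoint_folder)
    candidate = os.path.realpath(os.path.join(folder, shard_name))

    # Appending os.sep prevents "/ckpt-evil/x" from passing as a child of "/ckpt".
    if not candidate.startswith(folder + os.sep):
        raise ValueError(
            f"Shard path {shard_name!r} resolves outside the checkpoint folder"
        )
    return candidate
```

With this sketch, a plain sibling filename resolves normally, while `"../../../../etc/passwd"` and `"/etc/shadow"` both raise `ValueError`. One known limitation of a prefix comparison like this is a checkpoint folder that is itself the filesystem root, where `folder + os.sep` no longer matches any path.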
🤖 Generated with Claude Code