
[CLI] [API] Add HfApi.copy_files method to copy files remotely and update 'hf buckets cp' #3874

Open
Wauplin wants to merge 23 commits into main from feat/hfapi-copy-files

Conversation

Contributor

@Wauplin Wauplin commented Mar 2, 2026

Note: requires https://github.com/huggingface-internal/moon-landing/pull/17593 to be merged first


This PR adds a new HfApi.copy_files API and extends hf buckets cp to support remote HF-handle copy workflows.

  • Copy from bucket to bucket (same bucket or different bucket)
  • Copy from repo (model/dataset/space) to bucket
  • Reject bucket->repo and repo->repo destinations (not supported yet)

If the source is a file, it is copied directly; if it is a directory, all files under it are copied recursively.

  • Repo source file with xet_hash: copied directly by hash
  • Repo source file without xet_hash (regular small file): download then re-upload
  • Bucket to bucket: always copied by hash

See the https://github.com/huggingface-internal/moon-landing/pull/17593#issue-4201288199 PR description for a working test.
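The routing rules above can be sketched as a small dispatch function. This is an illustrative sketch only, not the actual implementation: `is_bucket`, `CopyPlan`, and `plan_copy` are hypothetical names, and the real `copy_files` decides hash-copy vs. download+reupload per file based on `xet_hash`.

```python
# Hypothetical sketch of the routing rules described in the PR description.
# Names (`is_bucket`, `CopyPlan`, `plan_copy`) are illustrative, not real API.
from dataclasses import dataclass


def is_bucket(handle: str) -> bool:
    # Assumption: bucket handles look like hf://buckets/<namespace>/<bucket>/...
    return handle.startswith("hf://buckets/")


@dataclass
class CopyPlan:
    source: str
    destination: str
    strategy: str


def plan_copy(source: str, destination: str) -> CopyPlan:
    if not is_bucket(destination):
        # bucket->repo and repo->repo are rejected: destination must be a bucket
        raise ValueError("Copying to a repo is not supported yet. Destination must be a bucket.")
    if is_bucket(source):
        # bucket->bucket: always copied by hash
        return CopyPlan(source, destination, "hash-copy")
    # repo->bucket: hash-copy when the file is Xet-backed, otherwise
    # download + re-upload (decided per file in the real implementation)
    return CopyPlan(source, destination, "hash-copy-or-download-reupload")
```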


Note

Medium Risk
Introduces new bucket mutation paths (copyFile operations) and expands CLI behavior to allow remote-to-remote copies, which could affect data placement/overwrites if path resolution or handle parsing is wrong. Changes are contained to bucket tooling but touch upload/batch logic and revision parsing edge-cases.

Overview
Adds a new public copy_files API to copy files/folders between Hub sources and bucket destinations using hf://... handles, performing server-side hash copies when Xet-backed and falling back to download+reupload for non-Xet repo files.

Extends batch_bucket_files and the internal /batch payload to support copy operations (copyFile) alongside add/delete, adjusts batching order (copy → add → delete), and tweaks add-file mtime handling.

Updates hf buckets cp to support remote hf:// → remote hf:// copy (including repo→bucket and bucket→bucket), exports copy_files from huggingface_hub, centralizes SPECIAL_REFS_REVISION_REGEX, and updates docs/tests to cover the new workflows and constraints (destination must be a bucket).

Reviewed by Cursor Bugbot for commit a996a0f. Bugbot is set up for automated code reviews on this repo.


bot-ci-comment bot commented Mar 2, 2026

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor Author

Wauplin commented Mar 3, 2026

Current status: cross-bucket copies do not work. An extra call to CAS is needed to register the xet hash in the destination bucket.

@Wauplin Wauplin changed the title [API] Add HfApi.copy_files method to copy files remotely [CLI] [API] Add HfApi.copy_files method to copy files remotely and update 'hf buckets cp' Apr 7, 2026
@Wauplin Wauplin requested a review from hanouticelina April 7, 2026 14:44
@Wauplin Wauplin marked this pull request as ready for review April 7, 2026 14:44
Comment on lines +12604 to +12605
else:
all_adds.append((_download_from_repo(file.path), target_path))
Contributor Author

sub-optimal: this could be parallelized, but that's not something we want to optimize for now



def _parse_hf_copy_handle(hf_handle: str) -> _BucketCopyHandle | _RepoCopyHandle:
# TODO: Harmonize hf:// parsing. See https://github.com/huggingface/huggingface_hub/issues/3971
Contributor Author

yes, #3971 is getting high in my priorities 🙈

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.


Reviewed by Cursor Bugbot for commit b46f914.

try:
self.mtime = int(os.path.getmtime(self.source) * 1000)
except FileNotFoundError:
pass

Path objects lose file mtime due to isinstance check

Medium Severity

The mtime detection in _BucketAddFile.__post_init__ now checks isinstance(self.source, str) but the old code checked not isinstance(self.source, bytes), which covered both str and Path. Since the public batch_bucket_files API accepts Path objects as source, passing a Path will now always use time.time() instead of the actual file modification time from os.path.getmtime.


Reviewed by Cursor Bugbot for commit b46f914.
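The reported regression can be reproduced with a stripped-down stand-in for `_BucketAddFile` (the class below is illustrative, not the real one). Checking only `isinstance(source, str)` makes `Path` sources fall back to the current time; the sketch keeps the pre-PR behaviour of covering both `str` and `Path`:

```python
# Stand-in for _BucketAddFile illustrating the fix: isinstance(source, (str, Path))
# covers both cases, matching the pre-PR `not isinstance(source, bytes)` check.
import os
import time
from dataclasses import dataclass, field
from pathlib import Path
from typing import Union


@dataclass
class AddFile:
    source: Union[str, bytes, Path]
    mtime: int = field(init=False)

    def __post_init__(self) -> None:
        # Default: "now" in milliseconds (used for in-memory bytes sources)
        self.mtime = int(time.time() * 1000)
        if isinstance(self.source, (str, Path)):
            try:
                self.mtime = int(os.path.getmtime(self.source) * 1000)
            except FileNotFoundError:
                pass  # nonexistent path: keep the current time
```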

Contributor

@hanouticelina hanouticelina left a comment

Made a first pass!

if isinstance(self.source, str):
try:
self.mtime = int(os.path.getmtime(self.source) * 1000)
except FileNotFoundError:
Contributor

not sure I understand why we are catching FileNotFoundError here

Comment on lines +444 to +457
if revision is None:
revision = constants.DEFAULT_REVISION
elif remaining_parts:
maybe_special_ref = f"{unquote(revision)}/{remaining_parts[0]}"
match = SPECIAL_REFS_REVISION_REGEX.match(maybe_special_ref)
if match is not None:
special_ref = match.group()
revision = special_ref
suffix = maybe_special_ref.removeprefix(special_ref).lstrip("/")
remaining_parts = ([suffix] if suffix else []) + remaining_parts[1:]
else:
revision = unquote(revision)
else:
revision = unquote(revision)
Contributor

Suggested change
if revision is None:
revision = constants.DEFAULT_REVISION
elif remaining_parts:
maybe_special_ref = f"{unquote(revision)}/{remaining_parts[0]}"
match = SPECIAL_REFS_REVISION_REGEX.match(maybe_special_ref)
if match is not None:
special_ref = match.group()
revision = special_ref
suffix = maybe_special_ref.removeprefix(special_ref).lstrip("/")
remaining_parts = ([suffix] if suffix else []) + remaining_parts[1:]
else:
revision = unquote(revision)
else:
revision = unquote(revision)
if revision is None:
revision = constants.DEFAULT_REVISION
else:
revision = unquote(revision)
if remaining_parts:
maybe_special_ref = f"{revision}/{remaining_parts[0]}"
match = SPECIAL_REFS_REVISION_REGEX.match(maybe_special_ref)
if match is not None:
revision = match.group()
suffix = maybe_special_ref.removeprefix(revision).lstrip("/")
remaining_parts = ([suffix] if suffix else []) + remaining_parts[1:]
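The suggested control flow can be tried out in a self-contained sketch. The regex below is a simplified stand-in for `SPECIAL_REFS_REVISION_REGEX` (here it only matches `refs/pr/<n>` style refs); the real pattern lives in huggingface_hub.

```python
# Self-contained sketch of the suggested flow: unquote first, then try to
# re-join a special ref that was split across path segments.
import re
from urllib.parse import unquote

DEFAULT_REVISION = "main"
SPECIAL_REFS_RE = re.compile(r"^refs/pr/\d+")  # simplified stand-in


def resolve_revision(revision, remaining_parts):
    if revision is None:
        return DEFAULT_REVISION, remaining_parts
    revision = unquote(revision)
    if remaining_parts:
        # A special ref like "refs/pr/4" may have been split across two
        # path segments; try to re-join and match it.
        maybe_special_ref = f"{revision}/{remaining_parts[0]}"
        match = SPECIAL_REFS_RE.match(maybe_special_ref)
        if match is not None:
            revision = match.group()
            suffix = maybe_special_ref.removeprefix(revision).lstrip("/")
            remaining_parts = ([suffix] if suffix else []) + remaining_parts[1:]
    return revision, remaining_parts
```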

is not None
)

all_adds: list[_BucketAddFile | tuple[str, str]] = []
Contributor

looks like we only append tuples, no? (line 12622: all_adds.append((_download_from_repo(file.path), target_path)))

Suggested change
all_adds: list[_BucketAddFile | tuple[str, str]] = []
all_adds: list[tuple[str, str]] = []

raise typer.BadParameter("Remote-to-remote copy not supported.")
# Remote to remote copy
if src_is_hf and dst_is_hf:
assert dst is not None
Contributor

not needed, we already do dst_is_hf = dst is not None and _is_hf_handle(dst)

Suggested change
assert dst is not None

Contributor

or maybe it's the linter that's not happy? If that's the case, let's ignore the linting error instead of adding an assert

return rel_path
return f"{destination_path.rstrip('/')}/{rel_path}"

def _copy_by_hash(
Contributor

(nit) the name _copy_by_hash suggests it performs the copy, but it only builds a _BucketCopyFile; maybe we can rename it _build_copy_op or something like that

return bucket.bucket_id


@pytest.fixture(scope="function")
Contributor

function scope is the default

Suggested change
@pytest.fixture(scope="function")

Comment on lines +12547 to +12557
destination_is_directory = (
next(
iter(
self.list_bucket_tree(
destination_bucket_id, prefix=destination_path, recursive=False, token=token
)
),
None,
)
is not None
)
Contributor

(nit)

Suggested change
destination_is_directory = (
next(
iter(
self.list_bucket_tree(
destination_bucket_id, prefix=destination_path, recursive=False, token=token
)
),
None,
)
is not None
)
destination_is_directory = any(
self.list_bucket_tree(destination_bucket_id, prefix=destination_path, recursive=False, token=token)
)

Contributor

any() returns True on the first truthy element, I think
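That's right: `any()` short-circuits on the first truthy element, so it consumes at most one entry from the generator, same as `next(iter(...), None) is not None`. A quick check with a counting generator:

```python
# any() stops at the first truthy element, so checking whether a bucket
# prefix has any entry pulls at most one item from the (lazy) listing.
def counting_gen(items, counter):
    for item in items:
        counter["n"] += 1
        yield item


counter = {"n": 0}
found = any(counting_gen(["entry-1", "entry-2", "entry-3"], counter))
# found is True and only one element was consumed
```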

if len(add) + len(delete) <= _BUCKET_BATCH_ADD_CHUNK_SIZE:
self._batch_bucket_files(bucket_id, add=add or None, delete=delete or None, token=token)
if len(add) + len(copy) + len(delete) <= _BUCKET_BATCH_ADD_CHUNK_SIZE:
self._batch_bucket_files(bucket_id, add=add or None, copy=copy or None, delete=delete or None, token=token) # type: ignore
Contributor

_batch_bucket_files already handles empty lists (not introduced by this PR)

Suggested change
self._batch_bucket_files(bucket_id, add=add or None, copy=copy or None, delete=delete or None, token=token) # type: ignore
self._batch_bucket_files(bucket_id, add=add, copy=copy, delete=delete, token=token) # type: ignore
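The chunked batching described in the overview (copy → add → delete, at most `_BUCKET_BATCH_ADD_CHUNK_SIZE` operations per `/batch` call) can be sketched as below. Names and the chunk size are illustrative stand-ins, not the real implementation.

```python
# Hypothetical sketch of chunked batching: operations are flushed in
# copy -> add -> delete order, at most CHUNK_SIZE ops per /batch payload.
CHUNK_SIZE = 4  # stand-in for _BUCKET_BATCH_ADD_CHUNK_SIZE


def batch_ops(copy, add, delete, chunk_size=CHUNK_SIZE):
    """Yield {"copy": [...], "add": [...], "delete": [...]} payloads,
    each holding at most chunk_size operations total."""
    ordered = (
        [("copy", op) for op in copy]
        + [("add", op) for op in add]
        + [("delete", op) for op in delete]
    )
    for i in range(0, len(ordered), chunk_size):
        payload = {"copy": [], "add": [], "delete": []}
        for kind, op in ordered[i : i + chunk_size]:
            payload[kind].append(op)
        yield payload
```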
