feat(cli): add datachain bucket status command #1717
Open: amritghimire wants to merge 14 commits into main from feat/bucket-status-cmd
856bea6 feat(cli): add `datachain bucket status` command (amritghimire)
717d344 Merge branch 'main' into feat/bucket-status-cmd (amritghimire)
f618be1 fix(bucket-status): address PR review feedback on anon probe kwargs a… (amritghimire)
44a0d21 Merge branch 'main' into feat/bucket-status-cmd (amritghimire)
70cc1ea fix(bucket-status): error on URI with path component (amritghimire)
bd97a03 Merge branch 'main' into feat/bucket-status-cmd (amritghimire)
41e5cb6 Merge branch 'main' into feat/bucket-status-cmd (amritghimire)
ec0a246 fix(bucket-status): address PR review feedback and fix Azure anon probe (amritghimire)
26bd989 Fix for gcs (amritghimire)
3a6fa86 Merge branch 'main' into feat/bucket-status-cmd (amritghimire)
dce825c Fix test (amritghimire)
f8c3828 Refactor bucket status handling for Azure and GCS clients (amritghimire)
862bd64 Merge branch 'main' into feat/bucket-status-cmd (amritghimire)
3bc2e4d Add missing test coverage (amritghimire)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| import sys | ||
|
|
||
| from datachain.client import bucket_status | ||
|
|
||
|
|
||
| def bucket_status_cmd(uri: str, client_config: dict | None = None) -> int: | ||
| """Check existence and access of a bucket/container. | ||
|
|
||
| Returns 0 if bucket exists, 1 if not found. | ||
| Raises on network errors. | ||
| """ | ||
| status = bucket_status(uri, **(client_config or {})) | ||
|
|
||
| if status.exists: | ||
| print("Status: exists") | ||
| print(f"Access: {status.access}") | ||
| else: | ||
| print("Status: not found") | ||
| if status.error: | ||
| print(f"Error: {status.error}", file=sys.stderr) | ||
| return 0 if status.exists else 1 |
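As a reading aid for the command above, here is a minimal sketch of its reporting and exit-code logic, runnable without datachain installed. The `BucketStatus` dataclass is a stand-in whose fields are inferred from how the diff uses it, and `report` and `run_captured` are hypothetical helper names, not part of the PR. It illustrates one consequence of the logic as written: the exit code depends only on `exists`, so a bucket that exists but denies access still yields 0.

```python
import io
import sys
from contextlib import redirect_stderr, redirect_stdout
from dataclasses import dataclass
from typing import Optional


@dataclass
class BucketStatus:
    # Stand-in for the PR's BucketStatus; field names inferred from the diff.
    exists: bool
    access: str
    error: Optional[str] = None


def report(status: BucketStatus) -> int:
    """Print the status the way bucket_status_cmd does and return the exit code."""
    if status.exists:
        print("Status: exists")
        print(f"Access: {status.access}")
    else:
        print("Status: not found")
    if status.error:
        # Errors go to stderr so stdout stays easy to parse.
        print(f"Error: {status.error}", file=sys.stderr)
    return 0 if status.exists else 1


def run_captured(status: BucketStatus):
    """Hypothetical helper: run report() and capture (code, stdout, stderr)."""
    out, err = io.StringIO(), io.StringIO()
    with redirect_stdout(out), redirect_stderr(err):
        code = report(status)
    return code, out.getvalue(), err.getvalue()
```

Calling `run_captured(BucketStatus(True, "denied", "Access denied"))` returns exit code 0 with the error text on stderr, which may be worth keeping in mind when scripting against the command.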
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,24 @@ | ||
| from .fsspec import Client | ||
| from .fsspec import BucketStatus, Client | ||
|
|
||
| __all__ = ["Client"] | ||
|
|
||
| def bucket_status(uri: str, **client_config) -> BucketStatus: | ||
| """Check bucket existence and access level without listing objects. | ||
|
|
||
| Args: | ||
| uri: Bucket URI, e.g. "s3://my-bucket/", "gs://my-bucket/", "az://my-container/" | ||
| **client_config: Storage client configuration (aws_key, etc.) | ||
| For Azure, pass ``account_name`` to enable anonymous access detection; | ||
| without it, only authenticated access is probed. | ||
|
|
||
| Returns: | ||
| BucketStatus(exists, access) where access is one of: | ||
| 'anonymous', 'authenticated', 'denied' | ||
| """ | ||
| client_cls = Client.get_implementation(uri) | ||
| name, path = client_cls.split_url(uri) | ||
| if path: | ||
| raise ValueError(f"path in a bucket is not allowed, only bucket name: {uri!r}") | ||
| return client_cls.bucket_status(name, **client_config) | ||
|
|
||
|
|
||
| __all__ = ["BucketStatus", "Client", "bucket_status"] |
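The path check in `bucket_status` above can be illustrated in isolation. This is a sketch under assumptions: the `split_url` below is a simplified stand-in written for this example, not datachain's actual `Client.split_url`, and `require_bucket_only` is a hypothetical name. Only the error message is taken from the diff.

```python
from typing import Tuple


def split_url(uri: str) -> Tuple[str, str]:
    """Simplified stand-in: split 'scheme://name/path' into (name, path)."""
    _, _, rest = uri.partition("://")
    name, _, path = rest.partition("/")
    return name, path


def require_bucket_only(uri: str) -> str:
    """Return the bucket name, rejecting any URI that carries a path component."""
    name, path = split_url(uri)
    if path:
        # Same message as in the diff: only a bare bucket URI is accepted.
        raise ValueError(f"path in a bucket is not allowed, only bucket name: {uri!r}")
    return name
```

With this stand-in, `s3://my-bucket` and `s3://my-bucket/` both resolve to `my-bucket`, while `s3://my-bucket/data/file.csv` raises `ValueError`.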
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,18 +1,70 @@ | ||
| from typing import Any | ||
|
|
||
| from adlfs import AzureBlobFileSystem | ||
| from azure.core.exceptions import ( | ||
| ClientAuthenticationError, | ||
| HttpResponseError, | ||
| ResourceNotFoundError, | ||
| ) | ||
| from azure.storage.blob import BlobServiceClient | ||
| from fsspec.asyn import get_loop, sync | ||
|
|
||
| from datachain.lib.file import File | ||
| from datachain.progress import tqdm | ||
|
|
||
| from .fsspec import DELIMITER, Client, ResultQueue | ||
| from .fsspec import DELIMITER, BucketStatus, Client, ResultQueue | ||
|
|
||
|
|
||
| class AzureClient(Client): | ||
| FS_CLASS = AzureBlobFileSystem | ||
| PREFIX = "az://" | ||
| protocol = "az" | ||
|
|
||
| @classmethod | ||
| def bucket_status(cls, name: str, **kwargs) -> BucketStatus: | ||
| # Step 1: Anonymous probe — uses BlobServiceClient directly (not adlfs) | ||
| # to avoid picking up credentials from environment variables like | ||
| # AZURE_STORAGE_CONNECTION_STRING. | ||
| account_name = kwargs.get("account_name") | ||
| if account_name: | ||
| try: | ||
| url = f"https://{account_name}.blob.core.windows.net" | ||
| anon_client = BlobServiceClient(account_url=url) | ||
|
Review thread (marked as resolved by amritghimire)
Contributor: Just to double-check how we build the URL: is it only the bucket name?
Contributor, Author: Yes. If it is not, we raise an error, as Dmitry requested in one of the comments. |
||
| anon_client.get_container_client(name).get_container_properties() | ||
| return BucketStatus(exists=True, access="anonymous") | ||
| except ClientAuthenticationError: | ||
| pass | ||
| except ResourceNotFoundError: | ||
| return BucketStatus( | ||
| exists=False, | ||
| access="denied", | ||
| error=f"Azure container '{name}' not found", | ||
| ) | ||
| except HttpResponseError as e: | ||
| if e.status_code not in (401, 403): | ||
| raise | ||
|
|
||
| # Step 2: Authenticated probe. | ||
| try: | ||
| auth_fs = cls.create_fs(**kwargs) | ||
| sync(get_loop(), auth_fs._info, name) | ||
| return BucketStatus(exists=True, access="authenticated") | ||
| except (PermissionError, ClientAuthenticationError): | ||
| return BucketStatus( | ||
| exists=True, | ||
| access="denied", | ||
| error=f"Access denied to Azure container '{name}'" | ||
| " — check credentials/configuration", | ||
| ) | ||
| except FileNotFoundError: | ||
| return BucketStatus( | ||
| exists=False, | ||
| access="denied", | ||
| error=f"Azure container '{name}' not found", | ||
| ) | ||
| except ValueError as e: | ||
| return BucketStatus(exists=False, access="denied", error=str(e)) | ||
|
|
||
| def info_to_file(self, v: dict[str, Any], path: str) -> File: | ||
| version_id = v.get("version_id") if self._is_version_aware() else None | ||
| return File( | ||
|
|
||
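The control flow of `AzureClient.bucket_status` above (an anonymous probe first, then an authenticated probe) can be sketched independently of any cloud SDK. This is a simplified illustration, not the PR's implementation: the probes are injected as plain callables, built-in `PermissionError` and `FileNotFoundError` stand in for the Azure exception types, and `probe_bucket` and the `BucketStatus` dataclass are names assumed for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BucketStatus:
    # Stand-in for the PR's BucketStatus; field names inferred from the diff.
    exists: bool
    access: str
    error: Optional[str] = None


def probe_bucket(
    name: str,
    anon_probe: Optional[Callable[[str], None]],
    auth_probe: Callable[[str], None],
) -> BucketStatus:
    """Try anonymous access first (if a probe is available), then authenticated."""
    if anon_probe is not None:
        try:
            anon_probe(name)
            return BucketStatus(exists=True, access="anonymous")
        except PermissionError:
            pass  # Not public; fall through to the authenticated probe.
        except FileNotFoundError:
            return BucketStatus(False, "denied", f"container '{name}' not found")

    try:
        auth_probe(name)
        return BucketStatus(exists=True, access="authenticated")
    except PermissionError:
        # The container answered, so it exists, but the credentials were rejected.
        return BucketStatus(True, "denied", f"Access denied to container '{name}'")
    except FileNotFoundError:
        return BucketStatus(False, "denied", f"container '{name}' not found")
```

Passing `anon_probe=None` mirrors the case in the diff where no `account_name` is supplied and only the authenticated probe runs.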
Is it required for the non-anonymous access check as well? In some cases, I think the key doesn't include the account name.
I would leave it as it is. If the key doesn't include the account name, it raises an error saying so.