Misc upstream fixes: items, recovery, helpers, export, SQL, admin #1160
KornAlexander wants to merge 1 commit into microsoft:main
Conversation
…port, SQL utilities, admin tenant
Pull request overview
This PR bundles several small upstream fixes across sempy_labs, touching admin tenant utilities, SQL connectivity helpers, delta-table helper utilities, and top-level package exports.
Changes:
- Updates SQL connection logic to resolve workspace name+id and shifts `ConnectBase` to an `endpoint_type` model.
- Refactors delta-table helper code paths (schema building, delta reads, and column aggregates).
- Adjusts top-level imports/exports in `sempy_labs.__init__` and removes `enable_item_recovery` from the tenant admin module.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `src/sempy_labs/admin/_tenant.py` | Removes `enable_item_recovery` tenant setting helper. |
| `src/sempy_labs/_sql.py` | Changes `ConnectBase` initialization to use `endpoint_type` and workspace name+id resolution. |
| `src/sempy_labs/_helper_functions.py` | Refactors `save_as_delta_table`, `_get_column_aggregate`, and `_read_delta_table` behavior. |
| `src/sempy_labs/__init__.py` | Updates top-level imports/`__all__`, adding new exports and removing some existing ones. |
Comments suppressed due to low confidence (4)
src/sempy_labs/admin/_tenant.py:572
`enable_item_recovery` was removed from this module, but `src/sempy_labs/admin/__init__.py` still imports and re-exports it. This will raise an `ImportError` when importing `sempy_labs.admin`. Either restore `enable_item_recovery` here (or move it and update `admin/__init__.py` accordingly) so the admin package remains importable.
_update_dataframe_datatypes(dataframe=df, column_map=columns)
return df
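One way to keep `sempy_labs.admin` importable while the helper is absent is to gate the re-export. A hedged sketch of what `admin/__init__.py` could do until the function's fate is decided (the stub and its message are illustrative, not the actual SLL code):

```python
# sketch for admin/__init__.py: tolerate the missing helper instead of
# failing at import time
try:
    from ._tenant import enable_item_recovery
except ImportError:
    def enable_item_recovery(*args, **kwargs):
        # illustrative stub: fail loudly at call time, not at import time
        raise NotImplementedError(
            "enable_item_recovery was removed from sempy_labs.admin"
        )
```

Restoring the function (or updating `admin/__init__.py` in the same PR) remains the cleaner fix; the stub only prevents the whole admin package from becoming unimportable.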
src/sempy_labs/_helper_functions.py:962
`build_schema()` no longer validates that `dtype` is non-null and present in `type_mapping`. As written, `dtype=None` will raise `AttributeError` on `dtype.lower()`, and unknown dtypes will flow into `pa.field(..., None)`/`StructField(..., None)`, causing less actionable errors. Add explicit checks and raise a clear `ValueError` when `dtype` is missing or unsupported (including the column name).
def build_schema(schema_dict, type_mapping, use_arrow=True):
if use_arrow:
fields = [
pa.field(name, type_mapping.get(dtype.lower()))
for name, dtype in schema_dict.items()
]
return pa.schema(fields)
else:
return StructType(
[
StructField(name, type_mapping.get(dtype.lower()), True)
for name, dtype in schema_dict.items()
]
)
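A dependency-free sketch of the validation the comment asks for, factored into a helper so the error names the offending column (the helper name is hypothetical; `build_schema` would call it in both branches instead of `type_mapping.get(dtype.lower())`):

```python
def resolve_mapped_type(name, dtype, type_mapping):
    # hypothetical helper: validate dtype before it reaches pa.field/StructField
    if dtype is None:
        raise ValueError(f"Column '{name}' has no dtype specified.")
    mapped = type_mapping.get(dtype.lower())
    if mapped is None:
        raise ValueError(f"Column '{name}' has unsupported dtype '{dtype}'.")
    return mapped
```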
src/sempy_labs/_helper_functions.py:992
`save_as_delta_table()` no longer prefixes table names with `dbo/` when lakehouse schemas are enabled. Several callers pass bare table names (e.g. Delta Analyzer exports `delta_table_name=f"{prefix}{name}"`), which will now write to `/Tables/<name>` instead of `/Tables/dbo/<name>` on schema-enabled lakehouses. This likely breaks exports in schema-enabled environments. Consider restoring the `is_schema_enabled(...)` check and defaulting to `dbo/<name>` (or use `create_abfss_path(..., schema="dbo")` when appropriate).
dataframe = dataframe.withColumnRenamed(col_name, new_name)
spark_df = dataframe
file_path = create_abfss_path(
lakehouse_id=lakehouse_id,
lakehouse_workspace_id=workspace_id,
delta_table_name=delta_table_name,
)
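A minimal sketch of the restored defaulting, assuming an `is_schema_enabled(...)`-style check as in the old code; the helper below is illustrative, not the actual SLL API:

```python
def qualify_delta_table_name(delta_table_name: str, schema_enabled: bool) -> str:
    # illustrative: default bare names to the dbo schema on schema-enabled
    # lakehouses, leaving explicitly qualified "schema/table" names untouched
    if schema_enabled and "/" not in delta_table_name:
        return f"dbo/{delta_table_name}"
    return delta_table_name
```

`save_as_delta_table` could apply this before building the abfss path, so callers passing bare names keep landing under `/Tables/dbo/<name>`.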
src/sempy_labs/_helper_functions.py:2483
`_read_delta_table()` no longer supports column projection (the prior `columns=...` path via `to_pyarrow_table(columns=...)` was removed). Combined with `_get_column_aggregate()` now calling `_read_delta_table(path)` up front, this can force loading full tables when only a few columns are needed. Consider reintroducing an optional `columns` parameter and using it for both pure-Python and Spark reads to avoid unnecessary IO/memory.
def _read_delta_table(path: str, to_pandas: bool = True, to_df: bool = False):
if _pure_python_notebook():
from deltalake import DeltaTable
df = DeltaTable(table_uri=path)
if to_pandas:
df = df.to_pandas()
else:
spark = _create_spark_session()
df = spark.read.format("delta").load(path)
if to_df:
 class ConnectBase:
     def __init__(
         self,
         item: str | UUID,
-        type: Optional[str] = "Warehouse",
         workspace: Optional[Union[str, UUID]] = None,
         timeout: Optional[int] = None,
+        endpoint_type: str = "warehouse",
     ):
         from sempy.fabric._credentials import get_access_token
         import pyodbc

-        workspace_id = resolve_workspace_id(workspace)
+        (workspace_name, workspace_id) = resolve_workspace_name_and_id(workspace)

         # Resolve the appropriate ID and name (warehouse or lakehouse)
-        if type == "SQLDatabase":
+        if endpoint_type == "sqldatabase":
             # SQLDatabase is has special case for resolving the name and id
             (resource_name, resource_id) = resolve_item_name_and_id(
-                item=item, type=type, workspace=workspace_id
+                item=item, type="SQLDatabase", workspace=workspace_id
             )
-        elif type == "Lakehouse":
+        elif endpoint_type == "lakehouse":
             (resource_name, resource_id) = resolve_lakehouse_name_and_id(
                 lakehouse=item,
                 workspace=workspace_id,
             )
         else:
             (resource_name, resource_id) = resolve_item_name_and_id(
-                item=item, workspace=workspace_id, type=type
+                item=item, workspace=workspace_id, type=endpoint_type.capitalize()
             )
ConnectBase.__init__ no longer accepts the type keyword, but there are existing call sites that still pass type=... (e.g. src/sempy_labs/semantic_model/_vertipaq_analyzer.py uses with ConnectBase(item=source_name, type=source_type, ...)). This will raise TypeError: __init__() got an unexpected keyword argument 'type'. To keep compatibility, consider accepting type as a deprecated alias (or update all call sites in the repo to use endpoint_type). Also normalize/validate endpoint_type (e.g., endpoint_type = endpoint_type.lower() and restrict to known values) so callers passing "Warehouse"/"Lakehouse" don’t produce incorrect URL segments or miss the comparison branches.
+        df = _read_delta_table(path)
+
         function = function.lower()

         if isinstance(column_name, str):
             column_name = [column_name]

         if _pure_python_notebook():
             import polars as pl
             from polars.datatypes import Datetime, Decimal

-            lf = pl.scan_delta(path)
-            schema = lf.collect_schema()
+            if not isinstance(df, pd.DataFrame):
+                df.to_pandas()
+
+            df = pl.from_pandas(df)
In _get_column_aggregate() (pure-python path), this change forces a full read of the Delta table into pandas and then converts to Polars (_read_delta_table -> pl.from_pandas). Previously the code used pl.scan_delta(...) which is lazy and avoids loading the whole table into memory. For large lakehouse tables this is a significant memory/performance regression; consider using pl.scan_delta(path) (or deltalake.DeltaTable(...).to_pyarrow_table(columns=...)) and projecting only needed columns.
         def get_expr(col):
-            col_dtype = schema[col]
+            col_dtype = df.schema[col]

             if "approx" in function:
                 return pl.col(col).unique().count().alias(col)

             elif "distinct" in function:
-                # Check for decimal type properly
-                if isinstance(col_dtype, Decimal):
+                # Cast to Float64 for unique counting
+                if col_dtype == pl.Decimal:
                     return pl.col(col).cast(pl.Float64).n_unique().alias(col)
                 else:
                     return pl.col(col).n_unique().alias(col)

             elif function == "sum":
The Polars decimal handling is incorrect: col_dtype is a Polars DataType instance (e.g., pl.Decimal(precision, scale)), so col_dtype == pl.Decimal will never be true. If decimals require casting for n_unique()/mean(), use a proper type check (e.g., isinstance(col_dtype, pl.datatypes.Decimal) or compare against pl.Decimal(…, …)), otherwise decimal columns may error or produce unexpected results.
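The pitfall can be reproduced with a minimal stand-in class (plain Python semantics; Polars' own `DataType.__eq__` behavior may vary across versions, which is exactly why an explicit `isinstance` check is the safer pattern):

```python
class FakeDecimal:
    # stand-in for a parametrized dtype like pl.Decimal(precision, scale)
    def __init__(self, precision, scale):
        self.precision, self.scale = precision, scale

col_dtype = FakeDecimal(10, 2)

# under default semantics an instance never equals the bare class...
print(col_dtype == FakeDecimal)            # False
# ...whereas an isinstance check matches regardless of parameters
print(isinstance(col_dtype, FakeDecimal))  # True
```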
             exprs = [get_expr(col) for col in column_name]
             aggs = df.select(exprs).to_dict(as_series=False)

             if len(column_name) == 1:
-                result = values[column_name[0]] or default_value
+                result = aggs[column_name[0]][0] or default_value
             else:
-                result = values
+                result = {col: aggs[col][0] for col in column_name}
result = aggs[column_name[0]][0] or default_value treats valid falsy aggregates (e.g. 0) as “missing” and replaces them with default_value. Use an explicit is None check (or pd.isna) so legitimate 0/0.0 results are preserved.
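The fix is a one-liner; a sketch with an illustrative helper name:

```python
def coalesce_aggregate(value, default_value):
    # an explicit None check keeps legitimate falsy results such as 0 or 0.0
    return default_value if value is None else value
```

The single-column branch would then read `result = coalesce_aggregate(aggs[column_name[0]][0], default_value)`.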
     create_model_bpa_semantic_model,
 )
 from ._model_bpa import run_model_bpa
+from ._fix_model_bpa import fix_model_bpa
__init__.py now imports fix_model_bpa from ._fix_model_bpa, but src/sempy_labs/_fix_model_bpa.py is not present in the package. This will cause an ImportError on import sempy_labs. Ensure the module is added in this PR (or gate the import / remove it until it exists).
Suggested change:

-from ._fix_model_bpa import fix_model_bpa
+try:
+    from ._fix_model_bpa import fix_model_bpa
+except ImportError:
+    fix_model_bpa = None
 from ._catalog import (
     list_endorsements,
     list_favorites,
 )
-from ._item_recovery import (
-    list_recoverable_items,
-    recover_item,
-    permanently_delete_item,
-)
-from ._items import (
-    bulk_export_items,
-    bulk_import_items,
-)
+from ._pbi_fixer import pbi_fixer

 __all__ = [
-    "bulk_export_items",
-    "bulk_import_items",
-    "list_recoverable_items",
-    "recover_item",
-    "permanently_delete_item",
+    "pbi_fixer",
__init__.py now imports pbi_fixer from ._pbi_fixer, but src/sempy_labs/_pbi_fixer.py is not present in the package. This will cause an ImportError on import sempy_labs. Either include the module in this PR or avoid importing/exporting it from the top-level until it exists.
 from ._takeover import (
     takeover_item_ownership,
 )
 from ._catalog import (
     list_endorsements,
     list_favorites,
 )
-from ._item_recovery import (
-    list_recoverable_items,
-    recover_item,
-    permanently_delete_item,
-)
-from ._items import (
-    bulk_export_items,
-    bulk_import_items,
-)
+from ._pbi_fixer import pbi_fixer

 __all__ = [
-    "bulk_export_items",
-    "bulk_import_items",
-    "list_recoverable_items",
-    "recover_item",
-    "permanently_delete_item",
+    "pbi_fixer",
     "resolve_warehouse_id",
Top-level exports for bulk_export_items, bulk_import_items, and the item recovery helpers were removed from sempy_labs.__init__ (__all__ and the corresponding imports). This is a breaking API change for users doing import sempy_labs as sll; sll.bulk_export_items(...), and it conflicts with the PR’s “backward-compatible” claim. If the intent is backward compatibility, keep re-exporting these symbols (even if you also add newer entry points like pbi_fixer).
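If the removals are intentional but compatibility still matters, a module-level `__getattr__` (PEP 562) can keep the old top-level names importable without eager imports. A sketch; the mapping and helper name are illustrative:

```python
import importlib

# illustrative mapping from legacy top-level names to their defining modules
_LEGACY_EXPORTS = {
    "bulk_export_items": "sempy_labs._items",
    "bulk_import_items": "sempy_labs._items",
    "list_recoverable_items": "sempy_labs._item_recovery",
}

def module_getattr(name, legacy_map):
    # the body a package-level __getattr__(name) would delegate to:
    # import the owning module on first access and hand back the attribute
    module_name = legacy_map.get(name)
    if module_name is None:
        raise AttributeError(name)
    return getattr(importlib.import_module(module_name), name)
```

In `__init__.py` this becomes `def __getattr__(name): return module_getattr(name, _LEGACY_EXPORTS)`, so `sll.bulk_export_items(...)` keeps working while new entry points are added explicitly.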
Miscellaneous Upstream Fixes
A collection of small fixes and improvements across several existing SLL modules. These were identified during PBI Fixer development but are general improvements that benefit all SLL users.
Changes by File
- `_items.py`
- `_item_recovery.py`
- `_helper_functions.py`
- `admin/_tenant.py`
- `_export_report.py`
- `_sql.py`
- `__init__.py`

Files

- `src/sempy_labs/_items.py` (modified)
- `src/sempy_labs/_item_recovery.py` (modified)
- `src/sempy_labs/_helper_functions.py` (modified)
- `src/sempy_labs/admin/_tenant.py` (modified)
- `src/sempy_labs/_export_report.py` (modified)
- `src/sempy_labs/_sql.py` (modified)
- `src/sempy_labs/__init__.py` (modified)
- `src/sempy_labs/admin/__init__.py` (modified)

Backward Compatibility
All changes are backward-compatible. No existing function signatures are altered. These are bug fixes, improved error handling, and minor feature additions.
PBI Fixer Contribution — Overview
This PR is part of the PBI Fixer contribution to semantic-link-labs — an interactive ipywidgets-based UI for scanning and fixing Power BI reports and semantic models directly in Microsoft Fabric Notebooks.
The PBI Fixer provides a tabbed ipywidgets interface (Semantic Model Explorer, Report Explorer, Perspective Editor, Vertipaq Analyzer) that lets users interactively scan, inspect, and fix Power BI artifacts without leaving the notebook. All underlying fixer functions also work as standalone API calls, so users can integrate them into scripts and pipelines without the UI.
Contribution Structure
The full contribution (~17K lines across 68 files) is split into 22 focused PRs across 6 phases to keep each PR reviewable and self-contained. Only new files are added in Phases 1–4 and 6 — no existing SLL code is modified.
… `.Find()` fixes and expression capture (`tom/_model.py`), Vertipaq analyzer enhancements with memory/column-level analysis (`_vertipaq.py`, ~1000 lines changed), and various small fixes across `_items.py`, `_item_recovery.py`, `_helper_functions.py`, `_export_report.py`, `_sql.py`, and `admin/_tenant.py`. These carry higher merge conflict risk and may need closer review or discussion.

Dependencies & Review Order
… `sempy_labs.report.fix_piecharts(...)` or `sempy_labs.semantic_model.add_calculated_calendar(...)`.