Misc upstream fixes: items, recovery, helpers, export, SQL, admin #1160
KornAlexander wants to merge 1 commit into microsoft:main
Conversation
…port, SQL utilities, admin tenant
Pull request overview
This PR bundles several small upstream fixes across sempy_labs, touching admin tenant utilities, SQL connectivity helpers, delta-table helper utilities, and top-level package exports.
Changes:
- Updates SQL connection logic to resolve workspace name+id and shifts `ConnectBase` to an `endpoint_type` model.
- Refactors delta-table helper code paths (schema building, delta reads, and column aggregates).
- Adjusts top-level imports/exports in `sempy_labs.__init__` and removes `enable_item_recovery` from the tenant admin module.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `src/sempy_labs/admin/_tenant.py` | Removes `enable_item_recovery` tenant setting helper. |
| `src/sempy_labs/_sql.py` | Changes `ConnectBase` initialization to use `endpoint_type` and workspace name+id resolution. |
| `src/sempy_labs/_helper_functions.py` | Refactors `save_as_delta_table`, `_get_column_aggregate`, and `_read_delta_table` behavior. |
| `src/sempy_labs/__init__.py` | Updates top-level imports/`__all__`, adding new exports and removing some existing ones. |
Comments suppressed due to low confidence (4)
src/sempy_labs/admin/_tenant.py:572
`enable_item_recovery` was removed from this module, but `src/sempy_labs/admin/__init__.py` still imports and re-exports it. This will raise an `ImportError` when importing `sempy_labs.admin`. Either restore `enable_item_recovery` here (or move it and update `admin/__init__.py` accordingly) so the admin package remains importable.
_update_dataframe_datatypes(dataframe=df, column_map=columns)
return df
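One way to keep `sempy_labs.admin` importable while the helper is absent is to gate the re-export. A hedged sketch of what `admin/__init__.py` could do until the function's fate is decided (the stub and its message are illustrative, not the actual SLL code):

```python
# sketch for admin/__init__.py: tolerate the missing helper instead of
# failing at import time
try:
    from ._tenant import enable_item_recovery
except ImportError:
    def enable_item_recovery(*args, **kwargs):
        # illustrative stub: fail loudly at call time, not at import time
        raise NotImplementedError(
            "enable_item_recovery was removed from sempy_labs.admin"
        )
```

Restoring the function (or updating `admin/__init__.py` in the same PR) remains the cleaner fix; the stub only prevents the whole admin package from becoming unimportable.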
src/sempy_labs/_helper_functions.py:962
`build_schema()` no longer validates that `dtype` is non-null and present in `type_mapping`. As written, `dtype=None` will raise `AttributeError` on `dtype.lower()`, and unknown dtypes will flow into `pa.field(..., None)`/`StructField(..., None)`, causing less actionable errors. Add explicit checks and raise a clear `ValueError` when `dtype` is missing or unsupported (including the column name).
def build_schema(schema_dict, type_mapping, use_arrow=True):
if use_arrow:
fields = [
pa.field(name, type_mapping.get(dtype.lower()))
for name, dtype in schema_dict.items()
]
return pa.schema(fields)
else:
return StructType(
[
StructField(name, type_mapping.get(dtype.lower()), True)
for name, dtype in schema_dict.items()
]
)
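A dependency-free sketch of the validation the comment asks for, factored into a helper so the error names the offending column (the helper name is hypothetical; `build_schema` would call it in both branches instead of `type_mapping.get(dtype.lower())`):

```python
def resolve_mapped_type(name, dtype, type_mapping):
    # hypothetical helper: validate dtype before it reaches pa.field/StructField
    if dtype is None:
        raise ValueError(f"Column '{name}' has no dtype specified.")
    mapped = type_mapping.get(dtype.lower())
    if mapped is None:
        raise ValueError(f"Column '{name}' has unsupported dtype '{dtype}'.")
    return mapped
```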
src/sempy_labs/_helper_functions.py:992
`save_as_delta_table()` no longer prefixes table names with `dbo/` when lakehouse schemas are enabled. Several callers pass bare table names (e.g. Delta Analyzer exports `delta_table_name=f"{prefix}{name}"`), which will now write to `/Tables/<name>` instead of `/Tables/dbo/<name>` on schema-enabled lakehouses. This likely breaks exports in schema-enabled environments. Consider restoring the `is_schema_enabled(...)` check and defaulting to `dbo/<name>` (or use `create_abfss_path(..., schema="dbo")` when appropriate).
dataframe = dataframe.withColumnRenamed(col_name, new_name)
spark_df = dataframe
file_path = create_abfss_path(
lakehouse_id=lakehouse_id,
lakehouse_workspace_id=workspace_id,
delta_table_name=delta_table_name,
)
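A minimal sketch of the restored defaulting, assuming an `is_schema_enabled(...)`-style check as in the old code; the helper below is illustrative, not the actual SLL API:

```python
def qualify_delta_table_name(delta_table_name: str, schema_enabled: bool) -> str:
    # illustrative: default bare names to the dbo schema on schema-enabled
    # lakehouses, leaving explicitly qualified "schema/table" names untouched
    if schema_enabled and "/" not in delta_table_name:
        return f"dbo/{delta_table_name}"
    return delta_table_name
```

`save_as_delta_table` could apply this before building the abfss path, so callers passing bare names keep landing under `/Tables/dbo/<name>`.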
src/sempy_labs/_helper_functions.py:2483
`_read_delta_table()` no longer supports column projection (the prior `columns=...` path via `to_pyarrow_table(columns=...)` was removed). Combined with `_get_column_aggregate()` now calling `_read_delta_table(path)` up front, this can force loading full tables when only a few columns are needed. Consider reintroducing an optional `columns` parameter and using it for both pure-Python and Spark reads to avoid unnecessary IO/memory.
def _read_delta_table(path: str, to_pandas: bool = True, to_df: bool = False):
if _pure_python_notebook():
from deltalake import DeltaTable
df = DeltaTable(table_uri=path)
if to_pandas:
df = df.to_pandas()
else:
spark = _create_spark_session()
df = spark.read.format("delta").load(path)
if to_df:
 class ConnectBase:
     def __init__(
         self,
         item: str | UUID,
-        type: Optional[str] = "Warehouse",
         workspace: Optional[Union[str, UUID]] = None,
         timeout: Optional[int] = None,
+        endpoint_type: str = "warehouse",
     ):
         from sempy.fabric._credentials import get_access_token
         import pyodbc

-        workspace_id = resolve_workspace_id(workspace)
+        (workspace_name, workspace_id) = resolve_workspace_name_and_id(workspace)

         # Resolve the appropriate ID and name (warehouse or lakehouse)
-        if type == "SQLDatabase":
+        if endpoint_type == "sqldatabase":
             # SQLDatabase is has special case for resolving the name and id
             (resource_name, resource_id) = resolve_item_name_and_id(
-                item=item, type=type, workspace=workspace_id
+                item=item, type="SQLDatabase", workspace=workspace_id
             )
-        elif type == "Lakehouse":
+        elif endpoint_type == "lakehouse":
             (resource_name, resource_id) = resolve_lakehouse_name_and_id(
                 lakehouse=item,
                 workspace=workspace_id,
             )
         else:
             (resource_name, resource_id) = resolve_item_name_and_id(
-                item=item, workspace=workspace_id, type=type
+                item=item, workspace=workspace_id, type=endpoint_type.capitalize()
             )
ConnectBase.__init__ no longer accepts the type keyword, but there are existing call sites that still pass type=... (e.g. src/sempy_labs/semantic_model/_vertipaq_analyzer.py uses with ConnectBase(item=source_name, type=source_type, ...)). This will raise TypeError: __init__() got an unexpected keyword argument 'type'. To keep compatibility, consider accepting type as a deprecated alias (or update all call sites in the repo to use endpoint_type). Also normalize/validate endpoint_type (e.g., endpoint_type = endpoint_type.lower() and restrict to known values) so callers passing "Warehouse"/"Lakehouse" don’t produce incorrect URL segments or miss the comparison branches.
+        df = _read_delta_table(path)
+
         function = function.lower()

         if isinstance(column_name, str):
             column_name = [column_name]

         if _pure_python_notebook():
             import polars as pl
             from polars.datatypes import Datetime, Decimal

-            lf = pl.scan_delta(path)
-            schema = lf.collect_schema()
+            if not isinstance(df, pd.DataFrame):
+                df.to_pandas()
+
+            df = pl.from_pandas(df)
In _get_column_aggregate() (pure-python path), this change forces a full read of the Delta table into pandas and then converts to Polars (_read_delta_table -> pl.from_pandas). Previously the code used pl.scan_delta(...) which is lazy and avoids loading the whole table into memory. For large lakehouse tables this is a significant memory/performance regression; consider using pl.scan_delta(path) (or deltalake.DeltaTable(...).to_pyarrow_table(columns=...)) and projecting only needed columns.
         def get_expr(col):
-            col_dtype = schema[col]
+            col_dtype = df.schema[col]

             if "approx" in function:
                 return pl.col(col).unique().count().alias(col)

             elif "distinct" in function:
-                # Check for decimal type properly
-                if isinstance(col_dtype, Decimal):
+                # Cast to Float64 for unique counting
+                if col_dtype == pl.Decimal:
                     return pl.col(col).cast(pl.Float64).n_unique().alias(col)
                 else:
                     return pl.col(col).n_unique().alias(col)

             elif function == "sum":
The Polars decimal handling is incorrect: col_dtype is a Polars DataType instance (e.g., pl.Decimal(precision, scale)), so col_dtype == pl.Decimal will never be true. If decimals require casting for n_unique()/mean(), use a proper type check (e.g., isinstance(col_dtype, pl.datatypes.Decimal) or compare against pl.Decimal(…, …)), otherwise decimal columns may error or produce unexpected results.
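The pitfall can be reproduced with a minimal stand-in class (plain Python semantics; Polars' own `DataType.__eq__` behavior may vary across versions, which is exactly why an explicit `isinstance` check is the safer pattern):

```python
class FakeDecimal:
    # stand-in for a parametrized dtype like pl.Decimal(precision, scale)
    def __init__(self, precision, scale):
        self.precision, self.scale = precision, scale

col_dtype = FakeDecimal(10, 2)

# under default semantics an instance never equals the bare class...
print(col_dtype == FakeDecimal)            # False
# ...whereas an isinstance check matches regardless of parameters
print(isinstance(col_dtype, FakeDecimal))  # True
```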
             exprs = [get_expr(col) for col in column_name]
             aggs = df.select(exprs).to_dict(as_series=False)

             if len(column_name) == 1:
-                result = values[column_name[0]] or default_value
+                result = aggs[column_name[0]][0] or default_value
             else:
-                result = values
+                result = {col: aggs[col][0] for col in column_name}
result = aggs[column_name[0]][0] or default_value treats valid falsy aggregates (e.g. 0) as “missing” and replaces them with default_value. Use an explicit is None check (or pd.isna) so legitimate 0/0.0 results are preserved.
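The fix is a one-liner; a sketch with an illustrative helper name:

```python
def coalesce_aggregate(value, default_value):
    # an explicit None check keeps legitimate falsy results such as 0 or 0.0
    return default_value if value is None else value
```

The single-column branch would then read `result = coalesce_aggregate(aggs[column_name[0]][0], default_value)`.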
     create_model_bpa_semantic_model,
 )
 from ._model_bpa import run_model_bpa
+from ._fix_model_bpa import fix_model_bpa
__init__.py now imports fix_model_bpa from ._fix_model_bpa, but src/sempy_labs/_fix_model_bpa.py is not present in the package. This will cause an ImportError on import sempy_labs. Ensure the module is added in this PR (or gate the import / remove it until it exists).
Suggested change:

-from ._fix_model_bpa import fix_model_bpa
+try:
+    from ._fix_model_bpa import fix_model_bpa
+except ImportError:
+    fix_model_bpa = None
 from ._catalog import (
     list_endorsements,
     list_favorites,
 )
-from ._item_recovery import (
-    list_recoverable_items,
-    recover_item,
-    permanently_delete_item,
-)
-from ._items import (
-    bulk_export_items,
-    bulk_import_items,
-)
+from ._pbi_fixer import pbi_fixer

 __all__ = [
-    "bulk_export_items",
-    "bulk_import_items",
-    "list_recoverable_items",
-    "recover_item",
-    "permanently_delete_item",
+    "pbi_fixer",
__init__.py now imports pbi_fixer from ._pbi_fixer, but src/sempy_labs/_pbi_fixer.py is not present in the package. This will cause an ImportError on import sempy_labs. Either include the module in this PR or avoid importing/exporting it from the top-level until it exists.
 from ._takeover import (
     takeover_item_ownership,
 )
 from ._catalog import (
     list_endorsements,
     list_favorites,
 )
-from ._item_recovery import (
-    list_recoverable_items,
-    recover_item,
-    permanently_delete_item,
-)
-from ._items import (
-    bulk_export_items,
-    bulk_import_items,
-)
+from ._pbi_fixer import pbi_fixer

 __all__ = [
-    "bulk_export_items",
-    "bulk_import_items",
-    "list_recoverable_items",
-    "recover_item",
-    "permanently_delete_item",
+    "pbi_fixer",
     "resolve_warehouse_id",
Top-level exports for bulk_export_items, bulk_import_items, and the item recovery helpers were removed from sempy_labs.__init__ (__all__ and the corresponding imports). This is a breaking API change for users doing import sempy_labs as sll; sll.bulk_export_items(...), and it conflicts with the PR’s “backward-compatible” claim. If the intent is backward compatibility, keep re-exporting these symbols (even if you also add newer entry points like pbi_fixer).
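If the removals are intentional but compatibility still matters, a module-level `__getattr__` (PEP 562) can keep the old top-level names importable without eager imports. A sketch; the mapping and helper name are illustrative:

```python
import importlib

# illustrative mapping from legacy top-level names to their defining modules
_LEGACY_EXPORTS = {
    "bulk_export_items": "sempy_labs._items",
    "bulk_import_items": "sempy_labs._items",
    "list_recoverable_items": "sempy_labs._item_recovery",
}

def module_getattr(name, legacy_map):
    # the body a package-level __getattr__(name) would delegate to:
    # import the owning module on first access and hand back the attribute
    module_name = legacy_map.get(name)
    if module_name is None:
        raise AttributeError(name)
    return getattr(importlib.import_module(module_name), name)
```

In `__init__.py` this becomes `def __getattr__(name): return module_getattr(name, _LEGACY_EXPORTS)`, so `sll.bulk_export_items(...)` keeps working while new entry points are added explicitly.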
Miscellaneous Upstream Fixes
A collection of small fixes and improvements across several existing SLL modules. These were identified during PBI Fixer development but are general improvements that benefit all SLL users.
Changes by File
- `_items.py`
- `_item_recovery.py`
- `_helper_functions.py`
- `admin/_tenant.py`
- `_export_report.py`
- `_sql.py`
- `__init__.py`

Files

- `src/sempy_labs/_items.py` (modified)
- `src/sempy_labs/_item_recovery.py` (modified)
- `src/sempy_labs/_helper_functions.py` (modified)
- `src/sempy_labs/admin/_tenant.py` (modified)
- `src/sempy_labs/_export_report.py` (modified)
- `src/sempy_labs/_sql.py` (modified)
- `src/sempy_labs/__init__.py` (modified)
- `src/sempy_labs/admin/__init__.py` (modified)

Backward Compatibility
All changes are backward-compatible. No existing function signatures are altered. These are bug fixes, improved error handling, and minor feature additions.
PBI Fixer Contribution — Overview
This PR is part of the PBI Fixer contribution to semantic-link-labs — an interactive ipywidgets-based UI for scanning and fixing Power BI reports and semantic models directly in Microsoft Fabric Notebooks.
The PBI Fixer provides a tabbed ipywidgets interface (Semantic Model Explorer, Report Explorer, Perspective Editor, Vertipaq Analyzer) that lets users interactively scan, inspect, and fix Power BI artifacts without leaving the notebook. All underlying fixer functions also work as standalone API calls, so users can integrate them into scripts and pipelines without the UI.
Contribution Structure
The full contribution (~17K lines across 68 files) is split into 22 focused PRs across 6 phases to keep each PR reviewable and self-contained. Only new files are added in Phases 1–4 and 6 — no existing SLL code is modified.
… `.Find()` fixes and expression capture (`tom/_model.py`), Vertipaq analyzer enhancements with memory/column-level analysis (`_vertipaq.py`, ~1000 lines changed), and various small fixes across `_items.py`, `_item_recovery.py`, `_helper_functions.py`, `_export_report.py`, `_sql.py`, and `admin/_tenant.py`. These carry higher merge conflict risk and may need closer review or discussion.

Dependencies & Review Order
… `sempy_labs.report.fix_piecharts(...)` or `sempy_labs.semantic_model.add_calculated_calendar(...)`.