Skip to content

Misc upstream fixes: items, recovery, helpers, export, SQL, admin#1160

Open
KornAlexander wants to merge 1 commit intomicrosoft:mainfrom
KornAlexander:feature/misc-upstream-fixes
Open

Misc upstream fixes: items, recovery, helpers, export, SQL, admin#1160
KornAlexander wants to merge 1 commit intomicrosoft:mainfrom
KornAlexander:feature/misc-upstream-fixes

Conversation

@KornAlexander
Copy link
Copy Markdown

Miscellaneous Upstream Fixes

⚠️ This PR modifies multiple existing SLL files. These are small, targeted fixes and improvements discovered while building the PBI Fixer. Each change is independently useful and not PBI-Fixer-specific.

A collection of small fixes and improvements across several existing SLL modules. These were identified during PBI Fixer development but are general improvements that benefit all SLL users.

Changes by File

File Lines Changed Nature
_items.py ~152 Item listing improvements, better error handling
_item_recovery.py ~171 Item recovery enhancements
_helper_functions.py ~87 Utility function improvements
admin/_tenant.py ~57 Tenant admin fixes
_export_report.py ~37 Report export improvements
_sql.py ~28 SQL query fixes
__init__.py ~18 Import updates
Other small files ~small Minor fixes

Files

  • src/sempy_labs/_items.py (modified)
  • src/sempy_labs/_item_recovery.py (modified)
  • src/sempy_labs/_helper_functions.py (modified)
  • src/sempy_labs/admin/_tenant.py (modified)
  • src/sempy_labs/_export_report.py (modified)
  • src/sempy_labs/_sql.py (modified)
  • src/sempy_labs/__init__.py (modified)
  • src/sempy_labs/admin/__init__.py (modified)
  • Other small files with trivial changes

Backward Compatibility

All changes are backward-compatible. No existing function signatures are altered. These are bug fixes, improved error handling, and minor feature additions.


PBI Fixer Contribution — Overview

This PR is part of the PBI Fixer contribution to semantic-link-labs — an interactive ipywidgets-based UI for scanning and fixing Power BI reports and semantic models directly in Microsoft Fabric Notebooks.

The PBI Fixer provides a tabbed ipywidgets interface (Semantic Model Explorer, Report Explorer, Perspective Editor, Vertipaq Analyzer) that lets users interactively scan, inspect, and fix Power BI artifacts without leaving the notebook. All underlying fixer functions also work as standalone API calls, so users can integrate them into scripts and pipelines without the UI.

Contribution Structure

The full contribution (~17K lines across 68 files) is split into 22 focused PRs across 6 phases to keep each PR reviewable and self-contained. Only new files are added in Phases 1–4 and 6 — no existing SLL code is modified.

Phase Focus PRs Description
1 Report Fixers 7 Standalone functions that programmatically fix common Power BI report issues: replace pie charts with bar charts, standardize page sizes to Full HD, apply chart formatting best practices, migrate slicers to slicerbars, hide visual-level filters, clean up unused custom visuals, align visuals, migrate report-level measures, and upgrade reports to PBIR format. Each function operates on PBIR-format report definitions.
2 Semantic Model Fixers 4 Functions that fix and enhance semantic models via XMLA/TOM: add calculated calendar and measure tables, add calculation groups for units and time intelligence, discourage implicit measures, and 19 BPA auto-fixers covering formatting conventions, naming standards, data types, column visibility, sort order, and DAX patterns (e.g., use DIVIDE instead of /).
3 SM Setup & Analysis 3 Setup utilities: configure cache warming queries, set up incremental refresh policies, and prepare semantic models for AI/Copilot integration (descriptions, metadata enrichment).
4 Report Utilities 3 Report-level utilities: auto-generate report page prototypes from a semantic model's structure, extract and apply report themes, and generate IBCS-compliant variance charts.
5 Upstream Enhancements 3 ⚠️ These PRs modify existing SLL code (unlike Phases 1–4 which only add new files). Changes include TOM model .Find() fixes and expression capture (tom/_model.py), Vertipaq analyzer enhancements with memory/column-level analysis (_vertipaq.py, ~1000 lines changed), and various small fixes across _items.py, _item_recovery.py, _helper_functions.py, _export_report.py, _sql.py, and admin/_tenant.py. These carry higher merge conflict risk and may need closer review or discussion.
6 PBI Fixer UI 2 The interactive UI layer: shared UI components (theme, icons, tree builders, layout helpers), BPA scan runners, report helpers, and the main PBI Fixer application with its tabbed interface (SM Explorer, Report Explorer, Perspective Editor, Vertipaq Analyzer). Depends on Phases 1–5 but uses lazy imports to degrade gracefully if individual fixers aren't yet merged.

Dependencies & Review Order

  • Phases 1–4 are fully independent — they only add new files and can be reviewed/merged in any order.
  • Phase 5 is also independent but modifies existing code, so it may benefit from early discussion.
  • Phase 6 (the UI) ties everything together. It depends on the earlier phases but works standalone via lazy imports.
  • All fixer functions work without the UI — they can be called directly as sempy_labs.report.fix_piecharts(...) or sempy_labs.semantic_model.add_calculated_calendar(...).

Copilot AI review requested due to automatic review settings April 9, 2026 16:28
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR bundles several small upstream fixes across sempy_labs, touching admin tenant utilities, SQL connectivity helpers, delta-table helper utilities, and top-level package exports.

Changes:

  • Updates SQL connection logic to resolve workspace name+id and shifts ConnectBase to an endpoint_type model.
  • Refactors delta-table helper code paths (schema building, delta reads, and column aggregates).
  • Adjusts top-level imports/exports in sempy_labs.__init__ and removes enable_item_recovery from the tenant admin module.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

File Description
src/sempy_labs/admin/_tenant.py Removes enable_item_recovery tenant setting helper.
src/sempy_labs/_sql.py Changes ConnectBase initialization to use endpoint_type and workspace name+id resolution.
src/sempy_labs/_helper_functions.py Refactors save_as_delta_table, _get_column_aggregate, and _read_delta_table behavior.
src/sempy_labs/__init__.py Updates top-level imports/__all__, adding new exports and removing some existing ones.
Comments suppressed due to low confidence (4)

src/sempy_labs/admin/_tenant.py:572

  • enable_item_recovery was removed from this module, but src/sempy_labs/admin/__init__.py still imports and re-exports it. This will raise an ImportError when importing sempy_labs.admin. Either restore enable_item_recovery here (or move it and update admin/__init__.py accordingly) so the admin package remains importable.
        _update_dataframe_datatypes(dataframe=df, column_map=columns)

    return df

src/sempy_labs/_helper_functions.py:962

  • build_schema() no longer validates that dtype is non-null and present in type_mapping. As written, dtype=None will raise AttributeError on dtype.lower(), and unknown dtypes will flow into pa.field(..., None) / StructField(..., None) causing less actionable errors. Add explicit checks and raise a clear ValueError when dtype is missing or unsupported (including the column name).
    def build_schema(schema_dict, type_mapping, use_arrow=True):
        if use_arrow:
            fields = [
                pa.field(name, type_mapping.get(dtype.lower()))
                for name, dtype in schema_dict.items()
            ]
            return pa.schema(fields)
        else:
            return StructType(
                [
                    StructField(name, type_mapping.get(dtype.lower()), True)
                    for name, dtype in schema_dict.items()
                ]
            )

src/sempy_labs/_helper_functions.py:992

  • save_as_delta_table() no longer prefixes table names with dbo/ when lakehouse schemas are enabled. Several callers pass bare table names (e.g. Delta Analyzer exports delta_table_name=f"{prefix}{name}"), which will now write to /Tables/<name> instead of /Tables/dbo/<name> on schema-enabled lakehouses. This likely breaks exports in schema-enabled environments. Consider restoring the is_schema_enabled(...) check and defaulting to dbo/<name> (or use create_abfss_path(..., schema="dbo") when appropriate).
            dataframe = dataframe.withColumnRenamed(col_name, new_name)
        spark_df = dataframe

    file_path = create_abfss_path(
        lakehouse_id=lakehouse_id,
        lakehouse_workspace_id=workspace_id,
        delta_table_name=delta_table_name,
    )

src/sempy_labs/_helper_functions.py:2483

  • _read_delta_table() no longer supports column projection (the prior columns=... path via to_pyarrow_table(columns=...) was removed). Combined with _get_column_aggregate() now calling _read_delta_table(path) up front, this can force loading full tables when only a few columns are needed. Consider reintroducing an optional columns parameter and using it for both pure-python and Spark reads to avoid unnecessary IO/memory.
def _read_delta_table(path: str, to_pandas: bool = True, to_df: bool = False):

    if _pure_python_notebook():
        from deltalake import DeltaTable

        df = DeltaTable(table_uri=path)
        if to_pandas:
            df = df.to_pandas()
    else:
        spark = _create_spark_session()
        df = spark.read.format("delta").load(path)
        if to_df:

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/sempy_labs/_sql.py
Comment on lines 34 to 61
class ConnectBase:
def __init__(
self,
item: str | UUID,
type: Optional[str] = "Warehouse",
workspace: Optional[Union[str, UUID]] = None,
timeout: Optional[int] = None,
endpoint_type: str = "warehouse",
):
from sempy.fabric._credentials import get_access_token
import pyodbc

workspace_id = resolve_workspace_id(workspace)
(workspace_name, workspace_id) = resolve_workspace_name_and_id(workspace)

# Resolve the appropriate ID and name (warehouse or lakehouse)
if type == "SQLDatabase":
if endpoint_type == "sqldatabase":
# SQLDatabase is has special case for resolving the name and id
(resource_name, resource_id) = resolve_item_name_and_id(
item=item, type=type, workspace=workspace_id
item=item, type="SQLDatabase", workspace=workspace_id
)
elif type == "Lakehouse":
elif endpoint_type == "lakehouse":
(resource_name, resource_id) = resolve_lakehouse_name_and_id(
lakehouse=item,
workspace=workspace_id,
)
else:
(resource_name, resource_id) = resolve_item_name_and_id(
item=item, workspace=workspace_id, type=type
item=item, workspace=workspace_id, type=endpoint_type.capitalize()
)
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConnectBase.__init__ no longer accepts the type keyword, but there are existing call sites that still pass type=... (e.g. src/sempy_labs/semantic_model/_vertipaq_analyzer.py uses with ConnectBase(item=source_name, type=source_type, ...)). This will raise TypeError: __init__() got an unexpected keyword argument 'type'. To keep compatibility, consider accepting type as a deprecated alias (or update all call sites in the repo to use endpoint_type). Also normalize/validate endpoint_type (e.g., endpoint_type = endpoint_type.lower() and restrict to known values) so callers passing "Warehouse"/"Lakehouse" don’t produce incorrect URL segments or miss the comparison branches.

Copilot uses AI. Check for mistakes.
Comment on lines +1906 to 1920
df = _read_delta_table(path)

function = function.lower()

if isinstance(column_name, str):
column_name = [column_name]

if _pure_python_notebook():
import polars as pl
from polars.datatypes import Datetime, Decimal

lf = pl.scan_delta(path)
schema = lf.collect_schema()
if not isinstance(df, pd.DataFrame):
df.to_pandas()

df = pl.from_pandas(df)

Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In _get_column_aggregate() (pure-python path), this change forces a full read of the Delta table into pandas and then converts to Polars (_read_delta_table -> pl.from_pandas). Previously the code used pl.scan_delta(...) which is lazy and avoids loading the whole table into memory. For large lakehouse tables this is a significant memory/performance regression; consider using pl.scan_delta(path) (or deltalake.DeltaTable(...).to_pyarrow_table(columns=...)) and projecting only needed columns.

Copilot uses AI. Check for mistakes.
Comment on lines 1921 to 1931
def get_expr(col):
col_dtype = schema[col]
col_dtype = df.schema[col]

if "approx" in function:
return pl.col(col).unique().count().alias(col)

elif "distinct" in function:
# Check for decimal type properly
if isinstance(col_dtype, Decimal):
# Cast to Float64 for unique counting
if col_dtype == pl.Decimal:
return pl.col(col).cast(pl.Float64).n_unique().alias(col)
else:
return pl.col(col).n_unique().alias(col)

elif function == "sum":
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Polars decimal handling is incorrect: col_dtype is a Polars DataType instance (e.g., pl.Decimal(precision, scale)), so col_dtype == pl.Decimal will never be true. If decimals require casting for n_unique()/mean(), use a proper type check (e.g., isinstance(col_dtype, pl.datatypes.Decimal) or compare against pl.Decimal(…, …)), otherwise decimal columns may error or produce unexpected results.

Copilot uses AI. Check for mistakes.
Comment on lines +1944 to +1950
exprs = [get_expr(col) for col in column_name]
aggs = df.select(exprs).to_dict(as_series=False)

if len(column_name) == 1:
result = values[column_name[0]] or default_value
result = aggs[column_name[0]][0] or default_value
else:
result = values
result = {col: aggs[col][0] for col in column_name}
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result = aggs[column_name[0]][0] or default_value treats valid falsy aggregates (e.g. 0) as “missing” and replaces them with default_value. Use an explicit is None check (or pd.isna) so legitimate 0/0.0 results are preserved.

Copilot uses AI. Check for mistakes.
create_model_bpa_semantic_model,
)
from ._model_bpa import run_model_bpa
from ._fix_model_bpa import fix_model_bpa
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__init__.py now imports fix_model_bpa from ._fix_model_bpa, but src/sempy_labs/_fix_model_bpa.py is not present in the package. This will cause an ImportError on import sempy_labs. Ensure the module is added in this PR (or gate the import / remove it until it exists).

Suggested change
from ._fix_model_bpa import fix_model_bpa
try:
from ._fix_model_bpa import fix_model_bpa
except ImportError:
fix_model_bpa = None

Copilot uses AI. Check for mistakes.
Comment on lines 353 to +360
from ._catalog import (
list_endorsements,
list_favorites,
)
from ._item_recovery import (
list_recoverable_items,
recover_item,
permanently_delete_item,
)
from ._items import (
bulk_export_items,
bulk_import_items,
)
from ._pbi_fixer import pbi_fixer

__all__ = [
"bulk_export_items",
"bulk_import_items",
"list_recoverable_items",
"recover_item",
"permanently_delete_item",
"pbi_fixer",
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__init__.py now imports pbi_fixer from ._pbi_fixer, but src/sempy_labs/_pbi_fixer.py is not present in the package. This will cause an ImportError on import sempy_labs. Either include the module in this PR or avoid importing/exporting it from the top-level until it exists.

Copilot uses AI. Check for mistakes.
Comment on lines 350 to 361
from ._takeover import (
takeover_item_ownership,
)
from ._catalog import (
list_endorsements,
list_favorites,
)
from ._item_recovery import (
list_recoverable_items,
recover_item,
permanently_delete_item,
)
from ._items import (
bulk_export_items,
bulk_import_items,
)
from ._pbi_fixer import pbi_fixer

__all__ = [
"bulk_export_items",
"bulk_import_items",
"list_recoverable_items",
"recover_item",
"permanently_delete_item",
"pbi_fixer",
"resolve_warehouse_id",
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Top-level exports for bulk_export_items, bulk_import_items, and the item recovery helpers were removed from sempy_labs.__init__ (__all__ and the corresponding imports). This is a breaking API change for users doing import sempy_labs as sll; sll.bulk_export_items(...), and it conflicts with the PR’s “backward-compatible” claim. If the intent is backward compatibility, keep re-exporting these symbols (even if you also add newer entry points like pbi_fixer).

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants