Conversation
Pull request overview
Fixes issue #1168, where `vertipaq_analyzer()` returns `"Model Summary"["Total Size"]` as a string, preventing numeric operations. The fix introduces a typed (non-formatted) return path.
Changes:
- Import and use `_update_dataframe_datatypes()` to cast returned DataFrame columns based on `vertipaq_map` data types.
- Return a new `final_dict` of DataFrames (int/float/bool/string-typed) instead of the display-formatted `dfs[...]`.
- Attempt to switch the `export == "table"` path to use `final_dict` (currently inconsistent/broken).
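The symptom and the fix can be sketched in isolation. The sample frames below are illustrative stand-ins for the analyzer's output, and `dtype_map` mirrors the mapping this PR adds:

```python
import pandas as pd

# Before the fix: display-formatted output stores "Total Size" as a string,
# so "numeric" operations silently concatenate instead of adding.
formatted = pd.DataFrame({"Total Size": ["1,024", "2,048"]})
bad_total = formatted["Total Size"].sum()  # string concatenation, not arithmetic

# After the fix: the typed return path casts columns according to the data
# types declared in vertipaq_map (collapsed here to a single column).
dtype_map = {"string": "string", "long": "int", "double": "float", "bool": "bool"}
col_types = {"Total Size": dtype_map["long"]}  # -> "int"
typed = pd.DataFrame({"Total Size": [1024, 2048]})
typed = typed.astype({c: "int64" for c, t in col_types.items() if t == "int"})
total = typed["Total Size"].sum()  # numeric sum, as callers expect
```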
Comments suppressed due to low confidence (1)
src/sempy_labs/semantic_model/_vertipaq_analyzer.py:1053
The `export == "table"` path is now broken: `final_dict` stores DataFrames directly (e.g., `final_dict[title] = df`), but `df_map` treats each entry as a dict and does `final_dict[k]["data"]`. Also, the keys used in `df_map` include "Model" while `final_dict` uses "Model Summary". This will raise at runtime (KeyError/TypeError) and prevent table export. Suggestion: either keep using `dfs = create_dfs(...)` for export, or change `final_dict` to use consistent keys and store the same `{"data": df, ...}` structure expected by `df_map`.
```python
if export == "table":
    #dfs = create_dfs(column_formatting="data_type")
    print(
        f"{icons.in_progress} Saving Vertipaq Analyzer to delta tables in the lakehouse...\n"
    )
    now = datetime.datetime.now()
    # Dataset metadata
    df_datasets = fabric.list_datasets(workspace=workspace_id, mode="rest")
    configured_by = df_datasets.loc[
        df_datasets["Dataset Id"] == dataset_id, "Configured By"
    ].iloc[0]
    (capacity_id, capacity_name) = resolve_workspace_capacity(
        workspace=workspace_id
    )
    base_metadata = {
        "Capacity Name": capacity_name,
        "Capacity Id": capacity_id,
        "Workspace Name": workspace_name,
        "Workspace Id": workspace_id,
        "Dataset Name": dataset_name,
        "Dataset Id": dataset_id,
        "Configured By": configured_by,
        "RunId": run_id,
        "Timestamp": now,
    }
    df_map = {
        k: final_dict[k]["data"]
        for k in [
            "Columns",
            "Tables",
            "Partitions",
            "Relationships",
            "Hierarchies",
            "Model",
        ]
    }
```
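One sketch of a repair, assuming `final_dict` keeps plain DataFrames keyed by section title as this PR builds it: alias the export names onto the actual `final_dict` keys instead of reading a `["data"]` entry that no longer exists. The `export_titles` name is illustrative, not from the module:

```python
import pandas as pd

# Stand-in for final_dict as built by this PR: plain DataFrames keyed by the
# return-section titles ("Model Summary", not "Model").
final_dict = {
    "Model Summary": pd.DataFrame({"Total Size": [3072]}),
    "Tables": pd.DataFrame({"Table Name": ["Sales"]}),
    "Partitions": pd.DataFrame(),
    "Columns": pd.DataFrame(),
    "Relationships": pd.DataFrame(),
    "Hierarchies": pd.DataFrame(),
}

# The export name differs from the final_dict key only for the model summary,
# so a small alias map keeps df_map consistent with what final_dict stores.
export_titles = {
    "Columns": "Columns",
    "Tables": "Tables",
    "Partitions": "Partitions",
    "Relationships": "Relationships",
    "Hierarchies": "Hierarchies",
    "Model": "Model Summary",
}
df_map = {export_name: final_dict[title] for export_name, title in export_titles.items()}
```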
```python
    #dfs = create_dfs(column_formatting="data_type")
```

There is commented-out code left in the export branch (`#dfs = create_dfs(column_formatting="data_type")`). Since the function's return/export logic is being reworked, this should either be removed or reinstated with the correct implementation, to avoid confusion and keep the export path maintainable.
```python
# Prepare output for returned dictionary of dataframes and for exported dataframes
dtype_map = {"string": "string", "long": "int", "double": "float", "bool": "bool"}
return_sections = {
    "Model": "Model Summary",
    "Tables": "Tables",
    "Partitions": "Partitions",
    "Columns": "Columns",
    "Relationships": "Relationships",
    "Hierarchies": "Hierarchies",
}
final_dict = {}
for name, title in return_sections.items():
    items = config[name]
    data = items.get("data")
    sort_col = items.get("sortby")
    df = pd.DataFrame(data, columns=list(vertipaq_map[name].keys()))
    if sort_col and sort_col in df.columns:
        df = df.sort_values(by=sort_col, ascending=False).reset_index(drop=True)
    col_types = {
```
In the `export is None` path, the code now builds DataFrames twice: once in the new `final_dict` loop and again in `create_dfs()` for visualization. For large models (especially the Columns section) this doubles DataFrame construction/sorting work. Consider deriving the returned DataFrames from the already-built `dfs` (or vice versa) by copying before formatting, so the raw-typed and display-formatted outputs share the same underlying DataFrame build/cleanup steps.
```python
# Prepare output for returned dictionary of dataframes and for exported dataframes
dtype_map = {"string": "string", "long": "int", "double": "float", "bool": "bool"}
return_sections = {
    "Model": "Model Summary",
    "Tables": "Tables",
    "Partitions": "Partitions",
    "Columns": "Columns",
    "Relationships": "Relationships",
    "Hierarchies": "Hierarchies",
}
final_dict = {}
for name, title in return_sections.items():
    items = config[name]
    data = items.get("data")
    sort_col = items.get("sortby")
    df = pd.DataFrame(data, columns=list(vertipaq_map[name].keys()))
    if sort_col and sort_col in df.columns:
        df = df.sort_values(by=sort_col, ascending=False).reset_index(drop=True)
    col_types = {
        k: dtype_map.get(v["data_type"], "string")
        for k, v in vertipaq_map[name].items()
        if k in df.columns
    }
    _update_dataframe_datatypes(df, col_types)
    final_dict[title] = df
```
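The single-build approach the comment suggests could look roughly like this; `build_section` is an illustrative stand-in, not a helper from the module. Build and sort each section once, then copy before display formatting so both outputs share the construction work:

```python
import pandas as pd

def build_section(data, columns, sort_col=None):
    """Construct and sort one section DataFrame exactly once."""
    df = pd.DataFrame(data, columns=columns)
    if sort_col and sort_col in df.columns:
        df = df.sort_values(by=sort_col, ascending=False).reset_index(drop=True)
    return df

raw = build_section(
    [("Date", 512), ("Sales", 2048)],
    ["Table Name", "Total Size"],
    sort_col="Total Size",
)

typed = raw.copy()    # returned to the caller; numeric dtypes stay intact
display = raw.copy()  # formatted only for visualization
display["Total Size"] = display["Total Size"].map("{:,}".format)
```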
The new `final_dict` construction duplicates the DataFrame-building logic from `create_dfs()` but does not apply the same schema/row cleanup (e.g., filtering out `Type == "RowNumber"`, dropping "Source Column", removing Direct Lake-only partition columns when not Direct Lake, and removing "Missing Rows" when `read_stats_from_data` is false). This changes the returned DataFrames compared to what is visualized/previously returned and can reintroduce unwanted rows/columns. Suggestion: reuse the existing cleanup logic (refactor it into a shared helper or run the same filtering/dropping steps before storing each df in `final_dict`).
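A possible shape for the shared cleanup helper; the filters restate the ones the comment lists, while parameter names such as `is_direct_lake`, `read_stats_from_data`, and `direct_lake_only_cols` are assumptions about the surrounding function, not its actual signature:

```python
import pandas as pd

def clean_section(df, name, is_direct_lake=False, read_stats_from_data=False,
                  direct_lake_only_cols=()):
    """Apply identical row/column cleanup to raw-typed and display DataFrames."""
    df = df.copy()
    if name == "Columns":
        if "Type" in df.columns:
            df = df[df["Type"] != "RowNumber"]  # drop internal RowNumber columns
        df = df.drop(columns=["Source Column"], errors="ignore")
    if name == "Partitions" and not is_direct_lake:
        # direct_lake_only_cols is a placeholder for the Direct Lake-only columns
        df = df.drop(columns=list(direct_lake_only_cols), errors="ignore")
    if not read_stats_from_data:
        df = df.drop(columns=["Missing Rows"], errors="ignore")
    return df.reset_index(drop=True)
```

Running both the returned and the visualized frames through this one function keeps the two outputs structurally identical by construction.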
#1168