1168#1169

Open
m-kovalsky wants to merge 1 commit into microsoft:main from m-kovalsky:m-kovalsky/1168
Conversation

@m-kovalsky
Collaborator

Copilot AI review requested due to automatic review settings April 14, 2026 05:35
Contributor

Copilot AI left a comment

Pull request overview

Fixes issue #1168, where vertipaq_analyzer() returns "Model Summary"["Total Size"] as a display-formatted string, preventing numeric operations. The fix introduces a typed (non-formatted) return path.

Changes:

  • Import and use _update_dataframe_datatypes() to cast returned DataFrame columns based on vertipaq_map data types.
  • Return a new final_dict of DataFrames (int/float/bool/string-typed) instead of the display-formatted dfs[...].
  • Attempt to switch the export == "table" path to use final_dict (currently inconsistent/broken).
Comments suppressed due to low confidence (1)

src/sempy_labs/semantic_model/_vertipaq_analyzer.py:1053

  • export == "table" path is now broken: final_dict stores DataFrames directly (e.g., final_dict[title] = df), but df_map treats each entry as a dict and does final_dict[k]["data"]. Also the keys used in df_map include "Model" while final_dict uses "Model Summary". This will raise at runtime (KeyError/TypeError) and prevent table export. Suggestion: either keep using dfs = create_dfs(...) for export, or change final_dict to use consistent keys and store the same { "data": df, ... } structure expected by df_map.
    if export == "table":
        #dfs = create_dfs(column_formatting="data_type")

        print(
            f"{icons.in_progress} Saving Vertipaq Analyzer to delta tables in the lakehouse...\n"
        )

        now = datetime.datetime.now()

        # Dataset metadata
        df_datasets = fabric.list_datasets(workspace=workspace_id, mode="rest")
        configured_by = df_datasets.loc[
            df_datasets["Dataset Id"] == dataset_id, "Configured By"
        ].iloc[0]

        (capacity_id, capacity_name) = resolve_workspace_capacity(
            workspace=workspace_id
        )

        base_metadata = {
            "Capacity Name": capacity_name,
            "Capacity Id": capacity_id,
            "Workspace Name": workspace_name,
            "Workspace Id": workspace_id,
            "Dataset Name": dataset_name,
            "Dataset Id": dataset_id,
            "Configured By": configured_by,
            "RunId": run_id,
            "Timestamp": now,
        }

        df_map = {
            k: final_dict[k]["data"]
            for k in [
                "Columns",
                "Tables",
                "Partitions",
                "Relationships",
                "Hierarchies",
                "Model",
            ]
        }
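The structural mismatch flagged above could be resolved along the lines of this sketch (build_final_dict is a hypothetical helper, not part of the PR): keep the keys that df_map expects ("Model", not "Model Summary") and wrap each frame in the { "data": df } structure, so that final_dict[k]["data"] lookups keep working.

```python
import pandas as pd

def build_final_dict(sections):
    """Hypothetical helper: sections maps a section name (e.g. "Model")
    to its typed DataFrame."""
    final_dict = {}
    for name, df in sections.items():
        # Store under the section name df_map iterates over, wrapped in
        # the {"data": ...} structure the export path expects.
        final_dict[name] = {"data": df}
    return final_dict

# Minimal usage example with an illustrative one-column frame.
sections = {"Model": pd.DataFrame({"Total Size": [1024]})}
final_dict = build_final_dict(sections)
df_map = {k: final_dict[k]["data"] for k in ["Model"]}
```

With this shape, the export branch's `final_dict[k]["data"]` access succeeds, and the "Model" vs "Model Summary" key drift disappears because both paths use the section names.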


Comment on lines +1013 to 1014
#dfs = create_dfs(column_formatting="data_type")

Copilot AI Apr 14, 2026

There is commented-out code left in the export branch (#dfs = create_dfs(column_formatting="data_type")). Since the function’s return/export logic is being reworked, this should be either removed or reinstated with the correct implementation to avoid confusion and keep the export path maintainable.

Suggested change
#dfs = create_dfs(column_formatting="data_type")

Comment on lines +974 to +992
    # Prepare output for returned dictionary of dataframes and for exported dataframes
    dtype_map = {"string": "string", "long": "int", "double": "float", "bool": "bool"}
    return_sections = {
        "Model": "Model Summary",
        "Tables": "Tables",
        "Partitions": "Partitions",
        "Columns": "Columns",
        "Relationships": "Relationships",
        "Hierarchies": "Hierarchies",
    }
    final_dict = {}
    for name, title in return_sections.items():
        items = config[name]
        data = items.get("data")
        sort_col = items.get("sortby")
        df = pd.DataFrame(data, columns=list(vertipaq_map[name].keys()))
        if sort_col and sort_col in df.columns:
            df = df.sort_values(by=sort_col, ascending=False).reset_index(drop=True)
        col_types = {
Copilot AI Apr 14, 2026

In the export is None path, the code now builds DataFrames twice: once in the new final_dict loop and again in create_dfs() for visualization. For large models (especially the Columns section) this doubles DataFrame construction/sorting work. Consider deriving the returned DataFrames from the already-built dfs (or vice-versa) by copying before formatting, so the raw-typed and display-formatted outputs share the same underlying DataFrame build/cleanup steps.
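The deduplication suggested here could be sketched as follows (build_section and the formatting step are illustrative assumptions, not the module's actual logic): construct and sort each DataFrame once, return the typed frame as-is, and apply display formatting only to a copy.

```python
import pandas as pd

def build_section(data, columns, sort_col=None):
    """Hypothetical helper: build the section DataFrame once, then branch
    into a raw-typed frame and a display-formatted copy."""
    df = pd.DataFrame(data, columns=columns)
    if sort_col and sort_col in df.columns:
        df = df.sort_values(by=sort_col, ascending=False).reset_index(drop=True)
    raw = df             # typed frame, suitable for numeric operations
    display = df.copy()  # formatting here never touches `raw`
    # Illustrative formatting step (the real formatting lives in create_dfs).
    display["Total Size"] = display["Total Size"].map("{:,}".format)
    return raw, display

raw, display = build_section([[2048], [1024]], ["Total Size"], sort_col="Total Size")
```

Because `display` is a deep copy, formatting it leaves `raw` numeric, so both outputs share one build/sort pass instead of two.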

Comment on lines +974 to +999
    # Prepare output for returned dictionary of dataframes and for exported dataframes
    dtype_map = {"string": "string", "long": "int", "double": "float", "bool": "bool"}
    return_sections = {
        "Model": "Model Summary",
        "Tables": "Tables",
        "Partitions": "Partitions",
        "Columns": "Columns",
        "Relationships": "Relationships",
        "Hierarchies": "Hierarchies",
    }
    final_dict = {}
    for name, title in return_sections.items():
        items = config[name]
        data = items.get("data")
        sort_col = items.get("sortby")
        df = pd.DataFrame(data, columns=list(vertipaq_map[name].keys()))
        if sort_col and sort_col in df.columns:
            df = df.sort_values(by=sort_col, ascending=False).reset_index(drop=True)
        col_types = {
            k: dtype_map.get(v["data_type"], "string")
            for k, v in vertipaq_map[name].items()
            if k in df.columns
        }
        _update_dataframe_datatypes(df, col_types)
        final_dict[title] = df

Copilot AI Apr 14, 2026

The new final_dict construction duplicates the DataFrame-building logic from create_dfs() but does not apply the same schema/row cleanup (e.g., filtering out Type == "RowNumber", dropping "Source Column", removing Direct Lake-only partition columns when not Direct Lake, and removing "Missing Rows" when read_stats_from_data is false). This changes the returned DataFrames compared to what is visualized/previously returned and can reintroduce unwanted rows/columns. Suggestion: reuse the existing cleanup logic (refactor it into a shared helper or run the same filtering/dropping steps before storing each df in final_dict).
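A shared cleanup helper, as suggested, might look roughly like this (clean_section is hypothetical; the row/column names come from the review comment above, and the Direct Lake-specific handling is omitted for brevity):

```python
import pandas as pd

def clean_section(df, read_stats_from_data=True):
    """Hypothetical shared cleanup applied to both the returned and the
    visualized frames, so the two outputs stay schema-consistent."""
    if "Type" in df.columns:
        # Drop internal RowNumber rows.
        df = df[df["Type"] != "RowNumber"]
    if "Source Column" in df.columns:
        df = df.drop(columns=["Source Column"])
    if not read_stats_from_data and "Missing Rows" in df.columns:
        # "Missing Rows" is only meaningful when stats were read from data.
        df = df.drop(columns=["Missing Rows"])
    return df.reset_index(drop=True)

df = pd.DataFrame(
    {"Type": ["Data", "RowNumber"], "Source Column": ["a", "b"], "Missing Rows": [0, 0]}
)
cleaned = clean_section(df, read_stats_from_data=False)
```

Calling the same helper before storing each frame in final_dict and inside create_dfs would keep the raw-typed and display-formatted outputs aligned.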
