
1168 #1169

41 changes: 31 additions & 10 deletions src/sempy_labs/semantic_model/_vertipaq_analyzer.py
@@ -13,6 +13,7 @@
_get_column_aggregate,
resolve_workspace_name_and_id,
resolve_dataset_name_and_id,
_update_dataframe_datatypes,
)
from sempy_labs.lakehouse._get_lakehouse_tables import get_lakehouse_tables
from typing import Optional, Literal
@@ -970,6 +971,32 @@ def create_dfs(column_formatting: str = "format"):
}
return dfs

# Prepare output for returned dictionary of dataframes and for exported dataframes
dtype_map = {"string": "string", "long": "int", "double": "float", "bool": "bool"}
return_sections = {
"Model": "Model Summary",
"Tables": "Tables",
"Partitions": "Partitions",
"Columns": "Columns",
"Relationships": "Relationships",
"Hierarchies": "Hierarchies",
}
final_dict = {}
for name, title in return_sections.items():
items = config[name]
data = items.get("data")
sort_col = items.get("sortby")
df = pd.DataFrame(data, columns=list(vertipaq_map[name].keys()))
if sort_col and sort_col in df.columns:
df = df.sort_values(by=sort_col, ascending=False).reset_index(drop=True)
col_types = {
Comment on lines +974 to +992

Copilot AI · Apr 14, 2026

In the export is None path, the code now builds DataFrames twice: once in the new final_dict loop and again in create_dfs() for visualization. For large models (especially the Columns section) this doubles DataFrame construction/sorting work. Consider deriving the returned DataFrames from the already-built dfs (or vice-versa) by copying before formatting, so the raw-typed and display-formatted outputs share the same underlying DataFrame build/cleanup steps.

k: dtype_map.get(v["data_type"], "string")
for k, v in vertipaq_map[name].items()
if k in df.columns
}
_update_dataframe_datatypes(df, col_types)
final_dict[title] = df
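The single-build approach suggested in the first review comment can be sketched as below. This is an illustration only, not the sempy-labs implementation: `build_section_df` and `format_for_display` are hypothetical stand-ins for the section-building and display-formatting logic inside `create_dfs()`. Each section DataFrame is constructed once; the raw-typed frame is kept for the return value and the display version is produced from a copy, so construction, sorting, and cleanup run a single time.

```python
# Sketch of the reviewer's suggestion (illustrative only -- build_section_df
# and format_for_display are hypothetical stand-ins). Build each section once,
# then copy before formatting so the returned (raw-typed) and visualized
# (display-formatted) outputs share the same underlying construction work.
import pandas as pd


def build_outputs(sections, build_section_df, format_for_display):
    raw, display = {}, {}
    for name in sections:
        df = build_section_df(name)        # one construction per section
        raw[name] = df                     # raw-typed output for the caller
        display[name] = format_for_display(df.copy())  # format a copy for visuals
    return raw, display
```

Note the `df.copy()`: pandas formatting that mutates columns in place would otherwise leak back into the raw-typed dictionary.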

Comment on lines +974 to +999

Copilot AI · Apr 14, 2026

The new final_dict construction duplicates the DataFrame-building logic from create_dfs() but does not apply the same schema/row cleanup (e.g., filtering out Type == "RowNumber", dropping "Source Column", removing Direct Lake-only partition columns when not Direct Lake, and removing "Missing Rows" when read_stats_from_data is false). This changes the returned DataFrames compared to what is visualized/previously returned and can reintroduce unwanted rows/columns. Suggestion: reuse the existing cleanup logic (refactor it into a shared helper or run the same filtering/dropping steps before storing each df in final_dict).

if export is None:
dfs = create_dfs(column_formatting="format")
default_sort = {
@@ -978,18 +1005,12 @@ def create_dfs(column_formatting: str = "format"):
if items.get("sortby")
}
visualize_vertipaq(dfs, dataset_name, vertipaq_map, default_sort=default_sort)
return {
"Model Summary": dfs["Model"]["data"],
"Tables": dfs["Tables"]["data"],
"Partitions": dfs["Partitions"]["data"],
"Columns": dfs["Columns"]["data"],
"Relationships": dfs["Relationships"]["data"],
"Hierarchies": dfs["Hierarchies"]["data"],
}

return final_dict

# Export vertipaq to delta tables in lakehouse
if export == "table":
dfs = create_dfs(column_formatting="data_type")
#dfs = create_dfs(column_formatting="data_type")

Comment on lines +1013 to 1014

Copilot AI · Apr 14, 2026

There is commented-out code left in the export branch (#dfs = create_dfs(column_formatting="data_type")). Since the function’s return/export logic is being reworked, this should be either removed or reinstated with the correct implementation to avoid confusion and keep the export path maintainable.

Suggested change
#dfs = create_dfs(column_formatting="data_type")

print(
f"{icons.in_progress} Saving Vertipaq Analyzer to delta tables in the lakehouse...\n"
Expand Down Expand Up @@ -1020,7 +1041,7 @@ def create_dfs(column_formatting: str = "format"):
}

df_map = {
k: dfs[k]["data"]
k: final_dict[k]["data"]
for k in [
"Columns",
"Tables",