Add auto-measure generators: MeasuresFromColumns and PY measures #1149

Open
KornAlexander wants to merge 1 commit into microsoft:main from KornAlexander:feature/add-measures-auto

@KornAlexander

Add Auto-Generated Measures

Two functions that auto-generate measures from a semantic model's structure: one creates SUM/COUNT/etc. measures from columns based on their SummarizeBy property, and the other creates Prior Year (PY) time intelligence measures for existing measures.

Functions Added

| Function | Description |
| --- | --- |
| add_measures_from_columns(dataset, workspace=None, target_table=None, scan_only=False) | Creates measures from columns based on their SummarizeBy property (e.g., a column with SummarizeBy=Sum gets a SUM([Column]) measure). |
| add_py_measures(dataset, workspace=None, measures=None, calendar_table=None, date_column=None, target_table=None, scan_only=False) | Creates Prior Year (PY) time intelligence measures for each specified measure using CALCULATE + SAMEPERIODLASTYEAR. |
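
As a rough illustration of the first row, the sketch below mirrors the expression-building step quoted later in the review (the SummarizeBy value upper-cased and wrapped around a fully qualified column reference). The helper name `dax_for_column` is hypothetical and not part of the PR; it works on plain strings rather than TOM objects.

```python
from typing import Optional


def dax_for_column(table_name: str, column_name: str, summarize_by: str) -> Optional[str]:
    """Return a DAX aggregation for a column, or None when the column is skipped.

    Hypothetical helper for illustration only; the PR operates on TOM objects.
    """
    # The quoted implementation skips columns set to "None" or "Default".
    if summarize_by in ("None", "Default"):
        return None
    agg_fn = summarize_by.upper()
    return f"{agg_fn}('{table_name}'[{column_name}])"


print(dax_for_column("Sales", "Amount", "Sum"))
```

The table and column names here are made up; any SummarizeBy value other than "None" or "Default" is passed through unchanged apart from upper-casing.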

Files

  • src/sempy_labs/semantic_model/_Add_MeasuresFromColumns.py (new file)
  • src/sempy_labs/semantic_model/_Add_PYMeasures.py (new file)
  • src/sempy_labs/semantic_model/__init__.py (updated exports)

Usage

import sempy_labs as labs

# Preview which measures would be created from columns
labs.semantic_model.add_measures_from_columns("My Dataset", scan_only=True)

# Create measures and place them in a specific table
labs.semantic_model.add_measures_from_columns("My Dataset", target_table="Measures")

# Add PY measures for specific measures
labs.semantic_model.add_py_measures("My Dataset", measures=["Total Sales", "Total Cost"])

PBI Fixer Contribution — Overview

This PR is part of the PBI Fixer contribution to semantic-link-labs — an interactive ipywidgets-based UI for scanning and fixing Power BI reports and semantic models directly in Microsoft Fabric Notebooks.

The PBI Fixer provides a tabbed ipywidgets interface (Semantic Model Explorer, Report Explorer, Perspective Editor, Vertipaq Analyzer) that lets users interactively scan, inspect, and fix Power BI artifacts without leaving the notebook. All underlying fixer functions also work as standalone API calls, so users can integrate them into scripts and pipelines without the UI.

Contribution Structure

The full contribution (~17K lines across 68 files) is split into 22 focused PRs across 6 phases to keep each PR reviewable and self-contained. Only new files are added in Phases 1–4 and 6 — no existing SLL code is modified.

| Phase | Focus | PRs | Description |
| --- | --- | --- | --- |
| 1 | Report Fixers | 7 | Standalone functions that programmatically fix common Power BI report issues: replace pie charts with bar charts, standardize page sizes to Full HD, apply chart formatting best practices, migrate slicers to slicerbars, hide visual-level filters, clean up unused custom visuals, align visuals, migrate report-level measures, and upgrade reports to PBIR format. Each function operates on PBIR-format report definitions. |
| 2 | Semantic Model Fixers | 4 | Functions that fix and enhance semantic models via XMLA/TOM: add calculated calendar and measure tables, add calculation groups for units and time intelligence, discourage implicit measures, and 19 BPA auto-fixers covering formatting conventions, naming standards, data types, column visibility, sort order, and DAX patterns (e.g., use DIVIDE instead of /). |
| 3 | SM Setup & Analysis | 3 | Setup utilities: configure cache warming queries, set up incremental refresh policies, and prepare semantic models for AI/Copilot integration (descriptions, metadata enrichment). |
| 4 | Report Utilities | 3 | Report-level utilities: auto-generate report page prototypes from a semantic model's structure, extract and apply report themes, and generate IBCS-compliant variance charts. |
| 5 | Upstream Enhancements | 3 | ⚠️ These PRs modify existing SLL code (unlike Phases 1–4 which only add new files). Changes include TOM model .Find() fixes and expression capture (tom/_model.py), Vertipaq analyzer enhancements with memory/column-level analysis (_vertipaq.py, ~1000 lines changed), and various small fixes across _items.py, _item_recovery.py, _helper_functions.py, _export_report.py, _sql.py, and admin/_tenant.py. These carry higher merge conflict risk and may need closer review or discussion. |
| 6 | PBI Fixer UI | 2 | The interactive UI layer: shared UI components (theme, icons, tree builders, layout helpers), BPA scan runners, report helpers, and the main PBI Fixer application with its tabbed interface (SM Explorer, Report Explorer, Perspective Editor, Vertipaq Analyzer). Depends on Phases 1–5 but uses lazy imports to degrade gracefully if individual fixers aren't yet merged. |

Dependencies & Review Order

  • Phases 1–4 are fully independent — they only add new files and can be reviewed/merged in any order.
  • Phase 5 is also independent but modifies existing code, so it may benefit from early discussion.
  • Phase 6 (the UI) ties everything together. It depends on the earlier phases but works standalone via lazy imports.
  • All fixer functions work without the UI — they can be called directly as sempy_labs.report.fix_piecharts(...) or sempy_labs.semantic_model.add_calculated_calendar(...).
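
The lazy-import behaviour mentioned for Phase 6 could look roughly like the sketch below. This is an assumption about the general pattern, not the PR's actual UI code; `load_fixer` is a hypothetical helper name, and only `sempy_labs.report.fix_piecharts` is taken from the text above.

```python
import importlib
from typing import Callable, Optional


def load_fixer(module_path: str, func_name: str) -> Optional[Callable]:
    """Import a fixer on demand; return None if it is not available.

    Hypothetical helper: the real UI may organise its lazy imports differently.
    """
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        # The package or subpackage is missing (e.g. the PR is not merged yet).
        return None
    # The module exists but may not expose the function yet.
    return getattr(module, func_name, None)


fix_piecharts = load_fixer("sempy_labs.report", "fix_piecharts")
if fix_piecharts is None:
    print("fix_piecharts is not available; the UI could hide or disable that action.")
```

Resolving each fixer at call time keeps the UI importable even when only some of the earlier phases are installed.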

Copilot AI review requested due to automatic review settings April 9, 2026 16:26
Contributor

Copilot AI left a comment

Pull request overview

Adds new semantic model “fixer” utilities to auto-generate measures via TOM/XMLA: one based on column SummarizeBy settings and one generating Prior Year (PY) time-intelligence variants for existing measures.

Changes:

  • Added add_measures_from_columns() to create aggregation measures from columns and optionally hide source columns.
  • Added add_py_measures() to generate PY, variance, and highlight measures using CALCULATE + SAMEPERIODLASTYEAR.
  • (Per PR description) intended to update semantic_model exports/usage surface, but current codebase state still needs alignment.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| src/sempy_labs/semantic_model/_Add_MeasuresFromColumns.py | New function to generate measures from column summarization metadata. |
| src/sempy_labs/semantic_model/_Add_PYMeasures.py | New function to generate PY measure variants for selected/all measures. |

Comment on lines +11 to +19
@log
def add_py_measures(
    dataset: str | UUID,
    workspace: Optional[str | UUID] = None,
    measures: Optional[List[str]] = None,
    calendar_table: Optional[str] = None,
    date_column: Optional[str] = None,
    target_table: Optional[str] = None,
    scan_only: bool = False,

Copilot AI Apr 9, 2026

The PR description shows usage as labs.semantic_model.add_py_measures(...), but the current semantic_model/__init__.py does not export add_py_measures / add_measures_from_columns (and sempy_labs.__init__ doesn’t import the semantic_model subpackage). As-is, the documented import path won’t work unless callers import the module directly; please either update the relevant __init__.py exports/imports or adjust the documentation/usage examples accordingly.

Copilot uses AI. Check for mistakes.
Comment on lines +40 to +45
    calendar_table : str, default=None
        Name of the calendar/date table. If None, auto-detects
        by looking for a table with DataCategory="Time" or IsKey column.
    date_column : str, default=None
        Name of the date column. If None, auto-detects the key column.
    target_table : str, default=None

Copilot AI Apr 9, 2026

calendar_table auto-detection in the docstring mentions falling back to a table with an IsKey column, but the implementation only checks DataCategory == "Time". Either implement the documented behavior (e.g., prefer tom.all_date_tables() / look for tables with a key DateTime column) or update the docstring so users aren’t misled.
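
One way the documented fallback could be implemented is sketched below. The `Column` and `Table` dataclasses are hypothetical stand-ins for TOM objects so the logic can run outside Fabric; the attribute names and the "DateTime" type label are assumptions, not the PR's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    # Stand-in for a TOM column (hypothetical, for illustration).
    name: str
    data_type: str = "String"
    is_key: bool = False


@dataclass
class Table:
    # Stand-in for a TOM table (hypothetical, for illustration).
    name: str
    data_category: Optional[str] = None
    columns: List[Column] = field(default_factory=list)


def find_calendar_table(tables: List[Table]) -> Optional[Table]:
    """Prefer DataCategory == "Time", then fall back to a DateTime key column."""
    for t in tables:
        if t.data_category == "Time":
            return t
    for t in tables:
        if any(c.is_key and c.data_type == "DateTime" for c in t.columns):
            return t
    return None


model = [
    Table("Sales", columns=[Column("Amount", "Double")]),
    Table("Dates", columns=[Column("Date", "DateTime", is_key=True)]),
]
found = find_calendar_table(model)
print(found.name if found else "no calendar table")
```

With this ordering an explicitly marked date table always wins, and the key-column check only applies when no table carries the "Time" category.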

Comment on lines +105 to +111
        else:
            # Auto-detect measure table by name
            for t in tom.model.Tables:
                if "measure" in t.Name.lower():
                    dest_table_obj = t
                    print(f"{icons.info} Auto-detected measure table: '{t.Name}'")
                    break

Copilot AI Apr 9, 2026

Default target_table behavior in the docstring says new measures are added to the source measure’s table, but when target_table is None the code auto-detects a "measure" table and uses it for all variants. Please make the behavior match the docstring (or update the docstring).

Suggested change
-        else:
-            # Auto-detect measure table by name
-            for t in tom.model.Tables:
-                if "measure" in t.Name.lower():
-                    dest_table_obj = t
-                    print(f"{icons.info} Auto-detected measure table: '{t.Name}'")
-                    break
+        # When target_table is not provided, leave dest_table_obj as None so
+        # downstream logic uses each source measure's table by default.

Comment on lines +77 to +90
        # Auto-detect date column
        dt_col = None
        if date_column:
            dt_col = date_column
        else:
            for c in cal_table.Columns:
                if getattr(c, "IsKey", False):
                    dt_col = c.Name
                    break
            if dt_col is None:
                for c in cal_table.Columns:
                    if "date" in c.Name.lower():
                        dt_col = c.Name
                        break

Copilot AI Apr 9, 2026

When date_column is provided, it’s assigned directly to dt_col without verifying that the column exists on cal_table (or that it’s a DateTime/key column). This can create measures that reference a non-existent column and only fail later at query time; validate the column (and ideally its data type) and raise a clear error if invalid.
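
A minimal sketch of the validation suggested here, assuming the calendar table's columns are available as a name-to-type mapping. The function name `resolve_date_column`, the mapping shape, and the "DateTime" label are hypothetical; the auto-detect branch is deliberately simpler than the PR's key-column and name-based lookup.

```python
from typing import Dict, Optional


def resolve_date_column(
    table_name: str,
    column_types: Dict[str, str],
    date_column: Optional[str] = None,
) -> str:
    """Return a validated date column name, raising ValueError on bad input."""
    if date_column is None:
        # Simplified auto-detect: first DateTime column in the table.
        for name, data_type in column_types.items():
            if data_type == "DateTime":
                return name
        raise ValueError(f"No DateTime column found in table '{table_name}'.")
    if date_column not in column_types:
        raise ValueError(f"Column '{date_column}' not found in table '{table_name}'.")
    if column_types[date_column] != "DateTime":
        raise ValueError(
            f"Column '{table_name}'[{date_column}] has type "
            f"{column_types[date_column]}, expected DateTime."
        )
    return date_column


calendar_columns = {"Date": "DateTime", "Year": "Int64"}
print(resolve_date_column("Calendar", calendar_columns, "Date"))
```

Failing at this point surfaces the problem when the function is called rather than later at query time.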

Comment on lines +73 to +75
        if cal_table is None:
            print(f"{icons.red_dot} No calendar table found. Specify calendar_table parameter.")
            return 0

Copilot AI Apr 9, 2026

Error cases (e.g., no calendar table found) currently print(...) and return 0. In the rest of the codebase, invalid user input typically raises ValueError (often prefixed with icons.red_dot), which is easier for callers to detect/handle programmatically than a sentinel return value; consider raising instead of returning 0 here.
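
To show why raising is easier for callers to handle than a sentinel, here is a small sketch. `RED_DOT` is a placeholder for `icons.red_dot`, and `require_calendar_table` is a hypothetical helper, not a function in the PR.

```python
from typing import Optional

RED_DOT = "[error]"  # placeholder for icons.red_dot (assumed, not the real value)


def require_calendar_table(cal_table: Optional[str]) -> str:
    """Return the calendar table name or raise instead of returning 0."""
    if cal_table is None:
        raise ValueError(
            f"{RED_DOT} No calendar table found. Specify the calendar_table parameter."
        )
    return cal_table


try:
    require_calendar_table(None)
except ValueError as err:
    # A return value of 0 would be indistinguishable from "nothing to create".
    print(f"Caller can react to the failure: {err}")
```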

Comment on lines +131 to +137
            variants = [
                (f"{name} PY", f"CALCULATE([{name}], SAMEPERIODLASTYEAR('{cal_name}'[{dt_col}]))"),
                (f"{name} \u0394 PY", f"[{name}] - [{name} PY]"),
                (f"{name} \u0394 PY %", f"DIVIDE([{name}] - [{name} PY], [{name}])"),
                (f"{name} Max Green PY", f"IF([{name} \u0394 PY] > 0, MAX([{name}], [{name} PY]))"),
                (f"{name} Max Red AC", f"IF([{name} \u0394 PY] < 0, MAX([{name}], [{name} PY]))"),
            ]

Copilot AI Apr 9, 2026

All generated variants inherit the source measure’s FormatString, including the ratio measure {name} Δ PY %. Since that measure returns a percentage, inheriting a currency/decimal format string will typically display incorrect units. Consider applying a percent format string for the % measure (and potentially a separate format for the absolute variance measures) instead of reusing fmt for all variants.
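
One possible shape for this is to carry a format string alongside each variant, as in the sketch below. The expressions are copied from the quoted diff; the third tuple element, the `build_py_variants` name, and the "0.0%" percent format are assumptions for illustration.

```python
from typing import List, Tuple

PERCENT_FORMAT = "0.0%"  # assumed percent format string, not taken from the PR


def build_py_variants(
    name: str, cal_name: str, dt_col: str, fmt: str
) -> List[Tuple[str, str, str]]:
    """Return (measure name, DAX expression, format string) for each variant."""
    delta = "\u0394"
    return [
        (f"{name} PY",
         f"CALCULATE([{name}], SAMEPERIODLASTYEAR('{cal_name}'[{dt_col}]))", fmt),
        (f"{name} {delta} PY", f"[{name}] - [{name} PY]", fmt),
        # Ratio measure: use a percent format instead of the source format.
        (f"{name} {delta} PY %",
         f"DIVIDE([{name}] - [{name} PY], [{name}])", PERCENT_FORMAT),
        (f"{name} Max Green PY",
         f"IF([{name} {delta} PY] > 0, MAX([{name}], [{name} PY]))", fmt),
        (f"{name} Max Red AC",
         f"IF([{name} {delta} PY] < 0, MAX([{name}], [{name} PY]))", fmt),
    ]


for measure_name, expression, format_string in build_py_variants(
    "Total Sales", "Calendar", "Date", "#,0.00"
):
    print(f"{measure_name}: {format_string}")
```

Only the percentage variant departs from the source measure's format here; a separate format for the absolute variance could be added the same way.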

Comment on lines +20 to +70
    Creates measures from columns based on their SummarizeBy property.

    For each column where SummarizeBy is not "None", a measure is created
    using the appropriate aggregation (SUM, COUNT, MIN, MAX, etc.).
    The source column is hidden after measure creation.

    Parameters
    ----------
    dataset : str | uuid.UUID
        Name or ID of the semantic model.
    workspace : str | uuid.UUID, default=None
        The Fabric workspace name or ID.
    target_table : str, default=None
        Table to place new measures in. If None, measures are added to
        the same table as the source column.
    scan_only : bool, default=False
        If True, only reports what would be created without making changes.

    Returns
    -------
    int
        Number of measures created (or that would be created in scan mode).
    """
    from sempy_labs.tom import connect_semantic_model

    created = 0

    with connect_semantic_model(
        dataset=dataset, readonly=scan_only, workspace=workspace
    ) as tom:
        # Resolve target table if specified
        measures_table = None
        if target_table:
            measures_table = tom.model.Tables.Find(target_table)
            if measures_table is None:
                print(f"{icons.red_dot} Target table '{target_table}' not found.")
                return 0
        else:
            # Auto-detect measure table by name
            for t in tom.model.Tables:
                if "measure" in t.Name.lower():
                    measures_table = t
                    print(f"{icons.info} Auto-detected measure table: '{t.Name}'")
                    break

        for table in tom.model.Tables:
            for col in table.Columns:
                summarize_by = str(col.SummarizeBy) if hasattr(col, "SummarizeBy") else "None"
                if summarize_by == "None" or summarize_by == "Default":
                    continue

Copilot AI Apr 9, 2026

The docstring says measures are created for columns where SummarizeBy is not "None", but the implementation also skips SummarizeBy == "Default". Either include Default in the supported behavior (e.g., map it to the column’s default summarization) or update the docstring so it matches what the function actually does.

Comment on lines +65 to +74
        for table in tom.model.Tables:
            for col in table.Columns:
                summarize_by = str(col.SummarizeBy) if hasattr(col, "SummarizeBy") else "None"
                if summarize_by == "None" or summarize_by == "Default":
                    continue

                agg_fn = summarize_by.upper()
                measure_name = col.Name
                dax_expr = f"{agg_fn}('{table.Name}'[{col.Name}])"
                dest_table = measures_table or table

Copilot AI Apr 9, 2026

SummarizeBy values like "Count" are translated directly into COUNT('Table'[Column]). In DAX, COUNT only works for numeric/date columns; for text/boolean columns it needs COUNTA, and for distinct counts DISTINCTCOUNT is correct. Without checking col.DataType and choosing the correct aggregation function, this can generate measures that are invalid or error at query time for many common column types.
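
A sketch of the data-type check described in this comment follows. It applies the rule exactly as the comment states it (COUNT for numeric/date columns, COUNTA otherwise); the set of type names and the helper names are assumptions, and the exact type support of COUNT should be confirmed against the DAX documentation before relying on it.

```python
# Data type labels assumed to correspond to numeric/date TOM column types.
NUMERIC_OR_DATE_TYPES = {"Int64", "Double", "Decimal", "DateTime"}


def pick_aggregation(summarize_by: str, data_type: str) -> str:
    """Choose a DAX aggregation function from SummarizeBy and the column type."""
    if summarize_by == "Count":
        # Rule as stated in the review comment; verify against DAX docs.
        return "COUNT" if data_type in NUMERIC_OR_DATE_TYPES else "COUNTA"
    # Other values (Sum, Min, Max, Average, DistinctCount) map by upper-casing.
    return summarize_by.upper()


def build_expression(
    table_name: str, column_name: str, summarize_by: str, data_type: str
) -> str:
    agg_fn = pick_aggregation(summarize_by, data_type)
    return f"{agg_fn}('{table_name}'[{column_name}])"


print(build_expression("Customer", "Name", "Count", "String"))
print(build_expression("Sales", "Quantity", "Count", "Int64"))
```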

Comment on lines +89 to +99
                tom.add_measure(
                    table_name=dest_table.Name,
                    measure_name=measure_name,
                    expression=dax_expr,
                    format_string="0.0",
                    description=(
                        f"Auto-created {agg_fn} measure from column "
                        f"'{table.Name}'[{col.Name}]"
                    ),
                    display_folder=table.Name,
                )

Copilot AI Apr 9, 2026

All auto-created measures are assigned format_string="0.0", regardless of aggregation type or source column formatting. This can mis-format counts (should be integer) and overwrite existing numeric/currency/date formats that users expect. Consider deriving the format string from the source column (or choosing per-aggregation defaults) instead of hardcoding "0.0".
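
A per-aggregation default could be selected along these lines. The helper name `pick_format_string`, the whole-number format "#,0", and the choice to fall back to the PR's current "0.0" are all assumptions made for the sketch.

```python
from typing import Optional

COUNT_FUNCTIONS = {"COUNT", "COUNTA", "DISTINCTCOUNT"}


def pick_format_string(agg_fn: str, column_format: Optional[str]) -> str:
    """Pick a format string for an auto-created measure."""
    if agg_fn.upper() in COUNT_FUNCTIONS:
        # Counts are whole numbers regardless of the source column's format.
        return "#,0"
    if column_format:
        # Reuse the source column's format (currency, decimals, dates).
        return column_format
    # Fall back to the value the PR currently hardcodes.
    return "0.0"


print(pick_format_string("SUM", "#,0.00"))
print(pick_format_string("COUNT", "#,0.00"))
```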

Comment on lines +55 to +56
                print(f"{icons.red_dot} Target table '{target_table}' not found.")
                return 0

Copilot AI Apr 9, 2026

Similar to other invalid-parameter cases in this repo, returning 0 after printing an error (e.g., when target_table is not found) makes it hard for callers to distinguish “no work needed” from “failed due to bad input”. Consider raising ValueError for invalid target_table instead of printing + returning 0.

Suggested change
-                print(f"{icons.red_dot} Target table '{target_table}' not found.")
-                return 0
+                raise ValueError(
+                    f"Target table '{target_table}' not found."
+                )
