Add auto-measure generators: MeasuresFromColumns and PY measures #1149
KornAlexander wants to merge 1 commit into microsoft:main
Conversation
… and add PY time intelligence measures
Pull request overview
Adds new semantic model “fixer” utilities to auto-generate measures via TOM/XMLA: one based on column SummarizeBy settings and one generating Prior Year (PY) time-intelligence variants for existing measures.
Changes:
- Added `add_measures_from_columns()` to create aggregation measures from columns and optionally hide the source columns.
- Added `add_py_measures()` to generate PY, variance, and highlight measures using `CALCULATE` + `SAMEPERIODLASTYEAR`.
- (Per PR description) intended to update the semantic_model exports/usage surface, but the current codebase state still needs alignment.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| src/sempy_labs/semantic_model/_Add_MeasuresFromColumns.py | New function to generate measures from column summarization metadata. |
| src/sempy_labs/semantic_model/_Add_PYMeasures.py | New function to generate PY measure variants for selected/all measures. |
```python
@log
def add_py_measures(
    dataset: str | UUID,
    workspace: Optional[str | UUID] = None,
    measures: Optional[List[str]] = None,
    calendar_table: Optional[str] = None,
    date_column: Optional[str] = None,
    target_table: Optional[str] = None,
    scan_only: bool = False,
```
The PR description shows usage as labs.semantic_model.add_py_measures(...), but the current semantic_model/__init__.py does not export add_py_measures / add_measures_from_columns (and sempy_labs.__init__ doesn’t import the semantic_model subpackage). As-is, the documented import path won’t work unless callers import the module directly; please either update the relevant __init__.py exports/imports or adjust the documentation/usage examples accordingly.
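A minimal sketch of the missing wiring, assuming the module filenames given in the PR description; the actual `__init__.py` layout in semantic-link-labs may differ, so treat this as illustrative only:

```python
# src/sempy_labs/semantic_model/__init__.py  (hypothetical fragment)
from sempy_labs.semantic_model._Add_MeasuresFromColumns import (
    add_measures_from_columns,
)
from sempy_labs.semantic_model._Add_PYMeasures import add_py_measures

__all__ = [
    "add_measures_from_columns",
    "add_py_measures",
]
```

With exports like these in place, `labs.semantic_model.add_py_measures(...)` would resolve as documented, provided `sempy_labs/__init__.py` also imports the `semantic_model` subpackage.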
```python
    calendar_table : str, default=None
        Name of the calendar/date table. If None, auto-detects
        by looking for a table with DataCategory="Time" or IsKey column.
    date_column : str, default=None
        Name of the date column. If None, auto-detects the key column.
    target_table : str, default=None
```
calendar_table auto-detection in the docstring mentions falling back to a table with an IsKey column, but the implementation only checks DataCategory == "Time". Either implement the documented behavior (e.g., prefer tom.all_date_tables() / look for tables with a key DateTime column) or update the docstring so users aren’t misled.
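A sketch of the two-step detection the docstring promises: prefer a table with `DataCategory == "Time"`, then fall back to a table whose key column is a DateTime. The `Table`/`Column` dataclasses below are stand-ins for TOM metadata, and `detect_calendar_table` is an illustrative name, not the library's API:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Stand-ins for TOM table/column metadata; real code walks tom.model.Tables.
@dataclass
class Column:
    Name: str
    DataType: str = "String"
    IsKey: bool = False

@dataclass
class Table:
    Name: str
    DataCategory: str = ""
    Columns: List[Column] = field(default_factory=list)

def detect_calendar_table(tables: List[Table]) -> Optional[Table]:
    """Prefer DataCategory == "Time"; fall back to a table whose
    key column is a DateTime, as the docstring describes."""
    for t in tables:
        if t.DataCategory == "Time":
            return t
    for t in tables:
        if any(c.IsKey and c.DataType == "DateTime" for c in t.Columns):
            return t
    return None

tables = [
    Table("Sales", Columns=[Column("OrderDate", "DateTime")]),
    Table("Calendar", Columns=[Column("Date", "DateTime", IsKey=True)]),
]
print(detect_calendar_table(tables).Name)  # Calendar (via DateTime key fallback)
```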
```python
        else:
            # Auto-detect measure table by name
            for t in tom.model.Tables:
                if "measure" in t.Name.lower():
                    dest_table_obj = t
                    print(f"{icons.info} Auto-detected measure table: '{t.Name}'")
                    break
```
Default target_table behavior in the docstring says new measures are added to the source measure’s table, but when target_table is None the code auto-detects a "measure" table and uses it for all variants. Please make the behavior match the docstring (or update the docstring).
Suggested change:
```diff
-        else:
-            # Auto-detect measure table by name
-            for t in tom.model.Tables:
-                if "measure" in t.Name.lower():
-                    dest_table_obj = t
-                    print(f"{icons.info} Auto-detected measure table: '{t.Name}'")
-                    break
+        # When target_table is not provided, leave dest_table_obj as None so
+        # downstream logic uses each source measure's table by default.
```
```python
        # Auto-detect date column
        dt_col = None
        if date_column:
            dt_col = date_column
        else:
            for c in cal_table.Columns:
                if getattr(c, "IsKey", False):
                    dt_col = c.Name
                    break
            if dt_col is None:
                for c in cal_table.Columns:
                    if "date" in c.Name.lower():
                        dt_col = c.Name
                        break
```
When date_column is provided, it’s assigned directly to dt_col without verifying that the column exists on cal_table (or that it’s a DateTime/key column). This can create measures that reference a non-existent column and only fail later at query time; validate the column (and ideally its data type) and raise a clear error if invalid.
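A hedged sketch of the validation being suggested: resolve the named column on the calendar table and fail fast with `ValueError` when it is missing or not a DateTime column. The dataclasses mimic TOM objects and `resolve_date_column` is an illustrative helper name:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Column:
    Name: str
    DataType: str = "String"

@dataclass
class Table:
    Name: str
    Columns: List[Column] = field(default_factory=list)

def resolve_date_column(cal_table: Table, date_column: str) -> str:
    """Validate a user-supplied date column before building DAX against it."""
    match = next((c for c in cal_table.Columns if c.Name == date_column), None)
    if match is None:
        raise ValueError(
            f"Column '{date_column}' not found on table '{cal_table.Name}'."
        )
    if match.DataType != "DateTime":
        raise ValueError(
            f"Column '{date_column}' is {match.DataType}, expected DateTime."
        )
    return match.Name

cal = Table("Calendar", [Column("Date", "DateTime"), Column("Year", "Int64")])
print(resolve_date_column(cal, "Date"))  # Date
```

Failing at this point, rather than at query time inside a generated measure, makes the error attributable to the bad parameter.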
```python
        if cal_table is None:
            print(f"{icons.red_dot} No calendar table found. Specify calendar_table parameter.")
            return 0
```
Error cases (e.g., no calendar table found) currently print(...) and return 0. In the rest of the codebase, invalid user input typically raises ValueError (often prefixed with icons.red_dot), which is easier for callers to detect/handle programmatically than a sentinel return value; consider raising instead of returning 0 here.
```python
        variants = [
            (f"{name} PY", f"CALCULATE([{name}], SAMEPERIODLASTYEAR('{cal_name}'[{dt_col}]))"),
            (f"{name} \u0394 PY", f"[{name}] - [{name} PY]"),
            (f"{name} \u0394 PY %", f"DIVIDE([{name}] - [{name} PY], [{name}])"),
            (f"{name} Max Green PY", f"IF([{name} \u0394 PY] > 0, MAX([{name}], [{name} PY]))"),
            (f"{name} Max Red AC", f"IF([{name} \u0394 PY] < 0, MAX([{name}], [{name} PY]))"),
        ]
```
All generated variants inherit the source measure’s FormatString, including the ratio measure {name} Δ PY %. Since that measure returns a percentage, inheriting a currency/decimal format string will typically display incorrect units. Consider applying a percent format string for the % measure (and potentially a separate format for the absolute variance measures) instead of reusing fmt for all variants.
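One way to address this, sketched under the assumption that a percent-style format string is a sensible default for the ratio variant (the exact format is a choice, not the PR's behavior):

```python
def variant_format(variant_name: str, source_fmt: str) -> str:
    """Pick a format string per variant: the % variant gets a percent
    format; other variants inherit the source measure's format string."""
    if variant_name.endswith("%"):
        return "0.0%;-0.0%;0.0%"
    return source_fmt

fmt = "#,0.00"  # example source measure FormatString
for v in ["Sales PY", "Sales \u0394 PY", "Sales \u0394 PY %"]:
    print(v, "->", variant_format(v, fmt))
```

A further refinement could give the absolute variance variants their own signed format, but inheriting the source format there is usually acceptable.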
```python
    Creates measures from columns based on their SummarizeBy property.

    For each column where SummarizeBy is not "None", a measure is created
    using the appropriate aggregation (SUM, COUNT, MIN, MAX, etc.).
    The source column is hidden after measure creation.

    Parameters
    ----------
    dataset : str | uuid.UUID
        Name or ID of the semantic model.
    workspace : str | uuid.UUID, default=None
        The Fabric workspace name or ID.
    target_table : str, default=None
        Table to place new measures in. If None, measures are added to
        the same table as the source column.
    scan_only : bool, default=False
        If True, only reports what would be created without making changes.

    Returns
    -------
    int
        Number of measures created (or that would be created in scan mode).
    """
    from sempy_labs.tom import connect_semantic_model

    created = 0

    with connect_semantic_model(
        dataset=dataset, readonly=scan_only, workspace=workspace
    ) as tom:
        # Resolve target table if specified
        measures_table = None
        if target_table:
            measures_table = tom.model.Tables.Find(target_table)
            if measures_table is None:
                print(f"{icons.red_dot} Target table '{target_table}' not found.")
                return 0
        else:
            # Auto-detect measure table by name
            for t in tom.model.Tables:
                if "measure" in t.Name.lower():
                    measures_table = t
                    print(f"{icons.info} Auto-detected measure table: '{t.Name}'")
                    break

        for table in tom.model.Tables:
            for col in table.Columns:
                summarize_by = str(col.SummarizeBy) if hasattr(col, "SummarizeBy") else "None"
                if summarize_by == "None" or summarize_by == "Default":
                    continue
```
The docstring says measures are created for columns where SummarizeBy is not "None", but the implementation also skips SummarizeBy == "Default". Either include Default in the supported behavior (e.g., map it to the column’s default summarization) or update the docstring so it matches what the function actually does.
```python
        for table in tom.model.Tables:
            for col in table.Columns:
                summarize_by = str(col.SummarizeBy) if hasattr(col, "SummarizeBy") else "None"
                if summarize_by == "None" or summarize_by == "Default":
                    continue

                agg_fn = summarize_by.upper()
                measure_name = col.Name
                dax_expr = f"{agg_fn}('{table.Name}'[{col.Name}])"
                dest_table = measures_table or table
```
SummarizeBy values like "Count" are translated directly into COUNT('Table'[Column]). In DAX, COUNT only works for numeric/date columns; for text/boolean columns it needs COUNTA, and for distinct counts DISTINCTCOUNT is correct. Without checking col.DataType and choosing the correct aggregation function, this can generate measures that are invalid or error at query time for many common column types.
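A sketch of data-type-aware aggregation selection along these lines. The type names follow TOM's `DataType` enum, but the helper name and the exact type set are assumptions:

```python
NUMERIC_OR_DATE = {"Int64", "Double", "Decimal", "DateTime"}

def dax_aggregation(summarize_by: str, data_type: str) -> str:
    """Map a SummarizeBy value to a DAX aggregation that is valid for the
    column's data type: COUNT only works on numeric/date columns, so
    text/boolean columns get COUNTA; DistinctCount maps to DISTINCTCOUNT."""
    s = summarize_by.lower()
    if s == "distinctcount":
        return "DISTINCTCOUNT"
    if s == "count":
        return "COUNT" if data_type in NUMERIC_OR_DATE else "COUNTA"
    return summarize_by.upper()  # Sum, Min, Max, Average -> SUM, MIN, ...

expr = f"{dax_aggregation('Count', 'String')}('Sales'[Category])"
print(expr)  # COUNTA('Sales'[Category])
```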
```python
                tom.add_measure(
                    table_name=dest_table.Name,
                    measure_name=measure_name,
                    expression=dax_expr,
                    format_string="0.0",
                    description=(
                        f"Auto-created {agg_fn} measure from column "
                        f"'{table.Name}'[{col.Name}]"
                    ),
                    display_folder=table.Name,
                )
```
All auto-created measures are assigned format_string="0.0", regardless of aggregation type or source column formatting. This can mis-format counts (should be integer) and overwrite existing numeric/currency/date formats that users expect. Consider deriving the format string from the source column (or choosing per-aggregation defaults) instead of hardcoding "0.0".
```python
            print(f"{icons.red_dot} Target table '{target_table}' not found.")
            return 0
```
Similar to other invalid-parameter cases in this repo, returning 0 after printing an error (e.g., when target_table is not found) makes it hard for callers to distinguish “no work needed” from “failed due to bad input”. Consider raising ValueError for invalid target_table instead of printing + returning 0.
Suggested change:
```diff
-            print(f"{icons.red_dot} Target table '{target_table}' not found.")
-            return 0
+            raise ValueError(
+                f"Target table '{target_table}' not found."
+            )
```
Add Auto-Generated Measures

Two functions that auto-generate measures from a semantic model's structure: one creates SUM/COUNT/etc. measures from columns based on their `SummarizeBy` property, and the other creates Prior Year (PY) time intelligence measures for existing measures.

Functions Added

- `add_measures_from_columns(dataset, workspace=None, target_table=None, scan_only=False)` — creates an aggregation measure (e.g. a `SUM([Column])` measure) per summarizable column.
- `add_py_measures(dataset, workspace=None, measures=None, calendar_table=None, date_column=None, target_table=None, scan_only=False)`

Files

- `src/sempy_labs/semantic_model/_Add_MeasuresFromColumns.py` (new file)
- `src/sempy_labs/semantic_model/_Add_PYMeasures.py` (new file)
- `src/sempy_labs/semantic_model/__init__.py` (updated exports)

Usage
PBI Fixer Contribution — Overview
This PR is part of the PBI Fixer contribution to semantic-link-labs — an interactive ipywidgets-based UI for scanning and fixing Power BI reports and semantic models directly in Microsoft Fabric Notebooks.
The PBI Fixer provides a tabbed ipywidgets interface (Semantic Model Explorer, Report Explorer, Perspective Editor, Vertipaq Analyzer) that lets users interactively scan, inspect, and fix Power BI artifacts without leaving the notebook. All underlying fixer functions also work as standalone API calls, so users can integrate them into scripts and pipelines without the UI.
Contribution Structure
The full contribution (~17K lines across 68 files) is split into 22 focused PRs across 6 phases to keep each PR reviewable and self-contained. Only new files are added in Phases 1–4 and 6 — no existing SLL code is modified.
The changes to existing code include `.Find()` fixes and expression capture (`tom/_model.py`), Vertipaq analyzer enhancements with memory/column-level analysis (`_vertipaq.py`, ~1000 lines changed), and various small fixes across `_items.py`, `_item_recovery.py`, `_helper_functions.py`, `_export_report.py`, `_sql.py`, and `admin/_tenant.py`. These carry higher merge conflict risk and may need closer review or discussion.

Dependencies & Review Order
Each fixer function also works as a standalone call, e.g. `sempy_labs.report.fix_piecharts(...)` or `sempy_labs.semantic_model.add_calculated_calendar(...)`.