Skip to content

Fix TOM model: .Find() fixes and expression capture#1158

Open
KornAlexander wants to merge 1 commit intomicrosoft:mainfrom
KornAlexander:feature/tom-model-fixes
Open

Fix TOM model: .Find() fixes and expression capture#1158
KornAlexander wants to merge 1 commit intomicrosoft:mainfrom
KornAlexander:feature/tom-model-fixes

Conversation

@KornAlexander
Copy link
Copy Markdown

TOM Model Fixes

⚠️ This PR modifies an existing SLL file (tom/_model.py). Unlike Phases 1–4 which only add new files, this changes existing code and may carry merge conflict risk.

Fixes and improvements to the TOM (Tabular Object Model) wrapper class in tom/_model.py. These changes were needed to support reliable model inspection and modification by the PBI Fixer UI and SM fixers.

Changes (~67 lines)

  • .Find() method fixes: Uses .Find() instead of bracket indexing for safer collection lookups that don't throw on missing items.
  • Expression capture: Captures and exposes table/partition expressions for display in the SM Explorer tree.
  • Minor robustness improvements: Better handling of edge cases in collection iteration.

Files

  • src/sempy_labs/tom/_model.py (modified)

Why These Changes Are Needed

The SM fixers and PBI Fixer UI rely on safely inspecting model objects (tables, columns, measures, relationships). The upstream .Find() pattern is more robust than bracket indexing when items may not exist. Expression capture is needed for the SM Explorer to display partition expressions and table definitions.


PBI Fixer Contribution — Overview

This PR is part of the PBI Fixer contribution to semantic-link-labs — an interactive ipywidgets-based UI for scanning and fixing Power BI reports and semantic models directly in Microsoft Fabric Notebooks.

The PBI Fixer provides a tabbed ipywidgets interface (Semantic Model Explorer, Report Explorer, Perspective Editor, Vertipaq Analyzer) that lets users interactively scan, inspect, and fix Power BI artifacts without leaving the notebook. All underlying fixer functions also work as standalone API calls, so users can integrate them into scripts and pipelines without the UI.

Contribution Structure

The full contribution (~17K lines across 68 files) is split into 22 focused PRs across 6 phases to keep each PR reviewable and self-contained. Only new files are added in Phases 1–4 and 6 — no existing SLL code is modified.

Phase Focus PRs Description
1 Report Fixers 7 Standalone functions that programmatically fix common Power BI report issues: replace pie charts with bar charts, standardize page sizes to Full HD, apply chart formatting best practices, migrate slicers to slicerbars, hide visual-level filters, clean up unused custom visuals, align visuals, migrate report-level measures, and upgrade reports to PBIR format. Each function operates on PBIR-format report definitions.
2 Semantic Model Fixers 4 Functions that fix and enhance semantic models via XMLA/TOM: add calculated calendar and measure tables, add calculation groups for units and time intelligence, discourage implicit measures, and 19 BPA auto-fixers covering formatting conventions, naming standards, data types, column visibility, sort order, and DAX patterns (e.g., use DIVIDE instead of /).
3 SM Setup & Analysis 3 Setup utilities: configure cache warming queries, set up incremental refresh policies, and prepare semantic models for AI/Copilot integration (descriptions, metadata enrichment).
4 Report Utilities 3 Report-level utilities: auto-generate report page prototypes from a semantic model's structure, extract and apply report themes, and generate IBCS-compliant variance charts.
5 Upstream Enhancements 3 ⚠️ These PRs modify existing SLL code (unlike Phases 1–4 which only add new files). Changes include TOM model .Find() fixes and expression capture (tom/_model.py), Vertipaq analyzer enhancements with memory/column-level analysis (_vertipaq.py, ~1000 lines changed), and various small fixes across _items.py, _item_recovery.py, _helper_functions.py, _export_report.py, _sql.py, and admin/_tenant.py. These carry higher merge conflict risk and may need closer review or discussion.
6 PBI Fixer UI 2 The interactive UI layer: shared UI components (theme, icons, tree builders, layout helpers), BPA scan runners, report helpers, and the main PBI Fixer application with its tabbed interface (SM Explorer, Report Explorer, Perspective Editor, Vertipaq Analyzer). Depends on Phases 1–5 but uses lazy imports to degrade gracefully if individual fixers aren't yet merged.

Dependencies & Review Order

  • Phases 1–4 are fully independent — they only add new files and can be reviewed/merged in any order.
  • Phase 5 is also independent but modifies existing code, so it may benefit from early discussion.
  • Phase 6 (the UI) ties everything together. It depends on the earlier phases but works standalone via lazy imports.
  • All fixer functions work without the UI — they can be called directly as sempy_labs.report.fix_piecharts(...) or sempy_labs.semantic_model.add_calculated_calendar(...).

Copilot AI review requested due to automatic review settings April 9, 2026 16:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR modifies the TOM model wrapper (src/sempy_labs/tom/_model.py), primarily affecting how Vertipaq statistics are persisted as annotations and how incremental refresh rewrites M partition expressions.

Changes:

  • Reduced the set of Vertipaq annotations written to tables/columns/relationships.
  • Updated incremental refresh logic to rewrite M partition expressions via p.Source.Expression using a regex-based insertion of a new step.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 3214 to 3220
row = dfT_filt.iloc[0]
rowCount = str(row["Row Count"])
totalSize = str(row["Total Size"])
dict_size = str(row["Dictionary Size"])
data_size = str(row["Data Size"])
hierarchy_size = str(row["Hierarchy Size"])
user_hierarchy_size = str(row["User Hierarchy Size"])
relationship_size = str(row["Relationship Size"])
pct_db = str(row["% DB"])

self.set_annotation(object=t, name="Vertipaq_RowCount", value=rowCount)
self.set_annotation(
object=t, name="Vertipaq_TotalSize", value=totalSize
)
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_vertipaq_annotations() no longer sets several table-level Vertipaq annotations (e.g., Vertipaq_DataSize, Vertipaq_DictionarySize, Vertipaq_HierarchySize, Vertipaq_RelationshipSize, Vertipaq_%DB). semantic_model/_vertipaq_analyzer.py reads these annotations for its output, so this change will regress analyzer results (values become None/0). Please restore these annotations here or update the analyzer to not depend on them.

Copilot uses AI. Check for mistakes.
Comment on lines 3226 to 3232
row = dfC_filt.iloc[0]
totalSize = str(row["Total Size"])
dataSize = str(row["Data Size"])
dictSize = str(row["Dictionary Size"])
hierSize = str(row["Hierarchy Size"])
is_resident = str(row["Is Resident"])
temp = str(row["Temperature"])
card = str(row["Column Cardinality"])
last_accessed = str(row["Last Accessed"])
self.set_annotation(
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_vertipaq_annotations() stopped writing column annotations Vertipaq_IsResident, Vertipaq_Temperature, and Vertipaq_LastAccessed, but semantic_model/_vertipaq_analyzer.py still expects these when building its column stats. This will lead to missing/incorrect analyzer fields. Either re-add these annotations (if the source data still exists) or adjust the analyzer to handle their absence explicitly.

Copilot uses AI. Check for mistakes.
Comment on lines 3274 to 3278
for r in self.model.Relationships:
dfR_filt = dfR[dfR["Relationship Name"] == r.Name]
if not dfR_filt.empty:
relSize = str(dfR_filt["Used Size"].iloc[0])
mult = str(dfR_filt["Multiplicity"].iloc[0])
self.set_annotation(object=r, name="Vertipaq_UsedSize", value=relSize)
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relationship annotation Vertipaq_Multiplicity is no longer set here, but semantic_model/_vertipaq_analyzer.py reads it when generating relationship diagnostics. Please keep setting this annotation (or update the analyzer to not rely on it) to avoid regressing analyzer output.

Copilot uses AI. Check for mistakes.
Comment on lines +4043 to 4046
# Find the last "in <identifier>" block
match = re.search(r"in\s+(\S.*?)$", text, re.DOTALL)
if not match:
raise ValueError(f"{icons.red_dot} Invalid M-partition expression.")
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex used to locate the final in ... clause (re.search(r"in\s+(\S.*?)$", ...)) will match the first occurrence of in in the expression, not the last, which can corrupt the rewritten M query if in appears earlier (common in strings/identifiers). Consider matching the final in keyword explicitly (e.g., anchor to line start and/or use a greedy prefix / last match via finditer) so the inserted step is placed correctly.

Copilot uses AI. Check for mistakes.
Comment on lines 4032 to +4041
# Update partition expression
i = 0
for p in t.Partitions:
if p.SourceType != TOM.PartitionSourceType.M:
raise ValueError(
f"{icons.red_dot} Invalid partition source type. Incremental refresh can only be set up if the table's partition is an M-partition."
)

elif i == 0:
text = p.Expression
text = text.rstrip()
text = p.Source.Expression.rstrip()
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description mentions “.Find() method fixes” and “expression capture”, but the diff in this file only changes Vertipaq annotation writing and incremental refresh M-expression rewriting; there are no .Find(...) call sites or new expression-capture surfaces in tom/_model.py. Please update the PR description to reflect the actual changes, or include the missing intended changes.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants