
perf: denormalize ownership org unit onto tracker event for SELECTED mode [DHIS2-21778]#23507

Draft
jason-p-pickering wants to merge 4 commits into master from perf-oumode-selected-master

jason-p-pickering commented Apr 3, 2026

User story

A school administrator opens the attendance client and selects their school. They expect to see recent attendance records for all students in a given class (represented as an organisation unit). This "start from the organisation unit" workflow is natural for attendance, outreach, and similar use cases where the point of interest is not a specific tracked entity but the events of all tracked entities (TEIs) owned by a particular organisation unit.

This differs from the Tracker Capture model, which is enrollment-centric: you find a tracked entity first, then look at their events. The attendance client (and others like it) inverts that: the org unit is the entry point, and the question is "what events are owned by this school/facility?"

Problem

The performance problem is that tracker events are linked to their owning organisation unit only indirectly, through the enrollment and an ownership join table. The tracker event itself has an explicitly assigned organisation unit (where the event occurred), but that column is not used when exporting tracker program events queried for a particular organisation unit. What matters is the owning org unit of the TEI, not where the event actually took place.

For tracker programs, effective org unit ownership is stored in trackedentityprogramowner (TPO), not on the event row itself. A recent performance optimization for events without registration allows efficient filtering by organisation unit, e.g. "give me all events for org unit X". For tracker events, however, every orgUnitMode=SELECTED query must traverse:

trackerevent → enrollment → trackedentityprogramowner → organisationunit

This join chain prevents index use on the event table and produces query times of 30+ seconds on production-sized databases. This ceiling exists in master regardless of the other optimizations proposed for earlier versions.

Approach

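Store the ownership org unit directly on each tracker event, in a new indexed ownerorganisationunitid column. The application layer (TrackerObjectsMapper), which is the primary write path, resolves the owner org unit from the preheat's TPO map at import time, falling back to the event's own org unit when no TPO record exists.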
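The resolution rule can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the PR's code: `OwnerOrgUnitResolver`, `OwnerKey` and `resolve` are hypothetical names, org units are plain UID strings rather than preheat entities, and the preheat's TPO map is assumed to be keyed by the (tracked entity, program) pair.

```java
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical sketch of import-time owner resolution. Names are illustrative;
 * the real TrackerObjectsMapper works on preheat entities, not UID strings.
 */
public class OwnerOrgUnitResolver {

  /** Assumed lookup key: one TPO owner per (tracked entity, program) pair. */
  record OwnerKey(String trackedEntity, String program) {}

  /**
   * Returns the TPO owner org unit if the preheat holds one for this tracked
   * entity and program, otherwise the org unit where the event occurred.
   */
  static String resolve(
      Map<OwnerKey, String> preheatOwners,
      String trackedEntity,
      String program,
      String eventOrgUnit) {
    Objects.requireNonNull(eventOrgUnit, "event org unit is required");
    String owner = preheatOwners.get(new OwnerKey(trackedEntity, program));
    return owner != null ? owner : eventOrgUnit;
  }

  public static void main(String[] args) {
    Map<OwnerKey, String> owners = Map.of(new OwnerKey("teA", "attendance"), "schoolX");
    // A TPO record exists: the event is stamped with the owning school.
    System.out.println(resolve(owners, "teA", "attendance", "clinicY")); // schoolX
    // No TPO record: fall back to the event's own org unit.
    System.out.println(resolve(owners, "teB", "attendance", "clinicY")); // clinicY
  }
}
```

The same rule, owner if present else event org unit, is what the migration's COALESCE backfill expresses in SQL.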

orgUnitMode=SELECTED can now be evaluated with a direct equality predicate on trackerevent.ownerorganisationunitid, enabling an index scan without the join chain.
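To make the predicate change concrete, here is a hedged sketch that only builds the two SQL filter fragments as strings. The table names and the `ev` alias come from this PR; the join column names are assumptions, and the real query in JdbcTrackerEventStore still joins enrollment and tracked entity for access control, which is omitted here. For a single selected org unit, `in (:orgUnits)` reduces to the equality predicate described above.

```java
/**
 * Hypothetical sketch of the org unit filter for orgUnitMode=SELECTED. Only the
 * filter is shown; access-control joins are omitted. Join column names are
 * assumptions, not copied from the PR.
 */
public class SelectedModeFilter {

  /** Master: the owner is only reachable through enrollment and TPO. */
  static String viaOwnershipJoin() {
    return String.join(
        "\n",
        "from trackerevent ev",
        "join enrollment en on en.enrollmentid = ev.enrollmentid",
        "join trackedentityprogramowner po",
        "  on po.trackedentityid = en.trackedentityid and po.programid = en.programid",
        "join organisationunit ou on ou.organisationunitid = po.organisationunitid",
        "where ou.organisationunitid in (:orgUnits)");
  }

  /** This PR: the filter is a predicate on an indexed column of the event row. */
  static String viaDenormalizedOwner() {
    return String.join(
        "\n",
        "from trackerevent ev",
        "where ev.ownerorganisationunitid in (:orgUnits)");
  }

  public static void main(String[] args) {
    System.out.println(viaOwnershipJoin());
    System.out.println();
    System.out.println(viaDenormalizedOwner());
  }
}
```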

Changes

  • Migration V2_43_57: adds the ownerorganisationunitid column, backfills it from TPO (COALESCE fallback to organisationunitid), and adds a foreign key and an index.
  • TrackerEvent: new ownerOrganisationUnit field + Hibernate mapping
  • TrackerObjectsMapper: populates ownerOrganisationUnit from preheat TPO on import
  • JdbcTrackerEventStore: SELECTED mode uses ev.ownerorganisationunitid directly instead of the join chain
  • TrackerOrgUnitMergeHandler: migrates ownerOrganisationUnit during org unit merges
  • TrackerObjectsMapperTest: unit tests for the three TPO resolution cases
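The merge handler's job can be illustrated with an in-memory model. This is a hypothetical sketch, not TrackerOrgUnitMergeHandler itself: `OwnerOrgUnitMergeSketch`, `EventRow` and `merge` are illustrative names, and the real handler presumably issues database updates rather than looping over rows. The point it shows is that the occurrence org unit and the owner org unit can differ, so each column has to be repointed independently.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the org unit merge rule for tracker events, modelled
 * in memory to show that both org unit columns are repointed independently.
 */
public class OwnerOrgUnitMergeSketch {

  /** Stand-in for a trackerevent row; only the two org unit columns are modelled. */
  static final class EventRow {
    String orgUnit; // organisationunitid: where the event occurred
    String ownerOrgUnit; // ownerorganisationunitid: denormalized TPO owner

    EventRow(String orgUnit, String ownerOrgUnit) {
      this.orgUnit = orgUnit;
      this.ownerOrgUnit = ownerOrgUnit;
    }
  }

  /** Repoints any reference to a merge source to the target; returns owners changed. */
  static int merge(List<EventRow> events, Set<String> sources, String target) {
    int ownersMoved = 0;
    for (EventRow ev : events) {
      if (sources.contains(ev.orgUnit)) {
        ev.orgUnit = target;
      }
      // Checked separately: the owner may differ from where the event occurred.
      if (sources.contains(ev.ownerOrgUnit)) {
        ev.ownerOrgUnit = target;
        ownersMoved++;
      }
    }
    return ownersMoved;
  }

  public static void main(String[] args) {
    List<EventRow> events =
        List.of(
            new EventRow("ouA", "ouA"), // both columns point at a source
            new EventRow("ouC", "ouB"), // only the owner points at a source
            new EventRow("ouC", "ouC")); // untouched
    System.out.println(merge(events, Set.of("ouA", "ouB"), "ouT")); // 2
  }
}
```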

Performance

PR explain: https://explain.depesz.com/s/qzIR
Master explain: https://explain.depesz.com/s/UHb9

Master includes a partial fix: an org unit predicate applied through the trackedentityprogramowner ownership join (ou.organisationunitid in (:orgUnits)), which can guide the planner toward a TPO-first access path. The limitation is structural.

The OU filter still depends on the join chain (event → enrollment → trackedentityprogramowner → organisationunit), so the planner must traverse every student enrolled at the org unit and scan their full event history before the stage filter can be applied. Cost scales with the enrolled population multiplied by events per enrollment, and critically, it is the event side that keeps growing.

On the production database used for this analysis there are 95.8M events across 2.4M enrollments (a 40:1 ratio), with a median of 37 events per enrollment and growth of approximately 1.2M events per school day. That per-enrollment scan cost increases every day, for every school, regardless of how many results are ultimately returned. At production scale the join-chain predicate also does not guarantee an index-driven plan: when the planner's cost estimates favor a different join order, the OU filter cannot be pushed into the access path and a full table scan becomes possible.
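The ratios follow directly from the production figures above. The second line additionally assumes the enrollment count stays roughly constant, which is a simplification rather than something measured here.

```math
\begin{aligned}
\frac{95.8 \times 10^{6}\ \text{events}}{2.4 \times 10^{6}\ \text{enrollments}} &\approx 39.9\ \text{events per enrollment (mean)} \\
\frac{1.2 \times 10^{6}\ \text{events/school day}}{2.4 \times 10^{6}\ \text{enrollments}} &= 0.5\ \text{events per enrollment per school day}
\end{aligned}
```

On average, then, each enrollment's history, and with it the per-enrollment scan in the join-chain plan, grows by about one event every two school days.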

The effect is visible in the query plans: even on the small Sierra Leone (SL) database, master reads 138 buffer pages from disk while traversing 40 enrolled students to return 4 events, whereas this PR reads none from disk. A direct bitmap index scan on in_trackerevent_owner_organisationunitid resolves all 55 candidates from cache and applies the stage filter immediately. The 1ms timing difference on SL is noise; the structural difference is not.

This PR stores the ownership org unit directly on the event row (ownerorganisationunitid) and indexes it. The OU filter becomes a direct bitmap index scan independent of any join, with cost proportional to matching events at the org unit rather than to the enrolled population or their accumulated history. Enrollment and TEI joins are still evaluated for access control but play no role in locating the events.

The tradeoff is write-path maintenance: ownerorganisationunitid must be updated when program ownership is transferred between org units. Transfers are a relatively infrequent administrative operation, and their cost is bounded and one-off: a small price compared with the continuously growing event volume whose queries the index makes fast.
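What a transfer has to do to the denormalized column can be sketched as follows. This is a hypothetical illustration only: `OwnershipTransferSketch`, `EventRow` and `transfer` are invented names, events are assumed to carry their tracked entity and program directly, and the sketch does not describe how or where this PR performs the update.

```java
import java.util.List;

/**
 * Hypothetical sketch of keeping ownerorganisationunitid in sync when program
 * ownership of one (tracked entity, program) pair moves to a new org unit.
 */
public class OwnershipTransferSketch {

  /** Stand-in for a trackerevent row with the fields the transfer needs. */
  static final class EventRow {
    final String trackedEntity;
    final String program;
    String ownerOrgUnit;

    EventRow(String trackedEntity, String program, String ownerOrgUnit) {
      this.trackedEntity = trackedEntity;
      this.program = program;
      this.ownerOrgUnit = ownerOrgUnit;
    }
  }

  /**
   * Re-stamps the owner on every event of the transferred pair and returns the
   * number of rows changed. Only that pair's rows change; the loop visits all
   * rows only because this is an in-memory model.
   */
  static int transfer(
      List<EventRow> events, String trackedEntity, String program, String newOwner) {
    int updated = 0;
    for (EventRow ev : events) {
      boolean samePair = ev.trackedEntity.equals(trackedEntity) && ev.program.equals(program);
      if (samePair && !newOwner.equals(ev.ownerOrgUnit)) {
        ev.ownerOrgUnit = newOwner;
        updated++;
      }
    }
    return updated;
  }

  public static void main(String[] args) {
    List<EventRow> events =
        List.of(
            new EventRow("teA", "attendance", "schoolX"),
            new EventRow("teA", "attendance", "schoolX"),
            new EventRow("teB", "attendance", "schoolX"));
    System.out.println(transfer(events, "teA", "attendance", "schoolZ")); // 2
  }
}
```

Because the update touches only the transferred pair's events, its cost is proportional to one enrollment history rather than to the whole event table, which is the sense in which the tradeoff is bounded.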

Performance comparison

Performance testing was conducted against the Gatling tracker test suite, comparing this branch to master. The results show an apparent ~20–28% regression in ANC event list queries and ~31% in Birth event searches, but these figures are misleading, for two reasons. First, ANC (lxAQ7Zs9VYR) is a single event program and goes through JdbcSingleEventStore/singleevent, a code path this branch does not touch, so any difference there is noise.

Second, for the Birth event scenario (the actual tracker program SELECTED query), the test database concentrates all synthetic import data into a single org unit (DiszpKrYNg8), giving ownerorganisationunitid ~21% selectivity. At that selectivity the planner ignores the new index and falls back to an enrollment-driven nested loop (20,711 per-enrollment lookups on trackerevent), which is marginally slower due to the slightly wider row.

When the TPO join is removed so the planner is free to enter through in_trackerevent_owner_organisationunitid, execution time halves even on this small dataset (53ms → 24ms), confirming that the index and query structure are sound. The real benefit will only be measurable on a production-scale database where events are distributed across thousands of org units and ownerorganisationunitid = X is highly selective. The Gatling test database is structurally unsuitable for validating this optimization.

Notes

This is a POC. Portions of this PR were developed with AI 🤖

jason-p-pickering requested a review from a team as a code owner April 3, 2026 13:36
jason-p-pickering marked this pull request as draft April 3, 2026 13:36
jason-p-pickering added the run-perf-tests label (Enables performance tests) Apr 3, 2026
jason-p-pickering force-pushed the perf-oumode-selected-master branch from 801a823 to 6f01be3 Apr 3, 2026 17:08
jason-p-pickering added the publish-docker-image label (Publish to https://hub.docker.com/r/dhis2/core-pr/tags TAG=$PR_NUMBER) Apr 3, 2026

sonarqubecloud bot commented Apr 7, 2026

jason-p-pickering removed the run-perf-tests label (Enables performance tests) Apr 8, 2026