perf: denormalize ownership org unit onto tracker event for SELECTED mode [DHIS2-21778] #23507
Draft
jason-p-pickering wants to merge 4 commits into master



User story
A school administrator opens the attendance client and selects their school. They expect to see recent attendance records for all students in a given class (represented as an organisation unit). This is a natural "start from the organisation unit" workflow, suited to attendance, outreach, and similar use cases where the point of interest is not a specific tracked entity but the events for all TEIs owned by a particular organisation unit.
This differs from the Tracker Capture model, which is enrollment-centric: you find a tracked entity first, then look at their events. The attendance client (and others like it) inverts that. The org unit is the entry point, and the question is "what events are owned by this school/facility?"
Problem
The performance issue is that tracker events are related to their owning organisation unit only via a join table. The tracker event itself has an organisation unit explicitly assigned to it (where the event occurred), but for tracker programs this is not what is used when exporting events queried for a particular organisation unit. It is the owning org unit of the TEI that matters, not where the event actually occurred.
For tracker programs, effective org unit ownership is stored in trackedentityprogramowner (TPO), not on the event row itself. A recent performance optimization was made for events without registration which allows effective filtering by organisation unit, e.g. "Give me all events for Orgunit X". However, for tracker events, every orgUnitMode=SELECTED query must traverse:
trackerevent → enrollment → trackedentityprogramowner → organisationunit
This join chain prevents index use on the event table and produces query times of many seconds (30+) on production-sized databases. The ceiling exists in master regardless of other optimizations proposed for earlier versions.
Approach
The application layer (TrackerObjectsMapper) is the primary write path and resolves the owner OU from the preheat's TPO map at import time, falling back to the event's own org unit when no TPO record exists.
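The fallback rule described above can be sketched as follows. This is a minimal illustration, not the code in TrackerObjectsMapper: the function name, the dictionary standing in for the preheat's TPO map, and the UIDs are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the preheat's TPO map:
# (tracked entity UID, program UID) -> owning org unit UID.
TpoMap = Dict[Tuple[str, str], str]


def resolve_owner_org_unit(
    tpo_map: TpoMap,
    tracked_entity: str,
    program: str,
    event_org_unit: str,
) -> str:
    """Return the owner org unit to store on the event row."""
    owner: Optional[str] = tpo_map.get((tracked_entity, program))
    # Fall back to the event's own org unit when no TPO record exists.
    return owner if owner is not None else event_org_unit


# Illustrative UIDs only.
tpo = {("te-1", "prog-A"): "ou-school-1"}
resolved_with_tpo = resolve_owner_org_unit(tpo, "te-1", "prog-A", "ou-clinic-9")
resolved_without_tpo = resolve_owner_org_unit(tpo, "te-2", "prog-A", "ou-clinic-9")
```

In the first call the TPO record wins over the event's own org unit; in the second no TPO record exists, so the event's org unit is stored as the owner.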
orgUnitMode=SELECTED can now be evaluated with a direct equality predicate on trackerevent.ownerorganisationunitid, enabling an index scan without the join chain.
Changes
Performance
PR explain: https://explain.depesz.com/s/qzIR
Master explain: https://explain.depesz.com/s/UHb9
Master includes a partial fix: an org unit predicate applied through the trackedentityprogramowner ownership join (ou.organisationunitid in (:orgUnits)), which can guide the planner toward a TPO-first access path. The limitation is structural.
The OU filter still depends on the join chain (event → enrollment → trackedentityprogramowner → organisationunit), so the planner must traverse every student enrolled at the org unit and scan their full event history before the stage filter can be applied. Cost scales with enrolled population multiplied by events per enrollment. Critically, it is the event side that grows.
On the production database used for this analysis there are 95.8M events across 2.4M enrollments (a 40:1 ratio), with a median of 37 events per enrollment and a growth rate of approximately 1.2M events per school day. That per-enrollment scan cost increases every day, for every school, regardless of how many results are ultimately returned. At production scale the join-chain predicate also provides no guarantee: when the planner's cost estimates favor a different join order the OU filter cannot be pushed into the access path and a full table scan becomes possible.
This is visible in the query plans: even on the small Sierra Leone database, master reads 138 buffer pages from disk against 40 enrolled students to return 4 events, while this PR reads zero. A direct bitmap index scan on in_trackerevent_owner_organisationunitid resolves all 55 candidates from cache and applies the stage filter immediately. The 1ms timing difference on SL is noise; the structural difference is not.
This PR stores the ownership org unit directly on the event row (ownerorganisationunitid) and indexes it. The OU filter becomes a direct bitmap index scan independent of any join, with cost proportional to matching events at the org unit rather than to the enrolled population or their accumulated history. Enrollment and TEI joins are still evaluated for access control but play no role in locating the events.
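To make the two query shapes concrete, here is a toy comparison using an in-memory SQLite database. It is a sketch under stated assumptions: the table names, ownerorganisationunitid and the index name come from this description, but the remaining columns are simplified guesses rather than the real DHIS2 schema. SQLite's planner also differs from PostgreSQL's, so this shows only that the two predicates select the same events, not the performance difference. The access-control joins are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, assumed schema; not the real DHIS2 DDL.
    CREATE TABLE organisationunit (organisationunitid INTEGER PRIMARY KEY);
    CREATE TABLE trackedentityprogramowner (
        trackedentityid INTEGER, programid INTEGER, organisationunitid INTEGER);
    CREATE TABLE enrollment (
        enrollmentid INTEGER PRIMARY KEY, trackedentityid INTEGER, programid INTEGER);
    CREATE TABLE trackerevent (
        eventid INTEGER PRIMARY KEY,
        enrollmentid INTEGER,
        organisationunitid INTEGER,
        ownerorganisationunitid INTEGER);
    CREATE INDEX in_trackerevent_owner_organisationunitid
        ON trackerevent (ownerorganisationunitid);

    INSERT INTO organisationunit VALUES (1), (2);
    INSERT INTO trackedentityprogramowner VALUES (10, 100, 1), (11, 100, 2);
    INSERT INTO enrollment VALUES (1000, 10, 100), (1001, 11, 100);
    -- Event 1 occurred at org unit 2 but is owned by org unit 1.
    INSERT INTO trackerevent VALUES
        (1, 1000, 2, 1),
        (2, 1000, 1, 1),
        (3, 1001, 1, 2);
    """
)

# Shape used before this PR: ownership is resolved through the join chain.
JOIN_CHAIN = """
    SELECT ev.eventid
    FROM trackerevent ev
    JOIN enrollment en ON en.enrollmentid = ev.enrollmentid
    JOIN trackedentityprogramowner tpo
      ON tpo.trackedentityid = en.trackedentityid
     AND tpo.programid = en.programid
    JOIN organisationunit ou ON ou.organisationunitid = tpo.organisationunitid
    WHERE ou.organisationunitid = ?
    ORDER BY ev.eventid
"""

# Shape enabled by the denormalized column: a direct equality predicate.
DIRECT = """
    SELECT eventid
    FROM trackerevent
    WHERE ownerorganisationunitid = ?
    ORDER BY eventid
"""

join_chain_ids = [row[0] for row in conn.execute(JOIN_CHAIN, (1,))]
direct_ids = [row[0] for row in conn.execute(DIRECT, (1,))]
```

Both queries select events 1 and 2 for org unit 1, including event 1, which occurred at a different org unit. The equivalence holds only while ownerorganisationunitid is kept in sync with the TPO record.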
The tradeoff is write-path maintenance: ownerorganisationunitid must be updated when program ownership is transferred between org units. Transfers are a relatively infrequent administrative operation, and the cost is bounded and one-off: a small price against the continuous daily volume of events that the index keeps fast to query.
Performance comparison
Performance testing was conducted against the Gatling tracker test suite comparing this branch to master. The results show an apparent ~20–28% regression in ANC event list queries and ~31% in Birth event searches, but these figures are misleading for two reasons.
First, ANC (lxAQ7Zs9VYR) is a single event program and goes through JdbcSingleEventStore/singleevent, a code path this branch does not touch, so any difference there is noise.
Second, for the Birth event scenario (the actual tracker program SELECTED query), the test database concentrates all synthetic import data into a single org unit (DiszpKrYNg8), giving ownerorganisationunitid ~21% selectivity. At that ratio the planner ignores the new index and falls back to an enrollment-driven nested loop (20,711 per-enrollment lookups on trackerevent), which is marginally slower due to the slightly wider row.
When the TPO join is removed so the planner is free to enter through in_trackerevent_owner_organisationunitid, execution time halves even on this small dataset (53ms → 24ms), confirming the index and query structure are sound. The real benefit will only be measurable on a production-scale database where events are distributed across thousands of org units and ownerorganisationunitid = X is highly selective. The Gatling test DB is structurally unsuitable for validating this optimization.
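The selectivity argument can be illustrated with a small calculation. The distributions below are hypothetical and chosen only to mirror the two situations described: a test database where one org unit owns about 21% of event rows, and a database where events are spread evenly over a thousand org units. The "other" bucket and the ou-N names are placeholders, not real data.

```python
from collections import Counter


def selectivity(owner_counts: Counter, org_unit: str) -> float:
    """Fraction of event rows whose owner org unit equals org_unit."""
    total = sum(owner_counts.values())
    return owner_counts[org_unit] / total if total else 0.0


# Hypothetical: one org unit owns 21 of every 100 event rows;
# "other" lumps together every remaining org unit.
concentrated = Counter({"DiszpKrYNg8": 21, "other": 79})

# Hypothetical: 1000 org units owning 100 event rows each.
distributed = Counter({f"ou-{i}": 100 for i in range(1000)})

concentrated_sel = selectivity(concentrated, "DiszpKrYNg8")
distributed_sel = selectivity(distributed, "ou-0")
```

In the concentrated case the equality predicate matches roughly a fifth of the table, so an index scan has little advantage over other access paths. In the distributed case it matches a tenth of a percent of rows, which is the situation where the new index is expected to pay off.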
Notes
This is a POC. Portions of this PR were developed with AI 🤖