
chore: Improve performance on reserved values [DHIS2-21253]#23520

Open
muilpp wants to merge 7 commits into master from DHIS2-21253

Conversation

@muilpp (Contributor) commented Apr 9, 2026:

The REMOVE_USED_OR_EXPIRED_RESERVED_VALUES job previously executed a single HQL DELETE statement that combined expired and used values with a correlated subquery. This caused the query planner to execute a subplan for every row in reservedvalue, resulting in an extremely expensive operation that does not scale and effectively becomes unprocessable on large datasets like the one in DRC, where they have ~11M rows in that table.
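
For reference, the old statement had roughly this shape (an illustrative SQL sketch based on the description above and the join used by the new query, not the exact HQL):

-- Illustrative only: the correlated EXISTS runs once per non-expired row.
DELETE FROM reservedvalue rv
WHERE rv.expirydate < now()
   OR EXISTS (
       SELECT 1
       FROM trackedentityattribute tea
       JOIN trackedentityattributevalue teav
         ON teav.trackedentityattributeid = tea.trackedentityattributeid
       WHERE tea.uid = rv.owneruid
         AND lower(teav.value) = lower(rv.value));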

Changes

  • Split the deletion into two separate native SQL queries: One for expired values and one for used values. This removes the correlated OR logic, which previously forced the database to evaluate a subquery join for every non-expired row, leading to millions of repeated lookups against trackedentityattributevalue.
  • Both queries now process data in batches of 500,000 rows using LIMIT in a subquery, looping until no rows remain (the loop is sketched after this list).
  • Each batch runs in its own transaction (PROPAGATION_REQUIRES_NEW), reducing lock duration and replication pressure. Large single transactions previously held row locks for the entire duration. With batching, locks are released frequently and failures only affect a single batch instead of the entire job. That was likely the issue in DRC.
  • Fixed a bug in saveGeneratedValues where numberOfReservations (the original total) was used instead of the remaining count, which could lead to over-reservation during retries.
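
For context, the resulting delete loop looks roughly like this (assembled from the diff excerpts quoted in the review threads below; transactionTemplate is assumed to be a Spring TransactionTemplate configured with PROPAGATION_REQUIRES_NEW):

// Assembled from the diff excerpts in this PR; not the complete job method.
// requireNonNullElse is java.util.Objects.requireNonNullElse.
int total = 0;
int deleted;
do {
  // Each execute() call commits one batch in its own transaction,
  // so row locks are released after every 500k-row batch.
  deleted =
      requireNonNullElse(
          transactionTemplate.execute(s -> reservedValueStore.removeExpiredValues()), 0);
  total += deleted;
} while (deleted > 0);
log.info("Deleted {} reserved values", total);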

Test plan

I replicated the production-like dataset from DRC in the reservedvalue table to simulate realistic load conditions.
In total, the dataset contains 11 million reserved values, of which 8 million are expired and the remaining 3 million are still valid (a seeding sketch follows the TEAV breakdown below).

There are 1M tracked entities and 10M TEAVs (≈10 per tracked entity), distributed as follows:

  • 1M sequential TEAVs for perfTestUid:
      ◦ The first 20,000 values (PERF-1 → PERF-20000) are designed to match reserved values. These are specifically used to trigger deletions in removeUsedValues().
  • 1M random TEAVs, used to simulate realistic noise and non-matching data.
  • 8M regular TEAVs, which form the standard dataset used to simulate production-scale attribute storage.
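
As a rough guide to reproducing this scale, the expired portion of reservedvalue can be seeded with generate_series; a hypothetical sketch (the column list is simplified, the real schema has more columns):

-- Hypothetical seeding sketch: 8M expired reserved values.
INSERT INTO reservedvalue (reservedvalueid, owneruid, value, expirydate)
SELECT i, 'perfTestUid', 'EXPIRED-' || i, now() - interval '1 day'
FROM generate_series(1, 8000000) AS i;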

Performance test

Candidate: [screenshot from 2026-04-10 13-37-15]

Baseline: [screenshot from 2026-04-10 13-38-13]

  • Candidate: 9 polls → job completed in ~90 seconds
  • Baseline: 65 polls → job completed in ~650 seconds (~11 minutes)

That's roughly a 7x speedup for deleting 8M expired + 20k used values.

Here's the link to the test.

Pgbench test

Two scenarios were tested to simulate a healthy table (cleanup job runs regularly) and a degraded table (large backlog of expired values). Each scenario was run at two scales: 500k and 5M rows.
The "expired" rows represent values whose expiry date has passed and should be deleted. The "not expired" rows represent values still in use or reserved for future use.

Approaches compared:

  • Original (master): a single DELETE with an OR condition combining expired and used values via a correlated subquery, no batching, one transaction.
  • 100k batch: two separate queries (expired and used values), each looped with LIMIT 100k per transaction.
  • 500k batch: same structure as 100k but LIMIT 500k per transaction.
[Image: pg-bench-results]
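
One hypothetical way to drive such a comparison with pgbench (file name, row count, and database name are illustrative):

-- batch_500k.sql: one 500k batch of the expired-values delete.
-- Run with: pgbench -n -f batch_500k.sql -c 1 -t 10 dhis2
DELETE FROM reservedvalue WHERE reservedvalueid IN
  (SELECT reservedvalueid FROM reservedvalue WHERE expirydate < now() LIMIT 500000);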

The original query degrades badly as the table grows and, counterintuitively, performs worse as fewer rows are expired. That's because the correlated subquery runs once per non-expired row. At 5M rows, it never finishes within the test timeout, regardless of the expired ratio.

Both batched approaches are orders of magnitude faster. The 500k batch outperforms the 100k batch in every scenario because fewer round-trips to the database outweigh the higher cost per individual batch. The difference is more pronounced at 5M rows (27s vs. 37s at 90% expired, 10.4s vs. 14.5s at 20% expired), where the number of iterations matters more.

The 20% expired scenario is consistently faster than 90% for the batched approaches because there are fewer rows to delete overall. For the original query, the opposite is true: more non-expired rows mean more correlated subquery executions.

This PR focuses only on the deletion part. I’ll create a separate PR to handle the generation of reserved values.

muilpp marked this pull request as ready for review April 10, 2026 20:42
muilpp requested a review from a team April 10, 2026 20:42
do {
  deleted =
      requireNonNullElse(
          transactionTemplate.execute(s -> reservedValueStore.removeExpiredValues()), 0);
Contributor:

I'm also wondering if this transactionTemplate is necessary. Can we set @Transactional on the store to achieve the same behaviour?

@muilpp (Author):

We could, and it would behave the same way.
I just didn’t do it because I thought our convention was to handle transactions at the service level, not at the store level.
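
For illustration, the store-level alternative the reviewer suggests would look something like this (a sketch, assuming Spring's @Transactional; the method body mirrors the diff excerpt quoted later in this review, and REQUIRES_NEW preserves the one-transaction-per-batch behaviour):

import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

// Sketch only: transaction demarcation moved onto the store method.
@Transactional(propagation = Propagation.REQUIRES_NEW)
@Override
public int removeExpiredValues() {
  return jdbcTemplate.update(
      "DELETE FROM reservedvalue WHERE reservedvalueid IN "
          + "(SELECT reservedvalueid FROM reservedvalue WHERE expirydate < now() LIMIT ?)",
      DELETE_BATCH_SIZE);
}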


</class>

<sql-query name="getRandomGeneratedValuesNotAvailableNamedQuery">
@ameenhere (Contributor), Apr 14, 2026:

Is this query used somewhere? I feel this may not scale that well. Probably not something for this PR anyway.

@muilpp (Author):

It is used, and I’m working on it.
I forgot to mention it in the PR description, but this PR focuses solely on the deletion of reserved values.
I’m now working on the generation and reservation parts.
Doing everything together felt too large to explain in a single PR.

@ameenhere (Contributor) left a review:

The job changes look good to me. The evidence from the performance test results is also clear.

muilpp requested a review from enricocolasante April 14, 2026 08:33

log.info("... Completed deleting expired or used reserved values");
@Override
public int removeUsedValues() {
Contributor:

I am wondering if we should remove this completely.
The only way to add a trackedEntityAttributeValue is through the Tracker Importer, and we call useReservedValue when persisting an attribute.
This query should always return 0 rows.
I cannot think of a case in which we would have used values that were not removed.

@muilpp (Author):

In theory that's true, but I'm not sure whether that still holds for implementations that have been running for some years.
To be honest, I'm not comfortable enough with this feature to confidently remove it.

total += deleted;
} while (deleted > 0);

log.info("Deleted {} reserved values", total);
Contributor:

This log was here before, so maybe it is not for this PR, but it would be good to delegate this info to the job itself. When you run the job, you can complete a stage with some information that will be logged in the job's status.


} catch (TimeoutException ex) {
  log.warn(
      String.format(
          "Generation and reservation of values for %s with uid %s timed out. %s values were reserved. You might be running low on available values",

+ "SELECT rv.reservedvalueid FROM reservedvalue rv "
+ "JOIN trackedentityattribute tea ON rv.owneruid = tea.uid "
+ "JOIN trackedentityattributevalue teav ON teav.trackedentityattributeid = tea.trackedentityattributeid "
+ "AND lower(teav.value) = lower(rv.value) "
Contributor:

This was r.value = teav.plainValue before. Was it wrong?

How do large TEAV tables perform?

The existing index on TEAV is (trackedentityinstanceid, trackedentityattributeid, lower(value)) which cannot be used here as trackedentityinstanceid is the leading column and isn't in this join. There's also in_trackedentityattributevalue_attributeid on (trackedentityattributeid) which helps with the first join column, but the lower(value) comparison then requires a filter, not an index lookup.
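
If this join proved to be a bottleneck, a covering expression index could serve it directly; a hypothetical sketch (index name is illustrative):

-- Hypothetical: serves the join on (attribute id, lower(value));
-- CONCURRENTLY avoids blocking writes while building on a large table.
CREATE INDEX CONCURRENTLY in_teav_attributeid_lower_value
  ON trackedentityattributevalue (trackedentityattributeid, lower(value));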

requireNonNullElse(
    transactionTemplate.execute(s -> reservedValueStore.removeExpiredValues()), 0);
total += deleted;
} while (deleted > 0);
Contributor:

If the batch returns fewer than 500k rows, there's nothing left to delete, so comparing against DELETE_BATCH_SIZE instead of 0 avoids the final full table scan that only confirms there's no more work.
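
A sketch of the suggested condition (assuming DELETE_BATCH_SIZE is the LIMIT passed to the delete query):

do {
  deleted =
      requireNonNullElse(
          transactionTemplate.execute(s -> reservedValueStore.removeExpiredValues()), 0);
  total += deleted;
  // A short batch means the table is drained; no confirming extra pass needed.
} while (deleted == DELETE_BATCH_SIZE);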

public int removeExpiredValues() {
  return jdbcTemplate.update(
      "DELETE FROM reservedvalue WHERE reservedvalueid IN "
          + "(SELECT reservedvalueid FROM reservedvalue WHERE expirydate < now() LIMIT ?)",
Contributor:

I think there is no index on expirydate. I don't know this reserved-value feature well enough to say whether this scenario is plausible, but: when the job runs regularly, the table stays mostly clean, so only a small fraction of rows are expired. An index on expirydate would let Postgres jump straight to the expired rows instead of scanning millions of non-expired ones to find the few that need deleting.
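
A hypothetical index for that case (name is illustrative):

-- Hypothetical: lets the batch subquery locate expired rows without a full scan.
CREATE INDEX CONCURRENTLY in_reservedvalue_expirydate
  ON reservedvalue (expirydate);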
