perf: replace linear scans with indexed lookups in tracker import DHIS2-21287 #23590
Open
TrackerBundle.findById does a linear stream scan on every call and is invoked 6+ times per enrollment from different validators. With 500 TEs per request this means ~3,000 linear scans. Replace it with lazily built Map<UID, T> indexes that are invalidated when the lists change. Also replace the linear TE scan in UniqueAttributesSupplier.getEntityForEnrollment with a pre-built Map lookup. DHIS2-21287
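A minimal sketch of the lazily built, invalidated index described in this commit message. It is not the actual TrackerBundle code: `Entity`, `IndexedBundle`, `setEntities`, and the `indexBuilds` counter are hypothetical names, a `String` stands in for `UID`, and keeping the first entity per UID is an assumption meant to mirror what a filter-then-findFirst scan would return.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a tracker entity; only the UID matters for lookups.
final class Entity {
  final String uid;
  final String payload;

  Entity(String uid, String payload) {
    this.uid = uid;
    this.payload = payload;
  }
}

// Hypothetical bundle holding one entity list plus a lazily built UID index.
final class IndexedBundle {
  private List<Entity> entities = new ArrayList<>();

  // null means "not built yet" or "invalidated by a list change".
  private Map<String, Entity> byUid;

  // Counts index rebuilds; exists only to make the laziness observable.
  int indexBuilds = 0;

  // Replacing the list drops the index so the next lookup rebuilds it.
  void setEntities(List<Entity> newEntities) {
    this.entities = new ArrayList<>(newEntities);
    this.byUid = null;
  }

  // Hash lookup instead of a scan over the list on every call.
  Optional<Entity> findByUid(String uid) {
    return Optional.ofNullable(index().get(uid));
  }

  private Map<String, Entity> index() {
    if (byUid == null) {
      Map<String, Entity> map = new HashMap<>();
      for (Entity entity : entities) {
        // Assumption: on duplicate UIDs keep the first entity in list order.
        map.putIfAbsent(entity.uid, entity);
      }
      byUid = map;
      indexBuilds++;
    }
    return byUid;
  }
}
```

The index is built once per list version, so repeated lookups from different validators share a single pass over the list. The sketch is not thread-safe; whether that matters depends on how the bundle is accessed during import.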
validateRequiredProperties and validateMandatoryAttributes rebuild the set of mandatory program/TE-type attributes per entity via stream scans of program.getProgramAttributes() and trackedEntityType.getTrackedEntityTypeAttributes(). With 500 TEs and 20 attributes per request, this causes thousands of redundant stream operations. Cache the mandatory attribute sets lazily in TrackerPreheat keyed by program/TE-type UID. The set is computed once on first access and reused for all entities of the same program/type. DHIS2-21287
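The caching idea in this commit message can be sketched as follows. This is an illustration under assumptions, not the TrackerPreheat implementation: `SketchProgram`, `SketchProgramAttribute`, `MandatoryAttributeCache`, and the `computations` counter are hypothetical, UIDs are plain strings, and only the method name `getMandatoryProgramAttributes` is taken from the PR (its signature here is assumed).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical: one attribute configured on a program.
final class SketchProgramAttribute {
  final String attributeUid;
  final boolean mandatory;

  SketchProgramAttribute(String attributeUid, boolean mandatory) {
    this.attributeUid = attributeUid;
    this.mandatory = mandatory;
  }
}

// Hypothetical: a program with its configured attributes.
final class SketchProgram {
  final String uid;
  final List<SketchProgramAttribute> attributes;

  SketchProgram(String uid, List<SketchProgramAttribute> attributes) {
    this.uid = uid;
    this.attributes = attributes;
  }
}

// Hypothetical cache keyed by program UID.
final class MandatoryAttributeCache {
  private final Map<String, Set<String>> byProgramUid = new HashMap<>();

  // Counts how often a set is actually computed; for illustration only.
  int computations = 0;

  // Computes the mandatory attribute UIDs on first access, then reuses them.
  Set<String> getMandatoryProgramAttributes(SketchProgram program) {
    return byProgramUid.computeIfAbsent(
        program.uid,
        uid -> {
          computations++;
          Set<String> mandatory = new LinkedHashSet<>();
          for (SketchProgramAttribute attribute : program.attributes) {
            if (attribute.mandatory) {
              mandatory.add(attribute.attributeUid);
            }
          }
          // Read-only view so callers cannot corrupt the shared cached set.
          return Collections.unmodifiableSet(mandatory);
        });
  }
}
```

With 500 entities enrolled in the same program, the loop over program attributes runs once instead of once per entity. A tracked entity type cache would follow the same shape with a second map.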
TrackerBundle does not implement Serializable so transient has no effect. The private getter methods prevent Lombok from generating public getters, so Jackson will not see these fields and JsonIgnore is also unnecessary. DHIS2-21287



Replace O(n) linear stream scans with O(1) indexed lookups in tracker import validation and preheat. Profiling shows validation at 6.7% CPU. The main costs:
- TrackerBundle.findById: linear scan of the entity list on every call, invoked 6+ times per enrollment from different validators. With 500 TEs per request that is ~3,000 linear scans.
- UniqueAttributesSupplier.getEntityForEnrollment: linear scan of the TE list per enrollment during preheat.
- AttributeValidator.validateRequiredProperties: streams through program attributes per attribute per enrollment to check if mandatory.

Changes
- TrackerBundle: add lazily-built Map<UID, T> indexes for each entity type. findXxxByUid does a map lookup instead of a stream scan. Maps are invalidated when entity lists change (after validation, after rule engine).
- UniqueAttributesSupplier: build a Map<UID, TrackedEntity> once from the payload and use it in getEntityForEnrollment instead of scanning the list per enrollment.
- TrackerPreheat: cache mandatory attribute sets per program and per tracked entity type. getMandatoryProgramAttributes and getMandatoryTrackedEntityTypeAttributes compute the set once on first access and reuse it for all entities of the same program/type.
- Cleanup: remove unused @JsonProperty/@JsonIgnore annotations from TrackerBundle.

Performance
CPU profile: 1 user, 5 min, skipRuleEngine=true&skipSideEffects=true, 500 TEs/request.
Baseline run 24357292097. PR CPU run 24394901884, alloc run 24398234369.
Validation halved (6.7% -> 3.3%). findById eliminated. No regressions. Throughput improved ~5% (220 -> 231 imports in 5 min), though these are single-user profile runs with profiler overhead, not load tests.