Optimize World::get_entity_mut for large entity slices #23740

Open
CrazyRoka wants to merge 2 commits into bevyengine:main from CrazyRoka:optimize-entity-fetch-mut

@CrazyRoka (Contributor) commented Apr 9, 2026

Objective

Optimize World::get_entity_mut (and the underlying WorldEntityFetch trait) when a slice of entities is passed in.
The previous duplicate-entity check used nested loops (O(N²)) and showed up as a CPU hotspot in real workloads with thousands of entities.

This PR makes the check O(N) while preserving exact behaviour and error semantics.

Solution

  • Replaced the nested for i in 0..len { for j in 0..i } duplicate check with a single-pass EntityHashSet in both &[Entity] and &[Entity; N] implementations of WorldEntityFetch::fetch_mut.
  • Added a dedicated Criterion benchmark (get_entity_mut_slice) that exercises the hot path with slices up to 2000 entities.
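
The two duplicate checks described in the first bullet can be sketched side by side. This is a minimal illustration, not the code from the PR: `Entity` here is a hypothetical stand-in for Bevy's type, `std::collections::HashSet` stands in for `EntityHashSet`, and the function names are made up for the example.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for Bevy's `Entity`, for illustration only.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Entity(pub u64);

/// Quadratic check: compare every entity against all earlier ones.
/// Returns the first entity that repeats an earlier one.
pub fn first_duplicate_nested(entities: &[Entity]) -> Option<Entity> {
    for i in 0..entities.len() {
        for j in 0..i {
            if entities[i] == entities[j] {
                return Some(entities[i]);
            }
        }
    }
    None
}

/// Linear check: `HashSet::insert` returns `false` when the value
/// was already present, which marks the first repeat.
pub fn first_duplicate_hashed(entities: &[Entity]) -> Option<Entity> {
    let mut seen = HashSet::with_capacity(entities.len());
    entities.iter().copied().find(|e| !seen.insert(*e))
}
```

Both functions report the same entity for the same input, which is the property the PR relies on to keep error semantics unchanged.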

Testing

  • Ran the new benchmark before and after the change using cargo bench -p benches --bench ecs -- get_entity_mut_slice.
  • Tested on Linux (x86_64).

Showcase

Benchmark results before: (image: Baseline)

Benchmark results after: (image: Optimized)

Criterion table summary:

get_entity_mut_slice/size/10
                        time:   [56.617 ns 56.850 ns 57.099 ns]
                        change: [−4.0898% −3.4947% −2.9431%] (p = 0.00 < 0.05)
                        Performance has improved.
get_entity_mut_slice/size/20
                        time:   [142.19 ns 142.94 ns 143.75 ns]
                        change: [−9.3294% −8.3968% −7.4851%] (p = 0.00 < 0.05)
                        Performance has improved.
get_entity_mut_slice/size/40
                        time:   [450.04 ns 453.50 ns 457.16 ns]
                        change: [−3.7575% −2.6321% −1.4291%] (p = 0.00 < 0.05)
                        Performance has improved.
get_entity_mut_slice/size/60
                        time:   [855.45 ns 856.69 ns 857.96 ns]
                        change: [−5.6733% −5.1486% −4.6343%] (p = 0.00 < 0.05)
                        Performance has improved.
get_entity_mut_slice/size/80
                        time:   [1.1187 µs 1.1206 µs 1.1227 µs]
                        change: [−20.884% −20.493% −20.138%] (p = 0.00 < 0.05)
                        Performance has improved.
get_entity_mut_slice/size/100
                        time:   [1.4017 µs 1.4034 µs 1.4050 µs]
                        change: [−31.999% −31.427% −30.927%] (p = 0.00 < 0.05)
                        Performance has improved.
get_entity_mut_slice/size/200
                        time:   [2.7256 µs 2.7365 µs 2.7481 µs]
                        change: [−58.082% −57.776% −57.488%] (p = 0.00 < 0.05)
                        Performance has improved.
get_entity_mut_slice/size/400
                        time:   [5.4878 µs 5.5039 µs 5.5213 µs]
                        change: [−76.805% −76.556% −76.311%] (p = 0.00 < 0.05)
                        Performance has improved.
get_entity_mut_slice/size/600
                        time:   [8.2935 µs 8.3305 µs 8.3663 µs]
                        change: [−83.367% −83.196% −83.039%] (p = 0.00 < 0.05)
                        Performance has improved.
get_entity_mut_slice/size/800
                        time:   [10.978 µs 11.020 µs 11.068 µs]
                        change: [−87.050% −86.950% −86.857%] (p = 0.00 < 0.05)
                        Performance has improved.
get_entity_mut_slice/size/1000
                        time:   [13.534 µs 13.551 µs 13.570 µs]
                        change: [−89.591% −89.525% −89.465%] (p = 0.00 < 0.05)
                        Performance has improved.
get_entity_mut_slice/size/2000
                        time:   [26.999 µs 27.039 µs 27.083 µs]
                        change: [−94.705% −94.666% −94.631%] (p = 0.00 < 0.05)
                        Performance has improved.
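
As a rough sanity check on the complexity claim, the last two rows can be compared directly. The sketch below assumes each "change" percentage is measured against the pre-PR baseline (the table does not say which run it was compared to); the helper names are hypothetical.

```rust
/// Time before the change implied by a Criterion "change" percentage,
/// assuming the change is relative to the pre-PR baseline.
pub fn implied_baseline(after: f64, change_pct: f64) -> f64 {
    after / (1.0 + change_pct / 100.0)
}

/// Median time (µs) and median change (%) copied from the table above.
pub const SIZE_1000: (f64, f64) = (13.551, -89.525);
pub const SIZE_2000: (f64, f64) = (27.039, -94.666);

/// Ratio of the size-2000 time to the size-1000 time:
/// first for the optimized run, then for the implied baseline.
pub fn doubling_ratios() -> (f64, f64) {
    let after = SIZE_2000.0 / SIZE_1000.0;
    let before = implied_baseline(SIZE_2000.0, SIZE_2000.1)
        / implied_baseline(SIZE_1000.0, SIZE_1000.1);
    (after, before)
}
```

Under that assumption, doubling the slice size roughly doubles the optimized time and roughly quadruples the implied baseline time, which is what O(N) and O(N²) checks would predict.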

@kfc35 added the C-Performance, S-Needs-Review, and A-ECS labels Apr 9, 2026
@github-project-automation github-project-automation bot moved this to Needs SME Triage in ECS Apr 9, 2026
@Victoronz (Contributor) commented

Good observation that EntityHashSet is more efficient for greater entity quantities!
These impls were originally only intended for low N, so that use case should not see regression.
The regression can be addressed by using the previous duplicate check below a certain N.

I'll also note that the vast, vast majority of arrays are small, so the EntityHashSet branch there should practically never be hit.

As an isolated change this makes sense (and we can merge it in the meantime)!
Overall though, I am still of the opinion that implicit get_entity_mut for slices is iffy design as was previously discussed here (and its follow-up PR).

I am curious, in the use case you've seen, is the entire result always used, or only partially iterated/consumed?

If only partial use is needed, then an iteration-based duplication check would be more performant.
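
One way to read "iteration-based" is a lazy check that only pays for the entities actually consumed. The sketch below is an assumption about what that could look like, not an existing Bevy API: `Entity` is a hypothetical stand-in and `CheckedEntities` is an invented name.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for Bevy's `Entity`, for illustration only.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Entity(pub u64);

/// Iterator that yields `Ok(entity)` when the entity has not been seen
/// earlier in this iteration, and `Err(entity)` when it is a repeat.
pub struct CheckedEntities<'a> {
    inner: std::slice::Iter<'a, Entity>,
    seen: HashSet<Entity>,
}

impl<'a> CheckedEntities<'a> {
    pub fn new(entities: &'a [Entity]) -> Self {
        Self { inner: entities.iter(), seen: HashSet::new() }
    }
}

impl Iterator for CheckedEntities<'_> {
    type Item = Result<Entity, Entity>;

    fn next(&mut self) -> Option<Self::Item> {
        let entity = *self.inner.next()?;
        // Only entities that are actually pulled get hashed.
        Some(if self.seen.insert(entity) { Ok(entity) } else { Err(entity) })
    }
}
```

A caller that stops after the first few items, for example with `CheckedEntities::new(&entities).take(2)`, never hashes the rest of the slice.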

Additionally, if the source slices/arrays are not mutated between each get_entity_mut call, then placing the check after each mutation (by turning it into an EntitySet) could also reduce unnecessary work.
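
The idea of checking once and reusing the result can be sketched with a wrapper type that proves uniqueness at construction. This is a hypothetical illustration of the pattern only; `UniqueEntities` and the `Entity` stand-in are invented here and are not Bevy's `EntitySet` machinery.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for Bevy's `Entity`, for illustration only.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Entity(pub u64);

/// A list of entities already proven duplicate-free (hypothetical type).
#[derive(Debug)]
pub struct UniqueEntities(Vec<Entity>);

impl UniqueEntities {
    /// Runs the duplicate check once; returns the offending entity on failure.
    pub fn new(entities: Vec<Entity>) -> Result<Self, Entity> {
        let mut seen = HashSet::with_capacity(entities.len());
        let duplicate = entities.iter().copied().find(|e| !seen.insert(*e));
        match duplicate {
            Some(entity) => Err(entity),
            None => Ok(Self(entities)),
        }
    }

    /// Read-only access cannot introduce duplicates, so repeated
    /// lookups through this slice need no further checking.
    pub fn as_slice(&self) -> &[Entity] {
        &self.0
    }
}
```

With this shape, the cost of the check is paid when the list is built or changed rather than on every fetch.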

@Victoronz added the S-Waiting-on-Author label and removed the S-Needs-Review label Apr 9, 2026
Use HashSet for O(N) duplicate entity detection if number of entities
exceeds the threshold.
This improves performance significantly for larger entity lists.
@CrazyRoka CrazyRoka force-pushed the optimize-entity-fetch-mut branch from d8831de to 9f9c0a9 Compare April 10, 2026 09:45
@CrazyRoka (Contributor, Author) commented

@Victoronz Thanks for the feedback. I updated the benchmarks to include smaller sizes (20, 40, 60, 80) and added a small threshold (40) below which the check falls back to the O(N^2) comparisons. There are no performance regressions anymore (see the PR description, which was updated today).
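
A threshold-based check of this kind might look roughly like the sketch below. It is not the code from the PR: `Entity` is a hypothetical stand-in, `std::collections::HashSet` stands in for `EntityHashSet`, and whether the real cut-over is inclusive or exclusive at 40 is an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for Bevy's `Entity`, for illustration only.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Entity(pub u64);

/// Cut-over point; 40 is the value quoted above. Treating it as an
/// exclusive bound is an assumption of this sketch.
pub const HASH_THRESHOLD: usize = 40;

pub fn has_duplicates(entities: &[Entity]) -> bool {
    if entities.len() < HASH_THRESHOLD {
        // Small input: pairwise comparison, no allocation or hashing.
        entities
            .iter()
            .enumerate()
            .any(|(i, e)| entities[..i].contains(e))
    } else {
        // Large input: single pass; `insert` returns false on a repeat.
        let mut seen = HashSet::with_capacity(entities.len());
        !entities.iter().all(|e| seen.insert(*e))
    }
}
```

The small branch avoids allocating a set, which is why short slices and arrays need not regress.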
