Union-Find (also called Disjoint-set) is a data-structure not found in Java per default, and neither our dependencies (e.g. Guava). It stores a collection of disjoint sets, and is useful when it comes to graph-based operations/implementations. It defines 2 key operations find(x), returning a canonized element of the set that x is in, and union(x, y), merging the sets of x and y. Deleting elements is often useless in practice, especially if there are immutable versions available, so this operation is not a focus for optimization, but should be included as far as possible. Since there are graph-based applications/implementations in CPAchecker (i.e. SMGs), this data-structure can be helpful for us. Hence we should add it.
Union-Find can be optimized in many ways. Insertion can have constant (O(1)) worst case time complexity. Search up to inverse-Ackermann amortized worst case time-complexity (O(α(n))). Meaning not every search operation is within O(α(n)) due to re-balancing (see below), but re-balancing causes subsequent operations to be faster, leading to an amortized worst case time-complexity of O(α(n)). In practice this can lead to find() and union() only needing near constant time.
Optimally we want a mutable, immutable and a persistent version of union-find.
Mutable and immutable implementations may also be persistent for union-find, as the cost is minimal in practice.
Persistence can be achieved with path compression using trees by utilizing either union by rank or weight, leading to search operations having O(log n) worst case time complexity. This is often extended by splitting by rank or weight, reducing needed (amortized) time further.
Alternatively also by tree compression (up to an optimal worst case time of O(log n / log log n)) during union [3].
Often a tree based disjoint-set forest is used in practice. However, these are commonly parent pointer trees. We need to explore whether our LLRB-tree can still be used, or is a good idea (due to its auto-balancing).
In any case; the implementation should first define a new interface for the common data-structure, implementing commonalities from Javas interfaces that make sense, followed by sub-interfaces like an ordered version.
Then, an "easy" (non-optimized) implementation that is easily verifiable as correct should be added (there are lots of examples online. Be careful with licenses! Either we need to import the original license of used code, or better use a license free one).
This non-optimized union-find should then be used to show base principles of the implementation, define tests upon etc. This is particularly helpful to evaluate against (and potentially as test-oracle).
More complex implementations can then either be based upon the non-optimized union-find or more complex ideas.
The iteration order in sets, as well as over sets should be consistent!
Union-Find (also called Disjoint-set) is a data-structure not found in Java per default, and neither our dependencies (e.g. Guava). It stores a collection of disjoint sets, and is useful when it comes to graph-based operations/implementations. It defines 2 key operations
find(x), returning a canonized element of the set that x is in, andunion(x, y), merging the sets of x and y. Deleting elements is often useless in practice, especially if there are immutable versions available, so this operation is not a focus for optimization, but should be included as far as possible. Since there are graph-based applications/implementations in CPAchecker (i.e. SMGs), this data-structure can be helpful for us. Hence we should add it.Union-Find can be optimized in many ways. Insertion can have constant (
O(1)) worst case time complexity. Search up to inverse-Ackermann amortized worst case time-complexity (O(α(n))). Meaning not every search operation is withinO(α(n))due to re-balancing (see below), but re-balancing causes subsequent operations to be faster, leading to an amortized worst case time-complexity ofO(α(n)). In practice this can lead tofind()andunion()only needing near constant time.Optimally we want a mutable, immutable and a persistent version of union-find.
Mutable and immutable implementations may also be persistent for union-find, as the cost is minimal in practice.
Persistence can be achieved with path compression using trees by utilizing either union by rank or weight, leading to search operations having
O(log n)worst case time complexity. This is often extended by splitting by rank or weight, reducing needed (amortized) time further.Alternatively also by tree compression (up to an optimal worst case time of O(log n / log log n)) during union [3].
Often a tree based disjoint-set forest is used in practice. However, these are commonly parent pointer trees. We need to explore whether our LLRB-tree can still be used, or is a good idea (due to its auto-balancing).
In any case; the implementation should first define a new interface for the common data-structure, implementing commonalities from Javas interfaces that make sense, followed by sub-interfaces like an ordered version.
Then, an "easy" (non-optimized) implementation that is easily verifiable as correct should be added (there are lots of examples online. Be careful with licenses! Either we need to import the original license of used code, or better use a license free one).
This non-optimized union-find should then be used to show base principles of the implementation, define tests upon etc. This is particularly helpful to evaluate against (and potentially as test-oracle).
More complex implementations can then either be based upon the non-optimized union-find or more complex ideas.
The iteration order in sets, as well as over sets should be consistent!