Add fill_multi_label: efficient multi-label void filling#13

Closed
donglaiw wants to merge 1 commit into seung-lab:master from PytorchConnectomics:master
@donglaiw

Summary

Implements the multi-label algorithm described in README.md ("Multi-Label Concept") as a new top-level function, fill_voids.fill_multi_label. Instead of contour tracing, the implementation works on the region adjacency graph (RAG) of the volume, which produces the same result far more efficiently: the expensive voxel-scale passes (connected components, adjacency extraction) run once, and reasoning then happens on a small graph whose size scales with #labels + #voids rather than with voxel count.

What it does

import fill_voids

filled = fill_voids.fill_multi_label(labels)                      # 2D or 3D
filled, n = fill_voids.fill_multi_label(labels, return_fill_count=True)
fill_voids.fill_multi_label(labels, in_place=True)                # save memory

Semantics:

  • A background voxel (value 0) is filled with label L iff L is the outermost enclosing label of the void it belongs to, i.e. the adjacent foreground label whose chain of label-to-label adjacencies back to the image exterior is shortest.
  • If two or more adjacent labels tie for outermost (the void sits between distinct shells), the void is left unfilled. This matches step 4 of the README's multi-label concept.
  • Inner-label islands are handled correctly (README step 6): a void with a small island of label B inside an outer shell A fills with A and leaves B intact.
  • On an effectively binary input, the result equals fill_voids.fill.

How it works

  1. cc3d.connected_components on the background mask labels each void.
  2. A single vectorized axis-shift scan collects every (void-component, foreground-label) adjacency pair.
  3. Faces of the image are scanned to identify exterior-touching voids and labels.
  4. cc3d.region_graph on the foreground labels gives the label-to-label adjacency graph (cheap: it has at most #labels nodes).
  5. A BFS from the "exterior" nodes through that label graph assigns each foreground label a level = shortest label-chain distance to the exterior. Labels that don't appear in any enclosure chain get level = inf.
  6. For each interior void component: one adjacent label → fill with it; multiple adjacent → fill with the unique minimum-level label (else leave unfilled).
  7. A single labels[lut[bg_cc] != 0] = lut[bg_cc][...] pass applies every fill.
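Steps 2, 3, 5, 6, and 7 can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the code in this PR: the helper names (adjacency_pairs, border_ids, bfs_levels, fill_sketch) are hypothetical, the background component array bg_cc is built by hand instead of with cc3d.connected_components, label adjacency comes from the same shift scan instead of cc3d.region_graph, and seeding the BFS at level 0 with labels that touch the image face or an exterior void is an assumption about how the "exterior" nodes are chosen.

```python
from collections import deque
import numpy as np

INF = float("inf")

def adjacency_pairs(a, b):
    """Unique (a_id, b_id) pairs where a nonzero `a` voxel face-touches a nonzero `b` voxel."""
    pairs = set()
    for axis in range(a.ndim):
        lo = [slice(None)] * a.ndim
        hi = [slice(None)] * a.ndim
        lo[axis], hi[axis] = slice(None, -1), slice(1, None)
        lo, hi = tuple(lo), tuple(hi)
        # Compare each voxel with its neighbor one step along `axis`, in both directions.
        for x, y in ((a[lo], b[hi]), (a[hi], b[lo])):
            mask = (x != 0) & (y != 0)
            pairs.update(zip(x[mask].tolist(), y[mask].tolist()))
    return pairs

def border_ids(arr):
    """Nonzero ids that appear on any face of the array."""
    ids = set()
    for axis in range(arr.ndim):
        for idx in (0, arr.shape[axis] - 1):
            ids.update(np.unique(np.take(arr, idx, axis=axis)).tolist())
    ids.discard(0)
    return ids

def bfs_levels(seeds, edges):
    """Shortest label-chain distance from the seed labels; unreachable labels are absent."""
    graph = {}
    for u, v in edges:
        if u != v:
            graph.setdefault(u, set()).add(v)
            graph.setdefault(v, set()).add(u)
    level = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level

def fill_sketch(labels, bg_cc):
    void_adj = adjacency_pairs(bg_cc, labels)            # (void id, label) pairs
    exterior_voids = border_ids(bg_cc)
    # Assumption: a label is "exterior" if it touches a face or an exterior void.
    seeds = border_ids(labels) | {l for v, l in void_adj if v in exterior_voids}
    level = bfs_levels(seeds, adjacency_pairs(labels, labels))
    neighbors = {}
    for v, l in void_adj:
        neighbors.setdefault(v, set()).add(l)
    lut = np.zeros(int(bg_cc.max()) + 1, dtype=labels.dtype)
    for v, labs in neighbors.items():
        if v in exterior_voids:
            continue
        ranked = sorted(labs, key=lambda l: level.get(l, INF))
        best = level.get(ranked[0], INF)
        # Fill only when the minimum-level label is unique; ties stay unfilled.
        if len(ranked) == 1 or level.get(ranked[1], INF) > best:
            lut[v] = ranked[0]
    fill = lut[bg_cc]                                     # one lookup pass over the volume
    out = labels.copy()
    out[fill != 0] = fill[fill != 0]
    return out

# Ring of label 1 around a void that contains a one-pixel island of label 2.
labels = np.zeros((7, 7), dtype=np.uint8)
labels[1:6, 1:6] = 1
labels[2:5, 2:5] = 0
labels[3, 3] = 2
bg_cc = np.zeros(labels.shape, dtype=np.int64)
bg_cc[labels == 0] = 1                                    # exterior background
bg_cc[2:5, 2:5][labels[2:5, 2:5] == 0] = 2                # enclosed void
filled = fill_sketch(labels, bg_cc)                       # void becomes 1, island stays 2
```

The per-void decision touches only the small dictionaries built from the adjacency pairs; the full-size arrays are visited only by the shift scan and the final lookup.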

Total work is O(N) for the voxel-scale steps plus O(|label_graph|) for the dominator-style analysis. Crucially, the voxel-scale cost does not grow with the number of labels K, unlike the naive for L in np.unique(labels): binary_fill_holes(labels == L) loop, which is O(K·N).
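For comparison, the naive per-label loop might look like the minimal sketch below. It assumes SciPy's scipy.ndimage.binary_fill_holes as the binary filler; the function name naive_fill is hypothetical and this is not the oracle used in the test suite. Each iteration builds and fills a full-volume mask, which is where the K·N cost comes from.

```python
import numpy as np
from scipy.ndimage import binary_fill_holes

def naive_fill(labels):
    """Fill background voxels label by label (one full-volume pass per label)."""
    out = labels.copy()
    for lab in np.unique(labels):
        if lab == 0:
            continue
        # Holes of this label's mask, restricted to voxels that were background.
        holes = binary_fill_holes(labels == lab) & (labels == 0)
        out[holes] = lab
    return out

# Ring of label 1 around a void that contains a one-pixel island of label 2.
demo = np.zeros((7, 7), dtype=np.uint8)
demo[1:6, 1:6] = 1
demo[2:5, 2:5] = 0
demo[3, 3] = 2
naive = naive_fill(demo)
```

In this sketch a later label overwrites an earlier one when both enclose the same void, so agreement with the graph-based result should only be expected on inputs whose voids are uniquely enclosed.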

Performance

Measured on (200, 200, 200) volumes:

| Image | fill_multi_label | naive per-label |
| --- | --- | --- |
| 99 dense labels, 20 isolated voids | 1.0 s | 47.7 s |
| 50 labels, 2% sparse speckle BG | 3.0 s | 23.9 s |

Tests

automated_test.py gains 14 new tests under test_multi_label_*:

  • Constructed semantic cases: simple donut (2D), inner island (2D/3D), nested shells (2D/3D), gap between labels, two separate voids within one label.
  • Binary-equivalence: result matches fill_voids.fill when only one foreground label is present.
  • dtype preservation across int8..uint64.
  • in_place behavior (true mutates, false preserves).
  • Empty / all-zero / no-BG edge cases.
  • Naive-oracle agreement on well-posed 2D and 3D images whose voids are all uniquely enclosed.
  • Perf regression guard: on a 120³ 64-cube volume, fill_multi_label must be ≥3x faster than the naive oracle (it is typically 10x-50x on a laptop).

New dependency

connected-components-3d (cc3d) is imported at call time only; the binary fill_voids.fill is unaffected if cc3d is unavailable. Happy to move cc3d into install_requires or extras_require if that matches your preference.
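A call-time import of an optional dependency is commonly written along the lines below. This is a generic sketch of the pattern, not the PR's code: the helper name require_optional and the error message wording are hypothetical.

```python
import importlib

def require_optional(module_name, feature):
    """Import an optional dependency only when `feature` is actually used."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'."
        ) from err

# Hypothetical use inside the multi-label function body:
#     cc3d = require_optional("cc3d", "fill_multi_label")
```

Because the import happens inside the function, importing the package and calling the binary fill never touch the optional module.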

Test plan

  • python -m pytest automated_test.py -k "multi_label or dimension or zero or dtype or return or 2d_3d_differ" -> 34 passed locally
  • Binary regression: test_multi_label_matches_binary_fill verifies parity with fill_voids.fill on a non-trivial 3D input
  • Manual benchmark vs naive per-label loop (numbers above)

🤖 Generated with Claude Code

Implements the multi-label algorithm described in README using the
region adjacency graph of BG components + foreground labels, rather
than the naive per-label binary_fill_holes loop. Expensive voxel-scale
passes (cc3d connected components, axis-shift adjacency extraction)
run once on the whole volume; reasoning then happens on a graph whose
size scales with #labels + #voids, not voxel count.

Semantics: a void is filled with the outermost enclosing adjacent
label (shortest label-chain to exterior); ties (in-between shells)
are left unfilled; inner-label islands are handled correctly; on an
effectively binary input the result matches fill_voids.fill.

Typical speedup vs naive per-label loop is 10x-50x on volumes with
many labels, and the runtime does not scale with label count.

Added 14 tests covering constructed semantic cases, dtype handling,
in_place behavior, binary equivalence, and a perf regression guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@william-silversmith

Contributor

Hi Donglai!

Thanks for this contribution! I should point out that I already implemented a sophisticated multi-label fill in seung-lab/fastmorph#10.

Fill voids is an older package with limited dependencies that I intended (in my own brain) to keep for binary images exclusively for simplicity. Would you mind taking a look at fastmorph and seeing if it has the features you need?

@william-silversmith

Contributor

Going to close this due to perceived redundancy with fastmorph. If you see something I missed please reopen! Thanks for contributing!
