Data loaders: prefer nuclei_seg, fall back to nuclear_seg #15

Open
mark-a-potts wants to merge 1 commit into royerlab:main from mark-a-potts:feature/nuclei-seg-fallback
@mark-a-potts

Contributor

Summary

Updates the two cell-extraction data paths to try the new native-20x nuclei_seg label first (produced by submit_nuclei_segmentation_jobs in ops_process PR #113), with fall-through to the legacy 5x-upscaled nuclear_seg from segment_and_stitch_pheno.

Files touched (each is a 9-line addition, identical pattern):

  • src/ops_model/data/data_loader.py:466 (CellProfileDataset.__getitem__)
  • src/ops_model/features/cp_extraction.py:1019 (bulk CP feature read)
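
For reviewers who want the shape of the pattern without opening the diff, here is a minimal sketch of a prefer-then-fall-back label lookup. It is not the code in this PR: `pick_nucleus_label` and `NUCLEUS_LABEL_KEYS` are hypothetical names, and the labels container is assumed to behave like a mapping that supports `in` and `[]` (a plain dict stands in for it here).

```python
from typing import Any, Mapping, Sequence, Tuple

# Preference order: new native-20x label first, legacy 5x-upscaled label second.
# (Hypothetical constant; the PR may inline these names instead.)
NUCLEUS_LABEL_KEYS = ("nuclei_seg", "nuclear_seg")


def pick_nucleus_label(
    labels: Mapping[str, Any],
    keys: Sequence[str] = NUCLEUS_LABEL_KEYS,
) -> Tuple[str, Any]:
    """Return (key, label) for the first key in `keys` present in `labels`."""
    for key in keys:
        if key in labels:
            return key, labels[key]
    # Neither label exists: fail loudly rather than silently returning None.
    raise KeyError(f"none of {list(keys)} found; available: {sorted(labels)}")


# Usage with a dict standing in for the labels container.
legacy_only = {"nuclear_seg": "legacy-mask", "cell_seg": "cell-mask"}
migrated = {"nuclei_seg": "new-mask", "nuclear_seg": "legacy-mask"}
print(pick_nucleus_label(legacy_only)[0])  # nuclear_seg
print(pick_nucleus_label(migrated)[0])     # nuclei_seg
```

Returning the chosen key alongside the array makes it easy to log which segmentation each experiment actually used.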

Both labels are 20x-shaped at level 0 in phenotyping_v3.zarr (the legacy step segmented at 5x and then 4× nearest-neighbor upscaled to 20x for storage), so bbox slicing is unchanged. The existing min_h = min(...) clip block at data_loader.py:498 handles the residual ~3 px shape difference between native-20x and 5x-upscaled-to-20x masks.
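
As an illustration of what such a clip does, the sketch below crops several per-cell crops to their common height and width. Only the `min_h` name comes from this PR; the helper `clip_to_common_shape`, the use of the last two axes as (height, width), and the example shapes are assumptions, not the contents of data_loader.py.

```python
import numpy as np


def clip_to_common_shape(*arrays: np.ndarray) -> tuple:
    """Crop every array to the smallest height/width among them.

    Assumes the last two axes are (height, width) and that all crops
    share the same top-left origin, so cropping trailing rows/columns
    keeps them aligned.
    """
    min_h = min(a.shape[-2] for a in arrays)
    min_w = min(a.shape[-1] for a in arrays)
    return tuple(a[..., :min_h, :min_w] for a in arrays)


# Example: a cell crop and a nucleus crop that differ by a few pixels.
cell_crop = np.zeros((64, 64), dtype=np.uint16)
nuc_crop = np.zeros((61, 63), dtype=np.uint16)
cell_crop, nuc_crop = clip_to_common_shape(cell_crop, nuc_crop)
print(cell_crop.shape, nuc_crop.shape)  # (61, 63) (61, 63)
```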

Test plan / measured impact

Per-cell A/B comparison over 500 sampled cells from A1_linked_pheno_iss_cp.csv (ops0094_20251217):

| metric | legacy mean | new mean | Δ |
|---|---|---|---|
| nuc area (px) | 2179.6 | 2140.4 | −1.80% |
| nuc/cell ratio | 0.200 | 0.197 | −1.78% |

Per-cell nuclear-mask IoU (NEW vs LEGACY, within the cell mask):

  • mean: 0.845, median: 0.856
  • IoU < 0.5: 0.8% of cells (large disagreement — over/under-seg cases)
  • IoU < 0.7: 2.6% of cells
  • IoU ≥ 0.9: 16.6% of cells (near-identical)
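
For reference, a per-cell IoU restricted to the cell mask could be computed along the lines below. This is a sketch under assumptions, not the script used for the numbers above: `nuclear_iou_within_cell` is a hypothetical helper, any nonzero label is treated as nucleus foreground, and the three crops are assumed to already share one shape.

```python
import numpy as np


def nuclear_iou_within_cell(new_nuc, legacy_nuc, cell_mask) -> float:
    """IoU of two nuclear masks, counting only pixels inside the cell mask."""
    cell = np.asarray(cell_mask, dtype=bool)
    a = np.asarray(new_nuc, dtype=bool) & cell
    b = np.asarray(legacy_nuc, dtype=bool) & cell
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Neither mask has nucleus pixels inside this cell: IoU is undefined.
        return float("nan")
    return float(np.logical_and(a, b).sum() / union)


# Toy example: two 2x2 nuclei offset by one column inside a 4x4 cell.
cell = np.ones((4, 4), dtype=bool)
new = np.zeros((4, 4), dtype=np.uint16)
new[1:3, 0:2] = 7      # label id 7, 4 pixels
legacy = np.zeros((4, 4), dtype=np.uint16)
legacy[1:3, 1:3] = 7   # 4 pixels, 2 of them shared with `new`
print(round(nuclear_iou_within_cell(new, legacy, cell), 3))  # 0.333
```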

Migration semantics

  • Per-experiment migration: experiments that have only run the legacy step continue to work; experiments that have run the new step transparently pick up the better masks.
  • No re-run of legacy experiments needed.
  • Re-training of models that consume these masks is optional but recommended once a critical mass of experiments has been re-segmented. A model trained on legacy masks will see a small distribution shift when running inference on new masks: ~1.8% mean feature shift, with ~0.8% of cells having meaningfully different features.

Cross-PR

Pairs with: royerlab/ops_process#113

🤖 Generated with Claude Code

Updates the two cell-extraction data paths to try the new native-20x
`nuclei_seg` label first (produced by `submit_nuclei_segmentation_jobs`
in ops_process), with fall-through to the legacy 5x-upscaled
`nuclear_seg` from `segment_and_stitch_pheno`.

  * src/ops_model/data/data_loader.py:466 (CellProfileDataset)
  * src/ops_model/features/cp_extraction.py:1019 (bulk CP feature read)

Both labels are 20x-shaped at level 0 in phenotyping_v3.zarr, so bbox
slicing is unchanged. Measured impact on per-cell features over 500
sampled cells from ops0094 A/1/0:

  * Mean nuc area:    -1.80%
  * Mean nuc/cell:    -1.78%
  * Per-cell nuc IoU: mean 0.845, median 0.856
  * Outlier cells (IoU < 0.5): 0.8%

Pairs with the ops_process PR (royerlab/ops_process#113) that introduces
the new segmentation step. Migration is per-experiment: experiments that
have only run the legacy step continue to work; experiments that have
run the new step transparently pick up the better masks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>