Fix jaccard scaling #700

Merged: Intron7 merged 3 commits into main from fix-jaccard-scaling on Jun 28, 2026
Conversation

@Intron7 Intron7 commented Jun 26, 2026

Member

No description provided.

Intron7 added 2 commits June 26, 2026 23:02
Signed-off-by: Intron7 <severin.dicks@icloud.com>
Signed-off-by: Intron7 <severin.dicks@icloud.com>
Intron7 commented Jun 26, 2026

Member Author

@coderabbitai review

coderabbitai Bot commented Jun 26, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai Bot commented Jun 26, 2026


📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes
    • Fixed pp.neighbors with method="jaccard" so it no longer crashes on large datasets with CUDA illegal address errors.
    • Jaccard neighbor graph computation now works more reliably on GPU-backed workloads.

Walkthrough

Adds a CUDA jaccard_shared_counts kernel and module, registers _jaccard_cuda in the build and loader, and switches _get_connectivities_jaccard to use the new binding. The release notes add the corresponding bug-fix entry.

Changes

Jaccard CUDA path

| Layer / File(s) | Summary |
|---|---|
| CUDA kernel and binding: `src/rapids_singlecell/_cuda/jaccard/kernels_jaccard.cuh`, `src/rapids_singlecell/_cuda/jaccard/jaccard.cu` | Defines `jaccard_shared_counts_kernel` and exposes `jaccard_shared_counts` from `_jaccard_cuda`. |
| Build and export wiring: `CMakeLists.txt`, `src/rapids_singlecell/_cuda/__init__.py` | Adds `_jaccard_cuda` to the CUDA module build and to the extension loader export list. |
| Jaccard preprocessing switch: `src/rapids_singlecell/preprocessing/_neighbors/_neighbors.py`, `docs/release-notes/0.16.0.md` | `_get_connectivities_jaccard` now calls `jaccard_shared_counts`, and the release notes add the crash-fix entry. |
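
For readers who want a CPU reference for what a shared-counts Jaccard step computes, here is a minimal pure-Python sketch. It is not the PR's kernel: the function name `jaccard_shared_counts_reference`, the flat row-major `knn` layout, and the assumption that each neighbor row holds `k` distinct indices are illustrative choices. For two neighbor sets of size `k` with `shared` elements in common, the union has `2 * k - shared` elements, so the Jaccard value is `shared / (2 * k - shared)`.

```python
def jaccard_shared_counts_reference(knn, n_obs, k):
    """Hypothetical CPU reference, not the CUDA kernel from this PR.

    knn is a flat, row-major list of length n_obs * k; row i holds the
    k distinct neighbor indices of cell i. Assumes k > 0.
    Returns one Jaccard value per (cell, neighbor) edge, same layout as knn.
    """
    # Pre-build each cell's neighbor set so intersections are cheap.
    rows = [set(knn[i * k:(i + 1) * k]) for i in range(n_obs)]
    out = [0.0] * (n_obs * k)
    for i in range(n_obs):
        for slot in range(k):
            j = knn[i * k + slot]
            shared = len(rows[i] & rows[j])
            # |A ∪ B| = |A| + |B| - |A ∩ B| = 2k - shared for sets of size k.
            out[i * k + slot] = shared / (2 * k - shared)
    return out


# Three cells, k = 2: every pair of neighbor lists shares exactly one index.
example = jaccard_shared_counts_reference([1, 2, 0, 2, 0, 1], n_obs=3, k=2)
```

A reference like this is mainly useful for checking GPU output on small inputs; it makes no attempt to scale.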

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | No pull request description was provided, so there is nothing to evaluate for relevance. | Add a brief description that explains the bug fix and the main implementation changes. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the Jaccard bug fix and scaling-related CUDA changes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
src/rapids_singlecell/preprocessing/_neighbors/_neighbors.py (1)

234-247: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Import the module through rapids_singlecell._cuda to keep the lazy-loading contract.

from rapids_singlecell._cuda._jaccard_cuda import jaccard_shared_counts imports the compiled submodule directly, bypassing the __getattr__ lazy loader in _cuda/__init__.py that maps "absent → None" and "present-but-failed → contextual ImportError". Prefer the package-level import so behavior stays consistent with the rest of the codebase.

♻️ Suggested import style
-    from rapids_singlecell._cuda._jaccard_cuda import jaccard_shared_counts
+    from rapids_singlecell._cuda import _jaccard_cuda as _jaccard
@@
-    jaccard_shared_counts(
+    _jaccard.jaccard_shared_counts(
         knn,
         n_obs=n_obs,
         k=n_neighbors,
         jaccard_vals=jaccard_vals,
         stream=cp.cuda.get_current_stream().ptr,
     )

As per coding guidelines: "Import _cuda modules via rapids_singlecell._cuda."
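
The "absent → None" versus "present-but-failed → contextual ImportError" distinction described above can be illustrated with the standard library alone. This is a sketch under assumptions, not the actual contents of `_cuda/__init__.py`: the helper name `load_optional_module` is hypothetical, and the real loader may differ in how it detects and reports failures.

```python
import importlib
import importlib.util


def load_optional_module(qualified_name):
    """Hypothetical helper: return the module, or None if it is not installed.

    A module that exists but fails while importing raises an ImportError
    that names the module, instead of being silently treated as absent.
    """
    try:
        spec = importlib.util.find_spec(qualified_name)
    except ModuleNotFoundError:
        # A parent package of a dotted name is itself missing.
        return None
    if spec is None:
        return None  # absent -> None
    try:
        return importlib.import_module(qualified_name)
    except ImportError as exc:
        # present-but-failed -> contextual error
        raise ImportError(
            f"{qualified_name} is present but failed to import: {exc}"
        ) from exc


# In a package __init__.py, a module-level __getattr__ (PEP 562) could
# delegate to the helper so every submodule access goes through one path:
#
#     def __getattr__(name):
#         return load_optional_module(f"{__name__}.{name}")
```

Importing the compiled submodule directly skips whatever such a hook does, which is the inconsistency the comment is pointing at.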

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/rapids_singlecell/preprocessing/_neighbors/_neighbors.py` around lines
234 - 247, The Jaccard CUDA import in the neighbors preprocessing path bypasses
the `_cuda` package lazy-loading contract. Update the import in the code that
calls `jaccard_shared_counts` to go through `rapids_singlecell._cuda` instead of
importing `._jaccard_cuda` directly, so `_cuda.__getattr__` continues to control
“absent vs failed import” behavior consistently across the codebase.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/rapids_singlecell/_cuda/jaccard/jaccard.cu`:
- Around line 10-19: The jaccard shared-counts launch currently trusts n_obs and
k without validating that knn actually has n_obs * k elements, which can lead to
out-of-bounds GPU reads. Add a pre-launch size check in
launch_jaccard_shared_counts and the caller/binding that passes knn into
jaccard_shared_counts_kernel, using the existing n_obs/k contract to verify the
array length before launching. If the size does not match, fail fast with a
clear validation error rather than invoking the kernel.

In `@src/rapids_singlecell/_cuda/jaccard/kernels_jaccard.cuh`:
- Around line 33-34: The Jaccard kernel in the computation that writes to
jaccard_vals needs a zero-denominator guard and the stray empty statement should
be removed. Update the division logic so the result is only computed when the
denominator from the k/c terms is nonzero, and otherwise write a safe zero value
to keep NaN from propagating into the later mask and COO output. Make the fix in
the kernel body around jaccard_vals and keep the arithmetic path numerically
stable.
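
The two inline findings above (a pre-launch size check and a zero-denominator guard) can be sketched on the Python side as follows. Both helper names, `check_knn_size` and `guarded_jaccard`, are hypothetical, and the denominator `2 * k - shared` is an assumption based on the Jaccard index of two size-`k` sets; the actual kernel arithmetic is not shown in this conversation.

```python
def check_knn_size(knn_len, n_obs, k):
    """Hypothetical pre-launch check: fail fast if knn is not n_obs * k long."""
    if n_obs < 0 or k < 0:
        raise ValueError(f"n_obs and k must be non-negative, got {n_obs} and {k}")
    expected = n_obs * k
    if knn_len != expected:
        raise ValueError(
            f"knn has {knn_len} elements, expected n_obs * k = {expected}"
        )


def guarded_jaccard(shared, k):
    """Hypothetical guard: return 0.0 instead of dividing by zero.

    Assumes the denominator is 2 * k - shared (union of two size-k sets).
    """
    denom = 2 * k - shared
    return shared / denom if denom != 0 else 0.0


check_knn_size(knn_len=6, n_obs=3, k=2)  # passes silently
value = guarded_jaccard(shared=1, k=2)   # 1 / (4 - 1)
```

Under that assumed formula the denominator only reaches zero when `k` is zero, so the guard is cheap insurance rather than a hot path.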


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b86a86c0-4aeb-4dd1-be39-70e4edb49285

📥 Commits

Reviewing files that changed from the base of the PR and between 8840275 and 9f77a10.

📒 Files selected for processing (6)
  • CMakeLists.txt
  • docs/release-notes/0.16.0.md
  • src/rapids_singlecell/_cuda/__init__.py
  • src/rapids_singlecell/_cuda/jaccard/jaccard.cu
  • src/rapids_singlecell/_cuda/jaccard/kernels_jaccard.cuh
  • src/rapids_singlecell/preprocessing/_neighbors/_neighbors.py

Comment thread src/rapids_singlecell/_cuda/jaccard/jaccard.cu
Comment thread src/rapids_singlecell/_cuda/jaccard/kernels_jaccard.cuh Outdated
codecov-commenter commented Jun 26, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.30%. Comparing base (8840275) to head (564bf52).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #700      +/-   ##
==========================================
- Coverage   90.31%   90.30%   -0.01%     
==========================================
  Files         104      104              
  Lines        9570     9565       -5     
==========================================
- Hits         8643     8638       -5     
  Misses        927      927              
| Files with missing lines | Coverage Δ |
|---|---|
| src/rapids_singlecell/_cuda/__init__.py | 65.00% <ø> (ø) |
| ..._singlecell/preprocessing/_neighbors/_neighbors.py | 98.64% <100.00%> (-0.09%) ⬇️ |

Signed-off-by: Intron7 <severin.dicks@icloud.com>
@Intron7 Intron7 enabled auto-merge (squash) June 28, 2026 13:44
@Intron7 Intron7 merged commit aed9008 into main Jun 28, 2026
19 checks passed
@Intron7 Intron7 deleted the fix-jaccard-scaling branch June 28, 2026 14:06