feat: experimental vectorized and numba parallelized implementation#44

Merged: noamteyssier merged 14 commits into ArcInstitute:main from drbh:main on Aug 15, 2025
@drbh (Contributor) commented Aug 6, 2025

This PR contains an experimental implementation of parallel_differential_expression that uses numpy vectorization, numba.prange and @njit to try to squeeze more performance out of the CPU. In empirical testing this sped up some operations by an order of magnitude.
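To illustrate the general pattern (not the PR's actual kernel), here is a minimal sketch of a per-gene Mann-Whitney U statistic computed with @njit and prange over genes. The function name u_statistic_per_gene and the pairwise-counting approach are assumptions made for illustration, and the import fallback only exists so the sketch still runs where numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch runs (serially) without numba installed.
    prange = range

    def njit(*args, **kwargs):
        def wrap(fn):
            return fn
        return wrap


@njit(parallel=True)
def u_statistic_per_gene(target, reference):
    """Hypothetical kernel: U statistic of target vs. reference for each gene.

    Both inputs are float64 arrays of shape (cells, genes). Each gene is
    independent, so the outer loop can be split across threads by prange.
    """
    n_genes = target.shape[1]
    out = np.empty(n_genes, dtype=np.float64)
    for g in prange(n_genes):
        u = 0.0
        for i in range(target.shape[0]):
            t = target[i, g]
            for j in range(reference.shape[0]):
                r = reference[j, g]
                if t > r:
                    u += 1.0   # target cell ranks above reference cell
                elif t == r:
                    u += 0.5   # ties contribute half a count
        out[g] = u
    return out


target = np.array([[1.0, 5.0], [2.0, 5.0]])
reference = np.array([[0.0, 5.0], [3.0, 6.0]])
result = u_statistic_per_gene(target, reference)
```

Pairwise counting is quadratic in the number of cells, so a rank-based formulation is likely preferable in practice; the point here is only the shape of the parallel loop.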

The changes include a USE_EXPERIMENTAL env var that enables opt-in usage and transparently replaces parallel_differential_expression, plus a new bench_expr.py that compares the reference implementation with the experimental one.
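A rough sketch of how an env-var opt-in like this can be wired up. Only the USE_EXPERIMENTAL name comes from this PR; the helper names, the stand-in implementations, and the set of accepted values ("1", "true", "yes") are assumptions for illustration and may not match how the package actually parses the variable.

```python
import os


def _reference_impl(data):
    # Hypothetical stand-in for the reference code path.
    return ("reference", data)


def _experimental_impl(data):
    # Hypothetical stand-in for the vectorized code path.
    return ("experimental", data)


def use_experimental(environ=None):
    """Return True when USE_EXPERIMENTAL is set to a truthy value (assumed set)."""
    env = os.environ if environ is None else environ
    value = env.get("USE_EXPERIMENTAL", "")
    return value.strip().lower() in {"1", "true", "yes"}


def dispatch_differential_expression(data, environ=None):
    """Pick the implementation at call time so callers need no code changes."""
    impl = _experimental_impl if use_experimental(environ) else _reference_impl
    return impl(data)


opted_in = dispatch_differential_expression([1, 2], environ={"USE_EXPERIMENTAL": "1"})
default = dispatch_differential_expression([1, 2], environ={})
```

Passing the environment as an argument keeps the dispatch easy to test without mutating os.environ.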

Running benches

uv run python -m pytest tests/bench_expr.py
[Screenshot, 2025-08-06 at 7:37:40 PM]

Current limitations: only the wilcoxon metric is implemented in parallel_differential_expression_vec.

More realistic workload

In a slightly bigger example this reduces the compute time for a dataset of 100,000 cells, 18,080 genes and 150 perturbations from ~5 mins to ~25 seconds on my MacBook M3.

(The reference run is using num_workers=16 and batch_size=100.)

uv run compare.py
============================================================
Benchmarking with 100000 cells, 18080 genes, 150 perturbations
============================================================

1. Reference implementation (batch processing):
INFO:pdex._single_cell:Precomputing masks for each target gene
Identifying target masks: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 151/151 [00:00<00:00, 451.40it/s]
INFO:pdex._single_cell:Precomputing variable indices for each feature
Identifying variable indices: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18080/18080 [00:00<00:00, 7455807.33it/s]
INFO:pdex._single_cell:Creating shared memory memory matrix for parallel computing
INFO:pdex._single_cell:Creating generator of all combinations: N=2730080
INFO:pdex._single_cell:Creating generator of all batches: N=27301
INFO:pdex._single_cell:Initializing parallel processing pool
INFO:pdex._single_cell:Processing batches
Processing batches: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 27301/27301 [04:50<00:00, 94.10it/s]
INFO:pdex._single_cell:Flattening results
INFO:pdex._single_cell:Closing shared memory pool
   Time: 299.028 seconds

2. Vectorized implementation:
INFO:pdex._single_cell:vectorized processing: 151 targets, 18080 genes
INFO:pdex._single_cell:Processing 150 targets
Processing targets: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 150/150 [00:19<00:00,  7.59it/s]
   Time: 25.581 seconds
   Speedup: 11.7x

============================================================
Correctness Verification:
============================================================
✅ vec: Column 'target_mean' values match within 1e-06 tolerance
✅ vec: Column 'reference_mean' values match within 1e-06 tolerance
✅ vec: Column 'percent_change' values match within 0.01 tolerance
✅ vec: Column 'fold_change' values match within 1e-06 tolerance
✅ vec: Results match reference

============================================================
Performance Summary:
============================================================
Implementation                 Time (s)     Speedup
----------------------------------------------------
reference                      299.028      1.0       x
vec                            25.581       11.7      x
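As a sanity check on the figures in the output above, the following sketch reproduces the combination count, batch count, and speedup from the logged numbers, and shows one way a per-column tolerance check could look. The columns_match helper is hypothetical, and treating the logged tolerance as an absolute tolerance is an assumption; compare.py may do this differently.

```python
import math

# Figures taken from the log output above.
n_targets = 151          # targets reported by the log
n_genes = 18080
batch_size = 100
ref_seconds = 299.028
vec_seconds = 25.581

# The reference path enumerates one (target, gene) pair per test.
combinations = n_targets * n_genes
batches = math.ceil(combinations / batch_size)
speedup = ref_seconds / vec_seconds


def columns_match(expected, actual, tol):
    """Hypothetical check: every pair of values differs by at most tol (absolute)."""
    if len(expected) != len(actual):
        return False
    return all(
        math.isclose(e, a, rel_tol=0.0, abs_tol=tol)
        for e, a in zip(expected, actual)
    )


print(f"combinations={combinations} batches={batches} speedup={speedup:.1f}x")
```

The computed values line up with the logged N=2730080 combinations, N=27301 batches, and the 11.7x speedup.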

@noamteyssier (Collaborator) commented

This is awesome, thanks @drbh !

I’ll do some testing and try to get this merged asap.

@noamteyssier noamteyssier merged commit eef6f3c into ArcInstitute:main Aug 15, 2025
@noamteyssier (Collaborator) commented

thanks for the PR @drbh !

I'm going to test this more in a few different contexts and will eventually make this the stable execution path for wilcoxon.

cheers!


2 participants