Open
Changes from 69 commits
Commits
94 commits
c5591c8
feat: allow exponentiation post agg for log-fold-change
ilan-gold Apr 7, 2026
7aca4b3
feat: add `illico`
ilan-gold Apr 7, 2026
c2f3738
Merge branch 'main' into ig/exp_post_agg
ilan-gold Apr 7, 2026
c80958a
Merge branch 'ig/exp_post_agg' into ig/illico
ilan-gold Apr 7, 2026
72318fb
fix: bump numba
ilan-gold Apr 7, 2026
97b4f7c
Merge branch 'ig/illico' of github.com:scverse/scanpy into ig/illico
ilan-gold Apr 7, 2026
b9c8257
chore: probably not either
ilan-gold Apr 7, 2026
5394d2b
chore: now pandas
ilan-gold Apr 7, 2026
8928dfd
fix: anndata
ilan-gold Apr 7, 2026
897a646
fix: just stable then
ilan-gold Apr 7, 2026
af1f523
fix: pin rc
ilan-gold Apr 13, 2026
40d5946
fix: agg name
ilan-gold Apr 13, 2026
74b6d87
fix: only consider scores and pvals
ilan-gold Apr 13, 2026
8352445
chore: p values and z scores only
ilan-gold Apr 13, 2026
1cad431
fix: point an low-vers safe version
ilan-gold Apr 13, 2026
f0d78b4
Merge branch 'main' into ig/exp_post_agg
ilan-gold Apr 13, 2026
d9ad811
Merge branch 'ig/exp_post_agg' into ig/illico
ilan-gold Apr 13, 2026
c83f82b
fix: make a copy of vectors for writing
ilan-gold Apr 14, 2026
8f201ba
Merge branch 'ig/illico' of github.com:scverse/scanpy into ig/illico
ilan-gold Apr 14, 2026
3ca2957
fix: remove warning filter + `use_rust`
ilan-gold Apr 14, 2026
0bf0976
Merge branch 'main' into ig/exp_post_agg
ilan-gold Apr 14, 2026
a375d31
Merge branch 'ig/exp_post_agg' into ig/illico
ilan-gold Apr 14, 2026
d0f0db0
chore: add explicit test
ilan-gold Apr 14, 2026
e1c43a1
Merge branch 'ig/exp_post_agg' of github.com:scverse/scanpy into ig/e…
ilan-gold Apr 14, 2026
53a2f74
chore: relnote
ilan-gold Apr 14, 2026
5e2ee5e
Merge branch 'ig/exp_post_agg' into ig/illico
ilan-gold Apr 15, 2026
fa454d7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 15, 2026
ee037e6
fix: clarify usage of categories
ilan-gold Apr 15, 2026
6557bcf
fix: order
ilan-gold Apr 15, 2026
9299b23
Merge branch 'ig/illico' of github.com:scverse/scanpy into ig/illico
ilan-gold Apr 15, 2026
480a57a
fix: decrease absolute tolerance
ilan-gold Apr 15, 2026
c755a14
fix: `rest` instead of `None`
ilan-gold Apr 16, 2026
ac76f90
fix: respect `groups` arg
ilan-gold Apr 16, 2026
259d8f3
chore: add note
ilan-gold Apr 16, 2026
7373f55
move comment to correct location
flying-sheep Apr 16, 2026
739fffb
add explanation
flying-sheep Apr 16, 2026
db85c7c
typo
flying-sheep Apr 16, 2026
92a2953
fix type
flying-sheep Apr 16, 2026
f66ca68
note
flying-sheep Apr 16, 2026
9a62053
ternary
flying-sheep Apr 16, 2026
9f16597
Merge branch 'main' into ig/exp_post_agg
ilan-gold Apr 16, 2026
0b319b8
fix: LFC unfiorm
ilan-gold Apr 17, 2026
3971219
Merge branch 'main' into ig/exp_post_agg
ilan-gold Apr 17, 2026
585aa87
Merge branch 'ig/exp_post_agg' into ig/illico
ilan-gold Apr 17, 2026
3dac202
fix; dont make list multiple times
ilan-gold Apr 17, 2026
7fcdac7
Merge branch 'main' into ig/exp_post_agg
flying-sheep Apr 17, 2026
ef5ead2
Merge branch 'ig/exp_post_agg' into ig/illico
ilan-gold Apr 20, 2026
023aa0f
illico bound
ilan-gold Apr 21, 2026
9fe02a3
fix: re-disallow direct references
ilan-gold Apr 21, 2026
6aca9ca
chore: `mean_in_log_space` instead of `exp_post_agg`
ilan-gold Apr 22, 2026
0437475
Merge branch 'main' into ig/exp_post_agg
ilan-gold Apr 22, 2026
d15aede
docs: use upstream APIs (#4083)
flying-sheep Apr 23, 2026
d6f2c51
perf: Combat perf improvements (#4070)
ilaykav Apr 24, 2026
44cfc6e
docs: clarify method vs transformer in sc.pp.neighbors (#4079)
CuiweiG Apr 24, 2026
b448347
Merge branch 'ig/exp_post_agg' into ig/illico
ilan-gold Apr 24, 2026
2b60935
fix: limit Numba threads in Wilcoxon path of rank_genes_groups (#4082)
JhonatanFelix Apr 24, 2026
3f902dd
Merge branch 'main' into ig/exp_post_agg
ilan-gold Apr 24, 2026
2467433
Merge branch 'ig/exp_post_agg' into ig/illico
ilan-gold Apr 24, 2026
d1aa5fb
Merge branch 'main' into ig/exp_post_agg
ilan-gold Apr 27, 2026
2585d2c
Update pyproject.toml
ilan-gold Apr 27, 2026
6a1f917
Merge branch 'ig/exp_post_agg' into ig/illico
ilan-gold Apr 27, 2026
152c344
Merge branch 'main' into ig/exp_post_agg
ilan-gold Apr 27, 2026
c24ebd0
Merge branch 'ig/exp_post_agg' into ig/illico
ilan-gold Apr 27, 2026
e13cb6a
fix: mean_in_log_space pass through
ilan-gold Apr 28, 2026
62e6c87
fix: no defaults internall on `_compute_statistics`
ilan-gold Apr 28, 2026
bd5f675
Merge branch 'main' into ig/exp_post_agg
ilan-gold Apr 28, 2026
a9d9f0a
Merge branch 'ig/exp_post_agg' into ig/illico
ilan-gold Apr 28, 2026
aac3189
Merge branch 'main' into ig/exp_post_agg
ilan-gold May 4, 2026
487be58
Merge branch 'ig/exp_post_agg' into ig/illico
ilan-gold May 4, 2026
977e254
Zb/illico fixes (#4102)
zboldyga May 11, 2026
36c5039
Merge branch 'main' into ig/exp_post_agg
ilan-gold May 16, 2026
36acd5b
Merge branch 'ig/exp_post_agg' into ig/illico
ilan-gold May 18, 2026
9678e40
Merge branch 'main' into ig/exp_post_agg
ilan-gold Jun 8, 2026
168de8b
Merge branch 'ig/exp_post_agg' into ig/illico
ilan-gold Jun 16, 2026
c1f0fbc
Merge branch 'main' into ig/exp_post_agg
ilan-gold Jun 16, 2026
5cc6647
Merge branch 'ig/exp_post_agg' into ig/illico
ilan-gold Jun 16, 2026
7ab15f9
feat: use `groups` argument
ilan-gold Jun 16, 2026
3d19c95
Merge branch 'main' into ig/illico
ilan-gold Jun 16, 2026
2363ee7
chore: p-value alteration
ilan-gold Jun 22, 2026
b522c71
Update tests/test_rank_genes_groups.py
ilan-gold Jun 29, 2026
dbbe64d
fix: address comments
ilan-gold Jun 29, 2026
4216aad
Merge branch 'main' into ig/illico
ilan-gold Jun 29, 2026
98e3d71
test
ilan-gold Jun 30, 2026
264c683
Merge branch 'ig/illico' of github.com:scverse/scanpy into ig/illico
ilan-gold Jun 30, 2026
c99dd41
Merge branch 'main' into ig/illico
ilan-gold Jun 30, 2026
0c33493
Merge branch 'main' into ig/illico
ilan-gold Jul 1, 2026
59b9119
Merge branch 'main' into ig/illico
ilan-gold Jul 2, 2026
192d3ea
Merge branch 'main' into ig/illico
ilan-gold Jul 2, 2026
4479fc0
feat: illico as default v2
ilan-gold Jul 2, 2026
1492048
fix: no illico 0.6.0
ilan-gold Jul 2, 2026
e808e69
pin illico
ilan-gold Jul 3, 2026
ef18687
chore: relnote
ilan-gold Jul 3, 2026
a623d40
Merge branch 'main' into ig/illico
ilan-gold Jul 3, 2026
0940df2
intersphinx
ilan-gold Jul 3, 2026
1 change: 1 addition & 0 deletions docs/release-notes/4037.feat.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add `mean_in_log_space` argument to {func}`scanpy.tl.rank_genes_groups` for customizing how log-fold-change is calculated {user}`ilan-gold`
4 changes: 3 additions & 1 deletion pyproject.toml
@@ -51,7 +51,7 @@ classifiers = [
]
dynamic = [ "version" ]
dependencies = [
"anndata>=0.10.8",
"anndata>=0.11",

Member
Which feature does this PR need?

Contributor Author
This comes from illico, but it seemed as good a time as any to require this. Fair point, though: I will split this out into a separate PR, since there are probably version checks floating around our code base.

There is no reason for illico to require 0.11. I can lower the lower bound to 0.10.8 in the next release (v0.6.0).

Contributor Author
No worries, really. We should upgrade to 0.11 anyway for the next scanpy minor release. I'd rather keep the community moving on this front.

"certifi",
"fast-array-utils[accel,sparse]>=1.4",
"h5py>=3.11",
@@ -97,6 +97,7 @@ scanorama = [ "scanorama" ]
scrublet = [ "scikit-image>=0.23.1" ]
# highly_variable_genes method 'seurat_v3'
skmisc = [ "scikit-misc>=0.5.1" ]
illico = [ "illico>=0.5.1" ]
scanpy2 = [ "igraph>=0.10.8", "scikit-misc>=0.5.1" ]

[dependency-groups]
@@ -107,6 +108,7 @@ dev = [
test = [
"scanpy[dask-ml]",
"scanpy[dask]",
"scanpy[illico]",
"scanpy[leiden]",
"scanpy[plotting]",
"scanpy[scrublet]",
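Note on the new `illico` extra: it makes the package an optional dependency, and the diff further down imports it lazily inside the illico branch. A minimal sketch of how such a guarded import could surface a helpful error message. `require_optional` is a hypothetical helper for illustration, not something this PR or scanpy defines:

```python
from importlib import import_module
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency or raise an error naming the extra to install.

    Hypothetical helper for illustration only; not part of scanpy.
    """
    try:
        return import_module(module_name)
    except ImportError as e:
        msg = (
            f"{module_name!r} is required for this method. "
            f"Install it with `pip install 'scanpy[{extra}]'`."
        )
        raise ImportError(msg) from e
```

Calling `require_optional("illico", "illico")` at the top of the illico branch would then fail early with an actionable message when the extra is missing.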
11 changes: 8 additions & 3 deletions src/scanpy/_settings/presets.py
@@ -31,7 +31,9 @@
]


type DETest = Literal["logreg", "t-test", "wilcoxon", "t-test_overestim_var"]
type DETest = Literal[
"logreg", "t-test", "wilcoxon", "wilcoxon_illico", "t-test_overestim_var"
]
type HVGFlavor = Literal["seurat", "cell_ranger", "seurat_v3", "seurat_v3_paper"]
type LeidenFlavor = Literal["leidenalg", "igraph"]

@@ -81,6 +83,7 @@ class PcaPreset(NamedTuple):
class RankGenesGroupsPreset(NamedTuple):
method: DETest
mask_var: str | None
mean_in_log_space: bool


class ScalePreset(NamedTuple):
@@ -185,9 +188,11 @@ def pca() -> Mapping[Preset, PcaPreset]:
def rank_genes_groups() -> Mapping[Preset, RankGenesGroupsPreset]:
"""Correlation method for :func:`~scanpy.tl.rank_genes_groups`."""
return {
Preset.ScanpyV1: RankGenesGroupsPreset(method="t-test", mask_var=None),
Preset.ScanpyV1: RankGenesGroupsPreset(
method="t-test", mask_var=None, mean_in_log_space=True
),
Preset.ScanpyV2Preview: RankGenesGroupsPreset(
method="wilcoxon", mask_var=None
method="wilcoxon", mask_var=None, mean_in_log_space=False
),
}
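
Note on the preset change: each preset now carries its own `mean_in_log_space` value, and `rank_genes_groups` falls back to it when the caller passes no explicit value. A self-contained sketch of that resolution step, using the values shown in this hunk. The `Default` class and `resolve_mean_in_log_space` below are simplified stand-ins written for this note, not scanpy's actual implementation:

```python
from __future__ import annotations

from enum import Enum, auto
from typing import NamedTuple


class Preset(Enum):
    ScanpyV1 = auto()
    ScanpyV2Preview = auto()


class RankGenesGroupsPreset(NamedTuple):
    method: str
    mask_var: str | None
    mean_in_log_space: bool


class Default:
    """Simplified stand-in for scanpy's `Default` sentinel (assumption)."""


# Values as shown in this hunk of the diff.
PRESETS = {
    Preset.ScanpyV1: RankGenesGroupsPreset("t-test", None, True),
    Preset.ScanpyV2Preview: RankGenesGroupsPreset("wilcoxon", None, False),
}


def resolve_mean_in_log_space(value: bool | Default, preset: Preset) -> bool:
    """Return the explicit value if given, else the active preset's value."""
    if isinstance(value, Default):
        return PRESETS[preset].mean_in_log_space
    return value
```

An explicit `True` or `False` always wins; only the sentinel defers to the preset.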

116 changes: 97 additions & 19 deletions src/scanpy/tools/_rank_genes_groups.py
@@ -7,6 +7,7 @@
import numba
import numpy as np
import pandas as pd
from anndata import AnnData
from fast_array_utils.numba import njit
from fast_array_utils.stats import mean_var
from scipy import sparse
@@ -28,7 +29,6 @@
from collections.abc import Generator, Iterable
from typing import Literal

from anndata import AnnData
from numpy.typing import NDArray


@@ -141,6 +141,7 @@ def __init__(
self.expm1_func = lambda x: np.expm1(x * np.log(base))
else:
self.expm1_func = np.expm1
self.group_col = adata.obs[groupby].array

self.groups_order, self.groups_masks_obs = _utils.select_groups(
adata, groups, groupby
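
Note on `expm1_func` in the hunk above: the non-natural-base branch relies on the identity `b**x - 1 == expm1(x * ln(b))` to undo a `log1p` taken in base `b`. A small standard-library check of that identity, using scalar `math` functions instead of the NumPy calls in the diff. `make_expm1` is a hypothetical name chosen for this sketch:

```python
from __future__ import annotations

import math
from collections.abc import Callable


def make_expm1(base: float | None = None) -> Callable[[float], float]:
    """Return the inverse of log1p for the given base (natural log if None)."""
    if base is None:
        return math.expm1
    ln_base = math.log(base)
    return lambda x: math.expm1(x * ln_base)


# Round trip: log1p in base 2, then back to the original count.
count = 7.0
logged = math.log1p(count) / math.log(2)  # log2(1 + count)
restored = make_expm1(2)(logged)
```

Up to floating-point error, `restored` recovers `count`, which is what the fold-change code needs before forming a ratio.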
@@ -202,7 +203,7 @@ def __init__(
self.grouping_mask = adata.obs[groupby].isin(self.groups_order)
self.grouping = adata.obs.loc[self.grouping_mask, groupby]

def _basic_stats(self) -> None:
def _basic_stats(self, *, exponentiate_values: bool = False) -> None:
"""Set self.{means,vars,pts}{,_rest} depending on X."""
n_genes = self.X.shape[1]
n_groups = self.groups_masks_obs.shape[0]
@@ -218,6 +219,8 @@ def _basic_stats(self) -> None:
else:
mask_rest = self.groups_masks_obs[self.ireference]
x_rest = self.X[mask_rest]
if exponentiate_values:
x_rest = self.expm1_func(x_rest)
self.means[self.ireference], self.vars[self.ireference] = mean_var(
x_rest, axis=0, correction=1
)
@@ -231,6 +234,8 @@ def _basic_stats(self) -> None:

for group_index, mask_obs in enumerate(self.groups_masks_obs):
x_mask = self.X[mask_obs]
if exponentiate_values:
x_mask = self.expm1_func(x_mask)

if self.comp_pts:
self.pts[group_index] = get_nonzeros(x_mask) / x_mask.shape[0]
@@ -245,6 +250,8 @@ def _basic_stats(self) -> None:
if self.ireference is None:
mask_rest = ~mask_obs
x_rest = self.X[mask_rest]
if exponentiate_values:
x_rest = self.expm1_func(x_rest)
(
self.means_rest[group_index],
self.vars_rest[group_index],
@@ -260,8 +267,6 @@ def t_test(
) -> Generator[tuple[int, NDArray[np.floating], NDArray[np.floating]], None, None]:
from scipy import stats

self._basic_stats()

for group_index, (mask_obs, mean_group, var_group) in enumerate(
zip(self.groups_masks_obs, self.means, self.vars, strict=True)
):
@@ -313,8 +318,6 @@ def wilcoxon(
) -> Generator[tuple[int, NDArray[np.floating], NDArray[np.floating]], None, None]:
from scipy import stats

self._basic_stats()

n_genes = self.X.shape[1]
# First loop: Loop over all genes
if self.ireference is not None:
@@ -422,27 +425,88 @@ def logreg(
if len(self.groups_order) <= 2:
break

def compute_statistics( # noqa: PLR0912
def compute_statistics( # noqa: PLR0912, PLR0915
self,
method: DETest,
*,
corr_method: _CorrMethod = "benjamini-hochberg",
n_genes_user: int | None = None,
rankby_abs: bool = False,
tie_correct: bool = False,
corr_method: _CorrMethod,
n_genes_user: int | None,
rankby_abs: bool,
tie_correct: bool,
mean_in_log_space: bool,
**kwds,
) -> None:
if method in {"t-test", "t-test_overestim_var"}:
self._basic_stats(exponentiate_values=False)
generate_test_results = self.t_test(method)
elif method == "wilcoxon":
generate_test_results = self.wilcoxon(tie_correct=tie_correct)
if not mean_in_log_space:
# If we are not exponentiating after the mean aggregation, we need to recalculate the stats.
self._basic_stats(exponentiate_values=True)
elif "wilcoxon" in method:
if "illico" in method:
from illico import asymptotic_wilcoxon

illico_df = asymptotic_wilcoxon(
AnnData(
X=self.X,
var=pd.DataFrame(index=self.var_names),
obs=pd.DataFrame(
index=pd.RangeIndex(self.X.shape[0]).astype("str"),
# This self.group_col means illico will run tests against *all* data
# instead of what's in self.groups_order as controlled by the `groups` arg.
# TODO: Only run the subset once illico supports a `groups` argument
data={"group": self.group_col},
),
),
reference=self.groups_order[self.ireference]
if self.ireference is not None
else None,
group_keys="group",
return_as_scanpy=False,
is_log1p=True,
tie_correct=tie_correct,
use_continuity=False,
alternative="two-sided",
use_rust=False,

Contributor
@ilan-gold curious why this is hardcoded?

Contributor Author
From what @remydubois said on Zulip (https://scverse.zulipchat.com/#narrow/channel/315570-random/topic/Article.20on.20speeding.20up.20computation/near/581587388), it is basically not worth the headache. Furthermore, if we were ever to adopt the codebase, it would be the numba version, not the Rust version. His observations about Rust largely match ours: the performance is almost always identical to numba.

)
# Generate a lookup of category -> result, excluding the reference if it is present.
generate_test_results_map = {
group_cat: (
group["z_score"].to_numpy(copy=True),
group["p_value"].to_numpy(copy=True),
)
for (_, group) in illico_df.groupby(level="pert")
if (
group_cat := np.unique(
group.index.get_level_values("pert").to_numpy(copy=True)
).item()
)
!= (
None
if self.ireference is None
else self.groups_order[self.ireference]
)
}
# Create the iterator that is expected by the other method-branches.
groups_order_list = self.groups_order.tolist()
generate_test_results = (
(
groups_order_list.index(group_cat),
*generate_test_results_map[group_cat],
)
for group_cat in self.groups_order
if group_cat in generate_test_results_map
)
else:
generate_test_results = self.wilcoxon(tie_correct=tie_correct)
# If we're not exponentiating after the mean aggregation, then do it now.
self._basic_stats(exponentiate_values=not mean_in_log_space)
elif method == "logreg":
generate_test_results = self.logreg(**kwds)

self.stats = None

n_genes = self.X.shape[1]

for group_index, scores, pvals in generate_test_results:
group_name = str(self.groups_order[group_index])
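
Note on the illico branch above: it first builds a lookup from group category to `(scores, pvals)`, dropping the reference group, and then yields results in `groups_order` so that the downstream loop sees the same `(group_index, scores, pvals)` tuples as the other methods. A simplified sketch of that reordering with plain Python containers. The input dict stands in for the grouped illico result, and all names here are illustrative rather than taken from scanpy or illico:

```python
from __future__ import annotations

from collections.abc import Iterator, Sequence


def ordered_results(
    results_by_group: dict[str, tuple[list[float], list[float]]],
    groups_order: Sequence[str],
    reference: str | None,
) -> Iterator[tuple[int, list[float], list[float]]]:
    """Yield (group_index, scores, pvals) in `groups_order`, skipping the reference."""
    # Precompute positions once instead of searching the list per group.
    position = {group: i for i, group in enumerate(groups_order)}
    for group in groups_order:
        if group == reference or group not in results_by_group:
            continue
        scores, pvals = results_by_group[group]
        yield position[group], scores, pvals


results = {
    "B": ([1.5, -0.2], [0.13, 0.84]),
    "A": ([0.0, 0.0], [1.0, 1.0]),  # reference compared with itself
    "C": ([-2.1, 0.7], [0.04, 0.48]),
    "D": ([0.3, 0.3], [0.76, 0.76]),  # tested, but not selected via `groups`
}
out = list(ordered_results(results, ["A", "B", "C"], reference="A"))
```

The sketch uses a position dict where the diff calls `groups_order_list.index(...)`. Both give the same indices; the dict only avoids repeated linear scans when there are many groups.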

@@ -482,9 +546,12 @@ def compute_statistics( # noqa: PLR0912
mean_rest = self.means_rest[group_index]
else:
mean_rest = self.means[self.ireference]
foldchanges = (self.expm1_func(mean_group) + 1e-9) / (
self.expm1_func(mean_rest) + 1e-9
) # add small value to remove 0's
foldchanges = (
(self.expm1_func(mean_group) + 1e-9)
/ (self.expm1_func(mean_rest) + 1e-9)
if mean_in_log_space
else (mean_group + 1e-9) / (mean_rest + 1e-9)
) # add small value to avoid zeros
self.stats[group_name, "logfoldchanges"] = np.log2(
foldchanges[global_indices]
)
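
Note on the `foldchanges` change above: it switches between two ways of turning log1p-transformed values into a log2 fold change. A standard-library sketch of both on toy data, assuming natural-log `log1p` input and mirroring the `1e-9` offset from the diff. `log2_fold_change` is a hypothetical helper written for this note, not scanpy API:

```python
import math
from statistics import fmean

EPS = 1e-9  # small value to avoid division by zero


def log2_fold_change(
    group: list[float], rest: list[float], *, mean_in_log_space: bool
) -> float:
    """Log2 fold change of `group` over `rest`, both given as log1p values."""
    if mean_in_log_space:
        # Average the logged values, then undo the log.
        mean_group = math.expm1(fmean(group))
        mean_rest = math.expm1(fmean(rest))
    else:
        # Undo the log per value, then average.
        mean_group = fmean(math.expm1(x) for x in group)
        mean_rest = fmean(math.expm1(x) for x in rest)
    return math.log2((mean_group + EPS) / (mean_rest + EPS))


# Counts with one outlier in the group of interest.
group = [math.log1p(c) for c in (1, 1, 1, 100)]
rest = [math.log1p(c) for c in (1, 1, 1, 1)]

approx = log2_fold_change(group, rest, mean_in_log_space=True)
exact = log2_fold_change(group, rest, mean_in_log_space=False)
```

On this toy input the outlier pulls the per-value mean up much more than the log-space mean, so `approx` comes out smaller than `exact`.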
@@ -512,9 +579,12 @@ def rank_genes_groups( # noqa: PLR0912, PLR0913, PLR0915
corr_method: _CorrMethod = "benjamini-hochberg",
tie_correct: bool = False,
layer: str | None = None,
mean_in_log_space: bool | Default = Default(
preset=("rank_genes_groups", "mean_in_log_space")
),
**kwds,
) -> AnnData | None:
"""Rank genes for characterizing groups.
r"""Rank genes for characterizing groups.

Expects logarithmized data.

@@ -575,6 +645,11 @@ def rank_genes_groups( # noqa: PLR0912, PLR0913, PLR0915
The key in `adata.uns` information is saved to.
copy
Whether to copy `adata` or modify it inplace.
mean_in_log_space
Whether to do :math:`\log(\operatorname{mean}(e^x))` (`False`)
or :math:`\log(e^{\operatorname{mean}(x)})` (`True`).
The former is accurate, while the latter is a faster approximation
that underestimates this accurate result in the presence of many outliers.
kwds
Are passed to test methods. Currently this affects only parameters that
are passed to :class:`sklearn.linear_model.LogisticRegression`.
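
Note on the `mean_in_log_space` docstring above: the claim that the `True` variant can only underestimate a group's mean follows from Jensen's inequality for the convex exponential. A short derivation, writing x_1, ..., x_n for the log-space values of one gene in one group (notation introduced here for illustration):

```latex
\operatorname{mean}(x) = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
e^{\operatorname{mean}(x)} \le \frac{1}{n}\sum_{i=1}^{n} e^{x_i} = \operatorname{mean}(e^{x})
\quad\Longrightarrow\quad
\log\bigl(e^{\operatorname{mean}(x)}\bigr) \le \log\bigl(\operatorname{mean}(e^{x})\bigr).
```

Equality holds only when all x_i are equal, so the gap grows with the spread of the values. The inequality applies to each group's mean separately; since the log-fold-change is a difference of two such terms, its own bias can in principle go either way.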
@@ -597,7 +672,7 @@ def rank_genes_groups( # noqa: PLR0912, PLR0913, PLR0915
Structured array to be indexed by group id storing the log2
fold change for each gene for each group. Ordered according to
scores. Only provided if method is 't-test' like.
Note: this is an approximation calculated from mean-log values.
Note: if `mean_in_log_space=True`, this is an approximation calculated from mean-log values.
`adata.uns['rank_genes_groups' | key_added]['pvals']` : structured :class:`numpy.ndarray` (dtype `float`)
p-values.
`adata.uns['rank_genes_groups' | key_added]['pvals_adj']` : structured :class:`numpy.ndarray` (dtype `float`)
@@ -627,6 +702,8 @@ def rank_genes_groups( # noqa: PLR0912, PLR0913, PLR0915

if isinstance(mask_var, Default):
mask_var = settings.preset.rank_genes_groups.mask_var
if isinstance(mean_in_log_space, Default):
mean_in_log_space = settings.preset.rank_genes_groups.mean_in_log_space
if method is None or isinstance(method, Default):
method = settings.preset.rank_genes_groups.method

@@ -716,6 +793,7 @@ def rank_genes_groups( # noqa: PLR0912, PLR0913, PLR0915
n_genes_user=n_genes_user,
rankby_abs=rankby_abs,
tie_correct=tie_correct,
mean_in_log_space=mean_in_log_space,
**kwds,
)

1 change: 1 addition & 0 deletions src/testing/scanpy/_pytest/marks.py
@@ -41,6 +41,7 @@ def _generate_next_value_(
skimage = "scikit-image"
skmisc = "scikit-misc"
zarr = auto()
illico = auto()
# external
bbknn = auto()
harmony = "harmonyTS"