Rename fold_change column to log2_fold_change (deprecate alias)#74
Conversation
There was a problem hiding this comment.
Code Review
This pull request updates the package to version 0.2.2, renaming the fold_change metric to log2_fold_change and marking the original name as deprecated. The changes include documentation updates, internal logic adjustments to support both column names for backward compatibility, and the introduction of a deprecation warning. Feedback recommends updating docstring formulas to include the epsilon parameter for accuracy and using FutureWarning instead of DeprecationWarning to ensure users are properly notified of the upcoming breaking change.
| ``log2(target_mean / ref_mean)`` and ``percent_change`` is | ||
| ``(target_mean - ref_mean) / ref_mean``. The MWU ``p_value`` and | ||
| ``(target_mean - ref_mean) / ref_mean``. |
There was a problem hiding this comment.
The formulas for log2_fold_change and percent_change in the docstring should include the epsilon parameter to be accurate and consistent with the updated CLAUDE.md. Currently, they represent the case where epsilon=0, which may be misleading when a pseudocount is applied.
| ``log2(target_mean / ref_mean)`` and ``percent_change`` is | |
| ``(target_mean - ref_mean) / ref_mean``. The MWU ``p_value`` and | |
| ``(target_mean - ref_mean) / ref_mean``. | |
| ``log2((target_mean + epsilon) / (ref_mean + epsilon))`` and ``percent_change`` is | |
| ``(target_mean - ref_mean) / (ref_mean + epsilon)``. |
| ``fold_change`` is a **deprecated** alias for ``log2_fold_change`` | ||
| (identical values). It is retained for one release to ease migration | ||
| and will be removed in pdex 0.3.0. New code should read | ||
| ``log2_fold_change`` directly. A :class:`DeprecationWarning` is emitted |
There was a problem hiding this comment.
| "The `fold_change` column in pdex output is deprecated and will be " | ||
| "removed in pdex 0.3.0. Use `log2_fold_change` instead — it contains " | ||
| "the same values (`log2(target_mean / ref_mean)`).", | ||
| DeprecationWarning, |
There was a problem hiding this comment.
Consider using FutureWarning instead of DeprecationWarning. DeprecationWarning is filtered out by default in many non-interactive environments, whereas FutureWarning is intended for end-users to see upcoming breaking changes. Given the goal is to ensure users migrate before the 0.3.0 release, FutureWarning provides better visibility. Additionally, I've simplified the message to avoid the inaccurate formula when epsilon > 0.
| "The `fold_change` column in pdex output is deprecated and will be " | |
| "removed in pdex 0.3.0. Use `log2_fold_change` instead — it contains " | |
| "the same values (`log2(target_mean / ref_mean)`).", | |
| DeprecationWarning, | |
| "The `fold_change` column in pdex output is deprecated and will be " | |
| "removed in pdex 0.3.0. Use `log2_fold_change` instead — it contains " | |
| "the same values.", | |
| FutureWarning, |
Summary
The
fold_changecolumn in pdex output has heldlog2(target_mean / ref_mean)since 0.2.0 despite its name suggesting a linear ratio. This silently broke downstream consumers (notablycell-eval≤ 0.7.0,which double-logged the column and zeroed out roughly half of all DE values).
Discussion and decision: ArcInstitute/cell-eval#232. The agreed plan is a two-release deprecation:
log2_fold_changecolumn with the correct semantics; keepfold_changeas a duplicate alias for one release; emit aDeprecationWarningon everypdex(...)call.fold_changealias.Changes
src/pdex/_math.py— renamed the internalfold_changefunction tolog2_fold_change(the docstring already said "log2-fold change"; pure clarity rename).src/pdex/__init__.py— outputpl.DataFrames now carry bothfold_changeandlog2_fold_change(identical values);pdex()emits aDeprecationWarningdirecting callers to migrate.pyproject.toml— version bump0.2.1→0.2.2.README.md,CLAUDE.md— output-schema tables updated; column order matches the actual DataFrame.mode="ref"andmode="all") assertinglog2_fold_change == log2(target_mean / ref_mean)on finite rows.Migration