Normalization without a log transform can leave very small deviations from the target value, so the differences are not guaranteed to be all zero.
Instead, we should just use a heuristic:
sum_rows(subset) -> expm1 -> mean -> check if > 10M
Exponentiating non-log counts makes them grow massively, so the result should be easily separable from the expected range of about 10K-100K.
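
A minimal sketch of this heuristic in NumPy, for illustration only. The function name looks_non_log and the n_rows parameter are hypothetical, and the sketch assumes a dense cells-by-genes array. It also assumes expm1 is applied elementwise before the row sums are taken, which is one reading of the pipeline above: for log1p-transformed data that recovers the normalized row totals, while for non-log data the values blow up (possibly to inf).

```python
import numpy as np

THRESHOLD = 1e7  # the "> 10M" cutoff from the note above


def looks_non_log(matrix, n_rows=1000, threshold=THRESHOLD):
    """Return True if `matrix` appears NOT to be log1p-transformed.

    Hypothetical helper: `matrix` is assumed to be a dense 2-D array
    (rows = cells, columns = genes); only the first `n_rows` rows are checked.
    """
    subset = np.asarray(matrix[:n_rows], dtype=np.float64)
    # Assumption: undo a presumed log1p elementwise, then sum each row.
    # Overflow to inf is expected for non-log input, so silence the warning.
    with np.errstate(over="ignore"):
        row_sums = np.expm1(subset).sum(axis=1)
    # Log data should land near the normalization target (about 10K-100K);
    # non-log data ends up many orders of magnitude above the threshold.
    return bool(row_sums.mean() > threshold)
```

Because the gap between the two cases is so large, the exact threshold should not matter much; 10M simply sits comfortably above the 10K-100K range.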