BUG in the logarithm detection algorithm - unreliable DES score #55

@geroldcsendes

The issue originates from the guess_is_log function of pdex. This function checks whether the sum of preprocessed counts is below 15, under the assumption that e¹⁵ − 1 ≈ 3.26M counts per cell is an unlikely value.

However, the function incorrectly assumes that the sum of counts is log-transformed, whereas in reality, each gene’s count is log-transformed individually.

Example:
Assume your median UMI count per cell is 10k (the challenge uses 50k+). If a cell has 1 count in each of 10,000 genes and is median-normalized, you get 1 normalized count per gene. Applying log1p yields 0.69 for each gene. Summing these up, as guess_is_log does, returns 6931, which is far above the threshold of 15, so the data is (incorrectly) detected as non-log-transformed. Consequently, your log-transformed data is treated as non-log-transformed. This happens even if you submit count data through cell-eval: cell-eval correctly guesses that the data is on count level (it uses its own function), log-normalizes it, and then passes it on to pdex.
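The arithmetic of this example can be reproduced with a few lines of NumPy. This is only a sketch: `sum_based_is_log` is a hypothetical stand-in for the check described above (sum of values below 15), not the actual pdex code, and the 10,000-gene cell is the toy case from the example.

```python
import numpy as np

N_GENES = 10_000    # toy cell from the example: one count in each of 10k genes
TARGET_SUM = 10_000.0  # assumed median UMI count per cell used for normalization
THRESHOLD = 15.0    # cutoff described in this issue


def sum_based_is_log(cell: np.ndarray, threshold: float = THRESHOLD) -> bool:
    """Hypothetical stand-in for the described check, NOT pdex's implementation."""
    return float(cell.sum()) < threshold


counts = np.ones(N_GENES)                        # 1 UMI per gene
normalized = counts / counts.sum() * TARGET_SUM  # 1.0 per gene after normalization
logged = np.log1p(normalized)                    # log1p applied per gene: ln(2) each

per_gene = float(logged[0])
total = float(logged.sum())
flagged_as_log = sum_based_is_log(logged)
implied_counts = float(np.expm1(THRESHOLD))      # counts per cell the cutoff assumes

print(f"log1p value per gene:   {per_gene:.2f}")
print(f"sum over genes:         {total:.0f}")
print(f"detected as log data:   {flagged_as_log}")
print(f"e^15 - 1 (counts/cell): {implied_counts:,.0f}")
```

The per-gene values are all well below 15, but their sum is not, which is why a check on the sum misclassifies correctly log-transformed data.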

Impact:
This will almost certainly distort fold-change estimates—and, crucially, their rankings. This is particularly harmful when your model detects more DEGs than the ground truth, as the top N DEGs (ranked by absolute fold change) will be misordered. Such models are unfairly penalized in the DES score due to this bug.

This problem was reported in previous issues:

We conclude that the guess_is_log function is not merely suboptimal but fundamentally inappropriate for this check. This is especially problematic for the VCC challenge, where cell-eval is used for evaluation and submissions are scored on the server, so participants have no way to specify pdex-kwargs directly.

Thanks a lot for looking into our bug report. We will report this to cell-eval as well.
