Gini coefficient calculation does not deal with all-zero inputs #86

@Hu-JIN

Description

In preprocessing,

def gini(x, input_sorted=True):

returns nan when the input is all zeros, which is not desired. Imagine an exposure vector in which a single outlier sample has nonzero exposure and all other samples have zero exposure. remove_samples_based_on_gini will not capture this outlier because of the nan. These cases need care, because not every such case indicates an outlier: it depends on how large the nonzero exposure is.
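A minimal sketch of the failure and one possible guard follows. It is not the repository's implementation: gini_naive is a hypothetical stand-in that uses the standard sorted-rank Gini formula, and gini_guarded with its zero_value parameter is an assumed fix that returns a caller-chosen value when the input sums to zero (0.0 here, treating all-equal values as perfectly equal).

```python
import numpy as np


def gini_naive(x, input_sorted=True):
    """Hypothetical stand-in: standard sorted-rank Gini formula, no zero guard."""
    x = np.asarray(x, dtype=float)
    if not input_sorted:
        x = np.sort(x)
    n = x.size
    ranks = np.arange(1, n + 1)
    # With an all-zero input the denominator is 0, so 0/0 evaluates to nan.
    with np.errstate(invalid="ignore", divide="ignore"):
        return 2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1.0) / n


def gini_guarded(x, input_sorted=True, zero_value=0.0):
    """Assumed fix: return zero_value instead of nan for empty or all-zero input."""
    x = np.asarray(x, dtype=float)
    if x.size == 0 or np.sum(x) == 0:
        return zero_value
    return gini_naive(x, input_sorted=input_sorted)


all_zero = [0, 0, 0, 0]
one_nonzero = [0, 0, 0, 5]  # already sorted in ascending order

print(gini_naive(all_zero))       # nan
print(gini_guarded(all_zero))     # 0.0
print(gini_guarded(one_nonzero))  # 0.75
```

Whether 0.0 is the right value for all-zero input depends on how the downstream filtering uses the result, so the sketch leaves that choice to the caller. The guard only removes the nan; deciding whether a lone nonzero exposure is large enough to count as an outlier would still need a separate check.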
