In preprocessing, the function `gini(x, input_sorted=True)`
will output `nan` when the input is all zeros. This is not desired. Imagine an exposure vector where only one outlier sample has nonzero exposure and all the other samples have zero exposure. This outlier will not be captured by `remove_samples_based_on_gini` because of the `nan` issue. Care needs to be taken here, because not every case like this indicates an outlier: it depends on how big that nonzero exposure is.
Reference: MuSiCal/musical/preprocessing.py, line 19 in a2d8c58
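
A minimal sketch of the `nan` behavior and one possible guard. It assumes the Gini coefficient is computed with the standard sorted-index formula; this is not MuSiCal's actual implementation, and `gini_naive`, `gini_guarded`, and `all_zero_value` are hypothetical names used only for illustration.

```python
import numpy as np


def gini_naive(x, input_sorted=True):
    """Standard Gini formula (assumed form); returns nan when sum(x) == 0."""
    x = np.asarray(x, dtype=float)
    if not input_sorted:
        x = np.sort(x)
    n = x.size
    index = np.arange(1, n + 1)
    # 0 / 0 evaluates to nan for NumPy floats; silence the runtime warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.sum((2 * index - n - 1) * x) / (n * np.sum(x))


def gini_guarded(x, input_sorted=True, all_zero_value=0.0):
    """Hypothetical guard: return a caller-chosen value for all-zero input."""
    x = np.asarray(x, dtype=float)
    if x.size == 0 or np.sum(x) == 0:
        return all_zero_value
    return gini_naive(x, input_sorted=input_sorted)


print(gini_naive([0, 0, 0, 0]))    # nan
print(gini_guarded([0, 0, 0, 0]))  # 0.0
print(gini_naive([0, 0, 0, 5]))    # 0.75
```

Which value to return for all-zero input is a design choice, and `0.0` here is only a placeholder. Note also that the Gini coefficient is scale-invariant under this formula, so `[0, 0, 0, 5]` and `[0, 0, 0, 0.001]` give the same value; deciding whether the single nonzero exposure is a real outlier would likely need a separate check on its magnitude.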