
Question regarding the difference between running glmGamPoi on a subset of the data and on the whole data #72

@MarianoRuzJurado

Hi @const-ae,

First of all, thank you for this amazing package and for how straightforward it is to use with single-cell RNA-seq data.
I noticed a behaviour in the testing that I struggle to understand, and I think it comes down to the conceptual differences between the design matrices involved.

I was running `fit <- glmGamPoi::glm_gp(sce_object_pb, design = ~ annotation + condition + condition:annotation - 1)` on a data set with multiple cell types, with the goal of finding the differentially regulated genes per cell type. Following the tutorial, I extracted the cell-type-specific DEGs by defining a contrast for each cell type in a loop:

```r
# Contrast: cond(<celltype>, <grp>) - cond(<celltype>, <ident_ctrl>)
contrast <- paste0("cond(", annotation_col, "='", celltype, "', ", group_by, "='", grp, "') - ",
                   "cond(", annotation_col, "='", celltype, "', ", group_by, "='", ident_ctrl, "')")
de_res <- glmGamPoi::test_de(fit, contrast = contrast)
```

This works perfectly, and I get a p-value for each gene in each cell type.
Now comes the part I struggle to understand: when I first subset my data to a cell type of interest and then run the fit with `fit <- glmGamPoi::glm_gp(sce_object_sub, design = ~ condition)`, I get different (often lower) p-values for the genes in that cell type.
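
Here is a minimal sketch that runs both approaches side by side on simulated counts. It assumes a hypothetical pseudobulk layout (two cell types "A"/"B", conditions "ctrl"/"treat", three replicates each) as a stand-in for `sce_object_pb`; `counts`, `col_df` and all level names are made up for illustration. Only `glm_gp()` and `test_de()` are glmGamPoi functions, and the contrast is passed as a `cond()` string in the same way as in the loop above.

```r
# Minimal sketch on simulated data; all object names and levels are hypothetical.
library(glmGamPoi)

set.seed(1)
# Hypothetical pseudobulk layout: 2 cell types x 2 conditions x 3 replicates = 12 samples
col_df <- expand.grid(replicate  = 1:3,
                      condition  = c("ctrl", "treat"),
                      annotation = c("A", "B"))
n_genes <- 200
counts <- matrix(rnbinom(n_genes * nrow(col_df), size = 5, mu = 50),
                 nrow = n_genes,
                 dimnames = list(paste0("gene_", seq_len(n_genes)),
                                 paste0("sample_", seq_len(nrow(col_df)))))
rownames(col_df) <- colnames(counts)

# Approach 1: one joint model over all 12 samples (same design as in the question)
fit_full <- glm_gp(counts, col_data = col_df,
                   design = ~ annotation + condition + condition:annotation - 1)

results <- list()
for (celltype in levels(col_df$annotation)) {
  # Approach 1: cell-type-specific contrast on the joint fit, passed as a string
  contrast <- sprintf(
    "cond(annotation = '%s', condition = 'treat') - cond(annotation = '%s', condition = 'ctrl')",
    celltype, celltype)
  res_full <- test_de(fit_full, contrast = contrast)

  # Approach 2: fit only the samples of this cell type (6 of the 12)
  keep <- col_df$annotation == celltype
  fit_sub <- glm_gp(counts[, keep], col_data = col_df[keep, ],
                    design = ~ condition)
  res_sub <- test_de(fit_sub, contrast = "conditiontreat")

  # Line up the p-values of both approaches gene by gene
  results[[celltype]] <- data.frame(
    celltype  = celltype,
    gene      = res_full$name,
    pval_full = res_full$pval,
    pval_sub  = res_sub$pval[match(res_full$name, res_sub$name)]
  )
}
comparison <- do.call(rbind, results)
head(comparison)
```

The two fits are estimated from different sets of samples (all 12 versus the 6 of one cell type), so comparing `pval_full` and `pval_sub` per gene shows how much that alone changes the results on data without any real effect.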

But conceptually, shouldn't both approaches give me the DEGs in a cell-type-specific manner? Or does the first approach, given the design I chose, take more into account than I am aware of?

Many thanks in advance,
Mariano
