Hi @const-ae,
First of all, thank you for this amazing package and for how straightforward it is to use with single-cell RNA data.
I noticed a behaviour in the testing that I struggle to understand, and I think I am missing a conceptual difference in how the design matrix is set up.
I was running
fit <- glmGamPoi::glm_gp(sce_object_pb, design = ~ annotation + condition + condition:annotation - 1)
on a data set with multiple cell types, with the goal of finding the differentially regulated genes per cell type. Following the tutorial, I extracted the cell-type-specific DEGs by defining the contrast for each cell type in a loop:
contrast <- paste0("cond(", annotation_col, "='", celltype, "', ", group_by, "='", grp, "') - ", "cond(", annotation_col, "='", celltype, "', ", group_by, "='", ident_ctrl, "')")
de_res <- glmGamPoi::test_de(fit, contrast = contrast)
This works perfectly and gives me a p-value for every gene in each cell type.
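The contrast loop described above can be sketched roughly as follows. The column names mirror the design formula, but the condition levels ("treated", "ctrl") and the cell type names are hypothetical placeholders, not values from the real data. The call to test_de is left commented out because it needs the fitted object.

```r
# Hypothetical placeholders; the real column names and levels come from the data
annotation_col <- "annotation"
group_by <- "condition"
grp <- "treated"
ident_ctrl <- "ctrl"
celltypes <- c("T cell", "B cell")

# Build one contrast string per cell type, comparing grp against ident_ctrl
contrasts <- vapply(celltypes, function(celltype) {
  paste0(
    "cond(", annotation_col, "='", celltype, "', ", group_by, "='", grp, "') - ",
    "cond(", annotation_col, "='", celltype, "', ", group_by, "='", ident_ctrl, "')"
  )
}, character(1))

# Each string is then tested against the full fit, as in the call above:
# for (contrast in contrasts) {
#   de_res <- glmGamPoi::test_de(fit, contrast = contrast)
# }
```

Each element of contrasts is named after its cell type, so the per-cell-type results can be stored under the same names.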
Now comes the part where I struggle to understand the difference. When I instead first subset my data to a cell type of interest and then fit the model with
fit <- glmGamPoi::glm_gp(sce_object_sub, design = ~ condition)
I get different (often lower) p-values for the genes in that cell type.
Conceptually, shouldn't both approaches give me the cell-type-specific DEGs? Or does the first approach take more into account than I am aware of, given the design I chose?
Many thanks in advance,
Mariano