
Error in handle_design_parameter(design, data, col_data, reference_level) : Too few replicates / too many coefficients to fit model. #42

@sopenaml


Hi Constantin,

Thank you very much for glmGamPoi! I've been trying to get it to work on a dataset with 9 clusters and 2 genotypes, with 3 replicates per genotype. I have tried to run:

fit <- glm_gp( bm.mat_subset,
               design = ~ genotype + seurat_clusters + genotype:seurat_clusters - 1,
               col_data = metadat,
               subsample = TRUE,
               on_disk = FALSE,
               reference_level = "WT" )

c0 <- test_de(fit,
              contrast = genotypeKO - genotypeWT,
              pseudobulk_by = mouse,
              sort_by = pval, decreasing = FALSE)

Error in handle_design_parameter(design, data, col_data, reference_level) : 
  The model_matrix has more columns (18) than the there are samples in the data matrix (6 columns).
Too few replicates / too many coefficients to fit model.
The head of the design matrix: 
     genotypeWT genotypeKO seurat_clusters1 seurat_clusters2 seurat_clusters3 seurat_clusters4 seurat_clusters5 seurat_clusters6 seurat_clusters7 seurat_clusters8 
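
To see where the 18 columns come from, the same formula can be expanded with base R's `model.matrix()` on hypothetical, balanced metadata: 6 mice (3 WT, 3 KO), each contributing cells to all 9 clusters. The mouse names are made up, and this only sketches the column count, not what glmGamPoi does internally:

```r
# Hypothetical metadata (an assumption for illustration, not the real data):
# 6 mice, 3 WT and 3 KO, each contributing cells to all 9 clusters.
mice <- data.frame(
  mouse = paste0("mouse", 1:6),
  genotype = factor(rep(c("WT", "KO"), each = 3), levels = c("WT", "KO"))
)
cells <- expand.grid(mouse = mice$mouse,
                     seurat_clusters = factor(0:8),
                     stringsAsFactors = FALSE)
cells$genotype <- mice$genotype[match(cells$mouse, mice$mouse)]

# Same formula as in the glm_gp() call above, expanded with base R.
# Without an intercept: 2 genotype columns + 8 cluster columns
# + 8 genotypeKO:cluster interaction columns.
mm <- model.matrix(~ genotype + seurat_clusters + genotype:seurat_clusters - 1,
                   data = cells)
n_coef <- ncol(mm)

# Number of pseudobulk samples for two different groupings.
n_by_mouse <- length(unique(cells$mouse))
n_by_mouse_cluster <- length(unique(paste(cells$mouse, cells$seurat_clusters,
                                          sep = "_")))

cat("coefficients:", n_coef, "\n")
cat("pseudobulk samples (mouse):", n_by_mouse, "\n")
cat("pseudobulk samples (mouse x cluster):", n_by_mouse_cluster, "\n")
```

Under these assumptions there are 18 coefficients but only 6 pseudobulk samples when grouping by mouse, while grouping by mouse and cluster would give 54.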

I've looked at your example, and I get the same error if I don't pre-filter the data to a few cell types (NK cells, B cells and T cells), as is done in the example. Without that filter, the fit has 16 coefficients and the data has 16 pseudobulk samples (ind × stim), so it fails with the same "too few replicates / too many coefficients to fit model" error. Is this a bug, or do we always have to pre-filter the data so that the number of coefficients is smaller than the number of samples produced by "pseudobulk_by"?

Would you say that the following is a good way to overcome the issues above?

de_res <- test_de(fit, contrast = `stimstim` + `cellCD4 T cells:stimstim`,
                  pseudobulk_by = paste(stim, ind, cell, sep = "_"))
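
A quick count under a similar assumption suggests why this grouping could help: 8 hypothetical donors (matching the 16 stim-ind samples reported in the second error message), each with cells of all 8 types in both conditions. The donor names are made up; with a random subsample of 1000 cells some donor/condition/cell-type combinations may be missing, so the real number of groups can be lower:

```r
# Hypothetical metadata (assumption): 8 donors ("ind"), each measured in
# ctrl and stim, with cells of all 8 types present in every donor/condition.
cell_types <- c("NK cells", "B cells", "CD14+ Monocytes", "CD4 T cells",
                "CD8 T cells", "Dendritic cells", "FCGR3A+ Monocytes",
                "Megakaryocytes")
cells <- expand.grid(cell = factor(cell_types, levels = cell_types),
                     stim = factor(c("ctrl", "stim")),
                     ind = paste0("donor", 1:8),
                     stringsAsFactors = FALSE)

# Same formula as the example fit: 8 cell columns + 1 stimstim column
# + 7 cell:stim interaction columns.
mm <- model.matrix(~ cell + stim + stim:cell - 1, data = cells)
n_coef <- ncol(mm)

# Pseudobulk groups for the original and the proposed grouping.
n_stim_ind <- length(unique(paste(cells$stim, cells$ind, sep = "-")))
n_stim_ind_cell <- length(unique(paste(cells$stim, cells$ind, cells$cell,
                                       sep = "_")))

cat("coefficients:", n_coef, "\n")
cat("pseudobulk samples (stim-ind):", n_stim_ind, "\n")
cat("pseudobulk samples (stim_ind_cell):", n_stim_ind_cell, "\n")
```

Here the 16 coefficients equal the 16 stim-ind samples, whereas grouping by stim, ind and cell would give at most 128 samples, each belonging to a single cell type.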

Thank you very much for your help!

Miriam


Here is the code I ran on your example data:

sce_subset <- sce[rowSums(counts(sce)) > 100,
                  sample(which(!is.na(sce$cell)), 1000)]
counts(sce_subset) <- as.matrix(counts(sce_subset))
sce_subset$cell <- droplevels(sce_subset$cell)
fit <- glm_gp(sce_subset, design = ~ cell + stim + stim:cell - 1,
              reference_level = "NK cells")
fit
glmGamPoiFit object:
The data had 9727 rows and 1000 columns.
A model with 16 coefficient was fitted.
> de_res <- test_de(fit, contrast = `stimstim` + `cellCD4 T cells:stimstim`, 
                  pseudobulk_by = paste0(stim, "-", ind)) 

Error in handle_design_parameter(design, data, col_data, reference_level) : 
  The model_matrix has more columns (16) than the there are samples in the data matrix (16 columns).
Too few replicates / too many coefficients to fit model.
The head of the design matrix: 
          cellNK cells cellB cells cellCD14+ Monocytes cellCD4 T cells cellCD8 T cells cellDendritic cells cellFCGR3A+ Monocytes cellMegakaryocytes stimstim cellB cells:stimstim cellCD14+ Monocytes:stimstim cellCD4 T cells:stimstim cellCD8 T cells:stimstim cellDendritic cells:stimstim cellFCGR3A+ Monocytes:stimstim cellMegakaryocytes:stimstim
 ctrl-101   0.05714286  0.11428571           0.4000000       0.2285714      0.14285714          0.00000000            0.02857143         0.02857143        0                    0                            0                        0                        0                            0                              0                           0
ctrl-1015   0.04761905  0.20000000           0.2761905       0.3333333      0.03809524          0.01904762      
