Where can I find the exact 2000 HVG gene list for ST-HVG-Tahoe? #268

@Shoichi-eisei

Dear Arc Institute STATE team,

I am trying to fine-tune ST-HVG-Tahoe and validate it externally on DILImap data.

I would like to know where I can find the exact 2000 HVG gene list corresponding to obsm["X_hvg"] in ST-HVG-Tahoe.

I checked both the few-shot and zero-shot model files, including:

  • config.yaml
  • var_dims.pkl
  • data_module.torch
  • hparams.yaml
  • generalization.toml
  • batch_onehot_map.pkl
  • cell_type_onehot_map.pkl
  • pert_onehot_map.pt
  • best.ckpt, final.ckpt, and last.ckpt
  • evaluation adata_real.h5ad, adata_pred.h5ad, and real_de.csv files

All model files indicate that input_dim, hvg_dim, and output_dim are 2000, and that embed_key is X_hvg. However, gene_names contains 62,710 genes, and I could not find any gene name list of length 2000. In the evaluation h5ad and DE CSV files, the features appear to be labeled only by the indices 0–1999, with no gene symbols attached.
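For reference, the kind of check described above can be sketched as follows. This is only an illustration: it assumes the pickle files unpickle to plain nested dicts and lists, which may not hold for every file (some may need the STATE package importable, and NumPy arrays are not covered by this walker). The helper names `load_pickle`, `summarize_lengths`, and `candidates` are hypothetical, and the `example` dict is a synthetic stand-in that only mirrors the sizes reported here, not the real file contents.

```python
import pickle
from collections.abc import Mapping, Sequence


def load_pickle(path):
    """Load a pickle file such as var_dims.pkl (path supplied by the caller)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def summarize_lengths(obj, prefix=""):
    """Return {dotted_path: length} for every list-like entry nested in obj."""
    found = {}
    if isinstance(obj, Mapping):
        for key, value in obj.items():
            found.update(summarize_lengths(value, f"{prefix}{key}."))
    elif isinstance(obj, Sequence) and not isinstance(obj, (str, bytes)):
        # Lists and tuples are reported; scalars and strings are skipped.
        found[prefix.rstrip(".")] = len(obj)
    return found


def candidates(obj, target_len=2000):
    """Paths whose list-like value has exactly target_len entries."""
    return [p for p, n in summarize_lengths(obj).items() if n == target_len]


# Synthetic stand-in (assumption): same sizes as reported in this issue.
example = {
    "input_dim": 2000,
    "hvg_dim": 2000,
    "output_dim": 2000,
    "gene_names": [f"gene_{i}" for i in range(62710)],
}
lengths = summarize_lengths(example)
hits = candidates(example)
print(lengths)
print(hits)
```

On a real file, `candidates(load_pickle("var_dims.pkl"))` would list any entry holding exactly 2000 items; an empty result corresponds to the situation described above.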

Could you please let me know where the exact X_hvg feature gene list and order can be found?

I need this information to align external datasets such as DILImap for fine-tuning and DEG overlap analysis.
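To make the intended use concrete, here is a minimal sketch of the alignment and overlap steps, assuming the ordered HVG list were available. The gene symbols, the function names (`column_index_map`, `align_row`, `deg_overlap`), and the choice to zero-fill genes missing from the external dataset are all illustrative assumptions, not part of STATE.

```python
def column_index_map(reference_genes, external_genes):
    """For each reference gene, give its column in the external data, or -1."""
    position = {}
    for i, gene in enumerate(external_genes):
        position.setdefault(gene, i)  # first occurrence wins on duplicates
    return [position.get(gene, -1) for gene in reference_genes]


def align_row(row, index_map, fill=0.0):
    """Reorder one expression row into reference order, filling missing genes."""
    return [row[j] if j >= 0 else fill for j in index_map]


def deg_overlap(degs_a, degs_b):
    """Return the shared DEG set and the Jaccard index of two DEG lists."""
    a, b = set(degs_a), set(degs_b)
    union = a | b
    shared = a & b
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


# Hypothetical three-gene example standing in for the 2000-gene HVG list.
reference = ["GENE_A", "GENE_B", "GENE_C"]   # model feature order
external = ["GENE_C", "GENE_X", "GENE_A"]    # external dataset column order
index_map = column_index_map(reference, external)
aligned = align_row([5.0, 7.0, 1.0], index_map)
shared, jaccard = deg_overlap(["GENE_A", "GENE_C"], ["GENE_C", "GENE_X"])
print(index_map, aligned, shared, jaccard)
```

Both steps depend on the exact gene identities and their order, which is why the index-only labels (0–1999) are not sufficient.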

Thank you very much for your help.
