
Which SE checkpoint was used to generate X_state for ST-SE-Tahoe? #269

@carloruggeri

Hi Arc team,

I am using arcinstitute/ST-SE-Tahoe (from HF) for downstream inference/search experiments, and I am trying to determine which State Embedding checkpoint was used to generate the X_state embeddings used as input and target during ST-SE training.

From the public files, I can see that the few-shot ST-SE-Tahoe run uses:

  • embed_key: X_state
  • input_dim: 2058
  • output_dim: 2058
  • toml_config_path: /data/tahoe_se/generalization.toml

I checked config.yaml, version_0/hparams.yaml, var_dims.pkl, data_module.torch, and strings in checkpoints/final.ckpt, but I could not find the exact SE checkpoint/revision.
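A search like this can be made systematic by walking the loaded checkpoint recursively and reporting every key or string value that mentions an SE-related term. The following is only a minimal sketch under assumptions: `find_mentions` is a hypothetical helper, the search terms are guesses at likely naming, and the checkpoint is assumed to load as a nested dict. It does not imply that the SE checkpoint path is actually recorded anywhere in the file.

```python
from typing import Any, Iterator, Tuple

# Hypothetical search terms; adjust to whatever naming you expect to find.
TERMS = ("se600m", "se-600m", "se-167m", "safetensors", "embed", ".ckpt")


def find_mentions(obj: Any, terms=TERMS, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, text) for every dict key or string value containing a term."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}/{key}"
            if any(t in str(key).lower() for t in terms):
                yield child, f"<key> {key}"
            # Recurse into the value regardless of whether the key matched.
            yield from find_mentions(value, terms, child)
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            yield from find_mentions(value, terms, f"{path}[{i}]")
    elif isinstance(obj, str):
        if any(t in obj.lower() for t in terms):
            yield path, obj


# Stand-in for a loaded checkpoint; the values are the ones listed in this issue.
example = {
    "hyper_parameters": {
        "embed_key": "X_state",
        "input_dim": 2058,
        "toml_config_path": "/data/tahoe_se/generalization.toml",
    }
}
hits = list(find_mentions(example))
for where, text in hits:
    print(where, "->", text)
```

On the real file, the dict would come from something like `torch.load("checkpoints/final.ckpt", map_location="cpu")`. Depending on the PyTorch version, loading a checkpoint that contains non-tensor objects may additionally require `weights_only=False`, which should only be used for files you trust.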

Could you clarify whether X_state for ST-SE-Tahoe was generated using:

  1. arcinstitute/SE-600M, and if so which file/revision, e.g. model.safetensors, se600m_epoch16.ckpt, se600m_epoch4.ckpt; or
  2. an internal/preprint SE checkpoint associated with Preprint-SE-167M-Human / SE-167M-Human; or
  3. another checkpoint?

This matters because I want to generate compatible X_state embeddings for new AnnData files and compare ST-SE predicted embeddings to SE target embeddings in the same space.
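For concreteness, the comparison I have in mind looks roughly like the sketch below. It is illustrative only: `check_dim` and `rowwise_cosine` are hypothetical helper names, cosine similarity is just one assumed choice of metric, and the random arrays stand in for real embeddings. A matching width of 2058 is a necessary condition for compatibility but not a sufficient one, since two different SE checkpoints could produce embeddings of the same width in different spaces.

```python
import numpy as np

EXPECTED_DIM = 2058  # input_dim / output_dim from the ST-SE-Tahoe config


def check_dim(emb: np.ndarray, expected: int = EXPECTED_DIM) -> None:
    """Raise if the embedding matrix is not shaped (n_cells, expected)."""
    if emb.ndim != 2 or emb.shape[1] != expected:
        raise ValueError(f"expected (n_cells, {expected}), got {emb.shape}")


def rowwise_cosine(pred: np.ndarray, target: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Cosine similarity between matching rows of two embedding matrices."""
    check_dim(pred)
    check_dim(target)
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")
    num = np.sum(pred * target, axis=1)
    den = np.linalg.norm(pred, axis=1) * np.linalg.norm(target, axis=1)
    # Guard against division by zero for all-zero rows.
    return num / np.maximum(den, eps)


# Synthetic stand-ins: a prediction that is a positive rescaling of the target
# points in the same direction, so its cosine similarity is 1 for every row.
rng = np.random.default_rng(0)
target = rng.normal(size=(4, EXPECTED_DIM))
pred = 2.0 * target
sims = rowwise_cosine(pred, target)
print(sims)
```

With real data, `target` would presumably be read from `adata.obsm["X_state"]` of the new AnnData file and `pred` from the ST-SE output, which is why knowing the exact SE checkpoint matters.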

Thanks!
