How are sequence, structure, and text files matched in FAISS index? #16

@ShenAoAO

Hi, thanks for releasing this great work!

I’m currently exploring the FAISS index at:

faiss_index/SwissProt/ProTrek_650M_UniRef50/

Inside this directory, I noticed that:

  • sequence/ids.tsv contains UniProt IDs, and each line corresponds to a protein sequence.
  • Similarly, structure/ids.tsv also contains UniProt IDs for protein structures.
  • There’s also a text/ folder, which seems to contain textual annotations.

My question is:
How are these three parts (sequence, structure, and text) aligned with each other?
Is the matching done through a pointer file (e.g., ids.tsv.pointer.npy)?

I tried checking the correspondence by comparing line indices (for example, line 0 in sequence/ids.tsv against line 0 in text/ids.tsv), but the entries at the same index don't match.
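To make the check concrete, here is a minimal sketch of positional matching versus matching by ID. It is only an illustration under assumptions: it assumes each line of an ids.tsv file starts with the UniProt ID as its first tab-separated field, the helper names (read_ids, positional_matches, align_by_id) are hypothetical, and the toy IDs and the second column in the text stand-in are made up rather than taken from the real index. It does not use ids.tsv.pointer.npy, since I don't know what that file stores.

```python
import csv
import io


def read_ids(handle):
    """Return the first tab-separated field of each non-empty line."""
    return [row[0] for row in csv.reader(handle, delimiter="\t") if row]


def positional_matches(ids_a, ids_b):
    """Count rows where both lists hold the same ID at the same index."""
    return sum(a == b for a, b in zip(ids_a, ids_b))


def align_by_id(ids_a, ids_b):
    """Map each ID found in both lists to (row in ids_a, rows in ids_b)."""
    rows_b = {}
    for j, uid in enumerate(ids_b):
        rows_b.setdefault(uid, []).append(j)
    return {uid: (i, rows_b[uid]) for i, uid in enumerate(ids_a) if uid in rows_b}


# Toy stand-ins for sequence/ids.tsv and text/ids.tsv (made-up contents).
# With real files, pass open(path, newline="") instead of io.StringIO(...).
seq_ids = read_ids(io.StringIO("P12345\nQ67890\nA11111\n"))
text_ids = read_ids(
    io.StringIO("Q67890\tFunction\nP12345\tFunction\nP12345\tSubunit\n")
)

print(positional_matches(seq_ids, text_ids))  # 0: no row lines up by index
print(align_by_id(seq_ids, text_ids))
# {'P12345': (0, [1, 2]), 'Q67890': (1, [0])}
```

In this toy case no rows agree by position, yet two of the three IDs can still be paired by value, and one ID maps to several text rows. If the real files behave the same way, line indices alone would not be enough to align the three folders.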

Could you please clarify:

  1. How to correctly align entries between sequence, structure, and text?
  2. If a mapping file or pointer is used, where can I find it?

Thanks a lot for your help!
